Code documentation
Setting up the EnergyOptimizer object
The first step is to create the EnergyOptimizer object and update it with the properties of the desired dataset. All these properties can be set either in the constructor or one-by-one with setter functions. Warning: do not modify the object's internal variables directly; always use the appropriate functions.
-
class
eecr.eeoptimizer.
EnergyOptimizer
(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
-
__init__
(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
- Parameters
sequence – a sequence of contexts to be used for testing the energy-efficiency of settings. Their ordering and proportions should be representative of the domain
contexts – the list of contexts in the domain. If not provided, they will be inferred as all unique elements of the sequence parameter, in alphabetical order
setting_to_energy – a dictionary, where the keys are the possible settings and the values are the energy costs (per time unit) when using that setting
setting_to_sequence – a dictionary, where the keys are the possible settings and the values are lists of contexts. Each list represents the original sequence (see set_sequence()), classified using the corresponding setting
quality_metric – a function that maps a confusion matrix to a quality indicator. Accuracy score is used by default
sensors_off – the energy cost per time unit when the system is sleeping
sensors_on – the energy cost per time unit when the system is working
path – the root of the path used for saving and loading objects (e.g. with save_data() or load_data())
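The quality_metric parameter is described above as a function that maps a confusion matrix to a quality indicator. The sketch below shows what such a function might look like for plain accuracy. The matrix layout (nested lists, rows are true contexts, columns are predicted contexts) and the name accuracy_metric are assumptions made for illustration and are not confirmed by the library.

```python
def accuracy_metric(confusion_matrix):
    """Return the fraction of correctly classified instances.

    Assumed layout: confusion_matrix[i][j] counts the instances of true
    context i that were classified as context j.
    """
    total = sum(sum(row) for row in confusion_matrix)
    correct = sum(confusion_matrix[i][i] for i in range(len(confusion_matrix)))
    return correct / total if total else 0.0


# Two contexts: 45 + 35 correct classifications out of 100 instances.
example_matrix = [[45, 5],
                  [15, 35]]
example_quality = accuracy_metric(example_matrix)
```

Any other function with the same input could presumably be substituted, for example one that rewards recall of a particular context.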
-
set_dca_costs
(cost_off, cost_on)
Sets the base energy costs for duty-cycling.
These costs will be used for all DCA methods (e.g. dca_model(), dca_real(), find_dca_tradeoffs() etc.). This overrides the default costs of 1 when the system is working and 0 when the system is sleeping.
- Parameters
cost_off – the energy cost per time unit when the system is sleeping
cost_on – the energy cost per time unit when the system is working. If a setting is specified when using any of the DCA methods, this value will be ignored and the energy cost of that setting will be used instead
-
set_path
(path)
Sets the path to a folder from which the data is loaded and to which data is saved.
- Parameters
path – relative path to a folder
-
set_sequence
(sequence)
Sets the sequence of contexts that will be used for testing the energy-efficiency of settings.
- Parameters
sequence – a sequence of contexts. Their ordering and proportions should be representative of the domain
-
set_settings
(setting_to_sequence, setting_to_energy=None, setting_fn_energy=None)
Defines the possible settings and their performance/energy for the optimization.
Either the setting_to_energy or the setting_fn_energy parameter must be provided. Providing both or neither may result in unexpected behavior.
- Parameters
setting_to_sequence – a dictionary, where the keys are the possible settings and the values are lists of contexts. Each list represents the original sequence (see set_sequence()), classified using the corresponding setting
setting_to_energy – a dictionary, where the keys are the possible settings and the values are the energy costs (per time unit) when using that setting
setting_fn_energy – a function to be used if setting_to_energy is None. This function should take a setting as the input and return the energy cost (per time unit) when using that setting
-
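The inputs described in this section are ordinary Python containers. The following sketch builds a small, made-up dataset in the shapes the parameter descriptions suggest; the context names, setting names and energy values are all hypothetical. The constructor call is left as a comment because it requires the eecr package to be installed.

```python
# Ground-truth contexts, one entry per time unit.
sequence = ["rest", "rest", "walk", "walk", "run", "run"]

# If no context list is given, the documentation says it is inferred as the
# unique elements of the sequence in alphabetical order.
contexts = sorted(set(sequence))

# Each (hypothetical) setting maps to the same sequence as it would be
# classified with that setting, so every list has the length of `sequence`.
setting_to_sequence = {
    "accelerometer": ["rest", "rest", "walk", "run", "run", "run"],
    "accelerometer+gps": ["rest", "rest", "walk", "walk", "run", "run"],
}

# Energy cost per time unit of each setting (made-up numbers).
setting_to_energy = {"accelerometer": 1.0, "accelerometer+gps": 4.0}

# Basic consistency checks before handing the data to the optimizer.
assert set(setting_to_sequence) == set(setting_to_energy)
assert all(len(s) == len(sequence) for s in setting_to_sequence.values())

# With eecr installed, the documented constructor could then be called as:
# from eecr.eeoptimizer import EnergyOptimizer
# optimizer = EnergyOptimizer(sequence=sequence,
#                             setting_to_energy=setting_to_energy,
#                             setting_to_sequence=setting_to_sequence)
```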
Summarizing its properties
Use these functions for a quick overview of the created EnergyOptimizer object.
-
energy_quality
()
Returns a summary of all settings and their classification/energy performances.
- Returns
a dictionary, where the keys are the possible settings and values are (q,e) tuples, where the “q” represents the classification performance according to the specified quality metric and “e” represents the energy cost of that setting.
-
quality
()
Returns a summary of all settings and their classification performances.
- Returns
a dictionary, where the keys are the possible settings and the values are the classification performances according to the specified quality metric
-
summary
()
Prints a summary (mathematical properties) of the dataset.
-
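To make the shape of the energy_quality() result concrete, the sketch below computes a comparable dictionary by hand, using accuracy as the quality metric. It is an illustration written for this page, not the library's implementation; the function name energy_quality_sketch and the toy data are hypothetical.

```python
def energy_quality_sketch(sequence, setting_to_sequence, setting_to_energy):
    """Map each setting to a (quality, energy) tuple, with accuracy as quality."""
    summary = {}
    for setting, classified in setting_to_sequence.items():
        # Count the time units where the classified context matches the truth.
        hits = sum(p == t for p, t in zip(classified, sequence))
        summary[setting] = (hits / len(sequence), setting_to_energy[setting])
    return summary


truth = ["a", "a", "b", "b"]
summary = energy_quality_sketch(
    truth,
    {"cheap": ["a", "b", "b", "b"], "full": ["a", "a", "b", "b"]},
    {"cheap": 1, "full": 3},
)
```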
Searching for and evaluating configurations
This is the core of the module. It can automatically find three types of energy-efficient solutions and predict their performance.
The first type of solution is the SCA configuration. Such a configuration is a list of settings (e.g. [s1, s2, s3]), where a setting can be any hashable object. A configuration represents a system that works as follows: when context c1 is detected, the system switches to setting s1 and keeps using it until a context change is detected. In general, whenever context c_i is detected, setting s_i is used (s_i is the i-th setting in the given configuration).
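The behaviour of an SCA configuration can be sketched as a short simulation. The code below is an illustration only and not the library's sca_real(): the function name simulate_sca is hypothetical, and the choice to start with the setting assigned to the first context is an assumption, since the text does not say how the system is initialized.

```python
def simulate_sca(sequence, contexts, setting_to_sequence, setting_to_energy, config):
    """Simulate an SCA configuration; return (accuracy, mean energy per time unit)."""
    setting_for = dict(zip(contexts, config))  # context c_i -> setting s_i
    current = config[0]  # assumption: start with the first context's setting
    correct, energy = 0, 0.0
    for t, truth in enumerate(sequence):
        # Classify the current time unit with the setting currently in use.
        detected = setting_to_sequence[current][t]
        correct += detected == truth
        energy += setting_to_energy[current]
        # The detected (not the true) context selects the next setting.
        current = setting_for[detected]
    return correct / len(sequence), energy / len(sequence)


sca_result = simulate_sca(
    sequence=["rest", "rest", "walk", "walk"],
    contexts=["rest", "walk"],
    setting_to_sequence={
        "low": ["rest", "rest", "rest", "walk"],
        "high": ["rest", "rest", "walk", "walk"],
    },
    setting_to_energy={"low": 1.0, "high": 3.0},
    config=["low", "high"],  # use "low" while resting, "high" while walking
)
```

In this toy run the cheap setting misses the first walking instance, which delays the switch to the more accurate setting.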
The second type of solution is the DCA configuration. Such a configuration is a list of integers (e.g. [5,4,2]). A configuration represents a system that works as follows: when context c_i is detected, the system stops working for (s_i)-1 time periods (s_i is the i-th entry in the configuration). It then works for one time period (or more, if explicitly specified). The last context classified in this active period determines the length of the next sleeping period.
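A DCA configuration can be sketched in the same way. The code below is an illustration and not the library's dca_real(). It assumes perfect classification while active, and it assumes that the last detected context is reported for every time period spent sleeping; neither assumption is stated in the text. The name simulate_dca is hypothetical, and active is assumed to be at least 1.

```python
def simulate_dca(sequence, contexts, config, active=1, cost_on=1.0, cost_off=0.0):
    """Simulate a DCA configuration; return (accuracy, mean energy per time unit)."""
    cycle_for = dict(zip(contexts, config))  # context c_i -> cycle length s_i
    reported, energy, t, n = [], 0.0, 0, len(sequence)
    while t < n:
        # Active period: the system senses and classifies (perfectly, by assumption).
        for _ in range(active):
            if t >= n:
                break
            last = sequence[t]
            reported.append(last)
            energy += cost_on
            t += 1
        # Sleeping period of s_i - 1 time units: keep reporting the last context.
        for _ in range(cycle_for[last] - 1):
            if t >= n:
                break
            reported.append(last)
            energy += cost_off
            t += 1
    accuracy = sum(r == s for r, s in zip(reported, sequence)) / n
    return accuracy, energy / n


dca_result = simulate_dca(
    sequence=["rest"] * 4 + ["walk"] * 4,
    contexts=["rest", "walk"],
    config=[3, 1],  # sleep 2 periods after "rest", never sleep after "walk"
)
```

Here the transition from resting to walking happens during a sleeping period, so two instances are misreported while half of the energy is saved.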
The third type of solution is the SCA-DCA configuration. Such configurations essentially do both: they switch both the setting in use and the length of the sleeping period. They are represented in the form (SCA_configuration, DCA_configuration).
Any such configuration can be evaluated using two different mechanisms: a simulation, or a mathematical prediction based on the dataset properties. The latter approach is usually roughly 100 times faster and almost as accurate. In all cases the evaluations of configurations are returned as a list of tuples [(q1,e1), (q2,e2), …], where q represents the quality metric of choice (e.g. accuracy) and e represents the energy cost.
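Several methods in this section return Pareto-optimal trade-offs. As a reminder of what that means for (quality, energy) tuples, the sketch below filters a list down to its non-dominated points, assuming higher quality and lower energy are better. The helper name pareto_front is hypothetical and is not part of the module.

```python
def pareto_front(tradeoffs):
    """Keep only the (quality, energy) points that no other point dominates."""
    front = []
    for q, e in tradeoffs:
        dominated = any(
            (q2 >= q and e2 <= e) and (q2 > q or e2 < e)
            for q2, e2 in tradeoffs
        )
        if not dominated:
            front.append((q, e))
    # Remove duplicates and order the front by increasing energy cost.
    return sorted(set(front), key=lambda point: point[1])


front = pareto_front([(0.9, 5), (0.8, 3), (0.7, 4), (0.6, 1)])
```

The point (0.7, 4) is dropped because (0.8, 3) is both more accurate and cheaper.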
All functions in this segment follow roughly the same pattern of naming, parameters and return types. Methods that evaluate configurations are named like sca_model(), which evaluates SCA configurations using the mathematical model. Methods that automatically find configurations are named like find_sca_tradeoffs(), which automatically finds SCA configurations.
-
dca_model
(configurations, active=1, setting=None, cf=None, max_cycle=None, energy_costs=None, name=None)
Tests the DCA configurations using the DCA mathematical model.
- Parameters
configurations – a list of DCA configurations. A configuration is a list of the same length as the number of contexts, each element being an integer >= 1. Optionally this parameter can be a single configuration instead of a list
active – the length of the active period. For most purposes this length should be left as the default of 1
setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1
cf – uses the given confusion matrix (must be normalized, so each row sums to 1) instead of the one prescribed by the setting parameter
max_cycle – the maximum duty-cycle length found in any of the configurations. If set, the evaluation will be slightly faster
energy_costs – internal parameter (its value should not be changed)
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
- Returns
a list of trade-offs, in the form (quality, energy). i-th element represents the evaluation of the i-th configuration
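The cf parameter expects a confusion matrix whose rows each sum to 1. A minimal sketch of such a normalization is given below, assuming the matrix is a nested list of counts with one row per true context. The helper normalize_rows is hypothetical, and whether the library accepts nested lists or requires another array type is not stated here.

```python
def normalize_rows(confusion_matrix):
    """Scale every row of a count matrix so that it sums to 1."""
    normalized = []
    for row in confusion_matrix:
        total = sum(row)
        # Assumption: a context that never occurs keeps an all-zero row.
        normalized.append([v / total for v in row] if total else [0.0] * len(row))
    return normalized


normalized_cf = normalize_rows([[8, 2],
                                [1, 3]])
```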
-
dca_real
(configurations, setting=None, active=1, name=None)
Tests the DCA configurations using a simulation.
This method reads from a sequence (either the base one or the one that corresponds to the given setting), skipping some elements in a way that simulates duty-cycling in a real-life environment. This is slower than using the mathematical model (dca_model()).
- Parameters
configurations – a list of DCA configurations. A configuration is a list of the same length as the number of contexts, each element being an integer >= 1. Optionally this parameter can be a single configuration instead of a list
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1
active – the length of the active period. For most purposes this length should be left as the default of 1
- Returns
a list of trade-offs, in the form (quality, energy). i-th element represents the evaluation of the i-th configuration
-
find_dca_random
(n_samples=100, max_cycle=10)
Returns n_samples random DCA configurations.
- Parameters
n_samples – number of configurations to return
max_cycle – the maximum desired duty-cycle length for the configurations
- Returns
a list of configurations
-
find_dca_static
(max_cycle, name=None, setting=None, active=1)
Returns all configurations where the same duty-cycle length is used for all contexts.
- Parameters
max_cycle – the maximum desired duty-cycle length for the configurations to be generated
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1
active – the length of the active period. For most purposes this length should be left as the default of 1
- Returns
two lists, the first contains pareto-optimal configurations, the second their evaluations in the form (quality, energy).
-
find_dca_tradeoffs
(max_cycle=10, active=1, seeded=True, setting=None, cf=None, name=None, energy_costs=None, ngen=200)
Attempts to find the best DCA trade-offs for the current dataset.
- Parameters
max_cycle – the maximum desired duty-cycle length for the configurations
active – the length of the active period. For most purposes this length should be left as the default of 1
seeded – if true, the search always starts with two specific configurations in the starting population. One configuration uses the minimum and the other the maximum duty-cycle length
setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1
cf – uses the given confusion matrix (must be normalized, so each row sums to 1) instead of the one prescribed by the setting parameter
name – None or relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs
energy_costs – internal parameter (its value should not be changed)
ngen – the number of generations in the NSGA-II search
- Returns
two lists, the first contains pareto-optimal configurations, the second their evaluations in the form (quality, energy).
-
find_sca_dca_tradeoffs
(sca_configurations=None, sca_tradeoffs=None, dca_indices=None, n_points=5, binary_representation=False, name=None, max_cycle=10, active=1, verbose=False, cstree=False)
Attempts to find the best SCA-DCA trade-offs for the current dataset.
- Parameters
sca_tradeoffs – if Pareto-optimal trade-offs were already precalculated using find_sca_tradeoffs(), they can be set using this parameter to avoid calculating them again
sca_configurations – if the sca_tradeoffs parameter is set, this parameter should list the configurations from which the sca_tradeoffs were generated
dca_indices – indices of the configurations in sca_configurations to be expanded using the DCA method. If not set, the indices will be determined automatically by making them equidistant
n_points – the number of SCA configurations selected for expanding with the DCA method
binary_representation – a flag indicating that settings are represented by a binary list. In this case, a different mutation/crossover will be used for the NSGA-II algorithm
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
max_cycle – the maximum desired duty-cycle length for the configurations
active – the length of the active period. For most purposes this length should be left as the default of 1
verbose – if true, the function prints out the current progress
cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used
- Returns
two lists, the first contains pareto-optimal configurations, the second their evaluations in the form (quality, energy).
-
find_sca_random
(n_samples=100)
Returns n_samples random SCA configurations.
- Parameters
n_samples – number of configurations to return
- Returns
a list of configurations
-
find_sca_static
(name=None)
Returns all Pareto-optimal configurations where the same setting is used for all contexts.
- Parameters
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
- Returns
two lists, the first contains pareto-optimal configurations, the second their evaluations in the form (quality, energy).
-
find_sca_tradeoffs
(binary_representation=False, name=None, cstree=False)
Attempts to find the best SCA trade-offs for the current dataset.
Uses the NSGA-II algorithm to search the space of different configurations and finds and returns the Pareto-optimal ones.
- Parameters
binary_representation – a flag indicating that settings are represented by a binary list. In this case, a different mutation/crossover will be used for the NSGA-II algorithm
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used
- Returns
two lists, the first contains pareto-optimal configurations, the second their evaluations in the form (quality, energy).
-
sca_dca_model
(configurations, active=1, cstree=False, name=None)
Tests the SCA-DCA configurations using the mathematical model.
This combines the models from sca_model() and dca_model(). It is faster than using the simulation (see sca_dca_real()).
- Parameters
configurations – a list of SCA-DCA configurations. A configuration can have the same syntax as the SCA configuration (see sca_model()) or it can be represented as a tuple, where the first element is a SCA configuration and the second element is a DCA configuration (see dca_model())
active – the length of the active period. For most purposes this length should be left as the default of 1
cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
- Returns
a list of trade-offs, in the form (quality, energy). i-th element represents the evaluation of the i-th configuration
-
sca_dca_real
(configurations, active=1, name=None, cstree_energy=False)
Tests the SCA-DCA configurations using a simulation.
This combines the simulations from sca_real() and dca_real(). It is slower than using the mathematical model (see sca_dca_model()).
- Parameters
configurations – a list of SCA-DCA configurations. A configuration can have the same syntax as the SCA configuration (see sca_real()) or it can be represented as a tuple, where the first element is a SCA configuration and the second element is a DCA configuration (see dca_real())
active – the length of the active period. For most purposes this length should be left as the default of 1
name – None or relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs
cstree_energy – this flag slightly increases the accuracy of the simulation when using cost-sensitive decision trees. In order to use it, the energy sequence must be precalculated by using the cstree_energy flag when generating the trees (e.g. in add_csdt_weighted() or add_csdt_borders())
- Returns
a list of trade-offs, in the form (quality, energy). i-th element represents the evaluation of the i-th configuration
-
sca_model
(configurations, name=None, cstree=False, encrypted=False)
Tests the SCA configurations using the SCA mathematical model.
- Parameters
configurations – a list of SCA configurations. A configuration is a list of the same length as the number of contexts, each element being a setting. Optionally this parameter can be a single configuration instead of a list
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used
encrypted – a flag for internal use (do not change its value)
- Returns
a list of trade-offs, in the form (quality, energy). i-th element represents the evaluation of the i-th configuration
-
sca_real
(configurations, name=None, cstree_energy=False)
Tests the SCA configurations using a simulation.
This method reads from different sequences (that correspond to different settings), switching between them as the context changes. This simulates a real-life environment where contexts are classified one-by-one using different system settings. This is slower than using the mathematical model (sca_model()).
- Parameters
configurations – a list of SCA configurations. A configuration is a list of the same length as the number of contexts, each element being a setting. Optionally this parameter can be a single configuration instead of a list
name – None or relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs
cstree_energy – this flag slightly increases the accuracy of the simulation when using cost-sensitive decision trees. In order to use it, the energy sequence must be precalculated by using the cstree_energy flag when generating the trees (e.g. in add_csdt_weighted() or add_csdt_borders())
- Returns
a list of trade-offs, in the form (quality, energy). i-th element represents the evaluation of the i-th configuration
-
sca_simple
(configurations, name=None)
Tests the SCA configurations using a simple mathematical model.
The model simply weights the performance of each setting by the expected proportion of time that setting is in use. E.g. if a configuration consists of two settings, one with accuracy 60% and the other with 100%, and if both contexts appear equally often, then the expected accuracy is 80%.
- Parameters
configurations – a list of SCA configurations. A configuration is a list of the same length as the number of contexts, each element being a setting. Optionally this parameter can be a single configuration instead of a list
name – None or relative path to a file. In latter case the path will be used to save the generated configurations and trade-offs
- Returns
a list of trade-offs, in the form (quality, energy). i-th element represents the evaluation of the i-th configuration
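The arithmetic behind the sca_simple() description (60% and 100% accuracy with equally frequent contexts giving 80%) is a weighted average. The sketch below reproduces only that calculation; the function and variable names are hypothetical and the library may organize the computation differently.

```python
def simple_sca_quality(context_proportions, setting_quality, configuration):
    """Weight each setting's quality by how often its context occurs."""
    return sum(
        proportion * setting_quality[setting]
        for proportion, setting in zip(context_proportions, configuration)
    )


expected_accuracy = simple_sca_quality(
    context_proportions=[0.5, 0.5],           # both contexts equally frequent
    setting_quality={"s1": 0.6, "s2": 1.0},   # accuracy of each setting
    configuration=["s1", "s2"],               # s1 for context 1, s2 for context 2
)
```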
-
Saving and loading data
To avoid repeated processing of the data, many components can be quickly saved and loaded. Many functions have this functionality already built in via the name parameter. In addition, the following functions are specialized for the task. Make sure the path is correctly set before using them. Note also that all saving and loading uses pickle in the background.
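Since the text states that pickle is used in the background, a round trip for a set of solutions might look roughly like the sketch below. The file layout (a single pickled tuple of configurations and values), the file name and both helper names are assumptions made for illustration; the files actually written by the library may be structured differently.

```python
import os
import pickle
import tempfile


def save_solution_sketch(configurations, values, file_path):
    """Store configurations and their (quality, energy) values in one file."""
    with open(file_path, "wb") as handle:
        pickle.dump((configurations, values), handle)


def load_solution_sketch(file_path):
    """Read back the two lists written by save_solution_sketch."""
    with open(file_path, "rb") as handle:
        configurations, values = pickle.load(handle)
    return configurations, values


with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "solutions.pkl")
    save_solution_sketch([[5, 4, 2], [1, 1, 1]], [(0.9, 0.4), (1.0, 1.0)], file_path)
    loaded_configurations, loaded_values = load_solution_sketch(file_path)
```

As with any pickle-based format, files should only be loaded from trusted sources, because unpickling can execute arbitrary code.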
-
load_config
(name)
Loads the settings information from a file (instead of calling set_settings()).
- Parameters
name – relative path to a file
-
load_data
(name)
Loads the base sequence from a file (instead of calling set_sequence()).
- Parameters
name – relative path to a file
-
load_data_config
(data_name=None, config_name=None, sample_dataset=None)
Loads the sequence and settings information from files.
This is equivalent to sequentially calling the load_data() and load_config() functions.
- Parameters
data_name – relative path to a file with the sequence
config_name – relative path to a file with the settings information
sample_dataset – if specified, a sample dataset will be loaded (currently only “SHL” keyword works)
-
load_solution
(name)
Loads a set of energy-efficient solutions from a file.
- Parameters
name – relative path to a file
- Returns
two lists representing different tradeoffs. First list contains the configurations, while the other their evaluations. Evaluations are represented by a tuple with the form (quality, energy)
-
save_config
(name)
Saves the settings information (set with set_settings()) to a file.
- Parameters
name – relative path to a file
-
save_data
(name)
Saves the base sequence (set with set_sequence()) to a file.
- Parameters
name – relative path to a file
-
save_solution
(configurations, values, name)
Saves a set of energy-efficient solutions to a file.
- Parameters
configurations – a list of configurations
values – a list of configuration evaluations in the form (quality, energy)
name – relative path to a file
-
Automatic settings generators
Sometimes the generation of settings can be automated. Two of the methods listed (add_csdt_weighted(), add_csdt_borders()) generate only settings represented by cost-sensitive decision trees. The last one (add_subsets()) automates the task of creating settings where each setting represents using a different attribute subset.
-
add_csdt_borders
(cs_tree, x1, y1, x2, y2, x_p=None, y_p=None, test_fn=None, buffer_range=1, name=None, cstree_energy=False, verbose=False, weights_range=None, energy_range=None, n_tree=15)
Generates different CS-DTs from data around different contexts.
When generating these cost-sensitive decision trees, only data from one context (and the instances around it) is taken. This should create a tree that is good at recognizing that context and the contexts to which it frequently transitions. This method also tries different weights, the same as the add_csdt_weighted() method. The generated CS-DTs are added to the list of possible settings.
- Parameters
cs_tree – the base cost-sensitive tree object (eecr.cstree.CostSensitiveTree) to use for generation of its variants
x1 – a pandas dataframe of attributes to be used as the training set
y1 – a pandas series of labels for the instances in x1
x2 – a pandas dataframe of attributes to be used as the testing set
y2 – a pandas series of labels for the instances in x2
x_p – (optional) a pandas dataframe of attributes to be used as the pruning set
y_p – (optional) a pandas series of labels for the instances in x_p
buffer_range – specifies how many instances to take after a context transition
test_fn – a function used for measuring energy while testing a CS-DT, if it needs to be different from the one used for training
verbose – if true, the function prints out the current progress
weights_range – a tuple with two elements, representing the minimum and maximum weight to be used. Trees will use n_tree different weights equidistantly sampled between these two extreme values. A sensible weight range must be determined manually, but as a heuristic, it is usually inversely proportional to the energy costs. For example, if energy costs are all >1 then weights should all be <1.
energy_range – an alternative to the weights_range parameter; setting this to x is equivalent to setting weights_range to 1/x
n_tree – the number of different CS-DTs to be generated
name – None or relative path to a file. In the latter case, the generated settings will be saved
cstree_energy – generates accurate energy sequences that can be used to increase the accuracy of sca_real(). It makes this function much slower
- Returns
the set of generated CS-DTs
-
add_csdt_weighted
(cs_tree, x1, y1, x2, y2, x_p=None, y_p=None, test_fn=None, verbose=True, weights_range=None, energy_range=None, n_tree=15, name=None, cstree_energy=False)
Generates different CS-DTs with different ratios between energy and classification quality.
When generating cost-sensitive decision trees, at each node a decision is made on whether an attribute is worth using (based on its informativeness and cost). Different weights can be used to skew the decision in one way or another. The generated CS-DTs are added to the list of possible settings.
- Parameters
cs_tree – the base cost-sensitive tree object (eecr.cstree.CostSensitiveTree) to use for generation of its variants
x1 – a pandas dataframe of attributes to be used as the training set
y1 – a pandas series of labels for the instances in x1
x2 – a pandas dataframe of attributes to be used as the testing set
y2 – a pandas series of labels for the instances in x2
x_p – (optional) a pandas dataframe of attributes to be used as the pruning set
y_p – (optional) a pandas series of labels for the instances in x_p
test_fn – a function used for measuring energy while testing a CS-DT, if it needs to be different from the one used for training
verbose – if true, the function prints out the current progress
weights_range – a tuple with two elements, representing the minimum and maximum weight to be used. Trees will use n_tree different weights equidistantly sampled between these two extreme values. A sensible weight range must be determined manually, but as a heuristic, it is usually inversely proportional to the energy costs. For example, if energy costs are all >1 then weights should all be <1.
energy_range – an alternative to the weights_range parameter; setting this to x is equivalent to setting weights_range to 1/x
n_tree – the number of different CS-DTs to be generated
name – None or relative path to a file. In the latter case, the generated settings will be saved
cstree_energy – generates accurate energy sequences that can be used to increase the accuracy of sca_real(). It makes this function much slower
- Returns
the set of generated CS-DTs
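The weights_range description says that n_tree weights are sampled equidistantly between the two extreme values. One plausible reading of that is sketched below, assuming both endpoints are included; the helper name equidistant_weights is hypothetical and the library's exact sampling may differ.

```python
def equidistant_weights(weights_range, n_tree):
    """Return n_tree evenly spaced weights between the two range endpoints."""
    low, high = weights_range
    if n_tree == 1:
        return [low]
    step = (high - low) / (n_tree - 1)
    # Assumption: both endpoints of the range are part of the sample.
    return [low + i * step for i in range(n_tree)]


weights = equidistant_weights((0.0, 1.0), 5)
```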
-
add_subsets
(x1, y1, x2, y2, classifier, setting_to_energy=None, setting_fn_energy=None, subsettings=None, subsetting_to_features=None, n=0, feature_groups=None, setting_fn_features=None, name=None, y_p=None, x_p=None, csdt=False, csdt_fn_energy=None, cstree_energy=False)
Automatically generates settings by taking different attribute subsets from a database.
This method assumes you have a pandas dataframe where each column represents a different attribute and each row represents an instance to be classified. However, each attribute has a cost, and thus the goal is to classify the instances with as small a subset as possible. The cost is often shared between the attributes: e.g. if two attributes are calculated from the GPS stream, then having one or both has the same cost (the cost of having the GPS open). It may also be the case that using some sensor increases or decreases the cost of another sensor, as they share resources when used.
There are several ways of specifying which subsets to transform into settings. Use the one most convenient for the current domain. In all examples we will assume that the data comes from different sensors (subsettings) and that a setting is the set of sensors in use. If the data (and costs) come from different sources, a similar logic applies.
Set the subsettings parameter as a list of sensors and subsetting_to_features as a dictionary that maps each sensor to an attribute list (the attributes calculated from that sensor).
Set feature_groups as a list of lists. Each list represents features that should all be included or excluded in any attribute subset used (e.g. they come from the same sensor). Set n as the number of feature_groups.
Set n as the number of sensors and set setting_fn_features as a function that takes a binary string of length n and outputs a list of attributes. Character i in the binary string represents whether the i-th sensor is active.
This method is not a substitute for attribute selection.
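The binary-string option described above can be illustrated with a pair of small helper functions. The sensor names, feature names and costs below are hypothetical. The energy function charges each active sensor once, regardless of how many of its attributes are used, which mirrors the shared GPS cost mentioned in the description.

```python
from itertools import product

sensors = ["accelerometer", "gps"]  # hypothetical subsettings
subsetting_to_features = {
    "accelerometer": ["acc_mean", "acc_std"],
    "gps": ["speed", "distance"],
}
sensor_cost = {"accelerometer": 1.0, "gps": 5.0}  # made-up costs per time unit


def setting_fn_features(setting):
    """Map a binary string such as '10' to the attributes of the active sensors."""
    features = []
    for flag, sensor in zip(setting, sensors):
        if flag == "1":
            features.extend(subsetting_to_features[sensor])
    return features


def setting_fn_energy(setting):
    """Charge every active sensor once, however many of its attributes are used."""
    return sum(sensor_cost[s] for flag, s in zip(setting, sensors) if flag == "1")


# Every binary string of length n = len(sensors), i.e. every sensor subset.
all_settings = ["".join(bits) for bits in product("01", repeat=len(sensors))]
```

Whether the all-zero string (no sensor active) is a meaningful setting depends on the domain and is not addressed here.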
- Parameters
x1 – a pandas dataframe of attributes to be used as the training set
y1 – a pandas series of labels for the instances in x1
x2 – a pandas dataframe of attributes to be used as the testing set
y2 – a pandas series of labels for the instances in x2
x_p – (optional) a pandas dataframe of attributes to be used as the pruning set
y_p – (optional) a pandas series of labels for the instances in x_p
classifier – any classifier (tested with sklearn and CS-DTs). Should provide the functions “fit” and “predict”
setting_to_energy – a dictionary, where the keys are the possible settings and the values are the energy costs (per time unit) when using that setting. Settings are binary strings
setting_fn_energy – a function to be used if setting_to_energy is None. This function should take a setting as the input and return the energy cost (per time unit) when using that setting. Settings are binary strings
subsettings – the list of subsettings
subsetting_to_features – a dictionary that maps subsettings to attribute sets
n – the number of subsettings
feature_groups – a list of lists, elements of the sublists are always all included or excluded
setting_fn_features – a function that maps settings to attribute sets
name – None or relative path to a file. In the latter case, the generated settings will be saved
csdt – if the classifier is a CS-DT, set this flag to true
csdt_fn_energy – a function used for measuring energy while testing a CS-DT, if it needs to be different from the one used for training
cstree_energy – if using CS-DTs, generates accurate energy sequences that can be used to increase the accuracy of sca_real(). It makes this function much slower
- Returns
a list of generated classifiers
-
Visualizing solutions
Most functions in the eecr.eeutility module are meant for internal use. However, draw_tradeoffs() provides an easy way to visualize the results achieved with other methods.
-
eecr.eeutility.
draw_tradeoffs
(plots, labels, xlim=None, ylim=None, name=None, reverse=True, pareto=False, short=False, points=None, folder='artificial', percentage=True, percentage_energy=False, scatter_indices=None, color_indices=None, text_factor=50, ylabel='Energy', dotted_indices=None, thick_indices=None, xlabel='Classification error')
A tool for quick visualization of different trade-offs.
Essentially a wrapper around matplotlib.pyplot that makes drawing and comparing different sets of trade-offs easier. As an input it expects a list of trade-offs and can plot them in the way that is standard for Pareto fronts: step-wise with the quality axis reversed so that the ideal point lies in the lower-left corner. It streamlines some other aspects, for example labeling, saving and making sure only Pareto-optimal points are drawn.
While some utility parameters are present for modifying the look of the graph, it is recommended to use the matplotlib.pyplot library directly for any complex drawing task.
- Parameters
plots – a list of one or more trade-off sets. Each trade-off set is, for example, the result of a different energy-optimization function. They can be represented in two different ways: either as a list of tuples (each tuple representing quality, energy) or as a tuple of two lists (the first list containing quality, the second energy). All outputs returned from the energy-optimization functions already fit these criteria.
labels – a list of labels in the same order as the trade-offs in the plots parameter
xlim – a tuple with min and max for x-axis
ylim – a tuple with min and max for y-axis
name – None or name of a file where the figure is saved
reverse – reverses the x-axis
pareto – only Pareto-optimal points are drawn
short – if true, the graph will be drawn as square instead of a rectangle
points – a list of points to be drawn [(x1,y1,s1), (x2,y2,s2)]. s1, s2, etc. are optional and provide a way to annotate points
folder – None or name of a folder where the figure is saved
percentage – multiplies the x-axis by 100 (in order to transform the accuracy of 0.45 into 45%)
percentage_energy – multiplies the y-axis by 100
scatter_indices – indices of trade-off sets that should be drawn unconnected
color_indices – if not None, each element of this list determines the index of a color (trade-offs with the same index will be drawn with the same color)
text_factor – changing this parameter moves the label toward or away from the annotated points
ylabel – label for the y-axis
dotted_indices – indices of trade-off sets that should be drawn with dots
thick_indices – indices of trade-off sets that should be drawn thicker than the rest
xlabel – label for the x-axis
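As a rough illustration of the two accepted input representations and of what the pareto flag implies, the sketch below normalizes a trade-off set to a list of (quality, energy) tuples and keeps only the non-dominated points. The helper names as_pairs and pareto_front are hypothetical and not part of eecr, and the sketch assumes that higher quality and lower energy are better. The filtering done inside draw_tradeoffs() may differ.

```python
def as_pairs(tradeoffs):
    """Return a list of (quality, energy) tuples from either representation."""
    # Representation 2: a tuple of two lists, (qualities, energies)
    if (isinstance(tradeoffs, tuple) and len(tradeoffs) == 2
            and all(isinstance(part, list) for part in tradeoffs)):
        qualities, energies = tradeoffs
        return list(zip(qualities, energies))
    # Representation 1: a list of (quality, energy) tuples
    return [tuple(point) for point in tradeoffs]


def pareto_front(points):
    """Keep the points that no other point beats in both quality and energy."""
    front = []
    best_quality = float("-inf")
    # Walk from the cheapest point upward; a point survives only if it
    # improves on the best quality seen so far at lower or equal energy.
    for quality, energy in sorted(points, key=lambda p: (p[1], -p[0])):
        if quality > best_quality:
            front.append((quality, energy))
            best_quality = quality
    return front


points = [(0.9, 10), (0.8, 5), (0.7, 6), (0.95, 20), (0.9, 12)]
front = pareto_front(as_pairs(points))
```

Here (0.7, 6) and (0.9, 12) are dropped because a cheaper point reaches at least the same quality.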
Alternative search methods¶
The methods below are based on related work, but solve the same kind of problem as the methodology presented in this module. They are included here as possible alternatives and for easier comparison of the approaches.
-
class
eecr.eeoptimizer.
EnergyOptimizer
(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None) -
find_aimd_tradeoffs
(increase_range=None, decrease_range=None, name=None, active=1)¶ Solves the duty-cycle assignment problem in a different way than the DCA methods.
Imagine a system with parameters inc and dec that works as follows:
1.) The system duty-cycles with a cycle of length len.
2.) If the next detected context is the same as the previous one: len = len + inc.
3.) If the next detected context is different from the previous one: len = len * dec.
This method finds good combinations of inc and dec from a specified range.
This method is based on the paper: Au, Lawrence K., et al. “Episodic sampling: Towards energy-efficient patient monitoring with wearable sensors.” 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2009.
- Parameters
increase_range – a list of possible inc values
decrease_range – a list of possible dec values
name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs
active – the length of the active part of the duty-cycle
- Returns
two lists, the first contains Pareto-optimal configurations in the form (inc, dec), the second their evaluations in the form (quality, energy).
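The three rules above can be sketched as a small simulation. This is a simplified illustration, not the eecr implementation. It assumes that len denotes the sleeping part of the cycle, that recognition is perfect while the system is awake, that the last detected context persists during sleep, and that being awake costs 1 per time unit and sleeping costs 0. The function name simulate_aimd and the min_len parameter are hypothetical.

```python
def simulate_aimd(sequence, inc, dec, active=1, min_len=1):
    """Simulate additive-increase/multiplicative-decrease duty-cycling.

    Returns (accuracy, energy), where energy is the fraction of time awake.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    predicted = []
    awake_steps = 0
    sleep_len = min_len  # assumption: len is the length of the sleeping phase
    previous = None
    i = 0
    while i < n:
        # Active phase: the true context is observed directly
        window = sequence[i:i + active]
        predicted.extend(window)
        awake_steps += len(window)
        i += len(window)
        detected = window[-1]
        # Adapt the cycle length based on whether the context changed
        if previous is not None:
            if detected == previous:
                sleep_len = sleep_len + inc
            else:
                sleep_len = max(min_len, int(sleep_len * dec))
        previous = detected
        # Sleeping phase: the last detected context is assumed to persist
        sleeping = min(int(sleep_len), n - i)
        predicted.extend([detected] * sleeping)
        i += sleeping
    correct = sum(p == t for p, t in zip(predicted, sequence))
    return correct / n, awake_steps / n


sequence = ["a"] * 6 + ["b"] * 6
quality, energy = simulate_aimd(sequence, inc=1, dec=0.5)
```

Evaluating such a function for every (inc, dec) pair in the given ranges and keeping the non-dominated results would give trade-offs of the kind this method returns.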
-
find_coh_tradeoffs
(alphas=None, name=None)¶ Finds the SCA configurations based on context transition probabilities.
While the method behind this is completely different from find_sca_tradeoffs(), it returns the same kind of output, including the same type of configurations.
This method is based on the paper: Gordon, Dawud, Jürgen Czerny, and Michael Beigl. “Activity recognition for creatures of habit.” Personal and ubiquitous computing 18.1 (2014): 205-221.
- Parameters
alphas – a list of different values of the parameter alpha to test (0 <= alpha <= 1), see the paper for details
name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs
- Returns
two lists, the first contains Pareto-optimal configurations, the second their evaluations in the form (quality, energy).
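The paper defines how transition probabilities and alpha are combined, and that part is not reproduced here. As a minimal sketch of the first ingredient only, the following estimates first-order context transition probabilities from a sequence. The function name transition_probabilities is hypothetical, and the internal computation in eecr may differ.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate P(next context | current context) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        current: {
            nxt: count / sum(followers.values())
            for nxt, count in followers.items()
        }
        for current, followers in counts.items()
    }


sequence = ["walk", "walk", "rest", "walk", "rest", "rest"]
probabilities = transition_probabilities(sequence)
```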
-
find_simple_tradeoffs
(name=None)¶ Finds the SCA configurations based on a simple mathematical model.
It works fast, but the results are often much worse than the alternatives. Finds either one or two trade-offs.
- Parameters
name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs
- Returns
two lists, the first contains Pareto-optimal configurations, the second their evaluations in the form (quality, energy).
-
Cost-sensitive decision trees¶
Cost-sensitive trees are a version of the decision tree classifier that is interested in both attribute informativeness and attribute costs when building the tree. To explain it briefly: at each tree node – instead of checking each attribute’s information gain or a similar metric – we compare the expected cost of misclassification if this node becomes a leaf, with the expected cost of the attribute and misclassification if the attribute is used to further divide the instances. The attribute that reduces the expected cost the most, if any, then expands this node.
This particular implementation of cost-sensitive trees is optimized for context-recognition tasks. It can be used as a stand-alone method for reducing energy consumption, or it can be used in conjunction with other methods described in this module.
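The node-expansion rule described above can be sketched as follows. This is a simplified, one-step illustration and not the code in eecr.cstree: it assumes a unit misclassification cost, predicts the majority label in a leaf, and treats the children of a split as leaves. All function names are hypothetical, and each candidate split is assumed to contain at least one instance.

```python
from collections import Counter


def leaf_cost(labels, misclassification_cost=1.0):
    """Expected misclassification cost per instance for a majority-label leaf."""
    if not labels:
        return 0.0
    majority = Counter(labels).most_common(1)[0][1]
    return misclassification_cost * (1 - majority / len(labels))


def split_cost(partitions, attribute_cost, misclassification_cost=1.0, weight=1.0):
    """Expected cost per instance if an attribute is used to split the node."""
    total = sum(len(part) for part in partitions)
    error = sum(
        len(part) / total * leaf_cost(part, misclassification_cost)
        for part in partitions
    )
    # weight scales the energy cost relative to the misclassification cost
    return weight * attribute_cost + error


def best_expansion(labels, candidates, weight=1.0):
    """Return the attribute that reduces the expected cost most, or None for a leaf.

    candidates maps an attribute name to (attribute_cost, partitions of labels).
    """
    best_name, best_cost = None, leaf_cost(labels)
    for name, (cost, partitions) in candidates.items():
        candidate_cost = split_cost(partitions, cost, weight=weight)
        if candidate_cost < best_cost:
            best_name, best_cost = name, candidate_cost
    return best_name


labels = ["a", "a", "b", "b"]
candidates = {
    "acc_mean": (0.1, [["a", "a"], ["b", "b"]]),   # cheap and informative
    "gps_speed": (0.6, [["a", "a"], ["b", "b"]]),  # informative but too costly
}
chosen = best_expansion(labels, candidates)
```

In this example the leaf would cost 0.5 per instance, so splitting on the cheap attribute (cost 0.1) pays off, while the expensive one (cost 0.6) would not.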
-
class
eecr.cstree.
CostSensitiveTree
(contexts, cost_function, feature_to_sensor=None, feature_groups=None, tree_type='default', min_samples=1, extension=0, default=None, weight=1)¶ -
__init__
(contexts, cost_function, feature_to_sensor=None, feature_groups=None, tree_type='default', min_samples=1, extension=0, default=None, weight=1)¶ Constructor for the cost-sensitive decision tree.
Either feature_to_sensor or feature_groups (but not both) must be set.
- Parameters
contexts – a list of contexts to be recognized
cost_function – maps a set of sensors into the energy cost
feature_to_sensor – a dictionary that maps each attribute to a sensor
feature_groups – a dictionary that maps each sensor to a list of attributes
tree_type – there are three types of trees that can be generated: “default”, “pruned” and “batched”. The “pruned” version prunes the tree using a dedicated pruning set after tree generation. The “batched” version tries to place attributes that share energy costs as close together as possible. This results in a longer training time, but usually slightly better performance
min_samples – the minimum number of samples in a non-leaf node
extension – if set to a value greater than 0, the tree-building process will sometimes expand nodes even if this is energy-inefficient, in anticipation that the decision will pay off later in the tree. These branches are pruned if the anticipation proves wrong. This results in a longer training time, but usually slightly better performance (in practice extension=1 proved best)
default – context to be classified in case of an empty tree
weight – weight of the energy cost when compared to the misclassification cost
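The two mapping parameters describe the same information from opposite directions. The sketch below converts a feature_to_sensor dictionary into the feature_groups form and defines a simple additive cost function. The attribute names, sensor names and cost values are made up for illustration, and passing the sensors as a Python set is an assumption about how cost_function is called.

```python
def groups_from_mapping(feature_to_sensor):
    """Invert {attribute: sensor} into {sensor: [attributes]}."""
    groups = {}
    for feature, sensor in feature_to_sensor.items():
        groups.setdefault(sensor, []).append(feature)
    return groups


# Hypothetical per-sensor energy costs
SENSOR_COST = {"accelerometer": 1.0, "gps": 5.0}


def cost_function(sensors):
    """Map a set of sensors to an energy cost; here simply the sum."""
    return sum(SENSOR_COST[sensor] for sensor in sensors)


feature_to_sensor = {
    "acc_mean": "accelerometer",
    "acc_std": "accelerometer",
    "gps_speed": "gps",
}
feature_groups = groups_from_mapping(feature_to_sensor)
```

Only one of the two dictionaries would then be passed to the constructor, together with the cost function.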
-
fit
(x1, y1, x_p=None, y_p=None)¶ Trains the tree classifier
- Parameters
x1 – a pandas dataframe of attributes to be used as the training set
y1 – a pandas series of labels for the instances in
x1
x_p – (optional) a pandas dataframe of attributes to be used as the pruning set
y_p – (optional) a pandas series of labels for the instances in
x_p
-
predict
(x2)¶ Tests the tree classifier
- Parameters
x2 – a pandas dataframe of attributes to be used as the test set
- Returns
a list of predictions, where the i-th element is the prediction for the i-th row in x2
-
show
()¶ Prints the structure of the tree.
-
show_sensors
()¶ Prints the structure of the tree, abstracted so only different sensors are shown.
-