Code documentation

Setting up the EnergyOptimizer object

The first step is to create an EnergyOptimizer object and fill it with the properties of the desired dataset. All of these properties can be set either in the constructor or one by one with setter functions; a minimal constructor-based sketch follows the parameter list below. Warning: do not change the object's internal variables manually; always use the appropriate setter functions.

class eecr.eeoptimizer.EnergyOptimizer(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
__init__(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
Parameters
  • sequence – a sequence of contexts to be used for testing the energy-efficiency of settings. Their ordering and proportions should be representative of the domain

  • contexts – the list of contexts in the domain. If not provided, they will be inferred as all unique elements of the sequence parameter, in alphabetical order

  • setting_to_energy – a dictionary, where the keys are the possible settings and the values are the energy costs (per time unit) when using that setting

  • setting_to_sequence – a dictionary, where the keys are the possible settings and the values are lists of contexts. Each list represents the original sequence (see set_sequence()), classified using the corresponding setting

  • quality_metric – a function that maps a confusion matrix to a quality indicator. Accuracy score is used by default

  • sensors_off – the energy cost per time unit when the system is sleeping

  • sensors_on – the energy cost per time unit when the system is working

  • path – the root of the path used for saving and loading objects (e.g. with save_data() or load_data())
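
For orientation, here is a minimal constructor-based setup sketch. The context names, setting names, and energy values are hypothetical toy data, not part of the library.

    from eecr.eeoptimizer import EnergyOptimizer

    # Ground-truth context sequence (toy data).
    sequence = ["walk", "walk", "run", "sit", "sit", "walk"]

    # The same sequence as classified while using each setting (toy data).
    setting_to_sequence = {
        "low":  ["walk", "run",  "run", "sit", "sit", "walk"],
        "high": ["walk", "walk", "run", "sit", "sit", "walk"],
    }

    # Energy cost per time unit of each setting (toy values).
    setting_to_energy = {"low": 0.5, "high": 2.0}

    optimizer = EnergyOptimizer(
        sequence=sequence,
        setting_to_sequence=setting_to_sequence,
        setting_to_energy=setting_to_energy,
        path="data/",          # root folder used by the save_*/load_* helpers
    )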

set_dca_costs(cost_off, cost_on)

Sets the base energy costs for duty-cycling.

These costs will be used by all DCA methods (e.g. dca_model(), dca_real(), find_dca_tradeoffs()). They override the default costs of 1 while the system is working and 0 while it is sleeping.

Parameters
  • cost_off – the energy cost per time unit when the system is sleeping

  • cost_on – the energy cost per time unit when the system is working. If a setting is specified when using any of the DCA methods this value will be ignored and the energy cost of that setting will be used instead

set_path(path)

Sets the path to a folder from which the data is loaded and to which data is saved.

Parameters

path – relative path to a folder

set_sequence(sequence)

Sets the sequence of contexts that will be used for testing the energy-efficiency of settings.

Parameters

sequence – a sequence of contexts. Their ordering and proportions should be representative of the domain

set_settings(setting_to_sequence, setting_to_energy=None, setting_fn_energy=None)

Defines the possible settings and their performance/energy for the optimization.

Either the setting_to_energy or the setting_fn_energy parameter must be provided. Providing both or neither may result in unexpected behavior.

Parameters
  • setting_to_sequence – a dictionary, where the keys are the possible settings and the values are lists of contexts. Each list represents the original sequence (see set_sequence()), classified using the corresponding setting

  • setting_to_energy – a dictionary, where the keys are the possible settings and the values are the energy costs (per time unit) when using that setting

  • setting_fn_energy – a function to be used if the setting_to_energy is None. This function should take a setting as the input and return energy costs (per time unit) when using that setting
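
The same setup can also be done with the setter functions. The sketch below reuses the toy variables from the constructor example above; the lambda passed as setting_fn_energy is an illustrative assumption.

    optimizer = EnergyOptimizer()
    optimizer.set_sequence(sequence)
    optimizer.set_settings(
        setting_to_sequence=setting_to_sequence,
        # Derive the energy cost from the setting itself instead of listing it explicitly.
        setting_fn_energy=lambda setting: 2.0 if setting == "high" else 0.5,
    )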

Summarizing its properties

Use these functions for a quick look into the created EnergyOptimizer object; a short usage sketch follows the method descriptions below.

class eecr.eeoptimizer.EnergyOptimizer(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
energy_quality()

Returns a summary of all settings and their classification/energy performances.

Returns

a dictionary, where the keys are the possible settings and values are (q,e) tuples, where the “q” represents the classification performance according to the specified quality metric and “e” represents the energy cost of that setting.

quality()

Returns a summary of all settings and their classification performances.

Returns

a dictionary, where the keys are the possible settings and the values are the classification performances according to the specified quality metric

summary()

Prints a summary (mathematical properties) of the dataset.
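
A quick inspection sketch, assuming an optimizer configured as in the setup example above:

    optimizer.summary()                 # prints dataset statistics
    print(optimizer.quality())          # {setting: quality}
    print(optimizer.energy_quality())   # {setting: (quality, energy)}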

Searching for and evaluating configurations

This is the core of the module: it can automatically find three types of energy-efficient solutions and predict their performance.

The first type of solution is the SCA configuration. Such a configuration is a list of settings (e.g. [s1, s2, s3]), where a setting can be any hashable object. A configuration represents a system that works as follows: when context c_i is detected, the system switches to setting s_i (the i-th setting in the configuration) and keeps using it until a different context is detected, at which point it switches to the setting assigned to that context.

The second type of solution is the DCA configuration. Such a configuration is a list of integers (e.g. [5, 4, 2]). A configuration represents a system that works as follows: when context c_i is detected, the system stops working for s_i - 1 time periods (s_i is the i-th entry in the configuration), then works for one time period (or more, if explicitly specified). The last context classified in this active period determines the length of the next sleeping period.

The third type of solution is the SCA-DCA configuration. It does both: it switches the setting in use as well as the length of the sleeping period. It is represented in the form (SCA_configuration, DCA_configuration).

Any such configuration can be evaluated using two different mechanisms: a simulation, or a mathematical prediction based on the dataset properties. The latter approach is usually roughly 100 times faster and almost as accurate. In all cases the evaluations of the configurations are returned as a list of tuples [(q1,e1), (q2,e2), …], where q represents the quality metric of choice (e.g. accuracy) and e represents the energy cost.

All functions in this segment follow roughly the same pattern of naming, parameters and return types. Methods that evaluate configurations are named like sca_model() (evaluates SCA configurations using the mathematical model), while methods that automatically find configurations are named like find_sca_tradeoffs() (automatically finds SCA configurations).
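
The sketch below illustrates both patterns for the SCA case, assuming the optimizer from the setup example (three contexts); the hand-written configurations are placeholders, with one setting per context in the order of the contexts list.

    # Evaluate two hand-written SCA configurations with the mathematical model.
    tradeoffs = optimizer.sca_model([["low", "low", "high"],
                                     ["high", "high", "high"]])
    # -> [(q1, e1), (q2, e2)]

    # Search for Pareto-optimal SCA configurations automatically.
    sca_configs, sca_tradeoffs = optimizer.find_sca_tradeoffs()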

class eecr.eeoptimizer.EnergyOptimizer(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
dca_model(configurations, active=1, setting=None, cf=None, max_cycle=None, energy_costs=None, name=None)

Tests the DCA configurations using the DCA mathematical model.

Parameters
  • configurations – a list of DCA configurations. A configuration is a list of the same length as the number of contexts, each element being an integer >= 1. Optionally this parameter can be a single configuration instead of a list

  • active – the length of the active period. For most purposes this length should be left as the default of 1

  • setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1

  • cf – uses the given confusion matrix (must be normalized, so each row sums to 1) instead of the one prescribed by the setting parameter

  • max_cycle – the maximum duty-cycle length out of any found in configurations. If set the evaluation time will be slightly faster

  • energy_costs – internal parameter (its value should not be changed)

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

Returns

a list of trade-offs in the form (quality, energy); the i-th element represents the evaluation of the i-th configuration
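
A minimal usage sketch, assuming the optimizer from the setup example (three contexts); the configurations are placeholders.

    # Evaluate two DCA configurations (one sleep length per context).
    tradeoffs = optimizer.dca_model([[1, 1, 1], [5, 4, 2]])

    # Evaluate a single configuration while a concrete setting is in use.
    tradeoffs_low = optimizer.dca_model([5, 4, 2], setting="low")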

dca_real(configurations, setting=None, active=1, name=None)

Tests the DCA configurations using a simulation.

This method reads from a sequence (either the base one or the one that corresponds to the given setting), skipping elements in a way that simulates duty-cycling in a real-life environment. This is slower than using the mathematical model (dca_model()).

Parameters
  • configurations – a list of DCA configurations. A configuration is a list of the same length as the number of contexts, each element being an integer >= 1. Optionally this parameter can be a single configuration instead of a list

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1

  • active – the length of the active period. For most purposes this length should be left as the default of 1

Returns

a list of trade-offs in the form (quality, energy); the i-th element represents the evaluation of the i-th configuration

find_dca_random(n_samples=100, max_cycle=10)

Returns n_samples random DCA configurations.

Parameters
  • n_samples – number of configurations to return

  • max_cycle – the maximum desired duty-cycle length for the configurations

Returns

a list of configurations

find_dca_static(max_cycle, name=None, setting=None, active=1)

Returns all configurations where the same duty-cycle length is used for all contexts.

Parameters
  • max_cycle – the maximum desired duty-cycle length for the configurations to be generated

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1

  • active – the length of the active period. For most purposes this length should be left as the default of 1

Returns

two lists: the first contains the Pareto-optimal configurations, the second their evaluations in the form (quality, energy).

find_dca_tradeoffs(max_cycle=10, active=1, seeded=True, setting=None, cf=None, name=None, energy_costs=None, ngen=200)

Attempts to find the best DCA trade-offs for the current dataset.

Parameters
  • max_cycle – the maximum desired duty-cycle length for the configurations

  • active – the length of the active period. For most purposes this length should be left as the default of 1

  • seeded – if true, the search always starts with two specific configurations in the starting population: one uses the minimum and the other the maximum duty-cycle length

  • setting – the setting that is in use while duty-cycling. If None is provided the function assumes that the classification accuracy is 100% and the energy cost is 1

  • cf – uses the given confusion matrix (must be normalized, so each row sums to 1) instead of the one prescribed by the setting parameter

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • energy_costs – internal parameter (its value should not be changed)

  • ngen – the number of generations in the NSGA-II search

Returns

two lists: the first contains the Pareto-optimal configurations, the second their evaluations in the form (quality, energy).
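
A usage sketch with placeholder parameters, assuming the optimizer from the setup example:

    # Search for Pareto-optimal duty-cycle assignments.
    dca_configs, dca_tradeoffs = optimizer.find_dca_tradeoffs(max_cycle=10, ngen=200)

    # Cross-check one of the found configurations with the simulation.
    simulated = optimizer.dca_real(dca_configs[0])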

find_sca_dca_tradeoffs(sca_configurations=None, sca_tradeoffs=None, dca_indices=None, n_points=5, binary_representation=False, name=None, max_cycle=10, active=1, verbose=False, cstree=False)

Attempts to find the best SCA-DCA trade-offs for the current dataset.

Parameters
  • sca_tradeoffs – if Pareto-optimal trade-offs were already precalculated using find_sca_tradeoffs(), they can be passed via this parameter to avoid calculating them again

  • sca_configurations – if sca_tradeoffs parameter is set, this parameter should list the configurations from which the sca_tradeoffs were generated

  • dca_indices – indices of the configurations in sca_configurations to be expanded using the DCA method. If not set, the indices will be chosen automatically so that they are equidistant

  • n_points – the number of SCA configurations selected for expansion with the DCA method

  • binary_representation – a flag indicating that settings are represented by a binary list. In this case, a different mutation/crossover will be used for the NSGA-II algorithm

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • max_cycle – the maximum desired duty-cycle length for the configurations

  • active – the length of the active period. For most purposes this length should be left as the default of 1

  • verbose – if true, the function prints out the current progress

  • cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used

Returns

two lists: the first contains the Pareto-optimal configurations, the second their evaluations in the form (quality, energy).
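
A sketch of reusing a precomputed SCA front to avoid repeating the SCA search; all parameter values are placeholders.

    sca_configs, sca_tradeoffs = optimizer.find_sca_tradeoffs()
    configs, tradeoffs = optimizer.find_sca_dca_tradeoffs(
        sca_configurations=sca_configs,
        sca_tradeoffs=sca_tradeoffs,
        n_points=5,
        max_cycle=10,
    )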

find_sca_random(n_samples=100)

Returns n_samples random SCA configurations.

Parameters

n_samples – number of configurations to return

Returns

a list of configurations

find_sca_static(name=None)

Returns all Pareto-optimal configurations where the same setting is used for all contexts.

Parameters

name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

Returns

two lists: the first contains the Pareto-optimal configurations, the second their evaluations in the form (quality, energy).

find_sca_tradeoffs(binary_representation=False, name=None, cstree=False)

Attempts to find the best SCA trade-offs for the current dataset.

Uses the NSGA-II algorithm to search the space of different configurations and finds and returns the Pareto-optimal ones.

Parameters
  • binary_representation – a flag indicating that settings are represented by a binary list. In this case, a different mutation/crossover will be used for the NSGA-II algorithm

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used

Returns

two lists: the first contains the Pareto-optimal configurations, the second their evaluations in the form (quality, energy).

sca_dca_model(configurations, active=1, cstree=False, name=None)

Tests the SCA-DCA configurations using the SCA-DCA mathematical model.

This combines the mathematical models of sca_model() and dca_model(). It is faster than using the simulation (see sca_dca_real()).

Parameters
  • configurations – a list of SCA-DCA configurations. A configuration can have the same syntax as an SCA configuration (see sca_real()) or it can be represented as a tuple, where the first element is an SCA configuration and the second element is a DCA configuration (see dca_real())

  • active – the length of the active period. For most purposes this length should be left as the default of 1

  • cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

Returns

a list of trade-offs in the form (quality, energy); the i-th element represents the evaluation of the i-th configuration

sca_dca_real(configurations, active=1, name=None, cstree_energy=False)

Tests the SCA-DCA configurations using a simulation.

This combines the simulations from sca_real() and dca_real(). It is slower than using the mathematical model (see sca_dca_model()).

Parameters
  • configurations – a list of SCA-DCA configurations. A configuration can have the same syntax as an SCA configuration (see sca_real()) or it can be represented as a tuple, where the first element is an SCA configuration and the second element is a DCA configuration (see dca_real())

  • active – the length of the active period. For most purposes this length should be left as the default of 1

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • cstree_energy – this flag slightly increases the accuracy of the simulation when using cost-sensitive decision trees. In order to use it, the energy sequence must be precalculated by using the cstree_energy flag when generating the trees (e.g. in add_csdt_weighted() or add_csdt_borders())

Returns

a list of trade-offs in the form (quality, energy); the i-th element represents the evaluation of the i-th configuration

sca_model(configurations, name=None, cstree=False, encrypted=False)

Tests the SCA configurations using the SCA mathematical model.

Parameters
  • configurations – a list of SCA configurations. A configuration is a list of the same length as the number of contexts, each element being a setting. Optionally this parameter can be a single configuration instead of a list

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • cstree – a flag indicating that cost-sensitive trees are used. Makes the evaluation more accurate if used

  • encrypted – a flag for internal use (do not change its value)

Returns

a list of trade-offs in the form (quality, energy); the i-th element represents the evaluation of the i-th configuration

sca_real(configurations, name=None, cstree_energy=False)

Tests the SCA configurations using a simulation.

This method reads from different sequences (those that correspond to different settings), switching between them as the context changes. This simulates a real-life environment where contexts are classified one by one using different system settings. It is slower than using the mathematical model (sca_model()).

Parameters
  • configurations – a list of SCA configurations. A configuration is a list of the same length as the number of contexts, each element being a setting. Optionally this parameter can be a single configuration instead of a list

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • cstree_energy – this flag slightly increases the accuracy of the simulation when using cost-sensitive decision trees. In order to use it, the energy sequence must be precalculated by using the cstree_energy flag when generating the trees (e.g. in add_csdt_weighted() or add_csdt_borders())

Returns

a list of trade-offs in the form (quality, energy); the i-th element represents the evaluation of the i-th configuration

sca_simple(configurations, name=None)

Tests the SCA configurations using a simple mathematical model.

The model simply multiplies the performance of each setting by the expected proportion of time that setting is in use. E.g. if a configuration consists of two settings, one with 60% accuracy and the other with 100%, and both contexts appear equally often, then the expected accuracy is 80%.

Parameters
  • configurations – a list of SCA configurations. A configuration is a list of the same length as the number of contexts, each element being a setting. Optionally this parameter can be a single configuration instead of a list

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

Returns

a list of trade-offs in the form (quality, energy); the i-th element represents the evaluation of the i-th configuration

Saving and loading data

To avoid repeatedly processing the data, many components can be quickly saved and loaded. Many functions already have this functionality built in through the name parameter; in addition, the following functions are specialized for the task. Make sure the path is set correctly before using them, and note that all saving and loading uses pickle in the background.
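
A persistence sketch; the file names are placeholders and the optimizer and solution variables are assumed to come from the earlier examples.

    optimizer.set_path("data/")
    optimizer.save_data("sequence.pkl")        # base sequence
    optimizer.save_config("config.pkl")        # settings information
    optimizer.save_solution(sca_configs, sca_tradeoffs, "sca_solution.pkl")

    # Later, or in another session:
    optimizer2 = EnergyOptimizer(path="data/")
    optimizer2.load_data_config(data_name="sequence.pkl", config_name="config.pkl")
    configs, tradeoffs = optimizer2.load_solution("sca_solution.pkl")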

class eecr.eeoptimizer.EnergyOptimizer(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
load_config(name)

Loads the settings information from a file (instead of calling set_settings()).

Parameters

name – relative path to a file

load_data(name)

Loads the base sequence from a file (instead of calling set_sequence()).

Parameters

name – relative path to a file

load_data_config(data_name=None, config_name=None, sample_dataset=None)

Loads the sequence and settings information from files.

This is equivalent to sequentially calling the load_data() and load_config() functions.

Parameters
  • data_name – relative path to a file with the sequence

  • config_name – relative path to a file with the settings information

  • sample_dataset – if specified, a sample dataset will be loaded (currently only “SHL” keyword works)

load_solution(name)

Loads a set of energy-efficient solutions from a file.

Parameters

name – relative path to a file

Returns

two lists representing different tradeoffs. First list contains the configurations, while the other their evaluations. Evaluations are represented by a tuple with the form (quality, energy)

save_config(name)

Save the settings information (set with set_settings()) to a file.

Parameters

name – relative path to a file

save_data(name)

Saves the base sequence (set with set_sequence()) to a file.

Parameters

name – relative path to a file

save_solution(configurations, values, name)

Saves a set of energy-efficient solutions to a file.

Parameters
  • configurations – a list of configurations

  • values – a list of configuration evaluations in the form (quality, energy)

  • name – relative path to a file

Automatic settings generators

Sometimes the generation of settings can be automated. Two of the methods listed (add_csdt_weighted(), add_csdt_borders()) generate settings represented by cost-sensitive decision trees. The last one (add_subsets()) automates the task of creating settings where each setting corresponds to using a different attribute subset.

class eecr.eeoptimizer.EnergyOptimizer(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
add_csdt_borders(cs_tree, x1, y1, x2, y2, x_p=None, y_p=None, test_fn=None, buffer_range=1, name=None, cstree_energy=False, verbose=False, weights_range=None, energy_range=None, n_tree=15)

Generates different CS-DTs from data around different contexts.

When generating these cost-sensitive decision trees, only the data from one context (and the instances around it) is taken. This should create a tree that is good at recognizing that context and the contexts to which it frequently transitions. This method also tries different weights, in the same way as the add_csdt_weighted() method. The generated CS-DTs are added to the list of possible settings.

Parameters
  • cs_tree – the base cost-sensitive tree object (eecr.cstree.CostSensitiveTree) to use for generation of its variants

  • x1 – a pandas dataframe of attributes to be used as the training set

  • y1 – a pandas series of labels for the instances in x1

  • x2 – a pandas dataframe of attributes to be used as the testing set

  • y2 – a pandas series of labels for the instances in x2

  • x_p – (optional) a pandas dataframe of attributes to be used as the pruning set

  • y_p – (optional) a pandas series of labels for the instances in x_p

  • buffer_range – specifies how many instances to take after a context transition

  • test_fn – a function used for measuring energy while testing a CS-DT, if it needs to be different from the one used for training

  • verbose – if true, the function prints out the current progress

  • weights_range – a tuple with two elements, representing the minimum and maximum weight to be used. The trees will use n_tree different weights, equidistantly sampled between these two extremes. A sensible weight range must be determined manually, but as a heuristic it is usually inversely proportional to the energy costs. For example, if the energy costs are all >1, then the weights should all be <1.

  • energy_range – an alternative to the weights_range parameter; setting this to x is equivalent to setting weights_range to 1/x

  • n_tree – number of different CS-DT to be generated

  • name – None or a relative path to a file. In the latter case, the generated settings will be saved to it

  • cstree_energy – if set, accurate energy sequences are generated that can be used to increase the accuracy of sca_real(). This makes the function much slower

Returns

the set of generated CS-DTs

add_csdt_weighted(cs_tree, x1, y1, x2, y2, x_p=None, y_p=None, test_fn=None, verbose=True, weights_range=None, energy_range=None, n_tree=15, name=None, cstree_energy=False)

Generates different CS-DTs with different ratios between energy and classification quality.

When generating cost-sensitive decision trees, at each node a decision is made on whether an attribute is worth using (based on its informativeness and cost). Different weights can be used to skew this decision one way or the other. The generated CS-DTs are added to the list of possible settings.

Parameters
  • cs_tree – the base cost-sensitive tree object (eecr.cstree.CostSensitiveTree) to use for generation of its variants

  • x1 – a pandas dataframe of attributes to be used as the training set

  • y1 – a pandas series of labels for the instances in x1

  • x2 – a pandas dataframe of attributes to be used as the testing set

  • y2 – a pandas series of labels for the instances in x2

  • x_p – (optional) a pandas dataframe of attributes to be used as the pruning set

  • y_p – (optional) a pandas series of labels for the instances in x_p

  • test_fn – a function used for measuring energy while testing a CS-DT, if it needs to be different from the one used for training

  • verbose – if true, the function prints out the current progress

  • weights_range – a tuple with two elements, representing the minimum and maximum weight to be used. The trees will use n_tree different weights, equidistantly sampled between these two extremes. A sensible weight range must be determined manually, but as a heuristic it is usually inversely proportional to the energy costs. For example, if the energy costs are all >1, then the weights should all be <1.

  • energy_range – an alternative to the weights_range parameter; setting this to x is equivalent to setting weights_range to 1/x

  • n_tree – number of different CS-DT to be generated

  • name – None or a relative path to a file. In the latter case, the generated settings will be saved to it

  • cstree_energy – if set, accurate energy sequences are generated that can be used to increase the accuracy of sca_real(). This makes the function much slower

Returns

the set of generated CS-DTs
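
A usage sketch; the base tree, the cost function, the feature-to-sensor mapping and the dataframes (x_train, y_train, x_test, y_test) are all hypothetical placeholders prepared beforehand.

    from eecr.cstree import CostSensitiveTree

    base_tree = CostSensitiveTree(
        contexts=["walk", "run", "sit"],
        cost_function=lambda sensors: len(sensors),   # toy cost: one unit per active sensor
        feature_to_sensor={"acc_mean": "acc", "acc_std": "acc", "gps_speed": "gps"},
    )

    trees = optimizer.add_csdt_weighted(
        base_tree, x_train, y_train, x_test, y_test,
        weights_range=(0.01, 1.0),   # must be tuned to the energy costs (see above)
        n_tree=15,
    )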

add_subsets(x1, y1, x2, y2, classifier, setting_to_energy=None, setting_fn_energy=None, subsettings=None, subsetting_to_features=None, n=0, feature_groups=None, setting_fn_features=None, name=None, y_p=None, x_p=None, csdt=False, csdt_fn_energy=None, cstree_energy=False)

Automatically generates settings by taking different attribute subsets from a dataset.

This method assumes you have a pandas dataframe where each column represents a different attribute and each row represents an instance to be classified. Each attribute has a cost, however, and thus the goal is to classify the instances with as small an attribute subset as possible. The cost is often shared between attributes: e.g. if two attributes are calculated from the GPS stream, then having one or both incurs the same cost (the cost of keeping the GPS on). It may also be the case that using some sensor increases or decreases the cost of another sensor, as they share resources when used.

There are several ways of specifying which subsets to transform into settings; use the one most convenient for the current domain. In all examples we assume that the data comes from different sensors (subsettings) and that a setting is the set of sensors in use. If the data (and its costs) comes from different sources, a similar logic applies. A sketch of option 1 is given after the parameter list below.

  1. Set the subsettings parameter to a list of sensors and subsetting_to_features to a dictionary that maps each sensor to a list of attributes (the attributes calculated from that sensor)

  2. Set feature_groups to a list of lists. Each sublist represents features that should always be included or excluded together in any attribute subset used (e.g. because they come from the same sensor). Set n to the number of feature groups.

  3. Set n to the number of sensors and setting_fn_features to a function that takes a binary string of length n and outputs a list of attributes. Character i in the binary string indicates whether the i-th sensor is active.

This method is not a substitute for attribute selection.

Parameters
  • x1 – a pandas dataframe of attributes to be used as the training set

  • y1 – a pandas series of labels for the instances in x1

  • x2 – a pandas dataframe of attributes to be used as the testing set

  • y2 – a pandas series of labels for the instances in x2

  • x_p – (optional) a pandas dataframe of attributes to be used as the pruning set

  • y_p – (optional) a pandas series of labels for the instances in x_p

  • classifier – any classifier (tested with sklearn and CS-DTs). Should provide functions “fit” and “predict”

  • setting_to_energy – a dictionary, where the keys are the possible settings and the values are the energy costs (per time unit) when using that setting. Settings are binary strings

  • setting_fn_energy – a function to be used if setting_to_energy is None. This function should take a setting as the input and return the energy cost (per time unit) when using that setting. Settings are binary strings

  • subsettings – the list of subsettings

  • subsetting_to_features – a dictionary that maps subsettings to attribute sets

  • n – the number of subsettings

  • feature_groups – a list of lists; the elements of each sublist are always all included or excluded together

  • setting_fn_features – a function that maps settings to attribute sets

  • name – None or a relative path to a file. In the latter case, the generated settings will be saved to it

  • csdt – if the classifier is a CS-DT set this flag to true

  • csdt_fn_energy – a function used for measuring energy while testing a CS-DT, if it needs to be different from the one used for training

  • cstree_energy – if CS-DTs are used, accurate energy sequences are generated that can be used to increase the accuracy of sca_real(). This makes the function much slower

Returns

a list of generated classifiers
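
A sketch of option 1 (sensors as subsettings); the sensor names, feature names, cost function and dataframes are placeholders, and the exact binary-string representation of a setting is an assumption based on the parameter descriptions above.

    from sklearn.ensemble import RandomForestClassifier

    optimizer.add_subsets(
        x_train, y_train, x_test, y_test,
        classifier=RandomForestClassifier(),
        subsettings=["acc", "gps"],
        subsetting_to_features={"acc": ["acc_mean", "acc_std"],
                                "gps": ["gps_speed"]},
        # Energy of a setting, assuming e.g. "10" means "accelerometer on, GPS off".
        setting_fn_energy=lambda s: 1.0 * (s[0] == "1") + 5.0 * (s[1] == "1"),
    )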

Visualizing solutions

Most functions in the eecr.eeutility module are meant for internal use. However, draw_tradeoffs() provides an easy way to visualize the results achieved with the other methods.

eecr.eeutility.draw_tradeoffs(plots, labels, xlim=None, ylim=None, name=None, reverse=True, pareto=False, short=False, points=None, folder='artificial', percentage=True, percentage_energy=False, scatter_indices=None, color_indices=None, text_factor=50, ylabel='Energy', dotted_indices=None, thick_indices=None, xlabel='Classification error')

A tool for quick visualization of different trade-offs.

Essentially a wrapper around matplotlib.pyplot that makes drawing and comparing different sets of trade-offs easier. As input it expects a list of trade-off sets and can plot them in the way that is standard for Pareto fronts: step-wise, with the quality axis reversed so that the ideal point lies in the lower-left corner. It also streamlines other aspects, for example labeling, saving, and making sure that only Pareto-optimal points are drawn.

While some utility parameters are present for modifying the look of the graph, it is recommended to use the matplotlib.pyplot library directly for any complex drawing task.

Parameters
  • plots – a list of one or more trade-off sets. Each trade-off set is, for example, the result of a different energy-optimization function. They can be represented in two different ways: either as a list of tuples (each tuple representing (quality, energy)) or as a tuple of two lists (the first containing the qualities, the second the energies). All outputs returned from the energy-optimization functions already fit this format.

  • labels – a list of labels in the same order as the trade-offs in the plots parameter

  • xlim – a tuple with min and max for x-axis

  • ylim – a tuple with min and max for y-axis

  • name – None or name of a file where the figure is saved

  • reverse – reverses the x-axis

  • pareto – only Pareto-optimal points are drawn

  • short – if true, the graph will be drawn as a square instead of a rectangle

  • points – a list of points to be drawn, [(x1,y1,s1), (x2,y2,s2)]. s1, s2, etc. are optional and provide a way to annotate the points

  • folder – None or name of a folder where the figure is saved

  • percentage – multiplies the x-axis by 100 (in order to transform the accuracy of 0.45 into 45%)

  • percentage_energy – multiplies the y-axis (energy) by 100

  • scatter_indices – indices of trade-off sets that should be drawn unconnected (as scatter plots)

  • color_indices – if not None, each element of this list determines the index of a color (trade-offs with the same index will be drawn with the same color)

  • text_factor – changing this parameter moves the labels closer to or further away from the annotated points

  • ylabel – label for the y-axis

  • dotted_indices – indices of trade-off sets that should be drawn with dots

  • thick_indices – indices of trade-off sets that should be drawn thicker than the rest

  • xlabel – label for the x-axis
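
A minimal plotting sketch; the trade-off variables are assumed to come from the earlier search examples and the labels are placeholders.

    from eecr.eeutility import draw_tradeoffs

    draw_tradeoffs(
        [sca_tradeoffs, dca_tradeoffs],
        ["SCA", "DCA"],
        pareto=True,
        xlabel="Classification error",
        ylabel="Energy",
    )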

Alternative search methods

The methods below are based on related work and solve the same kind of problem as the methodology presented in this module. They are included here as possible alternatives and to allow easier comparison between the approaches.

class eecr.eeoptimizer.EnergyOptimizer(sequence=None, contexts=None, setting_to_energy=None, setting_to_sequence=None, quality_metric=None, sensors_off=None, sensors_on=None, path=None)
find_aimd_tradeoffs(increase_range=None, decrease_range=None, name=None, active=1)

Solves the duty-cycle assignment problem in a different way than the DCA methods.

Imagine a system with parameters inc and dec that works as follows: 1) the system duty-cycles with a cycle of length len; 2) if the next detected context is the same as the previous one, then len = len + inc; 3) if the next detected context is different from the previous one, then len = len * dec. This method finds good combinations of inc and dec from a specified range.

This method is based on paper: Au, Lawrence K., et al. “Episodic sampling: Towards energy-efficient patient monitoring with wearable sensors.” 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2009.

Parameters
  • increase_range – a list of possible inc values

  • decrease_range – a list of possible dec values

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

  • active – the length of the active part of the duty-cycle

Returns

two lists: the first contains the Pareto-optimal configurations in the form (inc, dec), the second their evaluations in the form (quality, energy).
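
A usage sketch with placeholder inc/dec grids, assuming the optimizer from the setup example:

    configs, tradeoffs = optimizer.find_aimd_tradeoffs(
        increase_range=[1, 2, 5, 10],
        decrease_range=[0.1, 0.25, 0.5],
    )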

find_coh_tradeoffs(alphas=None, name=None)

Finds the SCA configurations based on context transition probabilities.

While the method behind this is completely different from that of find_sca_tradeoffs(), it returns the same kind of output, including the same type of configurations.

This method is based on paper: Gordon, Dawud, Jürgen Czerny, and Michael Beigl. “Activity recognition for creatures of habit.” Personal and ubiquitous computing 18.1 (2014): 205-221.

Parameters
  • alphas – a list of different values of the parameter alpha to test (0 <= alpha <= 1); see the paper for details

  • name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

Returns

two lists: the first contains the Pareto-optimal configurations, the second their evaluations in the form (quality, energy).

find_simple_tradeoffs(name=None)

Finds SCA configurations based on a simple mathematical model.

It is fast, but the results are often much worse than those of the alternatives. It finds either one or two trade-offs.

Parameters

name – None or a relative path to a file. In the latter case the path will be used to save the generated configurations and trade-offs

Returns

two lists: the first contains the Pareto-optimal configurations, the second their evaluations in the form (quality, energy).

Cost-sensitive decision trees

Cost-sensitive trees are a version of the decision tree classifier that takes into account both attribute informativeness and attribute cost when building the tree. To explain it briefly: at each tree node, instead of checking each attribute's information gain or a similar metric, we compare the expected cost of misclassification if this node becomes a leaf with the expected cost of the attribute plus misclassification if the attribute is used to further divide the instances. The attribute that reduces the expected cost the most, if any, then expands this node.

This particular implementation of cost-sensitive trees is optimized for context-recognition tasks. It can be used as a stand-alone method for reducing energy consumption, or in conjunction with the other methods described in this module.

class eecr.cstree.CostSensitiveTree(contexts, cost_function, feature_to_sensor=None, feature_groups=None, tree_type='default', min_samples=1, extension=0, default=None, weight=1)
__init__(contexts, cost_function, feature_to_sensor=None, feature_groups=None, tree_type='default', min_samples=1, extension=0, default=None, weight=1)

Constructor for the cost-sensitive decision tree.

Either (but not both) feature_to_sensor or feature_groups must be set.

Parameters
  • contexts – a list of contexts to be recognized

  • cost_function – maps a set of sensors into the energy cost

  • feature_to_sensor – a dictionary that maps each attribute to a sensor

  • feature_groups – a dictionary that maps each sensor to a list of attributes

  • tree_type – three types of trees can be generated: “default”, “pruned” and “batched”. The “pruned” version prunes the tree using a dedicated pruning set after tree generation. The “batched” version tries to place attributes that share energy costs as close together as possible; this results in a longer training time, but usually slightly better performance

  • min_samples – the minimum number of samples in a non-leaf node

  • extension – if this is set to a value >0, the tree-building process will sometimes expand nodes even when it is energy-inefficient, in anticipation that the decision will pay off later in the tree. These branches are pruned if the anticipation proves wrong. This results in a longer training time, but usually slightly better performance (in practice extension=1 proved best)

  • default – context to be classified in case of an empty tree

  • weight – weight of the energy cost when compared to the misclassification cost

fit(x1, y1, x_p=None, y_p=None)

Trains the tree classifier.

Parameters
  • x1 – a pandas dataframe of attributes to be used as the training set

  • y1 – a pandas series of labels for the instances in x1

  • x_p – (optional) a pandas dataframe of attributes to be used as the pruning set

  • y_p – (optional) a pandas series of labels for the instances in x_p

predict(x2)

Tests the tree classifier.

Parameters

x2 – a pandas dataframe of attributes to be used as the test set

Returns

a list of predictions, where the i-th element is the prediction for i-th row in x2

show()

Prints the structure of the tree.

show_sensors()

Prints the structure of the tree, abstracted so only different sensors are shown.
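
A stand-alone usage sketch; the contexts, cost function, feature-to-sensor mapping and dataframes are hypothetical placeholders.

    from eecr.cstree import CostSensitiveTree

    tree = CostSensitiveTree(
        contexts=["walk", "run", "sit"],
        cost_function=lambda sensors: len(sensors),   # toy cost: one unit per active sensor
        feature_to_sensor={"acc_mean": "acc", "acc_std": "acc", "gps_speed": "gps"},
        tree_type="pruned",
        weight=0.5,
    )
    tree.fit(x_train, y_train, x_p=x_prune, y_p=y_prune)
    predictions = tree.predict(x_test)
    tree.show_sensors()   # inspect which sensors the tree actually uses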