Genetic-programming-based multi-objective optimization of strategies for home energy-management systems

Home energy-management systems can optimize performance either by computing the next step dynamically (online) or by relying on a precomputed strategy to produce the next decision (offline). Further, such systems can optimize based on one or several objectives. In this paper, the multi-objective optimization of offline strategies for home energy-management systems is addressed. Two approaches are compared: the common timetable-based approach and our approach based on decision trees. The timetable-based strategy is optimized using a multi-objective genetic algorithm, while the tree-based strategy is optimized using multi-objective genetic programming. As a result, a set of rules that comprise the trees for efficient management of an energy system is generated automatically. First, the approaches are addressed theoretically, with the finding that the tree-based approach is more powerful than the timetable-based approach. Second, the performance of the tree-based approach is compared with the performance of the timetable-based approach and manually defined strategies in an experiment involving real-world data. A performance increase of up to 17% in terms of the cost objective was confirmed for the tree-based approach. This is achieved without changing the user habits, i.e., the user does not need to adapt the appliance usage to the energy-management system. © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
As users become more concerned about the environment, regulating authorities are increasingly restricting the consumption of non-renewable energy, while the deployment of smart grids [1] continues to increase. In addition, methods and systems for smart electrical energy management in homes, industrial facilities and office buildings are becoming ever more important. This work is motivated by the lack of an energy-management system that:
1. Can be automatically personalized to a particular home energy system deployment.
2. Can outperform standard timetable-based energy-management systems.
3. Can take into account several conflicting objectives.
4. Is not computationally expensive and can be deployed on low-cost hardware.
5. Does not require the user to change his or her habits.
Current smart-home systems, especially the ones that are commercially available, use relatively simple and predefined control mechanisms for home energy management. Even the solutions that can be personalized by learning user habits and adjusting the performance of the smart home accordingly usually perform the optimization with respect to a single objective only, e.g., decreasing the costs.
Energy-management systems [2] for smart homes typically model the problem of energy management as a scheduling problem. First, predictive models for solar irradiation [3], wind speed [4], consumption [5] and/or prices [6], are computed and then used in the optimization of an energy-management schedule for the next time horizon. One day is the usual time horizon used. Since the re-computation is required for every time horizon, such strategies are classified as rolling time horizon strategies and sometimes referred to as "classical" strategies [7].
Some energy-management system optimization methods produce schedules for a set of managed appliances [8]. This impacts the user comfort, since the user has to adapt to the time of allowed appliance usage in order to meet the optimization objective. To address this problem, some approaches [9] take into account certain aspects of the user comfort, such as indoor temperature, illumination of the occupied room, electric vehicle range and preferred time window for appliance operation.
Recently, deep neural networks have been used to model the energy consumption and weather, indicating the possibility for accurate predictions when large amounts of diverse data are available. Deep neural networks based on multilayer perceptrons were used for short-term power probability density forecasting in Ref. [10], while recurrent neural networks were assessed for short-term building energy prediction in Ref. [11]. In order for these techniques to work well, the predictions have to be rather or even extremely accurate, which is hard to achieve in real life due to the uncertainty associated with external factors and parameters, mainly the local weather, but also the energy consumption that results from users' activities at any particular time. Furthermore, predictive models and optimization strategies are tightly coupled in such systems [12]. However, this is rarely addressed, e.g., the deficiency of predictive models is not taken into account during the optimization phase. Since the weather and the users' activities directly influence the production and consumption of energy in the home, predefined scheduling might not be an appropriate technique for optimum energy-flow management.
Some studies present energy-management systems that do not use scheduling. For example, in Ref. [13] Markov decision processes are used, and in Ref. [14] fuzzy-logic expert systems are deployed, with the claim of near-optimally managed energy flows. A fuzzy-logic-based energy-management system is proposed in Ref. [15], where a rule set for energy management is generated by means of a hierarchical genetic algorithm with the aim of profit optimization. All three papers indicate the deficiencies of using schedules for energy-consumption optimization, such as increased computational costs due to frequent optimization runs, an inability to adapt to new, unexpected situations, increased computational costs due to complex prediction models such as deep neural networks, and an inability to run a high-quality energy-management program on-site using cost-efficient equipment.
While the majority of approaches address only one optimization criterion, others, such as [16], acknowledge the need to optimize the system according to multiple contradictory criteria. The reason is that often-used and practically relevant criteria, such as energy consumption, carbon emission, self-consumption, and costs, are usually conflicting in the sense that improving one objective can worsen another. For instance, if selling the energy is economically beneficial, then increasing the energy sales to the grid lowers the operational costs and thereby enhances the profit, but at the same time reduces the self-sufficiency rate. The common techniques transform the multiple criteria into a single objective, usually applying a weighted-sum approach, and then perform single-objective optimization. Energy storage and management system design optimization for a photovoltaic-battery energy storage system using both a weighted-sum approach and Pareto-based multi-objective optimization is addressed in Ref. [17]. Further, a complex rule-based energy-management strategy is proposed there, indicating that designing such a strategy by hand is a laborious process without guarantees on optimal performance. Another way of transforming multiple objectives into a single objective is to define optimum points in the objective space that the strategies try to achieve. This is called steering. Steering approaches to Pareto-optimal multi-objective reinforcement learning of strategies for the control of local battery storage for a residential solar-power system are presented in Ref. [18].
The parameter-based optimization of energy-management systems according to multiple objectives is described in Ref. [19], where the daily optimization of the operational schedule using optimized timetables is performed, and in Ref. [20], where the reference points for lower-level controllers are dynamically optimized.
Some approaches [21] construct a thermal and an electrical model based on existing data and other inputs, such as energy rules, to derive an overall energy model that is then used to predict electrical and thermal demand and production. The optimization that takes into account the overall energy model is then performed in order to generate suggestions for additional energy rules that can be applied to the energy-management system by the building managers.
Nomenclature
$I_{\mathrm{standard}}$: standard irradiation
$T_{\mathrm{standard}}$: standard temperature
$\mathrm{Charge}_i$: energy input into the battery in the i-th time interval
$\mathrm{Discharge}_i$: energy output out of the battery in the i-th time interval
$\mathrm{SoC}_i$: battery's state of charge at the end of the i-th time interval
$k$: correction coefficient
$k_d$: discharge coefficient for battery self-discharging rate
$\mathrm{Balance}_i$: energy balance in the i-th time interval
$\mathrm{Cost}$: total operation cost for running the energy-management system strategy
$\mathrm{Cost}^{\mathrm{buy}}_i$: buy price for the i-th time interval
$\mathrm{Cost}^{\mathrm{sell}}_i$: sell price for the i-th time interval
$\mathrm{Green}$: total green factor for running the energy-management system strategy
$\mathrm{Grid}^{\mathrm{in}}_i$: amount of electrical energy sold to the grid in the i-th time interval
$\mathrm{Grid}^{\mathrm{out}}_i$: amount of electrical energy bought from the grid in the i-th time interval
$\mathrm{Loss}_i$: the amount of energy lost in the i-th time interval
$PV^{\mathrm{receive}}_{\max}$: maximum electrical power that can be sent to the grid
$PV_i$: photovoltaic module average power production in the i-th time interval
$PV_{\mathrm{declared}}$: declared power of the photovoltaic module
$TM_i$: average photovoltaic module temperature in the i-th time interval
$f_{\max}$: coefficient for limiting the total power that can be sent to the grid
$I_i$: total irradiation in the i-th time interval
$L_i$: electrical load in the i-th time interval
$T_i$: average temperature in the i-th time interval
$W_i$: average wind speed in the i-th time interval
HEMS: home energy-management system
MORL: multi-objective reinforcement learning
NSGA-II: non-dominated sorting genetic algorithm II

However, to the best of our knowledge, no system presented in the related work can perform a robust, well-performing offline strategy optimization for an energy-management system and provide multiple trade-off solutions in configurations with conflicting objectives, which is the case in the presented approach. The contributions of this work are as follows.
1. The proposed tree-based solutions are computationally less expensive than some of the online approaches, which require recomputation for each next time period. The computationally intensive step for the proposed tree-based strategies needs to be performed only a few times a year and can be executed off-site, preferably in a cloud. Only the tree-based solutions are then transferred on-site. The solutions can be easily implemented on home energy-management system hardware, since they comprise only simple arithmetic operations and if-then rules.
2. The proposed tree-based strategy outperforms other often-used strategies, those based on timetables and manually defined strategies, by up to 17% in terms of the cost objective while keeping the green objective fixed, as evident from the experimental results.
3. The superiority of the tree-based strategies over the timetable-based strategies with respect to the expressive power is proven theoretically.
4. The proposed approach is based on true Pareto-based multi-objective optimization, where the user can pick the solution with the preferred trade-off after the trade-offs are clearly presented to him or her. This is an advantage over the weight-based multi-objective approaches, where the weights are usually chosen beforehand, when the trade-offs are not yet evident.
5. The user of the proposed approach does not have to change his or her habits and can use any appliance at any time. This is in contrast with some of the methods that prescribe time intervals for appliance usage, which have to be considered by the user in order to achieve the optimal criteria values.
The advantage of the presented approach, based on a tree-based strategy (an example is provided in Fig. 1), over a timetable-based strategy (an example is provided in Fig. 2) is presented in the following example. Assume that the price of electrical energy changes dynamically throughout the day. This is known as real-time pricing and is already available in certain parts of the world [22]. Assume that, generally, the price for electrical energy starts increasing in the morning. Since this is a typical behavior, the cost-effective, robust, timetable-based strategy would learn to sell the surplus energy in the morning instead of storing it in the battery for later use. Now assume that on a particular morning it is very sunny and windy. In this case, the solar and wind farms produce a large amount of electrical energy, which becomes available on the energy market, thereby lowering the price of electrical energy. A robust, timetable-based strategy would continue to sell the energy at a low price, since it only takes into account the time of the day, and selling the energy is beneficial on a typical day. A tree-based strategy, however, also takes into account the low (or even negative) price. At times of an exceptionally low price, the selling of energy is not beneficial, and an optimized tree-based strategy, having learned this, would defer the selling of energy to a later time. The advantage of the tree-based strategy is three-fold. First, the energy stored in the battery could be used by a smart home, thereby increasing its independence from the grid. Second, the cost would be lower, since the profit made from selling the energy at a low price is surpassed by the cost of buying more expensive energy at a later time. Third and overall, the tree-based strategy is more flexible and enables adaptation to the current situation based on the previous construction of potential decisions needed in most relevant situations.
The rest of the paper is structured as follows. In Section 2, the energy-management system optimization problem is introduced, and in Section 3, energy-management strategies are discussed. In Section 4, the presented framework is described. In Section 5, the experiments and results are presented, while Section 6 discusses the case-study results and Section 7 concludes the paper.

Problem formulation
The problem of managing electrical energy in a smart home that has one or more sources of electrical energy, an electrical energy storage option, a smart grid, and an electrical energy consumption or load is addressed. The proposed Home Energy-Management System (HEMS) does not manage the devices by turning them on or off or by setting different modes of operation.
The HEMS problem is, therefore, to find one or a set of the best (according to one or multiple objectives) strategies that decide on how much energy to buy from or sell to the grid or how much energy to store in the battery based on the past, present and predicted-future states (regarding the price of electrical energy, the production and consumption of electrical energy, and the state of charge of the battery).
The overall schematic that illustrates the problem of managing the electrical energy in a smart home is presented in Fig. 3. The historical data is first retrieved and used as an input for the optimization procedure. The result of optimization is a set of near-optimal strategies with respect to multiple criteria, e.g., green and cost criteria. The user or the HEMS operator then chooses the preferred solution strategy, which is uploaded to the HEMS central unit and utilized to manage the energy flows within the system. Additional data is recorded and can be reused in the event of another optimization run.

Model
The home energy system (Fig. 4) comprises components that are only energy sources (photovoltaic modules and wind generators), only energy consumers (load), or both (grid, battery). The HEMS is responsible for managing the energy flows between the home energy system's components. For the purpose of this paper, all the hardware details are abstracted away, and only the logic of the control of the energy flow is addressed.
Photovoltaics. A regression model [23] is used to determine the energy generation for the given weather conditions. The power output ($PV_i$) is approximated from the declared power of the photovoltaic modules ($PV_{\mathrm{declared}}$), the total irradiation ($I_i$), where subscript i denotes the i-th time interval, the average wind speed ($W_i$) and the average temperature ($T_i$). In this model, $TM_i$ is the average photovoltaic module temperature, $I_{\mathrm{standard}} = 1000\ \mathrm{W/m^2}$ is the standard irradiance, $T_{\mathrm{standard}} = 25\ ^{\circ}\mathrm{C}$ is the standard temperature, and $k = -0.004$ is the correction coefficient.
Battery. The battery's state of charge at the end of the i-th time interval ($\mathrm{SoC}_i$) is calculated from the previous state of charge ($\mathrm{SoC}_{i-1}$), the battery charge ($\mathrm{Charge}_i$), the discharge ($\mathrm{Discharge}_i$), and the self-discharging rate $k_d$. During each time interval, the battery can either charge or discharge:

$$\forall i:\ \mathrm{Charge}_i \cdot \mathrm{Discharge}_i = 0. \quad (4)$$

Further, energy cannot be transferred between the grid and the battery.
The smart grid sends the price signals to the HEMS. Buy and sell prices are denoted with $\mathrm{Cost}^{\mathrm{buy}}_i$ and $\mathrm{Cost}^{\mathrm{sell}}_i$, respectively. During each time interval, the HEMS sends $\mathrm{Grid}^{\mathrm{in}}_i$ to, or receives $\mathrm{Grid}^{\mathrm{out}}_i$ from, the grid. Sending the energy to the grid can be limited by the coefficient $f_{\max} \in [0, 1]$, determined by the national law.
Electrical load. The energy consumption of all the electrical devices in a residential home during the i-th time interval is aggregated into the electrical load ($L_i$). The load management [24] is not covered in this paper.
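As an illustration of how the constants above might enter such a model, the following is a minimal sketch that assumes a commonly used form: the declared power is scaled linearly with irradiance and derated linearly with module temperature. This form is an assumption, not necessarily the regression model of Ref. [23]. The function name `pv_power` is hypothetical, and the module temperature is taken as an input, because its dependence on the ambient temperature, irradiation and wind speed is not reproduced here.

```python
I_STANDARD = 1000.0  # W/m^2, standard irradiance (value from the text)
T_STANDARD = 25.0    # degrees Celsius, standard temperature (value from the text)
K = -0.004           # correction coefficient (value from the text)


def pv_power(pv_declared, irradiation, module_temp):
    """Assumed model: linear irradiance scaling with linear temperature derating."""
    irradiance_ratio = irradiation / I_STANDARD
    temperature_factor = 1.0 + K * (module_temp - T_STANDARD)
    return pv_declared * irradiance_ratio * temperature_factor
```

Under this assumed form, a module at standard conditions delivers exactly its declared power, and a hotter module delivers less.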

Objectives
Often-used objectives in HEMS include running costs, CO 2 emission, self-consumption rate, maximum peak load, total energy consumption and battery life expectancy. The objectives considered in this paper are the running costs and the green factor.
The running costs (Cost) comprise the cost of buying and the profit of selling the electrical energy, accumulated over all time intervals. Here, $\mathrm{Grid}^{\mathrm{out}}_i$ and $\mathrm{Grid}^{\mathrm{in}}_i$ denote the amount of electric energy bought from or sold to the grid, respectively. Further, $\mathrm{Cost}^{\mathrm{buy}}_i$ and $\mathrm{Cost}^{\mathrm{sell}}_i$ are the prices for buying and selling the electric energy.
The green factor (Green) represents the level of independence of a home with the HEMS from the grid. It is computed from the following quantities, where subscript i indicates the i-th time interval: $PV_i$ is the amount of energy produced, $\mathrm{Loss}_i$ is the amount of energy that is neither sold nor used in the HEMS and is thus effectively lost, $\mathrm{Grid}^{\mathrm{in}}_i$ is the amount of energy sold to the grid, and $L_i$ is the electrical energy load.
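The verbal description of the running costs can be sketched as follows. This is a minimal illustration that assumes the costs are a plain sum over intervals of the buy price times the energy bought, minus the sell price times the energy sold. Any efficiency terms are omitted, and the function name `running_cost` is hypothetical.

```python
def running_cost(grid_out, grid_in, cost_buy, cost_sell):
    """Sum over intervals of (buy price * energy bought - sell price * energy sold).

    All four arguments are equally long sequences indexed by time interval.
    """
    return sum(
        buy * bought - sell * sold
        for bought, sold, buy, sell in zip(grid_out, grid_in, cost_buy, cost_sell)
    )
```

A negative result under this convention would mean that the profit from selling exceeds the cost of buying.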

Simulator
Fig. 4. Home energy-system model.

The simulator models the energy flows in the home energy system in discrete time intervals with respect to the given energy-management strategy and is used to evaluate the performance of the strategy according to the specified objectives. The following simulator components are considered: photovoltaic module, smart grid, battery, load, and HEMS. For each time interval, the simulator receives the following input data:
1. Grid energy prices.
2. Energy production.
3. Load.
Additionally, the home energy system's configuration consists of:
1. Battery: maximum charge rate, maximum discharge rate, minimum state of charge, maximum state of charge, initial state of charge, self-discharge rate, charge efficiency, discharge efficiency.
2. Photovoltaics: peak power (the maximum power of a photovoltaic module in a standardized test).
3. Grid: efficiency of selling energy to the grid, peak power sell coefficient (specifying the maximum power allowed for transmitting the electrical energy from the home energy system to the grid).
The remaining energy balance ($\mathrm{Balance}_i$) in each time interval is given by

$$\mathrm{Balance}_i = PV_i - L_i,$$

where $PV_i$ is the electrical energy production and $L_i$ is the electrical energy load. There are two options during each time interval: either the HEMS first uses the battery in order to store/obtain energy or it first uses the grid in order to sell/buy the energy. If the balance is positive, there is energy excess, and in the case of a negative balance, there is a lack of energy. In each time interval the HEMS executes a control action and receives the values of the objectives (cost and green factor).

Strategies for home energy-management systems
Two types of controllers or strategies were developed and tested for the purpose of this study: the timetable-based strategy and the tree-based strategy. The HEMS strategy maps the current HEMS state into the control action. In this paper, both strategies use the same control actions:
1. $a_0$, use battery first:
(a) If the balance < 0: Try to use the energy from the battery first. If this is not sufficient, supply energy from the grid.
(b) If the balance ≥ 0: Try to put the extra energy into the battery first. If the production is greater than the maximum charge rate, sell the energy to the grid up to the grid sell limit and discard the rest.
2. $a_1$, use grid first:
(a) If the balance < 0: Get all the required energy from the grid.
(b) If the balance ≥ 0: Try to sell the energy to the grid up to the sell limit. If there is energy left, charge the battery. If there is still some energy left, discard it.
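The two control actions can be sketched as a single simulation step. This is a simplified illustration, not the simulator used in the study: charge and discharge efficiencies and battery self-discharge are ignored, the balance is assumed to be production minus load, and the function and parameter names are hypothetical.

```python
def step(action, balance, soc, soc_min, soc_max, max_charge, max_discharge, sell_limit):
    """Apply one control action; return (new_soc, grid_out, grid_in, loss).

    action 0 = use battery first, action 1 = use grid first.
    Simplification: no efficiencies and no self-discharge are modeled.
    """
    grid_out = grid_in = loss = 0.0
    if balance < 0:
        need = -balance
        if action == 0:
            # Battery first: discharge as much as the limits allow.
            from_battery = min(need, max_discharge, soc - soc_min)
            soc -= from_battery
            need -= from_battery
        grid_out = need  # the remainder (or everything, for action 1) is bought
    else:
        extra = balance
        if action == 0:
            # Battery first: charge, then sell up to the limit, then discard.
            to_battery = min(extra, max_charge, soc_max - soc)
            soc += to_battery
            extra -= to_battery
            grid_in = min(extra, sell_limit)
            loss = extra - grid_in
        else:
            # Grid first: sell up to the limit, then charge, then discard.
            grid_in = min(extra, sell_limit)
            extra -= grid_in
            to_battery = min(extra, max_charge, soc_max - soc)
            soc += to_battery
            loss = extra - to_battery
    return soc, grid_out, grid_in, loss
```

For example, with a surplus of 3 units and a half-full battery, the battery-first action stores part of the surplus and sells the rest, whereas the grid-first action sells all of it.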
Although the control actions are the same, the HEMS strategies differ in how the automatic decisions to choose one particular control action are taken, i.e., they use different mappings from the HEMS state space into the control action state space.
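Formally, the task is to find the best mapping(s) in Ag with respect to f, where f is an objective function that depends on the parameters D, and Ag is the set of all the mappings from the HEMS state space S to the control action space A.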
For the purpose of this study, f is a simulator that computes the running costs and the green factor as two conflicting objectives.

Timetable strategy
In the timetable strategy, the action is specified for each time interval. Using only time-based decision making can be efficient in domains where the working conditions are periodic (e.g., a user goes to work at a specific hour on a workday), which is partially the case in the energy-management domain, and where not much other data is available. Because of its simplicity, it is often used in control-strategy problems.
Often, the timetable strategy is used in combination with some prediction mechanism trained on a training dataset. Next, for each time interval, where the HEMS states are partially predicted using the previously built predictive models, the problem of finding the optimum timetable(s) is solved. This kind of strategy requires continuously solving the optimization problem. Moreover, the quality of the strategy depends on the accuracy of the predictive models.
In this paper, the focus is on finding robust timetable strategies that do not require resource-intensive continuous recomputation for each new time horizon (i.e., a set of consecutive time intervals). This requires finding timetable strategies that perform well on the training data, given some longer time horizon, i.e., the optimization problem is solved once and the solution is then used by the HEMS for new data. Note that the HEMS state space is $S = T \times F$, where the time space $T$ has only one dimension, while $F$ includes all the other feature dimensions (price, energy consumption, energy production and others).
For the purpose of this study, the timetable strategy's mapping is discretized in order to correspond to the discretized dataset D, i.e., for each possible value of the time of the day an action is specified.

Tree-based strategy
In the tree-based strategy, the decision about which action to take is based on a logic flow as defined by a binary decision tree. An example of such a decision tree is shown in Fig. 5. The inner nodes represent tests of the form: is the value of the feature X (e.g., the current balance) greater than or equal to the value v (e.g., 0). Based on the result, the control logic proceeds along either of the two branches. The terminal nodes, i.e., the leaves, represent actions to be taken (e.g., use the battery first). This decision-making flow is repeated each time a decision needs to be made.
In this case, the following features are used: the current minute of the day, the average previous day's buy price, the average previous day's sell price, the current load, the current production, the current buy price, and the current sell price.
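The decision flow described above can be sketched as a short tree walk. The nested-tuple representation, the feature names and the threshold values below are assumptions made for illustration only. Inner nodes are `(feature, threshold, branch_if_true, branch_if_false)`, and leaves are action indices (0 for "use battery first", 1 for "use grid first").

```python
def decide(tree, state):
    """Walk a binary decision tree and return the action index at the reached leaf."""
    while not isinstance(tree, int):
        feature, threshold, if_true, if_false = tree
        # Inner-node test: is the feature value greater than or equal to the threshold?
        tree = if_true if state[feature] >= threshold else if_false
    return tree


# Hypothetical tree: with an energy surplus and a high sell price use the grid
# first, otherwise use the battery first.
example_tree = ("balance", 0.0, ("sell_price", 20.0, 1, 0), 0)
```

Evaluating such a tree requires only comparisons, which is consistent with the claim that tree-based strategies can run on low-cost hardware.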
The test value v for the feature X is calculated dynamically, based on the possible values that can be attained, given the decision flow. Each inner node is represented by the feature X and the relative test value $v_{\mathrm{relative}} \in (0, 1)$. If the relative test value does not yet exist (because the test node has not yet been visited in the simulation run), it is randomly generated. From this value, the absolute test value ($v_{\mathrm{absolute}}$) is calculated using $v^X_{\min}$ and $v^X_{\max}$, the current minimum and maximum values that can be attained in the current node for the feature X. When the feature X is first used in a test, $v^X_{\min}$ and $v^X_{\max}$ are the minimum and maximum values, respectively, for the feature X in the data. Assume that $v^X_i$ is the value for the feature X in the i-th time interval. At the root node the following values are set:

$$v^X_{\min} = \min_i v^X_i, \qquad v^X_{\max} = \max_i v^X_i.$$

However, when X has already been used in a test in some predecessor node, $v^X_{\min}$ and $v^X_{\max}$ are adjusted according to that earlier test.

Theorem 1. The set of timetable strategies $T$ is a proper subset of the set of tree-based strategies $\Psi$.

Proof. First, let us prove that each timetable strategy can be represented by some decision-tree strategy. Without loss of generality, assume a timetable strategy $t_s$ with daily periodicity represented by the following pairs:

$$t_s = [(t_0, a_0), (t_1, a_1), \ldots, (t_{n-1}, a_{n-1}), (t_n, a_n)].$$

Here, each pair $(t_j, a_j)$ means that in the time interval $(t_{j-1}, t_j]$ action $a_j$ is used, where $t_{-1}$ is defined as $t_{-1} = t_n = \text{midnight}$. For the timetable strategy $t_s$, a corresponding tree-based strategy that behaves exactly the same can be constructed, as shown in Fig. 6. This proves that the set of timetable strategies $T$ is included within the set of tree-based strategies $\Psi$, therefore,

$$\forall t \in T\ \exists \psi \in \Psi : t \leftrightarrow \psi \ \Rightarrow\ T \subseteq \Psi. \quad (15)$$

Second, let us prove that a tree-based strategy exists that does not have a timetable-based strategy representation. For this, a feature that is not exactly repeated after a certain time is needed, which effectively means any feature with the exception of the time-of-day feature. The example tree-based strategy in Fig. 5, for instance, uses the current balance feature and the difference between the current sell price and the average previous day's sell price. It cannot be represented using any timetable-based strategy, since neither of the features it uses is periodic. Therefore,

$$\exists \psi \in \Psi \wedge \nexists t \in T : \psi \leftrightarrow t \ \Rightarrow\ \Psi \nsubseteq T. \quad (16)$$

Combining the two results proves the theorem:

$$T \subseteq \Psi \wedge \Psi \nsubseteq T \ \Rightarrow\ T \subsetneq \Psi. \quad (17)$$

Theorem 1 can also be demonstrated experimentally, as seen in Fig. 7. To demonstrate the different expressive powers, 500,000 random timetable-based and tree-based strategies were generated and their performance with respect to the two objectives (cost and green factor) was compared on two different datasets. The mapping of the timetable-based strategies is observed to be included in the mapping of the tree-based strategies.
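The first half of the proof, that every timetable has an equivalent decision tree, can be illustrated constructively. The sketch below is one possible construction and not necessarily the one drawn in Fig. 6. It assumes a timetable given as a list of actions for equally long slots starting at midnight, and it reuses a hypothetical nested-tuple tree representation `(feature, threshold, branch_if_true, branch_if_false)` with integer leaves.

```python
def decide(tree, state):
    """Return the action at the leaf reached by testing state[feature] >= threshold."""
    while not isinstance(tree, int):
        feature, threshold, if_true, if_false = tree
        tree = if_true if state[feature] >= threshold else if_false
    return tree


def timetable_to_tree(timetable, slot_minutes=30):
    """Build a chain of time-of-day tests that reproduces the timetable exactly."""
    tree = timetable[0]  # innermost fallback: the first slot of the day
    for j in range(1, len(timetable)):
        # Each new test wraps the previous tree, so later slots are tested first.
        tree = ("minute", j * slot_minutes, timetable[j], tree)
    return tree
```

The resulting tree uses only the time-of-day feature, so it behaves identically to the timetable for every minute of the day. A tree that also tests a non-periodic feature has no such timetable counterpart.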

Optimization framework
Fig. 6. A decision tree corresponding to the timetable $t_s$.

In order to solve the optimization problem from Section 2, two alternative methods are utilized: one for the timetable strategy optimization and one for the tree-based strategy optimization. For solving this problem, methods from the field of evolutionary computation are used. The methods originate from the idea of evolution, i.e., solution candidates are put into an artificial environment, where they can interact with each other and produce offspring. The survival of the solution candidates depends on their fitness, derived from the objective function. Over several generations (i.e., iterations) the fittest solution candidates emerge as the solutions to the problem. When there are multiple conflicting objectives to be considered for optimization, a single best solution does not exist. Instead, a set of incomparable solutions (i.e., a Pareto set) is optimal in such cases, i.e., considering two solutions from the Pareto set, one is always better than the other for at least one objective. Population-based evolutionary algorithms, such as the well-known algorithm NSGA-II [25], are especially suitable for these problems, since they evolve a set of solution candidates in a single run.
Here, NSGA-II handles the optimization of the timetable strategy. Individuals are represented as a hash map, with mappings from the time-of-day to a set of actions. For the purpose of this study, a day was divided into half-hour intervals and the action set comprised two actions: $a_0$, where the battery is used first, and $a_1$, where the grid is used first. This results in a timetable representation in the form of a list of 48 binary values, where for each half-hour interval either action $a_0$ or $a_1$ is taken.
The multi-objective genetic programming algorithm [26] is applied to the optimization of the tree-based strategy. Individuals are represented as binary decision trees, where each inner node (decision node) tests whether a feature satisfies a condition or not. Terminal or leaf nodes denote the actions in the simulator. The following genetic programming operators are applied to evolve the solutions:
Initialization: ramped half-and-half method (half of the initial decision trees are full trees, the other half are of varying depth).
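The notion of a Pareto set can be made concrete with a small dominance check. The sketch assumes that every objective is to be minimized (for example, the cost and the negated green factor). The function names are hypothetical, and the quadratic-time filter is for illustration only and is not the sorting procedure used inside NSGA-II.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the points (1, 5), (2, 2), (3, 3) and (5, 1), the point (3, 3) is dominated by (2, 2), while the other three are mutually incomparable and form the front.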
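The timetable representation described above can be sketched as a list of 48 binary values indexed by half-hour slot. The helper names are hypothetical, and the uniform random initialization is an assumption, since the initialization used in the study is not stated in this section.

```python
import random

SLOT_MINUTES = 30
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 48 half-hour intervals per day


def random_timetable(rng):
    """One candidate timetable: an action (0 or 1) for each half-hour slot."""
    return [rng.randint(0, 1) for _ in range(SLOTS_PER_DAY)]


def timetable_action(timetable, minute_of_day):
    """Look up the action for the slot that contains the given minute of the day."""
    return timetable[minute_of_day // SLOT_MINUTES]
```

A timetable such as `[0] * 24 + [1] * 24` would use the battery first before noon and the grid first from noon onwards, regardless of prices or load.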
Selection: a non-dominated sorting-based selection as in NSGA-II [25] is used for the purpose of selecting individuals based on the values of multiple criteria.
Mutation: uniform mutation, which randomly selects a sub-tree in the existing tree and replaces it by a randomly generated subtree.
Crossover: one point crossover. It randomly selects sub-trees in each of two parent individuals and exchanges them.
Reproduction: this operator returns a copy of the parent individual.
Bloat control: the common problem of increasingly larger programs in later generations of the evolutionary algorithm run is addressed by the static limit, as proposed in Ref. [26]. If a node is positioned at the depth specified by the static limit and is at the same time the root of a sub-tree, it is replaced by a random leaf node from its subtree.
The overview of the algorithm is presented in Fig. 8. The values of the parameters of the inner nodes are generated randomly when a tree is initialized.
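The ramped half-and-half initialization can be sketched in a few lines. This is a simplified illustration and not the DEAP implementation used in the study. The feature names, the leaf probability of 0.3 in the "grow" variant and the depth range are assumptions, and inner nodes store a random relative test value as described for the tree-based strategy.

```python
import random

FEATURES = ["minute", "load", "production", "buy_price", "sell_price"]  # hypothetical names
ACTIONS = [0, 1]  # 0 = use battery first, 1 = use grid first


def random_tree(depth, full, rng):
    """Full trees have all leaves at `depth`; grown trees may stop earlier."""
    if depth == 0 or (not full and rng.random() < 0.3):
        return rng.choice(ACTIONS)
    return (
        rng.choice(FEATURES),
        rng.random(),  # relative test value of the inner node
        random_tree(depth - 1, full, rng),
        random_tree(depth - 1, full, rng),
    )


def tree_depth(tree):
    """Depth of a nested-tuple tree; a single leaf has depth 0."""
    if isinstance(tree, int):
        return 0
    return 1 + max(tree_depth(tree[2]), tree_depth(tree[3]))


def ramped_half_and_half(pop_size, min_depth, max_depth, rng):
    """Alternate between full and grown trees over a range of depth limits."""
    return [
        random_tree(rng.randint(min_depth, max_depth), full=(i % 2 == 0), rng=rng)
        for i in range(pop_size)
    ]
```

The same `tree_depth` helper could serve a static depth limit for bloat control, by rejecting or pruning offspring whose depth exceeds the limit.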

Case study
In this section, the data used for the simulation and the strategy optimization are described. Additionally, the setup and the experimental results are presented.

Data
Real-life data is used in the experimental evaluation of the strategy optimization. Weather data was obtained from the Slovenian Environment Agency weather portal [27] for the location of "Bilje pri Novi Gorici, Slovenia" for the period between January 1, 2007, and October 23, 2016. Energy consumption data is taken from the available Electricity Load Diagrams 2011-2014 dataset [28]. Load data with the id "MT_003" is used for these experiments. The electrical energy pricing data was obtained from the historical intra-day continuous pricing data for the German market at the Epexspot portal [29]. The weighted average price is used for both buy and sell prices.
All the data was resampled to 30-min intervals and the missing values were filled using the forward-fill method, i.e., each missing value is set to the most recent preceding valid value. A sub-sample of the data is visualized in Fig. 9.
In order to properly test the strategies, the data was divided into two datasets: the training dataset, which included the data from April 1, 2014, to June 30, 2014, and the test dataset, which included the data from July 1, 2014, to September 30, 2014. Both datasets cover a period of 3 months of summer and transition periods. The training dataset is used during the process of optimization. In order to prevent overfitting (the over-adaptation of strategies to the already seen data), the strategies are also tested on a previously unseen test dataset, which gives a sense of the strategy generalization.
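The forward-fill rule can be sketched in plain Python. This illustrates the rule only and is not the preprocessing code used in the study. Missing values are represented here by `None`, which is an assumption of the sketch, and the function name is hypothetical.

```python
def forward_fill(values):
    """Replace each missing value (None) with the most recent preceding valid value."""
    filled = []
    last_valid = None
    for value in values:
        if value is not None:
            last_valid = value
        filled.append(last_valid)
    return filled
```

Values that are missing before the first valid observation have nothing to copy from and remain missing under this rule.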

Alternative strategies
In order to obtain a sense of the strategy optimization performance, the results are compared to three strategy groups: manually defined strategies, robust timetable strategies, and near-optimum strategies.
The first group includes a set of manually defined strategies. The second group includes the robust timetable strategies described in Section 3.1 and obtained using the NSGA-II algorithm, as described in Section 4.
The third group includes the near-optimum strategies used for the purpose of assessing the performance of the strategy optimization method. In order to obtain the near-optimum strategy, a multi-objective optimization using NSGA-II is performed to find a binary vector of length 4368 for the training dataset and 4416 for the test dataset, which defines an action for each time interval in the respective dataset. This requires solving two optimization problems, one for the training and one for the test dataset.
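The two vector lengths follow directly from the dataset periods and the 30-min resolution. The short check below is only an illustration; the helper name `n_intervals` is hypothetical.

```python
from datetime import date

INTERVALS_PER_DAY = 24 * 2  # 30-min resolution

def n_intervals(first_day, last_day):
    """Number of 30-min intervals in a period, both end days inclusive."""
    days = (last_day - first_day).days + 1
    return days * INTERVALS_PER_DAY

train_len = n_intervals(date(2014, 4, 1), date(2014, 6, 30))  # 91 days
test_len = n_intervals(date(2014, 7, 1), date(2014, 9, 30))   # 92 days
print(train_len, test_len)  # prints: 4368 4416
```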

Optimization runs
The optimization was performed using a workstation with an Intel Xeon CPU E5-1620 v4 @3.5 GHz with 32 GB of RAM. Python [30] was used as the programming language and the DEAP library [31] was used as the optimization framework. The EMS simulator [32] was implemented using the NumPy [33] and Pandas [34] libraries. The hypervolume indicator [35], which measures the area covered by non-dominated solutions, was used to monitor the evolution of the strategies and compare the optimization runs.
Because the experiment is resource-intensive, the parameter settings were chosen based on a smaller set of preliminary runs. The resulting parameter settings for the timetable-based and tree-based strategy optimizations are specified in Tables 1 and 2, respectively.
For the simulator, the parameter settings reported in Table 3 were used.
Each optimization was executed 40 times in order to obtain representative results, with the exception of the near-optimum strategy optimization, which was run only once. The parameter settings used in the near-optimum strategy optimization were the same as in the timetable-based strategy, with the exception of the number of generations, which was increased to 40,000. The increase was needed for the hypervolume value to stabilize.

Results
The hypervolume progress plots for the optimization of the timetable-based, tree-based and near-optimum strategies on the training data are presented in Fig. 10. The progress of each optimization approach is shown, i.e., 40 runs of the timetable-based strategy and tree-based strategy optimizations, and one run of the near-optimum strategy optimization. The reference point used for the hypervolume calculation was determined from the data. First, the worst values for each of the objectives were calculated and then multiplied by 1.1, if the value was positive, or 0.9, if the value was negative. The same reference point was used for all the hypervolume calculations, so that the hypervolume values are comparable. The number of generations in the timetable-based strategy and tree-based strategy optimizations was set to 1000, while for the near-optimum strategy it was increased to 40,000, keeping the number of individuals the same. Therefore, the normalized generation or evolution progress (i.e., the current iteration divided by the total number of iterations) is used for the horizontal axis.
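The reference-point rule and the two-objective hypervolume can be sketched as follows. This is an illustration under stated assumptions, not the DEAP routine used in the experiments: both objectives are assumed to be expressed in minimization form (e.g., the Green objective negated), so the worst value of an objective is its maximum, and all numeric values are hypothetical.

```python
def reference_point(points):
    """Worst value per objective (minimization form), scaled by 1.1 if
    positive or by 0.9 if negative, so that it lies beyond every point."""
    ref = []
    for values in zip(*points):
        worst = max(values)  # the largest value is the worst when minimizing
        ref.append(worst * 1.1 if worst > 0 else worst * 0.9)
    return tuple(ref)

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (two minimized objectives)."""
    area = 0.0
    best_second = ref[1]
    for first, second in sorted(points):
        if first >= ref[0] or second >= best_second:
            continue  # outside the reference box, or dominated by an earlier point
        area += (ref[0] - first) * (best_second - second)
        best_second = second
    return area

# Hypothetical front given as (cost, -green) pairs.
front = [(-10.0, -0.8), (-6.0, -0.9), (-5.0, -0.5)]
ref = reference_point(front)
hv = hypervolume_2d(front, ref)
```

Because the same reference point is reused for every run, a larger `hv` always means a front that covers more of the same bounded region, which is what makes the values comparable.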
The hypervolume indicator stabilizes in late generations; therefore, increasing the number of generations is not necessary.
The percentile values of the last generation for the optimization of the timetable-based, tree-based and near-optimum strategies on training data are presented in Table 4. The percentile values for near-optimum strategies remain the same, because only one run was executed.
The performance of the non-dominated strategies in the final populations of all the optimization runs together with the performance of the predefined strategies is shown in Fig. 11. Each found solution is represented by a dot with the values of the Cost objective on the horizontal axis and 100 × Green objective value on the vertical axis. In order to observe the generalizability of the solutions, the performance of each strategy was also evaluated on the previously unseen test data (the right-hand plot).
Observe that a negative Cost value actually means profit, so solutions to the far left-hand side are preferred. At the same time, the solutions with the highest Green value at the top are preferred. The ideal solution would, therefore, lie in the upper-left corner, and, as a general rule, solutions toward the upper left-hand side are considered the best. Furthermore, observe that nearly every two points on the front that belong to the same strategy type are incomparable. When a decision maker chooses one over the other, the performance of the HEMS improves according to one objective and worsens according to the other. This enables the decision maker to select a trade-off between the objectives according to his or her preferences. The trade-off is presented clearly in the case of the true multi-objective optimization, which is not always possible when other methods are used.

Fig. 9. Sub-sample of the data from April 2014: electrical energy production in kWh (left), electrical energy consumption in kWh (center), and electrical energy prices in EUR/kWh (right).

Table 1
Parameter settings for the timetable-based strategy optimization.

| Parameter name | Parameter value |
|---|---|
| Number of parents | 500 |
| Number of offspring | 500 |
| Number of generations | 1000 |
| Gene mutation probability | 1/24 |
| Crossover probability | 0.9 |

Table 2
Parameter settings for the tree-based strategy optimization.

| Parameter name | Parameter value |
|---|---|
| Number of parents | 500 |
| Number of offspring | 500 |
| Number of generations | 1000 |
| Individual mutation probability | 0.1 |
| Crossover probability | 0.9 |
| Initial minimum tree depth | 1 |
| Initial maximum tree depth | 10 |
| Mutation minimum tree depth | 0 |
| Mutation maximum tree depth | 3 |
| Static limit for the maximum tree depth | 10 |

Table 3
Parameter settings for the simulator.

Examples of generated tree-based strategies are presented in Figs. 12–14, where the trees correspond to strategies with the best cost criterion, median cost criterion and best green criterion, respectively, for one of the optimization runs. Similarly, examples of generated timetable-based strategies are presented in Figs. 15–17, where the timetables correspond to strategies with the best cost criterion, median cost criterion and best green criterion, respectively, for one of the optimization runs.
The performance of the chosen examples is further compared in Table 5. In addition, the improvement of the tree-based strategy over timetable-based strategy is presented in the Improvement column of Table 5.

Discussion
In this section, the differences between the timetable-based, tree-based and manually defined approaches are discussed. Every tree-based strategy found using the proposed method either dominates or is not comparable with any robust timetable-based strategy (Fig. 11). When observed as a whole, the found tree-based strategies dominate the found timetable-based strategies and the predefined strategies. This domination is especially evident for the lower cost (i.e., higher profit) solutions (Fig. 11). In our experiments, a 17.1% increase in profit can be observed (Table 5) when comparing two particular solutions of two random optimization runs that obtain the best cost objective values in their respective runs. The green objective of the tree-based approach is lower in that case; however, this is because a much better solution regarding the cost objective was found. If a solution with a better green objective is preferred, the decision maker may choose another one. The non-dominated greenest solutions are comparable regarding the green objective; however, the cost objective of the tree-based strategies is better. In the case of the presented examples, a 10.0% improvement can be observed (Table 5). For trade-off solutions, i.e., those that do not obtain maximum or minimum values of the objectives, the tree-based strategies that perform similarly regarding the green objective outperform the timetable-based strategies in terms of the cost objective, and the tree-based strategies that perform similarly regarding the cost objective outperform the timetable-based strategies in terms of the green objective (Fig. 11). This is a consequence of the tree-based solutions dominating the timetable-based solutions. In the case of the presented examples, where the solutions with the median cost objective values in their respective runs were inspected, the cost objective values are similar, while a 5.8% improvement can be observed in the green objective.
A similar conclusion can be drawn for the test data set, where the relations between the strategy types remain the same, although the scale (i.e., the range of objective values achieved) changes. This indicates good generalizability of the proposed method and the robust timetable-based strategy optimization.
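The notions of domination and incomparability used in this discussion can be made concrete with a small sketch. Each strategy is reduced to a (cost, green) pair with the cost minimized and the green objective maximized; the numeric values are hypothetical and are not results from Table 5.

```python
def dominates(a, b):
    """True if a Pareto-dominates b; each is (cost, green),
    with cost minimized and green maximized."""
    cost_a, green_a = a
    cost_b, green_b = b
    no_worse = cost_a <= cost_b and green_a >= green_b
    strictly_better = cost_a < cost_b or green_a > green_b
    return no_worse and strictly_better

def incomparable(a, b):
    """True if neither solution dominates the other (and they differ)."""
    return a != b and not dominates(a, b) and not dominates(b, a)

def non_dominated(strategies):
    """Keep only the strategies that no other strategy dominates."""
    return [s for s in strategies
            if not any(dominates(other, s) for other in strategies)]

# Hypothetical (cost, green) values.
tree_a = (-12.0, 0.60)
tree_b = (-11.0, 0.70)
timetable = (-10.0, 0.60)

front = non_dominated([tree_a, tree_b, timetable])
```

Here `tree_a` dominates `timetable` (lower cost, equal green value), whereas `tree_a` and `tree_b` are incomparable, so both remain on the front and the choice between them is left to the decision maker.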
The timetable-based and tree-based strategies dominate the predefined strategies used in a commercially available HEMS, with the exception of the "use-battery-first" strategy, which achieves a very good value according to the green-factor criterion. According to the hypervolume indicator, the timetable-based strategy optimization and tree-based strategy optimization converge at approximately the same rate, while the proposed approach finds significantly better-performing strategies (Fig. 10 and Table 4). The median hypervolume indicator value of the tree-based strategy optimization runs was about 20% higher than the median hypervolume indicator value of the timetable-based strategy optimization runs, and the ranges between the minimum and maximum values of the two approaches did not overlap. The indicator values of the near-optimum strategy are the highest, as expected (Table 4). It is important to take into account that these values are not achievable in real life. Since the theoretical Pareto front is not available, it is approximated using optimization, where a decision for each time interval is sought over the whole operation period for which data is available. To achieve the Pareto front approximation, the optimization budget was greatly increased for the near-optimum strategy optimization (40,000 vs. 1000 generations), while keeping all the other population parameters the same.
On the other hand, there is still room for improvement of the tree-based strategy optimization, as is evident from the difference in the performance of the found near-optimum strategies and tree-based strategies. This is probably because of too few features being used in the tree-based strategies. Since for the described approach the features have to be designed by hand, some important features that could increase the performance are probably missing. An automatic feature-generation method could increase the method's performance; however, at the cost of strategy explainability.
Some rolling time horizon strategies may perform better, but at the cost of increased online computational complexity. Note that in the presented approaches, the strategies are precomputed based on simulation and historical data. For the HEMS to use a particular chosen strategy, that strategy has to be uploaded onto the HEMS controller, and the strategies could be computed only once every several months. Afterward, only simple arithmetic and if-then-else rules with a low number of total operations are required for the controller execution of the chosen strategy. This is in contrast with the rolling time horizon strategies that usually perform expensive optimization every day or even every few hours.
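To illustrate how little computation a precomputed tree-based strategy needs on the controller, consider the following minimal sketch. The class names, feature names, thresholds and actions are all hypothetical and do not reproduce the trees in Figs. 12–14 or the authors' simulator [32].

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    action: str              # action to execute in the current interval

@dataclass
class Split:
    feature: str             # name of the measured feature to test
    threshold: float
    if_below: "Node"         # branch taken when feature < threshold
    otherwise: "Node"

Node = Union[Leaf, Split]

def decide(node, features):
    """Walk the tree with one comparison per level and return the action."""
    while isinstance(node, Split):
        if features[node.feature] < node.threshold:
            node = node.if_below
        else:
            node = node.otherwise
    return node.action

# Hypothetical precomputed strategy uploaded to the controller.
strategy = Split("price_eur_kwh", 0.04,
                 Leaf("charge_battery"),
                 Split("battery_soc", 0.2,
                       Leaf("use_grid"),
                       Leaf("use_battery")))

action = decide(strategy, {"price_eur_kwh": 0.06, "battery_soc": 0.5})
```

The number of comparisons per decision is bounded by the tree depth, which in the experiments is limited to 10 by the static limit in Table 2, so no optimization has to run online.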
True multi-objective optimization enables the user to choose the trade-off solution strategy following post-hoc analysis of all the provided trade-offs. The schematic of the decision making is presented in Fig. 3, while the actual trade-offs can be observed in Fig. 11. Some optimization approaches require the user to choose weights in advance, even before the optimization is executed. Choosing the weights, however, can be non-intuitive for a user, since the weights mix different types of objectives, e.g., green and cost objectives in our case. Performing a true multi-objective optimization enables displaying a front of the non-dominated, i.e., the best, solutions on a graphical user interface.
Since the presented approach only manages the energy flows and no appliance is directly controlled, there is no need for the user to change his or her behavior regarding the usage of home appliances. This is in contrast with some other approaches in the related work, where some appliances can be operational only inside the specified time window, in order to achieve the optimal performance.

Conclusions
The optimization of HEMS is an important issue in sustainable smart home energy management. Our approach describes the optimization procedure that enables the users to choose the preferred trade-off between multiple criteria: costs, ecology, comfort, etc. The optimization solutions can be precomputed in advance and are represented in the form of decision trees, integrating higher-level strategic with lower-level operational decisions, e.g., sunny-afternoon and rainy-evening with an adaptation to the actual situation. This results in well-performing strategies that require low computational resources when deployed in HEMS.
Instead of schedules, our system automatically designs decision trees based on historical data, and optimizes them. Computationally, the decision trees are as demanding as schedules, but offer greater expressive power and advanced optimization possibilities. To demonstrate the improvements, the tree-based strategy optimization is compared to the timetable-based strategy optimization and predefined strategies, both those found in the literature and those used by currently available energy-management systems.
Fig. 11. Union of the final non-dominated solutions (left: training data, right: test data) of all the runs using NSGA-II for the multi-objective timetable-based strategy optimization, the proposed algorithm for the multi-objective tree-based strategy optimization, the near-optimum strategy optimization, and other predefined strategies.
Additionally, the near-optimum strategy optimization with NSGA-II was used as a reference to measure how closely the different approaches approximate the optimal performance. The tree-based strategies consistently outperformed the timetable-based strategies on the training and test data. The optimization of tree-based strategies yielded solutions with a hypervolume indicator value about 20% higher than that obtained from the timetable-based strategies. Three representative solutions of a random optimization run for the tree-based and timetable-based approaches were also compared. The tree-based solution with the best cost objective resulted in a 17% improvement in the cost objective when compared to the timetable-based solution. The median cost solutions of both approaches performed similarly. In the case of the best green solutions, the tree-based approach yielded a 10% improvement in the cost objective, while keeping the green objective the same, when compared to the timetable-based approach. The higher expressive power of the tree-based strategies compared with the timetable-based strategies was also proven theoretically.
The main advantages of the presented approach over the existing ones are the following:
1. The proposed tree-based solutions are computationally much less expensive than online approaches that require optimal strategy recomputation for each time period.
2. Compared to other precomputed strategy approaches demanding approximately the same online computing capabilities, the proposed tree-based strategies outperform the timetable-based and manually defined strategies by up to 17% in terms of the cost objective while keeping the green objective fixed.
3. The proposed approach is based on the true Pareto-based multi-objective optimization, where the user can pick the solution with the preferred trade-off after the trade-offs are clearly presented to him or her. This improves the user experience with respect to the approaches that use the weighted sum of the objectives, since the weights are hard to define in advance.
4. The proposed approach manages the energy flows to and from the battery and the grid, which means that there is no need for the user to adjust his or her habits of using the appliances as is the case in some of the energy-management systems.
5. The source code for running the energy-management systems simulation and optimization is provided.
The limitations of the presented approach are:
1. To fully adapt the strategies to a particular location, historical data on the load, electric energy prices and solar irradiation is required with a 30-min resolution. This data is often not available in traditional energy systems.
2. To offload the computationally intensive optimization step to an off-site location, an internet connection is required. While this is usually not a problem, since smart home energy-management systems are often already connected, it could present an obstacle in some cases.
Several research directions based on the presented work are possible. A similar approach can be tested for other strategies, such as neural networks; however, the decision trees can still be preferred when an explainable model is required.
Additionally, the feature generation could be automated. Currently, there is a need to specify the features by hand, which requires some domain knowledge. Feature transformation of the time-series data could be applied in order to increase the performance; however, at the cost of explainability.
The proposed tree-based strategy optimization method could be improved using cooperative bi-level optimization methods. The two identified optimization levels are the following: a decision-tree structure optimization on the upper level and a threshold-value optimization that corresponds to the given decision tree on the lower level. Both levels are cooperative: they both strive to optimize the same objectives. Currently, a random search using only one instance generation is used at the lower level, i.e., the inner nodes' test values for each new tree-based strategy are generated randomly and are not subject to any optimization.
Furthermore, the method of multi-objective strategy optimization could be applied to other domains besides energy management. Optimizing the energy-management system as presented in this paper can be categorized as a Multi-Objective Reinforcement Learning (MORL) problem with complex rewards (the rewards are not additive and are delayed). This class of problems has only recently been addressed [36]. The proposed method can be classified as a multi-policy approach for MORL problems [36] and is therefore applicable to several other domains. Since the implementation is available [32], the described problem can also be included in MORL benchmark problems. The simulator provides a complex, real-life MORL problem implementation, of which only a few are reported in the literature.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Fig. 17. Timetable-based strategy with best green criterion performance in the last generation of one random run.