Optimal price-based control of heterogeneous thermostatically controlled loads under uncertainty using LSTM networks and genetic algorithms [version 1; peer review: awaiting peer review]

In this paper, we consider the problem of thermostatically controlled load (TCL) control through dynamic electricity prices, under partial observability of the environment and uncertainty of the control response. The problem is formulated as a Markov decision process where an agent must find a near-optimal pricing scheme using partial observations of the state and action. We propose a long short-term memory (LSTM) network to learn the individual behaviors of TCL units. We use the aggregated information to predict the response of the TCL cluster to a pricing policy. We use this prediction model in a genetic algorithm to find the best prices in terms of profit maximization in an energy arbitrage operation. The simulation results show that the proposed method offers a profit equal to 96% of the theoretically optimal solution.


Introduction
In a power network relying on distributed and renewable energy resources, the exploration of new sources of flexibility is a key factor for its stability. Given the intermittent nature of renewable energy resources, it is challenging to maintain the power balance under normal operating conditions in a grid with deep penetration of these resources. Therefore, more integration of renewable resources increases the need for ancillary services such as regulation reserve and load following requirements [1]. However, using traditional fossil fuel generators to provide these reserves will decrease the net carbon benefit from renewables, weaken generation efficiency and be economically untenable. Alternatively, demand-side resources can play a key role in supplying the regulation service needed for deep renewable integration with zero-emission operations. Demand-side resources such as thermostatically controlled loads (TCLs), electric vehicles and strategic storage can contribute to ancillary services by acting as a source of flexibility to the grid. Unlike traditional demand-side management programs, such as peak load shaving and emergency load management, the exploration of higher flexibility from the above-mentioned loads has great potential for offering more lucrative and faster ancillary services. The potential of these sources of flexibility is reflected in the energy market. Electricity prices fluctuate according to the availability of and demand for energy. This can open considerable opportunities for energy arbitrage [2].
A significant potential for provision of flexibility resides in TCLs such as air conditioners (ACs), heat pumps, water heaters, and refrigerators. TCLs represent a high percentage of the total electricity consumption [3,4]. The nature of TCLs permits them to act as thermal storage, which makes it possible to adjust their electricity consumption while maintaining the temperature requirements and the comfort level of the end user. The idea of TCL flexibility relies on the principle that the temperature constraints specified by the users can be fulfilled by different power trajectories. Finding the optimal trajectory that provides the required flexibility and a highly lucrative ancillary service is the subject of several studies [5][6][7]. However, this problem requires real-time information about the state of TCLs, their envelope temperature and their behavior in response to temperature dynamics. In most cases, this information is only partially available and requires qualitative or quantitative models to estimate it. It is also possible to use model-free approaches to solve the problem of uncertainty and find near-optimal power trajectories [2].
The optimal power trajectory for a cluster of TCLs is then translated to individual or aggregated control signals using a variety of control methods. Control methods can be categorized into intrusive forms, including direct and indirect control, and a non-intrusive form using price proxies. The direct intrusive form of control consists of directly controlling the on/off states of the TCLs; the indirect intrusive form consists of controlling the parameters of TCLs, such as the temperature set points and the switch cycles; and the non-intrusive form of control uses dynamic prices to steer the consumption of TCLs, relying on price-based demand response programs. The intrusive form requires an aggregator contracting with each TCL unit holder for taking control of their TCLs, with the condition that their temperature constraints will be respected throughout the control period. The non-intrusive approach relies on the end user's involvement and response to a given control signal in return for a certain incentive or special pricing. The users' response to these signals can also be an automatic response to electricity prices throughout the day using home energy management systems or embedded TCL controllers [8].
This work was supported by The Jenny and Antti Wihuri Foundation, Finland.
Intrusive control of TCLs has great potential for offering a wide range of flexibility and market opportunities for the aggregators. It offers a faster response to control signals and permits the design of a more reliable energy arbitrage strategy compared to non-intrusive control through price proxies. However, implementing the technological requirements for intrusive control on a large scale can be challenging due to its high financial requirements. Additionally, the question of whether consumers are ready to give up the control of their TCLs to an external party can also be a barrier to the implementation of these programs. According to [9], the integration of end users in demand response (DR) programs is a key factor for their success. Several smart grid projects were analyzed from this perspective, and the conclusions suggest that more attention should be given to the domestication of these technologies and their adaptation to the users' experience, considering social dimensions such as individual behavior, education, and income level [9,10,11]. It is therefore necessary to include all these factors in the design of a DR program. Non-intrusive control, on the other hand, has fewer constraints regarding the users' comfort and data privacy. It makes the end user feel included in the decision making of the grid and involved in the energy management. This discussion can serve as a benchmark when choosing the control strategy and implementing a large-scale DR program.
In our paper, we choose to implement a non-intrusive control using dynamic electricity prices. We first formulate the problem as a Markov decision process (MDP) [12], where the policy consists of a sequence of electricity prices. The agent is assumed to have no prior knowledge or data about the state of TCL units except their real-time power consumption. The idea is to use data-driven models that can learn the consumption patterns of each individual TCL unit and their response to temperatures and prices. We use a long short-term memory (LSTM) neural network architecture to learn individual TCL units' behaviors as in [13]. This method can overcome the problem of uncertainty and the diversity of power consumption preferences in response to varying prices. The aggregator uses these models to simulate the aggregate response of the TCLs to different pricing schemes during a certain control horizon. An optimization algorithm is then applied to find the best pricing strategy given an objective function. When controlling a cluster of TCLs, different objective functions are considered in the literature, such as tracking a balancing signal [7] or energy arbitrage [5]. In this work we adopt an energy arbitrage objective function, where we maximize the profit of an aggregator that buys electricity from the wholesale market and sells it in the retail market to end users with TCL units. A genetic algorithm is implemented to find the best pricing solution for the aggregate TCLs.

Related work and contributions
The literature contains extensive research concerning TCL control and their flexibility potential.

TCL control approaches
Most early studies, as well as current work, focus on direct intrusive control methods and frameworks. Early work that tackled aggregated modeling of TCLs can be found in [14] and [15]. The solution computation and controller design of these approaches are considerably difficult, which is a drawback. These issues were mitigated in more recent works [5,7,16] using a different class of linear population-bin transition models based on Markov chains. Other approaches have proposed time-varying battery models with dissipation, such as [17], or without dissipation, as in [18]. These approaches were used to compute near-optimal control trajectories with a reduced computational cost. Although optimal pricing for demand side management has been thoroughly studied [19][20][21], the price-based control of TCLs remains only briefly addressed in the literature. In [22], the operating reserve capacity of aggregated heterogeneous TCLs was evaluated using a TCL model that takes consumer behavior into consideration. The price-based approach was also addressed from the consumer perspective in [23]. The objective of the proposed method was mainly to find the optimal set point change in response to electricity prices in order to minimize the increases in the electricity bill due to dynamic pricing. The power gain from this control scheme was then used for load following supply. Another approach was proposed to find the equilibrium between the electricity prices and the users' comfort. Using a Stackelberg game approach, the authors in [24] presented a unique Stackelberg equilibrium that maximizes the utility function and minimizes the dissatisfaction cost of TCL users. A similar approach was proposed in [25] and [26] using a mean-field game approach to find the best pricing scheme, considering TCLs as price-responsive rational agents.

Deep learning-based models for TCL control with partial observability
Deep learning and other machine learning methods are widely applied in DR programs [27]. The implementation of a TCL cluster control program faces the problem of uncertainty and heterogeneity of the TCL units' behaviors in response to control prices. Consequently, many researchers have been interested in machine learning models that can learn the aggregate or individual behavior of TCL units under partial observability. A model-free reinforcement learning approach for TCL control was proposed early on in [28] and gives results similar to those of model predictive approaches. Reinforcement learning approaches were also used in [29] to control domestic water buffers according to local photovoltaic production for the maximization of self-consumption. More recently, the success of deep reinforcement learning approaches has inspired more researchers to tackle the problem of direct TCL control using deep reinforcement learning. The authors in [30-33] used different deep neural architectures for automatic estimation of the features of the TCLs' state in a batch reinforcement learning model. The same authors later provided a comparison of the different architectures in [33,34]. The LSTM architecture outperformed the other deep neural network architectures. These works focused only on deep Q-learning, which is based on the estimation of a quality function for every potential action before performing the optimization. In [35], a deep policy gradient method was explored along with deep Q-learning for on-line energy optimization of buildings.

Contributions
Following the above-mentioned literature and the success of LSTM networks in mitigating the problem of partial state information and solving the long-term dependency problem [13,33,34], we propose a two-step pricing optimization method for the exploration of TCL flexibility in energy arbitrage. This paper addresses the need for new non-intrusive TCL control methods via electricity price proxies, so far lacking in the scientific literature. The proposed method relies on LSTM networks learning individual TCL unit behavior and the prediction of individual responses to electricity prices. The individual predictions are aggregated to form an overall prediction model. This model is used in a genetic algorithm (GA)-based optimization algorithm to maximize a retailer's profit considering grid and energy cost constraints. To the best of the authors' knowledge, this is the first work that uses LSTM networks in a non-intrusive TCL control problem based on electricity prices within a DR program. The main contributions of this paper are the following:
• An MDP formulation of the price control problem where the policy is the set of electricity prices during a control horizon.
• An LSTM network for learning the individual behavior of TCL units in response to electricity prices and temperatures.
• An aggregation of individual TCL units' behaviors, in response to prices, to derive a global estimation of the potential response of the TCL unit cluster.
• A genetic algorithm that uses the aggregated information from the LSTM networks to optimize the profit from an energy arbitrage operation.

Problem formulation
We consider a cluster of residential households powered by electricity from the same retailer or utility company. The households are equipped with smart meters and TCLs that can react to electricity prices and indoor temperatures. The retailer implements a price-based DR program that announces electricity prices for a certain time horizon in such a way that maximizes an objective function. The optimization is based on estimated information about the responsiveness to electricity prices and temperatures. Before discussing the pricing optimization approach, we formulate the problem as an MDP [12]. An MDP is defined by its state space $X$, its action space $U$, and its transition function $f$, which defines the dynamics between the current state $x_t \in X$ and the next state $x_{t+1}$ under a control action $u_t \in U$ and subject to a random process $w_t \in W$ with a probability distribution $p_w(\cdot, x_t)$. The transition equation is defined as follows:

$$x_{t+1} = f(x_t, u_t, w_t)$$

The objective of this process is to find a policy $h: X \to U$ that minimizes a cost function or maximizes a reward function throughout the control horizon starting from a state $x_1$, denoted by:

$$R^h(x_1) = \sum_{t=1}^{H} \rho(x_t, h_t)$$

where $\rho$ is the reward or the cost of each time step $t$ given an action $h_t$. Unlike the classic Q-iteration methods, the policy is characterized directly by the sum of rewards during a time horizon $H$. The optimization is performed on the set of actions during the time horizon $H$, and the fitness function is the cost function $R^h$ of the policy $h$. For each policy $h$, a corresponding sequence of states is estimated implicitly by the forecasting model.

State and control action description
The agent is only able to measure a partial observation of the true state, i.e., it has no information about the indoor temperatures, resulting in a partially observable Markov decision problem. The observable state space $X$ consists of two variables, the outside temperature and the electric load:

$$x_t = (T_t, L_t)$$

Since the observable state space only includes part of the true state, it is not possible to directly model future state transitions. However, following the results of [13], the next-step electric load $L_{t+1}$ can be predicted using the outdoor temperature $T_t$, the electric load $L_t$ and the electricity price $P_{t+1}$. The state is extended with sequences of past observations of states and actions, which results in a non-Markovian state.
For each TCL $n$, the electric load is approximated by:

$$L_{n,t+1} \approx g_n(T_t, L_{n,t}, P_{t+1})$$

We assume that forecasts of the outside temperature are available for every future timestep in the control horizon.
The control action $u_t$ consists of the electricity price that the retailer announces for each time step of the control horizon.
As mentioned earlier, even though the retailer is not controlling the TCLs directly, we assume that the TCLs react directly to electricity prices. Therefore, the electricity price controls the state by influencing the amount of energy consumed during a timestep $t$. The next state is then defined by:

$$x_{t+1} = f(x_t, u_t, w_t) = \big(T_{t+1},\; g(T_t, L_t, P_{t+1})\big)$$

Objective function
According to the existing literature, the control of TCL clusters can be performed considering different objective functions. For instance, the objective can be tracking a balancing signal or energy arbitrage. In this work we consider an energy arbitrage problem where a retailer is trying to maximize their profit. However, the framework and methods presented here can equally be applied to different objective functions. We consider the profit as the difference between the revenue and the cost function. We assume that the cost function $C_t(L_t)$ is convex and increasing in $L_t$ for each timestep, as formulated in [36]:

$$C_t(L_t) = q L_t^2 + p_t L_t + c$$

where $q > 0$ is a constant, $p_t > 0$ is the electricity price in the wholesale market and $c > 0$ is a fixed cost.
In order to avoid overload during peak times, we introduce a maximum load capacity of the power network, denoted $L_{t,\max}$, at each timestep. Therefore, we have the following constraint:

$$L_t \le L_{t,\max}$$

The revenue is the bill that customers would pay for using the energy during the time window $H$:

$$\sum_{t=1}^{H} P_t L_t$$

Usually, there exists a total revenue cap, denoted $R_{\max}$, for the retailer. We therefore add a revenue constraint to improve the acceptability of the retailer's pricing strategies; without such a constraint, the retail prices would keep rising to a level that violates energy regulations and is financially unacceptable to the customers. As a result, we have the following constraint:

$$\sum_{t=1}^{H} P_t L_t \le R_{\max}$$
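The profit and the two constraints of this section can be illustrated with a minimal sketch. It assumes a quadratic cost in the load with coefficient q, wholesale price p_t and fixed cost c, which is one convex increasing form compatible with the description above; the function names `cost` and `evaluate_policy` and all numerical values are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def cost(load: float, wholesale_price: float, q: float, c: float) -> float:
    """Assumed convex cost of supplying `load` in one timestep (quadratic form)."""
    return q * load ** 2 + wholesale_price * load + c


def evaluate_policy(
    prices: Sequence[float],
    loads: Sequence[float],
    wholesale_prices: Sequence[float],
    q: float,
    c: float,
    load_caps: Sequence[float],
    revenue_cap: float,
) -> Tuple[float, bool]:
    """Return (profit, feasible) for one pricing policy over the horizon."""
    # Revenue: the bill paid by customers over the horizon.
    revenue = sum(p * l for p, l in zip(prices, loads))
    # Total cost of buying the energy on the wholesale market.
    total_cost = sum(cost(l, w, q, c) for l, w in zip(loads, wholesale_prices))
    # Feasibility: per-timestep load capacity and total revenue cap.
    within_capacity = all(l <= cap for l, cap in zip(loads, load_caps))
    feasible = within_capacity and revenue <= revenue_cap
    return revenue - total_cost, feasible


profit, feasible = evaluate_policy(
    prices=[2.0, 3.0], loads=[1.0, 2.0], wholesale_prices=[1.0, 1.0],
    q=0.5, c=0.1, load_caps=[5.0, 5.0], revenue_cap=10.0,
)
```

In an optimizer, the returned profit would serve as the fitness of a pricing policy and the feasibility flag would feed the constraint-handling rule.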

Methods and implementation
Given the partial observability of this problem, the methods proposed in this paper are nondeterministic. An LSTM network is used to estimate the next states given an initial state and a pricing policy. The method consists of learning the individual behavior of each TCL agent n using an LSTM method as illustrated in [13]. The N estimation models will predict the reaction L n,t+1 of each TCL to a state x and a pricing action P t .
The overall estimated load L t is the sum of all the load predictions as in (7). Given this estimation model, we apply a genetic algorithm to find the best pricing policy.

LSTM networks for state estimation
LSTM networks are recurrent neural networks that consist of memory blocks. These memory blocks replace the summation units in the hidden layers of a standard recurrent neural network. The input vector and the hidden state vector are passed through the forget gate to determine the keeping rate of the cell state components. The same vectors are passed through the input gate to determine how much of the new state candidate C can pass to the new cell state. Finally, the output gate decides how much of the transformed cell state vector can be passed to the next hidden state vector h t . Following [13], the proposed LSTM network consists of multiple layers of LSTM cells followed by a fully connected layer as illustrated in Figure 1.
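The gate mechanics described above can be sketched as a single LSTM cell step in plain Python. This is a generic textbook formulation, not the trained network of the paper; the parameter layout (a dictionary with the hypothetical keys `forget`, `input`, `candidate` and `output`) is an assumption made for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, inputs):
    """Compute W @ inputs + b for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def lstm_cell_step(x, h_prev, c_prev, params):
    """One LSTM step.

    params maps each gate name to (weights, bias); every weight row has
    length len(x) + len(h_prev) because input and hidden state are concatenated.
    """
    z = list(x) + list(h_prev)
    forget = [sigmoid(v) for v in affine(*params["forget"], z)]      # keeping rate
    write = [sigmoid(v) for v in affine(*params["input"], z)]        # input gate
    cand = [math.tanh(v) for v in affine(*params["candidate"], z)]   # new state candidate
    out = [sigmoid(v) for v in affine(*params["output"], z)]         # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c = [f * cp + i * g for f, cp, i, g in zip(forget, c_prev, write, cand)]
    # New hidden state: gated, squashed cell state.
    h = [o * math.tanh(ck) for o, ck in zip(out, c)]
    return h, c


# Toy usage: 3 inputs (load, temperature, price), 2 hidden units, all-zero parameters.
zero_gate = ([[0.0] * 5 for _ in range(2)], [0.0, 0.0])
params = {name: zero_gate for name in ("forget", "input", "candidate", "output")}
h, c = lstm_cell_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every gate equals 0.5 and the candidate is 0, so the cell state is simply halved, which makes the role of the forget gate easy to check by hand.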
In the case of our model, the input $I_{n,t}$ is a 2 x 3 matrix that consists of the electric loads, the temperatures and the electricity prices as follows:

$$I_{n,t} = \begin{pmatrix} L_{n,t-1} & T_{t-1} & P_t \\ L_{n,t} & T_t & P_{t+1} \end{pmatrix}$$

The LSTM network recurrently uses the historical information of loads, temperatures and prices to predict the electric load of an individual TCL n in the next timestep. The aggregation of these predictions gives an approximation of the function g mentioned in the previous section.
Initially, for each TCL agent n ∈ N, we train an LSTM network based on the historical reactions of these TCLs to prices and temperatures. We assume that a DR program has been implemented for a period long enough to collect a sufficient amount of data on the reactions of TCL agents to prices and temperatures.

Genetic algorithms for price optimization
Due to the discontinuous nature of the objective function and the complicated dependency between the electric load L and the electricity prices P, conventional nonlinear optimization methods are not usable for this problem. GA-based optimization algorithms are therefore better suited to it [37]. The proposed GA uses rank selection and value encoding [38]. Each chromosome represents a pricing policy P and consists of a vector of size H. We use uniform crossover [39] and non-uniform mutation [40]. The constraints are handled by the approach proposed in [41].
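The operators named above can be sketched as follows. This is an illustrative value-encoded GA on a toy fitness function, not the implementation used in the experiments: the mutation follows the common Michalewicz-style non-uniform rule, elitism is added as an assumption, constraint handling is omitted, and all names and default parameters (`run_ga`, `rate`, `shape`, population size) are hypothetical.

```python
import random


def rank_select(population, fitnesses, rng):
    """Rank selection: worst individual has weight 1, best has weight len(population)."""
    order = sorted(range(len(population)), key=lambda k: fitnesses[k])
    weights = [0] * len(population)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = rank
    return rng.choices(population, weights=weights, k=1)[0]


def uniform_crossover(a, b, rng):
    """Each gene is copied from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def non_uniform_mutation(chrom, gen, max_gen, low, high, rng, rate=0.1, shape=2.0):
    """Perturb genes by an amount that shrinks as gen approaches max_gen."""
    out = []
    for gene in chrom:
        if rng.random() < rate:
            shrink = 1.0 - rng.random() ** ((1.0 - gen / max_gen) ** shape)
            if rng.random() < 0.5:
                gene += (high - gene) * shrink
            else:
                gene -= (gene - low) * shrink
        out.append(min(high, max(low, gene)))  # guard against rounding drift
    return out


def run_ga(fitness, horizon, low, high, pop_size=30, generations=60, seed=0):
    """Maximize `fitness` over price vectors of length `horizon` in [low, high]."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(horizon)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for gen in range(generations):
        fits = [fitness(p) for p in pop]
        children = [best[:]]  # elitism (assumption): carry the best policy over unchanged
        while len(children) < pop_size:
            child = uniform_crossover(rank_select(pop, fits, rng),
                                      rank_select(pop, fits, rng), rng)
            children.append(non_uniform_mutation(child, gen, generations, low, high, rng))
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


def toy_fitness(prices):
    """Toy objective standing in for the profit: peak at a price of 3.0 in every hour."""
    return -sum((p - 3.0) ** 2 for p in prices)


best, history = run_ga(toy_fitness, horizon=6, low=0.0, high=10.0)
```

In the pricing problem, `toy_fitness` would be replaced by the profit predicted from the aggregated LSTM load forecasts, and infeasible policies would be penalized or ranked below feasible ones.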
The proposed GA-based optimization algorithms for TCL pricing control are given in Algorithm 1 and Algorithm 2.
Algorithm 1. GA-based optimization algorithm for TCL pricing control.

1: Population initialization, i.e., randomly generating a population of PN chromosomes; each chromosome denotes a pricing policy for the next time horizon H.
2: for i = 1 to PN do
3: Concatenate the price vector to the temperature forecasts of the next time horizon.
4: for each TCL agent n in N do
12: Announce the best price vector via the two-way communication infrastructure at the beginning of the control horizon.

Algorithm 2. Individual TCL load prediction using LSTM network.

1: Build the initial input matrix I n,0 using the initial values of prices, loads and temperatures.
2: for t = 0 to H do
3: Use the input matrix I n,t to predict L n,t+1

In Algorithm 1, we initialize a population of PN pricing policies at step 1. For each policy P we perform steps 2-6 to evaluate the fitness function and the feasibility of the policy. The evaluation of policies is performed using the LSTM sequence prediction presented in Algorithm 2. The best policies are selected, and a new generation is created using crossover and mutation operations in step 10. This process is repeated until a stopping condition or the maximum number of iterations is reached. At the end of the optimization process, the best pricing policy is selected, and prices are announced to TCL agents via two-way communications technology. After each control episode, the LSTM learning models are updated according to the new data collected from the actual response to the implemented electricity prices.
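The recursive prediction of Algorithm 2 and the aggregation over the cluster can be sketched as below. The trained LSTM of each TCL is replaced here by a hypothetical stand-in callable that maps (temperature, current load, next price) to the next load, and the input is simplified to the most recent observation rather than the full input matrix; both are assumptions made only to keep the example self-contained.

```python
from typing import Callable, List, Sequence

# Hypothetical predictor signature: (temperature_t, load_t, price_next) -> load_next
Predictor = Callable[[float, float, float], float]


def rollout_load(predict: Predictor, initial_load: float,
                 temperatures: Sequence[float], prices: Sequence[float]) -> List[float]:
    """Predict the load over the horizon, feeding each prediction back as input."""
    loads = []
    load = initial_load
    for temp, price in zip(temperatures, prices):
        load = predict(temp, load, price)
        loads.append(load)
    return loads


def aggregate_load(predictors: Sequence[Predictor], initial_loads: Sequence[float],
                   temperatures: Sequence[float], prices: Sequence[float]) -> List[float]:
    """Sum the individual rollouts to estimate the cluster load at each timestep."""
    per_unit = [rollout_load(p, l0, temperatures, prices)
                for p, l0 in zip(predictors, initial_loads)]
    return [sum(step) for step in zip(*per_unit)]


def toy_predictor(temp: float, load: float, price: float) -> float:
    """Stand-in for a trained model: load decays and falls with price."""
    return 0.5 * load + 1.0 - 0.25 * price


cluster_load = aggregate_load(
    [toy_predictor, toy_predictor], initial_loads=[2.0, 4.0],
    temperatures=[0.0, 0.0], prices=[4.0, 0.0],
)
```

Because each prediction is fed back as the next input, errors accumulate over the horizon, which is consistent with the use of a short control horizon.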

Results
In this section we evaluate the functionality of the proposed pricing control methods. A set of numerical experiments was performed on a simulation scenario comprising a population of 30 TCLs exposed to dynamic electricity prices during a period where the outdoor temperatures change significantly. The thermal inertia of each TCL allows the electric demand to be shifted towards lower price moments. The TCL agents determine the amount of electricity to be consumed at each timestep according to the indoor temperature and the electricity prices. The objective of TCL agents is to maintain a reasonable comfort level while minimizing the electricity bill. Therefore, the different TCL agents have different reactions to a given set of prices and temperatures, depending on individual users' preferences and buildings' characteristics. We define a control timestep of 1 hour and a control horizon of 6 hours. The choice of the control horizon is justified by the limited ability of LSTM to predict long sequences of future electric loads. The control horizon is chosen in a way that minimizes the number of times the retailer runs the control algorithms and announces the prices, while keeping a good accuracy of the LSTM predictions.

Simulation data
Following [13], the simulation data is generated using two fuzzy logic systems with the following assumptions:
• The TCL agents are reacting to indoor temperatures and electricity prices.
• The difference between the outdoor and indoor temperature ΔT depends on the building characteristics and the amount of energy spent in heating/cooling in previous timesteps.
TCL agents are operating during the day to maintain a comfortable temperature of the space while taking into consideration the electricity price in a given hour. Fuzzy logic is used in this problem because it can model qualitative concepts like "hot temperature" or "low price". The combination of the two fuzzy logic systems delivers the load L n,t+1 using the outdoor temperature T t and the electricity price P t+1 . The simulation is performed with different parameters to generate diverse data for 30 TCL agents. The temperature and price data used for the simulation are taken respectively from the Kaisaniemi observation station in Helsinki, available online in [42], and Elspot DA electricity prices in Finland [43] for the period between 1 January 2017 and 7 September 2018. The generated dataset consists of 14,734 data points for each TCL agent.
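How fuzzy logic turns notions such as "cold" or "low price" into a load value can be illustrated with a small sketch. It is not the simulator used to generate the dataset: the membership ranges, the three rules, the output levels in kW and the direct use of outdoor temperature are all hypothetical values chosen for illustration.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def heating_load(outdoor_temp_c, price):
    """Weighted-average (Sugeno-style) inference over three illustrative rules."""
    cold = triangular(outdoor_temp_c, -30.0, -10.0, 10.0)
    mild = triangular(outdoor_temp_c, 0.0, 15.0, 30.0)
    low_price = triangular(price, 0.0, 20.0, 40.0)
    high_price = triangular(price, 20.0, 60.0, 100.0)
    rules = [
        (min(cold, low_price), 3.0),   # cold and cheap: heat a lot
        (min(cold, high_price), 2.0),  # cold but expensive: heat less
        (mild, 0.5),                   # mild weather: little heating
    ]
    total_weight = sum(weight for weight, _ in rules)
    if total_weight == 0.0:
        return 0.0  # no rule fires
    return sum(weight * output for weight, output in rules) / total_weight


load_cold_cheap = heating_load(-10.0, 20.0)
load_cold_expensive = heating_load(-10.0, 60.0)
```

Varying the membership ranges and rule outputs per agent is one way such a simulator could produce heterogeneous responses across a cluster.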

LSTM networks results
The data generated from the above-mentioned simulations is used to train the LSTM networks to learn the behavior of each individual TCL agent. The hyperparameters and structure of the LSTM networks are chosen according to the results of [13] and summarized in Table 1.
The results are evaluated using validation data generated from the same simulations. Figure 2a illustrates the learning results for three TCL agents during different time periods with different temperatures and prices. Figure 2b illustrates the comparison between the real and predicted average power consumption of the 30 TCL agent cluster. The power curves show that the TCL agents' responses to prices and temperatures are slightly different. In general, the power consumption is high when the temperatures and electricity prices are low and vice versa. The comparison between the true load curves and the predicted load curves shows a very small prediction error per hour in most cases. The true and predicted load curves have similar shapes. The peaks and valleys are also predicted accurately in most cases, which gives valuable insight for demand side management.

GA Optimization results
We run the GA optimization algorithm on a population of size 100 for 100 iterations. The parameters used for the optimization are summarized in Table 2. The optimization process is graphically presented in Figure 3. The learning process is measured by the fitness of the best individual in the population at each iteration. Figure 4 illustrates the results of the best pricing solutions for one day. Figure 4a is an illustration of the electricity price fluctuations during the 24 hours. Figure 4b shows a comparison between the power consumption of the whole cluster under original prices and the power consumption under optimized prices. Figure 4c presents the revenue and profit that the retailer would make under original and optimized prices. Figure 4d presents the daily bill of each user of the cluster under original and optimized prices.
The results show a general increase in prices throughout the day. However, this increase did not result in an increase in the daily electricity bills. Most customers will be paying a slightly lower amount per day. This is a consequence of the upper limit constraint on the revenue described in (12). The overall consumption of electricity decreased compared with the original pricing scheme, which indicates the potential energy saving that an optimal pricing strategy can offer.

Comparison with a theoretical benchmark
In order to validate the performance of the proposed algorithm, we consider a case where we have full access to the TCL units' behavior, i.e. the exact electricity consumption of each TCL unit given temperatures and prices at each timestep. The optimization is performed with direct access to the simulation model described above, which provides full observability and perfect information about the TCLs. This theoretical setup can serve as a benchmark for our method. It can be seen as an upper limit on the profit that the aggregator can make without violating the constraints.
The results illustrated in Figure 5a-d show that the proposed methods performed very similarly to the benchmark. The hourly prices in Figure 5a are only slightly shifted from the benchmark prices during most of the day. The difference is only significant at 2 to 3 points. The same observation can be made for the revenues and profits in Figure 5b and the electricity consumption in Figure 5c. The comparison of daily bills under optimized prices and benchmark prices in Figure 5d shows a slight rise in the electricity bill in the benchmark model for most customers. This can be explained by the slight increase in prices illustrated in Figure 5a.
The daily revenues and profits under original, optimized and benchmark prices are compared in Figure 6. The comparison shows a closely similar revenue in the three cases. The optimized prices gave a slightly smaller revenue than the original and benchmark prices. However, the profit from original prices is considerably smaller than the profit from optimized prices. The latter is only slightly smaller than the benchmark's profit. Numerically, the profit from the proposed methods is 95.97% of the optimal benchmark profit. This observation shows that an increase in the profit can be made without an increase in the revenue when the prices are optimized correctly.

Discussion and conclusion
In this paper, we demonstrated the effectiveness of a new TCL control method using electricity price proxies. The control policy consists of a sequence of prices influencing the electricity consumption of TCLs. The problem was formulated as a Markov decision process with a non-Markovian state to handle the sparse observations of the TCL cluster's state. We extend the observable state with sequences of past observations to approximate the transition function using an LSTM architecture. The LSTM network is used to capture the individual behavior of TCLs under price-based DR. The individual models are aggregated to approximate the next state of the cluster. This approximation is used iteratively in a genetic algorithm to evaluate the potential profit from an energy arbitrage operation and find the optimal pricing policy for a given control horizon. The LSTM models are updated every 24 hours to capture the changes in the TCL units' behavior.
The experiment consists of a retailer agent buying electricity from the wholesale market and selling it to a group of residential TCLs. The agent can only measure the electricity consumption of each TCL and the outside temperature. The agent has access to a significant amount of historical data from an already implemented DR program, which allows it to train the LSTM models for each TCL unit and perform an optimization on the electricity prices.
We first evaluate the performance of the LSTM network by comparing the real and predicted loads from 30 TCL units during different days. The predicted load profiles are closely similar to the real load profiles at both the individual and the aggregate level.
The optimization relies on a genetic algorithm with a profit maximization objective. The results of the optimization show that the proposed methods offer a much higher daily profit than the original prices and 95.97% of the optimal profit from a model that has full observation of the state.
The flexibility offered by TCLs has high potential for the ancillary services required for a deep integration of renewable energy sources in the grid. An energy arbitrage operation can offer a service to the grid by exploiting this flexibility using direct or indirect control. The partially observable state and the uncertainty of the TCL response to prices were tackled in this paper with an LSTM network using past observations and actions. The LSTM network offered a high performance by extracting relevant features of the hidden state using its internal memory cell, allowing it to process sequences of sparse observations to learn the hidden patterns of power consumption.
This project contains the following underlying data:
• Data used by the fuzzy logic simulation model such as temp_prices and temperatures.
• Data generated by the fuzzy simulator such as fuzzy_outxx.csv and used to train the LSTM models.
• Data related to the optimization process such as results and GA_pricing, optimized_prices_loads.
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Figure 1. LSTM network for TCL load prediction. The model uses the information about temperatures, loads and prices in the previous timesteps to predict the load L(t). Since this is a regression problem, the fully connected layer uses a linear activation function.

Figure 2. LSTM learning results. (a) Power consumption of different TCL agents in response to electricity prices and outdoor temperatures. (b) Average real and predicted power consumption of the cluster surrounded by an envelope containing 9% of the power consumption profiles for different days.

Figure 3. Learning process of a population of size 100.

Figure 4. Comparison of results under the original and optimized pricing policies. (a) Optimized prices solution for 24 hours. (b) Revenue and profit under original and optimized prices for 24 hours. (c) Total electricity consumption under original and optimized prices. (d) Daily electricity bills under original and optimized prices.

Figure 5. Comparison of results under the optimized and benchmark pricing policies. (a) Comparison between benchmark and optimized prices. (b) Hourly revenues and profits under optimized prices and benchmark prices. (c) Hourly total electricity consumption under optimized prices and benchmark prices. (d) Daily electricity bills under optimized and benchmark prices.

Figure 6. Daily revenues and profits under original, optimized and benchmark prices.