Stochastic Nonlinear Model Predictive Control for a Switched Photovoltaic Battery System

Battery systems are gaining popularity among residential households. In this setup, the main source of profitability is currently an increase in photovoltaic (PV) self-sufficiency, which is highly dependent on the battery system efficiency. We present a control approach based on stochastic dynamic programming (SDP) suitable to increase the system efficiency. The optimization framework includes a switched system with standby losses, a nonlinear model of the converter losses, and a stochastic forecast model for household load and PV generation. We show in a simulation of a typical benchmark case that our approach reduces overall system losses and costs of operation. The applicability in a real-world scenario is then shown in a field test using a commercially available battery system.

renewable energy system. However, using the flexibility of a PV battery system in such ways is not relevant in practice for residential users as the regulations and pricing structure currently do not remunerate such behavior. Instead, the main source of profitability is to increase the self-consumption of PV generated electricity on site using a battery electric storage system (BESS).
In this setup, conversion losses in the power electronics and standby losses play a large role in the overall profitability of the PV battery systems [6]. Here, the conversion losses depend nonlinearly on the battery power and play a larger role at low conversion power. This offers a means to reduce conversion losses by avoiding this region of low efficiency and therefore increasing the profit of the user. To do so, a smart control strategy is needed that takes the nonlinear modeling of the losses into account. Such a control strategy is developed in this article. The new control strategy schedules battery usage in real time as well as the switching sequence between operational and sleep state. For the optimal control problems (OCPs) arising from this control scheme, the expected PV generation and household load on site are used as input data. As both time series are stochastic processes, forecasts are afflicted with sizable uncertainties, necessitating a stochastic modeling of PV and household load in the control scheme.
Model predictive control (MPC) is a widely used approach to control PV battery systems with nonlinear models. The resulting OCPs are solved using various methods including dynamic programming [7], [8], approximate dynamic programming [9], or analytically solving the KKT conditions [10]. In [5], an MPC formulation as a quadratic program is developed while [11] approximates the nonlinearities with piecewise affine functions leading to a mixed integer linear problem. Both transform the problems to be able to use efficient commercially available solvers.
Systems with switched discrete states are typically handled using mixed integer formulations. Such a system is presented in [12]. These approaches typically lead to high complexity and cannot be used to integrate uncertainty modeling efficiently.
Stochastic control has frequently been applied to control systems in a smart grid context [13], [14], [15], [16], [17]. Here, the complexity of the system modeling is reduced to linear approximations to allow for a consideration of forecast uncertainties. In previous publications [18], [19], we have combined a nonlinear efficiency modeling with a stochastic forecast, albeit without a discrete switched state of the system. Both mitigating forecast uncertainty and increasing the overall efficiency of battery usage have implications on the economic performance. This was briefly discussed in [18].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this brief, we integrate the nonlinear modeling of conversion losses, a discrete switchable state, and stochastic system dynamics in a control scheme using stochastic dynamic programming (SDP).
After this introductory section, this brief is organized as follows. In Section II, the battery system modeling is presented, followed by the description of the control scheme in Section III. The control scheme is evaluated using a simulation in Section IV. In Section V, a field test is performed to demonstrate the usability in practice. Section VI concludes this brief.

II. MODELING
We intend to control a PV battery system. For our system model, we use a discrete time formulation with a time step $\Delta t$ and horizon length $N$. The lower index $k$ of a time-dependent variable specifies the point in time over the horizon, $k \in \{0, \dots, N-1\}$. Our system consists of a PV generator, a household load, and a battery system. The system is connected to a public electricity grid. PV generation $p^{\mathrm{PV}}_k$ will primarily be used to cover the household load $p^{\mathrm{load}}_k$. We therefore define the residual load $R_k = p^{\mathrm{load}}_k - p^{\mathrm{PV}}_k$. If the PV generation exceeds the household load ($R_k < 0$), the excess may be stored in the battery storage to cover the load at a later time. The ac power of the battery system $p^{b}(u^{b}_k, s_k)$ depends on the controllable converted power $u^{b}_k$ and the operation state $s_k$ (cf. Sections II-A and II-B, respectively). Positive and negative values for $p^{b}_k$ denote charging and discharging, respectively.
Furthermore, excess generation can also be fed into the public grid and is remunerated with a feed-in tariff $c^{f}$. We denote the grid power with $p^{g}$, where negative values denote feed-in and positive values denote grid supply with costs $c^{s}$. To calculate $p^{g}$, we use the conservation of energy, $p^{g}_k = R_k + p^{b}(u^{b}_k, s_k)$. We differentiate between grid supply and feed-in using the positive and the negative part of $p^{g}_k$, respectively. Hence, we arrive at an expression for the stage cost of the system at time point $k$: the grid supply is charged with $c^{s}$ and the feed-in is credited with $c^{f}$.
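The sign conventions above can be illustrated with a minimal sketch. It assumes that the stage cost is the supply cost minus the feed-in remuneration over one time step; the function names and the tariff values are hypothetical and not taken from the source.

```python
def residual_load(p_load_kw, p_pv_kw):
    """Residual load R_k: household load minus PV generation, in kW."""
    return p_load_kw - p_pv_kw


def grid_power(r_kw, p_b_kw):
    """Grid power from the power balance: positive = supply, negative = feed-in."""
    return r_kw + p_b_kw


def stage_cost(p_g_kw, dt_h, c_supply, c_feed_in):
    """Cost of one time step: pay for supply, get credited for feed-in."""
    supply = max(p_g_kw, 0.0)
    feed_in = max(-p_g_kw, 0.0)
    return dt_h * (c_supply * supply - c_feed_in * feed_in)


# Hypothetical tariffs in EUR/kWh (assumed for illustration only).
C_SUPPLY, C_FEED_IN = 0.30, 0.10

r = residual_load(p_load_kw=0.5, p_pv_kw=2.5)  # PV exceeds load, so r < 0
p_g = grid_power(r, p_b_kw=1.5)                # battery charges with 1.5 kW
cost = stage_cost(p_g, dt_h=0.25, c_supply=C_SUPPLY, c_feed_in=C_FEED_IN)
print(r, p_g, cost)
```

In this example the remaining excess is fed into the grid, so the stage cost is negative, i.e., a small revenue.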

A. Battery and Converter Model
The BESS consists of a converter system and a battery storage, for which we consider standby and conversion losses. The variable $p^{b}(u^{b}_k, s_k)$ refers to the power at the point of ac coupling to the household bus. It comprises the converted power $u^{b}_k$ including the conversion losses and the standby losses of the converter system $l^{\mathrm{sb}}(s_k)$. Standby losses depend on the current operation state $s_k$, which is explained in Section II-B.
The converter system is composed of an ac-dc inverter and a dc-dc converter, resulting in losses $l^{c}(u^{b}_k)$ that depend on the battery power. With that, the state of charge of the battery system $x_k$ evolves according to the effective charging power, scaled by the time step $\Delta t$ and the battery capacity $C^{b}$. The effective charging power includes a dynamic loss parameter accounting for losses in the battery cells.
The converter losses $l^{c}(u^{b}_k)$ can be described by a quadratic polynomial [20]. A lab measurement of the power-dependent losses is performed to determine the model coefficients $p_a$, $u_a$, and $r_a$. The results of this measurement and the fit model are shown in Fig. 1. An upper and lower limit of the ac power is given by the nominal power of the converter system $p^{\mathrm{nom}}$.
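To make the effect of a quadratic loss model concrete, the following sketch uses a generic polynomial with made-up coefficients. Both the polynomial form and the numbers are assumptions for illustration; they are not the fitted model of Fig. 1, and the additional loss parameter of the battery cells is omitted.

```python
def converter_loss(u_b_kw, a0=0.02, a1=0.02, a2=0.01):
    """Hypothetical quadratic loss model in kW (coefficients are assumptions)."""
    if u_b_kw == 0.0:
        return 0.0  # no conversion, no conversion loss
    p = abs(u_b_kw)
    return a0 + a1 * p + a2 * p * p


def relative_loss(u_b_kw):
    """Loss per unit of converted power; large at low power."""
    return converter_loss(u_b_kw) / abs(u_b_kw)


def soc_step(x, u_b_kw, dt_h, capacity_kwh):
    """State-of-charge update, linear in the time step.

    Assumption: u_b is the ac-side power, so the cells receive u_b minus the
    converter loss when charging and supply |u_b| plus the loss when discharging.
    """
    p_eff = u_b_kw - converter_loss(u_b_kw)
    return x + dt_h * p_eff / capacity_kwh


print(relative_loss(0.1), relative_loss(2.0))
print(soc_step(0.5, 2.0, dt_h=0.25, capacity_kwh=4.7))
```

With these assumed coefficients the relative loss at 0.1 kW is several times larger than at 2 kW, which is the low-efficiency region the control scheme tries to avoid.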

B. Switching and Resulting Efficiency
On sunny afternoons, a fully charged battery system is often not used as the PV generation still exceeds the load. The same occurs after the battery is discharged completely during the night. Then, standby losses can accumulate to a significant share of the losses. For our system, a sleep mode is implemented in which these standby losses are reduced significantly. Switching between operating and sleep state is modeled by introducing the discrete state $s_k \in \{0, \dots, S-1\}$ indicating if the system at time $k$ is in the sleep state ($s_k = 0$), in the working state ($s_k = S - 1$), or in a transition between the two states. The number of modeled states $S$ corresponds to the switching time, which spans $S - 1$ time steps. Furthermore, the switching control $u^{s}_k$ defines the transition function of the operation state. If the converter is not in an operational state (i.e., $s_k < S - 1$), no conversion is possible and $u^{b}_k = 0$. The standby losses in both the operational and the deep sleep state, normalized to the nominal inverter power, can be measured at the real system.
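The switched operation state can be sketched as a small state machine. The transition rule below is an assumption (one step towards the working state per time step when switched on, immediate return to sleep when switched off), and the two standby powers are the approximate values quoted in the discussion of Fig. 2, used here only as placeholders.

```python
L_ON_KW = 0.03     # standby power when switched on (approximate, from Section IV-D)
L_SLEEP_KW = 0.01  # standby power in the sleep state (approximate)


def next_state(s, u_switch, n_states):
    """Transition of the operation state s in {0, ..., n_states - 1}.

    Assumption: u_switch = 1 requests the working state and advances the
    wake-up by one state per time step; u_switch = 0 returns to sleep at once.
    """
    if u_switch == 1:
        return min(s + 1, n_states - 1)
    return 0


def conversion_allowed(s, n_states):
    """Power can only be converted in the working state s = n_states - 1."""
    return s == n_states - 1


def standby_loss(s):
    """Assumption: every state other than sleep draws the ON standby power."""
    return L_SLEEP_KW if s == 0 else L_ON_KW


def simulate(switch_requests, n_states, s0=0):
    """Return the sequence of operation states for a list of switch requests."""
    states, s = [], s0
    for u_switch in switch_requests:
        s = next_state(s, u_switch, n_states)
        states.append(s)
    return states


trajectory = simulate([1, 1, 1, 0, 1], n_states=3)
print(trajectory)
```

With three states, two time steps pass before conversion becomes possible, which corresponds to a switching time of two time steps.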

C. Stochastic Residual Load Model
The stage costs in (2) depend on the residual generation $R_k$, which is not known in advance. We estimate it from an external forecast $\hat{R}_k$ using a Markov chain model developed in [18]. Therein, the parameter $\tau \in [0, 1]$ holds information on how much the current state is valued as a short-term forecast over the external information. The parameter $\sigma > 0$ denotes the uncertainty of the external model and can be determined from probabilistic forecasts as described in [19]. The white noise $\varepsilon_k$ models the stochastic behavior of the process.
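The roles of $\tau$ and $\sigma$ can be illustrated with a simple simulation. The recursion used here, in which the next value equals the next forecast plus a $\tau$-weighted share of the current forecast deviation plus scaled white noise, is an assumed form chosen for illustration and is not necessarily the model of [18].

```python
import random


def simulate_residual(forecast, r0, tau, sigma, seed=0):
    """Sample one path of the residual load around an external forecast.

    Assumed recursion (illustrative only):
        R[k+1] = forecast[k+1] + tau * (R[k] - forecast[k]) + sigma * eps[k]
    with eps[k] drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    path = [r0]
    for k in range(len(forecast) - 1):
        deviation = path[k] - forecast[k]
        eps = rng.gauss(0.0, 1.0)
        path.append(forecast[k + 1] + tau * deviation + sigma * eps)
    return path


forecast = [1.0, 2.0, 3.0]
print(simulate_residual(forecast, r0=1.5, tau=0.0, sigma=0.0))  # follows the forecast
print(simulate_residual(forecast, r0=1.5, tau=1.0, sigma=0.0))  # deviation persists
print(simulate_residual(forecast, r0=1.5, tau=0.5, sigma=0.2))  # noisy path
```

Under this assumed form, setting $\sigma = 0$ and $\tau = 0$ makes the model collapse to the external forecast after the first step, which is the deterministic special case.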

III. OPTIMIZATION
Optimal operation of the system with respect to the costs of operation is achieved using MPC, a paradigm widely applied to various systems. Details on MPC may be found in [21].
In classic MPC, the system is controlled by solving an OCP at each sampling time and using the first entry of the resulting control trajectory as control input.
Here, we deviate from this procedure. Using dynamic programming, an OCP is solved at a lower frequency resulting in optimal policies. From these, the optimal control is obtained at each sampling time using updated state measurements.
We formulate the OCP used to obtain the policies in the following. Values for the controller parameters will be presented in Section IV-A.

A. Stochastic Optimal Control Problem
With the definitions in Section II, the system can be summarized in the state $y_k = (x_k, s_k, R_k)$ consisting of the state of charge $x_k$, the switching state $s_k$, and the residual generation $R_k$. Here, the symbol $\mathbb{R}$ denotes the real numbers. This state evolves as defined in (6), (12), and (14). The transition is a Markov chain depending on the previous state, the realization of the uncertainty $\varepsilon_k$, and the controls $u_k$. Similar to $y_k$, the switching and battery control variables $u^{s}_k$ and $u^{b}_k$, respectively, are summarized to $u_k = (u^{s}_k, u^{b}_k)$. At each time step, a control is determined from a policy $\mu_k \in \mathcal{M}$. This set $\mathcal{M}$ is defined as the set of functions that assign a feasible control to each point in the feasible set.
Consequently, the state sequence $\mathbf{y}$ can be determined given the feedback laws $\boldsymbol{\mu}$ and the noise $\boldsymbol{\varepsilon}$. In the following, we use boldface symbols without an index to indicate the complete time series of a variable over the horizon. For $k = 0$, we start with an initial value $\hat{y}$ obtained from measurement at the time of solving the OCP, define $y_0 = \hat{y}$, and recursively obtain the values for the following time steps by applying the feedback law $\mu_k$ and the state transition. Finally, the optimization problem (22) minimizes the expected sum of the stage costs and the terminal costs over the feedback laws, where we define the terminal costs based on the value of the stored energy at the end of the horizon. Solving Problem (22) yields a feedback law for each time step.

B. Discretization of State and Control Space
In Problem (22), the policy $\mu$ can be of any form, leading to an infinite-dimensional optimization problem. Together with the stochastic state transition, this makes the problem challenging. We tackle it using SDP, extending the algorithm described in [18] to accommodate the efficiency measurements and the switching from Sections II-A and II-B. For further details on dynamic programming, the reader is referred to textbooks such as [22].
To use SDP, we discretize the state and control space, starting with $n_x$ discrete states for the state of charge that are the same for the complete horizon and form the set $\mathcal{X}$. We use the calligraphic setting to indicate that $\mathcal{X}$ is a discrete set. Similarly, the continuous feasible set for the battery power is discretized, leading to the discretized feasible set of the battery power $u^{b}$ in (26). In (26), we have also ensured that the constraints on $x_k$ are always met by only allowing charging and discharging power values that lead to a feasible state in the next time step. Note that no battery activity ($u^{b}_k = 0$) is always part of the feasible set. This guarantees that Problem (22) is always feasible.
For the residual generation, we use a discrete state space $\mathcal{R}_k$ centered around the external forecast $\hat{R}_k$, where a width parameter controls the width of the domain of the policy. With these definitions, we can define the policy as a lookup table defined on every point of the discrete state space.
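The construction of a state-dependent feasible control set can be sketched as follows. The sketch assumes a lossless state-of-charge update and a state of charge normalized to the interval from 0 to 1; the grid values are hypothetical.

```python
def feasible_powers(x, u_grid_kw, dt_h, capacity_kwh, tol=1e-12):
    """Keep only battery powers whose next state of charge stays within [0, 1].

    Assumption: lossless update x_next = x + dt * u / capacity.
    """
    feasible = []
    for u in u_grid_kw:
        x_next = x + dt_h * u / capacity_kwh
        if -tol <= x_next <= 1.0 + tol:
            feasible.append(u)
    return feasible


U_GRID = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical discretized battery powers in kW

print(feasible_powers(0.9, U_GRID, dt_h=0.25, capacity_kwh=4.7))  # strong charging removed
print(feasible_powers(0.0, U_GRID, dt_h=0.25, capacity_kwh=4.7))  # discharging removed
```

Because zero power leaves the state of charge unchanged, it passes the filter for every feasible state, so the returned set is never empty.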
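A strongly simplified backward recursion shows how such a lookup-table policy can be computed. This is a sketch under several assumptions that differ from the scheme in the text: the battery is lossless, the switching state is left out, controls are restricted to moves between state-of-charge grid points, and the forecast deviation follows a user-supplied transition matrix. All names and numbers are hypothetical.

```python
def solve_sdp(forecast, x_grid, devs, trans, cap_kwh, dt_h,
              c_supply, c_feed_in, p_nom_kw, c_terminal):
    """Backward dynamic programming on a discrete (state of charge, deviation) grid.

    Returns policy[k][i][j] (battery power in kW) and the cost-to-go at k = 0.
    """
    n_dev = len(devs)
    # Terminal cost: stored energy is credited with c_terminal per kWh.
    value = [[-c_terminal * cap_kwh * x for _ in devs] for x in x_grid]
    policy = []
    for k in reversed(range(len(forecast))):
        new_value = [[0.0] * n_dev for _ in x_grid]
        table = [[0.0] * n_dev for _ in x_grid]
        for i, x in enumerate(x_grid):
            for j, dev in enumerate(devs):
                r = forecast[k] + dev  # residual load at this grid point
                best = None
                for i_next, x_next in enumerate(x_grid):
                    u = (x_next - x) * cap_kwh / dt_h
                    if abs(u) > p_nom_kw + 1e-9:
                        continue  # exceeds the nominal converter power
                    p_g = r + u
                    cost = dt_h * (c_supply * max(p_g, 0.0)
                                   - c_feed_in * max(-p_g, 0.0))
                    # Expected cost-to-go over the next deviation state.
                    cost += sum(trans[j][j2] * value[i_next][j2]
                                for j2 in range(n_dev))
                    if best is None or cost < best[0]:
                        best = (cost, u)
                new_value[i][j], table[i][j] = best
        value = new_value
        policy.insert(0, table)
    return policy, value


# Two-step toy problem: excess PV first, then a load that the battery can cover.
policy, value = solve_sdp(
    forecast=[-2.0, 2.0], x_grid=[0.0, 0.5, 1.0], devs=[0.0], trans=[[1.0]],
    cap_kwh=2.0, dt_h=1.0, c_supply=0.30, c_feed_in=0.10,
    p_nom_kw=2.0, c_terminal=0.10)
print(policy[0][0][0], policy[1][2][0], value[0][0])
```

In the toy problem the recursion charges the empty battery with the full excess in the first step and discharges it completely in the second, since buying from the grid costs more than feed-in earns. Staying at the same grid point is always allowed, so a feasible control exists at every state.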

C. Approximation Scheme
Typically, external forecasts for residual generation are available with a time resolution of 15 min. With (10), this entails a modeled switching time of at least 15 min. In practice, switching times are in the range of 1 or 2 min. Hence, we interpolate the forecast to a time step of 1 min to model the switching time correctly.
However, computation time scales linearly with the horizon length, and SDP is computationally demanding in general. We therefore developed an approximation scheme using a time step of $\Delta t = 15$ min and $S = 2$.
In it, we define approximated wake-up costs, where an assumed transition time shorter than the time step defines $r^{\mathrm{wu}}$ implicitly. Switching OFF is modeled as instantaneous, with correspondingly approximated costs. Together, these define the approximated stage costs. As (6) is linear in $\Delta t$, a similar modification leads to an altered transition for switching ON. With these modifications, the switching time is modeled in a framework with a coarser forecast time step. In contrast to the approach in Section II-B, switching between states during a 15 min interval is not modeled in the optimization. Therefore, optimization results may be suboptimal. However, modeling a switching time shorter than $\Delta t$ presumably models switching more accurately than the bare algorithm in Section II-B with a time step of 15 min.

D. Control Law in Operation
In operation, a new set of policies with a horizon of 12 h is calculated after every update time of $t^{\mathrm{pol}}_{\mathrm{update}} = 6$ h. Optimization is performed using an updated forecast for the residual generation.
Control feedback is determined by evaluating the policies at each sampling time with the state measurements $\hat{x}_k$ and $\hat{R}_k$ and the current operation state $\hat{s}_k$. The policy is interpolated between the four closest points in the $\mathcal{X} \times \mathcal{R}_k$ grid for state $\hat{s}_k$. If a measurement of the residual generation $\hat{R}_k$ lies outside the grid, the policy is evaluated at the closest grid point.
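The evaluation of a stored policy between grid points can be sketched with a standard bilinear interpolation. The table layout (`table[i][j]` for state-of-charge index `i` and residual-generation index `j`, one table per operation state) and all names are assumptions; measurements outside the grid are clamped to the closest grid point as described above.

```python
from bisect import bisect_right


def bracket(grid, value):
    """Return the lower grid index and the interpolation weight in [0, 1].

    The grid must be strictly increasing with at least two points. Values
    outside the grid are clamped to the nearest end point.
    """
    value = min(max(value, grid[0]), grid[-1])
    i = min(max(bisect_right(grid, value) - 1, 0), len(grid) - 2)
    weight = (value - grid[i]) / (grid[i + 1] - grid[i])
    return i, weight


def evaluate_policy(table, x_grid, r_grid, x_meas, r_meas):
    """Bilinear interpolation between the four closest grid points."""
    i, wx = bracket(x_grid, x_meas)
    j, wr = bracket(r_grid, r_meas)
    return ((1 - wx) * (1 - wr) * table[i][j]
            + wx * (1 - wr) * table[i + 1][j]
            + (1 - wx) * wr * table[i][j + 1]
            + wx * wr * table[i + 1][j + 1])


X_GRID = [0.0, 1.0]
R_GRID = [-1.0, 1.0]
TABLE = [[0.0, 2.0],
         [4.0, 6.0]]  # hypothetical policy values in kW

print(evaluate_policy(TABLE, X_GRID, R_GRID, 0.5, 0.0))  # centre of the cell
print(evaluate_policy(TABLE, X_GRID, R_GRID, 0.0, 5.0))  # residual outside the grid
```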
With a sampling time of 1 min, the policy is possibly evaluated with a higher frequency than the optimization time step. Then, switching can occur with higher frequency even though it was not modeled in the optimization.
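The interplay of the three time scales (1 min sampling, 15 min policy time step, 6 h policy update with a 12 h horizon) can be sketched as a simple loop. The loop only tracks which policy step would be evaluated at each sample; the function name is hypothetical and the actual optimization call is left out.

```python
def run_schedule(total_min=1440, update_min=360, dt_min=15, horizon_min=720):
    """Count policy updates and track the policy step used at each sample."""
    n_steps = horizon_min // dt_min   # number of feedback laws per policy set
    updates = 0
    used_steps = set()
    since_update = update_min         # forces an optimization at the first sample
    for _ in range(total_min):
        if since_update >= update_min:
            updates += 1              # a new set of policies would be computed here
            since_update = 0
        k = since_update // dt_min    # index of the feedback law for this sample
        used_steps.add(k)
        since_update += 1
    return updates, max(used_steps), n_steps


print(run_schedule())
```

With the values from the text, only the first half of each policy set is ever evaluated before it is replaced, so the second half of the horizon serves as look-ahead.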

IV. SIMULATION STUDY
To evaluate the control scheme for the switched inverter using a precise model of the efficiency curve, a simulation study is performed with a yearlong dataset.

A. Simulation Setup
For the simulation we use a PV generation time series measured in Freiburg in 2012. For the generation forecast, we use a forecast developed at Fraunhofer Institute for Solar Energy Systems using data of the ECMWF-IFS ensemble forecast [23].
The PV profile was scaled to a nominal power of 4.7 kW. For the household load, we use a measured profile of a four-person household with a total load summed over the year of 4662 kWh. A load forecast is generated using a KNN approach that has shown a high forecast accuracy on other household data [24].
We simulated a battery with a capacity of 4.7 kWh and the model of the converter system shown in Fig. 1 with a maximum ac power of 2.35 kW. Our simulation data had a time resolution of 1 min. Sizing of PV and battery is selected in accordance with the 1 MWh/1 kWp/1 kWh rule presented in [25].
Using a smaller dataset of 12 days of the year, we performed a benchmark study to determine the optimal choice of the model and algorithm parameters, which were then used for the yearlong simulation. The value for $\sigma$ was obtained from the forecast uncertainty of the PV forecast and previous errors of the load forecast using a method presented in [19]. Subsequently, the width parameter of the residual-generation grid was determined corresponding to the expected standard deviation of model (14).

B. Performance Criteria
To analyze the performance of the proposed algorithm for the control of a PV battery system, the following performance indicators are defined.
1) The amount of converter usage can be measured by the summed Converted Energy. A high amount of energy stored in the battery leads to high savings, as the PV energy used to cover the household load is increased. This is the main driver of profitability when supply prices are substantially higher than feed-in remuneration.

2) The Conversion Efficiency $\bar{\eta}^{c}$ measures whether the converter system is used in an efficient power range.

3) The Standby Losses denote the overall energy lost in the converter system due to standby losses. These losses decrease if the converter system is switched to the sleep state more often.

4) The Total Losses in the battery system sum up the overall energy lost due to conversion and standby.

5) A hypothetical Electricity Bill is obtained using the distinction between feed-in and supply in (3).
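Several of these indicators can be computed directly from simulated power time series. The sketch below assumes that the converted energy is the summed absolute battery power times the time step and that the bill applies the supply and feed-in tariffs to the positive and negative parts of the grid power. The exact definitions in the original equations may differ, the conversion efficiency is left out because its formula is not recoverable here, and all names and numbers are hypothetical.

```python
def indicators(u_b_kw, loss_conv_kw, loss_sb_kw, p_g_kw, dt_h, c_supply, c_feed_in):
    """Summed performance indicators from per-step power series (all in kW)."""
    converted = sum(abs(u) for u in u_b_kw) * dt_h
    standby = sum(loss_sb_kw) * dt_h
    total = standby + sum(loss_conv_kw) * dt_h
    bill = sum(c_supply * max(p, 0.0) - c_feed_in * max(-p, 0.0)
               for p in p_g_kw) * dt_h
    return {"converted_kwh": converted, "standby_kwh": standby,
            "total_loss_kwh": total, "bill": bill}


result = indicators(
    u_b_kw=[1.0, -2.0, 0.0],          # charge, discharge, idle
    loss_conv_kw=[0.05, 0.10, 0.0],   # hypothetical conversion losses
    loss_sb_kw=[0.03, 0.03, 0.01],    # on, on, sleep
    p_g_kw=[-1.0, 0.5, 0.0],          # feed-in, supply, balanced
    dt_h=0.25, c_supply=0.30, c_feed_in=0.10)
print(result)
```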

C. Compared Algorithms and Cases
Commercially available PV-battery systems maximize the self-sufficiency of the system. The battery power set value $u^{*}$ is determined by a control law that is equivalent to charging the battery with excess PV power and covering the household load from the battery when it exceeds the PV generation. If the battery state of charge forbids this, the battery power is changed accordingly. The battery is switched to the sleep state if either $R_k > 0$ and $x_k = 0$ (the load exceeds the generation and the battery is empty) or $R_k < 0$ and $x_k = 1$ (the generation exceeds the load and the battery is full). This control scheme is referenced as the standard approach.
Furthermore, we have implemented a deterministic MPC approach (Det. DP). A receding horizon of $N = 720$ and $\Delta t = 1$ min is used with the detailed optimization approach in Section II-B with $S = 3$, $\sigma = 0$, and $\tau = 0$. The same method yields the ideal operation when assuming perfect foresight.
Three types of the developed SDP algorithm are compared: the high-resolution scheme, the scheme with a large time step $\Delta t$ without approximation, and the approximation scheme. With the approximation scheme of Section III-C, the average calculation time per optimization routine could be reduced to 8.9 s. Deterministic DP had a computational load of 9.0 s per optimization routine, while the standard control has negligible computational load.
As a further reference, we have also included simulation results without a battery system installed.
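The power set point of the standard approach can be sketched as a greedy rule: compensate the residual load as far as the converter rating and the state of charge allow. The sketch assumes a lossless battery and the residual load defined as load minus PV generation; the sleep-state rule is not included, and the function name is hypothetical.

```python
def standard_setpoint(r_kw, x, dt_h, capacity_kwh, p_nom_kw):
    """Greedy self-consumption set point in kW (positive = charging).

    Assumption: lossless state-of-charge update, x normalized to [0, 1].
    """
    u = -r_kw                                  # cancel the residual load
    u = max(-p_nom_kw, min(p_nom_kw, u))       # respect the converter rating
    u_max = (1.0 - x) * capacity_kwh / dt_h    # do not overcharge
    u_min = -x * capacity_kwh / dt_h           # do not discharge below empty
    return max(u_min, min(u_max, u))


DT_H = 1.0 / 60.0  # 1 min sampling

print(standard_setpoint(-1.0, 0.5, DT_H, capacity_kwh=4.7, p_nom_kw=2.35))  # charge 1 kW
print(standard_setpoint(-5.0, 0.5, DT_H, capacity_kwh=4.7, p_nom_kw=2.35))  # rating limits
print(standard_setpoint(1.0, 0.0, DT_H, capacity_kwh=4.7, p_nom_kw=2.35))   # empty battery
```

This rule uses no forecast, which is why its computational load is negligible compared with the optimization-based schemes.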

D. Simulation Results
For the setup described in Section IV-A, a simulation was performed using each of the algorithms listed in Section IV-C. From the resulting power time series, the key performance indicators explained above were calculated and are reported in Table I. In the simulation study, both the high-resolution SDP scheme and the approximation scheme performed better than the Standard approach, which in turn outperformed Large $\Delta t$ and deterministic modeling. A sizable difference remains in the costs of ideal operation compared with all other methods. Fig. 2 shows the mechanism leading to the cost reduction. It shows the time a specific loss power occurs over the year. A higher loss power is entailed by a higher converter power (typically leading to lower relative losses). Therefore, ideal operation shows the steepest decline from high losses (i.e., efficient operation) to standby losses of approximately 0.01 kW (i.e., no converted power, sleep state).
A second horizontal line above at 0.03 kW corresponds to the standby power $l^{\mathrm{ON}}$ exclusively, with no converted power. Compared with the standard approach, all other methods reduce inefficient operation at low converter powers. Although standby losses increase as the battery system is in the sleep state for a longer period, the overall losses decrease (see Table I). Operation using a large $\Delta t$ without the approximation scheme leads to the highest standby losses. The large time step renders it unattractive to switch to the sleep state. Hence, the battery is idling in the on state for extended periods of time.
Additionally, in the ideal case, and to a lower extent also when using deterministic DP, operation is increased at high powers, yielding low relative losses. However, achieving this increase relies on forecasting the peaks of load and PV generation reliably. Otherwise, battery capacity is falsely reserved to buffer peaks, and self-sufficiency is decreased, leading to higher costs. In fact, all non-standard methods decrease the self-sufficiency compared with the ideal operation due to this effect.
In general, all methods show lower overall battery usage as inefficient operation is prevented in favor of feeding into the public grid. The current regulatory environment incentivizes self-sufficiency even at the cost of increased efficiency losses. This results in slim margins of improvement compared to the standard heuristic control optimized for this static pricing scheme. However, with an increasing renewable generation, flexibility will be valued higher. Then, storage operation even in residential setups may face a more complex incentive structure. In such a setting, optimization-based and especially stochastic MPC approaches perform better than heuristics. This has been shown in previous publications [18], [19] for a setup with a feed-in limit. These results can be observed with the switched system as well. However, the case is not studied here for brevity.

V. FIELD TEST
In the simulation study reported in the previous section, it was shown that the control scheme based on SDP can be used to increase the charging and discharging efficiency. A field test was performed to explore the applicability of the proposed control scheme to a real-world setup. To this end, the approximated SDP scheme developed in this article was integrated into the control software of a commercially available battery system. The BESS used in the field test had a battery capacity of 5.9 kWh and the inverter system had a maximum ac power of 2.2 kW. The system was installed in a residential household that also had a 9 kWp PV generator. The field experiment was performed over a duration of 15 days and measured using the sensors of the BESS. These data were recorded with a sampling time of 5 s. Subsequently, a comparative lab test of the same system was performed using the integrated control. To this end, the residual generation profiles measured in the field test were provided to the system through power hardware-in-the-loop.
As discussed in Section III-D, the control scheme consists of the optimization algorithm determining optimal policies and the real-time control interpreting those policies. The control scheme of the commercial BESS minimizes the power at the grid access point p g at all times which is equivalent to the Standard control strategy.
In the field test, this strategy is replaced by the approximated SDP scheme. A policy $\mu_k : \mathcal{X} \times \mathcal{S} \times \mathcal{R}_k \to \mathcal{U}$ with a time step of $\Delta t = 15$ min and a horizon of 9 h was stored in the real-time management unit. Control feedback was applied every five seconds as in the Standard control. The optimal control $u^{*}$ was determined by bilinear interpolation of the policy $\mu$. With this value, a corresponding set point for the grid power was determined and used in the real-time management.
Every $t^{\mathrm{pol}}_{\mathrm{update}} = 6$ h, a PV forecast is obtained from an external service [26]. A load forecast is generated using the load profile of exactly one week prior.
With these forecasts, the optimization algorithm is triggered as a standalone executable via a JSON interface. No additional hardware was necessary for solving the optimization problems. Instead, the optimizer was compiled for and run on the existing BESS control hardware. The average calculation time for a policy of a 9 h horizon was 20 s. After calculation, the policy stored in the real-time management unit was updated, and the new policy was used for determination of the real-time control input.
This decoupling of optimization and real-time management facilitated the integration of the optimization-based approach into the heuristic control scheme of the existing product. Using the optimizer as a standalone executable led to decoupling of the optimizer from the BESS control software. Fallback controls are implemented for the case of failure of policy generation due to unforeseen reasons. Over the period of 15 days, the converter efficiency using the full SDP approach was $\bar{\eta}^{c,\mathrm{field}} = 71.18\%$. The standard control led to a converter efficiency of $\bar{\eta}^{c,\mathrm{lab}} = 71.21\%$. This is close to the roundtrip efficiency in the simulation study. However, due to the limited duration of the field test, this cannot be seen as a quantitative analysis comparable to Table I, but is restricted to a qualitative assessment of the battery usage characteristics.
From Section IV-D, it was expected that the proposed optimization algorithm would lead to charging at higher powers and hence an increased efficiency. However, some effects were observed that led to losses in efficiency. First, a slight misparameterization in the optimization algorithm used in the field test led to assuming a maximum conversion efficiency at powers of 1.5 kW. Therefore, the charging power was reduced to that value on sunny days, as shown in the profiles of the exemplary day in Fig. 3.
Second, the internal controller of the battery system used a measurement of the power at the grid access point to determine the optimal set point for the battery power from the policy. As the grid power itself is influenced by the battery power, this led to an oscillating behavior. The precise mechanism could not be identified, but the effect was not observed in the lab test using the Standard control.
In contrast to other studies, here switching to the sleep state and subsequent lower standby losses are modeled in the optimization. In the simulation study, it could be observed that charging was performed at powers with higher assumed efficiency to reduce the overall time of battery operation. When the battery was charged completely and PV generation exceeded the household load the BESS was switched to sleep until sunset. Analogously, when the energy needed for the household load during the night exceeded the energy stored in the battery, the BESS was only discharged with power of high efficiency. Unfortunately, this could not be observed in the field test. The cause for that can be observed in the late afternoon on the day shown in Fig. 3. The battery did not charge anymore, although a set point was given and the state of charge was not at 100%. To reach the highest self-sufficiency, the battery is charged with excess PV generation until a SoC of x = 100% is reached. At x = 100%, the system is switched to deep sleep state. In the real system, charging stopped before a full battery was reached preventing the switching command.
The field test was performed in the second half of October, leading to days with meager sunlight and hence longer durations without battery activity. In this situation, a large potential can be unlocked by enabling switching to the sleep state in a subsequent improvement of the real-time management.

VI. CONCLUSION
In this article, we presented a novel optimization scheme that integrates the switching behavior of a converter system into a stochastic MPC approach for a PV battery system. An optimization model was presented and evaluated in a simulation study. An approximation scheme for switching times shorter than model time steps was developed to reduce calculation times at low costs of suboptimality.
It was shown that considering standby and conversion losses in an optimization model can improve the battery efficiency. This leads to reduced electricity costs. Moreover, the MPC approach can be adapted to different economic environments easily.
A field test showed that the policy based stochastic MPC scheme could be integrated into the energy management software of a commercially available BESS with little overhead. Beyond that, the test indicated that the performance could be improved by more precisely modeling the behavior of the battery cells at maximum state of charge and preventing interactions of battery and grid power.