Integrated day-ahead and intraday self-schedule bidding for energy storage systems using approximate dynamic programming

Most modern energy markets trade electricity in advance for technical reasons. Thus, market participants must commit to delivering or consuming a certain amount of energy before the actual delivery. In Germany, two markets with daily auctions coexist. In the day-ahead auction market, the energy is traded in 60-minute time slots, which are further partitioned into 15-minute time slots for the intraday auction market. Because of the slow ramp-ups of nuclear and fossil power plants, these price-makers trade mostly in the day-ahead market. Only the residual energy is traded in the intraday market, where the market prices fluctuate substantially more. These fluctuations as well as the expected price difference between these markets can be exploited by fast ramping energy storage systems. We address the decision problem of an owner of an energy storage who trades on both markets, taking ramping times into account. Because the state variable of our dynamic programming formulation includes all features of our high-dimensional electricity price forecast, this problem cannot be solved to optimality. Instead, we use approximate dynamic programming. In a numerical study based on real-world data, we benchmark the algorithm against an adapted state-of-the-art approach from literature and an expectation model with a receding horizon. Furthermore, we investigate the influence of the price forecast on expected profit and demonstrate that it is essential for the dynamic program to capture the high dimensionality of the price forecast to compete with the expectation model, which does not suffer from the curses of dimensionality.


Introduction
The energy sector is a large producer of CO 2 and therefore contributes significantly to climate change. Therefore, many countries try to reduce the CO 2 emissions by encouraging power producers to increase the share of renewable energies. This is encouraged by subsidies for produced green energy or by CO 2 prices which have to be paid by conventional CO 2 producing power plants. As the penetration of renewable energy resources increases, an increasing share of the energy production is not controllable. However, the energy production has to always match the fluctuating energy demand. In liberalized energy markets, electricity prices reflect the balance of production and demand. Accordingly, energy storage systems which buy energy at low prices and sell it later at higher prices help to match production and demand, and thus improve grid stability. In most energy markets, market participants must commit to delivering or consuming a certain amount of energy before the actual delivery. In addition, in most countries multiple energy markets coexist.
The market rules including the pricing and the temporal sequence differ between countries.
In this paper, we focus on profit maximizing self-schedule bidding of a fast ramping energy storage in the German day-ahead and intraday auction markets. We formulate the decision problem as a dynamic program and solve it heuristically using backwards approximate dynamic programming.
Our approach is largely independent of the storage technology, but we focus on a pumped hydro storage (PHS). In Section 1.1 we introduce the focused decision problem. The assumptions needed are stated in Section 1.2. Section 1.3 describes our contribution to the literature and outlines the remainder of the paper.

Problem statement
We focus on profit maximizing self-schedule bidding for a PHS in the German day-ahead and intraday auction market. In the following, we describe the considered key characteristics of the PHS and the two markets.
The main components of a PHS are the lower and upper water reservoir, the pump, and the turbine. While storing energy, the pump uses electricity to pump water from the lower reservoir to the upper reservoir. This converts the electric energy into potential energy. While discharging the PHS, the water flows from the upper reservoir through the turbine into the lower one. Meanwhile, the turbine converts the potential energy into electric energy. In this paper, we consider the following properties of the PHS: capacity, working power range, efficiency, start-up cost, and ramping-time. The capacity is the maximum amount of energy which can be stored in the PHS.
Due to technical restrictions, the pump and the turbine cannot operate at arbitrary low power.
They are either switched off completely or operate in the working power range defined by the maximum and minimum power of the turbine and the pump. While the pump or the turbine converts the electric energy into potential energy or vice versa, some energy is lost for the PHS.
The share of energy which is not lost is the efficiency of the pump/turbine. When the pump or the turbine starts to work, they suffer from increased abrasion. The corresponding increase of the maintenance costs is captured by the start-up cost. The ramping-time describes how long the pump and the turbine need to change from one power level to a new, increased/decreased power level.
During ramping, the storage owner cannot meet the commitments on the energy markets exactly.
The difference is balanced by the transmission system operator (TSO) and the storage owner is paid/billed by the TSO for the under/overproduction. Based on the TSO contracts, intentional under/overproduction is forbidden.
In Germany, two auction markets with daily auction coexist; the day-ahead and the intraday auction market. The key characteristic of the German markets is that the intraday auction market differs in terms of the time length (15 min) of the traded products from the day-ahead market (60 min). In the day-ahead market, market participants can submit multiple price-volume combinations, called bids, for each traded product, until noon. The traded products are commitments to generate or consume a certain amount of energy for each hour of the following day. Forty minutes later, the market clearing prices (MCP) are published and the market participants must fulfill their accepted bids to the MCP. A buying bid is accepted if the price component of the bid is above the MCP. A selling bid is accepted if the price component is below the MCP. To remove the uncertainty regarding acceptance of bids, market participants can use price-accepting bids (also called self-schedule bidding). The latter term is derived from the fact that the production/consumption schedule of the market participant only depends on the self-chosen volumes and not on the market's outcome. The market participant's actual profit/cost, of course, still depends on the realized market prices. The German intraday auction market works similar to the day-ahead market, but with quarter-hourly products. After the intraday auction market closure at 15:00, the market results are published. The main purpose of this market is to capture load ramps with its finer temporal resolution (15 min instead of 1 h at the day-ahead markets). Primarily because of these ramps, the prices in the intraday auction market fluctuate around the day-ahead prices. This is visualized in Figure 1 for an exemplary week. The fluctuation can be exploited by fast ramping energy storage systems. For a detailed overview of the German energy markets see Viehmann (2017). While market participants consume energy they have to pay the obligatory grid fees.

Assumptions
In this section, we state the assumptions underlying this work in detail.

Assumption 1. The PHS only trades in the day-ahead and intraday auction market.
Even if it is desirable to consider all possible trading oppotunities (see Viehmann (2017) for an overview), we restrict the trading to the two auction markets. Assumption 1 reduces the complexity and computational burden of the decision problem. The restriction of the considered markets even down do a single market setting is common in the literature (e.g. Gönsch & Hassler (2016), Zhou et al. (2016), Salas & Powell (2018), Scott (2019), Durante et al. (2020), Ghavidel et al. (2020), Finnah & Gönsch (2021)). In Section 7 we discuss the limitations of our research regarding further trading opportunities (derivative markets, balancing markets, continuous intraday markets, overthe-counter markets) and how our research can be beneficial for market participants which trade in more markets.

Assumption 2. The PHS uses price-accepting bids/self-schedule bidding.
Assumption 2 further reduces the complexity of the decision problem. Self-schedule bidding is common for energy storage systems (e.g. Löhndorf & Minner (2010), Heredia et al. (2018), Khatami et al. (2020), Vahedipour-Dahraie et al. (2020), Xu et al. (2020)), as energy storage systems are profitable if the difference between the buying price and the selling price is large enough. The price difference cannot be reflected by price-volume bids. In Appendix A, we numerically investigate the benefits of price-volume bids in a simplified setting. In particular, we show that price-volume bids are not needed for energy storage systems with realistic correlation of the noise of the price process.
Price-volume bids can only outperform self-schedule bids with artificially reduced correlation. This may be different for hydropower plants with huge reservoirs (discharge time of days to weeks) and a natural water inflow. For these power plants to be profitable, it is essential to consume the resource at high price hours (e.g. Fleten & Kristoffersen (2008), Boomsma et al. (2014)).

Assumption 3. The PHS ramps linearly with identical rates for ramping up and down.
Identical ramping rates reduce the notation needed but do not influence the computational burden of the decision problem. However, linearity reduces complexity, as this allow us to model the daily decision problem as a mixed integer linear program (MILP).

Assumption 4. Balancing prices are linear in intraday market prices.
During ramping, the storage owner cannot meet the commitments on the day-ahead and intraday market exactly. The difference is balanced by the transmission system operator (TSO) and the storage owner is paid/billed by the TSO for the under/overproduction. Based on the TSO contracts, intentional under/overproduction is forbidden. To reflect this, the PHS must integrate the balancing prices into the decision problem. Assumption 4 is in line with the literature (e.g. Löhndorf & Minner (2010), Kim & Powell (2011), Moghaddam et al. (2013), Kuznetsova et al. (2015), Gönsch & Hassler (2016), Vahedipour-Dahraie et al. (2020), Finnah & Gönsch (2021)).
The modeling approach with balancing prices allows to make the deviation from the commitment financially unattractive. A special case is to set the balancing prices equal to the intraday market prices. This reduces the needed notation but does hardly decrease the computational burden.
Additionally, this contradicts the TSO contracts, as deviations may now be financially attractive and, thus, occur intentionally when following an optimal policy.
Assumption 5. The PHS pays grid fees for the stored energy quantity.
While the PHS stores energy the energy storage acts like a consumer and has to pay the obligatory grid fees. In Germany, there are several types of grid fees, based on the consumed energy (MWh) and the yearly peak demand (MW). For an overview of the different grid fees see Sterner & Stadler (2017, Ch. 15.4). Based on Assumption 5, we do not take the grid fees based on the peak demand into account. Since it is reasonable to assume that the PHS uses the pump's maximum power at least once a year, the peak demand is out of focus.

Assumption 6. The PHS owner's price forecasts depend only on the last week's market prices.
Assumption 6 restricts the storage owner's price forecasts, but this kind of price processes is well researched and accepted in the price forecasting literature (e.g., Ziel (2016), Marcjasz et al.
(2019), Narajewski & Ziel (2020b)). Furthermore, price processes which follow Assumption 6 are considerably more sophisticated than price processes usually handled by dynamic programs in the energy storage literature. Their inclusion is challenging as dynamic programs have to extend the state variable by all information which is necessary for the considered price processes.
Because of the curses of dimensionality, dynamic programs with large state spaces cannot be solved using standard discrete lookup tables in an appropriate time and can rarely be solved analytically. Therefore, dynamic programs have to use simplified price processes (one may argue that price processes are always approximations and, thus, no exact model/solution with regard to reality exists) and/or have to be solved approximately by approximate dynamic programming (ADP) algorithms. Both can lead to a poor solution quality, so it is crucial to find a reasonable trade-off between complexity of the price process and solution quality together with a suitable solution approach. Especially algorithms that approximate the value function can struggle with high dimensional forecasts (Powell, 2019). To keep the dynamic program simple, nearly all authors use price processes which only depend on a single electricity price (e.g., current, previous, or average of past prices) and state-independent features such as the day of the week (e.g., Gönsch & Hassler (2016), Zhou et al. (2016), Zéphyr & Anderson (2017), Salas & Powell (2018), Scott (2019)). To capture the characteristics of the electricity price like price spikes, some authors add jump-diffusion or regime-switching models (e.g. Jiang et al. (2014), Zhou et al. (2016), Salas & Powell (2018), Scott (2019), Durante et al. (2020)). For a detailed overview of energy price forecasting see Weron (2014).

Contribution and outline
We make the following contributions to the existing literature: (1) We model the energy storage problem with high-dimensional, competitive, Markovian price processes for sequential auction markets as a dynamic program and solve it using a backwards approximate dynamic programming algorithm.
(2) We compare this algorithm with the state-of-the-art approximate dual dynamic programming approach (Löhndorf et al. (2013), Löhndorf & Shapiro (2019), Wozabal & Rameseder (2020)) and an expectation model with a receding horizon. We demonstrate that dynamic programs have to handle a high dimensional state variable to capture the features of the underlying price processes to compete with a simple benchmark which does not suffer from the curses of dimensionality.
(3) In our numerical study, we focus on a pumped hydro storage in both German auction markets. We show that the integrated trading in both markets outperforms the sequential trading. For this we evaluate the trading strategies with real world data.
The remainder of the paper is organized as follows. Section 2 is the literature review. Section 3 models the storage owner's decision problem as a dynamic program. In Section 4, we state a simplified version of the base model and explain how we apply the solutions of this simplified model.
In other words, how we derive a policy for the base model from the simplified model's solution.
Section 5 describes the approximate dynamic programming algorithm to solve the aforementioned simplified problem. Section 6 presents the numerical study. Section 7 discusses limitations with a focus on the markets considered before Section 8 concludes the paper. The appendix contains further numerical investigations and summarizes the notation.

Literature review
The existing literature on the management of energy storage systems is already discussed in multiple literature reviews (e.g., Fathima & Palanisamy (2015), Rahman et al. (2015), Weitzel & Glock (2018)). Therefore, we focus the following review on the key topics that distinguish our paper from the existing literature: modeling/solution approaches, price processes used in optimization, and integrated trading in multiple markets.
The most commonly used modeling approaches for the energy storage management which capture the stochastics and dynamics of the problem are multi-stage stochastic programming (MSSP) and dynamic programming. To solve MSSPs, the model is usually replaced with the sample average approximation (SAA). Because the computational burden of MSSP increases exponentially in the optimization horizon, it is only suitable for short-term management (a few days). For longer horizons, the problem can be solved with a receding horizon. A main advantage is MSSPs' independence of the underlying price processes. By contrast, dynamic programs have to extend the state variable by all information which is necessary for the considered price processes. Therefore, competitive price processes are rarely used in dynamic programs. Löhndorf et al. (2013) formulate the decision problem of an owner of a hydro storage with stochastic inflow on the German day-ahead and continuous intraday market as a dynamic program and use approximate dual dynamic programming (ADDP) to solve it. The energy price process is based on a few fundamentals like mean temperature, wind and solar power generation, and the gas (2014) focus a profit maximizing power producer in the Nordic day-ahead market and a balancing market. The problem is formulated as a MSSP and takes price/volume bids into account. Based on the assumption that the scenarios in the SAA are the true distribution of the market prices, the solution is exact. In addition, the authors derive an upper bound for the gains from coordinated bidding in both markets. The price processes are modeled as a SARIMA process, but the noise is assumed to be independent and identically distributed. In fact, the forecast error of day-ahead market prices can be strongly correlated. Ding et al. (2015) use stochastic programming to (re-)optimize the management of a wind turbine combined with an energy storage on the Spanish dayahead, intraday, and real-time market. Here, they employ four different timescales down to one minute to capture the dynamics of the real-time market. Instead of simultaneously optimizing all considered markets, they use separate stochastic programs in a receding horizon manner. Moreover, the authors assume that the next market's price is deterministic. This dramatically decreases (2020) and uses a more complex price process. Again, no energy storage is modeled and each day is optimized independently. Since the modeled dynamic program has a simple structure, the optimal policy is derived analytically via backward recursion.
To the best of our knowledge, no paper addresses the stochastic storage problem in both German auction markets. The key characteristic is that the German intraday auction market differs in terms of the time length of the traded products from the German day-ahead market. Since the German intraday market is the only one with 15-minute products, models for other markets have to be reworked to be applicable. While this does not have be challenging (depending on the model), this may structurally change the numerical results. In addition, we are the first to extend the state variable of a dynamic program by all Markovian features of a high-dimensional price forecast (several hundred dimensions/ explanatory variables). We handle the full complexity of the state variable while solving the problem.

Dynamic program
In this section, we model the storage owner's decision problem for T days as a dynamic program.
More precisely, the exogenous information, the state variables, and the action variables are defined in Section 3.1 to Section 3.3. The contribution function is described in Section 3.4, while the transition function is defined in Section 3.5. These are used in Section 3.6 for the value function.
The notation used is summarized in Table D.5 in Appendix D.
The PHS owner trades in the two German auction markets: the day-ahead and the intraday market. In the German day-ahead auction market, each day t until noon, the PHS owner submits a self-schedule bid (volume) x DA t,h for each hour h of the following day (t + 1). Forty minutes later, the market clearing prices (MCPs) P DA t+1,h for each hour are published, and the volumes come into effect at the MCPs. Subsequently, until 15:00, the PHS owner makes an offer x ID t,h,q for each quarter hour q of each hour h of the following day (t + 1) in the intraday auction market. Again, the MCPs P ID t+1,h,q are published after market closure. This sequence of events is illustrated in Figure 2.  In the following, we formulate the problem as a dynamic program. Since each day t ∈ {0, · · · , T − 1} consists of two decision stages (day-ahead market and intraday market) we model the problem with alternating stages. The day-ahead market stages are labeled with the superscript DA and the intraday market stages with ID.
with day of the week dummies and correlation matrix Σ DA . The regression parameters are denoted β DA (·) . Thus, the exogenous information which becomes known right after the day-ahead market decision stage is W DA t+1 = P DA t+1 . We assume that the 96-dimensional intraday price vector P ID t+1 = P ID t+1,0,0 , · · · , P ID t+1,23,3 depends linearly on the day of the week, the intraday, and day-ahead prices of the last week, including the known day-ahead prices of the same day P DA t+1 .
with correlation matrix Σ ID and regression parameters β ID (·) . Thus, the exogenous information which becomes known right after the intraday market decision stage is W ID t+1 = P ID t+1 .

State variable
The state variables include everything the storage owner's decision depends on and all necessary information to evaluate the transition and contribution functions. In particular, it must include all prices of the last week as they are necessary for the price forecasts. Since the day of the week DoW (·) (·) can be derived based on the period/index t, it does not have to be part of the state even if it is needed for the price forecasts.
The state S DA t is the system right before the day-ahead market decision x DA t is made.
Obviously, the storage owner's decisions depend on the current storage level. R start t is the storage level at the start of day t. x start t is the current power in-or outflow. This is needed for the considered start-up cost and ramping. Altogether, the state S DA t has 842 continuous variables.
The state S ID t is the state of the system right before the intraday market decision x ID t is made and after the day-ahead market prices P DA t+1 become known.
The state S ID t extends the state S DA t by the commitments for the day-ahead market x DA t (which are not physically fulfilled) and the known day-ahead market prices P DA t+1 which are needed for the forecast of the intraday market prices P ID t+1 .

Action variable
The day-ahead market decision x DA t is the 24 commitments for this market's products.
A positive x DA t,h corresponds to buying energy and a negative one to selling energy. To reduce speculation on the price split between day-ahead and intraday market, we introduce the no-shortselling constraint (9). According to this constraint, the commitments must be in the working range of the energy storage. Likewise, the intraday market decision x ID t is the 96 commitments for this market's products. s.t.
All constraints are defined for all q ∈ {0, 1, 2, 3} and h ∈ {0, · · · , 23}. Additionally, (·) t,h,4 is de- Compared to the day-ahead market, additional constraints and auxiliary variables like ∆x turbine,up t,h,q are needed because costs like grid fees, TSO balancing or start-up that depend on net commitments in both markets are considered in the profit related to the intraday market (see Section 3.4).
Constraint (11) splits the summed trading volume on both markets into a positive (store energy, x pump t,h,q ) and a negative part (generate electricity, x turbine t,h,q ). If a component (pump or turbine) is working (y comp t,h,q = 1), constraint (12) limits the power to its feasible range; otherwise, it forces the power to zero. According to the working mode constraint (13), the pump and the turbine cannot work at the same time. Constraint (14) captures how much the components have to ramp up (∆x comp,up t,h,q ) or ramp down (∆x comp,down t,h,q ) to reach the new power level x comp t,h,q . The storage level equation is constraint (15). The new storage level is equal to the previous plus the incoming minus the outgoing energy taking the ramps and pump and turbine efficiency (η pump , η turbine ) into account. According to Assumption 3, the energy storage ramps-up/down linearly with constant ramping time. We denote the ramping time as ∆t ramp . While ramping to reach the new power level, ∆t ramp 2 ∆x (·) t,h,q less/more energy is consumed/produced. ∆t ID is the temporal resolution of the intraday auction market. Constraint (16) keeps the storage level non-negative and within the storage capacity R max . Constraint (17) defines the only binary variables, and constraint (18) ensures non-negativity for the remainder of variables except x ID t,h,q .

Contribution function
The contribution function C DA t is the storage owner's one-stage profit at the day-ahead market.
The one-stage profit at the day-ahead market is the sum of the volumes times the market clearing prices times the duration of the day-ahead market products ∆t DA . Since a positive x DA t,h corresponds to buying energy, the sum has a negative sign.
The storage owner's one-stage profit at the intraday market is denoted by The first row is the profits based on the commitments on the intraday market times the duration of the intraday market products ∆t ID and the grid fees. While the energy storage stores energy, it acts as a consumer and has to pay the obligatory grid fees c gf . The second row corresponds to the profits and costs due to the TSO's balancing. During ramping, the storage owner cannot meet the commitments exactly and the difference is balanced by the TSO. The deviation from the commitment is ∆t ramp t,h,q . The storage owner is paid/billed by the TSO based on the balancing prices Q over t+1,h,q and Q under t+1,h,q . According to Assumption 4, the balancing price is linear in the intraday market prices. While the pump ramps up or the turbine ramps down, the storage owner consumes less resp. produces more energy than committed. In this case, the storage owner is paid the balancing price for overproduction with intercept Q over intercept and slope Q over slope . While the pump ramps down or the turbine ramp up, the storage owner consumes more resp. produces less energy than committed. In this case, the storage owner is billed the balancing price for underproduction with intercept Q under intercept and slope Q over under . The last row in (20) is the start-up cost of the pump (c pump ) and the turbine (c turbine ). They capture the increased maintenance cost (e.g. abrasion). The variables z pump t,h,q and z turbine t,h,q are binaries that capture if the pump or the turbine has to start-up. They are integrated by the following constraints.
These constraints are defined for all q ∈ {0, 1, 2, 3} and h ∈ {0, · · · , 23}. Additionally, (·) t,h,4 is defined as (·) t,h+1,0 and y pump t,0,−1 = 1 x start t >0 and y turbine t,0,−1 = 1 x start t <0 . Constraint (23) forces the non-negative start-up variable z comp t,h,q to be at least 1 if the pump/turbine starts up. Because z comp t,h,q exerts a negative influence on the intraday stage contribution function (20) and does not influence future profits, z comp t,h,q is binary in every optimal solution without being forced to be binary. Constraint (24) restricts the start-up variables to be non-negative.

Transition functions
The two transition functions capture how the system transitions from one decision stage to the next. The transition function S M,DA simply unions the state, action and exogenous information of the day-ahead market stage.
The transition function from the intraday market stage to the day-ahead market stage is denoted S M,ID and only slightly more complex: where R t,23,3 follows the storage level equation (15).

Value function
Obviously, the decisions x DA Now, we have all necessary notation and can formally state the optimization problem faced by the storage owner who maximizes the profit over an optimization horizon of length T .
An alternative formulation of the optimization problem are the recursive Bellman equations with the boundary condition Obviously, the same constraints have to be respected. One way to find a policy is to compute the value functions V DA t and V ID t and derive the decisions based on (28) and (29) with respect to their corresponding constraints. For other ways to derive a policy see Powell (2019).
The policy function X DA,π t (·) maps each state S DA t to a decision X DA,π t S DA t for the day-ahead market stage. The policy function X ID,π t (·) does this for the intraday market stage. To define the policy functions, we introduce the following policy model. For this, we replace the alternating decision stages by an expectation model. This is illustrated in Figure 3.
The upper half of Figure 3 illustrates the base model described in Section 3. It visualizes the alternating decision stages for one period t. Squares symbolize decision nodes, whereas circles denote stochastic nodes where new information arrives. The period starts with state S DA t in the day-ahead market stage, where the day-ahead market decision x DA t has to be determined. Then, the exogenous information W DA t+1 (i.e. the day-ahead market prices) becomes known. This brings the system to state S ID t in the intraday market stage, where now the intraday market decision x ID t has to be determined. Then, the exogenous information W ID t+1 (intraday market prices) arrives. This completes period t and transitions the system to the the first state of period t + 1, S DA t+1 . The lower half of Figure 3 shows on an intuitive level the policy model for the decision problem. x ID t are made simultaneously before the exogenous information W DA t+1 and W ID t+1 becomes known. Different from the base model, the intraday market decision x ID t is already determined in state S DA t . In the following, we state the policy model and how we use it to derive the policy functions X DA,π t (·) and X ID,π t (·). The notation is summarized in Table D.6 in Appendix D.
The state variable of the policy model is equal to the state of the day-ahead market decision stage (6). The action variable combines the actions of the day-ahead and intraday decision stages.
The exogenous information is the joint exogenous information of the day-ahead and intraday market stage, where the latter one is conditioned on the first one.
Since we will remove the stochastics in the contribution function of the policy model, W policy t+1 is only used for the state's transition.
The transition function S M,policy is the function composition We denote the policy model's one-stage contribution by C policy t . Instead of modeling a stochastic contribution based on the exogenous information W policy t+1 and later derive the expectation, we model C policy t as the expectation itself. For this, we need the expectation of the day-ahead market prices, intraday market prices, and the balancing prices. We denote these expectations asP DA t,t+1,h , t+1,h,q , andQ under t,t+1,h,q . The subscript t denotes that these expectations use only information that is available at day t. The subscript t + 1 indicates that these are the expectations of exogenous information which become known right before day t + 1. Since the exogenous in-formation is modeled linearly, the expectations' calculation is straightforward. Finally, the policy model's one-stage contribution function is This results in the value function (9), (11) to (18), (23) to (24), (33).
In Section 5, we derive approximationsV policy t of the value function V policy t to yield the pol- to (18), (23) to (24), (33) and (18), (23) to (24), (26). Accordingly, we adjust the intraday market decisions based on the new day-ahead market prices. More precisely, the storage owner does not actually use the intraday market decision of the policy model, but re-optimizes the decision according to the policy function X ID,π t S ID t taking into account the outcome of the day-ahead market.
While the expectation of the linear C ID t S ID t , x ID t , W ID t+1 is again straightforward, we explain in the following section how we compute the expectation of the value function. In Appendix B, we demonstrate that the policy defined by X DA,π t S DA t and X ID,π t S ID t is as good as the solution of the SAA of the stochastic counterpart on a daily base.

Approximate dynamic programming
As the formulated policy model is continuous and high-dimensional, we cannot solve it optimally. To tackle this problem, we propose a BADP algorithm. This algorithm solves a problem with discretized state space in which R start t and x start t are discretized and the price history P history t is replaced by a set of sample paths. The notation is summarized in Table D.7 in Appendix D.

BADP
It is common practice to discretize the state space , for example by discretizing the endogenous part of the state and replacing the exogenous part with a scenario lattice. The corresponding sample paths are sampled with the stochastic processes based on the initial state. To derive the expectation of the next value function, transition probabilities P P history t+1 P history t are needed.
In our numerical study in Section 6, we show that this approach has a poor solution quality for the problem considered here. Therefore, we propose in the following an algorithm that does not need transition probabilities nor cluster-centroids. Instead, BADP (Algorithm 1) estimates the expected value of being in the next state by a linear combination of the sample paths. This approach is similar to an approach proposed in Broadie et al. (2000) and Glasserman (2003, Ch. 8.5) for pricing options. The authors estimate weights that mimic transition probabilities based on a constrained optimization problem. In contrast, the algorithm presented here does allow negative weights that do not have to sum up to one. In Section 5.2, we enhance the distance measure used in the constrained optimization problem computing the weights. Furthermore, we introduce box constraints which depend on the distance of the sample paths. (ρ 1 , · · · ,ρ N ) := arg min end for 18: end for BADP (Algorithm 1) mimics the classical backwards recursion for discrete dynamic programs.
The main purpose of BADP is to fill a look-up table which approximates the value of being in one of the considered discrete states (Step 14). Steps 1 to Step 7 sample N scenarios P history t,n , n ∈ {1, · · · , N } for the price history P history t and pre-process data for the further algorithm. In particular, Step 1 initializes the sampled price history in the starting period P history 0,n as the price history Step 4 computes the expectation of the price history of the next periodP history t,t+1,n , conditioned on the current price history sample P history t,n . For this, the transition function (33) and the price processes in Section 3.1 are used. The expected price history is needed later in the algorithm (Step 11). Since our price processes are linear, the expectation is straightforward. The subscript t indicates that only information available at iteration t is used to derive the expectation for stage t + 1 (indicated by subscript t + 1).
Step 5 samples a new price history for the next period. The price history sample P history t+1,n in Step 5 is determined by the transition function (33)  The expectation of the next period's value function is estimated by a weighted sum based on the regression parametersρ k . These parameters mimic (on an intuitive level) the transition probabilities, but as they can be negative, the parameters are able to extrapolate exogenous states that are not evaluated. This is illustrated in Figure 4 (explained later) for out-of-sample data.

If a decision in
Step 14 leads to a storage level R start t+1 or trading volume x start t+1 that is not part of the lookup table, we linearly interpolate the value function in these. Therefore, the look-up tableV policy t+1 can be interpreted as a piece-wise linear function. Thus, Step 14 can be solved with standard MILP-solvers. This linear interpolation is standard in the literature (e.g., Löhndorf et al. (2013), Wozabal & Rameseder (2020)).
After fixing the value function approximationV policy t , the policy in Section 4 can be applied.
The expectation of the next stage's value function necessary to calculate the policy functions X DA,π t (·) and X ID,π t (·) is approximated as in Algorithm 1. A pseudo code for the execution logic can be found in Appendix C. In contrast to other ADP algorithms, it can be applied without re-optimization. However, this may increase solution quality. is computed (orange dashed line). It comprises of the realized prices of the last 6 days and the expectations for the next day (day t + 1). Now, this expected price history is regressed on the price history samples P history t+1,n . The fitted regression N k=1ρ k P history t+1,k is drawn with a blue dashed line. In Case a, the expected price history can be approximated well by a convex combination of the samples. The corresponding regression parameters can be easily interpreted as transition probabilities. In Case b, the expected price history lies outside of the sampled price histories. Therefore, one needs negative regression parameters and/or they do not sum to one.

BADP-weighted
We modify the BADP algorithm by replacing the regression in Step 11 with a box-constrained, weighted regression. With this, we take into account that the influence of some prices on the price forecast is more important than the effect of others. In addition, due to the box-constrains, the optimal regression parameters may be unique even for N > 840.
(ρ 1 , · · · ,ρ N ) := arg min For the upper and lower bound we select K d t,k as which is the Gaussian kernel with the weighted Euclidean norm ∥·∥ wt . We set the bandwidth d as this works well in the preliminary test.
We select the vector of weights w t = (w t,1 , · · · , w t,j , · · · , w t,840 ) for the weighted Euclidean norm Algorithm 2 works as follows. For each period (Step 1) and over the length of the price history (Step 2), we select the price history as the j-th unit vector (Step 3). Additionally, we initialize the expected price historyP history t,t which is needed in the first iteration of Step 6 and Step 7. We advance forwards (Step 5) in time and update the price history (Step 7), while we add up the absolute values of the expected outcomes (Step 6). We do not normalize the weights w t,j . To exclude the effects of the day of the week, we set their parameters to zero in the price processes.
With the weighted regression, we take into account that some prices are substantially more important for the future prices and should be better approximated than others. For instance, the last day-ahead price of the previous day contains the newest information about the cumulative renewable energy production which influences the next day's electricity prices (Ziel, 2016). With the box constraints, we force the regression to approximate the expected price history with the closest scenarios in the set of sample paths, as the value function should be close as well.

Numerical study
In Section 6.1, we state the parameters for our numerical study and specify the implemented approaches. The parameter values are summarized in Table D.9 in Appendix D. Based on this, Section 6.2 compares the different implemented algorithms. Additionally, Section 6.3 indicates that the decisions of the best approach are reasonable. For this, we investigate the storage level over time. Section 6.4 and Section 6.5 show the gains from using a competitive price forecast and from the integrated trading in both markets. Thus, a deviation from the commitments is financially unattractive. While the storage owner buys energy,the energy storage acts like a consumer and has to pay the obligatory grid fees (up to 30 e/MWh in Germany). However, because energy storage systems stabilize the grid, they only pay reduced fees. This reduction is a case by case decision Sterner & Stadler (2017, Ch. 15.4). We choose rather low fees of c gf = 5 e/MWh. Finally, the optimization horizon is T = 30 days. Since the storage can be fully charged/discharged in half a day, the start-of-horizon and end-of-horizon effects are quite small. In practice, a storage owner would re-optimize in a rolling horizon fashion before the end-of-horizon. Since all approaches specified in this section suffer from the same start-of-horizon and end-of-horizon effects, we do not control for them and the results are still comparable.

Data for the numerical study
While solving and evaluating the energy trading problem, multiple sources of stochasticity exist. This is illustrated in Figure 5. The arrows indicate the dependence of the three main steps (estimating price processes, solving trading problem, and evaluation) in the numerical study and how they are influenced by the stochastics. Because of the random division of the training data into subsets for the cross validation, the parameters of the estimated price processes are stochastic (explained in more detail during this section). Based on these price processes, we sample the sample paths for our optimization approaches (e.g., BADP). Additionally, the price processes are used in the optimization (explained in Section 5.1). Therefore, the policy functions X DA,π t (·) and X ID,π t (·) are influenced by this stochasticity . Thus, the actual decisions and the profit on the evaluation level are stochastic, even when we evaluate with real-world data and no further stochastic influence on the evaluation level exists. To compute confidence bands that capture the stochastics present in evaluation, we run the simulation (estimating price processes, solving trading problem, and simulation of operation) 20 times.
Because the German energy markets' prices behave differently over the course of a year (e.g. due to increased solar radiation in summer), we consider four different settings denoted as Winter, Spring, Summer and Autumn. The data is summerized in Table 1. For each season, the price processes are estimated based on data from EPEX SPOT SE (2020) with 365 days in the training set. For this we use the least absolute shrinkage and selection operator (lasso) with ten-fold cross-validation. For the cross-validation, the training data is randomly partitioned into ten subsets. Note that on 10/01/2018, there was a major change in the German auction markets when Austria left the German system. However, to estimate Summer with 365 days in the training set, we have to include data before this major change. Table 2 shows the out-of-sample mean absolute error (MAE) and the root mean squared error (RMSE) of our price processes. To compute the errors, we forecast the next day's day-ahead and intraday market prices with the estimated price processes based on the real-world data known at this time. Afterwards, we measure the error between the forecasted prices and the next real-world data. This is done for all days in the test data set.
The parameters of the estimated price processes (β DA (·) and β ID (·) ) contain some kind of randomness. More precisely, the splitting of the training data set into subsets for the cross-validation is random (illustrated in Figure 5). To tackle this randomness, we estimate the price processes 20 times and report the average MAE and RMSE. For each estimation with ten-fold cross-validation, the training data is newly divided into ten subsets. To ease the interpretation of the errors, we report the average market prices of the test sets.
Therefore, the data in Table 2 captures the randomness on estimation level. In practice, one would average the 20 estimated price processes. This is not done in our numerical study for two reasons. First, the parameters of the price processes hardly differ for the cross-validation runs.
Second, this allows us to investigate the influence of the stochastics on expected profit on the estimation level.  Table 2 shows that the day-ahead market prices are harder to forecast than the intraday market prices. Both MAE and RMSE are much higher for the day-ahead market. Especially hard to predict are the day-ahead-market prices in Spring, as it shows the highest error measurements while the mean price is rather low.
Following the standard approach for the evaluation of ADP algorithms, we conduct numerical comparisons with benchmark approaches. Therefore, we implement several approaches using Matlab version R2020a and run simulations on a standard PC with a 3.6 GHz AMD Ryzen 5 and 16 GB of RAM. We evaluate every policy with the data in the test data set. In particular, we evaluate the following BADP variants and benchmark approaches: • BADP: This is the BADP algorithm discribed in Algorithm 1. We discretize the storage levels R start t in 11 equidistant points. Preliminary tests indicate that the last trade x start t exerts less influence on the decision. Therefore, we choose a rougher discretization for it: x start t ∈ −x turbine max , 0, x pump max . We set the number of sample paths as N = 50. This results in 11 · 3 · 50 = 1, 650 discrete states. The MILP in Step 14 is solved with the Matlab solver intlinprog. After fixing the value functions, the induced policy is applied without re-optimization.
• BADP-w: For this BADP approach we replace Step 11 in Algorithm 1 with the quadratic problem (36) and (37). The coefficients for the lower and upper bounds are described by (38) and (39). The weights are computed with Algorithm 2. We select the same discretization for the states and the same number of samples as for the unweighted BADP. After fixing the value function approximations, the induced policy is applied without re-optimization.
• BADP-w(re-optimized): This approach is largely the same as BADP-w, but the value function approximations are re-optimized once at t = 15. In more detail, the value function approximations are derived first with BADP-w. After applying the induced policy for t ∈ {0, · · · 14} with fixed value function-approximations, these are re-optimized. For this, we use BADP-w with a new set of sample paths which is sampled based on the current state in t = 15.
• ADDP: We adapt the approximate dual dynamic programming algorithm proposed by Löhndorf et al. (2013). This algorithm was recently used by Wozabal & Rameseder (2020) for a problem with 48 exogenous dimensions in the state variable. This algorithm is a version of stochastic dual dynamic programming (SDDP) but allows the stochastic processes to be Markovian and is not limited to stagewise independend processes (Wozabal & Rameseder (2020)).
Originally, this algorithm iterates over K iterations and approximates the value functions of the post-decision states with hyperplanes. Since the value function approximations are piecewise linear and the stochastic processes are linear and the transition function is linear in these, we can derive the expectation in (35) analytically. So, there is no computational advantage to use the post-decision state formulation. Therefore, we approximate the value functions of the pre-decision states. Like BADP, the ADDP approach replaces the exogenous part of the state P history t with a scenario lattice. But in contrast to BADP and in line with Löhndorf et al. (2013), the scenario lattice is described by N cluster centroids (we use N = 50). For generating the scenario lattice, we sample 1, 000 price histories for each period t and apply k-means clustering with the city block norm. The necessary transition probabilities are approximated by counting the transitions between the clusters. Each iteration of the ADDP algorithm contains a forwards pass and a backwards pass. During the forwards pass, ADDP generates new endogenous states (R start t and x start t ). After this, during the backwards pass, we iterate over the endogenous states and all price histories in the scenario lattice to add a new hyperplane for each combination.
Preliminary tests indicate that K = 10 iterations are a reasonable trade-off between solution quality and machine time. More precisely, additional iterations hardly improve the profit. To speed the algorithm up, when we add a new hyperplane, we remove all hyperplanes which are dominated by the new one. We do not implement the rejection of hyperplanes with negligible improvement. This could speed the ADDP approach up, but also worsen the results slightly.
After fixing the value function approximationV policy , the policy in Section 4 can be applied.
If the transition leads to exogenous states which are not part of the scenario lattice, the tran-sition probabilities have to be updated. This is in contrast to the other approaches which are nevertheless applicable. To speed this up, we follow Wozabal & Rameseder (2020) and base the decisions on the transition probabilities of the closest cluster centroid. Additionally, the sample paths depend on the estimation level. See Figure 5 for an illustration.
Therefore, a comparison of the implemented approaches cannot rely on a single run for each approach. Accordingly, we run the approaches 20 times, once for each set of estimated price processes (estimation level). The only stochastics which influences EM-7 is the stochastics on estimation level. Perfect foresight is not influenced by stochastics at all.

Comparison of the solution approaches
In this section, we compare the solution approaches based on the parameters in the previous section. To capture the randomness on estimation level and policy level, we solve the trading problem 20 times with each approach, once for each set of estimated price processes and evaluate with the test data set (see above). We report the relative mean profit summed over the four seasons of the approaches and the 95% confidence bands. We do not report the profits for the single seasons, as the approaches' behavior does hardly change in the different seasons.
The reported machine time is the average machine time to learn the value functions for 30 stages.
Since the machine time hardly differs for the different seasons and estimated price processes, we do not show the confidence bands. Furthermore, we do not report the machine time for deriving the policies during evaluation because, even with the slowest approach (EM-7 ), the policies are derived in a few seconds per decision stage. ADDP is dominated in terms of profit and machine time by all other approaches. We acknowledge that the rejection of hyperplanes or better redundancy detection techniques could further reduce the machine time, but we do not think this would allow to considerably increase the solution quality as the performance appears to be quite flat in this regard (not shown here). As EM-7 and Perfect foresight do not have to learn the value functions, the machine time is zero. The large gap between the upper bound Perfect foresight and the other approaches is not surprising. Electricity prices are hard to forecast and show price spikes which can be exploited by Perfect foresight. This underlines that deterministic approaches with full hindsight information have little value in this context. We also tested a variant of the popular least squares monte carlo approach (with lasso and ten-fold cross-validation, not shown here), which yielded 94.31% of EM-7.
As mentioned, the approaches behave similar in all four seasons (not shown here). But further investigations indicate that BADP-w benefits more from reliable forecasts than EM-7. While there is no significant difference of the profits in Spring, where the prices are hard to forecast, the largest difference is in Autumn, where the error measurements are lowest (see Table 2).
Re-optimizing with BADP-w in the middle of the optimization horizon (at t = 15) has almost no benefit. Even if more re-optimization steps may slightly improve the profit, this already indicates that BADP-w works well without re-optimization. BADP and ADDP could benefit from reoptimizing as well. To save space, we do not investigate this in the numerical study, since these approaches cannot compete with BADP-w.
The objective function includes only costs that are variable in the considered trading problem.
We use the term "profit" for simplicity, although, formally, "contribution margin" would be more appropriate. In reality, additional costs like wages and depreciation of investment have to be taken into account. Therefore, the 1.44 percentage point increase from EM-7 to BADP-w increases the net profit much more and may even turn a loss into a profit. Because of the high investment and running costs of energy storage systems they usually have a low internal rate of return (IRR) of 0.2% to 9.1% (Steffen (2012), Zhu et al. (2013), Olaszi & Ladanyi (2017)). The actual IRR depends on the specifications and costs of the energy storage and the trading strategy. If one can increase the "contribution margin" with BADP-w by 1.44 percentage points this also increases the IRR by nearly 1.44 percentage points. Therefore, the IRR can be dramatically increased by 16 to 840 %.

Decision making insights
To investigate the decisions of the storage owner and to show that the decisions derived by BADP-w are reasonable, we analyze the storage level for Autumn. Since we derived 20 different policies with our best approach BADP-w we report the average storage level over the 20 policies in Figure 6. Additionally, we report the maximum and minimum storage levels as well as the day-ahead prices. As reference points the Mondays (0:00) of Autumn are marked. Overall, we observe strong daily and weekly patterns in the storage level, which follow electricity prices. On a weekly level, the storage is charged on weekends and discharged at the beginning of the week. On a daily level, it is charged right after midnight and discharged during the day. In particular, we see that at the first three Sundays the storage is full, as it was charged during the weekends where the prices are usually low to sell electricity during the week where the prices are higher. Only during the fourth weekend the storage is not fully charged as prices did not drop.
After the storage is fully charged, not all energy is sold directly on Monday. This is only true for the first Monday. After the second Monday, most energy is sold Tuesday and after the third Monday at Wednesday. These were the weekdays where the prices peaked.
In some hours or even days the gap between the policies with maximum storage level and minimum storage level is quite large (e.g. in the week of the second Monday). This happens if multiple nearly equally good decisions exist (e.g. discharging a day later at nearly equal prices).

Influence of the price forecast
To check if it is worth to handle a high-dimensional price forecast in the dynamic program, we vary the number of days in the price forecast and therefore the dimension of the exogenous part of the state P history t . We denote the number of days in the price history as D. The corresponding price history is P history For each D ∈ {1, · · · , 10} we estimate the price forecast using lasso with ten-fold cross validation 20 times and learn the value functions with our best approach BADP-w. Again we report the mean profit and the 95 % confidence bands.
In addition, we use EM-7 with seven days in the price history as a benchmark, since this does not suffer from the curses of dimensionality.  Figure 7 illustrates the influence of the length of the price history used in the optimization on profit. We estimate ten different price processes for each market with one (S DA t ∈ R 122 ) up to ten days (S DA t ∈ R 1202 ) in the price history. We fit the value functions with BADP-w and derive the decisions based on this forecast and evaluate these with the data in our test data set. The difference of the mean profits with 1 to 6 days in the price history is not significant. But the seventh day in the price history is important for BADP-w to outperform EM-7 which always uses seven days in the price history to calculate the forecast used in the expectation model. This significant increase in the profit follows because the electricity prices have a strong weekly pattern and, thus, the prices of the same day of the last week are good predictors for current prices. Again, the difference of the mean profits with 7 to 10 days in the price history is not significant. Note that the fact that profit increases in the price history is not obvious, because a larger state variable is harder to handle, which could also lead to a decline in profit. Thus, Figure 7 demonstrates that BADP-w can handle a large state variable. More importantly, as there is no reason not to use a high-dimensional price forecast for the benchmark approach, Figure 7 indicates that the dynamic program needs at least seven days in the price history to outperform the benchmark approach with seven days. Thus, if we are not willing to handle the large state variable in the dynamic program, the much simpler benchmark approach would be the better choice. Moreover, we also tested the influence of the price forecast on EM-7 (i.e. with D days in the history used to compute the expectations for the next 7 days); not shown here. The profit of EM-7 shows a similar pattern as the profit of BADP-w, but BADP-w outperforms EM-7 for every D.
Since Figure 7 indicates that the seventh day in the price history is more important than the previous days, this suggests the reduction of the state space via feature selection. One could only include the most relevant prices of the previous week. While the importance of features (dimensions of the price history) is considered in the weights of BADP-w, the feature selection is not investigated in full detail. As the modeled price processes must follow the Markov property, the feature selection is not straightforward. It is possible to fit the value functions with BADP-w based on a simplified forecast and therefore simplified model (e.g. D = 1) but derive the decisions based on a more complex forecast (e.g. D = 7). For this, the higher dimensional S DA t and therefore P histoy t is treated as a latent state variable while deriving the trading decisions based on the fixed value function approximations. Therefore, we compare this approach with the data in Figure 7.
For this, we derive the decisions based on our basic forecast (D = 7).   do not show the EM-7 benchmark with its confidence bands. Figure 8 demonstrates that treating a higher dimensional forecast as a latent variable pays off, but the gain decreases in D. The latent state variable approach with three days in the price history comes closest to the basic setting D = 7, but further tests (not shown here) indicate that the difference is still significant. We do not include D > 7, because in this cases the price history would contain dimensions that are not used for the forecast and therefore would be the irrelevant.
In conclusion, it is essential for dynamic programs to use a competitive forecast to outperform a simple benchmark approach.

Gains from integrated trading
In this section, we compare the integrated day-ahead and intraday trading with restricted trading settings. In particular we consider the following settings: • Day-ahead only (DA): This setting trades only in the day-ahead market and corresponds to our policy functions X DA,π t (·) and X ID,π t (·) with additional constraint x ID t,h,q = 0 ∀h, q.
• Intraday only (ID): This setting trades only in the intraday market. This can be reflected with our policy functions X DA,π t (·) and X ID,π t (·) by adding the constraint x DA t,h = 0 ∀h.
• DA+ID, sequential: This setting trades in the day-ahead market without anticipating the intraday market. Afterwards, the commitments are adjusted on the intraday market. To reflect this with our policy functions, we add the constraint x ID t,h,q = 0 ∀h, q to X DA,π t (·) and the constraint x DA t,h = 0 ∀h to X ID,π t (·).
• DA+ID, integrated: This is the basic setting considered in the remainder of the paper.
• DA+ID, integrated, no arbitrage: Integrated trading in both auction markets can speculate on the price difference between the hourly day-ahead market products and the price of the corresponding quarter-hours on the intraday market. In this setting, we modify the intraday market price process defined in Section 3.1 to enforce that the expected market prices are equal.
Therefore, no arbitrage based on the expected price difference is possible. For this setting, we shift the expected intraday market prices to the expected day-ahead market prices and use them in the contribution function of the policy model (34): All settings are optimized with the BADP-w approach with seven days in the price history and evaluated with real-world data. Again, we run the simulation (estimating price processes, solving trading problem, and evaluation) 20 times, to capture the stochastics (see, Figure 5). We report the mean of the summed profits over the four seasons and the 95% confidence bands. Table 4 compares our basic setting (DA+ID, integrated) with four other settings: only using the day-ahead market (DA), only using the intraday market (ID), using both markets with a sequential optimization (DA+ID, sequential), and trading integrated but without exploiting the price differences (DA+ID, integrated, no arbitrage). If only one market is used, the intraday market is much more profitable for the storage owner than the day-ahead market. This is because the intraday market prices fluctuate substantially more than the day-ahead market prices, which can be exploited by the fast ramping energy storage. The immense difference between the dayahead and intraday profits decreases in the ramping time (not shown here). In the sequential day-ahead and intraday setting, the storage owner first trades on the day-ahead market, as if only the day-ahead market exists, i.e. exactly as in setting (DA). After this, the storage owner adjusts the commitments on the intraday market. The integrated bidding can enhance the profit by 8 percentage points compared to the sequential consideration of the markets and 2.5 percentage points compared to the intraday market setting. In our study, the intraday market setting outperforms the sequential setting. This is because decisions on the day-ahead market cause a bad starting point for trading on the intraday market and, thus, profit is lower. This changes with increasing ramping times (not shown here). If the storage owner does not exploit the expected price difference between the markets, the relative profit only decreases by 0.12 percentage points. The difference between these two settings is not significant. Thus, (almost) no arbitrage is possible, which indicates that the German auction markets are efficient.

Discussion
In this section, we discuss possible limitations of our research. We observed that integrated trading in both German auction markets outperforms sequential trading and trading in a single auction market. We did not investigate whether integrated trading in the auction markets outperforms trading in other markets (e.g., derivative markets, continuous intraday market, balancing markets, over-the-counter). Furthermore, we did not investigate if integrated trading with additional markets is beneficial. In the following, we discuss how this limits our research and how practice can still benefit.
Derivative markets: Trading derivatives can be an important instrument to hedge the risk (e.g., variance of returns), see Hain et al. (2018). Since we optimize expected profit, the derivative markets are not beneficial.

Continuous intraday market:
As the continuous intraday market works similar to stock markets, a product's current price includes all relevant information and is therefore the best predictor for the future price of the product (see, e.g., Narajewski & Ziel (2020a)). Since the continuous intraday market starts with the intraday auction market clearing prices and both markets have the same temporal resolution, integrated trading in the intraday auction market and the continuous intraday market should not be profitable. Moreover, the continuous intraday market has a rather high market entrance barrier and a 24/7 trading floor is needed.
Nevertheless, sequential trading in the continuous intraday market after integrated trading in the auction markets should outperform pure auction market trading. Since the trading in the auction markets restricts the trading in the continuous intraday market, pure continuous intraday market trading could be even better than a heuristic solution of integrated trading.

Balancing markets:
The balancing markets can be an important opportunity for fast ramping energy storage systems. We do not include them in our research for the following reason. The rules of the German balancing markets changed multiple times in recent years (e.g., 2018, 2019, and 2020). Since these changes happen during the time span considered in the numerical study of Section 6, nearly no data for the current regime exists.
Over-the-counter markets: For energy storage systems, the over-the-counter (OTC) markets are hardly relevant, because there, most energy is contracted several days or months ahead. Most energy storage systems are fully charged and discharged within hours, so they benefit most from short-term price fluctuations. Additionally, there is no public information about OTC markets available.
In summary, we are confident that we modeled the two most important German markets, but acknowledge that the integration of additional markets, like balancing markets, is desirable.
Based on our correspondence with practitioners and in line with literature, we see that market participants often trade in more than two markets but in a sequential manner. As seen in Section 6.5 this can decrease profit, even compared to trading in fewer markets. Nevertheless, market participants that trade in a sequential manner can use our model and approach to combine two links in the chain to improve their trades. This is the first step towards a fully anticipating trading strategy that integrates all relevant markets in an optimization problem.

Conclusion and insights
In this paper, we modeled the energy trading problem for both German auction markets with a competitive price forecast as a high-dimensional dynamic program. We implemented a suitable backwards ADP solution algorithm and showed that the approach successfully overcomes the resulting curse of dimensionality. Furthermore, the implemented approach outperforms an intuitive and widely used benchmark using an expectation model in a receding horizon fashion as well as a state-of-the-art approach from literature. However, it is important to use a long enough price history in the optimization's forecast to outperform the expectation model. This is an important observation as the resulting explosion of the DP's state space often forces researchers to use shorter histories, whereas this is much less a problem for expectation models. Furthermore, one has to use an approach that is designed for handling high-dimensional price processes, albeit common methods from literature cannot.
Regarding the integrated trading, we have shown that integrated trading in both German auction markets is beneficial. Further, we have shown that trading in a sequential manner can be even worse than only trading in the intraday auction market. This shows that storage owners that sequentially trade have to rethink this and anticipate later markets. This may also be true for markets not included in our research. To reduce the computational burden of such an integrated optimization it is sufficient to use self-schedule bidding in the auction markets. If one needs to reduce the computational burden even more, an expectation model with a competitive price process is more appropriate than an ADP approach with a simple price process.
Future work could go in several directions. First, from a methodological perspective, infinitehorizon dynamic programs should be considered. Whereas our proposed dynamic program is timeindependent (except the day of the week), future work can reformulate the problem as a steady state problem and solve the storage problem with an infinite horizon. This has its own challenges.
For example, the set of sample paths can no longer simply be sampled with the price processes because this depends upon the initial price history. Instead, one can use all past price histories of the last few years. Second, another research avenue would be to investigate the influence of ramping time and other properties of the energy storage on expected profit. Especially if investment costs are taken into account one can, for example, investigate the benefit of a small buffer battery for a pumped hydro storage to enable instant ramping.

Appendix A. Self-schedule bidding
To check the self-schedule assumption Assumption 2, we show numerically that the gains from price-volume bids are negligible for electricity prices with highly correlated noise (like in Germany).
For this, we solve a simplified energy trading problem numerically with price-volume bids and with self-schedule bids and evaluate these with common out-of-sample scenarios. In this simplified setting, the PHS only trades in the day-ahead market and ramping is ignored. The notation is summarized in Appendix Appendix D. To show the effect of the correlation of the noise, we draw the scenarios based on the price process defined in Section 3.1, but artificially reduce the correlation. We denote the artificial correlation matrix as Σ DA,γ . This is defined as ··· ,24;j=1,··· ,24 with γ ∈ [0, 1] and the originally estimated correlation matrix Σ DA = (σ DA i,j ) i=1, ··· ,24;j=1,··· ,24 . For γ = 1, the matrix Σ DA,γ has the correlation estimated with real-world data. For γ = 0, the correlation is zero.
We model the simplified decision problem as a two-stage stochastic program. Based on a state  is defined for all b ∈ {1, · · · B} and h ∈ {0, · · · , 23}. Additionally, R t,−1,s is defined as R start t and y pump t,0,−1 = 1 x start t >0 and y turbine t,0,−1 = 1 x start t <0 . The objective (A.4) maximizes the average profit over all samples. While pumping, the PHS must pay the market price P DA t+1,h,s and the grid fees c gf . While generating energy, the PHS receives the market price. If the PHS does not deliver the energy sold (e.g., empty storage), it has to pay the balancing price for under supply Q under t+1,h,s . If the PHS stores less energy than bought (e.g., full storage), it receives the balancing price for over supply Q over t+1,h,s . The increased maintenance costs during start-up are reflected by c pump and c turbine . Constraints If not stated otherwise, we use the same parameters as in the numerical study of Section 6.
Since this section focuses on daily energy trading, we use R start t = 0 and x start t = 0. Moreover, we define balancing prices for over-and undersupply that are linear in the day-ahead market price. The following parameters differ for the energy trading problem with price-volume bids and the problem with self-schedule bids: • Price-volume bids: Here we use S = 50 price scenarios drawn with the estimated price process but artificial correlation matrix Σ DA,γ . As parameters for the price components of the B = 33 bids, we use P buy,bid , ∈ −∞,P DA t,t+1,h − 30,P DA t,t+1,h − 28, · · · ,P DA t,t+1,h + 30, ∞ ∀h. Thus, self-schedule bids within ±30 e/MWh of the expected market prices are used.
• Self-schedule bids: Here we use a single price scenario (S = 1) with P DA t+1,h,s =P DA t,t+1,h ∀h. Therefore, the prices are equal to the expected day ahead market on day t + 1 only using data known at day t. To force (A.4) to (A.15) to use self schedule bids, we use as price Obviously we cannot report the objective value of (A.4) but have to evaluate this with out-ofsample random numbers. In the following pseudocode we state how to apply (A.4) to (A.15) on out-of-sample data: To show the effect of the correlation of the noise, we solve the daily trading problem for each of the 120 days in the test data presented in Section 6 with γ ∈ {1, 0.75, 0.5, 0.25, 0}. Since we estimated the price processes 20 times with cross validation, we solve the problem 20 times for each day. In contrast to the remainder of the paper, we only use the corresponding price history P history t and do not evaluate with real-world data. The reason for this is that the real world data follows a price process with γ = 1. Therefore, we have to evaluate with randomly drawn scenarios based iii on the correlation matrix Σ DA,γ . This procedure introduces a new level of randomness (evaluation level) which is not included in Figure 5. To reduce the computational burden, we set a relative MILP solution gap of 1% while deriving the first-stage decisions. For the correlation based on real-world data (γ = 1) there is no significant difference. The reason for this is, that energy storage systems are profitable if the price difference between the buying hours prices and selling hours prices is large enough. Since the forecast errors of the 24 day-ahead market prices are very highly correlated (coefficient of correlation up to 0.98 for consecutive hours), forecasts usually over-or underestimate the whole day's prices, which hardly influences the daily price differences. This justifies our self-schedule Assumption 2. Furthermore, the machine time with price-volume bids is over 5,000 times higher than the machine time with self-schedule bids (not shown here). With decreasing γ and therefore decreasing correlation the price-volume bids become increasingly profitable. For uncorrelated noise (γ = 0), price-volume bids dramatically outperform self-schedule bids by over 130 percentage points. Therefore, in energy markets with lower correlation, energy storage systems must use price-volume bids. For example, the Nordic markets behave very different than the German markets because of the immense penetration of storeable hydropower.
Please note that we only show above that price-volume bids are not needed for the daily energy trading. However, we are confident that this is sound enough for the following reasons: First, from iv a theoretical perspective, price-volume bids outperform self-schedule bids if one is able so solve the corresponding problems exactly. Thus, it is reassuring that price-volume bids never perform significantly worse than self-schedule bids. Second, one may ask whether this also holds for the problem with multiple days. We did not investigate this setting here, as the tractability mentioned above becomes more difficult as the problem grows. However, since most energy storage systems are fully charged/discharged within hours, the daily price differences are the most important ones for energy storage systems. In contrast, hydropower plants have huge reservoirs (discharge time of days to weeks) and a natural water inflow. For these power plants it is essential to release the exogenous resource at high price hours to be profitable (e.g. Fleten & Kristoffersen (2008), Boomsma et al. (2014) All constraints are defined for all q ∈ {0, 1, 2, 3}, h ∈ {0, · · · , 23}, and s ∈ {1, · · · , S}. Additionally, (·) t,h,4,s is defined as (·) t,h+1,0,s and R t,0,−1,s = R start t and x pump t,0,−1,s = max x start t , 0 and x turbine t,0,−1,s = max −x start t , 0 . Since this section focuses on daily energy trading, we use R start t = 0 and x start t = 0.
The SAA (B.1) to (B.12) is the stochastic counterpart of the policy model. The scenarios for the day-ahead market price P DA t+1,h,s are sampled based on the price process defined in Section 3.1 and the current price history. The samples for the expected intraday market priceP ID t+1,h,q,s are derived based on the price process defined in Section 3.1 conditioned on the sample P DA t+1,h,s . The expected balancing prices are computed based on these the usual way. Obviously, we cannot report the objective value of (B.1) but have to evaluate this with the real-world data in our test data set.
In the following pseudocode we state how to apply (B.1) to (B.12) to real-world data: We solve the daily problem for each day in our test data set and evaluate the decisions with the real world data. We generate the samples based on the price processes with seven days in the price history. Since we estimated the price processes 20 times, we solve the daily problems 20 times and report the relative mean profits for S ∈ {10, 20, 30, 40, 50} and the 95% confidence bands. To reduce the computational burden, we set a relative MILP solution gap of 1% while deriving the first-stage decisions (Step 2 in Algorithm 4). functions with BADP-w and the policies (expectation model) designed in Section 4 and derive the decisions while evaluating with the stochastic program, but we do not think that the stochastic program would benefit dramatically enough to significantly outperform the expectation model.

Appendix C. Evaluating BADP
Using the value function approximation derived by BADP to evaluate the induced policies works similar to Step 4, Step 11, and Step 14 in Algorithm 1. The execution logic can be found in Algorithm 5.