(2019). An argument-driven classification and comparison of reservoir operation optimization methods. Advances Water Resources 74-86.

Reservoir operation optimization aims to determine release and transfer decisions that maximise water man- agement objectives such as ensuring a reliable water supply, hydropower production, mitigation of downstream ﬂoods, etc. An extensive and growing body of scientiﬁc literature exists on advancing and applying mathematical optimization methods to reservoir operation problems. In this paper, we review such literature according to a novel classiﬁcation system of optimization approaches, which focuses on the characteristics of the actual oper- ation problem – i.e. what needs to be optimized, or in mathematical terms, ‘the argument’ of the optimization problems - rather than the mathematical properties of the optimization algorithm. This enables us to discuss the advantages, limitations and the scope of application of the diﬀerent optimisation methods; and to provide practical guidelines for matching the properties of a system and operation problem with a suitable optimization method. Alongside this paper we provide code to implement many of the methods we review for an illustrative reservoir system.


Introduction
A recent estimate places the total global storage capacity of reservoirs and dams between 7000-8,000 km 3 ( Lehner et al., 2011 ). While dam construction has slowed in countries with a Human Development Index (HDI) above 0.7 (the UN Development Programme's threshold for a 'high' level of development ( Jahan, 2015 )), it is likely to continue at a considerable rate in countries with an HDI below 0.7. The latter countries contain around half of the human population and have the highest projected growth rates, with a total population increase of 18-27% by 2030 ( UN, 2015 ), as shown in Fig. 1 . Currently, the per capita water storage of low HDI ( < 0.7) countries is around one third of high HDI ( > 0.7) countries. Besides building new dams to fulfil irrigation and water supply needs, hydropower dams are also expected to triple worldwide (from 2000 to nearly 6000) by 2030, under growing electricity demand ( Ansar et al., 2014;Zarfl et al., 2014 ). Despite their importance and the level of planning and resources required to construct a dam, it is common for reservoirs not to achieve the goals envisaged in their design, in terms of both economic returns and mitigation of negative impacts ( WCD, 2000;Labadie, 2004 ). Dams are most commonly criticised for causing social and environmental damage, such as the displacement of communities or the obstruction of sediments ( Graf, 1999;Ouyang et al., 2010;Tockner et al., 2011;Liermann et al., 2012;Scudder, 2012 ), which may not be sufficiently understood before- Fig. 1. Map of the reservoirs listed in the GRanD database ( Lehner et al., 2011 ), centres of circles are a dam's location, the size is proportional to capacity and the colour indicates HDI. Countries are coloured by projected population growth by 2030. but more uncertain, cost in the mid-term) and across uses (for example, between irrigation, hydropower and municipal supply).
Research has demonstrated that the use of mathematical models to simulate and optimize reservoir operation can significantly enhance the performance of existing reservoirs, as well as enable efficient design of new dams or their repurposing/expansion. Traditionally, dam design has been based on Rippl's method, an approach that aims to find the smallest reservoir capacity that can ensure releasing the target water demand through a worst-case drought ( Rippl, 1883;Hazen, 1914;Loucks et al., 2005 ). Drawbacks of this approach include the difficulty in applying it to systems that go beyond the simple single-reservoir and singlepurpose case, for example coordinated reservoir networks or multiple purpose reservoirs ( Maass et al., 1962 ). More flexible approaches that can accommodate these drawbacks have been proposed for many years Stedinger, 1987, 1988;Douglas et al., 2002;Celeste, 2016 ) and are now widely adopted in scientific research ( Loucks et al., 2005 ). These design methods simulate the reservoir system against long time series of reservoir inflows and iterate the simulation until finding the minimum reservoir capacity that meets the target objectives under a variety of hydrological conditions. As such, they require an explicit formulation (and preferably nested optimization) of the reservoir operating policy that will be used to make release decisions in the various simulated circumstances.
Reservoir operation optimization is a mature and yet very active research area (see Fig. 2 ) and a number of reviews of the available optimization methods have been carried out over time ( Yeh, 1985;Labadie, 2004;Castelletti et al., 2008;Rani and Moreira, 2009;Ahmad et al., 2014 ). While these reviews may differ in the emphasis given to a particular group of methods or another, they all share the same fundamental approach to classifying and presenting methods, which is based on the mathematical properties of the optimization algorithms. However, we believe that an alternative approach to classifying methods is possible and useful, particularly for new and non-specialised users, by focusing on the argument of the optimization problem. In order to better understand this point, we note that there are four elements to an optimization problem: (1) the objective(s), i.e. the variable(s) to be minimised/maximised, such as the average water supply, or hydropower production, level of flood protection, etc.; (2) the argument of the optimization problem, i.e. the decision variable(s) whose optimal choice would deliver the minimum/maximum objective value(s); (3) the constraints, i.e. the set of equations that link the decision variables to the objectives; and (4) the optimization method, i.e. the algorithm used to determine the values of the decision variables that optimize the objectives while respecting all the constraints. We have presented these elements in the order in which they should be defined in practice. Indeed, when formulating an optimization problem the optimization method should be the last element to be chosen, and yet previous reviews in this field focus on this element as the key to present and compare literature contributions. We propose instead that the highest level of classification should be the argument, which determines the 'output' of the optimization task (which type of variable is being optimized, i.e. a sequence of release/transfer decisions or an operating policy, as further explained in the following sections). This changes the focus to the practical aspects that make an optimization approach more or less suitable for the problem at hand ( what type of solution they deliver and when they are useful), rather than the mathematical properties of the solution algorithm ( how the methods achieve those solutions).
This paper hence offers a new review of the scientific literature on reservoir operation optimization where optimization methods and applications are presented according to the type of argument to the optimization problem instead of the underlying mathematics in use. Indeed we will show that the same type of algorithm (for instance, a genetic algorithm) can be used to solve reservoir optimization problems with very different arguments (e.g. deriving the optimal sequence of short-term decisions vs determining the long-term optimal operating policy); while an optimization problem with the same type of argument (and hence solution) can be solved by using very different algorithms (e.g. a genetic algorithm vs a nonlinear programming one). We complement the review with a terminology disambiguation table to help the reader navigate both our review and the wider literature, where terms are sometimes used with different meanings by different authors. By focusing less on the mathematical properties of solution algorithms in favour of an argument-based classification of the optimization methods, we are also able to draw a comparison between them, discuss important practical factors such as the different assumptions required by each method, and ultimately provide guidelines towards selecting a suitable method for the decision-making problem at hand. We hope our paper will make the reservoir operation optimization literature accessible to a wider audience besides the academic community already active in water systems analysis and optimization.

Review and classification of reservoir operation optimization by argument
In this section we present our classification system by argument and review optimization methods and applications accordingly. Beforehand, however, in the following subsection we briefly define the two other elements of the optimization problem formulation described in the introduction: the objectives and the constraints.

Objectives and constraints
In the optimization literature, an 'objective' is a scalar variable that summarises the system performance over a temporal period. The optimization method aims at either minimizing or maximizing the objective; throughout this review we will assume that objectives must be minimized, i.e. they represent either costs or benefits changed in sign. Objectives that are commonly used to evaluate the performance of reservoir systems capture the system's reliability, resilience or vulnerability ( Hashimoto et al., 1982;Loucks et al., 2005;Kasprzyk et al., 2013 ). Reliability objectives measure the frequency of occurrence of a specified failure event (for example, failure to supply adequate amounts of water to a demand node), resilience objectives measure the recovery time from a failure event, and vulnerability objectives measure the severity of the failure's consequences. The choice and definition of objectives can vary greatly depending on the specific reservoir system under study, the availability of data, etc. and as a general rule should reflect as much as possible the reservoir operator's targets and preferences. However, two factors in the formulation of the objective impact the applicability of operation optimization methods. The first is the presence of non-linear components in the objective's mathematical definition, which may prevent the application of some methods that assume linearity, as will be summarised in Section 3.4 . The second is the so called 'time-separability', i.e. the fact that the objective is defined by temporal aggregation (for instance, averaging) of 'step costs' (or 'step benefits') that only depend on system variables at one time-step ( Barro and King, 1982 ). An example of a time non-separable objective is the profit from selling water in a water market, where the price at each time step is dependent on water sales at previous time steps.
Another critical aspect that may strongly influence the applicability of reservoir optimization methods is the number of objectives that the operation aims to minimize. In fact, reservoirs are typically expected to serve multiple purposes. For example, nearly half of all large dams included in the World Register of Dams ( ICoLD, 2003 ) have multiple uses -most commonly irrigation, hydropower, water supply and flood control. A possible approach to handle multiple objectives is to make them commensurable by appointing them an economic value so that they can be summed up into a single objective that expresses the total net benefit (or cost) over the simulation period ( Maass et al., 1962 ). However, this technique may not sufficiently compensate for non-economic indicators and does not express the available trade-offs between objectives to the decision maker, as described in detail in Kasprzyk et al. (2013) . An increasingly preferable alternative is to solve a multi-objective optimization problem, which returns a set of Pareto-optimal (or Pareto efficient) solutions, instead of one optimal solution. Pareto-optimal solutions are characterised by the property that an improvement in one objective is unattainable without a deterioration in at least one other objective ( Cohon and Marks, 1975 ). The choice of the 'best' solution within the set of Pareto-optimal ones is not considered as part of the optimization process because it involves a subjective evaluation of what acceptable trade-offs between the objectives should be. However, in order to assist the decision maker in such evaluation and choice, the set of Pareto optimal solutions can be displayed in the objective space (this representation is called the 'Pareto front') to reveal and quantify those trade-offs. The benefit of visualizing the Pareto front lies in enabling the decision maker to view the impact of their decisions in the context of all objectives rather than a single, prior weighted objective. In selecting one solution within the Pareto front the decision maker implicitly selects a posterior set of weights. For the sake of simplicity, in the next section we will first introduce optimization methods with reference to the single-objective case, and in Section 3.2 we will discuss their ability in handling multi-objective optimization problems, in particular when the number of objectives increases above 3 -the so called 'many objective' problems ( Fleming et al., 2005 ).
The 'constraints' of an optimization problem are all the equations that are needed to compute the objective(s) from the decision variables. In a reservoir operation optimization problem, this link is established via a simulation model of the reservoir system, which is run over the simulation period for given initial condition and trajectory of forcing inputs. Initial conditions, such as reservoir storages at the beginning of the simulation period, are typically selected by a sensible guess and the impact of their choice mitigated by calculating system performances only after a warmup period (see Ashbolt et al., 2016 as an example). Forcing inputs, such as trajectories of inflows or water demands, may be represented by historical data, synthetic data generated from a statistical model, or a statistical model in itself. As we later discuss in Section 3.1 , whether a statistical model or actual data (synthetic or historical) is used will impact the choice of the solution algorithm. How to best create the statistical model (whether to be used directly in the optimisation or to generate synthetic forcing data) has been an active field of research throughout the history of reservoir operation ( Fiering and Bund, 1971;Hirsch, 1979;Salas et al., 2005;Rajagopalan et al., 2010;Herman et al., 2016 ). The simulation model is essentially based on the law of conservation of mass, in the form of a water balance equation (over time) for each reservoir and a water balance (across space) for each link between reservoirs and abstractions/consumption nodes ( Ford and Fulkerson, 1962 ). A detailed description of typical reservoir system simulation equations can be found in Rani andMoreira (2009) , Matrosov et al. (2011) , Mo et al. (2013) and Seifollahi-Aghmiuni et al. (2016) and examples of reservoir simulation software in Coelho and Andrade-Campos (2014) and Wurbs (2005) . This mathematical description is typically complemented with several hard and soft constraints on individual variables. Hard constraints are those constraints that cannot be violated under any circumstance and typically represent physical limits, such as non-negativity of storage and flow variables. Less commonly used hard constraints include equations to impose conservation of energy and wave travel times. Soft constraints, instead, are those constraints that should not be violated but that are not physically impossible to break ( Mayne et al., 2000 ), for example a minimum environmental flow requirement downstream of a reservoir. Soft constraints may be included in the optimization problem either as additional objectives or as hard constraints. Treating soft constraints as Fig. 3. Our proposed classification system of reservoir operation optimization methods based on the argument of the optimization problem. The list of algorithms is not intended to be exhaustive, but it covers the literature applications reviewed in this paper.
objectives allows exploring the trade-off between breaking the soft constraints and preventing a greater cost elsewhere. The downside is the increase in the number of objectives, which may increase the difficulty of solving the multi-objective optimization problem. Depending on the case study application, a balance can be found between the ease of optimization (which would suggest using hard constraints) and completeness of information delivered to decision makers (which would suggest using objectives). Some interesting examples of swapping constraints and objectives include Sigvaldson (1976) , which uses channel capacity as an objective rather than a constraint, as most commonly treated, and Koutsoyiannis and Economou (2003) , which uses deficit as a constraint rather than an objective.

Classification of methods by argument
This section presents our classification system of reservoir operation optimization methods, which focuses on a higher-level understanding of the decision variables to which they are applied (the argument of the optimization problem). For each method, we will review applications in the literature, and provide a short description of the most commonly used optimization algorithms, with reference for further reading on more mathematical details. The classification system is summarised in Fig. 3 , while further details about the adopted terminology are given in Fig. 4 . In Appendix A we also provide the detailed equations of the revised methods applied to an illustrative case study. The MATLAB code implementing these methods can be found at: https://github.com/ barneydobson/reservoir _ operation _ optimization _ examples .
In our classification we distinguish three main types of argument: 1. Release Sequences (RS) optimization methods. Optimization aims at finding the sequence of release decisions over a prescribed time period ( Fig. 5 a) that minimises operational objectives under a given scenario of forcing inputs, for example a given time series of reservoir inflows and water demand. RS optimization can be used to directly inform operational decisions if the underlying assumption that forcing inputs can be deterministically predicted is valid. The larger the deviations from the assumed deterministic scenario, the less effective the 'optimal' RS will actually be when applied in reality. Since forcing inputs are typically very uncertain and the mismatch between predictions and actual trajectories very large, 'optimal' RS are rarely implemented in practice. More commonly, RS optimization is an intermediate step within a more complex optimization process of the other two types below (discussed in Section 2.2.3 ). Another possible use of RS optimization is in what-if studies, for example to determine a reference baseline for comparison with other optimization solutions or to assess the upper bound of system performance   (2012) , Wurbs (1993) , Oliveira and Loucks (1997) , Draper and Lund (2004) , You and Cai (2008) , Raso et al. (2014) , Castelletti et al., 2007 , Koutsoyiannis andEconomou (2003) . An optimal Operating Policy, i.e. a function that returns a release decision for a given time step (u t ) depending on the system state (e.g. storage, S t ) at that time (t). (c) Schematic illustrating the working principle of Real Time Optimization. Here, the Release Sequence is re-optimized at every time-step over a rolling horizon (from current time t to t + h) for which input forcing forecasts are available, but only the first release decision of the sequence is actually implemented.
-the maximum that could be achieved with the existing infrastructure under the "ideal " assumption of perfect foresight of all future inflows (for an example see Castelletti et al., 2012 ).
2. Operating policy (OP) optimization methods. Optimization aims at finding the optimal operating policy, i.e. a function that can be used to determine the release conditional on the state of the reservoir system in the current time-step ( Fig. 5 b). In other words, optimization returns a strategy (the OP) for making release decisions, rather than the release decisions themselves. At each time-step, the optimal OP should return the decisions that will perform best over the expected trajectories of forcing inputs that may occur from that time-step onwards. The assumption here is not that the future forcing inputs trajectory is deterministically known (as with RS optimization) but only that the trajectories (historical or synthetic time-series) or distributions assumed in the OP optimization are representative of actual conditions. The state variables that OPs depend on typically include reservoir storage and time of year. They may also include other variables, e.g. current inflow ( Oliveira and Loucks, 1997 ), depending on the characteristics of the study site, the reservoir system equations and the chosen optimization method. 3. Real Time Optimization (RTO) methods. RTO uses an optimized RS over a rolling time horizon for which real-time forecasts of forcing inputs are available. The first release in the RS is implemented, and then at the next time step the optimization process is performed again with updated forecasts, as displayed in Fig. 5 c. RTO is ideal if real time computing resources and accurate input forecasts are available.
For each of the three above cases, our classification system ( Fig. 3 ) distinguishes optimization methods based on their key working principles, i.e. essential mathematical properties of the optimization problem formulation. For each method, the optimization problem can be solved using different algorithms, as shown in the last layer of our classification system. While there are certainly differences between algorithms under the same method, they do not significantly affect the broader type of reservoir system to which the overarching method is applicable (with the exception of algorithms for Mathematical Programming, as further discussed in Section 2.2.1 ). Therefore, in the following sections we will focus on the description of the different methods and only provide references for further details on the specific algorithms. These descriptions form the basis for our discussion in Section 3 , where we will compare the applicability of the various methods to different types of reservoir systems (for example, presence of multiple reservoirs or multiple objectives, linearity or non-linearity of the reservoir simulation model, etc.) and give practical guidelines towards selecting an appropriate method for a given system.

Methods for Release Sequence (RS) optimization
The first case identified by our classification system is that of Release Sequence (RS) optimization (see Fig. 4 for disambiguation of terminology). A RS is a sequence of reservoir release decisions over a prescribed time period. Thus, each release in the sequence is a variable in the optimization problem. An optimal RS is the release sequence for which an objective is minimized (under a given deterministic scenario of the system forcing inputs, e.g. reservoir inflows and demands), i.e.: * = min (1) where U is a matrix containing all releases over the simulation period for all the reservoirs in the system under study (i.e. a RS), J is the aggregated objective associated with these releases, and U * is the optimal RS. Since the optimization argument is the matrix U , the solution space to be explored in the optimization quickly grows with the length of the simulation period and the number of reservoirs. The large search space is a characteristic difficultly of RS optimization. The three most commonly used methods for RS optimization are summarized below. Mathematical programming . We classify as mathematical programming any method that exploits the mathematical properties of the optimization problem (for example, linearity and convexity of the constraints and objective) to efficiently find an optimal RS. As such, mathematical programming is most effective where speed is important and simplifications to fit the required assumptions (e.g. linearizing non-linear components) are acceptable. Mathematical programming employs a broad range of algorithms, distinguishable primarily by the level of non-linearity allowed in the objective and constraint definitions. Linear and quadratic programming algorithms (LP, QP) require that all constraints be described by linear equations and that the objectives be either linear (LP, e.g. applied to RS optimization by Hiew et al., 1989 andTerlaky, 2013 ) or quadratic (QP, e.g. Mariño and Loaiciga, 1985 ). While these assumptions are strong, the advantage of LP and QP is that they can quickly find global optima even for large RS optimization problems. However, as the linearity assumptions become less acceptable and nonlinear equations are needed for a more realistic representation of the reservoir system, non-linear programming (NLP) is required. Sequential linear programming (e.g. Martin, 1983;Grygier and Stedinger, 1985 ) and sequential quadratic programming (e.g. Boggs and Tolle, 1995 ) have been most commonly applied to RS optimization, however other NLP algorithms exist and continue to be developed (see, for example, Bazaraa et al., 2013 for a recent collection of available algorithms). The disadvantage of NLP algorithms is that they cannot guarantee reaching a globally optimal solution in usable computation time for many problems ( Bazaraa et al., 2013 ).
Value function estimation . This method exploits the dynamic nature of the optimization problem by breaking it into a sequence of easier to solve sub-problems, each relevant to one time-step in the simulation period. The key idea is to define a value function that, for each time-step, represents the cost it takes to transition from the state at that time-step (t) to the state at the final time-step of the simulation period (T) if only optimal decisions are made , i.e. via the optimal RS from t to T ( Hall and Buras, 1961 ). The value function can be derived by solving the recursive Bellman equation of dynamic programming ( Bellman, 1956 ), which has been extensively used for reservoir operation optimization for a long time -the first review of its application dating back to Yakowitz (1982) .
There are two primary strengths to the value function estimation method. Firstly, it does not impose any limitation on the level of nonlinearity of the objective or constraints. Secondly, the solution time only increases linearly with the length T of the simulation period (in contrast to the other RS optimization methods, which increase polynomially or worse) so that it can be applied to find optimal RS that are very long. The drawback is that, since at each time-step the numerical resolution of the Bellman equation requires the evaluation of all possible combinations of state variables (e.g. storages) and decision variables (e.g. releases), the solution time scales exponentially with the number of states and decisions. This problem was named by Bellman as the curse of dimensionality ( Bellman, 1956 ) and it severely limits the applicability of this method to large reservoir systems. A second drawback is that, since the value function is only defined at discrete points, interpolation between point evaluations is required. The first weakness compounds the second: the curse of dimensionality pushes towards using a coarser resolution and this makes the interpolation less accurate. Several variants of the discrete DP algorithm have been proposed to mitigate the problem in the context of reservoir operation optimization, for example incremental dynamic programming ( Hall et al., 1967 ) and dynamic programming successive approximation ( Shim et al., 2002 ), however none of these have established as standard practice. Another very important limitation of the value function estimation method, which no technical advances will overcome, is that the very definition of a value function requires a timeseparable objective (as discussed in Section 2.1 ), making the method incompatible with common performance metrics such as resilience metrics ( Hashimoto et al., 1982 ).
Heuristic optimization . This term covers a wide range of algorithms that can use very different working principles, but have as a common trait the fact that they attempt to find an approximate solution to a problem (in our case, highly non-linear and/or with large number of reservoirs) for which classic methods (mathematical programming and value function estimation in our case) are not applicable. Given such variety of heuristic optimization algorithms, we do not discuss the entire spectrum of options but highlight that the two most common methods currently in use for RS optimization are genetic algorithms (GA) ( Wardlaw and Sharif, 1999;H ı nçal et al., 2010 ) and particle swarm optimization (PSO) ( Kumar and Reddy, 2007;Noory et al., 2012 ). However, numerous other algorithms have been tried in the context of RS optimization, such as honey bees mating ( Haddad et al., 2006 ), ant colony optimization ( Kumar and Reddy, 2006 ), simulated annealing ( Georgiou et al., 2006 ) and many more ( Garousi-Nejad et al., 2016 ). To the best of the authors' knowledge, heuristic optimization was first applied to the RS optimization problem by Wardlaw and Sharif (1999) . Given that no single algorithm dominates in all cases, newer algorithms use a combination of optimization search strategies blended from different algorithms , which are selected in an adaptive manner throughout the optimization process. An example that appears to be very successful is the Borg algorithm .
The advantage of heuristic optimization is that it can be equally applied to linear or non-linear constraints and objectives, as well as to either time-separable or non-separable objectives. Hence it can be applied to problems where complex decisions are investigated (for example, planning drought revenue loss insurance as in Herman et al., 2014 ). Since heuristic optimization covers a large variety of algorithms it is difficult to make generic statements about its weaknesses, which may vary from one algorithm to another. However, one general comment is that, as the size of the RS increases (either due to many decisions per timestep or a long simulation period, or both) the solution time can become prohibitively long.

Methods for Operating Policy (OP) optimization
An Operating Policy (OP) is a function that takes the current state of the system and returns a release decision, or set of release decisions, to be implemented in the current time-step. At a minimum, the system state vector (i.e. the independent variable of the OP) should include the reservoir storages at the current time-step; in a more sophisticated OP it may also include additional information such as time of year (useful for reservoir systems with strong seasonal behaviour), reservoir inflows at the current or previous time-step ( Tejada-Guibert et al., 1995 ), or other information like flows at upstream locations in the reservoir network ( Giuliani et al., 2015a ). In the following we denote an OP as where U t is the vector of all release decisions to be made at time t, X t is the vector of relevant state variables (such as storages, reservoir inflows, etc.) at time t, and is a set of parameters to be determined as part of the OP optimization task. The OP optimization problem can be described by We classify OP optimization methods into three categories below.
Release sequence based (RSB) . The first step for these methods is to solve a RS optimization problem and thus obtain an optimal RS ( U * ) and associated optimal states ( X * ). The OP (the function m and its parameters in Eq. (2) ) is then derived as the result of a regression between the time series of state variables ( X * ) and the optimal RS ( U * ). In other words, the OP is a "generalization " of the optimal RS it originates from. It follows that a better OP is obtained when the optimal states ( X * ) cover the state space as widely as possible. This in turn is more likely to be achieved if the RS is optimized over a long simulation period. Resultantly, in most cases heuristic optimization will not be applicable for the RS optimization step, while either mathematical programming or value function estimation will need to be employed, hence imposing constraints on the objective formulation and model structure, as discussed in Section 2.2.1 . As for the second step, many sophisticated regression techniques have been demonstrated, most commonly artificial neural networks, fuzzy logic and decision trees ( Celeste and Billib, 2009;Kumar et al., 2012;Zhou et al., 2015 ). Since each regression algorithm has different benefits and drawbacks (discussed in referenced papers), it is unlikely that a single algorithm is preferable for all possible reservoir systems ( Labadie, 2004 ).
A limitation of the RSB method is that it provides an OP that is only 'optimal' to the accuracy of the regression, i.e. it is actually sub-optimal even under the deterministic scenario used in the RS optimization step. Furthermore, and possibly more importantly, the very RSB optimization problem is somehow ill-posed. In fact, the ultimate goal of reservoir optimization is to find the OP that minimizes the management objective, and not the distance from an "optimal trajectory " ( X * ) that most likely will never occur (because it is based on a deterministic scenario of uncertain input forcing). Directly minimizing the objective function is precisely the key idea of the DPS method described in the next section. The RSB method thus appears to be an unnecessarily indirect way to achieve (in a sub-optimal way) what DPS can achieve more directly.
Direct policy search (DPS) . These methods aim to directly derive the OP by directly finding the parameterization of a pre-selected function (m) that minimizes the objective under a deterministic time-series of forcing inputs, in a special case of Eq. (3) : * = min (4) The DPS approach can be linked back to early works by Maass et al. (1962) and Revelle et al. (1969) on the Linear Decision Rule (LDR). In fact, the OP of a single-purpose, single-reservoir and singledemand node reservoir system can be expressed by the LDR: where t is the LDR parameter, essentially the target storage for timestep t , to be optimized. If the objective and all constraints are linear, then the optimization problem can be formulated as a linear program and the set of optimal parameters (one per time-step) obtained by MP. The LDR approach can be expanded to include a non-linear OP, i.e. a non-linear version of Eq. (5) by using first order Taylor series approximation of non-linear objectives ReVelle, 1994, 1995;Pan et al., 2015 ). Heuristic optimization algorithms can be used to extend the applicability of DPS to non-linear objectives and constraints, as well as any function form for the OP beyond the linear case. All the heuristic optimization algorithms described previously ( Section 2.2.1 ) are in principle suitable to solve Eq. (4) . Indeed both GA ( Oliveira and Loucks, 1997;Koutsoyiannis and Economou, 2003;Ahmed and Sarma, 2005;Chang et al., 2005;Momtahen and Dariane, 2007;Li et al., 2014 ) and PSO ( Ostadrahimi et al., 2011 ) have been tested for this purpose. The additional benefits of heuristic optimization algorithms such as no limits on using time non-separable objectives ( Giuliani et al., 2014 ), spontaneous multi-objective formulation and scalability to many-objective problems will be further discussed in Section 3.2 .
As for the choice of the OP form, many options beyond the simple linear curve in Eq. (5) have been proposed. The OP may be represented by, for example, a piecewise linear function ( Oliveira and Loucks, 1997 ), as depicted in Fig. 5 b. Given that non-linear and piecewise constraints are handled by heuristic optimization algorithms, it is possible to introduce variable policy structures within the same reservoir system, for example to operate some reservoirs based on their inflows and some others based on their storage levels ( Ashbolt et al., 2016 ). Universal approximating functions can also be used, for example Artificial Neural Networks (ANN) ( Pianosi et al., 2011 ) or Radial Basis Functions (RBF), which according to Giuliani et al. (2015b ) can outperform ANNs in many different aspects. In all these cases, the parameter vector contains the weights and biases of the ANN or RBF. The advantage of such universal approximating functions is that they do not a priori constrain the OP to any specific structure, and that they scale efficiently with the number of input arguments of the approximating function ( Barron, 1993 ), i.e. in our case the number of independent variables of the OP. The drawback is that the resulting OP is a black-box that is difficult to interpret and therefore possibly more difficult to communicate to decision-makers. A possible solution to this problem is to optimize the 'rule curves' (for definition see OP disambiguation in Fig. 4 ) in use by the current operators, as done for example by Chang et al. (2005) and Zhou and Guo (2013) . However, rule curves have a lack of flexibility that will significantly degrade their performance even when their parameters have been optimised. They are also typically expressed in the form of target reservoir storages, and hence they do not provide explicit recommendation on water release/abstraction decisions. An alternative is to create OPs that have highly flexible structures but are easy to visualize, such as in the form of a decision tree ( Herman and Giuliani, 2018 ).
Expected value function estimation . The expected value function estimation method extends the value function estimation approach discussed for RS optimization to the case of OP optimization, i.e. optimization under uncertain forcing inputs. Typically, forcing inputs are regarded as stochastic variables described by probability distributions, and the OP is obtained by the minimization of the expected value of the value function. The solution algorithm, called Stochastic Dynamic Programming (SDP), follows similar steps as the discrete DP algorithm but with one more layer of discretization for the forcing input variables. The value function is thus evaluated against all possible combinations of the forcing inputs and the sample mean is used to approximates its expected value for each discretised state. Another possible approach, although much less common, is to describe forcing inputs by membership sets (rather than probability distributions) and search for the OP that minimizes the maximum possible value function ( Nardini et al., 1992 ).
The SDP algorithm has been widely used for reservoir operation (for reviews see Yakowitz, 1982;Nandalal and Bogardi, 2007 ). However, its applicability is subject to the same limitations as deterministic DP, i.e. the need for time-separable objectives, limited scalability to multi-objective problems, and the curse of dimensionality (as discussed in Section 2.2.1 ). The latter problem is even more severe here given that each state-decision combination must be evaluated against each combination of forcing input variables. To partially mitigate the computing burden, several variants of the SDP algorithm have been proposed, including the Neuro-Dynamic Programming (NDP) algorithm ( Castelletti et al., 2007 ), which uses a neural network to interpolate the value function evaluations, allowing for a coarser state discretization grid.
Another limitation of the SDP approach is that each forcing input must be characterized by an independent probability density function, which might be an overly simplistic approach for input processes (e.g. inflows), which typically exhibit complex temporal and spatial structures ( Carrillo et al., 2011 ). On the other hand, including temporal and spatial correlations among probability distributions would increase the number of variables required for discretization, up to a point that the problem becomes computationally intractable. An SDP variant that aims at overcoming the issue is Sampling SDP (SSDP), which uses a large number of sample inflow sequences in place of inflow probability distributions (see Kelman et al., 1990 for one of the earlier works on this, and Stedinger et al., 2013 for a more recent review). More recently, Reinforcement Learning (RL) algorithms have been demonstrated as viable options for approximating expected value functions using time-series of inflows rather than distributions (e.g. Castelletti et al., 2010Castelletti et al., , 2013Dariane and Moradi, 2016 ).

Real-Time Optimization (RTO)
When forcing inputs of the reservoir system are uncertain but a forecasting system is in place, Real-Time Optimization (RTO) is an interesting alternative to the OP approach. Differently from OP optimization, where the optimization task is concentrated in one effort, in RTO the optimization is repeated each time an operational decision needs to be taken. This allows for exploiting forecasts (for example inflow or demand forecasts) that are available from a continuously updated forecasting system. The optimization problem is typically formulated as an RS optimization with forcing inputs set equal to their (deterministic) forecasts, and initial system state reflecting the current system conditions. Long-term costs beyond the forecast horizon are accounted for by including a term that penalizes 'unfavourable' final states into the objective function. For example, for a supply reservoir the penalisation function would help finding a balance between maximising supply reliability over the forecast period and not leaving the storage depleted at the end of the period. The RTO problem is hence formulated as * = arg min where p represents the penalisation function, h is the length of the forecast period, U is the RS over the period [t, t + h-1], J [t,t + h-1] is the cost associated with implementing U over this forecasted period and X t + h represents the system state at the end of the forecasted period. The RTO problem in Eq. (6) can be solved by any of the RS optimization methods discussed in Section 2.2.1 , provided they can accommodate any nonlinearity associated with the penalization function. Although the optimal RS obtained through Eq. (6) provides release decisions over the entire forecast period, only a small portion of the sequence will actually be implemented. This may only be the releases for the current time-step, or it may be the releases until a new forecast becomes available. When new forecasts become available, the optimization process is performed again with updated forecasts and initial states ( Fig. 5 c). Hence, despite the mathematical formulation and the optimisation argument in Eq. (6) are the same as in RS optimisation, the 'practical' output of RTO is different -only the decision(s) to be made in the current time-step (or until the next forecast becomes available), instead of the entire sequence of decisions.
A key issue in the application of RTO is the adequate definition of the penalisation function. Different approaches have been demonstrated, from using deviations from seasonal 'target storages' (as given, for example, by the reservoir's filling curves, e.g. Ficchì et al., 2015 ), to linking the penalization function to the solution of an optimization problem where 'off-line' forecasts (e.g. seasonal distributions) are used in place of 'posterior' (real-time) forecasts (e.g. Galelli et al., 2014 ). In principal, any of the OP optimization algorithms introduced in Section 2.2.2 could be used to derive penalisation function. The expected value function has been demonstrated to be suitable in Pianosi and Soncini-Sessa (2009) . However, as discussed, expected value function optimization is not applicable to systems that contain many reservoirs or require many decisions to be made each time step.
As the deviations from the assumed deterministic forecast increase, the RTO release becomes less effective when applied in reality. Therefore, the benefits of using RTO are highly dependent on the quality of the real-time forecasts. If these forecasts are not significantly better than 'off-line forecasts' (e.g. seasonal distributions) then using an OP will be equivalent to RTO (but at lower implementation costs, as the optimization effort of an OP is done once and for all before the operation starts). This is the primary reason why RTO has only recently received significant attention, as a result of increasingly accurate forecasting systems ( Anghileri et al., 2016 ). A proven way to increase RTO performance in the presence of inaccurate forecasts is by explicitly taking into account forecast uncertainty in the optimization problem. This has been mainly implemented using two approaches. The first is to explicitly characterise forecast's uncertainty by probability distributions and solve the resulting stochastic optimization problem by expected value function estimation, which means the RTO problem is mathematically reformulated as an OP optimisation problem, instead of an RS optimisation problem as in Eq. (6) . The initial illustration of the idea (although with an extremely simplified flow forecasting approach) dates back to Bras et al. (1983) and more recent applications include Pianosi and Soncini-Sessa (2009) and Zhao et al. (2011) . The second is to optimize an RS against an ensemble of inflow forecasts, as done for instance by Zhao et al. (2011) , Raso et al. (2014 and Ficchì et al. (2015) . Interestingly, all these authors have found that including forecast uncertainty consistently outperforms any single deterministic 'worst-case' or 'most likely forecast' RTO approach.
Finally, an interesting question for RTO is the impact of the forecast horizon length (or 'lead-time') on RTO performance. For example, Zhao et al. (2012) investigated how forecast horizon length and forecast uncertainty trade off against each other, aiming to find the 'effective forecast horizon' for which the forecast provides the most valuable information for decision making. If the forecast horizon is short, the optimized decision is highly sensitive to the horizon length; as the horizon length increases, the decision becomes increasingly sensitive to forecast uncertainty. Interestingly, in Zhao et al. (2012) the inclusion of ensemble forecasts improved performance but had no effect on determining the effective forecast horizon. Seasonal forecasts (between a month and a year) with some skill are becoming widely available for water resources operators, although the value of seasonal forecasts to improve operation by RTO has proved limited thus far ( Celeste et al., 2008;Anghileri et al., 2016 ). We would expect this to become an increasingly active area of research as the skill of these forecasts improves (or perhaps as characterization of their uncertainties becomes more accurate).

Comparison and choice of reservoir optimization methods
In the previous sections we have briefly reviewed operation optimization methods individually. In this section we will discuss some concepts and properties that are relevant across methods and can be useful for the comparison and choice of the most adequate method for the problem at hand. We start by discussing how operating policy (OP) optimization methods could be further classified based on the approach they use to handle uncertainty in forcing inputs (typically reservoir inflows but possibly also other input variables/parameters like water demand or energy price), i.e. 'implicit' or 'explicit'. Such distinction is useful both in mathematical and in practical terms. Following this discussion, we debate the concept of 'optimality' within a practice-oriented research field such as reservoir operation optimization and compare the extent to which different methods can be regarded as 'optimal' given the uncertainties that the reservoir modelling and optimization process is subject to. Finally, we compare the ability of different methods to scale from single-objective to multi-objective problems, which, as anticipated in Section 2.1 , is a very important feature in the context of reservoir operation optimization. These topics and the advantages and disadvantages anticipated in Sections 2.2 are then brought together into a set of practical guidelines towards appropriate method selection.

Implicit versus explicit treatment of forcing inputs variability
In the context of OP optimization, a typical distinction is made between two 'classes' of optimization methods based on the way they handle the variability in forcing inputs (the distinction is used, for example, in the review by Labadie, 2004 ). These two classes are 'implicit stochastic optimization' and 'explicit stochastic optimization'. Implicit stochastic optimization accounts for variability in forcing inputs 'implicitly' by using a long and diverse realisation, typically a time series of historical observations or a synthetic time series generated by a statistical model. Explicit stochastic optimisation instead directly uses the statistical model within the optimisation process. By this distinction, Release Sequence Based (RSB) optimization and Direct Policy Search (DPS) both belong to the 'implicit stochastic optimization' class (although at the time of Labadie (2004) review, DPS was not a common approach and so implicit stochastic optimization was used almost as a synonym of RSB optimization), while the expected value function approach is an example of 'explicit stochastic optimisation'. Authors who have adopted the implicit-explicit divide seem to suggest that an explicitly approach is preferable because it is more rigorous. However, it should be noted that forcing input probability distributions are also subject to simplifying assumptions, such as simplification or omission of spatial and temporal correlations. Furthermore, probability distributions are estimated from historical data and therefore can also be affected by scarcity or poor quality of the data ( Koutsoyiannis, 2000;Chatfield, 2013 ). So, in our opinion the preference for explicit characterisation of uncertainty is often not strongly motivated, except for the simplest case of a single-input reservoir system where a complete characterization of inflow uncertainty by probability distribution is often possible. More generally, we would argue that classifying OP optimization algorithms based on the implicit-explicit divide is mathematically elegant but much less salient from a user's perspective.

Scaling methods to multi-objective optimization
As anticipated in Section 2.1 , reservoir operation is typically a multi-objective optimization problem as different objectives need to be achieved simultaneously. Multi-objective optimization problems can be approached through two distinct methods: a priori and a posteriori methods ( Cohon and Marks, 1975 ). A priori methods rely on a 'prior' weighting of the objectives in a way that (tries to) capture the decision maker's preferences. As a result of this weighting, a single aggregated objective function is obtained and the problem is traced back to a single-objective optimisation. A posteriori methods instead preserve the multi-objective nature of the problem and aim at creating a set of Pareto-optimal solutions, each realising a different trade-off between the multiple objectives. There are two ways to create a set of Paretooptimal solutions. One approach is to repeatedly solve a single-objective optimization problem with different aggregation weights of the multiple objectives. The main drawback of this approach is that the number of optimizations required to approximate the Pareto front (the representation of optimal tradeoff solutions in the objective space) at given resolution increases factorially with the number of objectives . The other approach is to obtain a complete set of Paretooptimal solutions in a single optimization run using population-based optimization techniques, which scale far more effectively with the number of objectives.
In the context of the reservoir operation optimization methods presented so far, we note that the algorithms for mathematical programming and value function estimation (and, by extension, for expected value, see Fig. 3 ) are inherently single-objective. A priori techniques are therefore the only available way to handle multi-objective problems if these approaches are used.
On the contrary, most Heuristic Optimization (HO) algorithms, and in particular population-based algorithms such as GA and PSO, can equally handle single or multi-objective optimization problems and therefore provide an entire set of Pareto solutions in a single optimization run in the multi-objective case ( Sharif and Wardlaw, 2000 ). They thus constitute an a posteriori approach, and make heuristic optimization (whether it is used to directly obtain a RS or within DPS) particularly efficient when the number of objectives is large . A review and comparison of many state-of-art population-based approaches for MO optimization for DPS is provided by Salazar et al. (2016) .
Finally, one exception to the distinction delineated above is the Reinforcement Learning (RL) algorithm introduced by Castelletti et al. (2013) , which is the first (and to the authors' knowledge, only) a posteriori algorithm for multi-objective expected value function optimization. This method is also applied and compared with DPS in Dariane and Moradi (2016) , where it is found that DPS outperforms RL, although it is not possible to conclude whether the result would hold in a water system of lower dimensionality in which the limitations of expected value function optimization are less prohibitive.

Optimality and modelling assumptions
The optimization algorithms reviewed in the previous sections provide different degrees of confidence with regard to the optimality or sub-optimality of their solutions. For example, a correctly executed mathematical programming algorithm provides an optimal solution of the optimization problem, value function approaches provide approximately optimal solutions (i.e. accurate to the resolution of the interpolation), while heuristic optimization algorithms give no guarantee of optimality and simply return the best solution that could be found in a given number of iterations. However, it is important to highlight that such 'optimality' statements are only valid within the given problem formulation. If the problem formulation is not 'correct', i.e. the underlying assumptions (for example, linear reservoir equations or a single, timeseparable objective) provide an oversimplified representation of the system behaviour, then the 'optimal' solution will perform sub-optimally when applied in the real world. This is an important factor to be considered when selecting an optimization method, as we will further discuss in the following section.

Practical guidelines towards selecting reservoir operation optimization methods
The literature review presented in the previous sections was aimed to provide practical information about the advantages and limitations of reservoir operation optimization methods. Another contribution of this review is to identify a set of guidelines for the selection of the most appropriate optimization method for a given reservoir system. Our advice is summarised in the comparison table presented in Fig. 6 . The table can be used to narrow down the number of suitable methods (horizontal axis) for a reservoir system of given characteristics (vertical axis).
For release sequence (RS) optimization, the choice is least obvious and highly dependent on the system characteristics. The greatest advantage of mathematical programming is its speed, scalability and the guarantee of analytical optimality (albeit under the caveat that the problem must be simplified to fit the required assumptions on objectives and constraints). Value function estimation can solve non-linear (and thus more realistic) optimization problems over long time periods (of the order of T = 10,000 time-steps, e.g. decades if the time-step is daily or centuries Fig. 6. A summary of how the computational tractability of different reservoir optimization methods varies depending on different characteristics of the reservoir system and decision-making problem. Green indicates that the optimization method is applicable with relatively low computational effort, red indicates that the method is technically applicable but with great computational effort, and N/A ( "Not Applicable ") means that the method cannot be applied at all. Other acronyms as in Fig. 3 . RSB-MP and RSB-VFE refer to Release Sequence Based optimization of an OP where Mathematical Programming or Value Function Estimation are used in the RS optimization step. RTO is not included because it can use methods from either release sequence optimization or operating policy optimization and so the same considerations apply.
if it is weekly) but is limited to small reservoir networks (up to 3-4 reservoirs at current computing power) and the solution's optimality is subject to the interpolation accuracy of the value function. Heuristic optimization is a more flexible method that can handle multiple objectives efficiently and allows for time non-separable objectives, but it is limited to short lengths of the simulation-optimization period.
For operating policy (OP) optimization, we suggest that Direct Policy Search (DPS) is the most widely applicable method and indeed, in recent years it has been the most commonly applied method, even if it cannot provide any assessment of the optimality or accuracy of the solution. Still, we do not think this is a major issue for practical purposes, given the difficulty in evaluating whether simplifying assumptions required by other methods are satisfied for the problem at hand, and to what extent. Similarly, expected value function estimation may still be preferred in those situations where it is computationally feasible (i.e. relatively small reservoir networks) and when one can reasonably presume that its underlying assumptions (in particular, time-separability of the objectives) are acceptable.
For Real-Time optimization, no specific method has been clearly established yet (although value function approaches have possibly been employed more frequently in the literature) but we would expect that more research will be carried out in this context given the increasing availability and advances in real-time monitoring and forecasting systems.

Conclusions
In this paper, we have reviewed the ever-growing body of literature in the field of reservoir operation optimization, based on a novel classification system that uses the argument of the optimization problem as the main criterion to classify methods. Our classification system shows that while the use of different arguments leads to substantially different problem formulations and types of solution (an optimal release sequence versus an optimal operating policy), the algorithms used for solving the optimization problem are to some extent interchangeable. We hope this way of introducing the literature enables to shift the focus from the mathematical properties of solution algorithms, which we expect to be less accessible to users, to the more obvious and tangible properties of the reservoir operation problem. We also provided a comparison between different types of optimization algorithms and some guidance as to what types of system they are more likely to be applica-ble to. Ultimately, we hope this paper will contribute to further improve the accessibility of this literature to a wider audience.

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.advwatres.2019.04.012 .

Appendix A
In this Appendix we present a simple example of a reservoir operation optimization problem that we use to describe the mathematical formulation of the different optimization methods reviewed in the paper. We note that we use a highly simplified simulation model, which omits several processes (e.g. evaporation) that would need to be represented in a real-world application, in order to allow the reader to focus on the specifics of the optimization problem formulation. We also provide the MATLAB code to implement some of these methods at the link: https:// github.com/barneydobson/reservoir _ operation _ optimization _ examples .
The system is a single reservoir, single release, single point of demand system, pictured Fig. A1 and characterised by the following variables and constant parameters: • S t : reservoir storage at time t; • S cap : reservoir capacity (constant); • I t : reservoir inflow at time t; • u t : release from the reservoir to the point of demand at time t; • d: volume of water demanded (assumed constant); • w t : reservoir spill at time t.
The objective of the system operation is to minimize the squared deficits, i.e. the squared differences between the release and the demand (d -u t ) 2 at each time t. We will assume a simulation period of length T, and that the system operator is interested in the cumulate squared deficits over this period: Fig. A1. Schematic of a single reservoir system for which we build on in this appendix.

Release Sequence (RS) Optimization
The form of RS optimization will be: Subject to the constraints: +1 = + − − for t = 1 , … , T = max ( 0 , +1 − ) for t = 1 , … , T ≥ 0 , ≤ , ≤ + for t = 1 , … , T S 1 l 1 l 2 , … , l T d S cap given (A3) Both heuristic optimization and mathematical programming directly solve Eq. (A2) . Because linear programming requires a linear objective it is not suitable, to make it suitable the objective in A1 would not be squared. Because quadratic programming requires linear constraints, and the spill in A3 contains 'max' it cannot be directly applied to the problem of A2/A3. Instead, spills can be treated as decision variables to make quadratic programming applicable -which we demonstrate in the example code. Nonlinear programming and heuristic optimization are both straightforward to apply to the problem.
Value function estimation requires a different approach where each element of the release sequence is obtained by minimizing the so called 'value function' V t at the relevant time-step: (subject to the constraints in ( A3 ) relevant to time t). The value function gives the minimum cost associated at each storage state S t , i.e. the total cost that would be encountered if, starting from that storage value at time t, only optimal decisions were made until the end of the optimisation period: By definition, the value function satisfies the recursive equation (also called Bellman equation, ( Bellman, 1956 )): The value function estimation algorithm hence exploits such recursive equation. It starts by finding the value function at the end of the period ( t = T ). Since no further costs are defined beyond the end of the period (i.e. V T + 1 (S T + 1 ) = 0 for any possible storage value S T + 1 ), the value function for the final storage (V T (S T )) can be obtained by simply minimising the step-costs g T of the final transition from T-1 to T . Because the storage state at the final time-step (S T ) is not known (as it is dependent on previous decisions) the minimisation is repeatedly solved at several discrete storage values. The value function will thus be defined for any final storage value within the accuracy of an interpolation between the evaluated discrete values. Once the value function for the final time-step (V T (S T )) has been defined, the value function of the previous time-step (V T-1 (S T-1 )) can be derived by applying Eq. (A3) at discretised values of the state S T-1 . The process is iterated backwards in time across the entire simulation period until the value function is defined for every time-step.
Operating Policy (OP) optimization For OP optimization we define an operating policy that determines the release depending on (a subset of) the system state variables. For example, a simple OP could be one that makes the release depend linearly on the current storage and inflow of the previous time-step: If an optimal RS is available (for example after solving problem ( A2 )), then the policy parameters can be estimated by least squares minimization of the difference between the optimal release sequence u * and the outputs of the OP: where I and S * are the inflows and storages associated with the optimal release sequence u * (also obtained when solving ( A2 )). Such an approach is what we called Release Sequence Based (RSB) optimisation. In contrast, the Direct Policy Search (DPS) formulation will take the following form: subject to the constraints in ( A2 ). As with RS optimization, whether mathematical programming or heuristic optimization is used to perform the minimization in ( A9 ), the form of the optimization problem does not change.
As with value function estimation for RS optimization, expected value function optimization requires first deriving value functions by exploiting the Bellman equation, and then defining the (implicit) operating policy as where E(.) represents the expected value operator -calculated over the probability distribution of inflows. Thus, an individual release is given by