An Efficient Robust Optimization Workflow using Multiscale Simulation and Stochastic Gradients

We present an efficient workflow that combines multiscale (MS) forward simulation and stochastic gradient computation - MS-StoSAG - for the optimization of well controls applied to waterflooding under geological uncertainty. A two-stage iterative Multiscale Finite Volume (i-MSFV), a mass conservative reservoir simulation strategy, is employed as the forward simulation strategy. MS methods provide the ability to accurately capture fine scale heterogeneities, and thus the fine-scale physics of the problem, while solving for the primary variables in a more computationally efficient coarse-scale simulation grid. In the workflow, the construction of the basis fuctions is performed at an offline stage and they are not reconstructed/updated throughout the optimization process. Instead, inaccuracies due to outdated basis functions are addressed by the i-MSFV smoothing stage. The Stochastic Simplex Approximate Gradient (StoSAG) method, a stochastic gradient technique is employed to compute the gradient of the objective function using forward simulation responses. Our experiments illustrate that i-MSFV simulations provide accurate forward simulation responses for the gradient computation, with the advantage of speeding up the workflow due to faster simulations. Speed-ups up to a factor of five on the forward simulation, the most computationally expensive step of the optimization workflow, were achieved for the ex- amples considered in the paper. Additionally, we investigate the impact of MS parameters such as coarsening ratio and heterogeneity contrast on the optimization process. The combination of speed and accuracy of MS forward simulation with the flexibility of the StoSAG technique allows for a flexible and efficient optimization workflow suitable for large-scale problems.


Introduction
We consider the life-cycle optimization of hydrocarbon production by manipulating well controls (pressure, rates or valve settings) for a given configuration of wells, a process also known as long-term production optimization or recovery optimization. In particular we consider robust optimization of strategies which maximize an objective function over an ensemble of reservoir model realizations to capture the effect of geologic uncertainty (Van Essen et al., 2009). The most efficient optimization techniques for this purpose are gradient-based with gradients computed using the adjoint method (Jansen, 2011). However, the implementation of an adjoint is time-consuming and requires access to the simulator source code. This has led to the growing popularity of stochastic gradient-based optimization techniques which are easy to implement and flexible to use with different reservoir simulators as well as a different control types. One of the most used stochastic gradientbased techniques is Ensemble Optimization (EnOpt) which was first introduced by (Chen et al., 2009). EnOpt uses a number of perturbed control vectors around a known control strategy where the number of perturbations is much lower than the total number of elements in the control vector (i.e. the number of well controls times the number of control time periods over the life time of the reservoir). The associated objective function values of these perturbed controls are used to approximate the gradient using linear regression. For optimization of a single reservoir model, i.e. deterministic optimization, the computational effort to estimate the gradient increases with the number of perturbed controls to be evaluated. However, it is shown in (Chen et al., 2009) that for optimization over multiple reservoir models, i.e. for robust optimization, pairing one reservoir model to one perturbed control strategy leads to a similar computational efficiency as gradient-based techniques. An improved version of EnOpt for robust optimization was introduced in , called Stochastic Simplex Approximate Gradient (StoSAG), which is theoretically more sound and produces higher-quality gradients compared to EnOpt. Inspite of the attractive computational efficiency many high fidelity simulations need to be run within the optimization workflow. For real field cases this can be computationally demanding. The forward simulation is the most time consuming aspect of any robust optimization workflow. Thus it is essential to improve the forward simulation performance. Additionally, it was shown in (Fonseca et al., 2015) that, although the one-to-one pairing is computationally efficient, higher quality StoSAG gradients can be achieved by using a larger number of perturbations. Thus improving the computational efficiency of individual forward simulations could also be used to achieve higher quality StoSAG gradients by simulating a larger number of perturbed control strategies which will consequently improve the overall optimization process.
An increase in computational efficiency for robust optimization workflows is usually achieved by two general approaches. The first approach is to use a subset of model realizations for the optimization which reduces the number of simulations needed, while the second approach is by using faster simulation models. An overview of different approaches and workflows for using a subset of model realizations along with the advantages and disadvantages of these methods can be found in (Barros et al., 2018). One of the ways to improve the computational efficiency of forward simulation models in optimization is through the use of Reduced Order Models (ROM) (Jansen and Durlofsky, 2016;Cardoso Durlofskyet al., 2010;van Doren et al., 2006). Alternatively, there, has been an increase in the applicability of different simulation strategies to speed up the computational process. One such technique is the Multiscale (MS) method (Hou and Wu, 1997;Jenny et al., 2003). MS methods aim to solve the equations at a coarser scale, yet preserving the fine scale description. MS methods have increasingly been demonstrated as an efficient and accurate technique for reservoir simulation (Ţene et al., 2015;Cusini et al., 2015;Kozlova et al., 2015Kozlova et al., , 2016. For an overview of important developments in MS methods we refer to . Among these developments, an important one in the scope of the present work is the development of iterative MS methods, more specifically the iterative Multiscale Finite Volume method (i-MSFV) (Hajibeygi et al., 2008;Hajibeygi and Jenny, 2011). Due to the localization assumptions utilized in the construction of the MS basis functions, the solution obtained via an MS scheme is not as accurate as the fine-scale solution. However, these discrepancies can be resolved if an iterative scheme is employed. Moreover, because i-MSFV provides a fine-scale error estimate and an approximate, however fully conservative velocity field, it allows for both accurate and efficient simulation with control of the error estimate (Hajibeygi et al., 2012).
Upscaling/homogenization techniques (Durlofsky, 2005), aim to compute effective model parameters to represent fine-scale properties at a coarser, computationally affordable, scale. Even though upscaling methods can provide accurate production/injection rates (Li and Durlofsky, 2016), which is essential for stochastic gradient-based techniques, it is often important to enable the accurate modeling of physical phenomena that requires a fine-scale heterogeneity representation, e.g. front displacements, capillarity effects, and mixing. On the other hand, MS simulation strategies aim to compute coarse scale primary variables, but are still able to represent an approximate solution at the fine scale, which is an advantage over upscaling techniques. However, this leads to, consequently, approximate reservoir responses, which will be utilized by stochastic gradient computation methods. On that note, it is shown in (de Moraes et al., 2017;Krogstad et al., 2011) that analytical gradients (e.g. adjoint-based ones) computed via MS strategies provide an accurate enough approximation of the true gradient, capable of providing optimization results comparable to fine-scale optimizations. Also, the quality of stochastic gradients compared to analytically computed ones is discussed in (Fonseca et al., 2015).
The remainder of the paper is organized as follows. First, we discuss the StoSAG optimization methodology followed by the multiscale reservoir simulation framework which has been implemented in-house and used in this study. We then provide the workflow used for the optimization experiments performed in this paper. The computational gain of the workflow is illustrated via an analysis of the computation complexity of the algorithms involved. Following the theoretical descriptions of these two building blocks we illustrate the advantages and computational gains achieved on two different reservoir models for optimization cases with and without geological uncertainties.

Stochastic gradient computation
In model-based optimization problems, traditionally, controllable variables, in individual wells or at field scale, such as injection or production rates, bottom-hole pressures (BHP) or inflow control valve (ICV) settings etc., are manipulated to maximize the value of an objective function such as Net Present Value (NPV) or Ultimate Recovery (UR). The controllable variables commonly referred to as controls are denoted by the vector u which consists of all the variables to be optimized at the different control time steps. In this section we briefly outline the StoSAG method first introduced in (Fonseca et al., 2014a). A detailed description of the method can be found in .
For a control vector u N u , where N u is the total number of controls and is the optimization iteration index, we generate an ensemble of multi-variate Gaussian distributed perturbed control vectors u { 1 , u 2 ,…, u } M , M being total number of ensemble members, around the control vector u with a user defined covariance matrix × C N N u u . At the first iteration the choice of the control vector is user-defined and at subsequent iterations determined by the optimization algorithm. The covariance matrix C is usually kept constant throughout the optimization although methods exists to adaptively update the covariance matrix see e.g. (Fonseca et al., 2014b;Stordal et al., 2016). The objective function J chosen to be optimized is then evaluated for each of the perturbed control vectors u { 1 , u 2 ,…, u } M which leads to a corresponding set of objective function values J { 1 , J 2 ,…, J } M . To estimate the stochastic gradient we assemble a mean-shifted matrix of the control vectors U defined as (2) The equations described above can be used for deterministic (single geological model) optimization. Recently many papers have investigated the theoretical and practical applications of stochastic gradients for robust optimization; see e.g. (Chen, 2008;Fonseca et al., 2014a) and references therein. In this paper we use a 1:1 ratio, i.e. one reservoir model to one perturbed control strategy, to robustly estimate the gradient. This leads to a slightly different notation for the vector of objective function anomalies given in (2) which for robust optimization is defined as .
where m are the geological model realizations which are equal in number to the number of perturbed control vectors. The StoSAG gradient is then calculated via linear regression and is given by where the superscript † indicates the Moore-Penrose pseudo inverse and g N u . The gradient calculated above is then utilized in a simple steepest ascent algorithm (Nocedal and Wright, 2006) to calculate an updated control vector until convergence is achieved. (Do and Reynolds, 2013) have shown that the formulation provided above has many commonalities with other stochastic gradient methods such as Simultaneous Perturbation Stochastic Approximation (SPSA) (Spall, 1992).

Multiscale reservoir simulation
Multiscale (MS) methods (Jenny et al., 2003;Hou and Wu, 1997), in summary, use locally constructed MS basis functions to enable the solution of the original fine-scale problem at a coarse scale and allow the solution to be accurately represented back to the fine-scale representation of the reservoir model.
In the context of this work we consider an immiscible, incompressible two-phase flow regime. Because MS methods are wellsuited to solve elliptic problems, a sequential strategy to solve the governing equations is traditionally considered. Here, an Implicit in Pressure, Explicit in Saturation (IMPES) coupling is employed. For further explanations about the governing equations and solution strategies see, e.g., (Aziz and Settari, 1979).
The two-level (Zhou and Tchelepi, 2012) MS system can be algebraically expressed as (Zhou and Tchelepi, 2008;Wang et al., 2014a) where R is an × N N C F restriction operator, P is an × N N F C prolongation operator, p N C is the coarse-scale pressure solution, q N F the source terms vector, and A is the × N N F F system matrix resulting from the flow equation discretization (Aziz and Settari, 1979). Also, N F and N C are, respectively, the fine grid and coarse grid number of gridblocks. The interpolated fine-scale pressure is obtained by where p R N F is the approximate fine scale solution. The prolongation operator P is constructed based on local basis functions. Different strategies to build the basis functions are available in the literature (Hou and Wu, 1997;Jenny et al., 2003;Efendiev and Hou, 2009;Møyner and Lie, 2016). The restriction operator R can be either constructed as the transpose of the prolongation operator in finite-elementbased multiscale methods, or simply be based on the grid geometry in finite-volume-based ones (Jenny et al., 2003).
Following the MS pressure equation solution, we solve for the saturations at the fine scale. Because of the hyperbolic nature of the transport equations, a fine grid is necessary in order to capture the local, sharp saturation fronts. Hyperbolic conservative laws require locally conservative velocity fields. However, MS approximate solutions are not conservative at the fine scale. There are different alternatives to build a fine-scale conservative velocity field to be used in the solution of the transport equation at the fine scale (Jenny et al., 2005;Hajibeygi and Jenny, 2011;Hajibeygi et al., 2012).
Finally, because the Courant-Friedrichs-Lewy (CFL) condition for numerical stability (Coatset al., 2003) can be very restrictive in terms of the time-step size selection, we employ an asynchronous time-stepping strategy (Chen et al., 2004). One should note that sequential implicit (Jenny et al., 2006) and adaptive implicit (Aziz and Settari, 1979) simulation strategies could be considered to address the CFL time-step restrictions. However, we emphasize that IMPES strategies can deliver efficient solution strategies due to the straight forward solution of saturation -it can be seen as a simple matrix-vector multiplication. Some simulators rely on IMPES and deliver very good performances (Tolstolytkin et al., 2014). In the context of optimization, which is the focus of this work, any efficient simulation coupling strategy could be considered. Also, MS strategies introduced by Lee et al., 2009), which solve the transport equation at the coarse scale, could be used in our proposed MS-StoSAG workflow.

MS-StoSAG workflow
The main idea of the workflow we present next is to speed up the forward simulations required for the evaluation of the objective function and for the StoSAG gradient computation used at every iteration during optimization. In this direction, regarding MS simulations, we want to avoid the overhead to the total simulation time due to the construction of basis function (Ţene et al., 2015). This can be achieved by employing the offline/online basis function construction strategy introduced by (Efendiev et al., 2015). The construction of the basis function, for all ensemble members, is performed at an offline stage, outside the main optimization loop. Online updates to the MS system are then performed in an online stage so that any required improvement due to MS solution approximations are compensated. The ensemble optimization workflow combined with the online/offline MS simulation strategy is shown in Fig. 1.
In our implementation, although a simple steepest-ascent with a variable step size optimization algorithm has been used, the framework is flexible to allow for the use of more sophisticated algorithms. Furthermore, for the robust optimization considered in this work, the uncertainty associated with the fine-scale parameters is accounted for through an ensemble of equiprobable geological realizations (Aarnes and Efendiev, 2008;Dostert et al., 2008). have addressed the application of MS methods for stochastic subsurface fluid flow modeling. Even though these MS methods have not been considered in this work, it can be observed that the framework here presented is independent of a particular MS method, and hence any formulation, including the stochastic MS version, could be considered in our proposed MS-StoSAG workflow.
A key aspect of our MS-StoSAG workflow is the online stage. This online stage happens during each forward MS simulation. In our simulator the flow (pressure) equation, is solved via the Multiscale Finite Volume (MSFV) method (Jenny et al., 2003). The MSFV method, by construction, is mass conservative (Jenny et al., 2003). The restriction operator in (5) is based on a finite volume integration operator at coarse scale resulting on a matrix of 0's and 1's (see e.g. (Wang et al., 2014a)), whilst the prolongation operator is based on basis functions constructed via the local solutions of the governing flow equation, subject to assumed local boundary conditions, without right-hand-side (RHS) terms where λ is the mobility and φ is the basis function value. This involves the definition of a primal coarse grid (on which the conservative coarsescale system is constructed) and a dual coarse grid, which is obtained by connecting coarse nodes. A coarse node is a fine cell inside (typically at the center of) each coarse cell. Basis functions are solved locally on these dual coarse cells. Such overlapping coarse and dual coarse grids are crucial for conservative solutions in MSFV. An illustration of the MSFV grids is provided in Fig. 2. As basis functions do not account for RHS terms, well terms are considered as supplementary functions, called well functions (Jenny and Lunati, 2009). Well functions are local solutions of the flow problem considering unity solutions at the well (i.e., p w = 1). An illustration of basis and well functions is provided in Fig. 3. The prolongation operator can be expressed in terms of the basis functions corresponding to each coarse cell = … j N 1, , C , and each well function w corresponding to all = … w N 1, , w wells adds a column vector to the porous rock prolongation operator We highlight that, in our implementation, the computation of well/ basis functions is only performed at the offline stage, as a pre-processing step. The basis functions are not updated because, firstly, the smoothing step of the iterative Multiscale Finite Volume (i-MSFV) method (discussed below) addresses the discrepancies left over from the MS solution stage and, secondly, it is more computationally efficient in the optimization context since basis functions can be computed a priori, i.e. at an offline stage. Building basis functions is a key element in the multiscale simulation framework. An optimum balance between efficiency and accuracy is directly associated with the coarsening ratio employed in the construction of the coarse grid. The ability of capturing the fine-scale heterogeneities and the position of the coarse nodes (Wang et al., 2016) are examples of factors that influence the choice of the coarsening ratio. In (Wang et al., 2014b) it is indicated that,   generally, a coarsening factor approximately equal to the square root of the number of grid-blocks in each direction leads to a good performance/efficiency trade-off. For the robust optimization experiments considered in this paper the basis functions for every single geological model realization have been built. However, because we do so at the offline stage, the computational cost of building all the basis function is relatively small when compared to the overall optimization cost. The approximate multiscale pressure solution provided by (6) is inaccurate when compared to the fine-scale solution of the pressure equation. Firstly, by design, the MSFV solution reflects the localization assumptions utilized to compute the basis functions in (8). Secondly, non-monotone solutions may occur for geological settings with high permeability contrasts, specially in low permeability regions (Hajibeygi, 2011) (as demonstrated, e.g., for the SPE10 benchmark case (Christie Bluntet al., 2001)). Again, the non-monotone solutions are associated with the construction of the MSFV basis functions. Different strategies have been proposed in the literature to address this issue, e.g. the monotone MSFV (m-MSFV) (Wang et al., 2016)    methods. However, in this work, all these discrepancies can be resolved if an iterative scheme is employed (Hajibeygi and Jenny, 2011).
The iterative Multiscale Finite Volume method (i-MSFV) is capable of resolving these differences by resolving the high frequency errors via some iterative (smoothing) solutions at the fine scale and resolving the low frequency errors via the MSFV coarse-scale solution. In brief, the method consists of re-writing Eqs. (5) and (6) in residual form and iteratively solving the resulting system of equations until a convergence criterion is met. Note that i-MSFV delivers conservative coarse-scale solutions after any iteration stage. As such, it is not used as a linear solver, but to maintain the desired user-defined accuracy. This is, in our implementation, the online stage that addresses the MS solution inaccuracies. This is illustrated in Fig. 4.
Regarding the conservative velocity field reconstruction, we also rely on the i-MSFV method, by smoothing the fine-scale i-MSFV solution until a sufficiently small fine-scale residual is achieved.
We address the CFL time-stepping restrictions by solving the transport equation using small time-steps, hence honoring the CFL condition, but keeping pressure and velocity fields unchanged. The transport solver time-step t s is limited by CFL conditions, but the flow solver time-step t p is not. The velocity field is kept unchanged until the transport solver time t s reaches t p . The simulator outputs the information required to compute the gradient and objective function when the final time t f is reached.

A note about computational complexity
We measure the computational efficiency of our workflow by comparing the relative computational complexity of a steepest ascent iteration using our MS-StoSAG strategy to the alternative of employing  a fine-scale simulation in the forward-model evaluations.
In the scope of our applications, because of the underlying physics and solution strategies employed, the computational cost of the forward simulation can be assumed to be the cost of the pressure equation linear system solution. This is because (1) for incompressible flow there is no partial derivative computation involved, (2) in the IMPES strategy the solution of the transport equation is proportional to a matrix-vector multiplication. Furthermore, the computational complexity of all other steps of the workflow can also be considered negligible when compared to the cost of solving a linear system.
The complexity of solving a linear system of size N is assumed to be O aN ( ) b , where a and b are constants associated with the specific linear solver employed. One steepest ascent iteration, disregarding any backtracking for the sake of simplicity, requires the evaluation of M forward simulations to evaluate all realizations' objective function values. Additionally, in order to estimate the StoSAG gradient in a 1:1 approach, we need to evaluate another M forward simulations.
is the cost for the MS simulation when empoying a two-stage i-MSFV strategy, where N iMS is the total number of i-MSFV iterations and N S is the total number of smoothing steps per i-MSFV iteration.
Given that we are interested in evaluating the relative gain of the workflow compared to ensemble optimization when fine-scale forward simulations are employed, we can say that, given our setting, this can be assessed by simply computing the ratio Thus, because of the discussion above, and because our steepest ascent stopping criteria is the maximum number of iterations, the computational gain of the MS-StoSAG workflow for the MS setting we use can be directly estimated from the computational cost ratio between fine-scale and MS cost to solve the flow equation. This will be the metric used when evaluating the computational cost in the numerical experiments.

Numerical experiments
We illustrate the application of our proposed framework for lifecycle waterflooding optimization of two different models. For both models, the controls are bottom-hole pressures in the injection and production wells. The objective function used for optimization is a standard economic objective, Net Present Value (NPV), as defined in Eq. (9), for which we use the prices provided in Table 1 where q o k , represents the oil production rate in m 3 /day, q wp k , is the water production rate in m 3 /day, q wi k , is the water injection rate in m 3 / day, r o is the price of oil produced in $/m 3 , r wp is the cost of produced water in $/m 3 , r wi is the cost of injected water in $/m 3 , t k is the difference between consecutive time steps in days, b is the discount factor expressed as a fraction per year, t k is the cumulative time in days corresponding to time step k, and t is the reference time period for discounting, typically one year.
In the MS simulation of all numerical experiments there is no basis function reconstruction. Instead, we delegate the resolution of the discrepancies left by the localization assumptions and inaccuracies due to outdated basis functions to the i-MSFV smoothing step. Also, the conservative velocity field is reconstructed directly from the smoothed pressure field. Therefore, we require the i-MSFV loop to reduce the fine scale residual with one order of magnitude, while the fine scale residual tolerance after smoothing is set to 10 6 . A l 2 -norm is employed to compute the residual norm. The stabilized bi-conjugate gradient iterative solver with ILUT preconditioner (Saad, 1994) is used as the finescale smoother.

Toy model -five-spot model
In the first numerical experiment, in order to evaluate the proposed workflow in a relatively controlled environment, a toy-model, consisting of a simple synthetic 2D inverted five-spot model is considered.    Table 2. For the optimization, we have 5 control variables per control time step. In this exercise we use 6 control time steps of 720 days each, thus we optimize 30 variables. The values of the bottom-hole pressures are bounded for the production wells between a minimum value of 28 MPa and a maximum value of 30 MPa. The injection well pressures are bounded between a minimum value of 30 MPa and maximum value of 32 MPa.

Deterministic life-cycle optimization
We first consider a single model realization. For this purpose we use an ensemble of 50 perturbed control vectors with a perturbation size defined as 10% of the difference between the min and max bounds of the controls. The stopping criteria used is a maximum number of 25 optimization iterations. The initial starting strategy is one wherein the injector well operates at a constant BHP of 31 MPa and the production wells at a constant BHP of 29 MPa. When working with reduced order or upscaled models it is imperative to compare and validate the results with respect to the fine scale model realization.

Multiscale optimization.
We consider two different coarsening ratios to test the efficiency of our proposed MS-StoSAG workflow. The two coarsening ratios are 7 × 7 and 3 × 3. Thus a deterministic optimization with the optimization parameters discussed above is performed for 3 different models, fine scale, i-MSFV 7 × 7 and i-MSFV 3 × 3. The optimization process is illustrated in Fig. 6. We observe that all three models find optimal strategies which produce approximately similar NPV values. We also observe that while the i-MSFV 7 × 7 model achieves a slightly lower NPV compared to the fine scale model, the i-MSFV 3 × 3 model achieves a higher NPV than the other two models. Thus the coarsest model used produced the highest NPV which may be counter-intuitive. Cross validation of the i-MSFV strategies on the 21 × 21 fine scale model produces NPV values which are indistinguishable from each other which could be the result of a relatively simple small model. Fig. 7 is a comparison of the optimal control strategies obtained from the different models. We observe that all the strategies are very similar to each other thus the objective function values are also similar.  4.1.1.2. Sensitivity to optimization parameters. The parameters which influence the computation of a stochastic gradient such as the ensemble size of the perturbed controls, perturbation sizes to generate the ensemble of controls, random seeds used to generate the perturbed controls were varied and tested with the different i-MSFV models. The results from this exercise followed the exact same trends as reported in earlier publications see e.g. (Fonseca et al., 2015) who investigated the impact of gradient quality based on the various parameter choices.

Robust life-cycle optimization.
Inclusion of uncertainty within the optimization framework is usually accounted for by utilizing an ensemble of equiprobable geological model realizations. In this paper a large ensemble of 1000 realizations of the five-spot model was generated via the decomposition of a reference permeability image using Principal Component Analysis parameterization. See (Jansen, Fig. 9. Comparison of the mean objective function value for the three different simulation strategies, fine scale (blue), i-MSFV 7 × 7 (orange) and i-MSFV 3 × 3 (red), toy model. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.) Fig. 10. Comparison of the controls that generated the highest NPV, toy model. 2013) for more details. Fig. 8 illustrates 4 different permeability realizations from a reduced ensemble of 50 members used in the optimization experiments. The reduced ensemble was selected by simulating all 1000 realizations with the same control vector and uniformly sampling realizations based on the highest, lowest and intermediate objective function values. We acknowledge that there exists a range of formal clustering techniques to carefully select a representative ensemble. For the proof-of-concept optimization experiments in this paper the representativeness of the selected ensemble to a larger ensemble is deemed to be outside the scope of this paper. The objective function used for the robust optimization experiments is the expected NPV calculated over the 50 realizations. Fig. 9 illustrates the evolution of the mean NPV through the optimization process for the different simulation models. The same coarsening ratios and initial starting strategy has been used as the deterministic case. The fine scale strategy achieves the highest objective function values followed by the i-MSFV 7 × 7 model and the i-MSFV 3 × 3 model. Though the five-spot model is relatively small it is reassuring that, irrespective of the different models (fine scale or i-MSFV) used, very similar objective function values have been achieved when geological uncertainties are accounted for. Fig. 10 is a comparison of the optimal control strategies for two different models, fine scale and i-MSFV 3 × 3. The optimal control strategies achieved after robust optimization are quite different from the strategies achieved for deterministic optimization which could be the impact of geological uncertainties, the correspondingly different basis functions and impact of coarsening ratio on the different realizations. A-priori determination of the impact of these factors on the results is non-trivial and is probably model dependent. Table 3 provides a comparison of the computational efficiency and speedup gained for the different models used for the experiments. We use the linear solver time to measure the speed up. This accounts for all computations performed by the i-MSFV solution strategy, i.e solution of the coarse system plus fine-scale smoothing. Also, it is well-known that the linear system solution is the most time-consuming part in reservoir simulation framework (Aziz and Settari, 1979).

Computational efficiency
An approximately factor 4 speedup can be achieved for this relatively small five-spot model. The average number of smoothing steps throughout the optimization is approximately similar, irrespective of the coarsening ratio used which is model-and problem-dependent. The tolerances of the residual for the MS solution and the smoothing steps are the same for all cases. In order to understand and validate the efficiency of the proposed framework a larger model is used below.

Kanaal reservoir model
A second example is considered to illustrate the effectiveness of our proposed workflow. We use a 99 × 99 2D model with grid blocks of 10 × 10 × 5 m. Thus we have approx. 10,000 grid blocks in this model compared to the 441 grid blocks in the five-spot model. The geological description used in this model has been inspired from a typical channelized North Sea reservoir. The channels (Kanaal in Dutch) are set in a sandy shale background with permeabilities ranging from 10 to 50 mD. The channels have permeabilities ranging from 250 to 700mD with constant porosities of 20%. This reservoir is developed using a line drive well configuration with 3 injection wells and 6 production wells as illustrated in Fig. 11. The fluid properties of this reservoir are given in Table 4.
An ensemble of 50 geological realizations has been created using geological modeling software (Schlumberger and Petrel, 2016) to be used for the robust optimization experiments. Each of the realizations has been constrained to well logs from 4 exploration wells which were generated from a base-case model. This base-case model does not form part of the ensemble of model realizations. An illustration of a subset of models is illustrated in Fig. 12. The life cycle period for this reservoir is 12 years. We allow the controls, i.e. the bottomhole pressures in all wells, to be manipulated once every 6 months, i.e. 24 we have control

Robust life-cycle optimization
For this example we focus on robust optimization experiments. The initial control strategy corresponds to the bottomhole pressures on their maximum bound for the injectors, i.e. a constant pressure of 32 MPa, and on their minimum bound for the producers, i.e. 28 MPa. Different coarsening ratios and their impact on the optimization are investigated. The coarsening ratios for this example are 3 × 3, 11 × 11 and 33 × 33. The fine scale model is 99 × 99. We observe in Fig. 13 that all the optimization experiments find solutions that achieve an objective function value higher than a reactive control strategy, where the production wells are shut-in once the water-oil ratio exceeds a preset maximum. The economic water cut for the reactive control strategy for the prices considered in this paper is 82%. The reactive control strategy has been used as a reference solution when evaluating the performance of life-cycle optimization techniques in line with the approach first introduced in (Van Essen et al., 2009). We also observe from Fig. 13 that different coarsening ratios achieve different optimal mean NPV values with similar trends as observed for the five-spot model. It is also important to note that, in this case, the optimization runs performed with the MS-StoSAG workflow provide mean NPV values higher than the one obtained by the fine-scale optimization.

Comparison of optimal strategies
Fig. 14 is an illustration of the optimal controls for 2 wells for the fine scale model and the coarser scale i-MSFV 11 × 11 model. The optimal controls obtained by the other models are significantly different from each other with the same trend observed for the other seven wells. This is a fact that has been consistently observed in the literature, first noted in. Considerably different control settings can lead to relatively similar NPV values due to the over-parameterization of the optimization problem. This is in accordance with numerous other studies; see, e.g., (Jansen et al., 2008;Van Essen et al., 2011;Do and Reynolds, 2013). A comparison of the optimal controls for two different i-MSFV models is illustrated in Fig. 15. Once again, we observe that different optimal control strategies achieve very similar objective function values.
In addition to visually recognising differences in the optimal control strategies we provide in Table 5 a comparison of the optimal volumes of oil and water produced and injected for the different simulation models. We observe an interesting trend in the results. Higher volumes of oil and water are produced in the i-MSFV models compared to the fine scale model. Additionally we observe that the higher the coarsening ratio, the higher are the production and injection volumes. An important aspect to be considered here is the ability of the local basis functions to represent the fine-scale heterogeneities. The coarsening ratio and the positioning of the coarse node vertices are instrumental in achieving higher quality basis functions. Different strategies exist to achieve higher quality basis functions (e.g. (Wang et al., 2016)), however, in our experiments, we rely on the ability of the i-MSFV to correct the MSFV solution approximations/errors.   15. Comparison of the optimal control strategies from i-MSFV 11 × 11 and i-MSFV 3 × 3 strategies, Kanaal model.

Table 5
Comparison of optimal total produced oil (Q o ) and water (Q wprod ) and total injected volume (Q winj ) for the different simulation strategies, Kanaal model.

Computational efficiency
In this section we compare the computational efficiency and speedup gained for the different coarsening ratios. For the larger Kanaal model an approximately 5 times speedup in computational efficiency is achieved as observed in Table 6. The tolerances of the residual for the MS solution and the smoothing steps are the same for all the cases. We also observe a clear trend in the average number of smoothing steps required for the different coarsening ratios with a higher number of smoothing steps being required for a coarser model.

Discussion
The multiscale implementation used in this work can be further improved upon to obtain an even higher computational efficiency similar to the results reported in (Krogstad et al., 2011). The speedup in the computational efficiency reported in (Krogstad et al., 2011) was obtained with coarse-scale representations for the both the flow and transport equations. In this paper the flow problem is solved at the coarse scale while the transport equation has been solved on the fine scale which improves the accuracy of our results.
Even though no basis function reconstructions were performed in our experiments, and the i-MSFV fine-scale smoothing performed remarkably well, that might not be the case when more complex physics are involved in the simulation model (e.g. gravity effects, capillarity, higher mobility ratios). However, even under more complex scenarios, MS strategies should still deliver considerable speedups given that basis functions need only be built adaptively and infrequently. Also, the employment of different velocity field reconstruction techniques could be necessary in the case that i-MSFV becomes expensive. But again, techniques like those presented by (Jenny et al., 2005;Hajibeygi and Jenny, 2011) benefit from adaptivity. Furthermore, the implementation of our MS reservoir simulator does not yet capitalize on all potential advantages of MS methods. MS methods are well suited to take full advantage of modern high performance-computing architectures. For instance, the solution of basis functions are embarrassingly parallelizable Manea et al., 2016), an advantage with respect to global-based ROM approaches. Moreover, we expect that the computational advantage will increase for larger-sized reservoir models, relying on the model-size-dependent computational performance of MS methods, as discussed, e.g., in (Jenny et al., 2003).
When working with coarse scale models for optimization it is imperative to validate the optimized strategy using the fine scale model to understand and quantify the impact of optimization. Reduced-order or upscaled models, though computationally very efficient, do not always produce similar production responses compared to the fine scale model. In our approach, because the transport problem is always solved on the fine scale, the optimal strategies need not be validated as is confirmed by the results in this paper. This is an additional attractive feature of the workflow proposed in this paper. When working with ROM methods such as POD (van Doren et al., 2006) or TPWL (Cardoso Durlofskyet al., 2010), multiple high-fidelity simulations need to be performed a-priori (akin to an offline stage) to develop the ROM and afterwards to verify the accuracy of the results. In our workflow we alleviate the need for these expensive pre-and re-computations. In our MS-StoSAG workflow the offline stage i.e., computation of the basis functions in the MS method is extremely cheap compared to the overall cost of the optimization process.

Conclusions
In this paper we presented a computationally efficient multiscale based stochastic optimization (MS-StoSAG) workflow, and applied it to two synthetic test cases. Deterministic and robust optimization experiments were performed to illustrate that significant improvements in objective function values (10-15% relative to the reactive control strategy) are attainable in a computationally efficient manner. The validation of our results highlights the accuracy of the multiscale forward simulation models. The results from our experiments show that an approximately five-times speedup can be achieved with scope for even higher speedups with our proposed workflow. We have illustrated that our MS-StoSAG workflow can achieve computationally efficient results, which improves the applicability of robust life-cycle optimization.