Iterative multiscale gradient computation for heterogeneous subsurface flow

Abstract We introduce a semi-analytical iterative multiscale derivative computation methodology that allows for error control and reduction to any desired accuracy, up to fine-scale precision. The model responses are computed by the multiscale forward simulation of flow in heterogeneous porous media. The derivative computation method is based on the augmentation of the model equation and state vectors with the smoothing stage defined by the iterative multiscale method. In the formulation, we avoid the additional complexity involved in computing partial derivatives associated with the smoothing step, treating it instead as an approximate derivative computation stage. The numerical experiments illustrate how the newly introduced derivative method computes misfit objective function gradients that converge to the fine-scale gradient as the iterative multiscale residual converges. The robustness of the methodology is investigated for test cases with high-contrast permeability fields. The iterative multiscale gradient method is thus a promising approach, with minimal accuracy-efficiency tradeoff, for large-scale optimization problems in heterogeneous porous media.


Introduction
Derivative computation is an important aspect of gradient-based optimization algorithms. When the objective (or cost) function evaluation involves the numerical simulation of discretized partial differential equations (PDEs), efficient gradient computation is of utmost importance. It is well documented in the literature that the most efficient and accurate ways of computing derivatives are the analytical Direct method (Anterion et al., 1989; Rodrigues, 2006; Oliver et al., 2008), when the number of cost functionals is greater than the number of parameters, and the Adjoint method (Chavent et al., 1975; Li et al., 2003; Oliver et al., 2008; Kraaijevanger et al., 2007; Rodrigues, 2006; Jansen, 2011), when the number of parameters is greater than the number of cost functionals. However, even when efficient gradient methods are considered, the forward model and the derivative information must be evaluated many times before the optimality conditions are met, and techniques to reduce the forward model simulation cost have therefore been proposed (Jansen, 2011; Cardoso et al., 2009; van Doren et al., 2006).
Multiscale (MS) simulation methods (Jenny et al., 2003; Hou and Wu, 1997) have been increasingly employed for the efficient solution of elliptic (Zhou and Tchelepi, 2008) and parabolic (Ţene et al., 2015) equations, more specifically subsurface flow problems in highly heterogeneous porous media. In addition, many developments extending their applicability to more complex physics have appeared in recent years (Lie et al., 2017).
Moreover, MS derivative computation strategies, based on MS forward simulation models, have also been the subject of study. An MS adjoint formulation (MS-ADJ) for single-phase subsurface flow was presented in Fu et al. (2010, 2011). MS adjoint computation methods for multiphase flow have also been developed (Krogstad et al., 2011; Moraes et al., 2017). More recently, a mathematical framework for the MS computation of derivative information has been developed (de Moraes et al., 2017). It has been highlighted in de Moraes et al. (2017); Fu et al. (2010) that inaccurate MS gradient computations can lead to inaccurate gradient directions. However, as indicated in de Moraes et al. (2017), strategies that improve the MS forward simulation solution (e.g. refinement of the MS coarse grid) result in better gradient estimates. Moreover, in Fu et al. (2010) it is suggested that an iterative MS gradient computation strategy could resolve the multiscale gradient inaccuracies. In this work, we develop an iterative multiscale gradient computation strategy that converges to the fine-scale gradient, thus allowing for error control and reduction of multiscale gradients. The derivative computation method is based on the generic mathematical framework introduced in de Moraes et al. (2017). By augmenting the MS model equation and state vectors with the i-MSFV smoothing stage, the framework is capable of providing derivative information at any desired accuracy, up to fine-scale precision. The augmentation is addressed via an implicit differentiation strategy. In the formulation, we avoid the additional complexity involved in computing partial derivatives associated with the smoothing step by also solving the corresponding derivative state equation only approximately. Moreover, the strategy seamlessly accommodates both the Direct and Adjoint methods.
The remainder of this paper is organized as follows. Firstly, we present an algebraic formulation of the i-MSFV method, suitable for the derivation of the derivative computation methods. Next, we derive the Direct and Adjoint methods to compute derivative information within the i-MSFV framework and present the corresponding algorithms. Numerical validation of the method against numerical differentiation is then presented. The numerical experiments show that the fine-scale gradient can be reproduced via the i-MSFV method if the forward simulation converges to a small enough residual tolerance. We show numerical evidence of the relationship between gradient quality and the i-MSFV solution residual by comparing fine-scale and i-MSFV gradients and relating the difference between the two to the pressure error norm. Concluding remarks are presented in the last section.

Algebraic and algorithmic description of the multiscale iterative method
We consider the set of equations that algebraically describes the forward simulation at the fine scale, without any assumption regarding the underlying physical model, as (de Moraes et al., 2017)

$\mathbf{g}_F(\mathbf{x}_F, \boldsymbol{\theta}) = \mathbf{0}, \qquad (1)$

where $\mathbf{g}_F : \mathbb{R}^{N_F} \times \mathbb{R}^{N_\theta} \rightarrow \mathbb{R}^{N_F}$ represents the set of algebraic forward model equations, $\mathbf{x}_F \in \mathbb{R}^{N_F}$ is the state vector (which, for single-phase flow, contains the grid block pressures), and $\boldsymbol{\theta} \in \mathbb{R}^{N_\theta}$ is the vector of parameters; the subscript $F$ refers to 'fine scale'. There are $N_F$ fine-scale cells and $N_\theta$ parameters. Eq. (1) implicitly assumes a dependency of the state vector $\mathbf{x}_F$ on the parameters $\boldsymbol{\theta}$, i.e.

$\mathbf{x}_F = \mathbf{x}_F(\boldsymbol{\theta}). \qquad (2)$

Once the model state is determined, the observable responses of the forward model are computed. The forward model responses may depend not only on the model state, but also on the parameters themselves, and can be expressed as

$\mathbf{y}_F = \mathbf{h}_F(\mathbf{x}_F, \boldsymbol{\theta}), \qquad (3)$

where $\mathbf{h}_F : \mathbb{R}^{N_F} \times \mathbb{R}^{N_\theta} \rightarrow \mathbb{R}^{N_y}$ represents the output equations (Jansen, 2016). It is assumed that $\mathbf{g}_F$ can be described as

$\mathbf{g}_F(\mathbf{x}_F, \boldsymbol{\theta}) = \mathbf{A}(\boldsymbol{\theta})\,\mathbf{x}_F - \mathbf{q}(\boldsymbol{\theta}), \qquad (4)$

where $\mathbf{A}(\boldsymbol{\theta}) \in \mathbb{R}^{N_F \times N_F}$ is the system matrix and $\mathbf{q}(\boldsymbol{\theta}) \in \mathbb{R}^{N_F}$ is the right-hand-side vector. A two-stage multiscale (MS) solution strategy (Jenny et al., 2003; Wang et al., 2014) can be devised by firstly computing a coarse-scale solution

$(\mathbf{R}\mathbf{A}\mathbf{P})\,\breve{\mathbf{x}} = \mathbf{R}\,\mathbf{q}, \qquad (5)$

where $N_C$ is the number of coarse grid blocks, and then an approximated fine-scale solution

$\mathbf{x}'_F = \mathbf{P}\,\breve{\mathbf{x}}. \qquad (6)$

The so-called prolongation operator $\mathbf{P} = \mathbf{P}(\boldsymbol{\theta})$ is an $N_F \times N_C$ matrix that maps (interpolates) the coarse-scale solution to the fine-scale resolution. The so-called restriction operator $\mathbf{R} = \mathbf{R}(\boldsymbol{\theta})$ is defined as an $N_C \times N_F$ matrix that maps the fine scale to the coarse scale. More information about how these operators are constructed for the Multiscale Finite Volume (MSFV) method can be found in Zhou and Tchelepi (2008); Wang et al. (2014). Here, $\breve{\mathbf{x}} \in \mathbb{R}^{N_C}$ is the coarse-scale solution ($N_C \ll N_F$), and $\mathbf{x}'_F \in \mathbb{R}^{N_F}$ the approximated fine-scale solution.
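As an illustration, the two-stage solve of Eqs. (5) and (6) can be sketched for a generic linear system. The piecewise-constant prolongation and the averaging restriction below are simple stand-ins for the actual MSFV operators built from basis functions, and the tridiagonal system is a toy 1-D diffusion problem, not a reservoir model.

```python
import numpy as np

def two_stage_ms_solve(A, q, R, P):
    """Two-stage multiscale approximation: solve the restricted coarse
    system (R A P) x_c = R q, then prolong back to the fine scale."""
    A_c = R @ A @ P                         # N_C x N_C coarse system matrix
    x_c = np.linalg.solve(A_c, R @ q)       # coarse-scale solution
    return P @ x_c                          # approximate fine-scale solution x' = P x_c

# Toy 1-D diffusion system: A is tridiagonal, N_F = 9, N_C = 3.
N_F, N_C = 9, 3
A = 2.0 * np.eye(N_F) - np.eye(N_F, k=1) - np.eye(N_F, k=-1)
q = np.ones(N_F)
# Piecewise-constant prolongation over blocks of 3 cells; R averages each block.
P = np.kron(np.eye(N_C), np.ones((3, 1)))
R = P.T / 3.0
x_prime = two_stage_ms_solve(A, q, R, P)
```

Because the coarse problem is solved exactly, the restricted residual vanishes (R A x' = R q holds), while the fine-scale residual q - A x' is generally nonzero; removing that residual is precisely the role of the iterative strategy described next.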
The iterative multiscale strategy (Hajibeygi et al., 2008) can be devised by considering versions of Eqs. (4) and (5) written in residual form. Let $\mathbf{x}^{\nu-1}$ be an approximate solution to Eq. (4) at iteration $\nu - 1$ and let

$\mathbf{r}^{\nu-1} = \mathbf{q} - \mathbf{A}\,\mathbf{x}^{\nu-1} \qquad (7)$

be the corresponding residual. A multiscale improvement to this approximation can be devised by writing Eq. (5) in residual form as

$(\mathbf{R}\mathbf{A}\mathbf{P})\,\breve{\mathbf{x}}^{\nu} = \mathbf{R}\,\mathbf{r}^{\nu-1}. \qquad (8)$

Here, $\breve{\mathbf{x}}^{\nu}$ is redefined as the coarse-scale correction. Redefining Eq. (6), we have

$\mathbf{x}'^{\,\nu} = \mathbf{P}\,\breve{\mathbf{x}}^{\nu}, \qquad (9)$

such that $\mathbf{x}'^{\,\nu}$ now represents the approximate fine-scale correction at iteration $\nu$, i.e.,

$\mathbf{x}^{\nu}_{MS} = \mathbf{x}^{\nu-1} + \mathbf{x}'^{\,\nu} \qquad (10)$

is an approximate solution of Eq. (4) augmented with the correction from the coarse-scale calculation; combining Eqs. (8)-(10),

$\mathbf{x}^{\nu}_{MS} = \mathbf{x}^{\nu-1} + \mathbf{P}\,(\mathbf{R}\mathbf{A}\mathbf{P})^{-1}\mathbf{R}\,\mathbf{r}^{\nu-1}. \qquad (11)$

The approximate solution provided by Eq. (11) can be improved if successive smoothing steps are employed (Hajibeygi et al., 2008). Let

$\mathbf{r}'^{\,\nu-1} = \mathbf{q} - \mathbf{A}\,\mathbf{x}^{\nu}_{MS} \in \mathbb{R}^{N_F} \qquad (12)$

be the residual obtained from the approximation given by Eq. (11), i.e., a version of Eq. (4) written in residual form. The smoothing stage then solves

$\mathbf{A}\,\mathbf{x}_{s}^{\nu} = \mathbf{r}'^{\,\nu-1}, \qquad (13)$

where $\mathbf{x}_{s}^{\nu} \in \mathbb{R}^{N_F}$ is the smoothed fine-scale correction at iteration $\nu$. The solution smoothing is obtained by solving Eq. (13) using any iterative solver up to a prescribed (loose) tolerance or a (small) maximum number of iterations (Hajibeygi et al., 2008; Ţene et al., 2015). The solution for a given iteration is

$\mathbf{x}^{\nu} = \mathbf{x}^{\nu}_{MS} + \mathbf{x}_{s}^{\nu} = \mathbf{x}^{\nu-1} + \mathbf{x}'^{\,\nu} + \mathbf{x}_{s}^{\nu}. \qquad (14)$

The MS iterative strategy is fully depicted in Algorithm 1, where $\|\cdot\|$ represents the 2-norm.
In Algorithm 1, two user-defined tolerances appear: one for the outer loop and one for the smoothing step. The latter allows one to control the smoothing step as a relative improvement starting from the MS approximate solution. An investigation of an optimal relationship between the number of outer loops and the number of smoothing steps is presented in Ţene et al. (2015).
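The loop of Algorithm 1 can be sketched as follows. A plain Jacobi sweep stands in for the smoother (the paper employs more capable smoothers, e.g. BiCGSTAB), and the operator names mirror the algebraic description above; the tolerances and iteration counts are illustrative defaults.

```python
import numpy as np

def ims_solve(A, q, R, P, tol=1e-8, n_smooth=5, max_outer=200):
    """Sketch of the iterative MS strategy: alternate coarse-scale
    corrections with a few fine-scale smoothing sweeps until the
    residual 2-norm drops below the outer-loop tolerance."""
    x = np.zeros_like(q)
    A_c = R @ A @ P                          # coarse system (R A P)
    D_inv = 1.0 / np.diag(A)                 # Jacobi smoother (stand-in)
    for _ in range(max_outer):
        r = q - A @ x
        if np.linalg.norm(r) <= tol:         # outer-loop convergence check
            break
        x = x + P @ np.linalg.solve(A_c, R @ r)   # coarse correction, Eqs. (8)-(10)
        for _ in range(n_smooth):                 # approximate smoothing stage, Eq. (13)
            x = x + D_inv * (q - A @ x)
    return x
```

In contrast to the single two-stage pass, this loop drives the fine-scale residual to the prescribed tolerance, which is the property the gradient computation below inherits.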

Iterative multiscale gradient computation
Algorithm 1: Iterative multiscale method (Hajibeygi et al., 2008).
Input: the fine-scale system (A, q), the multiscale operators (R, P), and the user-defined tolerances and iteration limits. Output: an approximate solution x of the linear system A x = q.

For the developments that will follow in this section, it is convenient to write the set of equations that is solved in every iteration, namely Eqs. (8), (10), (13) and (14), in matrix form as

$\begin{bmatrix} \mathbf{R}\mathbf{A}\mathbf{P} & \mathbf{0} & \mathbf{0} \\ -\mathbf{P} & \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{A} & \mathbf{A} \end{bmatrix} \begin{bmatrix} \breve{\mathbf{x}}^{\nu} \\ \mathbf{x}'^{\,\nu} \\ \mathbf{x}_{s}^{\nu} \end{bmatrix} = \begin{bmatrix} \mathbf{R}\,\mathbf{r}^{\nu-1} \\ \mathbf{0} \\ \mathbf{r}^{\nu-1} \end{bmatrix}, \qquad (15)$

with the iterate then updated according to Eq. (14). One must note, however, that, in the i-MSFV procedure, Eq. (13) is solved only approximately and, therefore, strictly speaking the equation in the third row of Eq. (15) does not hold. The idea here is to describe the procedure in an algebraic manner, ignoring this approximation, in order to facilitate the presentation of the derivative calculation algorithms in the next section. Once the derivative calculation methods are obtained under the assumption that the algebraic relations in Eq. (15) hold, the same type of smoothing approach employed in the i-MSFV to resolve high-frequency errors will be used in the solution of the derivative information. This results in a practical semi-analytical algorithm for derivative calculation in an iterative formulation. Note that a fully analytical procedure would require calculating the derivative of the smoothing operator employed in the i-MSFV (step 8 in Algorithm 1), which can be quite complex due to its nonlinear character (Frank, 2017). The proposed semi-analytical approach also becomes general and applicable to any iterative procedure used in step 8, regardless of its nature. A truly analytical derivative method would require the implementation of derivative calculation for each iterative procedure used. More details on the implications of this assumption are discussed in Section 3.2.1.
Considering all equations that must be solved for all iterations $\nu \in \{1, \ldots, n_{it}\}$, they can be collated in a super-vector (Rodrigues, 2006; Kraaijevanger et al., 2007; Jansen, 2016) fashion as

$\boldsymbol{g}(\boldsymbol{y}, \boldsymbol{\theta}) = \mathbf{0}, \qquad (16)$

with the super-state vector defined as

$\boldsymbol{y} = \left[\, (\breve{\mathbf{x}}^{1})^{T},\ (\mathbf{x}'^{\,1})^{T},\ (\mathbf{x}_{s}^{1})^{T},\ \ldots,\ (\breve{\mathbf{x}}^{n_{it}})^{T},\ (\mathbf{x}'^{\,n_{it}})^{T},\ (\mathbf{x}_{s}^{n_{it}})^{T} \,\right]^{T}. \qquad (17)$

Note that we use bold-italic notation to represent super-vectors. It is discussed in Rodrigues (2006); de Moraes et al. (2017) how any derivative information can be efficiently computed from the pre- and post-multiplication of the sensitivity matrix $\mathbf{G} \in \mathbb{R}^{N_y \times N_\theta}$ by arbitrary matrices as

$\mathbf{W}\,\mathbf{G}\,\mathbf{V}, \qquad (18)$

where V (of order $N_\theta \times m_V$) and W (of order $m_W \times N_y$) are arbitrary matrices, defined based on the derivative information one wants to obtain. Eq. (18) requires the partial derivatives of the model equations with respect to the parameters. From Eqs. (5) and (6), as discussed in de Moraes et al. (2017), these involve the derivatives of the system matrix and of the prolongation operator, $\partial \mathbf{A}/\partial \boldsymbol{\theta}$ and $\partial \mathbf{P}/\partial \boldsymbol{\theta}$; differentiating Eq. (13) with respect to $\boldsymbol{\theta}$ yields the corresponding expression for the smoothing equation, and from Eq. (14) the derivative of the solution update follows. For the sake of simplicity, and in coherence with the MSFV method used in the numerical experiments, the dependency of R on $\boldsymbol{\theta}$ is neglected. Now, the order in which the operations in Eq. (18) are evaluated defines the Direct and Adjoint methods. The derivation of both methods is discussed in the next sections.
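To make Eq. (18) concrete, the following sketch computes W G V for a single (non-iterative) linear model A(θ) x = q by implicit differentiation of g = A(θ) x - q = 0. The diagonal dependence of A on θ is a hypothetical choice for illustration only; in the MSFV setting A, P and the right-hand side all carry parameter dependence.

```python
import numpy as np

def sensitivity_WGV(theta, q, W, V):
    """Compute W G V for the model A(theta) x = q, where A(theta) is
    built so that dA/dtheta_k is known analytically (illustrative)."""
    n = len(theta)
    A = np.diag(theta) + np.eye(n, k=1) * 0.1 + np.eye(n, k=-1) * 0.1
    x = np.linalg.solve(A, q)
    # Implicit differentiation of g = A(theta) x - q = 0:
    #   (dA/dtheta_k) x + A (dx/dtheta_k) = 0
    #   => dx/dtheta_k = -A^{-1} (dA/dtheta_k) x
    G = np.zeros((n, n))
    for k in range(n):
        dA = np.zeros((n, n))
        dA[k, k] = 1.0                       # dA/dtheta_k for this toy A(theta)
        G[:, k] = np.linalg.solve(A, -dA @ x)
    return W @ G @ V
```

Choosing W as a row of observation weights and V as a column of parameter directions recovers, respectively, adjoint-type and direct-type products, which is exactly the flexibility Eq. (18) provides.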

The direct method
If W is factored out in Eq. (18), it can be rewritten as

$\mathbf{W}\mathbf{G}\mathbf{V} = \mathbf{W}\,(\mathbf{G}\mathbf{V}) = \mathbf{W}\left(\frac{\partial \boldsymbol{h}}{\partial \boldsymbol{y}}\,\boldsymbol{S} + \frac{\partial \boldsymbol{h}}{\partial \boldsymbol{\theta}}\,\mathbf{V}\right), \qquad (23)$

where $\boldsymbol{S} = (d\boldsymbol{y}/d\boldsymbol{\theta})\,\mathbf{V}$ is solved from

$\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{y}}\,\boldsymbol{S} = -\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{\theta}}\,\mathbf{V}. \qquad (25)$

The linear system described in Eq. (25) can be re-written in a blockwise fashion for each iteration $\nu$:

$\left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{y}}\right)^{\nu,\nu}\boldsymbol{S}^{\nu} = -\left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{\theta}}\right)^{\nu}\mathbf{V} - \left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{y}}\right)^{\nu,\nu-1}\boldsymbol{S}^{\nu-1}, \qquad (26)$

where the diagonal and off-diagonal blocks follow from Eqs. (8), (10), (13) and (14). The partitioning indicates which matrix and vector terms belong to each iteration.
Substituting these blocks in Eq. (26), it follows that, for every iteration, the linear system that must be solved for the coarse-scale equation is

$(\mathbf{R}\mathbf{A}\mathbf{P})\,\breve{\mathbf{S}}^{\nu} = -\left(\frac{\partial \mathbf{g}_{C}^{\nu}}{\partial \boldsymbol{\theta}}\right)\mathbf{V} - \mathbf{R}\mathbf{A}\,\mathbf{S}^{\nu-1}, \qquad (30)$

for the fine-scale approximate solution equation

$\mathbf{S}'^{\,\nu} = \mathbf{P}\,\breve{\mathbf{S}}^{\nu} + \sum_{k}\frac{\partial \mathbf{P}}{\partial \theta_{k}}\,\breve{\mathbf{x}}^{\nu}\,\mathbf{V}_{k}, \qquad (31)$

and for the smoothing equation

$\mathbf{A}\,\mathbf{S}_{s}^{\nu} = -\left(\frac{\partial \mathbf{g}_{s}^{\nu}}{\partial \boldsymbol{\theta}}\right)\mathbf{V} - \mathbf{A}\,\mathbf{S}'^{\,\nu} - \mathbf{A}\,\mathbf{S}^{\nu-1}. \qquad (32)$

Here $\breve{\mathbf{S}}^{\nu}$, $\mathbf{S}'^{\,\nu}$ and $\mathbf{S}_{s}^{\nu}$ denote the blocks of $\boldsymbol{S}$ associated with $\breve{\mathbf{x}}^{\nu}$, $\mathbf{x}'^{\,\nu}$ and $\mathbf{x}_{s}^{\nu}$, respectively, $\mathbf{V}_{k}$ is the $k$-th row of V, and $\partial \mathbf{g}_{C}^{\nu}/\partial\boldsymbol{\theta}$ and $\partial \mathbf{g}_{s}^{\nu}/\partial\boldsymbol{\theta}$ collect the explicit parameter dependencies of the coarse-scale and smoothing equations. As pointed out above, it would not be feasible to fully solve Eq. (32). In order to obtain a practical derivative calculation method, the same smoothing procedure used in the i-MSFV is applied here, i.e., $\mathbf{S}_{s}^{\nu}$ is obtained by solving Eq. (32) using any iterative solver up to a prescribed (loose) tolerance or a (small) maximum number of iterations. Finally, from Eq. (14),

$\mathbf{S}^{\nu} = \mathbf{S}^{\nu-1} + \mathbf{S}'^{\,\nu} + \mathbf{S}_{s}^{\nu}. \qquad (34)$

Observing that h only depends on $\mathbf{x}^{n_{it}}$, Eq. (23) can be simplified to

$\mathbf{W}\mathbf{G}\mathbf{V} = \mathbf{W}\left(\frac{\partial \mathbf{h}}{\partial \mathbf{x}^{n_{it}}}\,\mathbf{S}^{n_{it}} + \frac{\partial \mathbf{h}}{\partial \boldsymbol{\theta}}\,\mathbf{V}\right). \qquad (35)$

Now the Direct method for the iterative multiscale gradient computation can be fully determined. It is depicted in Algorithm 2. Note that references to superscript $\nu - 1$ correspond to zero terms.
Algorithm 2: Post multiplication of G by V via the direct method.
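A minimal sketch of the direct (forward) accumulation through the iterations: the two-stage i-MSFV update is abstracted as a stationary iteration x ← x + M (q - A x), where M stands in for the combined coarse-correction/smoothing operator, and the derivative state S is propagated alongside. In line with the semi-analytical treatment in the paper, the dependence of M on θ is neglected; note that the fixed point of the S-recursion is nevertheless the exact sensitivity -A⁻¹(∂A/∂θₖ)x, so the derivative converges together with the forward iteration.

```python
import numpy as np

def direct_iteration_sensitivities(A_of, dA_of, q, theta, M, n_it):
    """Forward (direct) propagation of derivative state through the
    stationary iteration x <- x + M (q - A x). The dependence of M on
    theta is neglected, mirroring the paper's approximate treatment of
    the coarse-correction/smoothing operators."""
    n, m = len(q), len(theta)
    A = A_of(theta)
    x = np.zeros(n)
    S = np.zeros((n, m))                     # S[:, k] approximates dx/dtheta_k
    for _ in range(n_it):
        # differentiate the update w.r.t. theta_k alongside the state
        for k in range(m):
            S[:, k] = S[:, k] + M @ (-dA_of(theta, k) @ x - A @ S[:, k])
        x = x + M @ (q - A @ x)              # forward iteration update
    return x, S
```

At the fixed point, M (dA x + A S) = 0 implies A S = -dA x for any nonsingular M, which is the algebraic reason the approximate derivative converges to the fine-scale one as the forward residual converges.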

The adjoint method
Now, if V is factored out in Eq. (18), it can be rewritten as

$\mathbf{W}\mathbf{G}\mathbf{V} = (\mathbf{W}\mathbf{G})\,\mathbf{V}, \qquad \text{with} \qquad \mathbf{W}\mathbf{G} = \mathbf{W}\,\frac{\partial \boldsymbol{h}}{\partial \boldsymbol{\theta}} - \mathbf{Z}^{T}\,\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{\theta}},$

where Z is solved from the transposed (adjoint) system

$\left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{y}}\right)^{T}\mathbf{Z} = \left(\frac{\partial \boldsymbol{h}}{\partial \boldsymbol{y}}\right)^{T}\mathbf{W}^{T}. \qquad (37)$

The linear system described in Eq. (37) can be re-written in a blockwise fashion for each iteration as

$\left[\left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{y}}\right)^{\nu,\nu}\right]^{T}\mathbf{Z}^{\nu} = \left(\frac{\partial \boldsymbol{h}}{\partial \boldsymbol{y}^{\nu}}\right)^{T}\mathbf{W}^{T} - \left[\left(\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{y}}\right)^{\nu+1,\nu}\right]^{T}\mathbf{Z}^{\nu+1}. \qquad (38)$

The structure of Eq. (38) shows that it can be solved via back substitution, i.e. the solution of the gradient information proceeds backward over the iterations.
Substituting the block entries obtained from Eqs. (8), (10), (13) and (14) in Eq. (38), and recalling that h only depends on $\mathbf{x}^{n_{it}}$, the (transposed) linear systems that must be solved per iteration to compute Z follow. In particular, the "adjoint state" associated with the smoothed correction $\mathbf{x}_{s}^{\nu}$ is governed by the transposed fine-scale system matrix,

$\mathbf{A}^{T}\,\boldsymbol{\lambda}_{s}^{\nu} = \mathbf{b}_{s}^{\nu}, \qquad (44)$

where the right-hand side $\mathbf{b}_{s}^{\nu}$ collects the contributions from iteration $\nu + 1$ (and, for $\nu = n_{it}$, from the output equations). As discussed for the direct method, in the proposed derivative calculation method, Eq. (44) is solved as a smoothing step only, using any iterative solver up to a prescribed tolerance or number of iterations.
The adjoint states associated with $\mathbf{x}'^{\,\nu}$ and $\breve{\mathbf{x}}^{\nu}$ follow analogously from the remaining (transposed) block rows, the latter involving the transposed coarse-scale matrix $(\mathbf{R}\mathbf{A}\mathbf{P})^{T}$. Finally, $\mathbf{W}\mathbf{G}$ can be written as a sum in which each term corresponds to the contribution of one iteration,

$\mathbf{W}\mathbf{G} = \sum_{\nu=1}^{n_{it}} (\mathbf{W}\mathbf{G})^{\nu}. \qquad (47)$

The algorithm can now be fully defined and is presented in Algorithm 3. Note the use of the notation $(\mathbf{W}\mathbf{G})^{\nu}$ to denote the partial sums in Eq. (47) and that references to superscript $\nu + 1$ correspond to zero terms.
Algorithm 3: Pre multiplication of G by W via the adjoint method.
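The adjoint counterpart sweeps backward over the stored iterates of the same abstracted iteration, accumulating the gradient of a scalar functional J = wᵀx. This is a sketch under the same assumptions as the direct sketch (M stands in for the i-MSFV operator, ∂M/∂θ neglected), and it makes the storage requirement explicit: all forward iterates enter the backward sweep.

```python
import numpy as np

def adjoint_iteration_gradient(A, dA_list, q, M, w, n_it):
    """Adjoint (backward) accumulation of dJ/dtheta for J = w^T x_n,
    where x_n results from n_it steps of x <- x + M (q - A x).
    All intermediate states are stored in the forward sweep, as the
    backward algorithm requires; dM/dtheta is again neglected."""
    n = len(q)
    xs = [np.zeros(n)]                       # store iterates x^0, ..., x^{n_it}
    for _ in range(n_it):
        xs.append(xs[-1] + M @ (q - A @ xs[-1]))
    T = np.eye(n) - M @ A                    # iteration matrix (I - M A)
    lam = w.copy()                           # adjoint of the final state
    grad = np.zeros(len(dA_list))
    for nu in range(n_it, 0, -1):            # backward over the iterations
        for k, dA in enumerate(dA_list):
            grad[k] += lam @ (-M @ (dA @ xs[nu - 1]))  # per-iteration contribution
        lam = T.T @ lam                      # back-propagate the adjoint state
    return xs[-1], grad
```

One backward sweep yields the full gradient regardless of the number of parameters, the usual adjoint advantage, at the cost of storing the forward iterates.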

Remarks about the framework
An alternative formulation for the i-MSFV gradient has been previously proposed and investigated in Frank (2017). In that work, both the state and the model equation vectors explicitly account for the smoothing stage. The formulation proposed here is based on two observations. Firstly, the implementation of the aforementioned variant, although it offers more control over the gradient quality, relies on the ability to compute partial derivative matrices of the smoothing step with respect to the parameters. More specifically, it requires knowledge of how the preconditioner M of the system matrix A is built. For simpler iterative strategies, e.g. Jacobi, for which the construction of M is simple, the computation of $\partial \mathbf{M}/\partial \boldsymbol{\theta}$ is relatively simple as well. However, simpler iterative methods are usually less efficient, and the requirement of knowing the construction of M hampers the utilization of 'black-box' preconditioners. Secondly, it has been shown in Ţene et al. (2015) that only a limited number of smoothing steps is necessary for an efficient i-MSFV solution strategy; hence, not much extra control would be gained. Note that the linear solvers employed in lines 9 and 4 of Algorithms 2 and 3, respectively, are the same solvers employed in the solution of the forward simulation, using the same prescribed (loose) tolerance. Hence, the algorithms share the same computational advantages by solving only approximately for the derivative information arising from the smoothing step.
The backward algorithm requires storing all intermediate states generated during the iterations of the forward run. It also requires solving several systems of equations for each backward step. If the iteration process in the forward run proceeds until machine precision is reached, then essentially the fully coupled system has been solved to fine-scale precision, and it might be more beneficial to neglect the iteration history and solve the fine-scale system of adjoint equations directly, given that the derivative computation problem is linear. However, we highlight that, as has been shown in the literature (Fonseca et al., 2015; de Moraes et al., 2017), approximate gradients computed from approximate solutions are often already sufficient to lead the optimization process efficiently to the optimal solution. How accurate the gradient needs to be depends on different aspects, e.g. the optimization algorithm and how early or late the optimization process is in terms of finding the maximum/minimum of a given objective function, among others. The algorithm proposed here has the advantage of controlling the gradient quality, as will be shown in the numerical experiments presented in the next section.

Numerical experiments
Given the fundamental nature of the developments presented here, we focus the numerical experiments on validating the computation and on assessing the gradient quality provided by the iterative multiscale method.
Our experiments are based on the evaluation of the gradient of a misfit objective function with no regularization term (Oliver et al., 2008),

$O(\boldsymbol{\theta}) = \frac{1}{2}\,\left(\mathbf{d}_{obs} - \mathbf{y}\right)^{T}\mathbf{C}_{D}^{-1}\left(\mathbf{d}_{obs} - \mathbf{y}\right),$

with gradient

$\nabla_{\boldsymbol{\theta}} O = -\mathbf{G}^{T}\,\mathbf{C}_{D}^{-1}\left(\mathbf{d}_{obs} - \mathbf{y}\right),$

where $\mathbf{C}_{D}$ is the so-called data covariance matrix. Here, $\mathbf{C}_{D}$ will be considered a diagonal matrix given by (Oliver et al., 2008)

$\mathbf{C}_{D} = \sigma_{d}^{2}\,\mathbf{I},$

where $\sigma_{d}^{2}$ is the variance of the data measurement error.
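The misfit objective and its gradient can be evaluated as follows once a sensitivity (or adjoint) product is available. The diagonal covariance C_D = σ²I matches the experimental setup; the function and variable names are illustrative.

```python
import numpy as np

def misfit_and_gradient(y, d_obs, G, sigma):
    """Least-squares misfit with diagonal data covariance C_D = sigma^2 I
    and its gradient grad O = G^T C_D^{-1} (y - d_obs), with G = dy/dtheta."""
    Cinv = np.eye(len(d_obs)) / sigma**2     # C_D^{-1}
    res = y - d_obs                          # data residual
    O = 0.5 * res @ Cinv @ res               # misfit value
    return O, G.T @ (Cinv @ res)             # objective and gradient
```

In the adjoint setting, G.T times the weighted residual is obtained directly by choosing W proportional to the residual in Eq. (18), without ever forming G.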
In all experiments, the fitting parameters are cell-centered permeabilities. The observed quantity, $\mathbf{d}_{obs}$, is the fine-scale pressure at the location of (non-flowing) observation wells; the output equations therefore simply select the state entries at the observation wells and carry no explicit parameter dependence.
In all experiments, the standard deviation of the pressure measurement error is $\sigma_{d} \approx 0.03$ (note that the measurement error is also non-dimensional). This represents a (very accurate) measurement error in the range of those usually employed in synthetic study cases (see e.g. Oliver et al., 2008). Note that the wells are controlled by bottom-hole pressure, expressed in terms of a non-dimensional pressure, i.e.,

$p_{D} = \frac{p - p_{prod}}{p_{inj} - p_{prod}},$

where $p_{inj}$ and $p_{prod}$ are the injection and production pressures, respectively. In all the experiments, $p_{D}^{inj} = 1.0$ and $p_{D}^{prod} = 0.0$, the grid-block dimensions are $\Delta x = \Delta y = \Delta z = 1$ m and the fluid viscosity is $1.0 \times 10^{-3}$ Pa s. In addition, in all the following test cases, well basis functions are included.
The metric utilized to assess the i-MSFV gradient quality is the angle $\alpha$ between the fine-scale and i-MSFV normalized gradients, i.e.,

$\alpha = \arccos\left(\hat{\mathbf{g}}_{F} \cdot \hat{\mathbf{g}}_{MS}\right), \qquad (54)$

where

$\hat{\mathbf{g}}_{F} = \frac{\nabla_{\boldsymbol{\theta}} O_{F}}{\|\nabla_{\boldsymbol{\theta}} O_{F}\|} \qquad \text{and} \qquad \hat{\mathbf{g}}_{MS} = \frac{\nabla_{\boldsymbol{\theta}} O_{MS}}{\|\nabla_{\boldsymbol{\theta}} O_{MS}\|}.$

Here, $\nabla_{\boldsymbol{\theta}} O_{F}$ and $\nabla_{\boldsymbol{\theta}} O_{MS}$ denote the fine-scale and MS analytical gradients, respectively. As a minimum requirement, acceptable MS gradients are obtained if $\alpha$ is much smaller than 90° (Fonseca et al., 2015).
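The gradient-quality metric is straightforward to evaluate; the clipping below guards against round-off pushing the cosine outside [-1, 1]:

```python
import numpy as np

def gradient_angle_deg(g_fine, g_ms):
    """Angle (degrees) between the fine-scale and multiscale gradients;
    values much smaller than 90 indicate an acceptable search direction."""
    cosang = g_fine @ g_ms / (np.linalg.norm(g_fine) * np.linalg.norm(g_ms))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
```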
To test this hypothesis, we are particularly interested in observing the behaviour of this metric as more accurate i-MSFV solutions are computed.
In our i-MSFV implementation, the iterative process is controlled by the outer-loop residual tolerance and by the smoother (preconditioner) error tolerance. The Krylov-subspace biconjugate gradient stabilized method (BiCGSTAB, Saad, 2003) is employed in the smoothing stage.

Validation experiments
Before focusing on the gradient-quality investigation, in this section we validate the iMS-gradient method against numerical differentiation with a higher-order, two-sided Taylor approximation,

$\nabla_{\theta_{k}} O \approx \frac{O(\theta_{k} + \epsilon\,\theta_{k}) - O(\theta_{k} - \epsilon\,\theta_{k})}{2\,\epsilon\,\theta_{k}}, \qquad (57)$

where $\epsilon$ is a multiplicative parameter perturbation. We define the relative error as

$e = \frac{\|\nabla_{\boldsymbol{\theta}} O_{num} - \nabla_{\boldsymbol{\theta}} O_{iMS}\|}{\|\nabla_{\boldsymbol{\theta}} O_{num}\|}.$

Here, $\nabla_{\boldsymbol{\theta}} O_{num}$ is obtained by performing the number of iterative multiscale reservoir simulations required to evaluate Eq. (57), and $\nabla_{\boldsymbol{\theta}} O_{iMS}$ is obtained by employing the iterative Direct or Adjoint gradient method.
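The validation procedure can be sketched as follows: a central difference with multiplicative perturbations approximates the gradient, and the relative error compares it against the (semi-)analytical one. The callable `obj` is a stand-in for any routine returning the misfit for a given parameter vector.

```python
import numpy as np

def numerical_gradient(obj, theta, eps=1e-6):
    """Two-sided (central) difference gradient with a multiplicative
    perturbation of each parameter, as in the validation experiments."""
    grad = np.zeros_like(theta)
    for k in range(len(theta)):
        d = eps * theta[k]                   # multiplicative perturbation
        tp, tm = theta.copy(), theta.copy()
        tp[k] += d
        tm[k] -= d
        grad[k] = (obj(tp) - obj(tm)) / (2.0 * d)
    return grad

def relative_error(g_num, g_analytic):
    """Relative error between the numerical and analytical gradients."""
    return np.linalg.norm(g_num - g_analytic) / np.linalg.norm(g_num)
```

Evaluating Eq. (57) for all parameters requires 2 N_θ forward simulations, which is exactly why it serves only as a validation tool and not as a practical gradient method.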
To evaluate the correctness of the proposed iterative gradient computation methods, as well as their implementation, we investigate the linear decrease of the relative error as the parameter perturbation decreases from 10⁻¹ to 10⁻⁴. This investigation is carried out in two distinct examples. Both have a fine grid of 21 × 21 grid blocks. We employ a 7 × 7 coarsening ratio, giving a 3 × 3 coarse grid. The reference twin experiment is generated with permeability realization number 992. Fig. 1 illustrates the fine-, coarse- and dual-grid cells along with the reference permeability.
Next, we determine the well positions. We use the so-called quarter five-spot pattern, with two observation wells placed near the operating wells. The full specifications can be found in Table 1.
The results of this experiment are shown in Fig. 2. Here, we use a single outer iteration and a very tight smoothing tolerance of 5 × 10⁻⁸ to ensure that the numerical gradient method produces sufficiently accurate gradients. First of all, we can see that the fine-scale numerical gradient method and the iterative MS-gradient methods are of the same order of accuracy with respect to the perturbation ε, for all cases considered. In all experiments, the relative error decreases linearly as the perturbation decreases. Also, note that the Adjoint and Direct methods provide the analytical gradient with the same level of accuracy. In the first experiment, a homogeneous test case with a permeability value of 1.0 × 10⁻¹³ m² is used. The second experiment indicates the correctness of the method when it is applied to heterogeneous porous media problems.

Investigation of i-MSFV convergence behaviour and gradient quality
In this investigation, the error metric given by Eq. (54) is evaluated for a whole ensemble of heterogeneous permeability fields. The ensemble is generated via the decomposition of a reference permeability image using a Principal Component Analysis parameterization; see Jansen (2013) for more details. Fig. 3 illustrates four different permeability realizations from the ensemble. The fine-scale and coarse grids contain 21 × 21 and 7 × 7 cells, respectively. An inverted five-spot well pattern is employed, with four observation wells placed close to the production wells. The well configuration is given in Table 2.
The synthetic observed pressures at the observation well locations are created via the classical twin-experiment strategy, in which the reference permeability field is taken from the ensemble of 1000 geological realizations (the first one in Fig. 3).
The robustness of the method is illustrated in Fig. 4. It is possible to observe that the angle between the fine-scale gradient and the i-MSFV gradient becomes smaller as the outer-loop residual tolerance is tightened. Moreover, the variance across the ensemble also approaches zero as we set the residual tolerance to 10⁻⁴. For this set of relatively small permeability contrasts, virtually perfect alignment with the fine-scale gradient is reached if the tolerance is sufficiently tight.

Robustness with respect to heterogeneity contrast and distribution
To further explore the robustness of the method with respect to heterogeneity contrast and distribution, four sets of 20 equiprobable realizations of log-normally distributed permeability fields with a spherical variogram and dimensionless correlation lengths of Ψ₁ = 0.5 and Ψ₂ = 0.02 are generated using sequential Gaussian simulation (Remy et al., 2009). For each set, the variance and the mean of ln(k) are 2.0 and 3.0, respectively, where k is the grid-block permeability. As depicted in Fig. 5, for the realizations with a long correlation length, the angles between the permeability layers and the horizontal axis are 0°, 15°, and 45°. A patchy (small correlation length) pattern is also considered (Fig. 5d). Compared with the previous set, the permeability contrast is much higher in this case.
The fine-scale and coarse grids contain 100 × 100 and 20 × 20 cells, respectively. The well configuration utilized in this numerical experiment is depicted in Table 3 .
The observed data is generated from a twin-experiment associated with (the first) permeability realization of each set.
In this experiment, the outer-loop residual tolerance is set to 1.0 × 10⁻⁶ and the smoothing tolerance to 1.0 × 10⁻¹. The box plot shown in Fig. 6 summarizes the required total number of smoothing steps, over all outer i-MSFV steps.
The impact of the grid orientation effect (Aziz et al., 1993) on the performance of the i-MSFV method is clear in this example. The more the heterogeneity orientation is aligned with the flow orientation, the smaller the number of required iterations. Also, in relation to Fig. 7, the more challenging the forward problem, the more challenging it is to compute an i-MSFV gradient in accordance with the fine-scale gradient. Nevertheless, in all cases, almost all i-MSFV realization gradients are perfectly aligned with the fine-scale gradient, demonstrating the robustness of the method.

SPE-10 comparative test case
Now, we investigate the performance of our method on the SPE-10 comparative test case (Christie et al., 2001), regarded as a challenging model for upscaling (Durlofsky, 2005) and multiscale simulation (Hajibeygi, 2011) techniques. Here, we consider the 2-D flow simulation of both the top and the bottom layer of the original 3-D model, whose permeability fields are illustrated in Fig. 8.
The fine-grid dimensions are 60 × 220, while we employ a 12 × 20 coarse grid in the MS simulation. A quarter five-spot well setting is employed; the well specification is given in Table 4.
Fig. 5. Permeability distribution of four different realizations taken from the sets of 20 geostatistically equiprobable permeability fields with different correlation angles (a-c), and a patchy field (d) with a small correlation length.
Fig. 7. Illustration of the gradient quality improvement when an i-MSFV gradient computation strategy is employed in comparison to a MSFV strategy. The x-axis represents the angle between the fine-scale gradient and the MSFV gradient (blue crosses) or the i-MSFV gradient with error tolerance 10⁻⁶ (orange circles). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
We emphasize that it has been reported (see e.g. Hajibeygi, 2011 ) that the MSFV method provides non-monotone pressure solutions when solving the pressure for the bottom layer ( Fig. 8 b). Strategies to improve the MSFV solution, among them the i-MSFV ( Hajibeygi and Jenny, 2011 ) here considered, are necessary to address the issue.
In this twin experiment, the reference permeability field used to compute the observations is considered homogeneous with a value of 1.0 × 10⁻¹³ m². We fix the smoothing tolerance at 1.0 × 10⁻² and vary the outer-loop residual tolerance over 1.0 × 10⁻², 1.0 × 10⁻⁴ and 1.0 × 10⁻⁶ to evaluate whether the same convergence behaviour of the i-MSFV gradient toward the fine-scale gradient direction is observed.
Once again we observe that the method provides accurate gradients, even considering this challenging geological setting, for both the top and bottom layers of the model (see Fig. 9 ).

Discussion
Based on the results acquired from numerical cases of increasing complexity, we demonstrated that our newly introduced method can provide accurate gradients, up to fine-scale accuracy, if small enough residual tolerances are employed in the i-MSFV forward simulation. Also, it is shown that only a few i-MSFV/smoothing iterations are necessary to obtain reasonably accurate gradients. However, we highlight that, from an optimization point of view, gradients that are fully in accordance with the fine-scale gradient are not necessary. As long as the gradient roughly points in the correct ascent/descent direction, the optimization process is able to progress towards the maximum/minimum. This is particularly true in the early iterations, where the objective function is steeper. On the other hand, as the optimization process approaches the optimum, more accurate gradients are required for a precise stopping-criterion evaluation. The convergent behaviour of the i-MSFV gradient, which is demonstrated to be directly related to the outer-residual tolerance defined a priori, should allow for error control of the gradient computation and, ultimately, for control of the optimization process performance.
The gradient computation methods proposed here, by design, directly inherit the characteristics of the i-MSFV method introduced in Hajibeygi et al. (2008). The solution of the backward (transposed) system of equations mirrors the solution strategy of the linear system of equations that arises from the forward problem. Even the system matrices from the forward and backward systems of equations share the same properties (e.g. condition number). Therefore, in principle, all studies in the literature related to the robustness and efficiency of the i-MSFV method with respect to grid size, level of heterogeneity, among others, should also be valid in the context of the gradient computation methods here proposed. Nevertheless, further numerical studies are important to investigate the robustness of the method itself with respect to such aspects.

Summary and research perspectives
We introduced iterative multiscale gradient computation algorithms based on the iterative Multiscale Finite Volume (i-MSFV) simulation method. By firstly re-casting the i-MSFV method in an algebraic framework, we derived a flexible, multi-purpose derivative computation framework that accommodates both the Direct and Adjoint methods. Their computational efficiency is discussed and it is shown that they share advantages equivalent to those of the i-MSFV method. The new methods are validated against numerical differentiation and are shown to address the challenges encountered by MSFV gradient computation, specifically those associated with high heterogeneity contrast. Fine-scale-accurate gradients are obtained for challenging geological models with high permeability contrasts (e.g. the SPE-10 comparative test case) if the i-MSFV model converges to an error tolerance of 1.0 × 10⁻⁶. However, we highlight that such accuracy is not always necessary and that only a few i-MSFV/smoothing iterations are required for reasonably accurate gradients. Further control of the gradient quality via an a-priori error estimate along the optimization process should allow for a computationally efficient optimization method. More specifically, a less accurate gradient estimate usually suffices in the early optimization iterations, when the objective function decrease is steeper, while more accurate gradients are required in the flatter regions for a more precise stopping criterion. Hence, the determination of an a-priori error estimate should be a key development.