Variance reduction using antithetic variables for a nonlinear convex stochastic homogenization problem

We consider a nonlinear convex stochastic homogenization problem, in a stationary setting. In practice, the deterministic homogenized energy density can only be approximated by a random apparent energy density, obtained by solving the corrector problem on a truncated domain. We show that the technique of antithetic variables can be used to reduce the variance of the computed quantities, and thereby decrease the computational cost at equal accuracy. This leads to an efficient approach for approximating expectations of the apparent homogenized energy density and of related quantities. The efficiency of the approach is numerically illustrated on several test cases. Some elements of analysis are also provided.


Introduction
In this article, we consider some theoretical and numerical questions related to variance reduction techniques for some nonlinear convex stochastic homogenization problems. In short, we show here that a technique based on antithetic variables can be used in that context, provide some elements of analysis, and demonstrate numerically the efficiency of that approach on several test cases. This work is a follow-up of the articles [5,6,10] where the same questions are considered for a linear elliptic equation in divergence form.
The stochastic homogenization problem we consider here writes as follows. Let D be an open bounded domain of R d and 2 ≤ p < ∞. We consider the highly oscillatory problem for some f and some random smooth field W , which is stationary in a sense made precise below, and satisfies some convexity and growth conditions such that, for any ε > 0, problem (1) is well-posed. See Section 1.1 below for a precise description of the mathematical setting, which has been introduced in [11,12]. A classical example that motivated this framework is when where a is stationary (see e.g. [11, page 382]).
In (1), ε denotes a supposedly small, positive constant that models the smallest possible scale present in the problem. For ε small, it is extremely expensive, in practice, to directly attack (1) with a numerical discretization. A useful practical approach is to first approximate (1) by its associated homogenized problem, which reads and next numerically solve the latter problem. The two-fold advantage of (2) as compared to (1) is that it is deterministic and it does not involve the small scale ε.

2
This simplification comes at a price. The homogenized energy density W ⋆ in (2) is given by an integral involving a so-called corrector function, solution to a nonlinear problem (see (7) below for a precise formula). As most often in stochastic homogenization, this corrector problem is set on the entire space R d . In practice, approximations are therefore in order. A standard approach (see e.g. [7] in the linear setting) is to generate realizations of the energy density W over a finite, supposedly large volume at the microscale, that we denote Q N , and approach the homogenized energy density by some empirical means using approximate correctors computed on Q N . Although the exact homogenized density W ⋆ is deterministic, its practical approximation is random, due to the truncation procedure. It is then natural to generate several realizations. However, efficiently averaging over these realizations require to understand how variance affects the result. This is the purpose of the present article to investigate some questions in this direction, both from the theoretical and numerical standpoints.
Before proceeding and for the sake of consistency, we now present the framework of nonlinear stochastic homogenization we adopt, and situate the questions under consideration in a more general context.

Homogenization theoretical setting
To begin with, we introduce the basic setting of stochastic homogenization we will employ. We refer to [13] for a general, numerically oriented presentation, and to [4,9,17] for classical textbooks. We also refer to [19] and the review article [2] (and the extensive bibliography contained therein) for a presentation of our particular setting. Throughout this article, (Ω, F , P) is a probability space and we denote by E(X) = Ω X(ω)dP(ω) the expectation value of any random variable X ∈ L 1 (Ω, dP). We next fix d ∈ N ⋆ (the ambient physical dimension), and assume that the group (Z d , +) acts on Ω. We denote by (τ k ) k∈Z d this action, and assume that it preserves the measure P, that is, for all k ∈ Z d and all A ∈ F , P(τ k A) = P(A). We assume that the action τ is ergodic, that is, if A ∈ F is such that τ k A = A for any k ∈ Z d , then P(A) = 0 or 1. In addition, we define the following notion of stationarity (see [2, Section 2.2]): a function F ∈ L 1 loc R d , L 1 (Ω) is said to be stationary if, for all k ∈ Z d , F (y + k, ω) = F (y, τ k ω) almost everywhere and almost surely. ( In this setting, the ergodic theorem [18,22,24] can be stated as follows: Let F ∈ L ∞ R d , L 1 (Ω) be a stationary random variable in the above sense. For This implies that (denoting by Q the unit cube in R d ) The purpose of the above setting is simply to formalize that, even though realizations may vary, the function F at point y ∈ R d and the function F at point y + k, k ∈ Z d , share the same law. In the homogenization context we now turn to, this means that the local, microscopic environment (encoded in the energy density W ) is everywhere the same on average. From this, homogenized, macroscopic properties will follow.
We now describe more precisely the multiscale random problem (1). The domain D is a regular (in the sense its boundaries are Lipschitz-continuous) bounded domain of R d . The right-hand side function f belongs to L p ′ (D), with 1/p + 1/p ′ = 1 (hence f is indeed in the dual space of L p (D)). For any ξ ∈ R d , the random field y, ω → W (y, ω, ξ) is assumed stationary in the sense (3). We assume that it is continuous (and even C 3 ) with respect to the ξ variable, and that it is measurable with respect to the y argument. We also assume that there exists c 2 ≥ c 1 > 0 such that Furthermore, we assume henceforth that W is strictly convex with respect to the argument ξ, in the sense that . and a.s., where ∂ 2 ξ W ∈ R d×d is the Hessian matrix of ξ → W (y, ω, ξ). A more demanding assumption is that W is α-convex with respect to the argument ξ, in the sense that there exists α > 0 such that ∀η ∈ R d , ∀ξ ∈ R d , η T ∂ 2 ξ W (y, ω, ξ)η ≥ α η T η a.e. and a.s.
Unless otherwise stated, we only assume (5) in the sequel. When needed, we will explicitly assume (6). Under (4) and (5), the variational problem (1) is well-posed. In addition, the homogenized limit of (1) has been identified in [11,12] (see also [14,Theorem 3.1]): the unique solution u ε (·, ω) to (1) converges (weakly in W 1,p (D) and strongly in L p (D), almost surely) to some deterministic function u ⋆ ∈ W 1,p (D), solution to (2), where the homogenized energy density W ⋆ is given, for any ξ ∈ R d , by denotes the set of functions that belong to W 1,p loc (R d ) and are Q N -periodic. The convergence in (7) holds almost surely.

The questions we consider
In practice, we cannot compute W ⋆ (ξ), and have to restrict ourselves to finite size domains. We therefore introduce and readily see from (7) that As briefly explained above, although W ⋆ itself is a deterministic object, its practical approximation W ⋆ N is random. It is only in the limit of infinitely large domains Q N that the deterministic value is attained. This is a standard situation in stochastic homogenization.
Many studies have been recently devoted (at least in the linear case) to establishing sharp estimates on the convergence of the random apparent homogenized quantities (computed on Q N ) to the exact deterministic homogenized quantities. We refer e.g. to [7,15] and to the comprehensive discussion of [6, Section 1.2]. We take here the problem from a slightly different perspective. We observe that the error is the sum of a systematic error (the first term in the above right-hand side) and of a statistical error (the second term in the above right-hand side).
We focus here on the statistical error, and propose approaches to reduce the confidence interval of empirical means approximating E [W ⋆ N (·, ξ)] (or similar quantities), for a given truncated domain Q N .
Recall that a standard technique to compute an approximation of E [W ⋆ N (·, ξ)] is to consider several independent and identically distributed realizations of the energy density W , solve for each of them the corrector problem (8) (thereby obtaining several i.i.d. values W ⋆,m N (ω, ξ)), and proceed following a Monte Carlo approach: In view of the Central Limit Theorem, we know that our quantity of interest E [W ⋆ N (·, ξ)] lies in the confidence interval with a probability equal to 95 %.
In this article, we show that, using a well known variance reduction technique, the technique of antithetic variables [21, page 27], we can design a practical approach that, for finite N and any vector ξ, allows to compute a better approximation of E [W ⋆ N (·, ξ)] (and likewise for similar homogenized quantities). Otherwise stated, for an equal computational cost, the approach provides a more accurate (i.e. with a smaller confidence interval) approximation. We thereby extend to this nonlinear convex setting the results of [5,6,10] obtained in the linear case.
Our article is articulated as follows. In Section 2.1, we describe the proposed approach, and state our main results. The ingredients to prove these results are collected in Sections 2.2, 2.3 and 2.4. The actual proof of our main results is performed in Section 2.5. We make there several structural assumptions on the form of the energy density W to obtain these variance reduction results. In Section 2.6, we describe a general class of examples for which our assumptions are indeed satisfied. We next turn in Section 3 to some illustrative numerical examples, where we demonstrate the efficiency of the approach, even in cases where the theoretical analysis is incomplete.
2 Description of the proposed approach and main results

Statement of our main results
This section is devoted to the presentation and the analysis of our approach. We first focus on estimating the expectation E [W ⋆ N (·, ξ)] of the apparent homogenized energy density (see Section 2.1.1). Our variance reduction result, Proposition 1, shows that the technique of antithetic variables is indeed efficient. As often the case, it is difficult to quantitatively assess how efficient the approach is, and this will be the purpose of the numerical tests described in Section 3 to address this question.
We then turn to the estimation of the first (and next second) derivatives of W ⋆ N (·, ξ) with respect to ξ. These quantities naturally appear when one solves the convex homogenized problem (2) (approximating W * by W ⋆ N (ω, ·)), e.g. using a Newton algorithm. For these two quantities, our result is restricted to the one-dimensional setting. See Section 2.1.2 and Proposition 2 for the first derivative, and Section 2.1.3 and Proposition 3 for the second derivative. Sections 2.2, 2.3, 2.4 and 2.5 are devoted to the proof of the results stated here. In Section 2.6, we discuss an explicit class of energy densities W that falls into our framework.

Variance reduction on the homogenized energy density
In this section, we make the following two structure assumptions on the rapidly oscillating field W of (1). First, we assume that, for any N, there exists an integer n (possibly n = |Q N |, but not necessarily) and a function A, defined on Q N × R n × R d , such that the field W (y, ω, ξ) writes ∀y ∈ Q N , ∀ξ ∈ R d , W (y, ω, ξ) = A (y, X 1 (ω), . . . , X n (ω), ξ) a.s., (9) where {X k (ω)} 1≤k≤n are independent scalar random variables, which are all distributed according to the uniform law U[0, 1]. In general, the function A, as well as the number n of independent, identically distributed variables involved in (9), depend on N, the size of Q N , although this dependency is not made explicit in (9).
Second, we assume that the function A in (9) is such that, for all y ∈ Q N and all ξ ∈ R d , the map is non-decreasing with respect to each of its arguments.
Proposition 1. We assume (9)-(10). Let W ⋆ N (ω, ξ) be the approximated homogenized energy density field defined by (8). We define on Q N the field antithetic to W defined by (9). We associate to this field the approximate homogenized energy density field W ant,⋆ Then, for any ξ ∈ R d , Otherwise stated, W ⋆ N (ω, ξ) is a random variable which has the same expectation as W ⋆ N (ω, ξ), and its variance is smaller than half of that of W ⋆ N (ω, ξ). As mentioned above, this result generalizes [6, Proposition 2.1] to the nonlinear convex variational setting considered here.
Before proceeding, we briefly explain the usefulness of the above result for variance reduction techniques. Assume we want to compute the expectation of W ⋆ N (ω, ξ), for some fixed vector ξ ∈ R d . Following the classical Monte-Carlo method recalled in Section 1.2, we estimate E [W ⋆ N (·, ξ)] by its empirical mean. To this end, we consider 2M independent, identically distributed copies {W m (y, ω, ξ)} 1≤m≤2M of the random field W (y, ω, ξ) on Q N . To each copy W m , we associate an approximate homogenized energy density W ⋆,m N (ω, ξ), defined by (8). We next introduce the empirical mean and consider that, in practice, the mean E [W ⋆ N (·, ξ)] is equal to the estimator I 2M within an approximate margin of error 1.96 2M .
Alternate to considering (13), we may consider where W ⋆,m N is defined by (11). Again, in practice, the mean is equal to I 2M within an approximate margin of error 1.96 Observe now that both estimators (13) and (14) are of equal cost, since they require the same number 2M of corrector problems to be solved. The accuracy of the latter is better if and only if Var , which is exactly the bound (12) of Proposition 1.

Variance reduction on the first derivative of the homogenized energy density
Restricting ourselves to the one-dimensional setting, we now state a variance reduction result for the estimation of E [ξ∂ ξ W ⋆ N (·, ξ)]. Note that, to distinguish derivatives with respect to y from derivatives with respect to ξ, we keep the notation ∂ ξ W , even though we are in the one-dimensional situation.
We again make the structure assumption (9), and observe that it implies that where {X k (ω)} 1≤k≤n are scalar i.i.d. random variables, which are all distributed according to the uniform law U[0, 1], and where the function A 1 , defined on (−N, N) × R n × R, is given by In addition, we assume that, for all y ∈ (−N, N) and all ξ ∈ R, the map is non-decreasing with respect to each of its arguments. We recall that the function ξ → W (y, ω, ξ) is strictly convex (see assumption (5)) and satisfies (4). It therefore has a unique minimizer ξ 0 (y, ω). In the sequel, we consider energy densities such that this minimizer is independent of y and ω. Without loss of generality, we can assume that ξ 0 = 0. We thus consider energy densities W such that ξ → W (y, ω, ξ) attains its minimum at ξ = 0, a.e. and a.s.

Variance reduction on the second derivative of the homogenized energy density
Considering again the one-dimensional setting as in Section 2.1.2, we eventually state a variance reduction result for the estimation of E ∂ 2 ξ W ⋆ N (·, ξ) . Recall that, for any y and ω, the map ξ → ∂ ξ W (y, ω, ξ) is increasing. We can therefore introduce its reciprocal function ζ → ψ(y, ω, ζ), which is also increasing.
We again make the structure assumption (9), and observe that it implies that, for any y ∈ (−N, N) and any ζ ∈ R, ∂ 2 ξ W (y, ω, ψ(y, ω, ζ)) = A 2 (y, X 1 (ω), . . . , X n (ω), ζ) a.s., where {X k (ω)} 1≤k≤n are scalar i.i.d. random variables, which are all distributed according to the uniform law U[0, 1], and where the function A 2 , defined on (−N, N) × R n × R, is given by In addition, we assume that, for all y ∈ (−N, N) and all ζ ∈ R, the map is non-decreasing with respect to each of its arguments.

Classical results on antithetic variables
We first recall the following lemma, and provide its proof for consistency. This result is crucial for our proof of variance reduction using the technique of antithetic variables, performed in Section 2.5.

Lemma 4 ([21]
, page 27). Let f and g be two real-valued functions defined on R n , which are non-decreasing with respect to each of their arguments. Consider X = (X 1 , . . . , X n ) a vector of random variables, which are all independent from one another. Then Proof. This lemma is proved by induction. We treat the one-dimensional case (n = 1) below, and we refer to [6, Proof of Lemma 2.1] for the induction. Consider X and Y two independent scalar random variables, identically distributed. Both functions f and g are non-decreasing, so We now take the expectation of the above inequality: As X and Y share the same law, and are independent, this yields and (24) follows for n = 1.
The following result is a simple consequence of the above lemma (see e.g. [6] for a proof).

Corollary 5 ([21]
). Let f be a function defined on R n , which is non-decreasing with respect to each of its arguments. Consider X = (X 1 , . . . , X n ) a vector of random variables, which are all independent from one another, and distributed according to the uniform law U[0, 1]. Then We next observe that where we have used that Var(f (X)) = Var(f (1 − X)).

Derivatives of the corrector and of the homogenized energy density
We now introduce the correctors as the solutions to (8): In this section, we derive some useful expressions for the derivatives with respect to ξ of w N and of W ⋆ N . The first order optimality condition in (8) reads We deduce from that condition that and we note that we do not need to know ∂ ξ w N to compute ∂ ξ W ⋆ N . Computing the derivative of this equality with respect to ξ, we obtain that with the convention that ∂ ξ ∇w N jk = ∂ 2 w N ∂ξ j ∂y k for 1 ≤ j, k ≤ d. We can actually obtain a somewhat more symmetric expression. Computing the derivative of (25) with respect to ξ, we indeed see that We then infer from (27) and (28) that Remark 6. Using the same kind of arguments, we see that the function Suppose that W is α-convex (i.e. satisfies (6)). Then problem (30) is wellposed and allows to uniquely determine (up to an additive constant) g j , by solving a linear elliptic partial differential equation.
Combined with (29), this remark provides a practical way to compute ∂ 2 ξ W ⋆ N (ω, ξ) without using any finite difference approximation in ξ.
We finally note that, in view of (26), we have Likewise, in view of (29), we see that

Monotonicity properties
Our goal in this section is to establish monotonicity properties for the homogenization process. Such properties are indeed useful to apply Corollary 5 and therefore prove variance reduction.
To simplify the notation, we assume in this section that we are in a periodic setting. For any ξ ∈ R d , the function y → W (y, ξ) is supposed to be Q-periodic (with Q = (0, 1) d ), to satisfy the growth condition (4) and to be strictly convex with respect to ξ. The associated homogenized energy density is then given by We first show a monotonicity property on the homogenized energy density in Section 2.4.1. Next, restricting ourselves to the one-dimensional setting, we show monotonicity properties for the first and the second derivative of the homogenized energy density (see respectively Sections 2.4.2 and 2.4.3).

On the homogenized energy density
The following result is an extension to the nonlinear setting of a well-known result in the linear setting (see [23, page 12]).
Lemma 7. Suppose that the fields W 1 and W 2 satisfy We denote W ⋆ 1 and W ⋆ 2 the corresponding homogenized energy densities, defined by (33). We then have Taking the infimum over v, we obtain the claimed result.

On the first derivative of the homogenized energy density
We now establish a monotonicity result on the derivative of W ⋆ (ξ), in the one-dimensional setting. As in Section 2.1.2 (see (17)), we consider energy densities W such that ξ → W (y, ξ) attains its minimum at ξ = 0 for almost all y ∈ Q.
We denote W ⋆ 1 and W ⋆ 2 the corresponding homogenized energy densities, defined by (33). We then have Proof. We first claim that ∂ ξ W ⋆ (ξ) has the same sign as ξ.

On the second derivative of the homogenized energy density
We next turn to monotonicity properties of the second derivative of the homogenized energy density. As in Section 2.4.2, we consider energy densities satisfying (39). We additionally request that, almost everywhere in (0, 1), ξ → ∂ 2 ξ W (y, ξ) is non decreasing for ξ ≥ 0 and non increasing for ξ ≤ 0.
Proof. We first compute the derivative of (43) and obtain It is sufficient to prove (47) for ξ > 0. Using (41) and the fact that ψ 1 and ∂ 2 ξ W 1 are non-decreasing with respect to their second argument, we have Using (46) for ζ = ∂ ξ W ⋆ 2 (ξ), we deduce that In view of (48), this inequality readily implies (47) for ξ > 0. This concludes the proof.

Proof of Propositions 1, 2 and 3
Now that we have collected all the necessary ingredients, we are in position to prove our main results.

Variance reduction on the homogenized energy density
Proof of Proposition 1. As 1 − X k (ω) and X k (ω) share the same law, so do the fields W and W ant on Q N . Hence, the homogenized fields W ⋆ N (ω, ξ) and W ant,⋆ N (ω, ξ) share the same law, and we obtain the first assertion of (12). We now choose a vector ξ ∈ R d , and denote by P ξ N the operator that associates to a given Q N -periodic energy density the homogenized energy density evaluted at ξ. We see from (8) that W ⋆ N (ω, ξ) is the effective energy density (evaluated at ξ) obtained by periodic homogenization of W |y∈Q N : Using the function A of (9), we introduce the map , see that f (X(ω)) = W ⋆ N (ω, ξ) and that, using the definition (11) (50) We now infer from Assumption (10) that, for any y ∈ Q N and any ζ ∈ R d , the function A(y, ·, ζ) is non-decreasing with respect to each of its arguments. In view of Lemma 7, we obtain that f is non-decreasing.
We are thus in position to use Corollary 5, which yields Var (f (X)) .

Using (50), we obtain
Var W ⋆ N (·, ξ) = Var which concludes the proof of the second assertion of (12) and of the Proposition 1.

Variance reduction on the first derivative of the homogenized energy density
Proof of Proposition 2. The proof follows the same lines as that of Proposition 1. As 1 − X k (ω) and X k (ω) share the same law, so do the fields W and W ant on Q N . Hence, the quantities ξ∂ ξ W ⋆ N (ω, ξ) and ξ∂ ξ W ant,⋆ N (ω, ξ) share the same law, which implies the first assertion of (19).
To prove the second assertion, we again make use, as in the proof of Proposition 1, of the operator P ξ N that associates to a given Q N -periodic energy density the homogenized energy density evaluted at ξ (here, Q N = (−N, N)). Expression (49) holds. Choosing a vector ξ ∈ R, we introduce the function which obviously satisfies f (X(ω)) = ξ∂ ξ W ⋆ N (ω, ξ). Using the definition (18) (51) We now infer from Assumption (16) that, for any y ∈ (−N, N) and any ζ ∈ R, the function A 1 (y, ·, ζ) is non-decreasing with respect to each of its arguments. In view of Lemma 9, we thus obtain that f is non-decreasing.
Using Corollary 5, we write that Var Var (f (X)).
In view of (51), we recast this inequality as and therefore obtain the second assertion of (19). This concludes the proof of Proposition 2.

Variance reduction on the second derivative of the homogenized energy density
Proof of Proposition 3. The proof follows the same lines as the proof of Proposition 2. Using Assumptions (16) and (21), we see that Assump-tions (40) and (46) of Lemma 10 are satisfied. The monotonicity result of Lemma 10 next allows to use Corollary 5, which implies (23).

Examples satisfying our structure assumptions
Before proceeding to the numerical tests, we give here some specific examples of fields W that satisfy the above assumptions. We consider the case with c(y, ω) ≥ 0 and a(y, ω) ≥ a − > 0 a.e. and a.s., and provide sufficient conditions on the scalar fields a and c for the structure assumptions (9), (10), (16) and (21) to be satisfied. Note that (17) and (22) are already fullfilled. Consider two families (a k (ω)) k∈Z d and (c k (ω)) k∈Z d of independent, identically distributed random variables, and assume that where Q = (0, 1) d and Q + k is the cube Q translated by the vector k ∈ Z d . The scalar field a(y, ω) is therefore constant in each cube Q + k with i.i.d. values a k (ω), and likewise for c(y, ω). We assume that there exist α > 0 and β < ∞ such that, for all k ∈ Z d , 0 < α ≤ a k (ω) ≤ β < +∞ and 0 ≤ c k (ω) ≤ β < +∞ almost surely. Consequently, (4) holds.
Introduce now the cumulative distribution functions P a (x) = ν a (−∞, x), where ν a is the common probability measure of all the a k , and next the nondecreasing functions f a (x) = inf{z; P a (x) ≥ z}. Then, for any random variable X a (ω) uniformly distributed in [0, 1], the random variable f a (X a (ω)) is distributed according to the measure ν a . As a consequence, we can recast (53) in the form a(y, ω) = where (X a k (ω)) k∈Z d is a family of independent random variables that are all uniformly distributed in [0, 1], and f a is non-decreasing. We can proceed likewise for the variables c k . This yields an example where (9), (10) and (16) hold. In particular, the function A of (9) reads As shown in [6], more general fields a(y, ω) (where random variables may be correlated) also fall into this framework.
In what follows, we prove that, under assumptions (52) and (53), and if p ≤ 3, the structure assumption (21) holds. Without loss of generality, we may assume that y ∈ (0, 1), and write that . By a slight abuse of notation, we keep implicit the dependency with respect to y, work withā andc rather than x a and x c , and write A(ā,c, ξ) =ā |ξ| p p +c |ξ| 2 2 .
We next compute the derivative of A 2 with respect toc. Using again (54) to compute ∂g ∂c , we obtain that Recall thatā > 0,c ≥ 0, p > 1 and g > 0. We have assumed that p ≤ 3, and therefore deduce from the above relation that ∂A 2 ∂c ≥ 0. The structure assumption (21) hence holds in that case.
It is likely that other settings, such as W (y, ω, ξ) = (a(y, ω) + c(y, ω)) |ξ| p p + c(y, ω) |ξ| 2 2 , along with assumption (53), where a k and c k are all independent random variables, also fall in our framework. We will not pursue in this direction here.

Newton algorithm to solve the truncated corrector problem
As mentioned above, the corrector problem (8) is a convex minimization problem, which has been well studied in the literature (see e.g. [3,8,16,20]). We explain here how we proceed in practice to solve this problem, assuming that W is not only strictly convex, but actually α-convex (i.e. satisfies (6)).
To simplify our exposition, we use the notation of the Q-periodic case, where the corrector problem is (33). We introduce some basis functions {ϕ i } i∈I (e.g. finite element functions) where ϕ i ∈ W 1,p # (Q), and the finite dimensional space V h = Span {ϕ i , i ∈ I}. Consider the functional and the variational problem This problem has a unique solution (denoted w h ∈ V h ) up to the addition of a constant. The quantity ∇w h is well-defined, and is the finite-dimensional approximation of ∇w, where w is the solution to (33). In practice, problem (55) is solved using a Newton algorithm. We see that ∂J ∂w j (w) = D w (ϕ j ) and and H w (ϕ, ψ) = Q (∇ϕ(y)) T ∂ 2 ξ W (y, ξ + ∇w(y)) ∇ψ(y) dy.
The Newton algorithm consists in defining w m+1 h ∈ V h from w m h ∈ V h by the following linear elliptic problem: find w m+1 h ∈ V h such that Again, w m+1 h is uniquely defined up to the addition of a constant. The finite-dimensional problem (55) is α-convex, and W is smooth with respect to ξ: the Newton algorithm hence locally converges (quadratically), and lim m→∞ ∇w m h = ∇w h . In practice, we consider a sequence T h of meshes on Q, and set By classical finite element results, we know that lim h→0 ∇w h − ∇w L p (Q) = 0 (see e.g. [3] and also [25,1]).

Overview of numerical results
We have considered three test-cases of the form (52)-(53), namely with p = 4, in dimension d = 2. The random variables a k follow a Bernoulli distribution: P(a k = α) = P(a k = β) = 1/2, with α = 3 and β = 23. The value of the field c is chosen as follows: • Test Case 1: in this first test case, c(y, ω) = 0. The problem is thus strictly convex but not α-convex. In addition, the energy density is positively homogeneous of degree p, hence Remarks 8 and 11 apply.
• Test Case 2: the second test case corresponds to c(y, ω) = 1. The problem is then α-convex, and highly oscillatory only in its non-harmonic component.
• Test Case 3: for the third test case, we work with c(y, ω) chosen according to (53), where P(c k = γ) = P(c k = δ) = 1/2, with γ = 1 and δ = 3. The problem is thus highly oscillatory both in its non-harmonic and its harmonic components.
We take the meshsize h = 0.2. The Newton algorithm is initialized with the solution w 0 to −div [(a(y, ω) + c(y, ω))(ξ + ∇w 0 )] = 0 in Q N , w 0 is Q N -periodic,  (13)). For the antithetic variable approach, we have also solved 2M corrector problems, from which we build the empirical estimator (14). Therefore, in all what follows, we compare the accuracy of the Monte Carlo approach (MC) and the Antithetic Variable approach (AV) at equal computational cost.

Test Case 1
In this test case, the energy density is positively homogeneous. We therefore know, from Proposition 1 and Remark 11, that our approach yields estimations of the expectation of W ⋆ N (ω, ξ), ξ · ∂ ξ W ⋆ N (ω, ξ) and ξ T ∂ 2 ξ W ⋆ N (ω, ξ)ξ with a smaller variance than the standard Monte Carlo approach. Our aim here is to quantify the efficiency gain. Note also that we have not taken into account, in our implementation, the fact that W ⋆ N (ω, ξ), ξ · ∂ ξ W ⋆ N (ω, ξ) and ξ T ∂ 2 ξ W ⋆ N (ω, ξ)ξ are here proportional to one another. To begin with, we show on Figure 1 the estimation by empirical means (along with a 95 % confidence interval) of several homogenized quantities (the energy density, its derivatives with respect to each component of ξ, . . . ). We observe that the variance of all quantities decreases when the size of Q N increases, and that confidence intervals obtained with the antithetic variable approach are smaller than those obtained with a standard Monte Carlo approach, for an equal computational cost.
We now turn to a more quantitative analysis of the variance. Figure 2 shows the variances as a function of N (note the factor 1/2 in the definition of V MC , consistent with (12), (13) and (14)). We observe that the variance of any of our quantities of interest (obtained either with the Monte Carlo approach or the Antithetic Variable approach) decreases at the rate 1/|Q N | as N increases (as expected if one could use the Central Limit Theorem). We also observe that the variance obtained with our approach is systematically smaller than the Monte Carlo variance, in the sense that V AV ≤ V MC . We next report on Table 1 the variance reduction ratio which measures the gain in computational cost at equal accuracy, or the square of the accuracy gain at equal computational cost. Although this ratio somewhat varies with N, we observe that it is of the order of 10 for all quantities of interest, except for ∂ ξ 1 ξ 2 W ⋆ N , for which it is always larger than 4. In particular, even if N is not large (because we cannot afford to work on a large domain Q N ), we still observe variance reduction. Remark 13. Similar variance reduction ratios are obtained in the case when the corrector problem is supplemented with homogeneous Dirichlet boundary conditions on the boundary on Q N , rather than periodic boundary conditions as used here following (8) (results not shown).

Test Case 2
We now consider a test-case for which the energy density is not positively homogeneous. From our results of Section 2.1, we know that our approach yields variance reduction for the estimation of E [W ⋆ N (·, ξ)]. Our aim here is two-fold: we first quantify the efficiency gain, and we next verify (and this will indeed be the case) that we also obtain a gain in efficiency for quantities of interest (such as the first or second derivatives of W ⋆ N (ω, ξ) with respect to ξ) for which we do not have theoretical results in the two-dimensional case.
We show on Figure 3 the variances (56) of the same quantities of interest as previously (obtained either with the Monte Carlo approach or the Antithetic Variable approach). As for the previous test-case, we observe that all variances decrease at the rate 1/|Q N | as N increases. In addition, we observe that the variance obtained with our approach is systematically smaller than the Monte Carlo variance, in the sense that V AV ≤ V MC .
On Table 2, we report the variance reduction ratios (57) (with the same convention as in Table 1). We observe an efficiency gain of more than 10 for all quantities of interest, except again the cross derivative ∂ ξ 1 ξ 2 W ⋆ N , for which the gain is smaller, and of the order of 4.

Test Case 3
We eventually turn to our final test-case, where both coefficients a and c do depend on the space variable. We show on Figure 4 the variances (56) of our quantities of interest. Again, we observe that they all decrease at the rate 1/|Q N | as N increases, and that the variance obtained with our approach is systematically smaller than the Monte Carlo variance.
On Table 3, we report the variance reduction ratios (57) (with the same convention as in Table 1). Results are quantitatively similar to the ones obtained on Table 2: we do observe a robust variance reduction, even in cases for which theoretical support is still currently missing.