Application of the mixed multiscale finite element method to parallel simulations of two-phase flows in porous media

. The Mixed Multiscale Finite Element method (MMsFE) is a promising alternative to traditional upscaling techniques in order to accelerate the simulation of ﬂows in large heterogeneous porous media. Indeed, in this method, the calculation of the basis functions which encompass the ﬁne-scale variations of the permeability ﬁeld, can be performed in parallel and the size of the global linear system is reduced. However, we show in this work that a two-level MPI strategy should be used to adapt the calculation resources at these two steps of the algorithm and thus obtain a better scalability of the method. This strategy has been implemented for the resolution of the pressure equation which arises in two-phase ﬂow models. Results of simulations performed on complex reservoir models show the beneﬁts of this approach.


Introduction
Simulations of subsoil flows are based on geological models which provide a description of the geometry of the porous medium and its petrophysical properties such as porosity and permeability. The space variations of these properties play an important role on the flow. In industrial applications, like in the oil and gas industry for instance, the geological description of the studied area often leads to models where the permeability and porosity maps are defined on grids made of dozens or hundreds of millions of cells. This initial discretization will be referred to as finescale model in the following. Simulating multiphase flows, directly at this grid resolution, may lead to large computing times. This problem gets even worse when several simulations have to be launched in order to perform a sensitivity analysis, a calibration of the model parameters, or test different flow scenarios. To overcome this issue, various upscaling techniques and multiscale methods have been developed over the past years.
Upscaling techniques essentially consist in averaging the petrophysical properties defined in the fine-scale model on a coarser grid [1,2]. But they usually do not enable to downscale solutions obtained with this grid onto the fine one. This requirement is particularly important when working with workflows including both the fine and coarse models. Local Grid Refinements (LGRs) can be used jointly with these techniques, as in [3], in order to obtain more accurate solutions at particular places within a reservoir. However LGRs cannot be used over large space areas. Therefore the prospect of computing efficiently solutions at both fine and coarse scales motivated the development of multiscale methods. In this class of methods, the petrophysical properties are used to compute basis functions which are associated to degrees of freedoms related to the coarse mesh and calculated by solving local cell problems defined on the fine mesh. This set of basis functions is then used to build a discrete linear system with a reduced number of unknowns on the coarse mesh.
This approach has several advantages. Compared to classical finite elements, the shape of the basis functions changes according to the local variations of the permeability field within their support. All cell problems are independent from each other and their resolution can be performed in parallel. Moreover, once the linear system has been solved at the coarse scale, according to the multiscale method, either the fluxes or the pressures can be downscaled on to the fine grid.
The first multiscale finite element method has been introduced in [4] for the study of porous materials. Unfortunately, this method does not turn out to be appropriate for flow simulations since the fluxes are not conservative on the coarse and fine meshes. A mixed formulation was therefore proposed in [5] which enables, in case of two-phase flows, to couple the resolution of the pressure equation with the resolution of the saturation equation. Note that other multiscale formulations like, for instance, multiscale finite-volume methods [6][7][8] have also been proposed and provide mass conservative fluxes too. The reader can refer to [9,10] for an overview of these techniques. The aim of this work is here to provide strategies to improve the efficiency of these methods when they are used in a parallel environment and give quantitative speed-ups by using one of them, namely the Mixed Multiscale Finite Element Method (MMsFE) [5].
In this work, we consider a two-phase flow model where the pressure equation is decoupled from the saturation equation using IMPES or IMPIMS time discretizations. The pressure equation is solved by means of the MMsFE method whereas an upwind finite volume scheme is used to discretize the saturation equation at the fine-scale. This splitting is feasible since the MMsFE method provides conservative fluxes at the fine and coarse scales. The cell problems defined to compute the basis functions are discretized using a finite volume method with a Two Point Flux Approximation (TPFA) [11].
As mentioned before, the divide and conquer strategy which is inherent to multiscale methods makes them attractive for parallel simulations since all cell problems defined to compute the basis functions can be solved independently from each other. Shared memory architectures are usually used to efficiently deal with tasks in parallel. A parallel implementation of the MMsFE method for one single node has been proposed in [12] using the Matlab MRST toolbox [13]. For multi and manycores architectures, a parallel implementation has also been successfully developed in [14]. However, for large scale simulations, distributed memory computing is required. A recent work presents a hybrid parallel implementation within the DUNE framework [15]. Here, the hybrid parallel programming model consists in using a shared memory programming within SMP nodes and MPI-communications between nodes. In this approach, no communication is needed within one node. However, this requires to develop effective threadsafe components everywhere in the simulator.
In practice, the efficiency of the MMsFE method for the two phase flow problem may be reduced for two reasons. Even if all cell problems, which are defined to compute the basis functions, are solved on a small space subdomain including a few coarse cells, these problems can be numerous. Therefore the MMsFE method leads to substantial speedups only if a small part of the basis functions is recomputed at each time step. A criterion based on the variations of the total mobility is usually used to trigger an update of the basis in the neighborhood of the saturation front. The performance of the multiscale method is thus optimal when this front is not too large and the calculation workload related to the resolution of all cell problems is dynamically balanced through the processors. The second issue comes from the small size of the coarse linear system. Since multiscale methods offer the possibility to agglomerate a significant number of fine cells into a coarse one, the resulting distributed linear system can be small compared to the number of MPI processes required by other parts of the simulation sequence. Therefore, whereas a promising scaling may be expected for other parts of the algorithm, this strong reduction of the size of the linear system can lead to scalability problems. This issue was already pointed out in [15].
The purpose of this work is to address this second issue. We investigated pure MPI and hybrid approaches to finally propose a two-level MPI alternative. In this work, we use a redistribution library for linear systems in addition to the MPI parallel programming model, which enables the use of a subset of MPI processes to solve the coarse pressure system and avoids scaling issues when a larger number of MPI processes is required elsewhere in the resolution process. The performance of this technique has been assessed on oil reservoir models.
This paper is organized as follows. In Section 2, we briefly present the two-phase flow model. We introduce some notations and recall the MMsFE method in Section 3. First numerical results are then presented in Section 4. The different parallel programming techniques mentioned before are detailed and discussed in Section 5. The performances of the MMsFE method used along with the techniques presented in Section 5 are evaluated in Section 6. At last, concluding remarks and perspectives of works are summarized in Section 7.

Mathematical model
In this section, we first describe a simplified model for twophase fluid flows in porous media. Then, we briefly outline a finite-volume method based on a sequential splitting that is classically used to solve the corresponding system of equations.

Two-phase flow problem
We consider an immiscible and incompressible two-phase flow in a porous medium X. Here, X is a subset of R d with d ! 2. Let us denote by subscripts ''w'' and ''o'' the water and oil phases respectively. The equations governing fluid flows in a heterogeneous porous medium are derived from the conservation of mass. For each phase a, we write the mass balance where / denotes the total porosity or saturated volume fraction, S a the phase saturation, v a the phase velocity, and q a a source (or sink) term. Since both fluids fill the porous volume, the saturations satisfy the closure equation X In this work, gravity and capillary pressure are not taken into account. Hence, according to the two-phase extension of Darcy's law [16], the velocity v a can be formulated for each phase a as follows: where K is the permeability tensor, l a the viscosity and k r,a is the saturation dependent relative permeability of the phase a. We introduce the phase mobility k a ¼ k r;a l a and the total mobility Combining (1), (2) and (3), the following pressure equation can be obtained: where Then, by introducing the water fractional flow and writing the pressure gradient with respect to the total velocity v, this leads to the saturation equation Let oX be the boundary of X whose outward unit normal vector is denoted by n. oX is partioned so that oX = C D [ C N . Boundary conditions are imposed in the following way for the pressure equation (4): and for the saturation equation (5): From now on, the notation S instead of S w will be used for simplicity's sake.

Numerical resolution
The system of equations (4) and (5) could be solved by different ways [17]. Here, the splitting of the system into an elliptic equation and a hyperbolic equation that was introduced in the previous section is used to compute sequentially both pressure and saturation according to the scheme proposed in [18]. In a first step, the pressure equation (4) is solved at the current time step using saturation values from the previous time step. Then the saturation equation (5) is solved using the new pressure values. Thus, at each time iteration n + 1, the system to be solved reads with the boundary conditions introduced in (6) and (7) and Dt = t n+1 À t n . To evaluate the water fractional flow in (9), a first possible choice consists in using an explicit Euler scheme where S H ¼ S n leading to the IMPES formulation (IMplicit in Pressure, Explicit in Saturation).
To ensure the stability of the saturation calculations, a CFL condition on the time steps is required [19]. An implicit Euler scheme can also be used. In that case, S H ¼ S nþ1 . This second formulation is known as IMPIMS scheme (IMplicit in Pressure, IMplicit in Saturation). Appendix A.2 gives the complete discretization of this equation. At last, the computation of the saturations S n+1 requires the resolution of a nonlinear system due to the presence of the water fractional flow. This system is solved by means of a Newton-Raphson method [20]. The solution obtained with this method is unconditionally stable whatever the time step Át is.
Hereafter, equation (8) is called the pressure equation and equation (9) the saturation equation. The saturation equation is discretized on the fine mesh using an upwind cell-centered finite volume method [21] where the fluxes are computed either by means of the two-point flux approximation (TPFA) [11] or by means of the MMsFE method. The TPFA approximation is well-known to be consistent and monotonous for Cartesian grids. Since, in this work, our numerical examples only use this type of grid, the TPFA discretization will be considered for our comparisons as the fine reference solution. Appendix A recalls that discretization. Note that, when dealing with unstructured meshes, multi-point flux approximations [22] or mimetic finite difference methods [23] can be employed to ensure the consistency of the fluxes. The next section introduces the MMsFE method.

Mixed multiscale finite element method
Multiscale finite element methods provide an efficient way to compute the pressure P and the velocity v by solving a linear system defined on a coarse mesh while taking the details of a geological model into account at a fine-scale. Unlike upscaling techniques which average the values of the rock properties, multiscale methods actually compute basis functions whose shape depends on the space variability of these properties. The construction of such basis functions is achieved through the resolution of local subproblems called cell problems. Then a coarse linear system is assembled using a variational formulation based on these basis functions.

Multiscale mesh construction
Corner-point geometry grids [24] which are composed of hexaedral cells with a Cartesian topology are usually used for real test cases. Here, as a first trial, only Cartesian grids were considered for this study since the coarsening of such grids is easy for parallel simulations. In the following of this section, we first introduce some notations and the construction of the coarse mesh afterwards.

General notations
Let K h be a conforming polyhedral mesh of the domain X such that h ¼ max k2K h h k where h k is the diameter of the cell k 2 K h . Let F h denote the set of faces related to the mesh K h . Within this set, F i h stands for the group of inner faces: h for the group of boundary faces: Let us denote by F D h the set of boundary faces included in C D : For all cell k 2 K h , we also define Conversely, let us define for all For all inner face r 2 F i h , an orientation is arbitrarily chosen and a unit normal vector n r is defined based on this orientation. For all k 2 K h;r , we denote by n r,k the unit normal vector of r oriented outside of cell k and we define e r;k ¼ n r;k Á n r .

Upgridding
We denote by K h the fine mesh and by K H its corresponding coarse mesh with h ¼ H . In our case, the coarse mesh K H is generated by agglomerating the fine cells k 2 K h by means of a uniform partitionning of the structured grid K h . By that way, the coarse mesh faces are still aligned with the underlying fine grid. An example of such a mesh is given in Figure 1. Note that other works [25] proposed flow-based coarsening methods that were also applied along with multiscale methods.
Finally, we define the following sets mapping elements of K h and K H :

Definition
The MMsFE method introduces the multiscale finite element space Here MSðK Þ denotes a finite element space spanned by basis functions These basis functions are computed by solving cell problems on the fine underlying cells k 2 M H ;h K . The degrees of freedom of q H jK 2 MSðK Þ are the fluxes R R q H jK Á n R;K ds. In its primary formulation [5], the boundary conditions of the cell problems were chosen independently from the local heterogeneities. An oversampling technique consisting in extending the local domain of the cell problem was proposed to reduce the impact of these boundary conditions and improve the accuracy of the method. However, the use of oversampling may induce a loss of volume balance. Static and dynamic strategies were proposed in [26] to define the boundary conditions in conservative ways. These two approaches are the ones we tested in this work and we now detail. In the static approach, the cell problem of a face where n is the unit normal vector oriented outside of the domain K 1,2 . Functions w are weight functions verifying Note that the sink and source terms along with the no-flux boundary conditions on oK 1;2 defined for problem (10) imply that the flux of w R through R is equal to one. Functions / R are defined up to one constant. In practice, these basis functions are chosen so that the w-weighted average is equal to zero. For a face R 2 F D H and K 2 K H ;R , the related basis function is defined in the cell K and is solution to where function w R is given by The static approach sometimes fails to reach a good accuracy in the computation of the fluxes since it only takes local properties into account. Hereafter, when the static approach is used to compute the basis functions, these functions will be referred to as local basis functions. The idea of the dynamic approach consists in computing a one-phase flow at the fine scale whenever new boundary conditions are defined during the simulation. The obtained velocity field v mono is then used to define the boundary conditions of the cell problem. Thus, for each face R 2 F i_D H and each cell K 2 K H ;R , the basis functions w K;R and / K;R are defined in the following way: Hereafter, the basis functions obtained with the dynamic approach are referred to as global basis functions. Let us remark that a slightly different strategy can be applied. It consists in using, simultaneously, several global pressure fields obtained with different prescribed boundary conditions. In that case the boundary conditions of the cell problems remain fixed during the simulation but the number of degrees of freedom per coarse face is increased. This strategy, as suggested in [27], helps to improve the accuracy of the method.
In this work, the cell problems (11) and (12) are discretized using a TPFA scheme. Figures 2 and 3 show multiscale basis functions computed on a one-dimensional mesh with 1000 cells and a coarse mesh made of two coarse cells. The three multiscale basis functions obtained for a constant permeability field equal to one are represented in Figure 2. They correspond, in that case, to the hat functions used to approximate the velocity in the RT 0 mixed finite element space. Both graphs on Figure 3 depict the three multiscale basis functions and the permeability values used to compute them.

Update of the basis functions
In the IMPES or IMPIMS schemes, the total mobility k is updated after each resolution of (15), which requires an update of the basis functions. However, the variations of k are usually small from one time step to one another in regions far from the water saturation front. Theferefore, before a new resolution of the pressure equation, only a few basis functions need to be recomputed. To trigger these updates in cells where significant mobility variations occur, we chose the criterion proposed in [28]. Considering the quantity

Application of the method
The MMsFE method described above was implemented using the HPC simulation framework developed at IFPEN and based on the C++ Arcane platform [29]. Mesh data structures, parallelism, I/Os and scalable linear solvers are, for instance, some core functionalites provided by this platform. In our experiments, the coarse and fine linear systems are solved using the BiCGStab solver from PETSc [30] preconditioned with an AMG preconditioner from Hypre [31]. All cell problems are solved using UMFPACK linear solvers [32]. The threshold e tol = 0.8 is used to trigger locally an update of the basis functions. In our implementation, at each time step, once all fluxes have been computed by one of the two methods (TPFA or MMsFE), the new water saturations are computed implicitly using a Newton-Raphson non-linear solver. The value of the time step is adjusted during the simulation according to the number of the Newton-Raphson iterations: it is increased if the number of iterations is less than a number given by the user, it remains the same if this number is exceeded while obtaining the convergence and is decreased if the convergence is not reached. This classical strategy enables the use of big time steps during the simulation but leads to update a larger number of basis functions at each time iteration. It may not be the optimal strategy for the MMsFE method since smaller time steps require fewer updates. But we adopted it for the runs carried out with the TPFA scheme and used the same sequence of time steps for the MMsFE method in order to have the solution at the same times.
The differences in the fluxes given by the TPFA method and the MMsFE method are measured by computing the relative L 2 error of the water saturations: In the following section, we introduce the tenth SPE test case which has become a classical benchmark for upscaling and multiscale methods over the past years [33]. We compare, for this case, the solutions obtained with a local and global calculation of the basis functions.

SPE10 test case
The reservoir defined in the tenth SPE test case is a parallepiped made of 60 cells of size 20 ft in x-direction, 220 cells of size 10 ft in y-direction and 85 cells of size 2 ft in z-direction. Unlike the original data set, the permeability values are here taken equal to the horizontal values and truncated to 0.1 mD if they are below this threshold. This reduces the permeability contrasts but make all cells active in the resolution process. Figure 4 shows the permeability field in two layers of the Tarbert and Upper Ness formations.
Since our main objective in this paper is to quantify the influence of the permeability heterogeneities on the two different pressure solvers, we used a constant porosity value throughout the domain. Note that the contrasts of porosities here only have an impact on the conditioning of the linearized problem related to the saturation equation (9). Relative permeabilities are chosen such that kr w ¼ S À S wi 1 À S wi À S or 2 and kr o ¼ 1 À S À S or 1 À S wi À S or 2 ; with S wi ¼ S or ¼ 0:2. Oil and water viscosities are equal to 1 cp and 0.3 cp respectively. On the reservoir boundaries, the following conditions are imposed: -a fixed pressure of 1000 psi along with a water saturation equal to 1 on the border y = 0, -a fixed pressure of 500 psi on the border y = y max , -no flux on the other boundaries.
The computation is performed on a coarse grid of size 12 · 20 · 17. The simulations were run until the water volume injected into the reservoir reached 5% of its porous volume. Figure 5 shows the water saturations obtained with the TPFA method on the fine grid and with the MMsFE method using local basis functions and global basis functions. The use of dynamic information for the resolu-tion of the cell problems enables the global approach to better reproduce the fine reference solution compared to the local one. The relative L 2 error defined in (13) is indeed equal to 6.4% for the global method and to 11.6% for the local one.

Parallel computation
This section focuses on the parallel implementation of the MMsFE method and highlights some key points to ensure performance and scalability on clusters. The performances of the MMsFE method are analyzed through the tenth SPE test case introduced in the previous section. The number of MPI processes used for this test case is relatively small but scalability issues can already be found out on that example.
The results presented hereafter were obtained thanks to simulations performed on the IFPEN cluster. On this machine, each SMP node is composed of 2 Intel Xeon octo-cores processors E7-2670 (2.6 GHz). All 378 nodes are interconnected with QDR infiniband links (40 Gbit/s) and offer a peak of 110 Teraflops.

Pure MPI approach
A classical SPMD domain decomposition is used to distribute the calculation work into multiple tasks. Each parallel task works on a subdomain of the reservoir and the communications between the subdomains are done using the standard MPI interface.
The upper plot in Figure 6 shows the CPU times obtained with the classical TPFA pressure solver and with the local and global MMsFE solvers as a function of the numbler of MPI processes. Here the acceleration rises up to a factor 9 but falls to 4 when more than 16 MPI processes are used. The lower plot in Figure 6 shows that a better efficiency is achieved with the TPFA solver than with the MMsFE solvers.

Balancing the calculations of the basis functions
The performances of multiscale methods highly depend on the number of basis functions that should be recomputed at each time step. As mentioned in Section 3.3, we used a criterion based on the variations of the total mobility to trigger an update of the basis functions defined in the regions where these changes occur. Thus, the parallel performance of the MMsFE method depends on the way the saturation front is distributed among the processors. In the tenth SPE test case introduced in the previous paragraph, the water flows from the boundary Y min to the boundary Y max and this direction should be taken into account when creating the domain decomposition. In the case depicted in Figure 7 only a small part of the processors will be used to compute the basis functions whereas a better scalability can be achieved in the case represented in the right plots of Figure 7. With unstructured mesh, it is still possible to use weights in the connectivity graph to guide the partition of the domain in some directions. But, in oil reservoir simulations, the flow can be localized for some time around a few wells and, in that case, one has to resort to a dynamic load balancing of the mesh. Figure 8 shows the CPU time used for the computation of the local and global basis functions as a function of the MPI processes in the tenth SPE test case.

Balancing the resolution of the coarse linear system
A second issue comes from the number of MPI processes used to solve the coarse linear system. Since multiscale methods enable to agglomerate a significant number of fine cells into a coarse one, the resulting linear system can be small in comparison to the number of MPI processes which are available, leading to scalability problems as illustrated in Figure 9 for the tenth SPE case. Here, the CPU time increases when more than 16 processes are used. On the other hand, the saturation solver which only works on the fine grid, keeps a good scaling when increasing the number of MPI processes as shown in Figure 10.
A first solution to improve the speed-ups related to this step of the method could consist in using a hybrid parallel programming technique which uses both MPI and threads processes, in order to reduce the number of MPI processes  assigned to the coarse linear solver [34]. But then, the question of the optimal number of MPI processes that should be dedicated to this solver brings up and, in case of multiscale methods, this number could be significantly different from the one required by the other parts of the simulator. Keeping too many processes will reduce the performances as it was previously pointed out, since communications through the network will represent a large part of the time related to the iterative process. Conversely a reduced number of processes would penalize the other parts of the simulator. Concerning the implementation, hybrid parallel programming requires some effort since all existing software components of the simulator should be threadsafe. Thus, for these reasons and as suggested in [35], we propose in the next section a two-level MPI approach to reduce the distribution of our coarse linear systems. Note that this approach remains complementary to a hybrid parallel programming model.

A two-level MPI approach
The scalability problem related to the coarse pressure linear system and mentioned in the previous section is similar to the issue pointed out in [35] for coarse operations in multigrid staging. In this work, the authors resort to a second level MPI communicator set on a small part of the machine to circumvent this difficulty. The role of this second communicator consists in gathering the coarse data, processing them and sending them back to the first level MPI communicator. By that way less computing resources are used. The same strategy was adopted here by using a redistribution library for linear systems. This library, called ALEPH for Linear Algebra Extended to Hybrid Parallelism [36] is said to be ''hybrid'' in the sense that it uses different MPI communicators. The redistribution is done in a smart way using the topology of the machine. SMP nodes are filled in a compact manner in order to minimize costly operations with the second level MPI-communicator. Moreover, cache optimizations have been performed to minimize memory communications between the communicator levels. For instance, the global indexes of the linear systems are sent only once during the simulation. To solve the coarse pressure system, the ALEPH library collects, on a user-defined subset of MPI processes, all contributions to the matrix and the right-hand side that have been computed by all MPI processes in the first MPI-communicator. Then, the compacted coarse pressure system is solved using only this subset of MPI processes in the second level MPI-communicator. This resolution may be carried out by any MPI-based library such as PETSc, HYPRE, Trilinos, etc. In this work, the PETSc library was used. At the end of the resolution, the solution is distributed over all MPI processes. The ALEPH workflow is summarized in Figure 11 for an example of machine with two nodes composed of four processors. For the local approach, Figure 12 shows the CPU time taken by the coarse pressure solver according to the number of MPI processes when the ALEPH library is used or not. Here the redistribution operated by ALEPH enables to keep the CPU time below the limit of 1.5 s when more than 16 MPI processes are used. The same trend can be observed in Figure 13 with the global approach. Since this reduction on a small subset of MPI processes is only performed for the resolution of the coarse linear system, this strategy enables the other parts of the resolution process, and in particular the saturation solver, to run on a wider number of MPI processes. Nevertheless, a significant part of the available nodes remains unused during that stage and to make the most of these calculation resources, other programming techniques such as asynchronism should be studied.

Another example of speed-ups obtained with the MMsFE method
The size of the grid of the tenth SPE test case, presented in Section 4.1, is not very large in view of the calculation resources offered by the clusters at the present time. We therefore introduce here a second test case with a larger grid size where the speed-ups provided by the MMsFE method turn out to be more significant.

Description
That case takes up again the main parameters of the tenth SPE example. Only the dimensions of the domain and the permeability field change. Here, the fine-grid's size is 256 · 256 · 256. A geostatistical facies realization was created on that grid. Two facies were used to model high and low permeable regions within the reservoir with proportions 0.6 and 0.4 respectively. Figure 14 shows the obtained permeability field. The size of the coarse grid is here equal to 32 · 32 · 32. The global approach is used to compute the cell problems.

Results
The pressure solutions given by the TPFA and MMsFE methods are represented in Figure 15. The water saturations obtained with the fluxes related to these two methods are shown in Figure 16. Here the relative L 2 distance between these two solutions is equal to 3%.   All simulations were performed using 256 MPI processes over 16 nodes. Table 1 gives the CPU times obtained with the TPFA and MMsFE pressure solvers. Without the use of the ALEPH library, all MPI processes are used for the resolution of the coarse linear systems. Several ALEPH configurations have been tested up to the case where 128 processes are dedicated to the resolution of these systems. The processors used for each configuration are selected taking into account the topology of the nodes. For instance, the configuration using 128 processes uses exactly 8 nodes, the configuration using 32 processes uses 2 nodes and so on. The best configuration leads to a speedup of nearly 37 with 16 processes, which corresponds to solve coarse problem on one single node. These results tend to suggest that significant accelerations can be achieved during the resolution of the coarse system by limiting extra-node communications, which was carried out here by reducing the numbers of MPI processes and nodes at the same time. Table 2 gives the CPU times of the ALEPH solver using 16 processes spread over one to 16 nodes. We show that the   best configuration leads to use only one node and to maximize the intra-node MPI communications and that performances decrease when more nodes are used.
The total simulation time, including the pressure and saturation resolution times, is equal to 1255 s with the fine TPFA scheme (1137 s for the pressure equation, 118 s for the saturation equation) and to 156 s with the MMsFE solver (30 s for the pressure equation, 126 s for the saturation equation), leading to a global speedup of 8.

Conclusion
As shown in this work, two factors can limit the scalability of the MMsFE method: the calculations of the basis functions which may be unbalanced between the subdomains and the size of the global coarse linear system which can be small compared to the number of allocated calculation resources. Different solutions have been mentioned in Section 5.1.1 to tackle the first problem. But our works were mainly focused on the second one for which we propose a two-level MPI approach.
Our approach consists in reducing the number of MPI processes to a few ones which are concentrated on a few nodes when solving the coarse pressure system. This strategy, which was carried out here by using the ALEPH library, enables to decrease temporarily the number of MPI processes for that step of the MMsFE method while using the whole set of MPI processes for the other parts of the simulator. Compared to a hybrid programming model which may affect other parts of the application and thus may require more efforts in terms of implementation, this reduction and clustering of the MPI processes is only performed within the MMsFE method.
This technique leads to significant speed-ups for the two-phase model and for the mixed multiscale method considered in this work. But other mixed methods, like the ones introduced in [37] and [38] and based on the Generalized Multiscale Finite Element Method, could be used as well. These last two methods feature, in particular, the possible use of more basis functions in each coarse element and can ensure spectral and mesh convergences.
The next step will be to test the dynamic load balancing for the calculation of the basis functions along with the two-level MPI strategy. In view of reducing computing times for oil and gas reservoir simulations, well boundary conditions as well as a more complicated three-phase Black-Oil model will be considered for the following of this study. The results of these tests will thus turn out the potential of acceleration that the MMsFE method can provide for industrial applications when used in combination with these parallelism techniques.