SHEMAT-Suite: An open-source code for simulating flow, heat and species transport in porous media

Abstract SHEMAT-Suite is a finite-difference open-source code for simulating coupled flow, heat and species transport in porous media. The code, written in Fortran-95, originates from geoscientific research in the fields of geothermics and hydrogeology. It comprises: (1) a versatile handling of input and output, (2) a modular framework for subsurface parameter modeling, (3) a multi-level OpenMP parallelization, (4) parameter estimation and data assimilation by stochastic approaches (Monte Carlo, Ensemble Kalman filter) and by deterministic Bayesian approaches based on automatic differentiation for calculating exact (truncation error-free) derivatives of the forward code.


Motivation and significance
Numerical simulations of coupled subsurface fluid flow and heat transport problems are routinely performed in a number of geoscientific fields, including geothermal energy, groundwater, nuclear waste storage and CO 2 sequestration. Simulations are increasingly based on large and complex geological models, requiring sophisticated and parallelized software packages, ideally with a history of extensive benchmarking and testing on reallife applications. SHEMAT-Suite is such a software: a versatile, general-purpose open-source code for simulating flow, heat and species transport in geological reservoirs. Conceptually, SHEMAT-Suite treats geological reservoirs as porous media. The code solves transient or steady-state, forward and inverse coupled problems in 1D, 2D, and 3D. It can handle a wide range of time scales and, thus, can address both technical and geological processes. SHEMAT-Suite can handle the non-linearities resulting from the variation of rock and fluid properties with temperature, pressure and species concentration. In addition, SHEMAT-Suite is capable of deterministic and stochastic inverse simulations and data assimilation, which enable parameter estimation and uncertainty quantification. These are important features, since subsurface flow and transport modeling is subject to data scarcity and large uncertainties of different sources.
Other software packages can also handle problems that can be solved by SHEMAT-Suite. Some are available as purchasable research code or as commercial packages, such as TOUGH [1], FEFLOW [2] and STOMP [3], others are distributed as open-source initiatives, including OpenGeoSys [4], DUMUX [5] and MOD-FLOW [6]. SHEMAT-Suite complements available open-source packages with a powerful toolkit of functionalities including: (a) multi-level OpenMP parallelization for simulations of flow through porous media [7][8][9] (b) a modular user-framework for subsurface parameter modeling and customized graphical output (c) species transport including density-driven flow; (d) uncertainty quantification, parameter estimation and data assimilation by stochastic approaches (Monte Carlo, ensemble Kalman filter, [10]) and also by deterministic Bayesian inversion based on Automatic Differentiation (AD) for calculating exact Jacobian matrices of the forward model [11].
SHEMAT-Suite in its present form has evolved from SHEMAT (Simulator for HEat and MAss Transport [12,13]), a fully coupled flow and heat transport model that was developed from the isothermal USGS 3-D groundwater model of Trescott and Larson [14,15]. Originating from the Fortran-77 code SHEMAT, SHEMAT-Suite has been rewritten in Fortran-95 [11], refactored and enhanced subsequently with various additional capabilities within numerous research and PhD projects. These capabilities include (1) a parallelization scheme for SHEMAT-Suite [7]; (2) the solution of the systems of coupled partial differential equations by implicit Picard iteration in a finite difference discretization [16,17], discretization in SHEMAT-Suite is implemented on rectangular grids with variable cell sizes; (3) a direct linear solver for small simulations, an implementation of the Bi-Conjugate Gradients Stabilized (Bi-CGStab) method for large simulations [18]; (4) additional solvers and preconditioners easy to integrate in SHEMAT-Suite owing to its modular structure of the solver implementation; (5) linear algebra functions used from the libraries BLAS [19][20][21] and LAPACK [22]; (6) output of variable and parameter fields possible in many formats, most notably VTK [23] and HDF5 3 ; (7) deterministic Bayesian inversion [11]; (8) automatic differentiation of the forward code by the software Tapenade [24] for computing the Jacobian in the deterministic inversion framework; (9) sequential Gaussian simulation (SGSim 4 ) from the geostatistical library GSLib [25] for random field generation in the stochastic simulation mode within a Monte Carlo framework; (10) Bayesian updates implemented as ensemble Kalman filter (EnKF) analysis equations [26]. 5 The current, open-source version of SHEMAT-Suite is restricted to a single fluid phase in the porous medium. However, a multi-phase fluid version, which can, e.g., simulate CO 2 , steam [27,28], and supercritical water [29], is under development.
Important application areas such as groundwater management and contaminant transport simulations are reflected by the scientific contributions of SHEMAT-Suite so far, which can be divided into four categories: geothermics, paleoclimate, and hydrogeology from the geosciences, and, additionally, inverse method development. Geothermics is the most important category and motivated the software development originally. The applications of SHEMAT-Suite in geothermics range from largescale geothermal simulations [30][31][32] to small-scale borehole heat exchanger and temperature sensor simulations [33]. The second field of applications concerns paleoclimate related studies, including an ice module used for simulating freezing and thawing processes in porous media (e.g. [34,35]). Additionally, SHEMAT-Suite has been used in studies of broad geological interest, including a simulation of submarine groundwater discharge at the New Jersey shelf, USA, since the last ice age [36] and a simulation of convection cells in the Perth Basin, Australia [37]. Finally, general development of algorithms has been carried out using SHEMAT-Suite. Various EnKF-methods were implemented and compared [10]. The EnKF is a sequential, ensemble-based data assimilation algorithm that can be used for parameter estimation in large-scale simulations. SHEMAT-Suite uses different techniques from high-performance computing [38,39] (''Energy oriented Center of Excellence'' 6 ) and is coupled to the optimal experimental design framework EFCOSS [40] to enhance reservoir property estimation [41].

Software description
SHEMAT-Suite is a forward and inverse numerical code for simulating flow, heat and species transport.
There are three compilation modes for SHEMAT-Suite: a forward-mode for pure forward computation, an automaticdifferentiation-mode for deterministic inverse computation, and a stochastic-mode for geostatistical simulation, stochastic data assimilation and parameter estimation.
Most of the user interaction with SHEMAT-Suite is restricted to the input file. For the forward-mode, only one main input file is needed. For the inverse modes, additional input needs to be added in the main input file as well as in separate input files.

Software functionalities
The source code of SHEMAT-Suite is organized in two levels, the branch level and the directory level. On the branch level, the version control software git is used to divide the source code into branches. As usual in software development, git-branches are used to develop the software, in particular the so-called 'master branch'. For SHEMAT-Suite, this concept is slightly extended to four master branches (Fig. 1). The four master branches are derived from the three core functionalities of the software: (1) forward-mode for the solution of flow, heat and dissolved species transport, (2) AD-mode (ad) for Bayesian inversion employing a derivative-based optimization scheme using tangent linear and adjoint codes generated via automatic differentiation (AD) for calculating exact Jacobian matrices, and (3) stochastic-mode (sm) for Monte Carlo simulations, data assimilation and parameter estimation using the EnKF. The first of the four branches, master, can be compiled only in forward-mode and is optimized for pure forward computations. The second branch, master-ad, can be compiled in forward-mode and AD-mode. The third branch, master-sm, can be compiled in forward-mode and stochasticmode. The fourth and final branch, master-all, can be compiled in all three modes. In general, a user could always work with master-all. However, the smaller codes may give a better performance for the specific task with respect to memory and computation time.
Because of the special branch structure, it is important for SHEMAT-Suite developers to introduce changes in the correct branch and, subsequently, merge these changes into downstream branches. Changes affecting the forward computations should be added to master, changes affecting only the AD/inverse computations to master-ad, changes affecting only the Monte-Carlo framework and stochastic computations to master-sm. The branch master-all should, in principle, only receive the changes to the other branches through merging. Rarely, specific changes may be needed to render the changes in the upstream branches compatible.
Open-source documentation of SHEMAT-Suite is available alongside the source code of SHEMAT-Suite. A SHEMAT-Suite-Wiki 7 includes (1) 9 For example, this repository contains a Python script for converting ASCII input files into hdf5 files that can be read by SHEMAT-Suite. Finally, we plan to gradually switch to continuous integration methodologies of software development. This will simplify the code development workflow, minimize the introduction of errors and thus assure the software quality.

Software architecture
This section explains the source code directories in the branches described in the previous section (see Fig. 2 for a sketch of the source code directories used for forward computations).
The master branch contains the following directories: /forward/ contains most of the code for forward computation, including the handling of input and output, the time discretization loop and the non-linear iteration loop. This code is parallelized using the shared-memory parallel programming paradigm OpenMP aiming for multi-core workstations.
/solve/ contains the solution of systems of linear equations. The separation from /forward/ ensures the easy implementation of additional solvers and preconditioners and is crucial for the usage of an automatic differentiation software.
/props/ contains property modules defining the dynamic behavior and coupling of fluid and rock properties. Property modules can be designed for using both constant, or pressure-and temperature-dependent fluid and rock properties. Additionally, the influence of dissolved species concentrations, typically salt concentration, on fluid properties such as density can be accounted for. The modular structure allows for adding site-or application-specific property relationships.  /cmake/ contains compilation utilities using CMake tools. 10 /doc/ contains input for generating the Doxygen documentation.
The branches master-sm and master-ad add the following directories.

Illustrative example
This section introduces a simple example simulation with SHEMAT-Suite. For numerous scientific application examples the interested reader is referred to the literature cited in Sections 1 and 4 and in Table 1

Forward computation
The numerical solution of the Theis model [42] serves to illustrate the usage of SHEMAT-Suite. The Theis model is a solution of the flow equation, computing the transient water table drawdown resulting from a pumping well in a horizontal, confined aquifer. We compare the numerical solution by SHEMAT-Suite to the analytical solution [13,43]. A compute capsule is provided on Code Ocean for reproducing the illustrative example. 12 Input File 1 shows the input file for the Theis problem. The first section of the input file provides general information of the run, for example a title and the specification of the property module, in this case const. This means that the fluid parameters such as fluid density and viscosity and rock parameters such as permeability are kept constant throughout the simulation. The second section defines the grid starting with the number of cells in x-y-z-direction and followed by the cell sizes. In this case, the grid is coarser away from the well and finer in the center . For the rock matrix, the properties include porosity (10%), permeability (10 −12 m 2 ), and compressibility (10 −8 m s 2 kg −1 ). Other parameter inputs include thermal parameters and parameters for species transport. Since this illustrative example is for groundwater flow only, these additional parameters do not influence the simulation. In the SHEMAT-Suite documentation, users find a complete walkthrough of a number of example input files including this one for the Theis model. Finally, the output file types are set to VTK and HDF5.
The results are shown in Fig. 3. The hydraulic head field visualizes the drawdown around the pumping well after 3.5 days.
The HDF5 output of SHEMAT-Suite is visualized using Python and matplotlib [44]. The comparison with the analytical Theis solution reveals that the absolute error between the two functions is smaller than 10 cm, even in the center of the model, where the influence of the pumping well is greatest and in the order of meters.

Impact
SHEMAT-Suite has contributed to multiple geoscientific areas. A summary of important code developments of SHEMAT-Suite can be found in Table 1. The most significant contribution of SHEMAT-Suite is in the field of geothermal energy that served as original motivation for the development of the software. To name a few recent examples, temperature sensors installed at borehole heat exchangers have been used to estimate groundwater velocities using SHEMAT-Suite [33,45]. In models of the vadose zone, the heat output of high-voltage electric cables was modeled with SHEMAT-Suite [46,47]. For deep geothermal reservoirs, the influence of prominent seismic reflectors on the temperature distribution and the formation of convection cells were studied using the code [32,37]. Additionally, a current research project uses SHEMAT-Suite for predicting the geothermal potential of a caldera system in Mexico [48].
Besides, simulations using SHEMAT-Suite contributed to other geoscientific fields. A property module for ice was developed [34] and is used currently for simulating freezing and thawing processes in porous media. In a current research project, SHEMAT-Suite is used for simulating the submarine groundwater discharge Input File 1: Input file of the THEIS model. on the New Jersey continental shelf since the last ice age [36]. The modern aquifer flow simulations of SHEMAT-Suite are often ensemble-based and rely on high-performance computing [49]. SHEMAT-Suite has had an impact on computational geoscience as well. Three-dimensional inverse parameter estimation was performed with SHEMAT-Suite based on automatic differentiation [11]. Monte Carlo techniques were applied for uncertainty quantification of expected geothermal energy usage in a geothermal reservoir at The Hague, Netherlands [50]. A parallelization scheme was specifically developed for SHEMAT-Suite and implemented [7][8][9]38,39]. The ensemble Kalman filter has been implemented and extensively tested for stochastic permeability estimation [10,50]. Optimal experimental design has been used to find the best position of new boreholes in a geothermal reservoir [41]. The open-source availability of the code will impact the broader usage of these methods in the geosciences.
We will now list some important developments that are currently not part of the open-source version of SHEMAT-Suite presented here, but that are planned to be merged with the opensource publication in the future. A module for multi-phase flow has been developed using SHEMAT-Suite as starting point [28]. Moreover, an equivalent fracture model approach was implemented, which represents fracture permeability by upscaling based on the mimetic finite difference method [54,56]. Since SHEMAT-Suite is a porous medium model, it cannot simulate discrete fracture models and thus flow or transport through fractures explicitly. However, flow through fractured rocks can be approximated under certain conditions by modeling anisotropic permeability or by applying a stochastic-continuum concept [49]. A basic approach for simulating unconfined aquifers has been implemented. A module for electrokinetic potential simulations was included, which can be used for studies of self potential originating from fluid flow in porous media [57][58][59]. Another module was implemented in SHEMAT-Suite for chemical fluid-rock interaction [55]. In the current research project EoCoE, the focus shifts to implementing interfaces of SHEMAT-Suite to modern highperformance software, such as the portable data interface (PDI 13 ) for data handling, PETSc [60] for parallel solver methods and the parallel data assimilation framework (PDAF [61]) for a parallel implementation of the ensemble Kalman filter. Finally, the multiphase flow implementation uses the software PETSc [60] for becoming MPI-parallel and suitable for distributed-memory CPU clusters.

Conclusions
To solve forward and inverse problems arising in flow, heat and species transport, users can choose SHEMAT-Suite for a number of reasons, including (1) the modular character of the software, in particular the novel use of git branches for the software development; (2) out-of-the-box usage of automatic differentiation (AD) for calculating exact Jacobian in inversions; (3) its large variety of application fields, for example shallow and deep geothermal energy, hydrogeology, engineering; (4) the high performance computing capacities, i.e. the OpenMP implementation, and (5) capabilities for deterministic and stochastic inversion.
The recent open-source publication of the SHEMAT-Suite source code has a number of advantages: (1) it makes SHEMAT-Suite available to a larger community of code users and developers, (2) it makes the existing research using SHEMAT-Suite reproducible, (3) it sparks new interesting developments and additions to the code, and (4) it initiates the formation of a SHEMAT-Suite user community.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.