Benchmark study of simulators for thermo-hydraulic modelling of low enthalpy geothermal processes

In order to assess the thermo-hydraulic modelling capabilities of various geothermal simulators, a comparative test suite was created, consisting of a set of cases designed with conditions relevant to the low-enthalpy range of geothermal operations within the European HEATSTORE research project. In an effort to increase confidence in the usage of each simulator, the suite was used as a benchmark by a set of 10 simulators of diverse origin, formulation, and licensing characteristics: COMSOL, MARTHE, ComPASS, Nexus-CSMP++, MOOSE, SEAWATv4, CODE_BRIGHT, Tough3, PFLOTRAN, and Eclipse 100. The synthetic test cases (TCs) consist of a transient pressure test verification (TC1), a well-test comparison (TC2), a thermal transport experiment validation (TC3), and a convection onset comparison (TC4), chosen to represent well-defined subsets of the coupled physical processes acting in subsurface geothermal operations. The results from the four test cases were compared among the participants, to known analytical solutions, and to experimental measurements where applicable, to establish them as reference expectations for future studies. A basic description, problem specification, and corresponding results are presented and discussed. Most participating simulators were able to perform most tests reliably at a level of accuracy that is considered sufficient for application to modelling tasks in real geothermal projects. Significant relative deviations from the reference solutions occurred where strong, sudden (e.g. initial) gradients affected the accuracy of the numerical discretization, but also due to sub-optimal model setup caused by simulator limitations (e.g. providing an equation of state for water properties).


Introduction
A wide range of geothermal installations operate at low enthalpy conditions, including direct heating and Underground Thermal Energy Storage (UTES). The latter has become an attractive means to respond to seasonal changes in the availability and demand in heating of buildings.
The Geothermica-funded HEATSTORE project (https://heatstore. eu/) strives to advance diverse high temperature underground thermal energy storage (HT-UTES) technologies, ranging from aquifer thermal energy storage (ATES), with pilot sites in the Netherlands, Iceland, and Switzerland, through borehole thermal energy storage (BTES, with a pilot site in France), pit thermal energy storage (PTES in Denmark) to heat storage in abandoned underground mines (MTES, at a site in Germany). ATES systems stand out from this group, with great potential due to their simpler design, relatively high charging and discharging rates, and fundamentally due to their large thermal heat capacity at relatively lower costs in comparison to other systems that require construction and maintenance of some form of container (Lanahan & Tabares-Velasco, 2017). While low temperature (< 25 ∘ C) aquifer thermal energy storage (LT-ATES) systems are common (>2800 installations), particularly in the Netherlands, with a minority existing in Belgium, Denmark, and Sweden), high temperature (∼ 25 ∘ C to ∼ 90 ∘ C) aquifer thermal energy storage (HT-ATES) systems are still rare, with only a handful existing in operation worldwide (Fleuchaus, et al., 2018).
The performance and design parameters of HT-ATES systems are governed by coupled physical and chemical processes taking place within the subsurface, but how they will act in a planned project is typically inaccessible to simple analytical calculations. Numerical modelling is therefore a crucial part of the research and design process for HT-ATES systems, namely to provide cost-effective essential input to studies of feasibility, risk, optimization, and environmental impact. Thermo-hydraulic (TH) processes comprise the transfer of heat via conduction and advection between the rock and the fluid that flows through the subsurface void space, and constitute prime simulation targets for early-stage feasibility, design, and optimization assessments. For more advanced assessments, thermo-hydraulic-mechanical (THM) processes such as poro-and thermo-elastic deformation and/or thermohydraulic-chemical (THC) fluid-rock interaction may also be important.
The existence of coupling between the different subsurface processes typically results in the non-trivial evolution of flow, thermal structure, and other parameters. Testing simulators of subsurface geothermal processes via comparison with production data can therefore be challenging, as the latter is often either incomplete or lacks sufficient information about the context of the actual processes monitored, and ultimately requires careful interpretation and post-processing that can cause long delays in the exercise. In contrast, the reliability of simulators can be assessed more efficiently by evaluating their ability to simulate well-defined problems with known solutions.
Numerical comparison studies have been carried out for decades in closely related contexts, highlighting the importance of such assessments in efforts made by large groups with simulators of varying origin and capability. The Stanford Special Panel (1980) performed a study aimed to establish simulator reliability for investment decisions, and focused on fluid heat and mass transport (i.e. TH) in porous media through six problem scenarios. Both single and two-phase flow conditions (i.e. water and steam) 1D, 2D, and 3D cases were considered, including one 2D case where a well intersected a simplified fracture. Most simulators tended to agree with one another with their results; however, several reports were made of errors due to certain ambiguities or misinterpretations of the individual problems, as well as other human-related errors. Pruess, et al. (2004) performed a similar study within the context of CO 2 injection in saline aquifers via seven simplified 1D problems and one 2D problem, all of which included some combination of the characteristic physical and chemical processes. Pruess, et al. (2004) reported varying levels of agreement among the ten different simulators that were compared, a number of programming bugs found in the source code of some of the simulators used, and a few cases where no rational explanation could be found for the discrepancies observed. Mukhopadhyay, et al. (2013) also performed a CO 2 injection-related comparison while evaluating levels of uncertainty for different key modelling parameters that may lead to discrepancies among simulators during comparisons. The Geothermal Technologies Office sponsored a Code Comparison Study (GTO-CCS) (White, et al., 2016) aimed at identifying capabilities and developmental needs of numerical simulation tools for modelling Enhanced Geothermal Systems. White, et al. (2016) reported a relatively good agreement across participants in their seven-problem comparison, where outstanding differences were mainly attributed to the lack or possession of particular features (e.g. advanced models), incorrect application of assumptions, numerical approach used (i.e. discretization), mesh refinement, constitutive relationships, and the inability or reluctance of participants to modify underlying code to match specific conditions needed for a particular test. The DECOVALEX project (Jing, et al., 2016) is an on-going effort, aimed at validating simulators and their conceptual models as well as increasing the understanding of coupled THMC processes in radioactive waste disposal settings. Considering the high level of complexity in their modelling tasks, results were reportedly in good agreement and differences were mainly attributed to model characterization, mesh refinement, and conceptual issues. Kolditz, et al. (2018) describe multiple benchmark problems, covering single (i.e. H, T, M) and coupled (i.e. HM, TM, THM, and THC) processes, in a verification effort for the OpenGeoSys (OGS) project. Carrayrou, et al. (2010) presented a series of one-and two-dimenstional reactive-transport modelling (RTM) tests of varying complexity for the MoMas group. Similarly, the SeSBench initiative (Steefel, et al., 2015) created a series of RTM benchmarks with a particular focus in verifying and establishing common solutions, while leaving the validation of the involved simulators a secondary task.
In the context of this paper, a simulator application, or simply a simulator, consists of core algorithmic instructions and calculations necessary for numerical modelling which have been translated into a computer program, or source code. The resulting simulator is used to perform TH simulations, while optional features, the pre-processing of input data, and the post-processing/visualization of results are provided by a series of peripheral applications. The latter are typically closely associated and packaged with the main simulator and hence, unless otherwise explicitly stated, a name assigned to a simulator applies to the complete simulator package.
In the present work, we benchmark those project-associated simulators that the HEATSTORE community aimed to use for simulating ATES systems through their porous media flow modelling capabilities. The resulting test cases (TCs) and model solutions are, however, of relevance to any geothermal modelling within the low enthalpy range of conditions. The central goal of our study is to benchmark the capability of different simulators to correctly simulate the individual physical processes relevant to ATES systems (and other low enthalpy geothermal operations). This is a necessary step before simulation of full systems and comparison with operational / field data can be done in a meaningful way.
Within the HEATSTORE consortium, seven partners employ 10 different numerical simulators to carry out modelling work (i.e. COM-SOL, MARTHE, ComPASS, Nexus-CSMP++, MOOSE, SEAWATv4, CODE_BRIGHT, Tough3, PFLOTRAN, and Eclipse 100). These simulators cover a range of numerical methods including finite differences, finite volumes, and finite elements, as well as hybridized versions of these. Both academic and commercial simulator applications are used, ranging from fully open source to proprietary "black box" simulators. Given that each simulator is used independently for a different pilot site, a comparative accuracy assessment between all simulator applications is needed to ensure a high level of consistency in assessments across the HEATSTORE project. There is a relatively clear overlap in terms of challenges to the above-mentioned comparison efforts, where the human factor is a particular highlight. No matter how well defined a modelling task is, human-related error tends to percolate through in terms of programming, conceptual, as well as usage and configuration inaccuracies. Recognizing the technical and human-factored challenge imposed by such an exercise, this paper introduces the test suite defined by the HEATSTORE consortium, aiming at increasing confidence and ensuring the reliability of simulation results across the project.
The test cases are designed to capture the processes relevant for HT-ATES systems, and to minimize the human factor by maintaining a low level of complexity while still imposing a considerable challenge in terms of processes simulated, meshes used, and heterogeneities modelled. The results obtained by all users from respective partner institutions are discussed, and recommendations and conclusions are presented regarding simulator suitability when applied to geothermal topics related to and beyond HT-ATES design. The detailed specifications of the simulated problems allow parties working on HT-ATES or related technologies to compare the capabilities of their preferred simulators to the ones presented here. For straightforward comparison, the data needed to reproduce this test suite is available for download in the electronic supplementary material.

Test Suite Definition
The test suite is composed by four test cases, defined to examine different facets of HT-ATES-related TH problems: transient pressure test verification (TC1), a complex well-test comparison (TC2), a thermal transport experiment validation (TC3), and a convection onset comparison (TC4).
A key element in real-world subsurface heat storage applications is that water properties can vary significantly over the temperature interval of interest, e.g., dynamic viscosity decreases by a factor of approximately three from 25 to 90 • C, and by approximately four from 10 to 90 • C, causing a similar increase in hydraulic conductivity. Since such variations may significantly influence flow behavior, we considered it important to perform the simulations with water properties that are temperature-and pressure-dependent, which is often not considered in analytical solutions to standard comparison problems. Unless otherwise stated (e.g. TC1) an IAPWS standard formulation (e.g., IAPS-84, IAPWS-IF97, and IAPWS95 in order of increasing accuracy) (Haar, et al., 1984) (Wagner, et al., 2000) (Wagner & Pruß, 2002) was expected to be used, combined with the respective model for viscosity (e.g. IAPWS-R12-08) (Huber, et al., 2009) to obtain these properties throughout the necessary simulator solution process.
Since all simulators were three-dimensionally-capable, all participants were instructed to treat all cases as three dimensional simulations whenever possible, even if the test case could have been simulated in two dimensions or cylindrical coordinates. Furthermore, effort was requested for participants to utilize meshes that were provided specifically for these tests. Considering that in some circumstances, direct use of these meshes may not have been possible due to differences or limitations in modelling approaches or import formats, each participant was also allowed to produce an adequate-yet-similar set of meshes for their simulator. Particular care was instructed in terms of number of mesh cells and points, and overall resolution distribution (e.g. minimum and maximum cell sizes).
• The test cases were also indirectly designed to assess the existence and usability of a range of capabilities such as: • the implementation of relevant equation of state and viscosity models for water. • use and import various types of meshes (e.g. radial, Cartesian).
• introduce and control wells.
• the possibility to obtain interpolated property values at any spatial point in the domain. This is useful when monitoring a value at specific location of interest that does not correspond to a numerical mesh point/node. The alternative is that the mesh should be created in such a way that a node is conveniently positioned exactly on the desired monitored location.
Each individual test does not constitute a complete ATES system model but is intended to independently highlight a particular, important process or event that frequently takes place during the operation of an ATES system. Given their simplicity, there is an inevitable application overlap, and thus TC1-TC4 are likely to be pertinent to a variety of geothermal modelling purposes.

TC1: Analytical transient pressure verification
This test case simulates reservoir response to pure liquid water injection during the practice of well testing (i.e. also known as pressure transient testing, PTT, or pressure transient analysis, PTA). Related to HT-ATES as well as other geothermal applications, injection of water in a reservoir is an essential event for the propagation and distribution of heat. Accepting simplifying assumptions, TC1 idealizes a reservoir of homogeneous properties into a cylindrical geometry, and focuses on reproducing a characteristic pressure response in a non-deformable, confined, infinitely wide reservoir of thickness h. A constant volumetric flow rate Q is injected uniformly through a well placed at the reservoir center, of length equal to the reservoir thickness and radius r wb (see Figure 1).
An analytical solution can be derived by assuming symmetry about the z axis, the absence of gravitational forces, and by posing the problem in the radial dimension (Philips, 1969) (Pullan, 1990) (Basha, 1999) (Chen, 2007). The resulting isothermal, slightly compressible flow problem is given by the following governing equation, coupled to corresponding initial and boundary conditions, where p is the fluid pressure, κ = k/(μ f β f ϕ) is typically known as the pressure diffusivity, k is the assumed isotropic permeability, μ f is the fluid viscosity, ϕ is the rock porosity, β f is the fluid compressibility, and p res is an arbitrary, constant, far field reservoir pressure. While the problem posed by (1), (2), (3), and (4) is undefined for r = 0, it is defined for any r > 0 and thus certainly for any radius larger than the well bore (i.e. 0 < r wb ≤ r). The analytical solution is thus given by, where r ≥ r wb (5) where Ei is known as the exponential integral function, Reproducing solution (5) as accurately as possible in a threedimensional finite simulation is the main goal for each participant of this test case. Furthermore, for the assumptions to remain formally valid while a constant volumetric flow rate is being applied, viscosity μ f and fluid compressibility β f should also be considered constant by each simulator.
When defining boundary conditions, the top and bottom boundaries should be set to no-flow, while the cylindrical outer boundary should be set to a Dirichlet type with a value of p res , thus allowing mass to leave the domain. In terms of initial conditions, pressure should be set to a uniform p res value throughout the computational domain. With respect to well modelling, the participants are free to use the best and most convenient methodology available to their simulators to achieve the required flow rate at the center of the reservoir. Injection at a constant volumetric flow rate Q should begin immediately (i.e. on the very first time iteration) and be maintained for the duration of the simulation.
Two independent simulations should be run, each involving a mesh with a different level of refinement, with the intention of demonstrating/verifying mesh convergence. The meshes should be radially structured with an identical number of uniform angular subdivisions, and composed of volumetric cells (i.e. hexahedra and prisms) to define the fluid-filled rock matrix, boundary cells (i.e. quadrilaterals and triangles) defining domain boundaries, and line cells to define well trajectories. Basic mesh information is presented in Table 1. In some simulators, additional boundary cells may be needed to define the necessary well-modelling conditions at the well face.
A sample snapshot of a coarse version of the required mesh is shown in Figure 2(a), while a plot of the radial distribution of cell sizes (i.e. the edge aligned with the radial direction) is provided in Figure 2(b). The cell size distribution data for each mesh is available for download in the electronic supplementary material as a tab-delimited text file.
All necessary simulation parameters including finite domain dimensions, as well as the times and locations where fluid pressure values should be compared are presented in Table 2. Unless otherwise stated, all parameters were purposefully chosen in an attempt to reduce simulation run-times. Nevertheless, in an effort to assess the sensitivity of simulators to the effects of geological heterogeneities in the reservoir material properties (i.e. in TC2), TC1 establishes a base case with a relatively low permeability that would be otherwise infeasible for seasonal aquifer heat storage or direct hot water production; this low permeability, however, poses a stronger technical challenge to the simulators due to buildup of steeper pressure gradients. The values used for permeability and reference temperature (34 • C), were inspired by prior published field observations by Chevalier, et al. (2010) and the GeoMol Team (2015), as well as field observations during the drilling of well GEO-01 (Guglielmetti, 2021) at the HEATSTORE Geneva pilot site.

TC2: Injection-Falloff-Drawdown-Build-up well test
Well tests such as injection, falloff, drawdown, and build-up are typically run on a drilled well to obtain and/or expand the amount of  Figure 2. (a) Coarse mesh required for TC1 and (b) a distribution plot of the cell radial size vs. the location of its outer-most end-point.

Table 2
Simulation input specification summary for TC1. For simulators where setting fixed fluid properties proves problematic, reference values at 34 ∘ C and 10 6 Pa should be used.

Domain thickness (h) 200 [m]
Initial pressure (pres) 10 6 [Pa] Outer boundary pressure (pres at r = rmax) J.E. Mindel et al. Geothermics 96 (2021) 5 information known about a sub-surface reservoir (Slotte & Berg, 2017). Furthermore, such as sequence also represents an idealized operation of a seasonal storage system. The injection phase represents the charging period, while the drawdown phase represents the discharge period. Intermediate phases of falloff and build-up represent periods of storage or inactivity. The test case described in this section intends to reproduce the reservoir response to a pre-defined sequence of these operations and is aimed at comparing TH modelling capabilities across participating simulators.
The sensitivity of simulators to effects due to geological heterogeneities in the reservoir material properties is also studied in this test case. For this purpose, and inspired by field observations (Chevalier, et al., 2010) (GeoMol Team, 2015 (Guglielmetti, 2021), an otherwise infeasible low-permeability base case scenario was devised. Geologic heterogeneity is then introduced via a cylindrical high permeability zone (inspired by targeting reef structures of high porosity and permeability at the Geneva site) as well as a fracture zone. Both of these rock material discontinuities constitute variants to the base case, represent  J.E. Mindel et al. possible scenarios to be encountered in a complex subsurface, and are simplified/generic enough to provide the reproducibility expected for a comparative test case.
Utilizing the basic setup presented for TC1, TC2 combines a non-stop well-test sequence of, injection, falloff, drawdown, and build-up for three reservoir variants. With Variant 1 defining the homogeneous base case, Variant 2 introduces a region of higher uniform permeability and porosity centered around the well, from r = r wb until r = r reef . Variant 3 also uses Variant 1 as a base, and introduces a single fracture at a specific distance from the well, x = x frac . Graphical representations of Variant 2 and Variant 3 are shown in Figure 3.
In contrast to TC1, heat transport is no longer neglected, and the volumetric flow rate Q varies depending on the operational stage. Gravitational effects (i.e. buoyancy) are still neglected; however, and the initial reservoir temperature is assumed uniform, of value T res . Top and bottom boundary conditions should be set to adiabatic/no-flow for all variants, while initial and far field pressure values are set to p res = 10 7 [Pa] in contrast to a lower value used in TC1. This was prescribed to avoid possible issues with negative pressures during the drawdown phase due to the low base permeability. Furthermore, no "well damage" factors should be applied to the near-well-bore region, as a change in such factors would affect the measured pressure at the well face, and thus could drastically affect the comparisons.
All fluid properties (i.e. ρ f , μ f , β f ) should be functions of pressure  and temperature (i.e. IAPWS formulation combined with the respective model for viscosity should be used), while fluid thermal conductivity λ f is assumed constant. Combined bulk thermal properties such as conductivity and heat capacity are porosity-weighted, where subscripts f, r, and t indicate fluid, rock porous medium (i.e. dry) and total/bulk thermal conductivity λ, density ρ, and specific heat capacity c p , respectively. A total number of six simulations should be run for TC2, when counting the three variants, and two types of meshes: radial and Cartesian. In Variant 2, a high-porosity, high-permeability zone is introduced, centered on the well, to emulate the effects of a reef structure. For the case of the radial mesh, the zone consists of a cylinder with r = r reef = 10 [m], while for the Cartesian mesh, a square section of 2⋅r reef = 20 [m] a side is used. Permeability, porosity, thermal conductivity, and specific heat capacity values are k reef , ϕ reef , λ reef , c p reef , respectively. In Variant 3, a fracture zone is introduced on the plane x = x frac = 10 [m], of length 2400 [m]. This fracture inherits material properties from the host rock, except for permeability (k frac ) and porosity (ϕ frac ). An aperture parameter a frac should be used to obtain the permeability via the "cubic law", where k frac = a 2 frac /12 (Zimmerman & S. Bodvarsson, 1996). The fracture should extend to the boundaries of the domain, as shown in Figure 3(b), and boundary conditions at intersections should be set similar to corresponding domain boundaries. Flow should only be allowed to enter or leave the domain through the connection of the fracture to the lateral boundaries.
To reduce simulation run-times, meshes used for this problem have no restrictions imposed to their vertical resolution and should have horizontal near-well refinement characteristics as similar as possible to the coarse mesh used in TC1, described in Figure 2(b). For Variant 1, the mesh corresponding to Variant 2 can be used by assigning material properties to the reef material, which are identical to the surrounding reservoir rock. Horizontal geometric details of sample meshes can be observed in Figure 4, while zoomed-in internal views of the heterogeneities are shown in Figure 5.
The sequence of the well tests, constituting a total of one year of simulated time, should generate a well-face pressure response similar to the one shown in Figure 6 by adhering to the following order:  Table 3.

TC3: Experimental heat transport validation test
This test case is aimed at studying unidimensional, unsteady, fluid and heat transport through porous media, with particular focus on monitoring the evolution of a thermal front. As such, it is intended compare the accuracy of simulators used to model the regional flowinduced transport of a heat signature created by an ATES system. It is based on the published material by Singh, et al. (2006) who describe numerical results using two distinct numerical solution approaches, while simultaneously comparing them to experimental results. Our aim is thus to emulate the settings of their experiments to be able to compare and validate all results from participating simulators of this test case. We considered that while flow velocities used are well above typical regional groundwater flow velocities that would allow the viable existence of an ATES system, the basic process of advection-dominated heat transport is nevertheless modelled, and a priority should be set to the experimental validation aspect of the test.
As depicted in Figure 7, each experiment of this test case comprised passing water through a pipe, packed with a particular type of sphere. Two experiments were run, one with glass, and the other with steel spheres. Depending on the type of sphere, a different length of pipe was used (i.e. L glass = 0.725 [m], L steel = 0.465 [m]), with same radius R = 0.029 [m]. Flow was initially established at steady cool temperature T C = 26.85 [ ∘ C], and at time t > 0 the water inflow temperature was increased at the inlet to T H = 41.85 [ ∘ C]. The step thermal signal was transported through the pipe at a Péclet number specific to each sphere pack, and temperature was measured by thermocouples placed at three prescribed locations. Figure 7. Physical model and coordinate system, adapted from Singh, et al. (2006).
An approach for estimating the pressure drop across a packed bed in the corresponding simulations is not reported by Singh, et al. (2006), mainly due to their assumption of a constant, steady, and uniform velocity field. In addition, their validated numerical methods for heat transfer appear to be of a linear nature. In contrast, our current test case is designed to be run through non-linear solvers, where most fluid properties are functions of p, and T. While open-source-based simulator applications may be readily modified and re-compiled to operate under special conditions such as a uniform prescribed u f , this is not necessarily true for all commercial simulators, and thus the relevant applicable conditions for fluid pressure should be determined.
Superficial flow velocity u f and particle Reynolds number Re p used in the experiments can be estimated using the experimental Péclet numbers as well as the provided reference water properties (Singh, et al., 2006), where μ f , c p,f and λ f are the fluid dynamic viscosity, heat capacity, and thermal conductivity, respectively. Using the formulas for the Péclet number for heat transfer where R is used as a characteristic length scale, and the particle Reynolds number for packed beds, where d p is the particle diameter and ϕ is the pack porosity, values were obtained in Table 4 for both experiments which describe an advectiondominated heat transfer as well as the beginning of the transitional flow regime where 10 < Re p < 2000 (Rhodes, 2008) . Due to the onset of the transitional flow regime, the Ergun equation (Ergun, 1952) is an optimal choice to perform an estimate of the pressure drop across the packed bed, where Δp E is the Ergun pressure drop across the pack, d p is the particle diameter, Φ s is the particle sphericity (i.e. with a value of 1 for spheres), and ϕ is the medium porosity. Nevertheless, due to the unidimensional and practically steady flow conditions a first order approximation can be made via the Kozeny-Carman equation (Kozeny, 1927) (Carman, 1937) (McCabe, et al., 2005, where Δp K is the Kozeny-Carman pressure drop across the pack, and an empirical factor of 180 has been found to apply to uniformly sized sphere packs (Rhodes, 2008). Equation (11) maintains consistency with the Darcian flow assumptions of all participating simulators in exchange for a minimal cost in accuracy. With flow reference conditions given by (8) and Table 4, this loss in precision was assessed by measuring the sensitivity of various fluid properties to the inlet conditions. After calculating the pressure at the inlet necessary to match the drop approximated by (10) and (11), density ρ f , fluid enthalpy h f , and specific fluid heat capacity c p,f were calculated using IAPWS-IF97 formulas (Wagner, et al., 2000), a temperature equal to T H , and a fluid pressure value obtained by adding the approximated pressure drop to 101325 [Pa]. Fluid properties were then compared in terms of the relative difference to those obtained when using the Kozeny-Carman approximation. The results of this comparison are presented in Table 5, showing a minimal effect of selecting (11) instead of (10) through extremely low differences in fluid property calculations. The latter can in turn be translated into a negligible non-Darcian effect on the current validation exercise.
We can now proceed to estimate a value of equivalent permeability of the pack of spheres in the pipe by equating superficial velocity u f to Darcy velocity in (11) to obtain,  Pressure and temperature sensitivity analysis of fluid density, specific enthalpy, and specific heat capacity, depending on the approximation used to calculate the pressure drop. Parameters used in equations (10) and (11) were obtained from (8), Table 4, and A summary of particle diameter, pipe length, porosity and resulting permeability values is presented in Table 6, together with other fluid and solid properties.
Two simulations should be run, each with a different radially structured mesh corresponding to the appropriate length of the pipe, L glass or L steel . Gravitational effects should be ignored, and flow should be assumed single-phase, non-isothermal, compressible (i.e. or slightly compressible). Flow velocity u f should be considered positive in the positive direction of the x-axis, which should be the preferred orientation of the pipe. The inlet should be placed at x = 0 [m], a longitudinal mesh spacing of 1% of the pipe length should be used, and a 10% of the pipe diameter is suggested. Characteristics of the meshes corresponding to each experiment can be observed in Figure 8.
Initial conditions are of steady flow at a temperature T C , and boundary conditions should be applied which create uniform flow rate along the pipe corresponding to the respective u f values from Table 4. The latter can be achieved by either prescribing the flow velocity over the whole domain, or solving for Darcian fluid flow by imposing boundary conditions such as a flow rate (i.e. Neumann) condition at the inlet and a Dirichlet pressure condition at the outlet p out .
At all times, the assumption is made of instantaneous thermal equilibrium between the solid spheres and the fluid, and thus it is suggested that a combined thermal conductivity be calculated via a porosity weighted approach, where an effective thermal conductivity of the fluid has been calculated which includes axial mechanical dispersion effects (Wakao, et al., 1979).
The pipe-wall should be assumed to be adiabatic (i.e. ∂T /∂r = 0), the outlet temperature gradient should be set to zero (i.e. ∂T /∂x| L = 0), and the imposed inlet water temperature should rigorously follow the experimental data for the first thermocouple PROBE_1 at x = 0 [m] as a time-dependent Dirichlet condition. The remaining data from two thermocouples, PROBE_2 and PROBE_3, are to be used for validation purposes. The locations of all monitoring probes are shown in Figure 8 and presented in Table 6.
The measurements presented by Singh, et al. (2006) are given through plots of non-dimensional timet, and non-dimensional temperatureT, where T is the dimensional temperature in [ ∘ C], and α f is the fluid thermal diffusivity calculated via α f = λ f /(ρ f c p,f ) using values from (8).
The output times required at the locations of all thermocouples are given by the following set, where Δt = 0.0025, α f = 0.00146 [cm 2 /s], and values sampled at PROBE_1 are essentially a sanity check for the inlet boundary condition.

TC4: Horton-Rogers-Lapwood problem
This test case is designed to model internal natural convection in a confined porous medium. While fluid is always contained within the modelled domain, heat may leave and enter it. Acting as a complement to the other test cases presented in this paper, the focus of this particular exercise is on measuring the capabilities of simulators for modelling thermal buoyancy-based onset of instabilities.
Rayleigh-Bénard (RB) convection is a self-organizing non-linear phenomenon triggered by particular conditions in the general case of a free fluid. The Horton-Rogers-Lapwood (HRL) problem addresses RB low-inertia convective flow through a saturated porous medium (Horton & Rogers, 1945) (Lapwood, 1948). In such a setting and given their longer development time-scale, RB cells constitute a projected outcome of buoyancy effects that can be initially observed during the lifetime of an HT-ATES system (Nagano, et al., 2002) under the influence of cyclic heating during charging stages. As shown in Figure 9, in one of its possible configurations (Beck, 1972), the HRL problem consists of a fully confined liquid-water-saturated porous medium of constant isotropic permeability k, thermal conductivity λ r , density ρ r , and heat capacity c p,r , in the presence of gravitational forces. Despite the simplistic boundary conditions, and considering being representative of an adequately permeable system, the onset of convective cells through the HRL problem establishes an unambiguous demonstration of the capability of a simulator to model buoyancy in a porous medium subjected to relevant temperature ranges.
The top and bottom boundaries are defined as thermally conductive, and subjected to constant temperature boundary conditions at T C and T H , respectively. Initial conditions are given by a static thermal gradient defined by (T H − T C )/H and hydrostatic pressure conditions. In such a setting it is useful to characterize the expected convection onset based on the Rayleigh number, defined as the ratio between the diffusive and buoyant thermal transport time scales, These timescales are governed on one hand by the thermal diffusivity α t of the combined fluid and porous medium, and on the other by the buoyancy-triggering density differences of the fluid as well as the resistance to flow, Considering linearized theory (i.e. the Boussinesq approximation), the Rayleigh number for a porous medium can be defined as, where μ f is the fluid viscosity, Δρ f is defined as the fluid density difference between the two temperatures T C and T H , and the thermal diffusivity is given by α t = λ t /(ρc p ) t . The bulk thermal conductivity λ t and heat capacity (ρc p ) t may be estimated via a porosity-weighed approach (Riley & Winters, 1989), ) t = ρ f c p,f ϕ + ρ r c p,r (1 − ϕ) where subscripts f, r, and t indicate fluid, rock (i.e. dry) porous me-dium, and total/bulk properties, respectively. According to analytical work by Beck (1972), convective flow within the enclosure shown Figure 9 is practically guaranteed if Ra is greater than a critical value Ra c = 4π 2 . Hewitt, et al. (2014) stated that for Ra c < Ra <∼ 382, steady and stable RB cells will occur. For ∼ 382 < Ra <∼ 1300 the RB cells become increasingly unsteady, chaotic, and unpredictable. Beyond Ra ≈ 1300 there should be minimal steady or quasi-steady structure remaining in the flow.
The enclosure dimensions were chosen to ensure two-dimensional behavior in a three-dimensional environment and for a domain height that is similar to typical HT-ATES aquifer thicknesses. As shown by Beck (1972) and as also studied by Florio (2017), as long as the domain dimension in the y-direction remains sufficiently small in comparison to x and z, convection should happen only along the x − z plane. This pseudo-three-dimensional arrangement allows testing three dimensional gravitationally-driven heat transport features of simulators while being able to observe and compare results qualitatively and to a certain extent quantitatively in terms of the number of horizontal RB cells, using two dimensional cross section plots.
In terms of predictability of a correct solution under a given set of conditions, analytical determination of the number of horizontal RB cells for a particular Ra value is not a straightforward subject and is beyond the scope of this paper. Nevertheless, as shown by Horton & Rogers (1945), Lapwood (1948), Beck (1972), Riley & Winters (1989), Tyvand (2002), Hewitt (2014) and many others, analytical solutions are characterized by modes that determine the number of RB cells that should be present if Ra is slightly above Ra c . Under such conditions, a preferred mode determines the number of RB cells that will arise. Furthermore, Riley & Winters (1989) showed through linear analysis that a single convection cell in the vertical direction is a characteristic of the preferred stable solution in two-dimensions, while the preferential number of horizontal ones is dependent on the ratio L/H. Minimizing the expression for Ra that satisfies the analytical solution to the eigenvalue problem posed by the flow conditions (Florio, 2017), yields a mode m equal to 2 at a minimum value of Ra = 4π 2 , for the geometry defined in Figure 9. This mode also represents the minimum amount of RB convective rolls expected to appear for any simulation in this test case, also shown in Figure 9. Six simulations should be run for this test case, consisting of all possible combinations of three possible values for permeability and two  different mesh refinements. Using values provided in Table 7 Ra may be used as a function of permeability via (17), as depicted by the red line in Figure 10. Permeability values for the simulations in this test case, named k low , k med , and k high , were selected to yield Rayleigh numbers within the range for stable RB cells to form, 4π 2 < Ra <∼ 382 . Simulation meshes should follow a regular pattern, as shown in Figure 11, where grid cells are cubic, and edge sizes are 5 [m] for the coarse grid, and 2.5 [m] for the fine grid. Furthermore, and in contrast to other linearized approaches found in the literature, a pure H 2 O equation of state (e.g. IAPWS standard) and corresponding model for viscosity should be used in the simulations.
Initial conditions should correspond to a linear (i.e. steady state conductive) thermal gradient based on bottom (80 • C) and top (20 • C) temperatures, which cover the typical range expected for HT-ATES applications, as well as a hydrostatic pressure gradient based on a top boundary value of 10 7 [Pa]. The latter pressure boundary value is representative of deep aquifers (Kühn, et al., 2002), should only be used to determine the initial internal hydrostatic conditions, and not be used as a Dirichlet condition in the subsequent transient simulation.
Since the initial conditions represent a metastable solution to the problem, and that structured meshes are used, most simulators may need a disturbance to trigger the formation of convective cells. A small artificial perturbation should thus be introduced only during the first time step Δt, in the form of a variation of temperature values at the bottom boundary T B as a function of x, the bottom temperature boundary condition: Vertical-cross-sectional snapshots should be produced consisting of streamlines as well as the temperature field at times t = {10, 100, 1000} [yrs]. Fluid and porous medium thermal properties, including a full compiled list of parameters to be used in the simulations is provided in Table 7.

Results & Discussion
Seven partners from the HEATSTORE project performed the test suite presented in this paper:   J.E. Mindel et al. simulator application typically possesses unique accuracy and performance characteristics stemming from its core formulation, algorithmic implementation, programming language, and other peripheral capabilities such as the ability to model fractures and wells. Given the breadth of configuration choices available to most simulators, the comparison we present in this paper targets the capabilities of simulator packages under specific settings optimized unilaterally by each participant, thus adding an important 'human factor' component. The latter should be kept in mind in the discussion of results that follows, where the reference to a simulator (e.g. MARTHE, MOOSE, Eclipse 100, etc.) implies the association between each human user (i.e. or team of users) and the corresponding simulator. Furthermore, and in particular for commercial simulators or large open source projects, some feature options may not be readily available, which could also affect a capability and its optimal accuracy. Table 8 presents a summary of major characteristics such as time and space discretization, as well as the level of completion achieved in each TC.
For reference, data and plots for TC1, TC2, and TC3 at the required monitoring locations are reported using the names PROBE_1, PROBE_2, PROBE_3, etc. for each particular test case. The increasing number of the probe implies that it is further away from a reference location. In the case of TC1 and TC2, the reference location is the center of the well at the center of the simulated domain, while for TC3 the reference is the pipe inlet (i.e. x = 0 [m]).

TC1 results
All participating simulators were able to reasonably reproduce the analytical solution, as can be seen through a sample of results plotted over distance from early (10 [s]) and late (10 5 [s]) time levels in Figure 12, as well plotted over time for PROBE_1 in Figure 13, within coarse and fine mesh contexts. In the case of the Eclipse100 simulator, Table 8 Participant information including basic time (Δt) and space (ΔΩ) discretization and status of completion of simulation sets for each TC. results for the coarse mesh were only available starting from PROBE_3 onwards, and for the fine mesh from PROBE_2 onwards. In the case of COMSOL, for the coarse mesh, results were available from PROBE_2 onwards.
Analyzing the full set of results presented in Appendix A, highest peaks of relative error were presented by the Tough3 and PFLOTRAN simulators (~47%, ~43%) for the coarse mesh case, and by the Nexus-CSMP and MARTHE simulators (~25%, ~21%) for the fine mesh case. Error levels drop consistently and uniformly the further away a particular probe location is from the injection point. PROBE_2 highest levels of error are dominated by Tough3, COMSOL, and SEAWATv4 for the coarse mesh, and by Eclipse 100 (i.e. considerably, at ~48%), Tough3, and MOOSE for the fine mesh. In the cases of PROBE_3, PROBE_4, and PROBE_5, error levels for peaks are considerably lower (e.g. in the ranges ~12% -~28%). A time delay of the onset of the maximum error is also observed, which depends on the location of the probe for all simulators. In PROBE_1, the error peak appears between 1 and 100 seconds of elapsed time for the coarse mesh case, and between 1 and 10 seconds for the fine mesh case. In contrast, the error peak at PROBE_8 appears at the end of the simulation (t = 10 8 [s]).
Eclipse 100 must be signaled as a particular outlier for this test since, due to license limitations, radial meshes were not available to be generated. Furthermore, it was not possible to output results or interpolate them at the well face (i.e. PROBE_1) for both the coarse and fine versions. PROBE_2 was only available for the fine mesh case, where results were particularly inaccurate with error values reported of ~48%. A challenge may have also existed in the generation of the mesh, given that results from PROBE_1 and PROBE_2 were missing. The mesh used may thus not have been in accordance with the guidelines in terms of cell size distribution shown in Figure 2(b), and it was much coarser than expected for both coarse and fine cases.
Given the range of formulations and implementations used by all participants taking part in this particular test, a reason for the resulting values of error may not easily be identified without deeper study, which is outside the scope of this paper. A major challenge is presented in terms of accuracy in most discretizations by the steep gradients existing at the well face (i.e. particularly during the initial stages), as well as by the modelling of sources and sinks required to represent wells. Human factor aside, all simulators seem to be reasonably up to the task for TC1.

TC2 results
Through this test case, simulators should be able to reproduce typical well test responses based on artificial-yet-representative underground realizations, which include two essential types of heterogeneities: faults and reef structures. Although all participants were able to perform at least one of the variants within TC2, only four (COMPASS, Nexus-CSMP++, MOOSE, and CODE_BRIGHT) were able to carry out all of them. The COMSOL participant chose not to carry out simulations with a Cartesian mesh after not being able to import the mesh information supplied by the guidelines, and for other undisclosed reasons. In the case of MARTHE, the participant reported that the version of the source code used did not allow the simulation of vertical fractures in a radial mesh, and the SEAWATv4 and Tough3 participants reported a similar situation. The PFLOTRAN participant was unable to carry out Variant 3 simulations for indeterminate reasons. In the case of Eclipse 100, due to the available commercial license, a missing feature prevented the use of radial meshes. A completion summary is presented in Table 9.
Pressure results for Variant 1 at PROBE_1, for both radial and Cartesian meshes, are relatively comparable for most simulators, as can be observed in Figure 14 (a) and (b). Only CODE_BRIGHT and MOOSE appear to stand out in this respect with values spanning well beyond the ranges of the other seven simulators. In particular, for the Cartesian mesh case, CODE_BRIGHT presents a considerable delay in the pressure response. MOOSE, on the other hand, does not express delay but reports a higher level of asymmetry for pressure values observed during injection and drawdown (+3/-5 MPa). For Variant 2, pressure results remain reasonably similar and symmetrical for injection and drawdown periods across simulators, with minor exceptions for MOOSE and SEAWATv4 in both mesh types (see Figure 14 (c) and (d)). For Variant 3, in contrast to other simulators and consistent with its behavior in Variant 1 and 2, MOOSE reports a higher pressure change during drawdown than during injection, for the same flow rate magnitude (see Figure 14(e) and (f)). The outlier for Variant 3 is SEAWATv4 when using a Cartesian mesh, where the injection pressure change is higher than the change during drawdown.
In terms of temperature results for this test, Variant 1 shows similar issues for CODE_BRIGHT in the form of a time delay (see Figure 15(a) and (b)). Nexus-CSMP++ displays a slight delay in the temperature decrease for Variant 3, radial mesh case during the fall-off and drawdown periods in comparison to other simulators. Furthermore, there seems to be an acceleration of the temperature decrease during the build-up phase, leading to a final difference with other simulators of approximately -6 • C.
For all variants, we can also observe that MOOSE and Tough3 have not utilized an enthalpy source-type or rate-type condition at the injection site, but have resorted to a Dirichlet condition for temperature. The latter is evident from the fixed nature of the temperature evolution throughout the injection part of the simulation. In response to this observation, the MOOSE participant reported that specifying an enthalpy injection rate for the equation of state formulation preferred by this test (e.g. IAPWS-IF97) was not an implemented feature at the time that results were submitted for this study. An enthalpy-rate-based boundary condition with the preferred equation of state was later made available within the main MOOSE repository, which would have likely brought results closer in line with other simulators. Testing and implementing such new formulation was beyond the tasks of our study.
The complete set of results can be observed in section Appendix B, where the remarks made for PROBE_1 are translatable to the remaining seven probes placed downstream in sequence from the well face. For the probes placed farthest from the well (i.e. PROBE_5, PROBE_6, PROBE_7, and PROBE_8), small numerical fluctuations can be observed particularly for the temperature results. All simulators tested display these fluctuations, ranging ~+/-0.01 • C, and are likely due to cumulative numerical error of varying source (e.g. formulation, implementation, etc.). Overall, minus the obvious outliers, the comparison of results for both pressure and temperature for all simulators are within reasonable agreement.

TC3 results
The goal of this test is to compare simulator heat transport capabilities under steady input flow-rate conditions. Namely due to the singular dimension of the flow considered by this experiment and the low geometric complexity, all simulators were able to perform this test. The main challenges consisted in reading the input data over time for the inflow boundary condition and setting the flow conditions to match the specifications.
As evidenced by the results presented in Figure 16 and Figure 17, all simulators were able to read in the input boundary conditions set by the temperature time-dependent profile at PROBE_1. This implies that the level of error at this particular monitoring point should be relatively low. Only Eclipse 100 produced a read-in error above 2% for both the glass and steel spheres case, which may be attributed to some round off process. Only for the steel spheres, COMSOL also produced an error above 2%, where careful inspection revealed that the glass sphere case input was accidentally used for the steel sphere case. At the time of drafting this paper, a correction was not received from the corresponding participant.
For most participants (i.e. ComPASS, COMSOL, PFLOTRAN, MOOSE, Nexus-CSMP++, SEAWATv4, CODE_BRIGHT, MARTHE, Tough3) the shapes of the time-dependent temperature monitor at each probe location remained qualitatively very similar to the experimental results, with an observed slight simulated speedup of the thermal front in most cases. Assuming all participants have followed the described meshing guidelines for TC3 precisely, the slight speedup expressed by this first group of simulators is most likely because heat losses through walls were not modelled (i.e. ∂T/∂r = 0). Including this factor would have the greatest effects, with the heat being lost radially and a subsequent subtle slow-down of the temperature front. Within this first sub-group of simulators, at probes 2 and 3 maximum levels of error observed are ~10% and ~20% for glass spheres, and ~4% and ~7% for steel spheres, respectively. COMSOL shows a slightly elevated level of diffusion, as evidenced by the difference in the slope of the line (i.e. ∂T/∂t) in both steel and glass sphere cases.
Eclipse 100 remained as the main outlier due to their exceptionally slow-progressing and diffuse temperature profile leading to elevated error values ranging from ~15% to ~30%. The reason for the latter is reportedly difficult to assess, but is likely due to a setup/configuration issue. An inadvertent excessive heat loss may have been modelled through non-adiabatic walls, coupled to excessive diffusion due to the discretization and/or incorrect grid resolution, as supported by the observed slope in most of the curves shown in Figure 16 (b and c) and Figure 17 (b and c).

TC4 results
The focus of this test case is on comparing the convection-modelling capabilities under unstable conditions. A challenge is presented to the core formulation of each simulator, in terms of capturing the dynamic onset and evolution of RB cells.
With the exception of COMSOL and Eclipse100, all participants were able to carry out most of the required simulations. Reportedly, COMSOL was simply not able to converge on any of the simulations set out by this test regardless of the permeability values or mesh grade used. It is suspected that the human factor may be at play given the long-standing use Table 9 Simulator completion summary per variant case for TC2. An 'X' implies that a particular simulation has been completed.
of COMSOL in a wide variety of dynamic settings. Reportedly, due to its internal "black-oil" formulation Eclipse 100 is also incapable of modelling buoyancy whilst using an equation of state required by the specifications in this test. Eclipse 100 does not include a true energy equation and mimics hydrothermal flow via the numerical transport of an equivalent passive tracer field. Although a newly developed Eclipse Geothermal simulator is available, it was not licensed for this work.
Assuming that the setup/configuration for all other simulators were performed accurately, these successful ones were able to produce result snapshots at 1000 [yrs] for all six required simulations, with one minor exception from MOOSE, for which solution convergence was not achieved for the high permeability, fine mesh case. The extended summary of results presented in Appendix C shows that, for the same mesh and equivalent permeability, some simulators display a disturbance in their temperature fields far earlier than others. As further observed in an extract of early (i.e. t = 10 [yrs]) results presented in Figure 18, while streamlines may already exist at a very early stage, velocity magnitudes may vary and not have caused significant alteration to the initial conditions. This contrast can be observed in the different levels of perturbation of the temperature field for each simulator.
As observed in Figure 19, some simulators may reach a different number of RB cells at 1000 [yrs]. Studying the fully detailed reasons behind this is beyond the scope of this paper; however, in principle it can be assumed that it pertains to at least the following ones, in decreasing order of importance: 1 Coupled to the non-linear aspect of the problem, the formulation, numerical discretization, and solution process likely have an impact on the level of numerical diffusion in both space and time, and may thus tend to either attenuate or even occasionally strengthen instabilities. As observed in Figure 18 for the SEAWATv4 and PFLOTRAN simulators, the flow field is strong enough already at t = 10 [yrs] to significantly affect the temperature field. This is not the case for Nexus-CSMP++. 2 The level of initial perturbation. It was not verified whether all simulators were able to properly and uniformly implement equation (20).
A summary of achieved RB cells at t = 1000[yrs] per simulation case and participant is presented in Table 10. A rounded average is presented, displaying a value of 3 for the low permeability cases, 4 for the medium permeability cases, 4 for the high permeability coarse case and 5 for the high permeability fine case. As expected, these values show that there is a general tendency for increase in the number of RB cells with increasing permeability, as should be the case due to increasing Rayleigh number. This is also true to a lesser extent for decreasing mesh-cell size. Nevertheless, some simulator formulations exhibit a higher sensitivity to permeability values and mesh grade than others. Nexus-CSMP++ displays a constant 3 cells throughout all 6 simulations, while other simulators like Tough3 increase the number of RB cells from by at least 2 when changing from low to medium permeability. Table 10 also shows that simulators utilizing FD and FV methods tend to be more sensitive to the increase in permeability than those possessing at least some FE aspect, with the notable exception of MOOSE. In a similar light, with the exception of Nexus-CSMP++, all simulators display a sensitivity to the permeability and mesh cell size values. Although the only observable characteristic difference that could account for this is in the timestepping (i.e. Nexus-CSMP++ uses an explicit method, while all others use an implicit method), further tests with other simulators set to utilize an explicit method should be carried out to allow further investigation of   [-] for the glass sphere simulation as part of TC3. Each column of plots represents measurements at a particular probe (i.e. 3 probes in total), while the top row represents temperature, and the bottom row represents error. PROBE_1 corresponds to the input boundary condition.  [-] for the steel sphere simulation as part of TC3. Each column of plots represents measurements at a particular probe (i.e. 3 probes in total), while the top row represents temperature, and bottom row represents error. PROBE_1 corresponds to the input boundary condition.
whether time discretization approach is indeed an important factor.
Due to the non-linear characteristic of the fluid flow assumptions in the problem posed by TC4, analytically predicting the number of RB cells at a particular time is a difficult task. Approaches exist; however, they typically rely on a linear stability analysis under Boussinesq considerations instead of a non-linear equation of state. Notwithstanding, these analytical solutions do provide a guide via the value of a "preferred mode" when Ra is only slightly above Ra c . This "preferred" number of cells m = 2 was displayed by ComPASS and CODE_BRIGHT for the simulations with the lowest value of permeability (see Table 10). Given that for k low , Ra ≈ 49, a further analysis of (19) reveals that m = 3 is also a possible steady mode, which is the one displayed by all other participants. A similar analysis for k med and k high reveals conditions closer to m = 6 and m = 9, respectively, and highlights the fact that predicting a numerical solution for this problem at a particular time level is not straightforward. Since all modes up to a particular Ra are possible and technically correct, the reasons for a simulator settling into one of them may be also related to formulation and associated numerical diffusion. Nonetheless, TC4 was designed to focus on comparison and capability testing, and not as a specific verification exercise. Overall, with some degree of variation, results were within reasonable agreement. The latter is particularly true for the simulations Figure 18. Cross-simulator comparison depicting coarse-mesh snapshots of streamlines and temperature contours for the medium permeability case at time level t = 10 [yrs] for PFLOTRAN, Nexus-CSMP++, and SEAWATv4. Figure 19. Cross-simulator comparison depicting coarse-mesh snapshots of streamlines and temperature contours for the medium permeability case at time level t = 1000 [yrs] for PFLOTRAN, Nexus-CSMP++, and SEAWATv4.

Table 10
Summary of observed number of RB cells at t = 1000 [yrs] for each permeability value and mesh grade used. run with k low (i.e. k = 2⋅10 − 12 [m 2 ]).

Conclusions and Recommendations
A TH modelling test suite was used in a benchmark exercise carried out by seven teams in the context of the HEATSTORE project, comprising up to 10 simulators. Within the suite, tests included an analytical transient pressure test (TC1), a well-testing model test (TC2), an experimental heat transport validation test (TC3), and a convection onset comparison test (TC4). The benchmark demonstrates and verifies the different capabilities of simulators utilized in the HEATSTORE project from a comparative accuracy point of view. Overall, most of participants were able to perform most tests reliably.
Major deviations from analytical solutions or from mean results are restricted to extreme conditions such as the early stages of the transient pressure test (i.e. TC1), or the sudden changes in operational modes (i.e. TC2). In some scenarios, the relative deviation can be considerable even outside extreme conditions. Inferred setup problems may be an important issue in those cases, including lacking functionality and/or numerical susceptibility of the respective simulator. Often, however, the latter are of secondary relevance since solution deviations are still within the uncertainty range of an actual field measurement.
In essence, this indicates that most simulators in this study are suitable to solve HT-ATES subsurface simulation problems. A few crucial exceptions occurred when recommended setup choices could not be enabled, most notably the lack of an adequate equation of state treatment in Eclipse 100. Similarly, and although it was implemented after this study was completed, the lack of enthalpy-injection functionality in MOOSE affected the accuracy of the simulation. These two examples, however, show the difference of a proprietary vs. an open source simulator. In principle, the lacking functionality could have been implemented in MOOSE if the scope and duration of the study would have allowed it, while it would most likely not have been not possible in Eclipse 100.
Occasional other problems (e.g., with COMSOL) may have been solved if sufficient resources for improving simulator setup could have been invested; this demonstrates that the human factor plays an essential role: the simulator and its setup controls need to be known well enough to be able to perform robust simulations. Even experienced users may not be aware of all relevant parameters when applying a simulator beyond their previous experience. This applies to both commercial proprietary and open source (i.e. often academic) simulators.
The lessons learned lead us to formulate the following recommendations for the use of numerical TH modelling in the design, operation and optimization of HT-ATES systems and other geothermal applications: • A necessary requirement is that the simulator is able to account for first order effects of temperature-and pressure dependent fluid properties in the simulated physical processes. A particular learning here is that the use of sufficiently accurate models is needed to correctly capture the important effects of temperature and pressure dependence on fluid properties. For HT-ATES and other geothermal applications, accurate treatment of this is critical to assess the economic viability and potential of projects. • Depending on the question at hand, the availability and usage of realistic options for initial and boundary conditions is essential to arrive at a sufficiently accurate solution. Proxy setups such as constant temperature instead of enthalpy-rate injection in wells, for example, may lead to inaccurate simulations. More generally, for commercial simulators, a 'licensing factor' comes into play: the lack of an appropriate license for a feature typically disallows its usage and the applicability to the problem of interest may be affected. • The person responsible for carrying out the simulation should be experienced in the use of the simulator for the particular problem at hand. Some simulators are well documented for certain types of problems, but it is often the case that only developers or very experienced users may know how to apply some features to particularly uncommon problems. Even experienced users of widely used and well tested simulators may obtain inaccurate results on relatively simple problems when the simulator is applied outside their normal area of simulation expertise.
A main outcome of this benchmark is the experience obtained from the comparative analysis, providing assurance in the results obtained throughout HEATSTORE, and prospectively the selection of a suitable simulator to be used for a particular scenario at a project site. The results show that most simulators are capable of performing all elementary, yet crucial TH calculations at a reasonably similar level of accuracy. Like in other benchmark studies of numerical simulators, the human factors of available expertise and ease-of-use are of prime concern when problem and/or simulator complexity increases. Ultimately, regular and systematic comparison of simulators should help companies and institutions build confidence, improve them, and/or help select suitable ones for a particular problem at hand.