Towards predicting DNAPL source zone formation to improve plume assessment: Using robust laboratory and numerical experiments to evaluate the relevance of retention curve characteristics

We conducted multiple laboratory trials in a robust and repeatable experimental layout to study dense nonaqueous phase liquid (DNAPL) source zone formation. We extended an image processing and analysis framework to derive DNAPL saturation distributions from reflective optical imaging data, with volume balance deviations < 5.07%. We used a multiphase flow model to simulate source zone formation in a Monte Carlo approach, where the parameter space was defined by the variation of retention curve parameters. Integral and geometric measures were used to characterize the source zones and implemented into a multi-criteria objective function. The latter showed good agreement between observation data and simulation results for effective DNAPL saturation values > 0.04, especially for early stages of DNAPL migration. The common hypothesis that parameters defining the DNAPL-water retention curves are constant over time was not confirmed. Once DNAPL pooling started, the optimal fit in the parameter space was significantly different compared to the earlier DNAPL migration stages. We suspect more complex processes (e.g., capillary hysteresis, adsorption) to become relevant during pool formation. Our results reveal deficits in the grayscale-DNAPL saturation relationship definition and laboratory estimation of DNAPL-water retention curve parameters to overcome current limitations to describe DNAPL source zone formation.


Introduction
A large number of industrial sites worldwide are affected by contamination of dense non-aqueous phase liquids (DNAPLs) (Fetter et al., 2017;UBA, 2017;Gupta and Yadav, 2019). This group of often highly persistent chemicals poses tremendous threats to ecosystems and humankind (Sakari et al., 2008;Fetter et al., 2017), especially when they persist in groundwater that is used as irrigation or potable water (Mackay and Cherry, 1989;EC, 2004). In particular, chlorinated solvents such as trichloroethylene have great risk profiles due to their toxic and carcinogenic properties (Gandhi et al., 2002;EC, 2004) and are among the most frequently detected contaminants in the subsurface (McCarty, 2010;Nsir et al., 2012). This clearly motivates appropriate risk assessment and site management strategies (Wiedemeier et al., 1999;Prokop et al., 2000;Kueper et al., 2014). Here, adequate understanding of migration processes controlling source zone formation can help to increase water security and to lower remediation efforts.
Once released into the subsurface, DNAPLs form so-called source zone geometries, i.e., the zone in which the source of the contaminant is residing together with other solid, liquid and gaseous phases (Mercer and Cohen, 1990;Powers et al., 1998;Brusseau et al., 2011). Due to interphase mass transfer, i.e., mainly dissolution of non-wetting phase components into the aqueous phase (e.g., Fure et al., 2006;Agaoglu et al., 2015), such entities represent long-term sources for contamination of downstream groundwater (e.g., Kram et al., 2001;EEA, 2014;Essaid et al., 2015). The complex geometrical and physicochemical properties of source zones are, together with subsurface characteristics such as aquifer heterogeneity and hydraulic conditions, the most sensitive factors in controlling contaminant plume evolution (Soga et al., 2004;Falta et al., 2005;Liedl et al., 2005;Fure et al., 2006;Brusseau et al., 2007;Engelmann et al., 2019a).
In particular at field sites, DNAPL source zone geometries (SZGs) are complex structures that are difficult to describe (e.g., Feenstra, 2005;Essaid et al., 2015). While many spill locations are known to have DNAPL contamination, knowledge on the spatial extent of such sources is widely missing due to technical and financial constraints (Engelmann et al., 2019a). In some cases, source zone extents are estimated using subsurface exploration methods such as depth-discrete sampling (e.g., Guibeault et al., 2005) or indirect geophysical measurements (e.g., Zhang et al., 2002). However, these methods often interpolate sparse point data and suffer from uncertainties. As a consequence, in most site assessments, dissolved contaminant plumes are detected, monitored and treated only (Engelmann et al., 2019a). This has led to numerous sites where remediation efforts have been inefficient or even failed, or exceeded financial resources (e.g., NRC, 2005).
Laboratory-scale experimental studies represent promising opportunities for generating observation data to evaluate migration processes under controlled conditions (e.g., illumination, temperature), and with well-known system properties (e.g., hydraulic boundary conditions). Improved knowledge on factors controlling source zone formation would lead to better predictions of corresponding SZGs in the field, and therefore improved assessment of contaminant plumes affecting groundwater (Engelmann et al., 2019a). Numerical models are powerful tools to augment the interpretation of laboratory experiments (Praseeja and Sajikumar, 2019); they allow more careful assessment of the parameters controlling SZG, and in some cases reduce the number of experiments required for characterization. Calibrated models can then be used at larger scales, e.g., field sites, to estimate SZGs based on DNAPL and field characteristics, with the potential to reduce the technical and economical burden of SZG characterization. A remarkable suite of numerical multiphase flow model software exists (Sookhak Lari et al., 2019a), but applications for the previously described analyses are rare and, with Erning et al. (2012) as exception for DNAPL-type chemicals, mostly related to light NAPL compounds (e.g., Pasha et al., 2014;Sookhak Lari et al., 2016;Bortoni et al., 2019).
Up to now, few studies focused on understanding and predicting DNAPL source zone formation. For example, in Erning et al. (2012), a numerical multiphase flow model was calibrated against DNAPL saturation data to evaluate the relevance of groundwater flow for DNAPL migration. Zheng et al. (2015) investigated DNAPL migration at tank scale under influence by groundwater flow by using a light transmission method and numerical modeling. They found that the retention curve parameters are not very sensitive to simulation results compared to permeability and porosity. However, as shown by other studies dealing with light NAPL chemicals, the sensitivity of retention curve parameters is typically high (e.g., Sookhak Lari et al., 2016). In Cheng et al. (2016) a similar setup used light transmission visualization to trace DNAPL movement under influence by surface active agents, but a comparison of source zone formation under undisturbed conditions was not performed. Therefore, evaluating the relevance of retention curve parameters should receive increasing interest.
The objectives of this study are to define an experimental setup capable of repeatedly generating DNAPL saturation distributions under controlled conditions, and to develop a numerical multiphase flow model that is capable of simulating the DNAPL source zone formation with quantifiable uncertainties. We first present the experimental setup used for generating raw images of source zone formation under defined laboratory conditions. Raw images obtained through a reflective optical imaging method were manipulated by applying a set of customized image analysis and processing (IPA) approaches that were previously presented in Engelmann et al. (2019b). The IPA framework was adopted for generating physically plausible distributions of effective DNAPL saturation. These experimental observation data were used to calibrate a numerical multiphase flow model. A classical Monte Carlo simulation was performed, where the parameter space was defined by variations of the DNAPL-water retention curve parameters n nw (-) and α nw (1/m).
Integral and geometric measures were used to characterize the source zones and implemented into a multi-criteria objective function. Unlike other previous works, this study quantified uncertainties related to the experimental setup, reflective optical imaging as well as numerical model. It furthermore evaluated and proved the relevance of retention curve parameters. The results provide specific indicators for future investigations to ultimately improve the understanding of DNAPL source zone formation.

Materials and methods
The experimental tank setup and image processing and analysis (IPA) methods were previously reported (Engelmann et al., 2019b). In Sections 2.1 and 2.2, these methods are briefly reviewed along with more detailed discussion of new methods used in this work. Each experimental release scenario was performed three times to maintain repeatability and to estimate experimental uncertainties.

Experimental setup for DNAPL migration in porous media
A quasi-two-dimensional, dismountable tank with inner dimension of 0.30 × 0.30 × 0.02 m 3 served as the experimental setup to generate data on DNAPL migration (see Fig. 1). Stainless steel bolts connect the front and back panels (transparent PMMA) with the supporting structures (opaque PVC). A silicon mat with 2 mm uncompressed thickness prevents leakage. Each in-/outflow chamber was separated from the inner part using PVC spacers having homogeneously aligned horizontal perforations (2 mm inner diameter) and geotextile (fully watersaturated hydraulic conductivity K s = 1.2 × 10 − 1 m/s, Co. Flair-stone®) attached onto each inner spacer side. This prevented sediment redistribution while filling the tank with water. A tank thickness of 2 cm was adjusted to minimize wall effects by having a thickness of at least 16 times the largest grain size (Jawitz et al., 1998;van Valkenburg and Annable, 2002), and to minimize DNAPL migration normal to the front and back panels, i.e., DNAPL movement primarily occurs in two dimensions.
A falling-head DNAPL release was performed using a reservoir and a release rod, connected with a flexible tube and remotely controlled via solenoid valve. For each scenario, a DNAPL volume of 20 mL with an initial head difference of 27.5 cm (representing 4036 Pa pressure difference for fluid density ρ n = 1.48 × 10 3 kg/m 3 ) was released into the tank at the same location.
The wetting phase was distilled, degassed water at pH 8.0, dyed with Brilliant Blue FCF (5 mg/L, Co. Supelco™ Analytical). This preparation procedure allowed for the minimization of sorption effects (van Valkenburg and Annable, 2002) as well as for avoiding unwanted reactions and air entrapment. Hydrofluoroether (HFE-7100; > 99%, Co. 3M™ Novec™) was used as non-wetting phase chemical, having properties similar to common DNAPL compounds (e.g., Luciano et al., 2010). To prepare the 100 mL non-wetting phase, 96 mL HFE-7100 was mixed with 4 mL methyl octanoate (> 98%, Co. Thermo Fisher Scientific) to allow for dying with 5 mg/L Sudan IV (Co. Alfa Aesar). This mixture had a viscosity, μ n , of 0.6 × 10 − 3 Pa s at 20 • C according to Luciano et al. (2010), and a DNAPL-water interfacial tension, σ nw , of 35.59 × 10 − 3 N/m (Luciano et al., 2010;Erning et al., 2012). Due to the relatively low solubility of this mixture in water (< 12 mg/L; Luciano et al., 2010) compared to typical compounds such as trichloroethylene (~ 1300 mg/L; Jellali et al., 2003), interphase mass transfer was assumed negligible within the maximum experimentation time frame of 10 h. The dyes chosen were selected because they provided a sharp optical contrast between wetting and non-wetting phases, and because their sorption to sediments was negligible based on 24-h batch trials (Engelmann et al., 2019a). Observation data was used for model calibration within the stage of DNAPL redistribution during source zone formation as suggested by Kueper et al. (2014) (here, DNAPL redistribution was followed for up to t = 300 s to avoid unknown effects of long-term DNAPL-soil-water interaction; see Section 2.4).
Three types of non-consolidated porous media were used as tank filling material to judge the dependency of DNAPL migration on porous media properties. These are two transparent model soils that facilitate phase dying and image processing (Engelmann et al., 2019b;Wang et al., 2019), and a natural sand (retrieved from a sand pit at Pirna, Germany). The model soils are spherical clear glass beads (diameter 1 mm, Co. VWR) and clear filtering glass (diameter 0.8-1.2 mm, Co. Nature Works). The glass beads were washed and oven-dried for 24 h at 105 • C. The same was done with the filtering glass and natural sand after sieving to single-size fractions of 1.0-2.0 mm. Tank filling was performed using a wet-filling procedure. Compaction was adjusted by adding ~ 1 cm layers and carefully sledging a rubber hammer against the outer tank walls. At the bottom of the tank, a highly compacted natural sand layer (0.2-0.4 mm single-size fraction) of 3 cm thickness Table 1 Hydraulic characteristics for each porous media type, determined at room temperature (~ 22.5 • C). These parameters served as the starting parameter set for model implementation. Values partly adopted from Engelmann et al. (2019b). Variables Glass beads (G) (1 mm) Filtering glass (F) (1…2 mm) Natural sand (S) (1…2 mm) van Genuchten shape parameter b naw (− ) 8.222 11.568 3.360 a Calculated from mean K s value using a water dynamic viscosity μ w = 0.9544 × 10 − 3 Pa s and a water density ρ w = 997.8kg/m 3 (both at room temperature T ∼ 22.5 • C). b Measured at interface between air and water at room temperature (T ∼ 22.5 • C).
served as horizontal aquitard enabling DNAPL pooling. Each type of porous media was characterized for hydraulic properties (see Table 1; see also Engelmann et al., 2019b for methodologies). For characterizing DNAPL migration in the two-phase porous media system, the retention curves defining the relationship between capillary pressure P c (Pa) and saturation S( − ) as well as between relative permeability κ r ( − ) and saturation S are to be determined. In this study, the van Genuchten shape parameters for the interface between water and air, n aw ( − ) and α aw (1/m), as well as residual and saturated water content θ rw ( − ) and θ sw (-), respectively, were estimated under water drainage conditions using a HYPROP® sensor (Co. UMS). The determination of the aforementioned parameters under imbibition conditions was not possible. The tank itself sat on a horizontally aligned stainless-steel frame. The entire setup was placed in a climate chamber (Co. Viessmann) at T = 10 • C ± 1 K and without sunlight exposure to minimize changing ambient conditions. A custom-made rectangular aluminum frame held four light emitting diode panels (18W each, Co. OSRAM), each equipped with a light diffusor. The orthogonal distance between tank and LEDs was set to 0.3 m.
A remotely controlled RX-10M2 camera (Co. Sony), equipped with a 24-200 mm F2.8 zoom lens (Vario-Sonnar T, Co. Zeiss®) and attached to a horizontally aligned tripod, was used for serial image acquisition. A fully manual camera mode was specified, with equal settings for each DNAPL release scenario (lens aperture f/6.3, ISO setting 100, shutter speed 1/40 s, hard manual focus with centered pattern, focal length 200 mm, serial time step length 10 s). Raw images were stored in ARW format, with a spatial resolution of 5471 × 3080 pixels (16:9 lengthwidth ratio) and 24-bit color depth for each channel. 10 × 10 pixels represent a physical area of ~ 1 × 1 mm 2 . A 17.68% gray photostock card was attached to the tank for later white balance adjustment.

Application of IPA framework to calculate experiment-based DNAPL saturation profiles
The raw images of each experimental DNAPL release scenario were converted to TIFF format images (see Electronic Supplementary Material (ESM) S1). These images were then used for calculating spatial distributions of DNAPL saturation S n ( − ). In line with the IPA framework presented earlier (Engelmann et al., 2019b), steps in the following order were considered: white balance adjustment, cropping to area of interest (see green rectangle in Fig. 2), conversion to other color space if required (e.g., from RGB to HSV), selection of single band with highest contrast and lowest noise (e.g., S band out of HSV color model), calculation of absolute difference compared to first image without DNAPL to reduce noise and illumination scattering, and application of a Gaussian filter with specified kernel (e.g., 5 × 5 pixels surrounding the pixel of interest) for noise reduction. Different IPA configurations were compared against each other in order to yield the optimized settings by following the procedure presented in Engelmann et al. (2019b).
The initial usage of a 10 × 10 regular pixel grid (Luciano et al., 2010;Engelmann et al., 2019b) on binary images led to implausible results, with values of S n approaching 1 in most areas of the source zone. Therefore, a strategy different from the IPA framework presented in Engelmann et al. (2019b) was followed. Few studies have investigated the relationship between optical parameters and NAPL saturation. In Darnault et al. (1998), a linear relationship between hue and water content was assumed for a soil-oil-water system evaluated using the light transmission method (LTM). This approach was then transferred to a soil-air-oil-water system, where the linear relationship between water and hue was adopted (Darnault et al., 2001). In Bob et al. (2008) and Cheng et al. (2016), DNAPL migration was visualized using LTM and a non-linear relationship between light intensity and non-wetting phase saturation. However, the latter technique is not applicable for reflective optical imaging as used in this study. In Wang et al. (2019), a non-linear relationship between water saturation and intensity was determined for transparent soil filled with a LNAPL chemical. Due to the entirely different optical and physicochemical properties of fused quartz compared to the material used in our study (see Section 2.1), this technique cannot be transferred to our study.
For reasons of simplicity, a linear relationship between gray level and DNAPL saturation, S n , was assumed. For this, a semi-automatic definition of two thresholds was performed for each gray-scale image. Specifically, the maximum threshold thr max (-) was determined for each experiment by setting the maximum gray level in the first image with DNAPL present at t = 20 s. This corresponded to pixels at the release rod outlet, where the highest DNAPL saturation, S n = 1, occurs (see Fig. 1). The value of thr max remained constant for all evaluation times of each release scenario. Sporadically occurring gray levels above this threshold were excluded, as such levels are likely to be noise.
The minimum threshold thr min (-) was determined from the low gray levels for each image (i.e., dark gray to black; see ESM S2). The latter threshold represents sediment grains surrounded by water only, yielding Shown are the experimental tank (background) and the spatial domains, including coordinates (given in m) of both the experimental observation data (IPA data, green box) and the numerical model configuration (red box). In addition, boundary conditions (b.c.) defined for specific model stages (blue, orange, and pink), as well as the linear pressure gradient are given. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) a DNAPL saturation S n = 0. Gray levels below thr min were attributed to noise. The minimum threshold was determined by iteratively changing thr min starting from the lowest gray level l(x, z) (-) present in the grayscale image. Here, the horizontal image coordinate is given as x (-) and the vertical one as z (-). For each iteration, the absolute difference defined between experimentally known volume balance V bal (-) (see Engelmann et al., 2019b) and the IPA-related volume balance V IPA (-) was calculated as Hereby, the IPA-related balance was determined as with image pixel counter c (-), the DNAPL saturation value assigned for each image pixel S n,c (x, z) (-) and the sum of all image pixels p (-). Here, S n,c (x, z) values were derived using the following linear regression: The iteration was terminated once V dev becomes minimum. Due to illumination fluctuations in space and time, the minimum threshold was individually defined for each image. The performance of the aforementioned iterative determination of thr min was estimated by calculating the mean percentage difference over the first 100 images j (-) per DNAPL release scenario and relating it to the experimental volume balance V bal : Then, in order to prepare the DNAPL saturation data for the calibration of the numerical setups (see Section 2.3), two further data processing steps were performed. The S n distributions at the evaluation times selected partly showed scattering in regions, where DNAPL occurrence is physically implausible (e.g., at left or right upper margins of the tank). To exclude these noise-related patterns, image cells having S n values below or equal to 0.04 were set to zero. Then, in order to match the DNAPL mass balance, a factor equal for all image cells was iteratively applied until the mass deviation fell below a threshold value of 10 − 4 kg. Finally, the IPA-based data was downscaled from the original resolution (0.1 × 0.1 mm 2 ) to the numerical grid resolution (2 × 2 mm 2 ) by considering each 20th pixel only.
The raw TIFF format images acquired from the laboratory experiments are provided in ESM S1. Furthermore, in ESM S2, intermediate results for individual IPA steps as well as downscaled saturation profiles used as input for model calibration (2 × 2 mm 2 ) are presented.

Conceptual setup and numerical implementation of reference scenarios
The conceptual model serving as the basis for the numerical reference setup is presented in Fig. 2. All required model parameters are given in Table A1. According to the experimental setup (see Section 2.1), the system is assumed to be two-dimensional, homogeneous, isotropic, and isothermal. Capillary hysteresis was assumed to be negligible, limiting the numbers of parameters for calibration. This also mitigated the possible outcome of two or more final solutions with equal goodness of fits. Including hysteresis effects into the model configuration may lead to better model fitness as shown by Sookhak Lari et al. (2016); however, the laboratory setup allowed for the experimental determination of retention curve parameters for water drainage only (see Section 2.2). Diffusion, adsorption, biodegradation and reaction of the DNAPL are not considered. Phase transition effects (i.e., dissolution of DNAPL into the aqueous phase) are negligible within the time span of the experiment (see Section 2.1). Due to system symmetry, only half of the inner tank domain was numerically simulated to save computational effort.
A grid size of 2 × 2 mm 2 for the vertical cross section domain was selected as a compromise between numerical accuracy and model run time. The third dimension was single cells with a length of 2 cm in order to gain mesh volumes representing the tank layout (see Section 2.1) and to easily check volumetric balances. The mesh consisted of 75 columns and 120 rows, yielding 9000 nodes. All model domain margins were defined as no-flow conditions. Two steps were undertaken to define the numerical setup: mesh creation (stage 1) and generation of fully watersaturated initial conditions with linear pressure distribution (stage 2). Then, for each reference scenario of the three porous media types, the following sequence was followed: DNAPL release (stage 3) and DNAPL migration (stage 4). For initialization (stage 2), the topmost node row was assigned a first-order boundary condition, defining ambient pressure using a pressure value of 101,325 Pa. The leftmost and rightmost node columns were assigned with a linear hydraulic pressure gradient, where the lowermost nodes show a pressure of 103,673 Pa. In addition, for each of these nodes, saturations for water phase S w , DNAPL phase S n and gas phase S g were assigned values of 1.0, 0.0 and 0.0, respectively, in order to specify a fully water-saturated domain. The model results of the initialization step (stage 2) were defined as initial conditions for the DNAPL release simulation (stage 3). For the latter, a single node at the DNAPL release location was assigned with a second type boundary condition by specifying an inflow rate of 7.4 × 10 − 4 kg/s for a duration of 20 s (this represents a half-domain release of 10 mL with a DNAPL molar mass M = 250 g/mole, see Section 2.1). The model results of DNAPL release (stage 3) were set as initial conditions for the DNAPL migration simulation (stage 4).
A variety of codes able to numerically simulate multiphase flow processes exist (Sookhak Lari et al., 2019a). We used TMVOC (Pruess and Battistelli, 2002), a numerical integral finite-difference multiphase multicomponent non-isothermal software. This software is capable of simulating fluid flow via a multiphase extension of Darcy's law, which includes retention curves for relative permeability-saturation κ r (S) and capillary pressure-saturation P c (S) (for general theoretical framework of multiphase flow, refer to, e.g., Praseeja and Sajikumar, 2019). The latter curves are implemented for three immiscible phases (water, non-wetting phase, gas) according to the scaling approach after Parker et al. (1987). TMVOC is widely applied for studies related to NAPL spreading and has been benchmarked against other codes and analytical solutions (e.g., Sookhak Lari and Moeini, 2015Lenhard et al., 2018). Furthermore, TMVOC has been verified against observation data at both laboratory and field scales (e.g., Basu et al., 2008;Erning et al., 2012;, 2018a, 2018bZheng et al., 2015). In this study, a parallelized version of TMVOC (Zhang et al., 2008) as implemented in TOUGH3 (Jung et al., 2018) was applied. Simulation runs were performed on the HPC clusters Pearcey (The Commonwealth Scientific and Industrial Research Organisation -CSIRO, https://www.csiro.au/en/Research/Technology/Scientific-co mputing/Pearcey-cluster, last access 06/04/2020) and Taurus (Technische Universität Dresden, https://doc.zih.tu-dresden.de/hpc-wiki/bin /view/Compendium/SystemTaurus, last access 08/04/2020). In sum, all simulation realizations consumed ~ 390,000 total CPU hours.

Model calibration strategy
The basis for model calibrations against the laboratory-scale observation data were the reference model configurations for each porous media type as described in Section 2.3. During calibration of stages 2 and 3, the model configuration with respect to mesh size, time stepping, fluid properties, porous medium characteristics and boundary conditions were left constant except for the calibration parameters. According to the well-defined experimental conditions (see Section 2.1), most model parameters could be determined by hydraulic tests or retrieved from technical data sheets. However, the highest uncertainties were related to the proper definition of the κ r (S) and P c (S) retention curves (e. g., Gerhard and Kueper, 2003). In TMVOC, several options for defining the latter functions exist. We decided to select the κ r (S) and P c (S) functions after Parker et al. (1987), as these are the most physically plausible functions. Here, the exponents implemented in the retention curves after van Genuchten (1980) are based on the Mualem approach (Mualem, 1967). The test of alternative retention curve functions (e.g., Brooks and Corey, 1964;Stone, 1970;Faust, 1985) would have either required the same amount of undeterminable parameters leading to a similar contribution of parameter uncertainty, or to a less physics based description of the capillary system, and was out of scope in this study.
The van Genuchten shape parameters n nw (-) and α nw (1/m), both defined for the interface between water and non-wetting phase , were selected as parameters to be optimized during calibration. The respective values for the interface between water and air, determined through HYPROP® sensor (see Section 2.1), were defined as starting values for setting up the non-calibrated reference model. All model parameters are given in Table A1.
In order to quantitatively compare modeling data against experimental observation data (see Section 2.2), an objective function was defined describing the deviation of the source zone shapes from each other using six different partial objective functions X that were normalized and equally weighted: Hereby, R and MSE represent integral measures, while the remaining criteria are geometric ones. For R, the normalization was defined through For all remaining criteria of Eq. (5), normalization was defined as Hereby, for Eqs. (6) and (7), X min is the minimum and X max is the maximum criterion value. Values of ∆A (-) were calculated by cumulating the deviations of partial areas A i (m 2 ): where A i are defined by S n,i values within the range of For example, the contribution of the first partial area A 1 (m 2 ), defined for source zone areas having DNAPL effective saturation values S n between 0.0 and 0.01, is calculated through multiplication with a weighting factor value w 1 (-) of 0.01. The best fitness is reached when f (-) becomes 1, while the worst fitness is represented by f (-) equal to zero. The normalized objective function f norm,t was calculated for each evaluation time t. The acceptable model fitness per evaluation time t was defined with f norm,t ≥ 0.9. The overall model fitness was then calculated to identify a single parameter set for each porous media type scenario where f sum,norm becomes maximum (i.e., parameter set has best fitness for all evaluation times).
Even with two parameters to be optimized, objective functions can be highly irregular, with multiple local optima and parameter nonuniqueness. In this study, a manual calibration strategy having two subsequent evaluation steps for each porous media type was used. A rough calibration with wide parameter ranges and few intervals was used to sample larger areas of the unknown objective function. Here, specific parameter sets leading to unsuccessful calibration or even model non-convergence could be excluded from further consideration. Then, a fine calibration having narrow parameter ranges and many intervals was performed to identify parameters indicating good to moderate model fitness. During fine calibration, the parameter n nw was varied between 4.0 and 14.0, with intervals of 0.1. Values of α nw were varied between 16.0 1/m and 48.0 1/m, with intervals of 1.0 1/m. Python 3.8 (PSF, 2020) was used as scripting language for IPA configuration/application (see Section 2.2), generating TMVOC input files for each realization, reading TMVOC output files and analyzing calibration data.

Results
The following section presents the main results of this study. Essential information is shown in the figures of this section as well as in the appendices A to D. Additionally, intermediate steps of the evaluation can be obtained from the Electronic Supplementary Materials (ESMs) S1 to S5.

IPA framework application
Using the experimental raw images, the IPA framework (see Section 2.2) was applied to derive DNAPL phase saturation distributions S n . Results are sensitive to the adequate configuration of subsequent steps implemented in the IPA framework (Engelmann et al., 2019b) as well as to the iterative determination of the minimum gray-scale threshold thr min . IPA-related uncertainties with V dev values of 1.48-2.16%, 2.41-2.95% and 2.75-5.07%, calculated based on a sample size of ~ 100 processed images for each release scenario (see ESM S3), were achieved for the porous media glass beads, filtering glass and natural sand, respectively.
All intermediate IPA steps can be retrieved from the ESM S2. Distinctive differences in the pixel frequency histograms of the grayscale images, serving as a basis for calculating S n distributions, can be identified. The background gray scale range slightly depends on the porous media type, with gray levels between 0 and up to ~ 38 for glass beads and filtering glass, and up to ~ 20 for natural sand. Unlike the two artificial materials, the maximum threshold placed for natural sand is below the maximum gray level observed during DNAPL presence. The gray scale range that was used for calculating S n values above zero, representing DNAPL source zones, is largest for glass beads, covering nearly the entire gray scale range up to a maximum threshold of ~ 240. This indicates a good contrast between areas of water, DNAPL and grains for this porous media type. Adversely, the width of the gray scale spectrum is rather small, which may lead to problems for the adequate identification of pores affected by DNAPL saturation.
In the raw and gray scale images before absolute difference calculation, areas of entrapped DNAPL fluid can be identified. However, depending on the trial number, some areas show calculated S n values above zero, while the same areas were identified as zero non-wetting saturation (compare the distribution of blue color above the pooled DNAPL in Figs. ESM S2.6 and ESM S2.18). Especially for filtering glass, areas with S n below 0.04 are clearly attributed to image noise through light reflection (e.g., see Fig. ESM S2.36). For the natural sand scenario, trial #02, such low S n values are indistinguishable from the background (see Fig. ESM S2.45). Therefore, for all porous media types and all trials, cells having S n values below 0.04 were set equal to zero to exclude these ambiguous partial areas from model calibration.
The IPA step images and DNAPL saturation distributions provide indications for DNAPL migration patterns affected by irregularities of grain size and surface non-smoothness. Here, the glass bead scenarios show more or less homogeneous flow patterns which do not largely differ between individual trials. The width and height of pooled and moving source zone areas are almost equal for all three trials for all evaluation times. For the filtering glass scenarios, flow paths spatially separated from each other can be identified when comparing individual trials (e.g., compare Figs. ESM S2.29 against ESM S2.35). While the first two evaluation times show more or less similar flow patterns, this effect becomes more distinctive when reaching migration times of 100 s and later. These observed flow patterns point towards strong influence by gravity-driven vertical fingering, which is likely to occur in homogeneous media with small-scale heterogeneities and rough grain surfaces (Glass and Nicholls, 1996). For the natural sand scenarios, it is obvious that DNAPL migration patterns are more heterogeneous compared to the other two porous media types. Effects of irregular sediment compaction (see Fig. ESM S2.53) and 'local gaps' within the DNAPL source zone can be seen (e.g., see Fig. ESM S2.48). The latter patterns are assumed to be induced by the three-dimensionality of the tank and/or locally less impermeable sediment areas. All in all, larger spatial irregularities in pore aperture lead to heterogeneous source zone formation for the Fig. 3. Exemplary normalized partial objective functions X norm for the glass bead scenario at t = 20 s (after DNAPL release). Fig. 4. Exemplary normalized partial objective functions X norm for the glass bead scenario at t = 60 s (free migration before pooling). Colored areas of f norm,t are above the predefined minimum acceptable fitness of 0.9, while gray areas are below. The latter do not contribute to f sum,norm where colored areas represent parameter space with overlapping acceptable fitness, while white areas represent non-acceptable parameter combinations. porous media types filtering glass and natural sand.

Evaluation of numerical model calibration success
It was possible to run most of the numerical model realizations defined by the parameter combinations as described in Section 2.3. Even after careful evaluation, some model configurations did not converge, especially for later evaluation times (see white areas in partial objective functions shown in Figs. 3, 4 and ESM S4). We assume this is due to solver configurations inappropriate for selected parameter combinations such that specific model run times are exceeded (e.g., limited number of iterations) and consider non-converging realizations as indicators for non-physical parameter combinations.
Partial objective functions X norm after Eq. (5) were calculated for each porous media type and evaluation time (see ESM S4). Exemplary partial objective functions are shown in Figs. 3 and 4. Overall, the objective function distributions, i.e., the two-dimensional parameter space spanned between the retention curve variables n nw and α nw , are entirely different for each of the three porous media types and for each of the four evaluation times. The large dependency on partial objective function type, being either integral or geometrical measures, bears potential for calibration uncertainty.
It can be seen that the areas of better fitness, indicated by red color in Fig. 3, are partly similar for some partial objective functions (e.g., ∆A norm vs. R norm ), but not so for others (e.g., MSE norm vs. ∆w max,norm ). This indicates a strong dependency of model calibration fitness on the choice of partial objective functions. Hereby, the geometrical criteria ∆A norm , ∆h max,norm , ∆w max,norm and ∆(w max /h max ) norm show stronger changes than the integral measures R norm and MSE norm . These changes in geometrical criteria can be attributed to different criteria sensitivities in different stages of migration. For instance, before DNAPL pooling starts, vertical migration mainly controlled by gravitational force provides a high sensitivity for ∆h max,norm but not for ∆w max,norm . Overall, the appearance of each partial objective function depends on the porous media type and the evaluation time t.
Areas with higher fitness remain in similar zones (i.e., same margin or corner) for most criteria and evaluation times. However, this is not the case especially for the criterion ∆(w max /h max ) norm . For instance, when comparing this criterion for glass beads for 20 s against 60 s (see Figs. 3 and 4), parameter combinations significantly lead to changed lengthwidth ratios. This indicates changing DNAPL migration behavior (Kueper et al., 2014), with transition from a stage of ongoing DNAPL spill (i.e., DNAPL-affected areas form geometry central-symmetrically around the release point) towards free phase migration once the spill event has ceased (i.e., vertical movement of DNAPL phase due to gravitational force becomes more important compared in zones where the capillary entry pressure is exceeded by the DNAPL fluid). Another change in the partial objective functions occurs during the transition from free phase migration towards a pooling process (i.e., the DNAPL fluid is hampered in vertical migration and forced to distribute itself in both horizontal directions). While the criterion MSE norm remains similar to previous evaluation times, the other criteria show completely different fitness distributions (see, e.g., Fig. 4 and ESM S4.2).
For the glass bead scenario (see Figs. 3, 4 and ESM S4.1 to ESM S4.2), the criteria dramatically change when comparing the evaluation times 60 s against 120 s. The zones showing better fitness move to low n nw values for the criteria ∆A norm , ∆h max,norm and ∆(w max /h max ) norm . This behavior indicates that the parameter combinations formerly having a good fitness are not capable of adequately representing the DNAPL pooling process. Since the change in the distribution of the integral fitness criteria MSE norm and R norm is not as large as for the remaining geometrical criteria, we assume that the numerical setup is less capable of representing the pooling process, as retention processes become more relevant in later evaluation times. For example, with progressing experimental time, DNAPL migration is more controlled by capillary forces compared to gravitational forces.
For the filtering glass scenario, the integral criteria do not differ as much from evaluation time to evaluation time as the geometrical criteria.
The geometrical criterion ∆A norm shows similar behavior for the first two evaluation times, then suddenly changes at t = 100 s to lower values of n nw . This shift towards lower n nw values is observed for the criteria ∆ h max,norm and ∆w max,norm as well. The amount of parameter combinations with good ∆(w max /h max ) norm values is generally small, except for t = 100 s. For the last evaluation time (Fig. ESM S4.6), the criterion ∆ w max,norm shows very low fitness values for all parameter combinations. This indicates that the parameter values within the defined ranges do not represent the experimental data at this migration stage.
Similar to filtering glass, the integral criteria do not differ from evaluation time to evaluation time as large as the geometrical ones for the natural sand scenario. While ∆A norm shows similar behavior for the first three evaluation times, a shift towards smaller n nw values occurs at t = 300 s. Nearly the entire range of α nw provides good fitness for the criterion ∆h max,norm for all evaluation times except for t = 100 s. The general shape of ∆w max,norm remains similar for the first three evaluation times, indicating a shift towards larger values of n nw . In the 300 s evaluation time of the natural sand scenario, due to model non-convergence, only sparse model results are given for n nw > 11.5. Values of the integral fitness criteria MSE norm and R norm do not differ much from the previous evaluation time. In contrast, the geometrical criterion ∆(w max /h max ) norm captures nearly the entire parameter space, indicating that this criterion is not usable to judge on model fitness in the stage of DNAPL pooling.
Overall, the relevance of the calibration parameters n nw and α nw largely depends on the choice of the partial objective function, the evaluation time and the porous media type. In order to reduce the complexity of individual objective functions, the normalized fitness per evaluation time f norm,t , as calculated after Eq. (1), is shown in Fig. 5. The distribution of f norm,t is subdivided into acceptable fitness with f norm,t ≥ 0.9 (colorized areas) and excluded parameter space (gray areas). It is obvious that the fitness distribution, i.e., coverage area of parameter space and region of parameters n nw and α nw , largely depends on porous media type and evaluation time.
For the glass beads scenario (Fig. 5a-d), larger n nw values in the upper half value range lead to good fitness for the first three evaluation times. In contrast, values of α nw span almost the entire variable range, thus indicating a smaller sensitivity compared to n nw . Areas having acceptable fitness are similar for the first two evaluation times, but increase in the third one. The global optimum does not change much from evaluation time to evaluation time. While the fitness patterns appear similar for the first three evaluation times, a rough change is occurring for t = 120 s. Here, areas with acceptable fitness move to very low n nw values of ~ 4 and values of α nw between 24 1/m and 48 1/m. The latter parameter values do not match the previously determined values, so that the numerical model is not capable of representing the observation data with equal combination of n nw and α nw . It is assumed that, while the undisturbed vertical migration occurring between t values of 40 s and 60 s is well matched, the DNAPL pooling process is less predictable with the numerical model configuration.
For the filtering glass scenario (Fig. 5f-i), the area of acceptable fitness is much smaller compared to the glass bead scenario (Fig. 5a-d).
Most areas of the f norm,t are covered by non-acceptable fitness, giving indication that the two parameters show higher sensitivity for this porous medium compared to glass beads and that the parameter range may not be sufficient for representing the observation data. For all four evaluation times, acceptable values of n nw and α nw are in a range of 7.5-14 and 16-22 1/m, respectively. Hereby, the parameter α nw shows less sensitivity compared to n nw , which is in contrast to the glass bead scenario.
For the natural sand scenario (Fig. 5k-n), the covered area by acceptable fitness is smallest compared to the other two porous media types. In line with filtering glass, this indicates higher parameter sensitivity compared to glass beads. For all evaluation times, low α nw values between 16 1/m and 26 1/m yield acceptable fitness. Interestingly, n nw values below 7.5 lead to acceptable fitness for the first and last evaluation times. In contrast, a shift towards larger values between 8.5 and 14 is observed for the evaluation times of 40 s and 60 s. This behavior indicates model calibration is not unique, resulting in two different parameter combinations with similar objective function values.
Overall, the transient objective functions largely depend on individual evaluation times. To identify a parameter combination calibrating the model against the observation data for each porous media type separately, the cumulative, normalized fitness f sum,norm was calculated as shown in the lowermost row of Fig. 5e, j, o. In general, it cannot be shown whether n nw or α nw is more relevant for calibrating the model, as the distribution of f sum,norm shows different patterns for the three porous media types. The glass bead scenario (see Fig. 5e) shows two separate areas with acceptable fitness, but covers a large portion of the entire parameter space, especially in the upper range of n nw . The optimized fitness was achieved for n nw and α nw equal to 13.8 and 29.0 1/m, respectively. For filtering glass (see Fig. 5j), the optimized parameter set was determined as n nw and α nw equal to 11.6 and 16.0 1/m, respectively.
Here, a single optimum was identified and acceptable parameter combinations cover a small extent of f sum,norm only, with n nw values larger than 7.5 and α nw values below 24 1/m. The natural sand scenario reveals two local optima of f sum,norm (see Fig. 5o). These optima can hardly be separated in terms of f sum,norm value. A best fit was achieved for n nw and α nw equal to 11.0 and 16.0 1/m, respectively. As this parameter set was also identified for the filtering glass scenario, having different porous media properties (see Table 1), it appears likely that either the filtering glass or natural sand scenario may show less calibration success when comparing model against observation data (see Section 3.3).

Comparison between experimental observation data and model results
The successful calibration of the model setup against observation data distributions of the three trials per porous media type (calculated as mean S n ) yielded optimized parameter sets for each porous media type (see Fig. 5). As a next step, an evaluation of calibration success was performed by comparing the calibrated model output data for each evaluation time against the observation data of each experimental trial (see Fig. 6). This analysis will help to estimate experimental repeatability and to judge the feasibility of the partial objective functions. Furthermore, the capability of the numerical model to represent the transient DNAPL migration process under the defined system conditions can be identified. First, an intercomparison of all partial objective functions was carried out (see Fig. 6 and ESM S5). Then, DNAPL saturation distributions were directly compared with absolute S n values for experimental and modeling data, as well as absolute differences and partial areas (see Fig. 7). The data of all porous media types are shown in the appendices B, C and D, Figs. B1 to B3, C1 to C4, and D1 to D4, respectively.
For glass beads (Fig. 6a), the overall fitness is good for all evaluation times, with mean R above 0.8 and MSE below 0.004. The geometrical criteria indicate larger errors for the two later evaluation times, especially for t = 120s. Here, the error in source zone width-height ratio ∆ (w max /h max ) gets values of ~ − 2.0. This means that the calibrated model generates DNAPL source zones of width-height ratio that are two times larger than those observed in the experiment.
For filtering glass (Fig. 6b), all criteria show the poorest goodness of fit compared to the other two porous media types. Values of R start with ~ 0.7, but rapidly drop to ~ 0.27 for the third evaluation time t = 100s, with subsequent rise up to 0.56 for the fourth evaluation time. Interestingly, MSE values show a continuous decrease in accuracy. The worst model fitness is achieved for t = 300s, with large deviations in source zone height (∆h max ∼ − 0.1m) and width (∆w max ∼ 0.11m). While R values are above 0.5, the width-height ratio ∆(w max /h max ) reaches values of ~ 6.4. The latter parameter indicates a non-successful calibration for this evaluation time. When comparing the spatial distribution of S n (see Figs. C1 to C4), although all migration stages are represented in principle, it is obvious that the model parameterization does not accurately simulate the experimental data. The distribution of partial areas shows similar patterns, but the spatial distribution of S n is poorly matched by the model. It seems that the migration is occurring slower in the experiment compared to the model. As migration patterns largely vary for each experimental trial due to exaggerated fingering effects (see

Table A1
Summary of relevant model parameters left constant during calibration, partly separated in porous media types of the experimental scenarios (G: glass beads; F: filtering glass; S: natural sand). Parameters relevant only for processes that were deactivated in TMVOC (e.g., diffusion, expansion/compression, or non-isothermal effects such as heat conduction) did not show effects on model results and are, therefore, not shown. Remaining default parameters can be retrieved from the TMVOC manual (Pruess and Battistelli, 2002 Figs. C2 to C4), discrepancies between experimental and modeling data depend more strongly on trial numbers compared to glass beads.
Overall, calibrating the model against the heterogeneous S n observation data was not possible, and filtering glass does not represent an appropriate compromise between artificial and natural porous media types. Due to the smooth surface characteristics of filtering glass, similar to the glass beads, it is assumed that capillary hysteresis is negligible. As migration velocities are similar for both filtering glass and natural sand (see, e.g., experimental observation data presented in ESM S1), parameter combinations leading to best model fitness that are similar for these two porous media types would have been expected. Also, due to issues measuring n aw and α aw for the highly permeable porous media used (see Table 1), even with determinable scaling factors to transfer the van Genuchten parameters from air-water to DNAPL-water interface , experimental estimates for the two calibration parameters n nw and α nw are subjected to large uncertainty. Therefore, we assume that the multi-criteria objective function is not fully suitable to compare random flow patterns in filtering glass against continuous flow simulated in the numerical model. The first two evaluation times of the natural sand scenario (Fig. 6c) show good model fitness that is comparable with the glass bead scenario, with mean R values above 0.8 and mean MSE below 0.004. The geometrical criteria indicate small errors as well. However, for the last two evaluation times, the agreement between model and experimental data worsens, with mean R values dropping to 0.55 and discrepancies of mean ∆h max up to 0.07 m. Compared to the other two porous media types, the variance of criteria values is larger, with stronger dependence on trial number. Therefore, although a good calibration was achieved, the repeatability of natural sand scenarios is less compared to DNAPL release into artificial porous media. Hence, preparing and performing trials in natural porous media needs more care. Unlike the filtering glass scenarios, and with accuracy comparable to the glass bead scenarios, a good agreement for spatial S n distributions was achieved (see Figs. D1 to D4). The time frames and spatial geometries are well simulated. However, in the last evaluation time during the pooling process, larger S n values are not given in the experimental data. It is assumed that both the IPA threshold application (see Section 3.1) and three-dimensional migration affect the observation data. As shown in Fig. 6c, a large variation of partial objective functions revealed a strong dependence on trial number. This effect can be seen in the S n distributions as well (see Figs. D2 to D4).
Overall, as shown in Figs. 6 and 7, the model fitness is consistently reduced with increasing migration time for all three porous media types and all partial objective functions. Also, deviations between experimental data and model results are larger for individual experimental trials compared to the mean S n data (exception: ∆h max for filtering glass; see Fig. 6b), as models were calibrated against the latter. Later stages of DNAPL migration are, therefore, less represented by the calibrated numerical model. Especially the pooling process on top of a DNAPLimpermeable sediment layer is less captured by the model.
In Fig. 7, the comparison is presented for the glass beads scenario with mean observation data of all trials. It can be seen that the transient DNAPL migration process is well captured by the calibrated model. All individual migration stages (release, migration from boundaries, and pooling) are given by the model, and general features such as geometrical properties (esp. source zone width) and arrival time of the source zone at the aquitard's top surface (see Fig. 7c) are simulated with good agreement. It can be seen that the distribution of partial areas in the model shows similar patterns compared to the experimental data for all evaluation times. With increasing evaluation time, the extent of the vertical source zone increases for the model, but the experimental data does not show this effect. Furthermore, S n values above 0.3 are not present in the experimental data. Both effects are due to threshold application as necessary IPA step for excluding background information (see Section 3.1). The comparison with individual trials (see Figs. B1 to B3) does not reveal a significant dependence of model accuracy on trial number.
In analogy to glass beads, the comparison of experimental and calibrated modeling results for filtering glass (see Figs. C1 to C4) and natural sand (see Figs. D1 to D4) show partly similar disagreements. The first two evaluation times are well captured by the numerical model, while the two later times show more and more differences. The agreement for filtering glass trials is poor, as indicated in the aforementioned (see Fig. 6) partial objective functions. For natural sand, the calibrated model is able to represent the lowermost part of SZG as well as the outer source zone margins during evaluation times 1-3, (see Figs. D1 to D3). Also, though less accurate, the pooling width is well captured (see Fig. D4). Gaps within the observation data source zones showing S n values equal to zero are not given in the model. As stated further above, these missing source zone areas may be attributed to three-dimensional flow (i.e., flow normal to the images), so that parts of the migrating DNAPL are not captured by the camera.

Critical evaluation of main findings
Overall, all methods applied in this study led to consistent and physically plausible results, which may be transferred to other applications as well. This confirms the applicability of the IPA framework on experimental tank setups, as well as the feasibility of TMVOC for simulating DNAPL migration in porous media. However, some aspects of the results indicate distinct limitations of the methodologies. In the following, we critically discuss the current perspective of investigations related to image experimental works, image processing and analysis, as well as numerical modeling (see Sections 4.1, 4.2 and 4.3, respectively). Furthermore, we formulate suggestions on how to potentially improve the existing methods. This outlook may have impact on the development and improvement of future investigations in adequately understanding DNAPL release into porous media.

Experimental simulations of DNAPL migration
The experimental setup was defined as a system with the lowest amount of degrees of freedom to allow for the controlled release of a DNAPL fluid into different sediments. Especially the release method is of high importance, as this boundary condition is relevant for the first migration stage "source on" (Kueper et al., 2014) and affects later stages. The gravity-driven outflow from a reservoir together with a double-tube release rod, which was used in this study, allowed for repeatable DNAPL release of known fluid volume and reduced air entrapment and sediment disturbance to a minimum. Numerical simulation results give proof that the aforementioned release method is adequate for exact experimental trials (see Section 4.3).
In many NAPL-related experimental studies, spherical glass beads were utilized to mimic non-consolidated porous media having properties for optical imaging being advantageous over natural sediments. In this study, filtering glass was tested as a cost-efficient alternative. However, presumably due to the sharp grain surfaces, highly irregular DNAPL migration patterns with exaggerated small-scale fingering effects in both vertical and horizontal directions were observed These preferential flow paths could not be repeated for each experimental trial and are considered to occur randomly depending on grain shape distribution and slightly irregular sediment compaction.
In all cases, non-consolidated porous media types with coarse grain size fractions were used for the experimental scenarios. Coarser material was chosen to minimize the longer-term effects such as adsorption, dissolution and degradation of the dyed DNAPL surrogate and, ultimately, provided a system which could be well maintained and reused for several trials with repetition consistency. The relatively high migration a According to Reid et al. (1987). b Calculated assuming a solubility of 12 mg/L, with molar mass of water M w = 55.56 mol.           velocity compared to more natural sediments produced slightly unsynchronized data, so that start times of DNAPL release at the solenoid valve, from photo acquisition, and with the model are not perfectly matched. Although this bias is small compared to the entire scenario duration of up to 300 s, in further studies image acquisition could be improved by using high-resolution videos with 25 images per second. In addition, in order to reduce temporal bias and to represent more natural porous media present at contaminated field sites, multi-fraction porous media with smaller fraction sizes and, hence, lower intrinsic permeability, should be used.
Here, more efforts are to be placed on better sediment characterization to improve process understanding and to reduce the number of unknowns in the numerical model. For instance, by determining the organic carbon content and by undertaking batch trials with different ratios between dyes, water, grains and DNAPL, adsorption isotherms could be determined for consideration in the numerical model. In addition, the retention curve parameters n nw and α nw defining the capillary pressuresaturation curves should be measured for water-DNAPL interface (e.g., Lenhard and Parker, 1987;Alazaiza et al., 2019;Mao et al., 2020) and in both drainage and imbibition direction to identify the relevance of capillary hysteresis typically present in natural sediments (e.g., Pasha et al., 2014;Sookhak Lari et al., 2016).

Reflective optical imaging and image processing
The IPA framework presented in Engelmann et al. (2019b) represents a straightforward approach to convert images acquired through optical imaging in the VIS spectrum to DNAPL saturation distributions, which serve as observation data for calibrating models. By nature, optical imaging is limited by background exclusion, i.e., specific parts of the scattered color spectrum are to be excluded to distinguish between background (e.g., grains or water phase) and the DNAPL phase. The definition of lower and upper thresholds in the grayscale spectrum leads to missed DNAPL saturations especially in the smaller value range, whereby entrapped DNAPL areas are visible in raw images. Here, camera sensors without UV and IR radiation filter may provide more insight into the color spectrum of raw images. Fluorescent dyes in combination with dyes visible in the VIS spectrum may be beneficial to better visualize interfacial areas between the wetting and non-wetting phases when using hydrophilic and hydrophobic tracers. Further details of the saturation distribution may be acquired by using dual-dyed setups, i.e., when both fluid phases (DNAPL and water) are dyed with different colors.
Reflective optical imaging represents a straightforward technique for easy visualization of two-dimensional phase distributions. However, DNAPL migration occurs in all three spatial directions and is, therefore, potentially missed. Here, taking raw images from both tank sides with identical illumination and camera settings would be beneficial to compare differences to better estimate the error given by the assumption of quasi-two-dimensional conditions within the tank layout. Besides light spectrum methods, geophysical techniques such as computer tomography or electrical resistivity tomography may complement optical imaging, with more insight into the porous media structure to quantify the effects of three-dimensional migration (esp. the relevance of DNAPL flow normal to the photo image) and to validate the quasi-twodimensional tank setup approach. Independent from the aforementioned potential improvements of indirect phase visualization, optical imaging should be supported by direct physical sensors such as pressure transducers or time-domain reflectometry sensors. Having other sensor types would be beneficial for calibrating the DNAPL saturation distributions calculated by the IPA framework against point-like physical information from the system.
In this study, a linear relationship between the gray scale spectrum and the DNAPL saturation between two endpoints of lowest and largest S n values was assumed. However, these relationships may be non-linear (Wang et al., 2019). For example, in multi-colored porous media, the relationship could be influenced by colored grains that have a similar reflective property as the dyed DNAPL surrogate. This may lead to an underestimation of the maximum DNAPL saturation. A more robust and sophisticated method for experimentally determining these relationships would lead to more accurate DNAPL saturation distributions. This would substantially improve observation data quality and, hence, support the numerical model evaluations.

Numerical modeling used for predicting source zone formation
After calibration against the three porous media types used, the numerical model was capable of simulating the Darcy-scale DNAPL migration. Relevant source zone formation features such as release event, post-release entry pressure, downward migration aside from tank boundaries and pooling on the aquitard were adequately represented by the model output. Interestingly, the agreement between observation data and modeling results are similar for both the glass beads and natural sand scenario, with discrepancies occurring mainly in the smaller DNAPL saturation spectrum and during later migration times. In contrast, the calibration of the model for the filtering glass was not successful. Strong pore-scale effects (vertical fingering, preferential flow paths) led to migration patterns of large variation and a high degree of randomness depending on the trial number. Also, though physical properties of filtering glass are different compared to natural sand, the migration velocity is nearly similar for both porous media of equal fraction size. A further widening of the parameter space may lead to parameter sets with better model fitness; however, such random smallscale flow path variations may not be representable with the model configuration simulating physically stable migration regimes. Results of this study give indications that filtering glass may not represent a beneficial, cost-efficient substitute for glass beads.
The experimental scenarios and modeling results revealed that the transient DNAPL source zone formation process largely depends on both porous media and fluid properties. While leaving all system properties, including fluid properties, constant throughout all experimental and modeling scenarios, changes of the parameters α nw and n nw that parametrize the relationships of capillary pressure-saturation and relative permeability-saturation have major influence on DNAPL migration. This stays in contrast to findings of a previous study, where sensitivities for retention curve parameters were found to be lower than for other system parameters such as porosity or permeability (Zheng et al., 2015). The calibrated retention curve parameters have a similar magnitude as those given in a number of other studies (e.g., Erning et al., 2012;Zheng et al., 2015;Bortoni et al., 2019), so that these values appear to be physically plausible. As the characterization of fluids (e.g., density, viscosity, interfacial tension) is generally easier than for porous media having different sources of ambiguity (e.g., pore-scale heterogeneity, layered structures, capillary hysteresis), careful attention should therefore be placed on adequate sediment characterization. A comprehensive parameter sensitivity study could quantify the relevance of other factors such as the DNAPL release boundary condition for source zone formation, but this was not of focus in this study. In addition, testing other types of retention curve functions (e.g., Brooks and Corey, 1964;Stone, 1970;Faust, 1985) for the model could provide further insights in the validity of the theoretical understanding of this two-phase flow system. However, the fundamental problem remains, i.e., that parameters have to be defined through calibration.
As model calibration success strongly depends on the adequate definition of the multi-criteria objective function, which consisted of integral and simple geometrical criteria in this study, we see large potential in the further improvement of the objective function. Especially in light of misleading integral measures that cannot describe geometrical matches between observation data and modeling results, alternative shape characterization techniques such as pseudo-landmark analysis may be considered (e.g., Lele and Richtsmeier, 2001). With such statistical approaches, very complex geometrical shapes can be quantitatively described with mathematical criteria, improving the comparability of source zones resulting from observation data and numerical model output.
Generally, model fitness was significantly lowered with increasing migration time, with trends consistently shown in most fitness criteria. We assume that the model accuracy decreases once capillary forces gain preferential influence over gravitational forces, whereby the latter are easier to be implemented in the governing equations implemented in the model and require less parameterization efforts. Here, the migration driven by capillary forces is weakly reproduced by the model. Reasons could be physical effects such as capillary hysteresis (Pasha et al., 2014;Sookhak Lari et al., 2016), which are not yet considered in the model configuration. Also, fluid or dye adsorption may have strong influence on observation data, especially for natural porous media with organic carbon content and potential to variable pH values of the surrounding aqueous phase. In addition, three-dimensional migration may occur especially for natural porous media which are subject to heterogeneous sediment compaction. Here, a three-dimensional model configuration may be capable of better representing irregular migration patterns, yet may largely suffer from parameter non-uniqueness due to elevated data (geophysical imaging required) and parameter uncertainties (e.g., determination of three-dimensional permeability tensor).
With the capability of the adequately calibrated numerical model reproducing longer DNAPL migration times and capturing most source zone formation processes, a next step could be to transfer the model configurations to the field scale. As field site information are scarce with respect to source zone geometries (e.g., Engelmann et al., 2019a), first it would be beneficial to identify exploration demands with as much detail as possible by performing conceptual parameter sensitivity studies under consideration of known field site characteristics. Here, dimensional analyses may support the proper conceptualization of numerical models (e.g., Nield and Bejan, 2017).

Conclusions
In this study, a quasi-two-dimensional, laboratory-scale tank setup was used for generating experimental observation data of DNAPL phase saturation. Three different types of homogeneous porous media, having a bandwidth between artificial and natural character, were used to form the basis for evaluating the relevance of porous media properties, with special focus on the parameterization of the κ r (S) and P c (S) retention curves characterizing DNAPL migration. Three repetitions were run for each porous media type to judge experimental feasibility and to quantify experimental bias. The existing IPA framework after Engelmann et al. (2019b) was adapted to visualize more plausible S n distributions from grayscale images. Its application led to physically plausible observation data that could serve as basis for model calibration. A numerical reference model was defined using relevant experiment information and implemented using the multiphase flow model TMVOC (Pruess and Battistelli, 2002). Ranges of the parameters α nw and n nw were specified and model realizations were run on a HPC infrastructure to calibrate the numerical setup for the three porous media scenarios individually based on trial means for S n . This classical Monte-Carlo calibration strategy led to a close model fit of numerical to laboratory data and remaining deviations were used to identify shortcomings of the IPA framework and the modeling implementation, as well as to delineate respective uncertainties.
Overall, the methodologies presented in this study allow for the consistent visualization of DNAPL migration in porous media and are transferable to similar systems with large flexibility. However, future assessments involving alternative retention curve functions could bear insights to judge on the validity of the theoretical understanding of twoand three-phase flow systems. Model calibration strategies should be supported by sound measurements of parameters relevant for retention curve definition. Main findings also point towards that more research is required to describe DNAPL source zone geometry through both experimental evaluations (i.e., tank experiments) and scenario simulations (i.e., numerical models), before site assessment will become capable of yielding robust projections for DNAPL contamination.

Supplementary materials
The following Electronic Supplementary Material was prepared to support methodologies and findings presented herein: Electronic Supplementary Material S1: "Raw images" Electronic Supplementary Material S2: "Processed images" Electronic Supplementary Material S3: "IPA fitness" Electronic Supplementary Material S4: "Partial objective functions" Electronic Supplementary Material S5: "Model verification".

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.