Estimating Characteristic Grain Sizes and Effective Porosity from Hydraulic Conductivity Data

In groundwater applications, knowledge about a range of hydraulically relevant parameters is beneficial to counteract data scarcity and enhance parameter robustness. The present study provides a novel procedure to reverse engineer representative grain diameters and effective porosity from values of hydraulic conductivity. A large data set comprising more than 500 sieve curves and more than 1000 values of hydraulic conductivity was analyzed for interparameter dependencies between characteristic grain sizes, hydraulic conductivity and effective porosity. Based on that analysis, a mathematical model consisting of a set of empirical and physically based equations was developed. The procedure was applied on another large data set and its accuracy, deviations and error propagation were quantified. The method works best for well‐sorted sediments. Practical examples are given in order to demonstrate the range of applicability of the procedure. The input is interchangeable, such that different input data enable calculating characteristic grain sizes, hydraulic conductivity and effective porosity. A free, open‐source stand‐alone GUI of the procedure is available for download.


Introduction
The knowledge of hydraulically relevant parameters of aquifer material is essential for calculating groundwater flow and transport. Such parameters, for example, hydraulic conductivity K and effective porosity n e (which, unlike total porosity, is the fraction of void space relevant have been widely used and published, the original full granulometric data are often not included. When, for example, the Hazen (1892) equation is used, mostly only the value of d 10 is included in the data set. This however, prevents the later application of other methods, which use different characteristic grain sizes, for example, d 20 for the USBR method (Creager et al. 1945). Several studies have investigated and compared the different granulometric methods and showed some overlap in results but also some differences (e.g., Odong 2007;Song et al. 2009;Říha et al. 2018). The only physically based equation relating a characteristic grain size with hydraulic conductivity and effective porosity is the Kozeny-Carman equation (Kozeny 1927;Carman 1956), which can be derived from the Navier-Stokes equations for flow in a bundle of capillaries of equal length (Bear 1988). It is widely used and its use is sometimes preferred in literature over the empirical grain size curve methods (Carrier III 2003). The Kozeny-Carman equation requires the knowledge of a mean grain size diameter (commonly expressed as d 50 [L]) and effective porosity n e (−) as follows (Bear 1988): (1 − n e ) 2 · d 2

50
(2) where μ (M L −1 T −1 ) is the dynamic fluid viscosity, ρ (ML −3 ) is the fluid density, and g (L T −2 ) is the gravitational acceleration. The objective of this study is to expand the range of application of the granulometric methods described above, by identifying and utilizing connecting links between functional parameters. Using the newly found interdependencies allows using other methods and thus obtaining a bandwidth of values of hydraulic conductivity, which in turn could yield a more reliable mean value. A bandwidth of hydraulic conductivity values is useful, for example, to determine more realistic limits for the calibration of numerical groundwater models.
Additional data from the grain size distribution may also be beneficial for the calculation of transport processes. For example, typical values for longitudinal dispersivity at the pore-grain scale can be obtained from the mean grain diameter (Spitz and Moreno 1996).
The scarcity of complete data sets, that is, with the full results of the sieve curve analysis and not just d 10 , is a general problem. In the present study, we present a new procedure for the calculation of characteristic grain size diameters and effective porosity from hydraulic conductivity data alone. The resulting "virtual" characteristic grain sizes (and porosity) can then be used for further calculations using other equations and models. Therefore, we analyzed a large data set consisting of more than 500 representative grain size curves and more than 1000 calculated hydraulic conductivities. Based on this data set, we derive a set of empirically and physically based equations to obtain a mathematical model for the calculation of the unknown representative grain diameters d 10,20,50,60 and effective porosity. We further demonstrate the validation of the newly proposed procedure, quantify its deviations and comment on potential errors and its application range. In the supplementary material, we provide a Python-based script of the procedure with exemplary input files. We further provide a GUI of the procedure for free download. Finally, we provide a large merged data set containing measured representative grain diameter and corresponding hydraulic conductivities.

Data Sets
The data sets used in this study comprise (1) the data set published by Rosas et al. (2014) and (2) an own data set, called BGR data set here. Both contain not only grain size distribution-based hydraulic conductivity data, but also a range of measured representative grain diameters. The latter were used to verify the proposed procedure. The data set by Rosas et al. (2014) includes 431 data samples, each containing grain size distribution data in the form of characteristic grain diameters d 10,20,50,60 and calculated hydraulic conductivities stemming from a variety of methods (using, e.g., the Hazen (1892), Beyer and Schweiger (1969) and Kozeny-Carman [Kozeny 1927;Carman 1956] methods). These samples come from a multitude of depositional environments (dune, beach, shallow offshore, alluvial sediments) from different locations worldwide. From the 431 data points, 259 satisfy all necessary preconditions for the proposed procedure (1 < Cu < 2.5, 0.06 mm < d 10 < 0.6 mm, where C u = d 60 /d 10 [−] is the coefficient of uniformity) and are used here. In total 777 values of hydraulic conductivity (259 values based on the Hazen 1892, 259 values based on the Beyer and Schweiger 1969 and 259 values based on the Kozeny-Carman;Kozeny 1927;Carman 1956 methods) are used.
The second data set consists of samples from different projects of the Federal Institute for Geosciences and Natural Resources (BGR), Germany (Table 1; Tewolde et al. 2019;Houben et al. 2020;Post et al. 2022;BGR 2022). It includes dune and beach sediments from the island of Langeoog, Germany, as well as fluvialaeloian sediments from Namibia and fluviolacustrinelacustrine sediments from Chad and comprises in total 273 samples containing the representative grain diameters d 10,20,50,60 and 150 Beyer and Schweiger (1969)based and 96 Hazen (1892)-based hydraulic conductivities that satisfy the preconditions for use of the proposed procedure. In total, 236 values of hydraulic conductivities are used from this data set. The BGR data set is provided as supplementary material named BGR_data_set.csv to the present manuscript. Some general information about the BGR data set is summarized in Table 1.
The Rosas et al. (2014) data set and the BGR data set were merged and the merged data set contains in total 1023 values for characteristic grain diameters  d 10,20,50,60 and hydraulic conductivities. That data set is provided in the electronic supplementary material named Merged_data_set.xlsx . After randomly redistributing, half of the data set (referred to as calibration data set) was used for the calibration of the mathematical model of the proposed procedure and the other half was used for validation (referred to as validation data set). Note that the grain sizes from the Rosas et al. (2014) data set were obtained from classical sieve analysis (as described in, e.g., Tanner and Balsillie 1995), while the BGR data set was obtained from an optical analysis. For the latter, the particle sizes were determined on about 10 g of dried and disaggregated sample material utilizing a CAMSIZER (Retsch Technology GmbH, Haan Germany 2011), which continuously images a falling curtain of grains and measures the grain size of individual grains in each image using two digital cameras (CCD). As with sieve analysis, the fractions <63 μm and >2 mm were removed prior to analysis. Also note that, depending on the individual measurement devices and the experience of operators, representative grain diameters may have different levels of accuracy which may lead to discrepancies. This effect may contribute to the spreading of data points in the data set used for the derivation of the proposed procedure.

Calibration
For the calibration of the proposed procedure, the Nelder and Mead (1965) simplex algorithm, which is preprogrammed in the Python environment (van Rossum 1995), was used (minimize function as described in, e.g., Johansson 2015) and applied on the calibration data set. This downhill algorithm is widely and successfully used in numerous engineering and hydrogeological applications (e.g., Clarke 1973;Barati 2011;Peche et al. 2019;Feng et al. 2020). In its present application, the objective function describes deviations from model results and measured values and the parameter space consists of a range of values of the dimensionless numbers and coefficients. A visualization of the concept of the Nelder and Mead (1965) simplex algorithm is given in the Appendix in Figure 1 (SI1).

Quality Control
For the evaluation of the proposed procedure, the dimensionless Nash and Sutcliffe (1970) efficiency coefficient (NSE) is calculated for each procedure step as follows: where y i = y 1 . . . y n is the data set array (measured data) and f i = f 1 . . . f n is the newly predicted data array obtained from the proposed procedure. Here, values of NSE are qualitatively evaluated by using the NSEclassification after Moriasi et al. (2015).

Python Script and Stand-Alone GUI
The proposed procedure was incorporated into a Python script (van Rossum 1995), named HYPAGS.py (acronym for HYdraulic Properties And Grain Sizes), provided as electronic supplementary material. An exemplary input file containing all values of hydraulic conductivity from the Rosas et al. (2014) and the BGR data set named input.csv is also included as electronic supplementary material. A flowchart of the script is given in Figure 2. The python script is programmed so that representative grain sizes and effective porosity are calculated from values of hydraulic conductivity.
A freeware stand-alone GUI for Windows named HYPAGS.exe was developed in order to easily apply the procedure in practice. The GUI enables to calculate (1) d 10 , d 20 , d 50 , d 60 , and n e , based on a single input of K, (2) K, d 20 , d 50 , d 60 , and n e , based on a single input of d 10 , and (3) K, d 10 , d 50 , d 60 , and n e , based on a single input of d 20 . The GUI (and the corresponding Python source code) is provided for download: https://github.com/APeche/ HYPAGS/releases/tag/Version1.2.2. Screenshots of the GUI are given in Figures 1 and 7 and in the Appendix in Figures 5 and 6 (SI1).

Mathematical Model
The newly developed procedure derives the characteristic grain diameters d 10,20,50,60 (and thus obtaining a virtual sieve curve) only from given hydraulic conductivity values that were calculated with any of the following methods: Hazen (1892), Beyer and Schweiger (1969), and Kozeny-Carman (Kozeny 1927;Carman 1956). It further enables to estimate the effective porosity n e . The procedure uses calibrated dimensionless numbers ( ) and coefficients (c, a). In a first step, the procedure uses the given hydraulic conductivity K L T −1 to calculate the characteristic grain diameter d 10 (L), based on a dimensionless number, detailed below. Subsequently, d 10 is used to calculate d 20 (L) based on an empirical equation, which utilizes an assumed linear dependency (similar to Alyamani and Ş en 1993) between both, obtained from the data set. In a next step, d 20 and K are used to iteratively solve for d 50 (L) and the effective porosity n e (−), using the Kozeny-Carman equation. Lastly, d 50 is used to calculate d 60 (L) based on an empirical equation, which again utilizes an assumed linear dependency between both grain sizes, obtained from our data set. A flowchart of the procedure and the corresponding mathematical model for each procedure step are given in Figure 2.
Following Wang et al. (2017), the Buckinghamtheorem (Buckingham 1914) was used to derive an equation for the calculation of d 10 based on K and a dimensionless number. The underlying assumption is the general relation between K and d 10 (Equation 1) as expressed in following proportionality (e.g., Hazen 1892; where κ (L 2 ) is the intrinsic permeability, and d x (L) may be any representative grain diameter. Using a dimensionless number , that proportionality becomes Defining d x = d 10 and reformulation of Equation 5 yields the equation for the calculation of d 10 as follows: The value of the dimensionless coefficient depends on the method used to calculate K. Values for and corresponding methods are given in Table 2. Values for individual methods will be referred to as parameterizations in the following (e.g., Hazen-based parameterization).
The proposed procedure approximates grain size distributions to be linear between d 10 and d 20 as well as Subsequently, the linear relationship between d 10 and d 20 reads as follows: where c 1 (−) is a calibrated coefficient, which, however, depends on the method used to calculate K. They were calibrated using the data set and are given in Table 2. The linear relationship between d 50 and d 60 reads with a calibrated coefficient c 2 (−), again depending on the method used to calculate K (  = a 3 · d 20 . Values for the a-coefficients were calibrated as previously described and applied on the calibration data set. Values obtained are a 1 = 1.004, a 2 = 1.510 × 10 −4 , and a 3 = 5.788 × 10 −3 . Equations 6, 8-10, 11 are used in the procedure (as visualized for each procedural step in Figure 2). Note that the equations are interchangeable such that any parameter may be calculated with any input, for example, d 20 enables to calculate K, n e , d 10,50,60 . However, this does not apply if only values for d 50 or d 60 , and n e are initially given, since the iterative procedure requires d 20 and K for the calculation of the starting values and no equation of the procedure relates d 20 and d 50 .

Range of Validity
The procedure is only valid for a certain type of data input. It conceptualizes sieve curves as cumulative S-shaped frequency curves (in Figure 3). The procedure is not valid for curves that are bi-or multimodal, for Table 3 Preconditions of the proposed procedure

Method Used to Calculate K Preconditions
Hazen C u ≤ 2.5 Beyer-Schweiger 1 < C u ≤ 2.5 6 × 10 −5 m < d 10 < 6 × 10 −4 m Kozeny-Carman Silts, sands, gravely sands, C u ≤ 2.5 example, poorly sorted sediments. The procedure is valid for grain size distributions representing relatively uniform sediments with a small bandwidth between d 10 and d 60 (as expressed by C u ). It was found that the procedure works best for C u ≤ 2.5, potentially up to C u ≤ 3. Furthermore, the validity of the proposed procedure depends on the preconditions of the method used to calculate the model input (in form of values of hydraulic conductivity). Those preconditions are summarized in Table 3  The procedure is valid for input within the upper and lower bounds of the data set used to develop the mathematical model (Table 4).

Application of the Proposed Procedure
The procedure is applied on the validation data set in order to validate the proposed procedure. Values used for gravitational acceleration and fluid properties are g = 9.81 m/s 2 , ρ = 999.7 kg/m 3 , and μ = 0.001306 kg/(ms), respectively. Since the actual field-groundwater temperatures are unknown, these values correspond here to an assumed groundwater temperature of 10 • C, typical, for example, central Europe and parts of North America. Values for the BGR data set were all calculated based on this temperature, while values for the Rosas et al. (2014) data set are based on an assumed groundwater temperature of 25 • C. Actual individual groundwater temperatures of the data sets are mostly unknown and their consideration would improve the procedure results. This dependency has been analyzed with a sensitivity study of HYPAGS based on the same input but with different groundwater temperatures of (1) 10 • C and (2) Table 5 Quality appraisal of the proposed procedure   respective data set, the grain sizes, although given, are ignored in this step. However, the calculated results for d 10 , d 20 , d 50 , and d 60 are then compared to the measured values from the data set. Values for calculated effective porosity could not be directly compared to measured data since they are not part of the BGR data set and they are only indirectly given for the Rosas et al. (2014) data set, as published in Wang et al. (2017). However, a qualitative comparison with results from Wang et al. (2017) was possible and is shown in the following. Results in the form of NSE values for all parameters calculated with the proposed procedure are summarized in Table 5. With NSE ≥ 0.6, the results can be classified as "good" and "very good." Merely the calculation of d 60 using Hazen-based hydraulic conductivities can be classified as "satisfactory." Validation results in form of measured versus calculated values are visualized in

Analysis of Deviations and Error Propagation
This section quantifies and illustrates the deviations inherent in the method and subsequent error propagation resulting from each calculation step of the procedure. The resulting deviation between values of any characteristic grain diameter from the proposed procedure d x,Procedure (L) and measured values from the data set d x,Measured (L) was calculated as follows: where d x (%) is the relative deviation for any characteristic grain diameter considered in the proposed procedure. Due to the lack of measured data, this could not be done for the effective porosity. Boxplots of the deviation d x are given in Figure 6. The proposed procedure recreates measured values with an overall median deviation lower than 30%. Individually, the median deviations for d 10 , d 20 , d 50 and d 60 yield approximately 13%, 11%, 19%, and 26%, respectively. Corresponding upper quantiles are around 18%, 16%, 29%, and 40%. For future use of the procedure, the obtained median, quantile or maximum deviations may be used as error range.

Demonstration of Applicability-Practical Examples
In order to demonstrate the applicability of the proposed procedure and the mathematical model, in total seven practical examples are presented (where three examples are given in the Appendix). Comparisons of procedure results with literature results are given whenever the latter are available. Since groundwater temperatures are unknown, all examples were calculated based on an assumed groundwater temperature of 10 • C.

Example 1
This example demonstrates the calculation of d 60 based on a given value for d 10 = 9.0 × 10 −5 m, representing sand ridge material from Smerdon et al. (2005). Using the Kozeny-Carman-based parameterization, input of d 10 into Equation 8 yields d 20 = 1.1 × 10 −4 m. Reformulation of Equation 6 for K and calculation using d 10 yields  Table 5. K = 6.3 × 10 −5 m/s. Using d 20 and the corresponding K for the calculation of the starting values for the iterative scheme and its application (Equations 10 and 11) yields d 50 = 1.8 × 10 −4 m and n e = 0.28. Finally, using Equation 9 yields d 60 = 2 × 10 −4 m. Comparison with the measured value from Smerdon et al. (2005) with d 60 = 2 × 10 −4 m shows an excellent correlation. Furthermore, the comparison of the calculated hydraulic conductivity (K = 6.3 × 10 −5 m/s) with the measured hydraulic conductivity K = 1.3 × 10 −5 m/s by Smerdon et al. (2005) (obtained using a slug test after Hvorslev 1951) yields the same order of magnitude. Note that C u = 2.2.

Example 2
This example demonstrates the calculation of d 50 based on a given value for d 20 = 1.89 × 10 −4 m, representing medium sand from Odong (2007). In this case, the Beyer-Schweiger-based parameterization is used. Reformulation of Equation 8 enables to calculate d 10 = 1.26 × 10 −4 m (cf. value from Odong (2007) Odong (2007): n e = 0.44) and d 60 = 2.95 × 10 −4 m. With a C u = 1.8, this example shows the validity of the proposed procedure for steep grain size curves with small C u . An application of the HYPAGS GUI using the present example is demonstrated in Figure 7.

Example 3
This example demonstrates the application of the proposed procedure in order to calculate the effective porosity based on a given value of hydraulic conductivity of K = 4.2 × 10 −6 m/s, from Langguth and Voigt (1980), representing a silty sediment. Using the Kozeny-Carmanbased parameterization, application of Equation 6 yields d 10 = 2.3 × 10 −5 m (cf. value from Langguth and Voigt (1980): d 10 = 2.2 × 10 −5 m). An application of Equation 8 yields d 20 = 2.8 × 10 −5 m and subsequent iterative approximation by means of Equations 10 and 11 gives n e = 0.25, which agrees well with the value from Langguth and Voigt (1980) of n e = 0.28. Further results from the proposed procedure are d 50 = 5.7 × 10 −5 m (cf. value from Langguth and Voigt (1980): d 50 = 5.0 × 10 −5 m) and d 60 = 6.5 × 10 −5 m. Note that this application is slightly out of the optimum validity range of the procedure with C u = 2.8. An application of the HYPAGS GUI using the present example is demonstrated in Figure  3 in the Appendix (SI1).

Example 4
This example demonstrates the application of the proposed procedure for an example with a large C u , beyond the procedure's limits. This example thus does not fulfill the preconditions of the proposed procedure. The sediment represents a nonuniform middle sand with C u = 7.5 (Wagner et al. 2014). In this example, d 60 is calculated based on d 10 = 2.5 × 10 −4 m using the Beyer-Schweiger-based parameterization. Application of Equation 6 reformulated for K and (8)

Summary and Discussion
The present study presents a new procedure to reverse engineer characteristic diameters and effective porosity from hydraulic conductivity data alone. Key developments, findings and contributions are: • feasible, easy-to-apply procedure for the calculation of characteristic grain diameters and effective porosity solely based on hydraulic conductivity data, • prediction of grain diameters according to NSE can be classified as very good to good (and in one case satisfactory), provided that C u ≤ 2.5, potentially C u ≤ 3 , Figure 7. Screenshot of the HYPAGS GUI with results for practical Example 2.
• deviation increases with each calculation step of the procedure, but the median deviation does not exceed 30%, • a stand-alone GUI for Windows named HYPAGS.exe and a Python-based script with an exemplary input file of the data (electronic supplementary material), • a data set including representative grain diameters and calculated hydraulic conductivities (electronic supplementary material), • calculation steps of the procedure can be interchanged, such that any parameter considered in the procedure may be calculated with any single input. In that case, the authors recommend to use the parameterization given for the calculation based on hydraulic conductivites calculated with the Kozeny-Carman equation, which is generally recommended in literature (Carrier III 2003), since the Kozeny-Carman equation is physically based.
The proposed procedure was derived and tested using a data set consisting of more than 1000 samples, representing sediments from all over the world. However, its application range is restricted to wellsorted sediments with relatively small coefficients of uniformity (C u ≤ 2.5 to 3). However, the authors recommend judicious testing before using the procedure for cases with C u > 2.5. The procedure should not be applied in poorly sorted sediments. However, using the mathematical model of the proposed procedure, a new parameterization could be derived specifically for such cases.
The proposed procedure may be applied on values for hydraulic conductivities based on other methods than grain size analyses (e.g., Darcy-experiments, well tests). However, in that case it has to be assumed that these methods yield similar values of hydraulic conductivities. In certain documented cases, it is known that values of hydraulic conductivities stemming from different methods of determination may vary greatly (Cheng and Chen 2007), even by orders of magnitude (Pucko and Verbovšek 2015).
It has to be mentioned that the NSE for quality control may be sensitive to extreme values, and thus even poor models may yield in a NSE that may falsely be classified as having a good fit (Legates and McCabe Jr. 1999). Its use was chosen on a rather pragmatic base because (unlike other quality control metrics) NSE allows a qualitative classification (Moriasi et al. 2015). It is assumed that the error due to use of the NSE is rather small because the practical examples used here represent a large bandwidth of values.