Nation-wide estimation of groundwater redox conditions and nitrate concentrations through machine learning

The protection of water resources and development of mitigation strategies require large-scale information on water pollution such as nitrate. Machine learning techniques like random forest (RF) have proven their worth for estimating groundwater quality based on spatial environmental predictors. We investigate the potential of RF and quantile random forest (QRF) to estimate redox conditions and nitrate concentration in groundwater (1 km × 1 km resolution) using the European Water Framework Directive groundwater monitoring network as well as spatial environmental information available throughout Germany. The RF model for nitrate achieves a good predictive performance with an R2 of 0.52. Dominant predictors are the redox conditions in the groundwater body, hydrogeological units and the percentage of arable land. An uncertainty assessment using QRF shows rather large uncertainties with a mean prediction interval (MPI) of 53.0 mg l−1. This study represents the first nation-wide data-driven assessment of the spatial distribution of groundwater nitrate concentrations for Germany.


Introduction
The increasing use of groundwater as the world's most important drinking water resource underlines the importance of protecting it from pollution (Foster and Chilton 2003). In Germany and many other countries, the quality of groundwater is affected by elevated nitrate (NO 3 ) concentrations attributed predominantly to diffuse nitrogen losses in agricultural systems (Galloway et al 2008, EEA 2018. Nitrate pollution of water systems has far-reaching environmental consequences and poses a serious risk to human health (van Grinsven et al 2006, Erisman et al 2013, Reis et al 2016. The EU Water Framework Directive 2000/60/EC (WFD) was introduced for integrated water protection, aimed at maintaining or restoring the 'good status' of groundwater bodies. With regard to nitrate pollution, this requires measures to reduce nitrogen inputs to meet the WFD quality standard of <50 mg l −1 NO 3 .
The EU Member States are obliged to submit a regular inventory of the groundwater status. The latest report shows that 25% of groundwater bodies in the EU have a poor chemical status, 18% primarily due to nitrate pollution (EEA 2018). The responsible authorities need to develop interdisciplinary, integrative and holistic river basin management plans for the implementation of the WFD, which requires comprehensive mapping of the pollution of water systems and identification of areas under significant pressure (Kallis andButler 2001, Voulvoulis et al 2017).
Estimating the nitrate concentration in groundwater depends on several factors and processes. Land use patterns and the amount of the nitrogen surplus on agricultural land provide indications of input sources. In addition, the hydrogeological and biogeochemical conditions of the aquifer determine the occurrence of nitrate in the water body. Complex natural attenuation processes occur during the vertical displacement of nitrate in the unsaturated zone and the lateral transport in the aquifer in the direction of groundwater flow (Rivett et al 2008). Denitrification, as the most significant process of nitrate removal in the groundwater body, depends strongly on the prevailing redox conditions (Korom 1992, Rivett et al 2008. Denitrification occurs under anaerobic conditions if denitrifying bacteria and electron donors are present (Rivett et al 2008). Such anaerobic conditions are characterised by an oxygen (O 2 ) concentration of ⩽1-2 mg l −1 and iron (Fe) concentration of ⩾0.1-0.2 mg l −1 (Kunkel et al 2004, Rivett et al 2008, Mcmahon and Chapelle 2008. Several studies dealt with the large-scale estimation of redox conditions in groundwater (Tesoriero et al 2015, Close et al 2016, Rosecrans et al 2017, Koch et al 2019 and Ransom et al (2017) identified it as one of the most important parameters for the estimation of nitrate concentrations.
Data-driven statistical approaches have proven their worth in simulating large-scale groundwater quality. Nationwide groundwater nitrate concentration maps were developed with regression models for China (Gu et al 2013) and for the United States (Nolan et al 2002, Nolan andHitt 2006). Machine learning (ML) techniques are receiving more and more attention in the field of water research. Khalil et al (2005) showed the strong predictive capabilities in groundwater contamination modelling of different algorithms, e.g. artificial neural networks or support vector machines. As another effective tool, random forest (RF) developed by Breiman (2001) is frequently applied in water resources issues (Tyralis et al 2019). RF is a very powerful tool for modelling and evaluating large complex data sets with nonlinear relationships between both numerical and categorical explanatory variables. It is further characterised by its robustness and increased predictive performance (Tyralis et al 2019). Comparisons of different machine learning techniques in terms of prediction performance have shown that tree-based models such as RF outperform others (Wang et al 2016, Knoll et al 2019. Such tree-based ML techniques have already been used in several studies on large-scale prediction of groundwater nitrate concentrations (Nolan et al 2014, Rodriguez-Galiano et al 2014, 2018, Wheeler et al 2015, Ransom et al 2017, Tesoriero et al 2017, Knoll et al 2019, Ouedraogo et al 2019. In terms of the estimation of groundwater redox conditions, Close et al (2016) and Wilson et al (2018) performed linear discriminant analyses. Whereas, other studies also made use of ML for predicting groundwater redox conditions or the redox interface (Tesoriero et al 2015, Rosecrans et al 2017, Koch et al 2019, Friedel et al 2020. Due to the complex processes influencing groundwater quality, estimations are accompanied by uncertainties that need to be quantified (Refsgaard et al 2007, Grizzetti et al 2015. As one of the few, Rahmati et al (2019) investigated the uncertainties in machine learning approaches for estimating groundwater nitrate concentrations and showed that it is important to consider both the performance and the uncertainty for model evaluation. Ransom et al (2017) used bootstrapping for a boosted regression tree model and Koch et al (2019) extended RF with geostatistics to assess uncertainties. By modifying RF to quantile random forest (QRF), Meinshausen (2006) provided a method to consider a full conditional distribution for the predictions instead of the mean only. This enables not only the additional output of median predictions, but also to estimate uncertainties by determining prediction intervals. Even though this method is not yet common in water research, it has already been used in digital soil mapping (Vaysse andLagacherie 2017, Szatmári andPásztor 2019).
Here, we provide for the first time a data-driven estimation of groundwater nitrate concentrations for the federal state of Germany, which is characterized by a large share of agricultural land use of around 50% of its area (360 000 km 2 ). The objectives of our study are to develop a RF model combined with QRF to: (1) characterise the spatial distribution of redox conditions in groundwater, (2) estimate groundwater nitrate concentrations considering the redox conditions, and (3) determine prediction intervals as an uncertainty assessment.

Materials and methods
The general modelling process involves four steps, i.e. setting up a database, data pre-processing, predictive modelling and uncertainty analysis with RF combined with QRF (figure 1). The predictive modelling as well as the uncertainty analysis is performed with R (v. 3.4.1). All spatial data are processed in the Geographical Information System ArcGIS (v. 10.4).

Database
According to WFD, the chemical status of groundwater is evaluated based on a representative monitoring network on which the EU Member States are required to report regularly. The WFD monitoring includes a surveillance network for the long-term and overall description of the groundwater status and an additional operational network. The monitoring sites selected in the WFD monitoring network represent the qualitative status of the respective groundwater body (EC 2003). As the implementation of the WFD in Germany is the responsibility of the federal states, the design of the monitoring networks varies, which is reflected in particular in the numbers and distribution of the individual monitoring sites. In this study, we use a consolidated data set of groundwater concentration measurements of all WFD monitoring networks of the federal states (hereafter referred as WFD data set), which was provided by the responsible federal authorities. We considered a number of preconditions for the selection of the monitoring sites out of the WFD data set: • Metadata and information on the sampling depth need to be available.  • Monitoring sites with strong variations in the annual means are excluded (standard error of the mean (SEM) in the time series is >1 mg l −1 for O 2 and Fe and >10 mg l −1 for NO 3 ) to provide a representative concentration with robust means for the sampling period. • Outliers in the O 2, Fe and NO 3 concentrations (>99% percentile) are removed from the data set.
Information on data acquisition and preparation is further described in text S1 of the supplementary material, available at stacks.iop.org/ERL/15/064004/mmedia. The groundwater quality data is statistically summarised in table S1 and figure S1, the spatial distribution of the monitoring sites is mapped in figure S2.
The groundwater concentration measurements (point data) for NO 3 (n = 5414), O 2 (n = 5837) and Fe (n = 5628) are used as the dependent variables. The redox condition classification is derived from O 2 and Fe concentrations and is taken into account in the nitrate model as predictor. To simulate groundwater nitrate concentrations, a set of 21 relevant spatial environmental predictors available throughout Germany is pre-selected according to expert knowledge, including 7 predictors on land use and management, 12 predictors on hydrogeology and hydrochemistry, and 2 predictors on soil conditions (table S2). A description of the data and information on data preparation can be found in text S2 and figures S3-S6.

Pre-processing 2.2.1. Characterisation of redox conditions
The characterisation of the redox conditions is based on a four-point classification scheme according to LAWA (2018). O 2 concentration is the predominant determinant for the occurrence of denitrification, with concentrations <2 mg l −1 indicating anaerobic conditions and the occurrence of denitrification (Rivett et al 2008). In this case, two points are assigned to the monitoring site. In the range of 2-5 mg l −1 O 2 , there is only an average probability for the occurrence of denitrification, and one point is assigned. At O 2 concentrations >5 mg l −1 there are strong aerobic conditions, the probability of denitrification is low and no points are assigned to the monitoring site. Since denitrification preferably occurs under the presence of Fe concentrations ⩾0.2 mg l −1 (Kunkel et al 2004(Kunkel et al , 2017, we added an extra point. This results in four redox classes: (3) strongly anaerobic, (2) anaerobic, (1) intermediate and (0) aerobic (table S3).

Buffer analysis
The point data (O 2 , Fe and NO 3 concentrations) are linked with the spatial predictors by means of contributing areas. The contributing area represents the potential catchment area of a monitoring site where the measured groundwater quality can be influenced by environmental factors. Since information on largescale groundwater flow conditions is limited, we used a simplified procedure to determine contributing areas for each monitoring site, i.e. a circular buffer of 1000 m radius results in best model performances (Knoll et al 2019). The spatial predictors are compiled within this buffer zone and are linked to the monitoring site. The spatial predictors are assigned to the individual cells of the grid map for Germany (358 171 grid cells) in the same way. The data pre-processing is further described in text S3.

Feature selection
First, a correlation matrix was used to test for strong correlations (pearson correlation coefficient r > 0.75) between the numerical predictors. This was the case for the leachate rate and the groundwater recharge rate, whereupon we removed the leachate rate from the data set. With the Boruta R package (Kursa and Rudnicki 2010), an all-relevant feature selection algorithm, a final set of relevant predictors was selected, where rock type of the aquifer was removed as well (table S2). Predictor importance plots derived from the Boruta method can be found in figure S7. After feature selection, 11 predictors remain for the O 2 data set, 6 for the Fe data set, and 19 for the NO 3 data set.

Predictive modelling
For RF model training and evaluation, the data is first split into a training (80%) and test (20%) data set. In order to ensure an approximately equal distribution of the response variable in both data sets, stratified random sampling is applied (Kuhn and Johnson 2013). For spatial prediction, the final model is trained with all data due to the relatively low density of monitoring sites. Details about the RF modelling approach is given in text S4. While RF considers only the conditional mean of a response prediction, QRF keeps all observations in this node and assesses the conditional distribution (Meinshausen 2006). This continuous distribution function describes the range and variation of the response variable around the predicted mean and thus median values or an estimate for the uncertainty of the prediction can be obtained. We use the R package 'caret' v. 6.0-82 (Kuhn 2018) for predictive modelling, in which the R package 'randomForest' (Liaw and Wiener 2002) (method = 'rf ') and 'quantregForest' (Meinshausen 2017) (method = 'qrf ') is implemented. In 'caret' , we define the number of trees in the model (ntree = 1000) and three times repeated 10-fold crossvalidation (method = 'repeatedcv' , number = 10, repeats = 3) as resampling method in the model training function.
The evaluation of the model performance is based on three objective functions: the root mean squared error (RMSE), coefficient of determination (R 2 ) and mean absolute error (MAE). RMSE and MAE give an expression for the average model prediction error in the unit of the model response, where R 2 is a common measure indicating the variance in the prediction. We determine model performance for the training data set and validate it with the independent test data set. Because RF trees grow to a maximum size, it likely overfits the data in each tree, which results in essentially low prediction errors on the training data. Using cross-validation in the model training process avoids overfitting. Therefore the performance for the finally used model based on all data is determined through cross-validation. The relative predictor importance for each model is evaluated during the model training process by cross-validation based on the training data set with a normalised measure for importance from 0% to 100%, where the most important predictor is set to 100% (Kuhn 2018).
After model training and setup of the grid map data set (1 km × 1 km) for Germany, the models are used to estimate the groundwater concentrations for each grid cell according to the assigned predictors. Before applying the model for NO 3 to the grid, the redox conditions must be determined for each grid cell according to the classification scheme based on the predicted O 2 and Fe concentrations (table S3).

Uncertainty analysis
To evaluate the reliability of the estimates, a prediction interval (PI) can be derived to determine the model uncertainty. The p-quantile of a distribution corresponds to the value not exceeded by the values of the response variable with a probability p. Based on the p-quantiles, the PI is defined as the range between the lower (PL lower ) and upper prediction limit (PL upper ). A 90%-PI indicates a range from the 5%-quantile to the 95%-quantile , in which the true value is expected with a high probability (p = 0.9). The mean prediction interval (MPI) and the prediction interval coverage probability (PICP) are suggested as statistical measures to evaluate the uncertainty of QRF Solomatine 2006, Rahmati et al 2019). The MPI expresses the average width of all PIs, with lower values indicating lower uncertainties (equation (1)). With the PICP as a validation measure of the model uncertainty, the proportion of observed values (y i )  within the estimated PI is calculated for a given confidence level p (equation (2)) (Dogulu et al 2015). Ideally, PICP corresponds approximately to the confidence interval p, otherwise the uncertainty is overestimated (PICP > p) or underestimated (PICP < p).

Spatial distribution of groundwater redox conditions
In the pre-processing step, four classes characterising the prevailing redox conditions were determined based on the measured O 2 and Fe concentrations. figure 2 shows a distinct influence of redox conditions on groundwater nitrate concentrations with a significant decrease with increasing anaerobic conditions. However, very high nitrate concentrations can still be seen in the anaerobic classes (2) and (3), despite potentially favourable conditions for denitrification. In these cases, either the denitrification potential is exceeded or the denitrification capacity is exhausted due to depleted electron donors (Green et al 2008, Liao et al 2012, Wilde et al 2017. In order to map the redox conditions, the spatial distribution of O 2 and Fe groundwater concentrations must first be estimated. RF models for both parameters are trained with the pre-selected predictors. The importance of the predictors used in the RF model for O 2 (RF O2 ) and Fe (RF Fe ) are ranked and depicted in figure 3. For the RF O2 model, the hydrogeological units and the groundwater recharge rate are most relevant. In the RF Fe model, the hydrogeological units dominate followed by field capacity. Tesoriero et al (2017Tesoriero et al ( , 2015 also concluded that geology and parameters related to groundwater travel times, including groundwater recharge rate, are significant predictors for estimating aerobic conditions.  The results of the predictive performances of the RF models are summarised in table 1. The RF O2 model shows a very good predictive performance for the training data set. The RF O2 model based on all data after cross-validation still achieves a good predictive performance (RMSE = 2.49 mg l −1 , R 2 = 0.53, MAE = 1.86 mg l −1 ) similar to that of the test data set. The RF Fe model also shows very good predictive performance for the training data set. However, for the independent test data, the RF Fe model only has a reasonable predictive performance as has the RF Fe model based on all data after cross-validation (RMSE = 3.00 mg l −1 , R 2 = 0.26, MAE = 1.66 mg l −1 ). Similar studies, e.g. Tesoriero et al (2017) and Rosecrans et al (2017) can hardly be compared with our results, as they predict the probabilities of exceeding certain thresholds. Predictions versus observations for groundwater O 2 and Fe concentrations of the test data are shown in figure 4. Despite its lower predictive performance compared to oxygen, iron provides an important additional contribution to the classification of the redox conditions.
The predicted grid maps of the groundwater concentrations of the redox parameters O 2 and Fe can be found in figures S8 and S9. For each map, the QRF model is used to determine the upper and the lower prediction limit (PL upper = 95%-quantile, PL lower = 5%-quantile) to derive the PI (figure S8 and S9). Density plots of the predicted redox parameters can be found in figure S11. The MPI of 7.4 mg l −1 for the RF O2 model and 6.0 mg l −1 for the RF Fe model indicate a very high uncertainty in the model prediction. The PICP values of 0.89 for RF O2 and 0.90 for RF Fe correspond to p (0.90), thus indicating that the uncertainties of the predictions are correctly assessed with QRF.
According to the classification scheme in table S3, the spatial distribution of the redox condition is determined with the estimated O 2 and Fe concentrations in groundwater. Due to the low model performance of the RF Fe model and its rather large uncertainties this could lead to misclassifications, especially in line with the low Fe concentration threshold for anaerobic conditions. As Fe concentrations show a strongly right skewed distribution, we use the median in favour of the average to describe the expected response. The resulting map of the derived redox conditions is shown in figure 5. In the northern part of Germany as well as in some lowlands in the central and southern part predominantly strongly anaerobic to intermediate redox conditions occur. The southern part of Germany is dominated by intermediate to mostly aerobic redox conditions. This distribution indicates a higher denitrification potential in the unconsolidated aquifers of the North German Plain and local lowlands than in the consolidated units of central and southern Germany.

Spatial distribution of groundwater nitrate concentration
The RF model training for the NO 3 model (RF NO3 ) is based on a more comprehensive set of predictors compared to RF O2 and RF Fe (table S2). In addition to available spatial predictors, information on redox conditions derived from monitoring data of O 2 and Fe concentration (figure 5) is used. The variable importance of the predictors used in the RF NO3 model is ranked in figure 3. The redox condition is by far the most relevant predictor, followed by the hydrogeological units and the percentage of arable land. The high relevance of the hydrogeological units and the percentage of arable land has also been found in Knoll et al (2019). Ransom et al (2017) used the redox parameters manganese and oxygen for nitrate prediction, which in that case turned out to be the most relevant predictors. Contrary to our expectations, the groundwater residence time within the RF NO3 model is less relevant. Ransom et al (2017) added the groundwater age as a predictor and significantly improved the predictive performance of their groundwater nitrate model. However, the rather coarse resolution of groundwater age might be the reason for the low impact in our study.
The RF NO3 model results in a good predictive performance for the training data set (table 1). For the independent test data set, the model still provides good predictive performance as well as for the model based on all data after crossvalidation (RMSE = 20.12 mg l −1 , R 2 = 0.52, MAE = 12.71 mg l -1 ). Predictions versus observations for groundwater nitrate concentrations of the test data are shown in figure 4. The RF NO3 model tends to underestimate high concentrations (>50 mg l −1 ) and has problems in the prediction of extreme values >100 mg l −1 . However, the performance of the model can be regarded as good, especially with respect to the large scale of the application and the relatively low density of monitoring sites. Other studies on large-scale prediction of groundwater nitrate concentrations result in comparable model performances , Wheeler et al 2015, Ransom et al 2017, Knoll et al 2019. The predicted grid map of the NO 3 concentration in groundwater is shown in figure 6, a corresponding density plot can be found in figure S11. As shown before, results of the RF Fe and the RF O2 models contain partly large uncertainties. This could lead to misclassifications of the redox classes used for the NO 3 prediction. Since the redox condition is the most relevant predictor, we analysed its spatial sensitivity. In short, we added a random error of 5%-30% to all classes in figure 5, see details in text S5 and figure S12. The mean predicted NO 3 concentration varies more with an increase in misclassification. But even an assumed misclassification as large as 30% does not lead to a general questioning of our results. The predicted mean NO 3 concentration across Germany of 22.7 mg l −1 ranges within 21.0 to 24.9 mg l −1 under this assumption.
The QRF model is also used to determine the upper and the lower prediction limits for the RF NO3 model. The derived 90%-PI is shown in figure 6. Based on the test data, the MPI of 53.0 mg l −1 for the RF NO3 model indicates a very high uncertainty in the model prediction. The PICP values for the RF NO3 model of 0.91 corresponds to p (0.90), indicating correctly assessed prediction uncertainties with QRF. Other studies on prediction of groundwater nitrate concentration determined uncertainties in a similar range (Ransom et al 2017, Rahmati et al 2019. Koch et al (2019) pointed out that the uncertainty can be significantly reduced with a more comprehensive data set. In line with this, we also tested a data set with additional monitoring sites (n = 13 038 for NO 3 ) operated by several federal states in Germany, which resulted in a reduced, but still high uncertainty (MPI = 41.9 mg l −1 ). Anyway, as these sites are not part of the WFD monitoring program, they were not further considered in this study.

Discussion
The high relevance of the redox conditions in the estimation of nitrate concentrations underlines the importance of a precise estimation of the hydrogeochemical conditions in the aquifers. Since hydrogeological units ( figure S4) provide a decisive predictor for the estimation of the redox parameters, it is obvious that they are also of high relevance for the estimation of groundwater nitrate concentrations. Nevertheless, the anthropogenic impact on nitrate loads can only be represented properly if the inputs of nitrogen to the groundwater are correct. Thus, knowledge about percentage of agricultural land, the nitrogen surplus and the seepage rate are essential. Although natural turnover processes in the groundwater body are the most important determinant with regard to the groundwater pollution, the starting point for mitigation measures is to reduce the input, especially with regard to a finite reduction potential in the groundwater body.
It is apparent that the redox conditions are roughly divided into two regions (figure 5). In the North German Plain and in some lowlands, anaerobic to strongly anaerobic conditions and thus preferably denitrifying conditions predominate. Conversely, aerobic to intermediate redox conditions tend to dominate in the southern and central German mountain regions. Similar findings are described in Kunkel et al (2004) who conclude that aquifers with reduced nitrate degrading capacity occur in the North German Plain, while non-nitrate-degrading conditions prevail in aquifers of consolidated units. Hannappel et al (2018) also found that the denitrification potential in quaternary and tertiary unconsolidated rocks is clearly higher than in the consolidated units.
The north-eastern part of Germany shows predominantly unpolluted areas in glacially formed Late Pleistocene units. These largely confined aquifers are characterised by low flow velocities, long groundwater residence times and strongly anaerobic conditions due to thick covering till layers and are therefore well protected against contamination (Wendland et al 2008, Merz et al 2009. Merz et al (2009) also describe aerobic areas in groundwater recharge regions with high NO 3 concentrations, which only in places are identified by the predictions of the RF NO3 model. However, predictions are accompanied by rather high uncertainties, so that higher concentrations are at least within the confidence limits of the model. In the north-western part of Germany, high NO 3 concentrations occur in the Geest regions formed in the Middle Pleistocene, whereas in the lowlands the groundwater is almost nitrate-free, which is in line with Wriedt et al (2019). The redox conditions tend to be less anaerobic to intermediate in the Geest areas in contrast to the lowlands where strong anaerobic conditions prevail, which was also reported by Eschenbach et al (2018). Conversely, lowlands with sand and gravel deposits in the middle and southern part of Germany often show nitrate-polluted groundwater, due to a lack of cover layer, weakly anaerobic to intermediate redox conditions and intensive agricultural land use (Grimm-Strele et al 2008, Wendland et al 2008, Kuhr et al 2013, Knoll et al 2019. For the pore aquifers of the northern Alpine foothills, the RF NO3 model predicts large areas of nitrate-polluted groundwater, despite generally well-developed cover layers and the associated high protective function of the aquifers in this region. The tendency to overestimate groundwater nitrate concentration in this region can be due to the very low density of monitoring sites. The uncertainties in this area are also high. In the consolidated units of central and southern Germany, fractured and karst aquifers are generally more vulnerable to contamination and predominantly indicate higher groundwater nitrate concentrations, which is also in line with a lower nitrate reduction potential due to aerobic to intermediate redox conditions (Hannappel et al 2018). In these consolidated regions, the high nitrate concentrations in the groundwater correspond to areas with high nitrogen surpluses caused by intensively agriculturally used depressions. Only the mountain regions with extensive forest cover usually show no groundwater nitrate pollution. Based on the described approach, about 10% of the area of Germany exceed the threshold of 50 mg l −1 NO 3 .
In general, the uncertainties are large (MPI = 53.0 mg l -1 ), especially in the regions where nitrate concentrations are high or where strong variations between very high and low concentrations exist. Ransom et al (2017) also determined the highest uncertainties in the nitrate-polluted areas.
A possible source for prediction errors and high uncertainties is linked to the available predictors. A more precise assessment of groundwater residence times could lead to better results and lower uncertainty. However, there is no nation-wide detailed database on the depth to the groundwater surface and flow conditions or the thickness of cover layers, which would be important indicators for residence times and the protective function of the aquifer. In addition, the integration of further predictors, such as a detailed description of the geochemical character of the aquifer or detailed information on input sources (that are currently not available on large scales) could improve the model performance. Uncertainties and prediction errors may also arise from the underlying dependent variable. Investigations of the spatial distribution of redox conditions in aquifers have shown a decrease in aerobic conditions with depth (Close et al 2016, Rosecrans et al 2017. Due to this hydrogeochemical zoning of aquifers, the concentrations of the groundwater samples strongly depend on the horizon in which the wells are screened and generally the NO 3 concentrations decrease with increasing well depth (Wheeler et al 2015, Ransom et al 2017. Tesoriero et al (2017) also showed a higher probability for high nitrate concentrations near the groundwater surface. Wriedt et al (2019) reported a decrease in nitrate concentrations with depth, but they concluded that a vertical gradation of depth horizons cannot adequately describe the hydrogeochemical zoning of the aquifers. Long screened sections can also lead to mixing of groundwater from the aerobic and anaerobic zone during sampling. Adjacent measuring points, which supply samples from different horizons, thus lead to variations in the data set, likely causing high prediction uncertainties.
Finally, it should be pointed out that the results of the regionalized groundwater nitrate concentration, and thus the identified areas with NO 3 > 50 mg l −1 (10%), presented in this study cannot be compared one-to-one with studies focusing on other research questions. The classification of the chemical status of the groundwater in Germany according to the WFD showed that even 27.1% of the groundwater bodies did not meet the WFD quality standard for nitrate (UBA. 2017). It should be taken into account that the WFD-assessment is based on the precautionary principle, i.e. the results consider extreme concentrations which are potentially harmful. Accordingly a groundwater body (average size around 316 km 2 ) is entirely classified as 'poor status' if even one of the monitoring sites or a defined proportion of the area within the groundwater body does not meet the quality standard (GrwV 2010). Thus, a much higher proportion of the area is designated as nitratepolluted compared to our analysis, where the spatial assessment is based on a much higher resolution and mean values excluding outliers. In the authors' view, which methodology most adequately characterises groundwater quality (with regard to nitrate) depends on the scientific context and the research objective.

Conclusion
This study shows the potential of machine learning applications such as RF and QRF in the field of water research on a national scale. We presented the first Germany-wide assessment of groundwater redox conditions and nitrate concentrations with a resolution of 1 km × 1 km using a uniform data-driven approach based on spatial environmental predictors. In addition to a large influence of redox conditions, hydrogeological units and the percentage of arable land on nitrate estimation, high prediction uncertainties were determined. A more detailed and comprehensive database stratified according to well depths would likely enhance the prediction of NO 3 concentrations in groundwater. Adaptations of national groundwater monitoring networks with extensive monitoring in different depth horizons would be a major step forward, not only for Germany. However, the results of this study can contribute to further research on nation-wide scales, e.g. the calculation of national nitrogen targets or quantifying nitrogen flows from groundwater into surface waters and for the identification of vulnerable regions in which detailed investigations regarding the implementation of mitigation measures are necessary.

Acknowledgments
The project underlying this publication was coordinated and funded by the German Environment Agency in the framework of the environment research plan of the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (Project No. 3715 22 2200). We are very grateful to German Working Group on water issues of the Federal States and the Federal Government-Committee for Groundwater and Water supply (LAWA-AG) and the Water Management Authorities of the federal states for providing the groundwater data. We also declare that no conflicts of interest exist in the submission of this manuscript.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.