How to perform global sensitivity analysis of a catchment-scale, distributed pesticide transfer model? Application to the PESHMELBA model

Abstract. Pesticide transfers in agricultural catchments are responsible for diffuse but major risks to water quality. Spatialized pesticide transfer models are useful tools to assess the impact of landscape structure on water quality. Before such tools can be used in operational contexts, quantifying their uncertainties is a necessary preliminary step. In this study, we explored how global sensitivity analysis can be applied to the recent PESHMELBA pesticide transfer model to quantify uncertainties on transfer simulations. We set up a virtual catchment based on a real one and compared different approaches for sensitivity analysis that can handle the specificities of the model: a high number of input parameters, a limited sample size due to computational cost, and spatialized output. We compared Sobol' indices obtained from Polynomial Chaos Expansion, HSIC dependence measures and feature importance measures obtained from a Random Forest surrogate model. Results showed the consistency of the different methods and highlighted the relevance of Sobol' indices to capture interactions between parameters. Sensitivity indices were first computed for each landscape element (site sensitivity indices). Second, we proposed to aggregate them at the hillslope and catchment scales in order to obtain a summary of the model sensitivity and a valuable insight into the model's hydrodynamical behaviour. The methodology proposed in this paper may be extended to other modular and distributed hydrological models, for which interest has been growing in recent years.


Introduction
Pesticide transfers from fields to water bodies are a major but also complex environmental concern. Significant efforts are required to assess the resulting risks for aquatic and human life. However, this task is made difficult by the complexity of the processes at stake and by the diversity and fragmentation of the agricultural landscapes where pesticides are applied (Campbell et al., 2004). Pesticide transfer models are essential tools to support risk management as they make it possible to simulate contamination and transfers.
They also make it possible to explore and compare alternative scenarios. However, there are many challenges in developing and using such models, highlighted for example in Gascuel-Odoux et al. (2009). In particular, to support decision-making, it is important to use physically based models that are simple enough to ensure flexibility (Reichenberger et al., 2007; Dosskey et al., 2011). A large range of models exists at a local scale (Adriaanse, 1997; Muñoz-Carpena et al., 1999; Carsel and Baldwin, ...). Among the approaches considered in this study is the Random Forest (RF) surrogate model: its structure provides valuable information on feature importance that can be processed into sensitivity indices, as in Harper et al. (2011) and Aulia et al. (2019) (see Antoniadis et al. (2021) for a review of the use of random forests for sensitivity analysis).
Last but not least, pesticide transfer models are often fully spatialized, meaning that the area of interest is divided into spatial units on which the equations are solved locally. This specificity should be carefully addressed to make the GSA step as informative and relevant as possible. Sensitivity can be examined at a local scale, on each spatial unit, which can be computationally expensive. A second point of view consists in providing the user with synthetic measures that summarize the sensitivity over the whole spatial domain. Saint-Geours (2012) introduces the notion of site sensitivity indices to refer to local GSA and the notion of aggregated sensitivity indices (sometimes called block sensitivity indices) to refer to sensitivity indices computed with respect to all spatial points. Site sensitivity indices result in sensitivity maps that detail the spatial contributions of influential parameters (Herman et al., 2013; Abily et al., 2016). Aggregated sensitivity indices are built either from a scalar objective function derived from the spatialized output (such as the sum, average or maximum value), as explored in Saint-Geours et al. (2014), or from a GSA performed directly on the spatialized, multidimensional output. In the latter case, both the extension of the GSA method to multidimensional output and the interpretation of the results must be examined. For instance, the generalization of Sobol' indices to multidimensional output from Gamboa et al. (2013) is used in De Lozzo and Marrel (2016) to perform GSA on a spatialized radioactive material release model.
In this paper, we explore how the sensitivity of risk-assessment models can be relevantly tackled, based on the example of the process-oriented, modular PESHMELBA model (PESticide and Hydrology: ModELling at the catchment scale). We perform a GSA of the PESHMELBA model and aim at identifying relevant tools and a feasible methodology that could be transposed to other complex, distributed models. The spatialized aspect of the sensitivity analysis is particularly investigated. A broader scientific purpose is to determine how the information obtained from GSA can be used to better understand the processes that drive the transfer and fate of pesticides in the PESHMELBA model. The analysis is performed on a virtual scenario based on a real catchment in the Beaujolais region (France).
The paper is organized as follows: we describe the PESHMELBA model in Section 2.1 and the model setup in Section 2.2.
Then, we introduce the different GSA methods used and the case study in Section 2.3, and the methodology used for landscape analysis in Section 2.4. Input sampling is described in Section 2.5. Results are presented in Section 3, focusing successively on screening (Section 3.1), the comparison of GSA methods (Section 3.2) and the spatial analysis (Section 3.3).

The PESHMELBA model
The PESHMELBA model represents a catchment as a set of interconnected components that stand for landscape elements such as plots, Vegetative Filter Strips (VFSs), ditches, hedges or rivers (Rouzies et al., 2019). In order to respect the spatial organization and the heterogeneity of the landscape, it deals with mesh elements that can be surfaces or lines. Surface mesh elements are called Homogeneous Units (HUs). A HU is a portion of landscape that is homogeneous in terms of hydrodynamical processes and agricultural practices. Linear mesh elements are called reaches. A reach is characterized by its nature (so far ditch, river or hedge) and by its neighbouring components: it is in contact with at most one elementary mesh element on each bank. In addition to its geometric or hydrodynamic properties, each mesh element is characterized by its one-way connections with the neighbouring components that stand at a lower altitude. One or several processes are represented on each element depending on its nature. Lateral transfers at the surface and in the subsurface between elements are also integrated. Independent codes called units are used to simulate the different processes, depending on the user's knowledge of the targeted catchment's functioning. Then, the OpenPALM coupler (Fouilloux and Piacentini, 1999; Buis et al., 2006) is used to couple the units and to build the complete application. OpenPALM has features adapted to easily deal with the spatial and temporal aspects of the model.
For example, synchronization tools make it possible to couple processes with different time steps. The final structure is highly modular, and process representations can easily be added, upgraded or removed depending on the landscape description. These features make PESHMELBA particularly suitable for scenario exploration.
PESHMELBA focuses on surface and subsurface transfers of water and pesticides. An extensive description of the elements and processes already included can be found in Rouzies et al. (2019). The PESHMELBA version used in this study integrates a representation of water and pesticide transfers on plots, VFSs and rivers. Each plot or VFS is represented by a unique soil column divided into vertical cells. In such a column, vertical infiltration is simulated using a solution of the 1D Richards equation proposed by Ross (Ross, 2003). An adapted set of parameters makes it possible to represent the high infiltration rate, surface runoff reduction and enhanced adsorption and degradation on VFSs. Root-water uptake is integrated based on Varado et al. (2006). Surface runoff routing is represented based on the kinematic wave (Lighthill and Whitham, 1955), and the Darcy law (Darcy, 1857) is used for lateral subsurface transfers. In addition to shallow groundwater tables, PESHMELBA also represents shallow perched water tables and the associated lateral transfers. Finally, reactive transfer of solutes is represented: advection, degradation based on a first-order law, and adsorption based on linear or Freundlich isotherms are integrated. Each river or ditch reach is represented by a unique tank. The River1D module (Branger et al., 2010) solves the kinematic wave equation for water routing and pesticide advection in the network. Groundwater-river exchanges are represented by the Miles formulation adapted by Dehotin et al. (2008).

Model setup
A virtual scenario of limited size was set up from a portion of La Morcille catchment (France) in order to explore different GSA methods and to ease the interpretation of spatialized results. The chosen portion was selected so as to remain representative of the global composition of La Morcille catchment in terms of soil, slope, type and size of elements, as well as interface length between them. The chosen scenario was composed of 10 vineyard plots, 4 vegetative filter strips and 6 river reaches that delimit a left and a right slope (see Figure 1). Soil types on the catchment are mainly sandy (Peyrard et al., 2016). We used the classification from Frésard (2010) that groups soil types into 3 main Soil Units (SUs). Each SU is defined by the vertical succession of 3 or 4 soil layers, also called soil horizons: one surface horizon, 1 or 2 intermediary horizons and one deep horizon, as depicted in Figure 2. Note that interface depths can vary from one SU to another. The reader may refer to Rouzies et al. (2019) for further details on how soil types were characterized. The spatial arrangement was set in order to be as realistic as possible in terms of possible interfaces between SUs. Each SU was set on at least one vineyard plot and one VFS in the virtual scenario. Bulk density and organic carbon content data were available from Van den Bogaert (2011) and Randriambololohasinirina (2012). Hydrodynamic parameters for each soil type are described in Table 1. Retention values measured by Van den Bogaert (2011) were used to fit the retention curve using the SWRCfit tool (Seki, 2007). A Schaap-Van Genuchten conductivity curve was used (Schaap and van Genuchten, 2006; Ross, 2006), whose matching point at saturation Ko and empirical pore-connectivity parameter L were derived from conductivity data and from the retention parameters obtained from the retention curve fitting. Surface organic carbon content was set equal to that of the first soil horizon on plots and VFSs. For each SU, only the first soil horizon on VFSs differs from that on vineyard plots, so as to reflect enhanced infiltration capacities. The surface horizon on VFSs was characterized by a 2.8 % organic carbon content (Randriambololohasinirina, 2012) and a saturated hydraulic conductivity of 150 mm.h^-1 (i.e. 4.31x10^-5 m.s^-1) following Catalogne et al. (2018). In the absence of data to characterize a potential anisotropy between the vertical and horizontal saturated conductivities Ks_v and Ks_h, isotropy was assumed, thus the ratio Ks_h/Ks_v was set to 1 over the catchment. The pesticide chosen in this study is tebuconazole, as it is a fungicide widely used on La Morcille catchment. It is a slightly mobile molecule and we used a Freundlich isotherm to describe its adsorption equilibrium. Adsorption parameters were obtained from Lewis et al. (2016) (Koc = 769 mL.g^-1, Freundlich isotherm exponent = 0.84). The surface degradation coefficient was also taken from Lewis et al. (2016) (DT50 = 47.1 days) and a degradation rate decreasing with depth was set as in FOCUS (2001). A 500 g application at the beginning of the simulation was considered on plots 2 and 7 (see Figure 1).
Most of the transformation and adsorption of tebuconazole was assumed to occur on plots and VFSs at this modelling scale. Therefore, no adsorption or degradation was simulated in the river.
A no-flux boundary condition was applied on all sides except at the surface, where rain and potential evapotranspiration were considered. Rain data were extracted from the BDOH database (Gouy et al., 2015). A 3-month simulation was performed over a winter period characterized by long and intense rain events (670 mm cumulated). Potential evapotranspiration (PET) data were obtained from MeteoFrance for the neighbouring site of Liergues (MeteoFrance, 2008). Data were averaged over 10-day periods and corrected by -11 % to match the La Morcille site (Durand, 2014; Caisson, 2019). Rain and PET data for the simulation are summarized in Figure 3. Although the scenario is virtual, we aimed at setting initial conditions as plausible as possible. Initial water table levels were deduced from piezometric data on a neighbouring hillslope. Data from several piezometers were available along a transect perpendicular to the river. These data were extrapolated over the virtual hillslope width on both sides of the river. All soil columns were assumed to be in hydrostatic equilibrium at the beginning of the simulation. An upstream flow of 0.177 m^3.s^-1 was considered in the river based on local measurements (Gouy et al., 2015).
Two types of vegetation were represented in this scenario. A vineyard cover was considered on plots while permanent grassland was simulated on VFSs. Considering the period of simulation (3 months), a fixed root depth (Zr = 2.62 m) and a fixed root density in the first 10 % of the root depth (F10 = 37 %) were considered for vineyards, following values reported in Smart et al. (2006) and confirmed by expert knowledge of the area. The root depth (Zr) was set to 0.9 m and the root density in the first 10 % of the root depth (F10) to 33.5 % for grassland (Brown et al., 2007). For vineyards, the Leaf Area Index (LAI) was assumed to increase from leaf formation until a maximum value before declining until harvest. Dates and associated values for this development cycle were taken from Brown et al. (2007). They are summarized in Appendix A1, together with the constant LAI value set for grassland on VFSs, also based on Brown et al. (2007). The remaining parameters of the root extraction model were fixed to the nominal values proposed by Varado et al. (2006) and Li et al. (2001). Manning coefficients were set from data reported in Arcement and Schneider (1989). A mature row crop value (0.033 s.m^-1/3) was chosen for vineyards while a high grass pasture value (0.2 s.m^-1/3) was set for the VFS cover.
Finally, the ponding height was set to 0.01 m on vineyard plots while an increased value was set on VFSs (0.05 m). Following Gao et al. (2004) and Walter et al. (2007), the surface mixing layer thickness was set to 0.01 m on both plot and VFS domains.
In the river, the distance between the river bed and the limit of the impervious saturated zone (di) was set to 1.5 m (ERT field measure, personal communication) and the saturated hydraulic conductivity (Ks) was set to 2.38x10^-5 m.s^-1, in accordance with local saturated conductivities in the neighbouring soil.

Global Sensitivity Analysis methods
We explored several Global Sensitivity Analysis methods in order to identify the most suitable approach to deal with the specificities of the targeted application: the heterogeneous and coupled structure of the model, highly non-linear processes and the spatialized aspect. In what follows, we denote by \(Y \in \mathbb{R}\) a given model output. \(Y\) is a function of a multivariate input random vector \(X = (X_1, \dots, X_M)\): \(Y = \mathcal{M}(X)\), where \(\mathcal{M}\) is the computational model that maps \(X\) onto the output space.
The following three sections detail the mathematical theory underlying each method investigated to compute sensitivity indices for the PESHMELBA model: Sobol' indices based on variance decomposition using generalized polynomial chaos expansion, HSIC dependence measures, and feature importance measures obtained from the Random Forest metamodelling technique.
For each method, the extension to a multidimensional output \(Y \in \mathbb{R}^d\) is also investigated. Readers in a hurry may jump directly to Section 2.4, which provides a qualitative summary and the technical details of the implementation for this case study.

Variance decomposition

Definition
Variance-based methods aim at determining how input factors contribute to the output variance (Faivre et al., 2013). One of the most popular variance-based methods is the Sobol' method (Sobol, 1993). It is based on the decomposition of the output variable \(Y\) into summands of increasing dimension:

\[ \mathcal{M}(\mathbf{x}) = m_0 + \sum_{i=1}^{M} m_i(x_i) + \sum_{1 \le i < j \le M} m_{ij}(x_i, x_j) + \dots + m_{1,\dots,M}(x_1, \dots, x_M) \tag{1} \]

where \(1 \le i_1 < \dots < i_s \le M\) and where the summands \(m_{i_1,\dots,i_s}(x_{i_1}, \dots, x_{i_s})\) are defined so that their integral over any of their variables is zero. This property implies that the summands are orthogonal, which makes the decomposition unique. It is referred to as the ANOVA (ANalysis Of VAriance) decomposition (Archer et al., 1997). If \(Y = \mathcal{M}(X)\) is square integrable, all summands \(m_{i_1,\dots,i_s}\) are also square integrable, and one can square and integrate Eq. (1), leading to:

\[ \operatorname{Var}[Y] = \sum_{s=1}^{M} \; \sum_{1 \le i_1 < \dots < i_s \le M} \operatorname{Var}\big[ m_{i_1,\dots,i_s}(X_{i_1}, \dots, X_{i_s}) \big] \tag{2} \]

If one introduces the notion of partial variances \(V_{i_1,\dots,i_s}\) so that:

\[ V_{i_1,\dots,i_s} = \operatorname{Var}\big[ m_{i_1,\dots,i_s}(X_{i_1}, \dots, X_{i_s}) \big] \tag{3} \]

then Eq. (2) reads:

\[ \operatorname{Var}[Y] = \sum_{i=1}^{M} V_i + \sum_{1 \le i < j \le M} V_{ij} + \dots + V_{1,\dots,M} \tag{4} \]

In such a formulation, \(V_{i_1,\dots,i_s}\) (with \(s > 1\)) indicates the portion of variance that can be attributed to interactions between the input parameters \(X_i\), \(i \in \{i_1, \dots, i_s\}\), while the single-index variance \(V_i\) indicates the portion of variance that can be attributed to the main effect of \(X_i\) taken alone. From the above, one can define the Sobol' indices as:

\[ S_{i_1,\dots,i_s} = \frac{V_{i_1,\dots,i_s}}{\operatorname{Var}[Y]} \tag{5} \]

By definition, \(0 \le S_{i_1,\dots,i_s} \le 1\). In particular, first-order sensitivity indices \(S_i\) only account for main effects. These indices are usually calculated as a first step as they often account for a large portion of the variance (Faivre et al., 2013). Total sensitivity indices \(S_{T_i}\) evaluate the total effect of an input factor \(X_i\) on the output by taking into account its main effect \(S_i\) and all the interaction terms that involve it:

\[ S_{T_i} = \sum_{\substack{u \subseteq \{1,\dots,M\} \\ u \ni i}} S_u \tag{6} \]
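As a minimal illustration of these definitions, first-order Sobol' indices can be estimated by Monte Carlo with a pick-freeze scheme. The sketch below uses a hypothetical toy function with independent standard-normal inputs (the model, sample size and estimator choice are illustrative assumptions, unrelated to PESHMELBA):

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # hypothetical toy model: Y = X1 + 2*X2, so Var[Y] = 1 + 4 = 5,
    # with analytical first-order indices S1 = 0.2 and S2 = 0.8
    return x[:, 0] + 2.0 * x[:, 1]

N, M = 100_000, 2
A = rng.normal(size=(N, M))   # first Monte Carlo sample
B = rng.normal(size=(N, M))   # second, independent Monte Carlo sample

def first_order_sobol(i):
    # pick-freeze: keep column i from A, redraw all other columns from B
    C = B.copy()
    C[:, i] = A[:, i]
    yA, yC = model(A), model(C)
    # covariance between the two runs estimates V_i = Var[m_i(X_i)]
    return np.mean(yA * yC - np.mean(yA) * np.mean(yC)) / np.var(yA)

S1, S2 = first_order_sobol(0), first_order_sobol(1)
```

For this additive toy model the estimates converge to the analytical values S1 = 0.2 and S2 = 0.8; the same scheme applies unchanged to non-additive models, where the first-order indices then no longer sum to one.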

Sobol' indices computation using generalized polynomial chaos expansion
In the original paper, Sobol (1993) uses Monte Carlo simulations to compute these sensitivity indices. The method is based on two Monte Carlo samples of size N, so that the estimation of the sensitivity indices requires 2N evaluations of the model. The Polynomial Chaos Expansion (PCE) metamodelling technique, based on homogeneous chaos theory (Wiener, 1938), provides an alternative. It builds a functional approximation of the computational model by projecting the model output onto a suitable basis of stochastic polynomial functions of the random inputs (Ghanem and Spanos, 1991). For any square integrable output random variable \(Y = \mathcal{M}(X)\), \(Y \in \mathbb{R}\), its polynomial chaos expansion is expressed as:

\[ Y = \sum_{\boldsymbol{\alpha} \in \mathbb{N}^M} \gamma_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(X) \tag{7} \]

where the \(\Psi_{\boldsymbol{\alpha}}\)'s are the multivariate orthonormal polynomials that constitute the basis and the \(\gamma_{\boldsymbol{\alpha}}\)'s the associated coordinates. Multivariate polynomials are built as tensor products of univariate polynomials: defining the multi-index \(\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_M)\) with \(|\boldsymbol{\alpha}| = \sum_{i=1}^{M} \alpha_i\), \(\Psi_{\boldsymbol{\alpha}}\) is expressed as:

\[ \Psi_{\boldsymbol{\alpha}}(\mathbf{x}) = \prod_{i=1}^{M} \Psi_{\alpha_i}(x_i) \tag{8} \]

where \(\Psi_{\alpha_i}(x_i)\) is the univariate polynomial of degree \(\alpha_i\) from the orthonormal family associated with variable \(x_i\). Univariate polynomials are chosen among classical families of polynomials, depending on the input factor distributions (Xiu and Karniadakis, 2002). The expansion in Eq. (7) is usually truncated to a finite sum for practical computation:

\[ Y \simeq \sum_{\boldsymbol{\alpha} \in \mathcal{A}} \gamma_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(X) \tag{9} \]

where \(\mathcal{A} \subset \mathbb{N}^M\) is the subset of selected multi-indices of multivariate polynomials. Different truncation schemes exist, the most obvious being the standard truncation scheme that only keeps multivariate polynomials of total degree less than or equal to \(p\):

\[ \mathcal{A}^{M,p} = \{ \boldsymbol{\alpha} \in \mathbb{N}^M : |\boldsymbol{\alpha}| \le p \} \tag{10} \]

The truncated PCE decomposition can be rearranged into summands of increasing order. Such a formulation is based on the sets \(\mathcal{A}_{i_1,\dots,i_s}\) that contain all tuples \(\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_M)\) such that only the indices \(\alpha_{i_1}, \dots, \alpha_{i_s}\) are nonzero. As detailed in Sudret (2008), the truncated expansion in Eq. (9) can then be written as:

\[ Y \simeq \gamma_{\mathbf{0}} + \sum_{s=1}^{M} \; \sum_{1 \le i_1 < \dots < i_s \le M} \; \sum_{\boldsymbol{\alpha} \in \mathcal{A}_{i_1,\dots,i_s}} \gamma_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(X) \tag{11} \]

By unicity of the ANOVA decomposition, Eq. (11) is therefore the ANOVA decomposition of \(Y\). As a result, the Sobol' indices can be deduced by gathering the PCE coefficients according to the dependency of each basis polynomial. Because of the orthonormality of the multivariate polynomials, the partial variances are nothing but sums of squares of the PCE coefficients. After square-summing and normalizing, one gets:

\[ S_{i_1,\dots,i_s} = \frac{ \sum_{\boldsymbol{\alpha} \in \mathcal{A}_{i_1,\dots,i_s}} \gamma_{\boldsymbol{\alpha}}^2 }{ \operatorname{Var}[Y] } \tag{12} \]

where \(\operatorname{Var}[Y] = \sum_{\boldsymbol{\alpha} \in \mathcal{A}, \boldsymbol{\alpha} \ne \mathbf{0}} \gamma_{\boldsymbol{\alpha}}^2\) is the total variance. In particular, the first-order indices read:

\[ S_i = \frac{ \sum_{\boldsymbol{\alpha} \in \mathcal{A}_i} \gamma_{\boldsymbol{\alpha}}^2 }{ \operatorname{Var}[Y] } \tag{13} \]

and the total-order indices are expressed as:

\[ S_{T_i} = \frac{ \sum_{\boldsymbol{\alpha} \in \mathcal{A},\, \alpha_i > 0} \gamma_{\boldsymbol{\alpha}}^2 }{ \operatorname{Var}[Y] } \tag{14} \]

Extension to multidimensional outputs

In this paper, Sobol' indices for multidimensional outputs are calculated following the formulation by Gamboa et al. (2013) for generalized Sobol' indices. Denoting \(Y \in \mathbb{R}^d\) the output variable, the Hoeffding decomposition with respect to the input \(X_i\) reads:

\[ Y = c + \mathcal{M}_i(X_i) + \mathcal{M}_{\sim i}(X_{\sim i}) + \mathcal{M}_{i,\sim i}(X_i, X_{\sim i}) \tag{15} \]

where \(X_{\sim i}\) denotes all inputs but \(X_i\). After computing the covariance and scalarizing both sides of Eq. (15) by applying the trace operator Tr, one gets:

\[ \operatorname{Tr}(\Sigma) = \operatorname{Tr}(C_i) + \operatorname{Tr}(C_{\sim i}) + \operatorname{Tr}(C_{i,\sim i}) \tag{16} \]

where \(\Sigma\), \(C_i\), \(C_{\sim i}\) and \(C_{i,\sim i}\) respectively denote the covariance matrices of \(Y\), \(\mathcal{M}_i(X_i)\), \(\mathcal{M}_{\sim i}(X_{\sim i})\) and \(\mathcal{M}_{i,\sim i}(X_i, X_{\sim i})\). As soon as \(Y\) is not constant, one can divide both sides of Eq. (16) by \(\operatorname{Tr}(\Sigma)\), leading to the generalized first-order Sobol' index:

\[ S_i = \frac{\operatorname{Tr}(C_i)}{\operatorname{Tr}(\Sigma)} = \frac{ \sum_{j=1}^{d} \operatorname{Var}[Y_j] \, S_{i,j} }{ \sum_{j=1}^{d} \operatorname{Var}[Y_j] } \tag{17} \]

where \(\operatorname{Var}[Y_j]\) stands for the variance of the scalar j-th component of \(Y\) and \(S_{i,j}\) is the first-order Sobol' index of \(Y_j\) associated with \(X_i\). The Hoeffding decomposition can be generalized to any subset \(u \subseteq \{1, \dots, M\}\), and Sobol' indices at any order can then be computed in the same way.
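The link between PCE coefficients and Sobol' indices can be illustrated on a hypothetical two-input toy model with uniform inputs on [-1, 1], for which a total-degree-2 Legendre basis represents the model exactly (the toy function, design size and truncation are all illustrative assumptions):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def model(x1, x2):
    # hypothetical toy model with two main effects and one interaction
    return x1 + x2**2 + 0.5 * x1 * x2

# experimental design: independent uniform inputs on [-1, 1]
N = 2000
X = rng.uniform(-1.0, 1.0, size=(N, 2))
y = model(X[:, 0], X[:, 1])

p = 2  # standard total-degree truncation |alpha| <= p
alphas = [a for a in product(range(p + 1), repeat=2) if sum(a) <= p]

def legendre_orthonormal(n, x):
    # Legendre polynomial of degree n, orthonormal w.r.t. the U(-1,1) measure
    c = np.zeros(n + 1); c[n] = 1.0
    return np.sqrt(2 * n + 1) * np.polynomial.legendre.legval(x, c)

# design matrix of tensor-product basis polynomials (Eq.-8-style products)
Psi = np.column_stack([
    legendre_orthonormal(a1, X[:, 0]) * legendre_orthonormal(a2, X[:, 1])
    for (a1, a2) in alphas
])
gamma, *_ = np.linalg.lstsq(Psi, y, rcond=None)

# Sobol' indices from grouping squared coefficients by multi-index pattern
var_y = sum(g**2 for a, g in zip(alphas, gamma) if a != (0, 0))
S1 = sum(g**2 for a, g in zip(alphas, gamma) if a[0] > 0 and a[1] == 0) / var_y
S2 = sum(g**2 for a, g in zip(alphas, gamma) if a[1] > 0 and a[0] == 0) / var_y
S12 = sum(g**2 for a, g in zip(alphas, gamma) if a[0] > 0 and a[1] > 0) / var_y
```

Grouping the squared coefficients by which inputs appear in each multi-index yields the first-order indices S1 and S2 and the interaction index S12; for this toy model they match the analytical ANOVA values (1/3)/0.45, (4/45)/0.45 and (1/36)/0.45, and sum to one by construction.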

HSIC dependence measure
The following sections briefly describe the HSIC dependence measure and introduce a method for computing an estimator.
Although the description below considers a one-dimensional input factor and a one-dimensional output, the HSIC theory can easily be extended to multidimensional variables (see De Lozzo and Marrel, 2016, for example).
The HSIC theory relies on Reproducing Kernel Hilbert Spaces (RKHS) and kernel functions. The Hilbert space \(\mathcal{H}\) of functions from \(\mathcal{X}\) to \(\mathbb{R}\) associated with the scalar product \(\langle \cdot, \cdot \rangle_{\mathcal{H}}\) is a RKHS if, for all \(x \in \mathcal{X}\), the application \(h \in \mathcal{H} \mapsto h(x)\) is a continuous linear form. The Riesz representation theorem states that to each value \(x\) in the space \(\mathcal{X}\) corresponds a unique function \(\varphi_x \in \mathcal{H}\) such that \(\forall h \in \mathcal{H}, h(x) = \langle h, \varphi_x \rangle_{\mathcal{H}}\). Such a RKHS thus provides a mapping from \(\mathcal{X}\) to \(\mathcal{H}\). This mapping, and thus the RKHS itself, is fully characterized by the associated kernel function.

Definition
Let \(\mathcal{F}_i\) denote the RKHS composed of all continuous bounded functions of the input \(X_i\) with values in \(\mathbb{R}\), and \(\mathcal{G}\) the RKHS composed of all continuous bounded functions of the output \(Y\) with values in \(\mathbb{R}\).
The main idea behind the Hilbert-Schmidt Independence Criterion (HSIC) used for GSA is to compute the cross-covariance between non-linear transformations of an input factor \(X_i\) and of the output \(Y\) (De Lozzo and Marrel, 2016). Such a dependence measure thus simultaneously captures a very broad spectrum of forms of dependency between the variables (Meynaoui et al., 2018). The non-linear transformations of \(X_i\) and \(Y\) are described by the elements of \(\mathcal{F}_i\) and \(\mathcal{G}\). The HSIC measure is then built from the cross-covariance between any functions \(f\) and \(g\) in these RKHS, namely \(\operatorname{cov}(f(X_i), g(Y))\).
To do so, one may define the cross-covariance operator \(C_{\mathcal{G}\mathcal{F}_i} : \mathcal{G} \to \mathcal{F}_i\), which is the unique operator from \(\mathcal{G}\) to \(\mathcal{F}_i\) such that:

\[ \langle f, C_{\mathcal{G}\mathcal{F}_i}(g) \rangle_{\mathcal{F}_i} = \operatorname{cov}\big(f(X_i), g(Y)\big) \qquad \forall f \in \mathcal{F}_i, \; \forall g \in \mathcal{G} \tag{18} \]

Finally, the HSIC measure corresponds to the square of the Hilbert-Schmidt norm of this cross-covariance operator:

\[ \operatorname{HSIC}(X_i, Y) = \big\| C_{\mathcal{G}\mathcal{F}_i} \big\|_{HS}^2 = \sum_{j,k \ge 0} \big\langle u_j^i, C_{\mathcal{G}\mathcal{F}_i}(v_k) \big\rangle_{\mathcal{F}_i}^2 \tag{19} \]

where \((u_j^i)_{j \ge 0}\) and \((v_k)_{k \ge 0}\) are orthonormal bases of \(\mathcal{F}_i\) and \(\mathcal{G}\) respectively. The resulting sensitivity indices proposed by Da Veiga (2015) are defined for each input factor \(X_i\), \(i \in \{1, \dots, M\}\), as:

\[ S_i^{HSIC} = \frac{\operatorname{HSIC}(X_i, Y)}{\sqrt{\operatorname{HSIC}(X_i, X_i)\,\operatorname{HSIC}(Y, Y)}} \tag{20} \]

The HSIC measure fully depends on the choice of the RKHS \(\mathcal{F}_i\) and \(\mathcal{G}\), and especially on the choice of the scalar product that defines the relation between elements of these RKHS. The kernel function defines this scalar product (Meynaoui, 2019). A large range of kernel functions exists; in this paper we focus on universal kernels (i.e. kernels whose RKHS is dense in the space of continuous functions with respect to the infinity norm). Indeed, a universal kernel is necessary to characterize the independence of the variables: if the RKHS \(\mathcal{F}_i\) and \(\mathcal{G}\) are universal, \(\operatorname{HSIC}(X_i, Y)\) is equal to 0 if and only if \(X_i\) and \(Y\) are independent (Gretton et al., 2005b; Da Veiga, 2015). Following De Lozzo and Marrel (2014, 2016) and Da Veiga (2015), we chose a Gaussian kernel, as such kernels often perform well (Gretton et al., 2005a) and can be used for scalar or vectorial variables. For a vectorial variable \(x \in \mathbb{R}^q\), it is expressed as:

\[ k(x, x') = \exp\big( -\lambda^2 \, \| x - x' \|_2^2 \big) \tag{21} \]

where \(\|\cdot\|_2\) is the Euclidean norm in \(\mathbb{R}^q\) and the hyperparameter \(\lambda\) is called the bandwidth parameter of the kernel. In this study, the bandwidth \(\lambda\) is estimated as the inverse of the empirical standard deviation of the sample.

Computation
The HSIC measure can be rewritten in terms of kernels (Gretton et al., 2005a):

\[ \operatorname{HSIC}(X_i, Y) = \mathbb{E}\big[k_{X_i}(X_i, X_i')\,k_Y(Y, Y')\big] + \mathbb{E}\big[k_{X_i}(X_i, X_i')\big]\,\mathbb{E}\big[k_Y(Y, Y')\big] - 2\,\mathbb{E}\Big[ \mathbb{E}\big[k_{X_i}(X_i, X_i') \mid X_i\big] \, \mathbb{E}\big[k_Y(Y, Y') \mid Y\big] \Big] \tag{22} \]

where \(k_{X_i}\) (resp. \(k_Y\)) is the kernel function defining the RKHS associated with \(X_i\) (resp. \(Y\)), \(X' = (X_1', \dots, X_i', \dots, X_M')\) is an independent and identically distributed copy of \(X = (X_1, \dots, X_i, \dots, X_M)\), and \(Y'\) is the output associated with \(X'\). Using Eq. (22), an estimator of HSIC can be computed from an N-sample \((x_i^j, y^j)\), \(j \in \{1, \dots, N\}\), of \((X_i, Y)\), as proposed in Gretton et al. (2005a):

\[ \widehat{\operatorname{HSIC}}(X_i, Y) = \frac{1}{(N-1)^2} \operatorname{Tr}\big(K_i H K_Y H\big) \tag{23} \]

where \((K_i)_{jk} = k_{X_i}(x_i^j, x_i^k)\), \((K_Y)_{jk} = k_Y(y^j, y^k)\) and \(H = I_N - \frac{1}{N}\mathbf{1}\mathbf{1}^\top\) is the centering matrix. The HSIC sensitivity indices can thus be easily and quickly computed from a preexisting sample, but they might be biased if the sample size is too small, as pointed out by De Lozzo and Marrel (2014).
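The kernel-based estimator above fits in a few lines. The sketch below applies it to a hypothetical pair of inputs, one influential and one inert (the toy function, sample size and bandwidth choice are illustrative assumptions):

```python
import numpy as np

def gaussian_gram(x, lam):
    # Gram matrix of the Gaussian kernel k(x, x') = exp(-lam^2 * (x - x')^2)
    d = x[:, None] - x[None, :]
    return np.exp(-(lam * d) ** 2)

def hsic(x, y):
    # plug-in estimator of HSIC: Tr(Kx H Ky H) / (N - 1)^2
    n = len(x)
    Kx = gaussian_gram(x, 1.0 / x.std())   # bandwidth = 1 / empirical std
    Ky = gaussian_gram(y, 1.0 / y.std())
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    return np.trace(Kx @ H @ Ky @ H) / (n - 1) ** 2

def r2_hsic(x, y):
    # normalized HSIC sensitivity index (Da Veiga, 2015)
    return hsic(x, y) / np.sqrt(hsic(x, x) * hsic(y, y))

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 500)   # influential input
x2 = rng.uniform(-1.0, 1.0, 500)   # inert input
y = np.sin(3.0 * x1) + 0.1 * rng.normal(size=500)  # depends on x1 only

s1, s2 = r2_hsic(x1, y), r2_hsic(x2, y)
```

The non-monotonic dependence on x1 is picked up even though a linear correlation coefficient would miss part of it, while the index for the inert x2 stays close to zero (not exactly zero, reflecting the small-sample bias mentioned above).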

Ranking and screening based on the HSIC measure
The HSIC dependence measure can be computed for each input factor \(X_i\) and directly used as a sensitivity index for ranking.
In De Lozzo and Marrel (2014), a statistical test approach is also proposed for screening. It is based on bootstrap resampling and is therefore adapted to non-asymptotic contexts. Considering an experimental design of N points \((X_i^1, \dots, X_i^N)\) and the associated output points \((Y^1, \dots, Y^N)\), De Lozzo and Marrel (2014) aim at testing the hypothesis "\(X_i\) and \(Y\) are independent". An estimator \(\widehat{\operatorname{HSIC}}(X_i, Y)\) of the dependence measure \(\operatorname{HSIC}(X_i, Y)\) is first computed. Then, B bootstrap versions \(Y^{[1]}, \dots, Y^{[B]}\) are resampled with replacement from the original output sample \((Y^1, \dots, Y^N)\), each containing the same number of points N. For each \(Y^{[b]}\), the input points associated with \(X_i\) are not resampled: indeed, under the independence hypothesis, any value of \(Y\) can be associated with \(X_i\). For each bootstrap version b, an estimator \(\widehat{\operatorname{HSIC}}^{[b]}(X_i, Y^{[b]})\) is computed. The associated bootstrapped p-value is then given by:

\[ \text{p-val}_B = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}_{\widehat{\operatorname{HSIC}}^{[b]}(X_i, Y^{[b]}) \,>\, \widehat{\operatorname{HSIC}}(X_i, Y)} \tag{24} \]

Finally, denoting \(\alpha\) the significance level, if \(\text{p-val}_B < \alpha\) the independence hypothesis is rejected; otherwise it is accepted.
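A sketch of this bootstrap independence test, reusing a compact version of the kernel HSIC estimator (the toy data, sample size and number of replicates are illustrative assumptions):

```python
import numpy as np

def hsic(x, y):
    # compact plug-in HSIC estimator with Gaussian kernels
    n = len(x)
    def gram(v):
        d = v[:, None] - v[None, :]
        return np.exp(-(d / v.std()) ** 2)   # bandwidth = 1 / empirical std
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(x) @ H @ gram(y) @ H) / (n - 1) ** 2

def independence_test(x, y, B=100, alpha=0.01, seed=0):
    # bootstrap test of "x and y are independent": y is resampled with
    # replacement while x is kept fixed, which mimics the distribution of
    # the HSIC estimator under the independence hypothesis
    rng = np.random.default_rng(seed)
    n = len(y)
    h0 = hsic(x, y)
    hb = np.array([hsic(x, y[rng.integers(0, n, n)]) for _ in range(B)])
    pval = float(np.mean(hb > h0))
    return pval, bool(pval < alpha)   # (bootstrapped p-value, reject?)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y_dep = np.sin(3.0 * x) + 0.1 * rng.normal(size=200)  # depends on x
y_ind = rng.normal(size=200)                          # independent of x

p_dep, reject_dep = independence_test(x, y_dep)
p_ind, reject_ind = independence_test(x, y_ind)
```

For the dependent pair, the observed HSIC value sits far in the tail of the bootstrap distribution and independence is rejected; for the independent pair, the p-value behaves like a draw from a roughly uniform distribution and independence is typically retained.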

Random Forest
This section briefly introduces the random forest metamodelling technique and details how it can be used to derive sensitivity measures.Again, it is assumed that the output Y is scalar but the concepts and algorithms below can be extended to multidimensional variables.

General formulation
Random forest (Breiman, 2001) belongs to the family of ensemble machine learning techniques. It consists in averaging the results from an ensemble of K decision trees built independently. A decision tree is composed of an ensemble of discriminatory binary conditions contained in nodes. Such conditions are applied hierarchically from a root node to a terminal node (tree leaf) (Rodriguez-Galiano et al., 2014). The input space is thus successively partitioned, according to a response variable, into smaller groups that correspond to the nodes. Splitting goes on until a minimum threshold of members per node is reached.
As each individual decision tree is very sensitive to the input dataset, bagging is used to avoid correlations between trees and to ensure model stability. It consists in training each decision tree on a different training dataset, smaller than the original one. Such subsets are built from the original dataset by resampling with replacement, so that some members are used more than once while others may not be used at all. This technique makes random forests more robust to slight variations in the input space and increases the accuracy of the prediction (Breiman, 1996, 2001). The samples that are not used to grow a tree are called "out-of-bag" (OOB) data and are used for the test step. The RF workflow is summarized in Figure 4.

Feature importance for the sensitivity analysis

The RF structure can be cleverly used to provide knowledge about how influential each input factor is. This measure is referred to as feature importance in the RF formalism. The random forest is first trained on the targeted output variable \(Y\) using an N-point sample \((X^j, Y^j)\), \(j \in \{1, \dots, N\}\). Once the forest has been trained, each input factor \(X_i\) is permuted individually so as to break the link between \(X_i\) and \(Y\), and the effect of this permutation on the model accuracy is investigated. A large decrease in accuracy indicates that the input factor is highly influential, whereas a small decrease indicates that it has little influence. Different algorithms exist to compute this Mean Decrease in Accuracy (MDA) (see Bénard et al. (2021) for an extensive review of the formulations in the existing R and Python packages); we focus here on the original formulation from Breiman (2001). The decrease in accuracy is computed, for each tree, from the mean square error between predictions on the OOB data with and without permutation; the results are then averaged over all trees to get the MDA. The algorithm for the feature importance calculation is extensively described in various papers (e.g. Soleimani, 2021; Bénard et al., 2021) and is recalled in what follows:

1. For each tree k:
   - Estimate the error \(\widehat{\varepsilon}_{OOB_k}\) from the OOB sample \(\mathcal{L}_k\):
     \[ \widehat{\varepsilon}_{OOB_k} = \frac{1}{|\mathcal{L}_k|} \sum_{j \,:\, (X^j, Y^j) \in \mathcal{L}_k} \big( Y^j - \widehat{\mathcal{M}}_{RF,k}(X^j) \big)^2 \tag{25} \]
     where \(\widehat{\mathcal{M}}_{RF,k}\) is the estimated RF metamodel restricted to tree k.
   - For each input factor \(X_i\), estimate \(\widehat{\varepsilon}^{\,*}_{OOB_k}(i)\) with the same formula, using the input set in which the values of \(X_i\) have been permuted.
2. For each input factor \(X_i\), compute the mean decrease in accuracy:
   \[ MDA_i = \frac{1}{K} \sum_{k=1}^{K} \big( \widehat{\varepsilon}^{\,*}_{OOB_k}(i) - \widehat{\varepsilon}_{OOB_k} \big) \tag{26} \]
   where K is the total number of trees.

Despite the "black-box" aspect of RF building, recent works have theoretically established a link between the Mean Decrease in Accuracy and the total Sobol' indices when the input parameters are assumed independent. Indeed, Gregorutti et al. (2017) established that for every input parameter \(X_i\):

\[ MDA_i = 2 \operatorname{Var}[Y] \, S_{T_i} \tag{27} \]

GSA strategy for the landscape analysis

In this study, several GSA approaches that define the notion of sensitivity in contrasted ways were tested and compared. These methods are described from a methodological point of view in the previous sections and qualitatively summarized in what follows.
A variance decomposition method was first used and the associated Sobol' indices (Sobol, 1993) were computed. These indices measure the impact of each input parameter on the variance of the output. They capture the direct impact of each input and also accurately describe the impact of input parameters when they interact with one another. The indices were computed from a Polynomial Chaos Expansion of the model output: we used a q-norm- and degree-adaptive sparse PCE based on the Least Angle Regression scheme (LARS, Blatman and Sudret 2011) with q-norm ∈ {0.1, 0.2, ..., 1.0} and a maximum degree of 3.
Second, we computed sensitivity measures based on the Hilbert-Schmidt Independence Criterion (HSIC, Da Veiga 2015). HSIC sensitivity indices belong to the category of dependence measures that quantify, from a probabilistic point of view, the dependence between each input and the output. More precisely, the HSIC index measures the distance between the input/output joint pdf and the product of their marginal pdfs when they are embedded in a Reproducing Kernel Hilbert Space. HSIC-based measures were used both for ranking and for screening purposes as they fully characterize the independence between two variables. In addition, these indices can be estimated from small samples (a few hundred points) and their cost does not depend on the number of inputs. In this study, screening was performed on a 4,000-point LHS to ensure a reasonable simulation time, using a statistical test of independence. The significance level α was set to 1 % and 100 bootstrap replicates were used. Then, ranking was performed on the new 1,000-point LHS obtained from the reduced set of input parameters. The R code provided in De Lozzo and Marrel (2016) to compute the HSIC measure was adapted to Python to perform both screening and ranking.
Finally, sensitivity indices deduced from a Random Forest (RF, Breiman 2001) surrogate model were also computed. A Random Forest model is an empirical input/output relationship based on an ensemble of decision trees. The relative importance of each input parameter in the RF building (called feature importance measure) can easily be deduced from it. More precisely, an input parameter X_i is considered important if breaking the link between X_i and the output Y by permutation increases the RF prediction error. The metric for measuring prediction error can be defined in different ways and we chose the original formulation of Mean Decrease Accuracy from Breiman (2001). These feature importance measures constitute a kind of sensitivity index that can be calculated (almost) for free as soon as the RF structure is available. In this study, we used the randomForestSRC R package (Ishwaran and Kogalur, 2020) to obtain feature importance measures, once again from the 1,000-point LHS. The number of trees used to train the RF was set to 500. As RF building already includes a bootstrap step to build each tree, applying another bootstrap could affect the consistency of the OOB sample. We therefore used a subsampling approach without replacement, with subsample size set to 800, to get error bounds on feature importance measures. As deducing the RF structure from a reduced dataset may lead to limited performance, we first checked that RF performance only moderately decreased when shifting the training dataset size from 1,000 to 800 points. The same procedure was applied to estimate error bounds on HSIC indices, as 800 points was assumed to be amply sufficient to compute consistent estimators. 100 replications were used for all resampling methods.
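The link between permutation importance and Sobol' total indices can be illustrated without any RF library by treating the exact model function as a perfect surrogate (so the baseline error is zero): permuting one input column then increases the mean squared error by approximately 2 Var[Y] S_T,i. The toy function below is an assumption for illustration, not the PESHMELBA model.

```python
import numpy as np

def permutation_importance(f, X, rng, n_repeats=5):
    """Mean increase in squared prediction error when one input column is
    permuted (here f is evaluated exactly, so the baseline error is zero)."""
    y = f(X)
    imp = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, i] = rng.permutation(Xp[:, i])  # break the X_i / Y link
            imp[i] += np.mean((f(Xp) - y) ** 2) / n_repeats
    return imp

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100_000, 2))
f = lambda X: X[:, 0] + X[:, 0] * X[:, 1]   # Var[Y] = 4/9, S_T = (1, 1/4)
imp = permutation_importance(f, X, rng)
# imp is close to 2 * Var[Y] * S_T = (8/9, 2/9), as in Gregorutti et al. (2017)
```

With an actual RF, the same quantity is estimated on the OOB samples of each tree, which is what the Mean Decrease Accuracy of Breiman (2001) does.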
In a second step, aggregated sensitivity indices were computed to summarize all site/local sensitivity indices. In that case, the objective function was no longer a scalar but rather a vector gathering a given cumulated variable for all HUs over the catchment. Aggregated Sobol' indices were calculated following the generalized Sobol' indices of Gamboa et al. (2013) (Section 2.3.1).
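When site Sobol' indices and the corresponding site output variances are already available, the generalized index of Gamboa et al. (2013) reduces to a variance-weighted average of the site indices. A minimal sketch (the array shapes and example values are assumptions):

```python
import numpy as np

def aggregated_sobol(site_indices, site_variances):
    """Aggregate site Sobol' indices over spatial units: each site's index
    is weighted by that site's share of the total output variance."""
    S = np.asarray(site_indices, float)    # shape (n_sites, n_params)
    w = np.asarray(site_variances, float)  # shape (n_sites,)
    return (w[:, None] * S).sum(axis=0) / w.sum()

# two hypothetical HUs, two parameters
agg = aggregated_sobol([[0.5, 0.2], [0.1, 0.6]], [1.0, 3.0])
# agg = [0.2, 0.5]: the high-variance site dominates the aggregate
```

This makes explicit why low-variance sites contribute little to the aggregated index, which is the behaviour exploited in the landscape analysis below.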
Definition of uncertainty in input factors is critical to make sure that GSA results are representative of the model performance (Sobol, 2001;Muñoz-Carpena et al., 2010;Nossent and Bauwens, 2012;Herman et al., 2013).Such an uncertainty is represented by attaching to each input factor a probability density function (pdf) that reflects its variability in the simulation context (Gatel et al., 2019).GSA performed in this paper was fully oriented toward the application case described in the previous sections.Therefore, the input factor distributions were chosen as being as representative as possible of the available data on the catchment and the associated uncertainties.Mean values were taken from the standard scenario described in Section 2.2.
Distributions and standard deviations were assigned based on experimental measurements from the catchment of application, available scientific literature or expert knowledge (Table 2).
When several measurements were available, bounds were set directly from the data. When a single measurement was available for a horizon, its minimum and maximum bounds were derived from the mean distribution spread calculated over horizons with multiple measurements.
A triangular distribution was assigned to Koc (Lauvernet and Muñoz-Carpena, 2018) and a normal distribution was assigned to DT50. A 60% coefficient of variation was assigned to the Koc and DT50 distributions, as such parameters are considered relatively uncertain (Dubus et al., 2003). Triangular distributions were assigned to Manning's roughness on plots and for the river bed (Lauvernet and Muñoz-Carpena, 2018; Gatel et al., 2019). A uniform distribution with a +/- 20% range around the mean value was assigned to the remaining input factors, as little information could be found in the literature (Zajac, 2010).
Using a fully distributed model such as PESHMELBA raised the issue of the sampling strategy. Indeed, in this case study, even if the site was only composed of 14 surface units, the large number of soil horizons on the catchment, combined with the hydrodynamical distinction between plots and VFSs, dramatically increased the number of parameters. Sampling all parameters on each spatial unit would lead to a huge number of simulations that could not be afforded computationally. Moreover, such independent sampling of a very large number of parameters may lead to misinterpretation of the sensitivity analysis results, as the influence of physical processes could not be distinguished from the spatial arrangement. For each sample, one set of hydrodynamical parameters was therefore sampled for each soil horizon and applied to all spatial units that contained this horizon, leading to a total of 145 parameters to be sampled. In the associated sensitivity figures, bar colours relate to physical processes: brown refers to soil parameters (the darker the brown, the deeper the parameter), blue to river parameters and green to vegetation parameters. The filling of brown bars refers to the soil type of the parameter: soil 1 is not filled, soil 2 is cross-hatched and soil 3 is filled with circles.
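The sampling strategy described above (one parameter set per soil horizon, broadcast to every spatial unit containing that horizon) can be sketched as follows; the horizon names and the unit-to-horizon mapping are purely illustrative.

```python
import numpy as np

def lhs(n, d, rng):
    """Plain Latin Hypercube Sample on [0, 1]^d: one stratified
    permutation per dimension."""
    return np.column_stack([(rng.permutation(n) + rng.uniform(size=n)) / n
                            for _ in range(d)])

# hypothetical mapping: the deep soil horizon present in each spatial unit
unit_horizon = {"HU1": "soil1_deep", "HU2": "soil1_deep", "HU3": "soil2_deep"}
horizons = sorted(set(unit_horizon.values()))

rng = np.random.default_rng(42)
sample = lhs(1000, len(horizons), rng)  # one column per horizon, not per unit

# broadcast each horizon's sampled values to every unit containing it,
# so units sharing a horizon receive identical parameter values
unit_values = {u: sample[:, horizons.index(h)] for u, h in unit_horizon.items()}
```

Sampling per horizon rather than per unit is what keeps the dimension at 145 here: the design matrix has one column per physical parameter, and spatial units merely index into it.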
In De Lozzo and Marrel (2016), the authors call for caution when drawing general conclusions from comparisons of HSIC and Sobol' indices. In their paper, HSIC indices were found to be more similar to Sobol' first-order than to total-order indices, but our findings reach the opposite conclusion. These results confirm the guideline that such indices, being based on different mathematical concepts, should not be compared in general terms. While Sobol' indices scrutinize only the variance of the output, HSIC indices consider the whole dependence between an input and the output. In addition, HSIC indices are not intrinsically built to capture interacting effects. Our results show that they do capture some interactions, but not all of them and not in all cases.
RF feature importance indices have been proven to relate to Sobol' total indices (Gregorutti et al., 2017), but our results exhibit large discrepancies between RF feature importance measures and Sobol' total indices. The table in Appendix D1 provides quality scores for both PCE and RF metamodels. Results show that the RF metamodel performs much more poorly than the PCE for all variables, probably due to the limited sample size. Despite this limited performance, error bounds on RF indices are quite small, contrary to error bounds on Sobol' index estimates. We attribute this to the resampling strategies, which differ between the two methods. While the bootstrap technique has been proven to assess the quality of the PCE (Marelli and Sudret, 2018), the subsampling technique applied to the RF only targets the precision of feature importance measures. A more adapted subsampling approach, for example based on Ishwaran and Lu (2019), should probably be further investigated on bigger samples to better compute RF sensitivity indices and to accurately assess their quality.
To conclude, Sobol' variance decomposition is the only method that gives insight into both main and interactive effects, providing valuable knowledge for model validation. Sobol' indices thus seem more appropriate to analyse complex pesticide-related variables, which are dominated by physical interactions. These indices also have the merit of being easy to interpret. In addition, a generalized formulation for aggregated indices is provided by Gamboa et al. (2013) and has already been explored in depth.
However, the precision of the Sobol' indices obtained from the PCE remains quite low and results should thus be interpreted with caution. For a high-dimension problem with a very limited computational budget (fewer than 1,000 simulations), HSIC indices remain a relevant option for computing accurate sensitivity indices, provided that detailed knowledge about interactive effects is not needed.

Landscape analysis
The previous section showed that Sobol' indices (first and total order) provide valuable information about interactions between parameters. As a result, we focus on this method in what follows. Site rankings such as the one presented in Figure 7 are gathered for all HUs in the form of sensitivity maps in Figure 8 for water surface runoff and in Figure 9 for pesticide surface runoff.
Broadly speaking, both maps show strong spatial heterogeneities regarding influential parameters, and a contrasted behaviour between right and left banks can be identified. For both output variables, the hydrodynamical parameters (thetas, thetar and mn) of the deep horizon from soil 1 (resp. 2 and 3) are mainly influential only on HUs characterized by soil 1 (resp. 2 and 3) (see Figure 2 for a reminder on soil types).
Local hydrodynamical parameters are found to dominate in explaining the output variable variance. A particular case is HU4 (indicated by an arrow on both Figure 8 and Figure 9), which is characterized by soil 3 while parameters from soil 1 explain most of the variance of both water and pesticide surface runoff variables. The location of HU4 near the outlet, downstream of several soil-1 HUs, may explain such spatial interactions. In addition to specific soil parameters, other parameters such as the Manning roughness on vineyard plots (manning_plot) or the adsorption coefficient (Koc_pest) have a greater influence on HUs from the left bank (top part of the catchment in the figures).
Finally, comparison of first-order and total-order maps shows quite similar results for water surface runoff, indicating that direct effects are significant for all influential parameters. On the other hand, direct effects are far from dominant for pesticide surface runoff. Once again, most parameters are influential nearly only in interaction with other parameters, since the first-order indices are very low compared to the total-order indices. Sensitivity maps provide local, detailed information about influential parameters at each location of the catchment. However, they are computationally costly, as one GSA per HU must be performed. This approach may be hard or even impossible to transpose to real catchments composed of several hundred elements. Catchment-scale aggregated indices thus provide synthetic information at a lower computational price. In this study, Sobol' aggregated indices were directly computed from the site sensitivity indices, as they were available. However, if the size of the problem does not allow for direct computation, a pick-and-freeze estimator (Gamboa et al., 2013; De Lozzo and Marrel, 2016) can be used for generalized Sobol' indices. In our case, such an overview of the sensitivity analysis allows us to focus calibration efforts on deep soil hydrodynamical parameters and on the pesticide adsorption coefficient to improve the quality of the simulation of both water and pesticide surface flows. As pointed out in Marrel et al. (2015), both approaches are complementary and provide precious knowledge about the model functioning. The scale of the bank, or more generally of the hillslope, may also constitute an adapted intermediary scale to meet both requirements of detailed results for physical interpretation and computational efficiency.

Conclusion
In this paper, we have described the first global sensitivity analysis of the modular and coupled PESHMELBA model. For this first experiment, a virtual, simplified catchment was set up to explore different approaches for GSA and to propose a methodology for future real applications. Even if the scenario was simplified, particular attention was paid to reproducing and dealing with the challenges that would be faced in real applications: a high number of input parameters with spatial heterogeneities, a limited sample size due to the high computational cost of a simulation, and spatialized outputs. We first proposed a screening step using an independence test based on the HSIC dependence measure. Then, we compared several methods to compute sensitivity measures: Sobol' indices computed from a PCE surrogate model, HSIC dependence measures and feature importance measures obtained from an RF surrogate model. Although the results obtained from the three methods were mainly consistent, we noticed that the Sobol' analysis was the only method able to capture some interactions between parameters, which may be of particular interest in such coupled models.
All methods were first applied to each landscape element individually. They provided us with local measures of sensitivity called site sensitivity indices. Such site sensitivity indices can be gathered into sensitivity maps that highlight local contributions of parameters. Although very informative about the hydrodynamical functioning of the scenario, these maps were computationally demanding to produce. We then used an extension of the previous methods to multidimensional outputs in order to get an overview of sensitivity over the whole physical domain. These aggregated indices were first computed at the catchment scale to characterize the whole output uncertainty. They may allow users to focus calibration efforts. Additionally, they were computed on each bank hillslope, and we propose to use this intermediary scale to obtain aggregated information about the catchment functioning that still reflects spatial heterogeneities of hydrodynamical processes. When extending this methodology to other case studies, the intermediary spatial scales to focus on should be defined depending on the characteristics and the goals of the study to make the most of the analysis.
Further research could extend the global sensitivity analysis of PESHMELBA to contrasted climatic scenarios in order to encompass different environmental conditions, as sensitivity analysis results highly depend on climatic and site conditions (Alves Ferreira et al., 1995; Lauvernet and Muñoz-Carpena, 2018; Saltelli et al., 2019). Additionally, parameters were assumed to be independent; relaxing this assumption (… et al., 2021) or using the principal components of the model's functional outputs could also be explored. The definition and use of adequate hydrological signatures such as proposed in Branger and McMillan (2020) and Horner (2020) may also be of interest to understand space-time variability and to capture a broader range of physical processes.
Global sensitivity analysis is a necessary but not yet systematic step in model evaluation, especially in the case of spatialized risk assessment models that can be complex to deal with. This study proposes a comprehensive method based on complementary indices and thus paves the way for systematic analysis of such environmental exposure models.

Table C1: Remaining parameters after the screening step for each output variable. In the XXX_XXX_XXX syntax of parameter names, the first block is the type of element the parameter refers to (soil horizon, river, vegetation, pesticide, HU or VFS), the second block is the parameter name, while the last block is the index of the element the parameter refers to (soil horizon or vegetation type).

Figure 1 .
Figure 1.Left: portion of La Morcille catchment selected to perform sensitivity analysis.Yellow units stand for vineyard plots while green units stand for vegetative filter strips.Brown stars denote locations of pesticide application.Right: slopes and connections between elements.
Three soil types are distinguished (Figure 2): sandy soil (SU1), sandy soil on clay on the right bank plateau (SU2) and heterogeneous sandy soils on lower slopes and thalwegs (SU3).

Figure 2 .
Figure 2. Soil type locations for the case study.Green contours show the vegetative filter strips.

Figure 3 .
Figure 3. Climatic forcing (rain and potential evapotranspiration) for the simulation.The dotted red line stands for the one-shot pesticide application.
Saltelli et al. (2008) recommend choosing N = k(M + 2), where M is the number of input factors and k a number between 500 and 1,000. Such a sample size may be unfeasible for computationally expensive models. The use of a metamodel is then an interesting alternative, in particular Polynomial Chaos Expansions (PCE), which have the advantage of providing a direct estimation of Sobol' indices. PCE theory is briefly introduced in what follows, mainly based on Le Gratiet et al. (2017).
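To make the cost concrete, a back-of-the-envelope check, using the 145 sampled parameters of this study and Saltelli's lower bound k = 500:

```python
# Cost of a direct Saltelli-type estimation, N = k * (M + 2),
# with M = 145 parameters (this study) and the lower bound k = 500
M, k = 145, 500
N = k * (M + 2)
print(N)  # 73,500 model runs, versus the ~1,000-run budget available here
```

This two-orders-of-magnitude gap is what motivates the PCE surrogate below.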
For any input parameter X_i, generalized first-order Sobol' indices are based on the Hoeffding decomposition of the model function M:

M(X) = M_0 + sum_i M_i(X_i) + sum_{i<j} M_{i,j}(X_i, X_j) + ... + M_{1,...,M}(X_1, ..., X_M)

Figure 4 .
Figure 4. RF workflow (adapted from Rodriguez-Galiano et al. 2014).K bootstrapped sets are extracted from the original training set.Part of each set "InBag" is used to grow an independent decision tree and the final regression value is the average of all trees.The remaining portion of each tree "out-of-bag" (OOB) is used as a test set.

Figure 5 .
Figure 5. Full workflow used to perform GSA on the PESHMELBA model.

Figure 7 .
Figure 7. Sobol' total and first order indices computed using PCE, HSIC and RF site sensitivity indices for all output variables on HU14 with associated 95% confidence intervals.RF feature importance measures are normalized by 2Var[Y] following Eq.(28).HU14 is displayed with a red contour on top left figure.For all methods, displayed parameters are the 10 most influential parameters regarding Sobol' total indices.

Figure 8 .
Figure 8. Maps of Sobol site sensitivity indices for water surface runoff for the most significantly influential parameters.

Figure 9 .
Figure 9. Maps of Sobol' site sensitivity indices for pesticide surface runoff for the most significantly influential parameters.

Figure 10 .

Table 1 .
Hydrodynamic parameters for SU1, 2 and 3 based on Van Genuchten retention curve and on Schaap-Van Genuchten conductivity curve fitting. Parameters are described in Table 2. Values for the surface horizon are explicitly distinguished between plot and VFS when they differ. Horizons 11, 12, 13 are surface horizons for plots, whereas horizons 14, 15, 16 are surface horizons for VFSs.
Direct Monte Carlo estimation requires a large sample size that could not be afforded in this case study. As a result, we computed Sobol' indices from a limited sample size, based on Polynomial Chaos Expansion (PCE, Sobol 1993; Sudret 2008), in order to circumvent this difficulty. This approach consists in building a surrogate model whose analytical polynomial expression is directly related to Sobol' indices. Building a PCE and deducing the associated Sobol' indices thus only requires a training sample of limited size and knowledge of each input parameter probability distribution. In this study, we used the UQLab Matlab software (Marelli and Sudret, 2014) and we computed Sobol' indices from a 1,000-point Latin Hypercube Sample (LHS, McKay et al., 1979).