Interactive comment on "Assessing parameter importance of the Common Land Model based on qualitative and quantitative sensitivity analysis"

Overall, this paper is well organized. A suite of SA methods is tested on a complex land surface model, CLM. It is clearly demonstrated that most of the SA methods are capable of identifying the sensitive parameters of CLM with a small sample size in this experimental setting. The results indicate that state-of-the-art SA methods can be applied to analyze complex earth system models in some cases. The authors also interpret the SA results from the viewpoint of model physics to demonstrate the effectiveness of the methods. The paper could be further improved if the authors worked more on the presentation. For instance, the descriptions of figure formats (such as Lines 20-27 on Page 2256) should be moved to the figure captions instead of putting


Introduction
A land surface model (LSM) is an integral component of any numerical weather prediction (NWP) or climate model. The ability of an LSM to represent the land surface processes accurately and reliably depends on several factors (Duan et al., 2006). The first factor is the authenticity of the model structure (e.g., the equations or parameterization schemes of the model). The second is the quality of external forcing data and the initial and boundary conditions. The third is the appropriateness of the model parameter specification. How to estimate model parameters has received increasing attention from the hydrology and land surface modeling community in recent years (Franks and Beven, 1997; Gupta et al., 1999; Duan et al., 2001, 2003; Jackson et al., 2003; Liu et al., 2005; Hou et al., 2012).
In traditional hydrological modeling, model parameters are often estimated through model calibration, i.e., a process of matching model simulations with observations by tuning model parameters. However, calibrating the parameters of complicated LSMs is a challenging task because of high dimensionality and nonlinear parameter interactions. With water, energy and, in some cases, carbon and nitrogen cycles being considered concurrently, a typical LSM has a large number of adjustable parameters (from O(10) to O(100)) that govern the model equations. Typically $10^5$ to $10^6$ or even more model runs are required to calibrate a high-dimensional (> 10) model (Vrugt et al., 2008; Deb et al., 2002). To compound the problem, running an LSM at a large spatiotemporal scale can be very time-consuming, making traditional parameter calibration methods (e.g., the genetic algorithm (GA) (Goldberg, 1989) and the shuffled complex evolution method (Duan et al., 1993)) impractical.
Published by Copernicus Publications on behalf of the European Geosciences Union.
For the reasons above, we need to reduce the dimensionality by identifying which parameters have the most influence on model performance. Sensitivity analysis (SA) is a family of methods designed to separate the most sensitive (i.e., influential) parameters from the insensitive ones (Saltelli et al., 2004). A good SA method is able to screen out the most sensitive parameters with a relatively low number of model runs (Tong and Graziani, 2008).
There are two types of SA methods: qualitative and quantitative. Qualitative methods provide a heuristic score that intuitively represents the relative sensitivity of parameters, while quantitative methods tell how sensitive a parameter is by computing its impact on the total variance of the model output. Qualitative methods usually need fewer model runs, while quantitative methods require a large number of model runs. Therefore, for a specific problem, choosing the appropriate kind of SA method is very important. In recent decades, several comparisons of different SA methods have been made, seven examples of which are shown in Table 1. Researchers have drawn different conclusions: some have suggested that quantitative SA methods are more reliable; some maintain that qualitative SA methods can achieve results consistent with those of quantitative methods; and others have proposed that applying multiple SA methods leads to more robust conclusions. This lack of consensus implies that more work is needed on how to choose the most appropriate SA method.
SA methods have been applied to practical problems in many fields (Campolongo and Saltelli, 1997; De Pauw et al., 2008; Yamwong and Achalakul, 2011). For hydrological and land surface models, Collins and Avissar (1994) employed the Fourier amplitude sensitivity test (FAST) to evaluate the importance of parameters to the sensible heat and latent heat in the LAID (land-atmosphere interactive dynamic) land surface scheme. Bastidas et al. (1999) proposed the multiobjective generalized sensitivity analysis (MOGSA) method and screened out 18 sensitive parameters from a total of 25 parameters in the BATS (biosphere-atmosphere transfer scheme) model. They demonstrated that the degradation in the quality of the calibrated model performance is negligible if the insensitive parameters are not calibrated. Tang et al. (2007) applied local and global SA methods to the lumped Sacramento soil moisture accounting model (SAC-SMA). Their aim was to identify sensitivity tools that would advance the understanding of lumped hydrologic models, and they analyzed and compared the relative efficiency and effectiveness of several SA methods. Hou et al. (2012) introduced an uncertainty quantification framework to analyze the sensitivity of 10 hydrologic parameters in CLM4SP (Community Land Model Version 4 with satellite phenology) with a generalized linear model (GLM) method. They found that the simulation of sensible heat and latent heat is sensitive to subsurface runoff generation parameters. In the aforementioned work, many SA methods have shown their effectiveness in screening out important parameters. However, for large, complex dynamic system models, which are expensive to run, we need to be able to screen out important parameters with as few model runs as possible. Therefore, the goal of this study is to investigate the effectiveness and efficiency of different qualitative SA methods for parameter screening.
Several SA methods were used to evaluate the importance of 40 adjustable parameters in the Common Land Model (CoLM). The work has two objectives: (1) to test and compare different qualitative SA methods for separating sensitive parameters from insensitive ones; and (2) to validate the screening results using a quantitative SA method. Towards these objectives, this study first screened out the sensitive parameters qualitatively with a small number of samples, and then quantified the sensitivity of all parameters using a quantitative SA method.
The paper is organized as follows. Section 2 presents a brief introduction to the qualitative SA methods for parameter screening and the quantitative SA method for computing the parameter importance. Section 3 introduces the model used, CoLM, and its adjustable parameters. The study area, the forcing and validation data, and the design of the sensitivity study are also described. Section 4 presents the results and discusses the performance of the qualitative and quantitative SA methods. The physical interpretations of the screening results are also examined. Section 5 provides the conclusions.

Local method
The local method is a derivative-based sensitivity method. The sensitivity of variable $X_i \in [a_i, b_i]$ is computed as the normalized local sensitivity scaled by the variable range:
$$s_i = \left. \frac{\partial Y}{\partial X_i} \right|_{X_i = \alpha_i} (b_i - a_i),$$
where $s_i$ is the local sensitivity measure, $Y$ is the model output, $\alpha_i$ is the value of $X_i$ at which the sensitivity is evaluated, and $a_i$ and $b_i$ are the lower and upper bounds of $X_i$. A variable with a high $s_i$ value is considered to have a high impact on the model output. The value of $s_i$ clearly depends on the location $\alpha_i$.
Table 1 (fragment).
Neumann (2012) | micropollutant degradation model | 10 | Applying multiple SA methods with multiple objectives was expected to lead to more robust conclusions.
Sun et al. (2012) | water quality model | 6 | RSA (regional sensitivity analysis) was more appropriate for complex models where system nonlinearities and parameter interactions were more likely to be important.
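As a rough illustration of the local measure $s_i$, the sketch below estimates the derivative with a central finite difference and multiplies it by the parameter range. It assumes the measure is the partial derivative at $\alpha_i$ times $(b_i - a_i)$; the function names, the step size and the toy model are hypothetical and are not taken from the study or from PSUADE.

```python
def local_sensitivity(model, alpha, bounds, i, rel_step=1e-6):
    """Central-difference estimate of dY/dX_i at alpha, scaled by (b_i - a_i)."""
    a_i, b_i = bounds[i]
    h = rel_step * (b_i - a_i)          # small step relative to the range
    up, down = list(alpha), list(alpha)
    up[i] += h
    down[i] -= h
    derivative = (model(up) - model(down)) / (2.0 * h)
    return derivative * (b_i - a_i)


def toy_model(x):
    # Hypothetical stand-in for an expensive land surface model run.
    return x[0] ** 2 + 3.0 * x[1]


bounds = [(0.0, 2.0), (0.0, 1.0)]
# Evaluate the same two variables at two different locations alpha.
s_low = [local_sensitivity(toy_model, [0.1, 0.5], bounds, i) for i in range(2)]
s_high = [local_sensitivity(toy_model, [1.9, 0.5], bounds, i) for i in range(2)]
print(s_low, s_high)
```

In this toy case the score of the first variable changes with the evaluation point while that of the second does not, which is the location dependence noted in the text.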

Sum-of-trees (SOT) method
The SOT method is a tree-based method. A single regression tree model is a step function, which is obtained by recursively partitioning the data space and fitting a simple prediction model (generally, the average value) within each partition (Breiman et al., 1984). During recursive partitioning, the variables are split so as to cause the maximum decrease in the impurity function (residual sum of squares), until the impurity function falls below a threshold. The SOT model uses a certain number of bootstrapped samples to build independent regression trees and then averages them (Breiman, 2001). The total number of splits for each variable in the model represents the importance of that variable, i.e., the variable with the most splits in the model is considered to be the most important one.

Multivariate adaptive regression splines (MARS) method
The MARS method (Friedman, 1991; Shahsavani et al., 2010) is an extension of the regression tree method. After recursively partitioning the data space, it builds localized regression models (first-order linear or second-order nonlinear) instead of step functions. Therefore, this method can produce continuous models with continuous derivatives and has better fitting ability. The method includes a forward procedure and a backward procedure. The forward procedure builds an over-fitted model by considering all variables, while the backward procedure prunes the over-fitted model by removing one variable at a time. For each model $M$, a generalized cross-validation (GCV) score can be computed:
$$\mathrm{GCV}(M) = \frac{1}{N} \, \frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{\left(1 - C(M)/N\right)^2},$$
where $N$ is the number of observations, $Y_i$ is the $i$th observation, $\hat{Y}_i$ is the estimated value of $Y_i$, and $C(M) = 1 + c(M)d$ is the number of effective parameters, where $d$ is the effective degrees of freedom and $c(M)$ is a penalty for adding a basis function.
To screen out the important variables, the increase in the GCV value between the pruned model and the over-fitted model is taken as the importance measure of the removed variable (Steinberg et al., 1999). The larger the GCV increase, the more important the removed variable.
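The GCV-based importance measure can be sketched in a few lines. The example assumes the standard GCV form, mean squared residual divided by $(1 - C(M)/N)^2$; the fitted values and effective parameter counts below are hypothetical numbers chosen only to show the calculation, not output from a real MARS fit.

```python
def gcv(y_obs, y_fit, n_effective):
    """GCV = mean squared residual / (1 - C(M)/N)^2."""
    n = len(y_obs)
    mse = sum((yo - yf) ** 2 for yo, yf in zip(y_obs, y_fit)) / n
    return mse / (1.0 - n_effective / n) ** 2


def gcv_importance(y_obs, fit_full, c_full, fit_pruned, c_pruned):
    """Increase in GCV when one variable is removed from the model."""
    return gcv(y_obs, fit_pruned, c_pruned) - gcv(y_obs, fit_full, c_full)


# Hypothetical observations and fitted values, for illustration only.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fit_full = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9]      # model with all variables
fit_pruned = [2.0, 2.0, 3.5, 3.5, 5.5, 5.5, 7.0, 7.0]    # model with one variable removed
importance = gcv_importance(y_obs, fit_full, 3.0, fit_pruned, 2.0)
print(importance)
```

A large positive value means the fit degrades noticeably once the variable is dropped, even after the penalty for model size is taken into account.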
The MARS method is actually a surrogate-model method. Shahsavani et al. (2010) showed that MARS provides acceptable estimates of total sensitivity indices at a much lower cost than using only runs of the original model.

Delta test (DT) method
The DT method is a variable selection method based on the nearest-neighbor approach. Let $Y = F(X) + \varepsilon = F(X_1, \ldots, X_m) + \varepsilon$, where the noise $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_m)$ and each $\varepsilon_i$ ($i = 1, \ldots, m$) is an independent, identically distributed random variable with zero mean. The DT criterion of a variable subset $S \subseteq \{X_1, \ldots, X_m\}$, $\delta(S)$, can be computed as
$$\delta(S) = \frac{1}{2N} \sum_{i=1}^{N} \left( Y_i - Y_{N_S(i)} \right)^2,$$
where $N_S(i) = \arg\min_{k \ne i} \|X_i - X_k\|_S^2$ represents the nearest neighbor of the input point $X_i$ for the subset $S$, $Y_{N_S(i)}$ is the function value corresponding to $N_S(i)$, $Y_i$ is the function value corresponding to $X_i$, and $N$ is the sample size. $\delta(S)$ is an estimate of the variance of the residual (it converges to the true value in the limit $N \to \infty$) when only the variables in $S$ are selected for regression. It has been demonstrated that either adding unrelated variables or omitting related ones will increase the $\delta$ value (Eirola et al., 2008). Therefore, the variable subset $S$ with the smallest DT criterion corresponds to the most important subset of variables, i.e., the most sensitive parameters.
For high-dimensional problems, it is impractical to evaluate all possible variable subsets (e.g., for 40 variables, the total number of subsets is $2^{40} - 1$). Therefore, to speed up the search for the variable subset with the minimum $\delta(S)$, search algorithms such as GA are often used (Guillen et al., 2008). Thus, the reliability of DT results depends on the effectiveness of the search algorithm applied.
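A brute-force version of the DT criterion is short enough to write out. The sketch below assumes the usual form, half the mean squared difference between each output and the output of its nearest neighbor measured on the subset $S$ only; the sample, the response function and the noise level are hypothetical, and exhaustive comparison of a few subsets stands in for the GA search.

```python
import random


def delta_test(X, y, subset):
    """delta(S) = 1/(2N) * sum_i (y_i - y_NN(i))^2, neighbours measured on S only."""
    n = len(X)
    total = 0.0
    for i in range(n):
        nearest, best = None, float("inf")
        for k in range(n):
            if k == i:
                continue
            dist = sum((X[i][j] - X[k][j]) ** 2 for j in subset)
            if dist < best:
                nearest, best = k, dist
        total += (y[i] - y[nearest]) ** 2
    return total / (2.0 * n)


rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(200)]
# Hypothetical response: depends on variables 0 and 1 only, plus small noise.
y = [4.0 * x[0] + 2.0 * x[1] + rng.gauss(0.0, 0.05) for x in X]

for subset in ([0], [1], [2], [0, 1], [0, 1, 2]):
    print(subset, round(delta_test(X, y, subset), 4))
```

Subsets that omit a relevant variable give a clearly larger criterion than the subset containing both relevant variables, which is the behavior the screening relies on.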

Morris method
The Morris method is a gradient-based SA method using an individually randomized Morris one-factor-at-a-time (MOAT) design (Morris, 1991). This study employed an enhanced Morris method (Campolongo et al., 2007). Consider a model with $k$ independent inputs $X_i$ ($i = 1, \ldots, k$), whose ranges are normalized to [0, 1]. The experimentation region is a discrete $k$-dimensional $p$-level grid. For a given point $X^0 = (x_1, x_2, \ldots, x_k)$, the elementary effect of variable $X_j$ is defined as
$$d_j = \frac{y(x_1, \ldots, x_{j-1}, x_j + \Delta, x_{j+1}, \ldots, x_k) - y(X^0)}{\Delta},$$
where $\Delta$ is a value in $\{1/(p-1), \ldots, (p-2)/(p-1)\}$. The sampling strategy generates a random starting point for each trajectory and then completes it by perturbing one input variable at a time by $+\Delta$ or $-\Delta$ in a random order. At the end of the process, a trajectory spanning $k+1$ points is evaluated to compute the elementary effects for all $k$ input variables. After repeating this procedure $r$ times to construct $r$ trajectories of $k+1$ points in the input space, the total cost of the experiment is thus $r \times (k+1)$. The mean of $|d_j|$, $\mu_j$, and the standard deviation of $d_j$, $\sigma_j$, can be construed as the sensitivity indices of input variable $X_j$:
$$\mu_j = \frac{1}{r} \sum_{i=1}^{r} |d_j(i)|, \qquad \sigma_j = \sqrt{\frac{1}{r-1} \sum_{i=1}^{r} \left( d_j(i) - \frac{1}{r} \sum_{l=1}^{r} d_j(l) \right)^2},$$
where $d_j(i)$ is the elementary effect of $X_j$ on the $i$th trajectory, $\mu_j$ assesses the overall influence of $X_j$ on the output, and $\sigma_j$ estimates the higher-order effects (i.e., effects due to interactions) of $X_j$.
Because of its small computational demands, the Morris method has been widely applied. Herman et al. (2013) demonstrated that it was able to correctly identify sensitive and insensitive parameters for a highly parameterized, spatially distributed watershed model with 300 times fewer model evaluations than the Sobol' method.
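The trajectory design and the two indices can be sketched as follows. This is a minimal illustration, not the PSUADE implementation: it assumes an even number of levels $p$, a fixed step of $p/2$ grid levels (so that exactly one of $+\Delta$ or $-\Delta$ stays inside [0, 1]), and the sample standard deviation with $r - 1$ in the denominator. The test function and all names are hypothetical.

```python
import math
import random


def morris_indices(model, k, r=20, p=4, seed=0):
    """Return (mu, sigma) from r trajectories on a p-level grid in [0, 1]^k."""
    rng = random.Random(seed)
    jump = p // 2                               # Delta = jump / (p - 1)
    effects = [[] for _ in range(k)]
    for _ in range(r):
        levels = [rng.randrange(p) for _ in range(k)]   # random starting point
        y_old = model([l / (p - 1) for l in levels])
        order = list(range(k))
        rng.shuffle(order)                      # perturb inputs in random order
        for j in order:
            step = jump if levels[j] + jump <= p - 1 else -jump
            levels[j] += step
            y_new = model([l / (p - 1) for l in levels])
            effects[j].append((y_new - y_old) / (step / (p - 1)))
            y_old = y_new
    mu = [sum(abs(d) for d in e) / r for e in effects]
    sigma = []
    for e in effects:
        mean = sum(e) / r
        sigma.append(math.sqrt(sum((d - mean) ** 2 for d in e) / (r - 1)))
    return mu, sigma


calls = []


def toy_model(x):
    # Hypothetical test function: x0 is linear, x1 and x2 interact, x3 is inert.
    calls.append(1)
    return 10.0 * x[0] + 5.0 * x[1] * x[2] + 0.0 * x[3]


mu, sigma = morris_indices(toy_model, k=4)
print(len(calls), [round(m, 3) for m in mu], [round(s, 3) for s in sigma])
```

The call counter confirms the cost of $r \times (k+1)$ model runs. For the toy function, the purely linear input has a nonzero mean effect with essentially zero spread, while the interacting inputs show a nonzero $\sigma_j$.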

Sobol' method
The Sobol' method (Sobol', 1993) is a quantitative SA method based on variance decomposition theory, which decomposes the variance of the output as
$$V = \sum_{i=1}^{n} V_i + \sum_{1 \le i < j \le n} V_{ij} + \cdots + V_{1,2,\ldots,n},$$
where $n$ is the total number of variables, $V_i$ represents the part of the output variance that can be explained by the $i$th variable alone, $V_{ij}$ represents the part that can be explained by the interaction of the $i$th and $j$th variables, and $V_{1,2,\ldots,n}$ represents the part that can be explained by the interaction of all the variables. The Sobol' sensitivity index is defined as $S_{i_1,\ldots,i_s} = V_{i_1,\ldots,i_s}/V$, where $V_{i_1,\ldots,i_s}$ denotes the variance corresponding to $(i_1, \ldots, i_s)$, and the integer $s$ is called the order or the dimension of the index. All the values of $S_{i_1,\ldots,i_s}$ are nonnegative, and their sum is
$$\sum_{i=1}^{n} S_i + \sum_{1 \le i < j \le n} S_{ij} + \cdots + S_{1,2,\ldots,n} = 1,$$
where $S_i = V_i/V$ is the main effect (first-order effect) of the $i$th variable, and $S_{ij} = V_{ij}/V$ is the interaction effect (second-order effect) of the $i$th and $j$th variables (Sobol', 2001). The total effect of the $i$th variable can be obtained by Eq. (6), where $V_{-i}$ is the variance without considering the $i$th variable (Homma and Saltelli, 1996):
$$S_{T_i} = 1 - \frac{V_{-i}}{V}. \tag{6}$$
The total effect reflects the variable's contribution to the variance of the model output. The values of these indices for important variables are generally much higher than those for unimportant ones. The Sobol' method can provide reliable quantitative sensitivity information about the input variables. However, for a high-dimensional problem, it needs a large number of model runs ($10^4$ to $10^5$ or more). For example, Rosolem et al. (2012) used 45 000 model runs to assess the Sobol' sensitivity indices of 42 parameters in the Simple Biosphere 3 (SiB3) model. Zhang et al. (2013) used 60 000 model runs to study the sensitivities of 28 parameters in the Soil and Water Assessment Tool (SWAT) model with the Sobol' method. If a small number of model runs is used, the estimates of the total effects vary greatly around the analytical values, and at times can take on unphysical negative values (Saltelli et al., 2000). To avoid unphysical variance values and to reduce the need for an extremely large number of model runs, we carried out the Sobol' analysis on a response surface model instead of the original model. The response surface model here is constructed by the MARS method, introduced in Sect. 2.3. The effectiveness of the response-surface-model-based Sobol' method (RSMSobol) has been demonstrated by Storlie et al. (2009). To assess the importance of parameter $P(i)$, we computed the relative value of its total effect:
$$\hat{S}_{T_{P(i)}} = \frac{S_{T_{P(i)}}}{\sum_{j=1}^{n} S_{T_{P(j)}}}. \tag{7}$$
The cumulative importance of a subset of parameters, $A$, can be computed as
$$\hat{S}_{T_A} = \sum_{P(i) \in A} \hat{S}_{T_{P(i)}}. \tag{8}$$

Experimental setup

CoLM and adjustable parameters
CoLM (Dai et al., 2003) is a widely used land surface model. It combines the advantages of three existing land surface models: the Land Surface Model (LSM) (Bonan, 1996), the Biosphere-Atmosphere Transfer Scheme (BATS) (Dickinson et al., 1993) and the Institute of Atmospheric Physics land surface model (IAP94) (Dai and Zeng, 1997). In recent years, it has incorporated additional physical processes such as glaciers, lakes, wetlands and dynamic vegetation. It has also been successfully implemented in several global atmospheric models (Yuan and Liang, 2010). CoLM considers biophysical, biochemical, ecological and hydrological processes. The energy and water transmission among soil, vegetation, snow and atmosphere is well described. The model contains one vegetation layer, 10 unevenly distributed vertical soil layers, and up to five snow layers (depending on the snow depth). The parameterization schemes of soil thermal and hydraulic properties are derived from Farouki (1986), Clapp and Hornberger (1978) and Cosby et al. (1984). The parameterization scheme of snow is synthesized from Anderson (1976), Jordan (1991) and Dai et al. (1997).
In this study, forty of the time-invariant coefficients and exponents in CoLM, i.e., model parameters, are chosen as parameters that can be adjusted according to local conditions. Their physical meanings and value ranges are shown in Table 2. These adjustable parameters can be classified into three categories: canopy, soil and snow. The default canopy parameters depend on the vegetation type in the 24-category (USGS) vegetation dataset. Soil parameters depend on the soil texture in the 17-category (FAO-STATSGO) soil dataset. Snow parameters depend on the snow depth. In this paper, the parameter ranges are the lower and upper bounds among all possible canopy, soil and snow types (Ji and Dai, 2010). Note that the initial parameter ranges can have a significant influence on the result of the sensitivity analysis. For example, consider $y = (a^2 + b)x$, where the ranges of input "x" and parameter "b" are both [0, 1]. Parameter "a" is sensitive when the absolute value of "a" is very large, and insensitive when "a" is close to zero. The initial parameter ranges must therefore be carefully selected, and the analysis result may be valid only for these ranges. For convenience, the parameters are indexed from P1 to P40.
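The dependence on the chosen range can be checked numerically. The sketch below reads the example as $y = (a^2 + b)x$ and compares how far $y$ moves when one parameter sweeps its whole range while the others stay fixed; the two candidate ranges for "a" and the fixed base values are hypothetical choices made for illustration.

```python
def output(a, b, x):
    return (a ** 2 + b) * x


def swing(param, lo, hi, base):
    """Change in y when one parameter moves across its range, others fixed."""
    low, high = dict(base, **{param: lo}), dict(base, **{param: hi})
    return abs(output(**high) - output(**low))


base = {"a": 0.0, "b": 0.5, "x": 0.5}
swing_b = swing("b", 0.0, 1.0, base)            # b over [0, 1]
swing_a_narrow = swing("a", 0.0, 0.1, base)     # a confined near zero
swing_a_wide = swing("a", 0.0, 10.0, base)      # a allowed to be large
print(swing_a_narrow, swing_b, swing_a_wide)
```

With the narrow range, "a" moves the output far less than "b" does; with the wide range it dominates, so the same parameter can be screened in or out purely on the basis of the range assigned to it.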
This study screens sensitive parameters for six land surface fluxes: sensible heat, latent heat, upward longwave radiation, net radiation, soil temperature and soil moisture. The objective function is the root-mean-squared error normalized by the geometric mean (Parada et al., 2003). In its definition, $N$ is the number of observations, $j$ indexes the time step, $y^{\mathrm{sim}}_{i,j}$ and $y^{\mathrm{obs}}_{i,j}$ are the simulated and observed values, and $i$ ranges from one to six, denoting the different flux types. All the objective functions and their descriptions are shown in Table 3. The objective function represents the performance of the simulation, so a smaller RMSE means a better performance.

Study area and datasets
The Heihe River basin, the second largest inland river basin in the arid region of northwest China, is located between 96°42′-102°00′ E and 37°41′-42°42′ N, and covers an area of approximately 130 000 km². The basin, whose altitude varies approximately from 0 to 5500 m, is covered by a variety of land use types, including desert, farmland, forest, grassland and snow cap. Therefore, it is an ideal region for the study of LSMs. In this paper, the A'rou observation station, which is located in the upstream part of the Heihe River basin, is chosen as the study site. The results of the SA method intercomparison will be helpful for follow-up research projects covering the whole region. The geographic coordinates of A'rou are 100°28′ E, 30°08′ N (see Fig. 1); its altitude is 3032.8 m above sea level. It has a typical continental climate. The underlying surface type is alpine steppe.
The forcing data and validation data are shown in Table 4. The forcing data of CoLM include downward shortwave and longwave radiation, precipitation, air temperature, relative humidity, air pressure and wind speed (Hu et al., 2008). The validation data contain observations of the six fluxes. These six fluxes are all important physical quantities exchanged between the land surface and the atmosphere. Soil temperature and moisture data are available for depths of 10, 20, 40 and 80 cm. Because the soil column in CoLM is divided into 10 layers (the depths are shown in Table 5), we used linear interpolation to obtain the simulated soil temperature and moisture at the observed depths.
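The interpolation step can be sketched as below. The layer node depths and temperatures are hypothetical placeholders (the actual CoLM layer depths are those listed in Table 5), and the function name is an assumption made for this example.

```python
def interpolate_to_depth(node_depths, node_values, target_depth):
    """Linearly interpolate a layer profile to one observation depth."""
    pairs = list(zip(node_depths, node_values))
    for (z0, v0), (z1, v1) in zip(pairs, pairs[1:]):
        if z0 <= target_depth <= z1:
            weight = (target_depth - z0) / (z1 - z0)
            return v0 + weight * (v1 - v0)
    raise ValueError("target depth lies outside the modelled soil column")


# Hypothetical node depths (cm) and simulated soil temperatures (deg C);
# these are not the CoLM layer depths from Table 5.
node_depths = [5.0, 15.0, 30.0, 60.0, 100.0]
soil_temperature = [10.0, 8.0, 6.0, 5.0, 4.0]
observed_depths = [10.0, 20.0, 40.0, 80.0]
at_obs = [interpolate_to_depth(node_depths, soil_temperature, z) for z in observed_depths]
print(at_obs)
```

The interpolated values can then be compared directly with the observations at 10, 20, 40 and 80 cm when the objective function is evaluated.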
The data for the year 2008 were used to spin up CoLM. Model simulations from 1 January 2009 to 31 December 2009 with a 3 h time step were used to evaluate model parameter sensitivity.

Design of sensitivity study
This study used a newly developed software package named Problem Solving environment for Uncertainty Analysis and Design Exploration (PSUADE) (Tong, 2005) for all sensitivity analyses. PSUADE implements various uncertainty quantification (UQ) tools such as design of experiments, sampling methods, qualitative and quantitative sensitivity analysis, response surfaces, uncertainty assessment, and numerical optimization.
We conducted the SA study in two stages: qualitative parameter screening and quantitative validation. In the first stage, the study investigates the proper sampling designs and sample sizes for the different qualitative SA methods. Once the proper sampling design and sample size are determined for each qualitative method, the most sensitive parameters that control each of the six flux simulations are identified. In the second stage, the quantitative method, RSMSobol, is used to validate the parameter screening results from the first stage based on the contributions of the screened parameters to the total variances of the model outputs. The parameter screening results are also checked for their consistency with the parameters' physical interpretations.

Sampling methods and sample sizes
We tested and compared different sampling methods and sample sizes (see Table 6). For SOT, MARS and DT, three sampling methods were evaluated: Monte Carlo (MC) (Hastings, 1970), Latin Hypercube (LH) (McKay et al., 2000) and LPTAU (quasi-random sequences) (Statnikow and Matusov, 2002). For each sampling method, three sample sizes, 200, 400 and 1000 (i.e., 5, 10 and 25 times the number of parameters, respectively), were investigated. The Morris method has its own sampling method, and its sample size is generally set to a multiple of $n + 1$, where $n$ is the number of parameters. Therefore, this study tested three sample sizes for the Morris method: 205, 410 and 1025.
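To make the difference between the MC and LH designs concrete, the following sketch generates both for 40 parameters and 400 samples using only the standard library. It is a generic textbook construction rather than the PSUADE samplers, the unit bounds are placeholders for the Table 2 ranges, and the LPTAU quasi-random design is not reproduced here.

```python
import random


def monte_carlo(n, k, rng):
    """Plain Monte Carlo: n independent uniform points in [0, 1]^k."""
    return [[rng.random() for _ in range(k)] for _ in range(n)]


def latin_hypercube(n, k, rng):
    """Latin hypercube: each dimension has exactly one point in each of n strata."""
    columns = []
    for _ in range(k):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[j][i] for j in range(k)] for i in range(n)]


def scale(sample, bounds):
    """Map unit-cube points to the physical parameter ranges."""
    return [[lo + u * (hi - lo) for u, (lo, hi) in zip(row, bounds)] for row in sample]


rng = random.Random(42)
n_params, n_samples = 40, 400                 # 10 samples per parameter
bounds = [(0.0, 1.0)] * n_params              # placeholder ranges, not Table 2
lh_design = scale(latin_hypercube(n_samples, n_params, rng), bounds)
mc_design = scale(monte_carlo(n_samples, n_params, rng), bounds)
print(len(lh_design), len(lh_design[0]), len(mc_design))
```

The stratification guarantees that every parameter is sampled evenly across its range even at small sample sizes, which is one plausible reason an LH design can need fewer model runs than plain MC.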
Take, for example, the results of SOT for the parameters to which sensible heat flux is most sensitive. The SOT sensitivity scores of the 40 parameters given by different sampling designs are shown in Fig. 2. The numbers along each circle represent different parameters, and the lengths of the needles, which range from 0 to 100, indicate the relative sensitivities of the parameters.
Fig. 2 shows the most important parameters according to the SOT method. With 1000 samples, all sampling methods identified the same sensitive parameters: P36, P6, P30, P2, P34 and P17. When the sample size is reduced to 400 for LH and LPTAU, the results are similar to those at 1000 samples, suggesting that a sample size of 400 is adequate for identifying the most sensitive parameters. With 400 samples, SOT based on the MC sampling method can still screen out the same parameters, but the medium sensitive parameters, P2, P34 and P17, are not as clearly identified. With 200 samples, even though SOT with all three sampling methods can still find all sensitive parameters, the relative sensitivities of the medium sensitive parameters are too small to be seen clearly (e.g., P17). This suggests that 200 samples may not be enough for the SOT method. Thus, LH and LPTAU are considered to be better sampling designs for SOT, and 400 samples are enough for these sampling designs.
Similarly, Figs. 3-5 show the results of the MARS, DT and Morris methods. We make the following observations: (1) for the MARS method, the results based on MC, LH and LPTAU are nearly the same, and 400 samples are enough for all sampling methods; (2) LH is more suitable for DT, and 400 samples are enough; and (3) for the Morris method, 410 samples are enough.
Based on the above results, a sample size of about 10 times the number of parameters appears to be enough for qualitative SA methods to screen the 40 parameters of CoLM. In the following study, LH is chosen for SOT, LPTAU is chosen for MARS, and LH is chosen for DT. The sample size is set to 400 for these three designs. For the Morris method, the sample size is set to 410.

Intercomparison of qualitative SA methods
The parameter screening results of all qualitative SA methods for all fluxes are summarized in Figs. 6-11. The sensitivity scores of the 40 parameters are normalized to [0, 1]. The most sensitive parameters get a score of 1, while the least sensitive ones get a score of 0. The vertical axis in these figures denotes the different SA methods and the horizontal axis denotes the 40 parameters. The grey scale of each grid cell indicates the sensitivity level of each parameter according to each SA method. In Fig. 6, for example, the dark grey color for P6 and P36 indicates that they are the most sensitive parameters for sensible heat flux.
From these figures we have three interesting findings. First, for each land surface flux, the number of sensitive parameters appears to be less than 10. Latent heat and sensible heat fluxes have more sensitive parameters than the other fluxes, which have only 2-3 sensitive parameters. Second, the results of the SOT, MARS and Morris methods are consistent with each other except in the case of latent heat. For latent heat, the number of sensitive parameters is larger than that for the other fluxes (this is confirmed in the following quantitative SA). The SOT, MARS and Morris methods gave similar results for the most sensitive parameters, but there are some discrepancies in identifying the medium sensitive parameters for latent heat. Third, the results of the local method and DT appear very different from those of the other methods. The local method often takes sensitive parameters as insensitive ones (type I error, e.g., P3 for soil moisture) or insensitive parameters as sensitive ones (type II error, e.g., P20 and P27 for sensible heat). A possible reason is that the local behavior near one specific parameter set differs from the global behavior. The most sensitive parameters given by DT are similar to those given by the other methods, but the results for medium sensitive parameters are significantly different, especially when there are a large number of sensitive parameters (e.g., in the cases of sensible heat and latent heat). We suspect that the GA used in DT failed to find the optimal parameter subset in those cases.

Validation of parameter screening results
The qualitative SA methods identified the most sensitive parameters for the different fluxes, as shown in the previous section. Here we use the RSMSobol method to confirm whether these findings are reasonable. The total effect is computed by RSMSobol using 2000 samples to assess the importance of each parameter. The results are shown in Fig. 12, in which each slice of the pie chart indicates the relative importance of a parameter, as computed by Eq. (7). The RSMSobol results are deemed reliable since the training and testing errors of the response surface are below 2.5 %. The training error is computed from the training samples, which are used to construct the response surface, while the testing error is computed from the other samples. We note from Fig. 12 that the number of important parameters for each flux is indeed small (i.e., 2-8). This confirms that the results of the qualitative SA methods are reasonable.
Table 7 shows the cumulative importance of the 10 most sensitive parameters selected by the different qualitative SA methods, as computed according to Eq. (8). An SA method is regarded as effective if the cumulative importance of its 10 most sensitive parameters is close to 100 %. The local method is ineffective in screening the important parameters for sensible heat (79.74 %), latent heat (57.98 %), upward longwave radiation (51.57 %) and net radiation (85.71 %), while the other methods are effective because the cumulative importance of their 10 most sensitive parameters is close to 100 %. Furthermore, to confirm the effectiveness of the global SA methods, Fig. 13 shows the cumulative importance of the top 10 sensitive parameters screened by the different SA methods. According to Fig. 13, the SOT, MARS and Morris methods performed well for all the land surface fluxes, as their cumulative importance curves are always higher than the others.
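The validation metric can be illustrated with a short sketch. It assumes that the relative importance of a parameter is its total effect divided by the sum of all total effects, and that the cumulative importance of a screened subset is the sum of those shares; the total-effect values below are made up for the example and are not results from this study.

```python
def relative_importance(total_effects):
    """Share (in %) of each parameter in the sum of all total effects."""
    total = sum(total_effects.values())
    return {name: 100.0 * value / total for name, value in total_effects.items()}


def cumulative_importance(total_effects, selected):
    """Summed relative importance (in %) of a screened parameter subset."""
    shares = relative_importance(total_effects)
    return sum(shares[name] for name in selected)


# Made-up total effects for five parameters; not results from this study.
total_effects = {"P6": 0.50, "P36": 0.30, "P2": 0.10, "P17": 0.06, "P20": 0.04}
good_screen = cumulative_importance(total_effects, ["P6", "P36", "P2"])
poor_screen = cumulative_importance(total_effects, ["P17", "P20", "P2"])
print(round(good_screen, 2), round(poor_screen, 2))
```

A screening method that picks the truly influential parameters scores close to 100 %, whereas one that misses them scores much lower, which is how the local method's low percentages in Table 7 should be read.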
DT is prone to selecting more parameters than the other methods (committing type II errors) and does not distinguish the medium sensitive from the highly sensitive parameters. However, the validation shows that the most sensitive parameters selected by DT are nearly the same as those given by the other global methods, even though the medium sensitive parameters may differ from the ones identified by other SA methods. This suggests that a type II error possibly committed by DT is not as damaging as a type I error, as in the case of the local method.
In summary, the global SA methods (SOT, MARS, DT and Morris) can reliably screen the most sensitive parameters with only 400 samples for a 40-parameter problem, even though DT may commit a type II error. Local gradient SA is helpful if we are interested in particular events or a special parameter set, but it might give misleading results when we are concerned with analyzing global behavior.

The consistency of the screening results and physical interpretations
In previous sections, we used five different qualitative SA methods to identify the most sensitive parameters for all flux types.The quantitative RSMSobol method confirms that the qualitative SA results are reasonable.Here we try to explain the SA results based on physical interpretations of the screened parameters.P6 and P3 are shown to be the most important parameters for soil moisture (see Figs. 11 and 12).From Clapp and Hornberger (1978), P6 (Clapp and Hornberger "b" parameter) is the exponent of wetness in the formulas for soil hydraulic conductivity and water potential, and P3 (porosity) is a part of the denominator in the formulas that compute wetness.A small perturbation in these values would result in much change to soil moisture.Therefore these two parameters are sensitive for soil moisture.It should be mentioned that P2 (saturated hydraulic conductivity) and P4 (minimum soil suction) will also affect the simulation of soil moisture (see Fig. 11), but they are not as sensitive as P6 and P3, which have an exponential relationship with soil moisture.
Besides soil moisture, P6 is also important for other land surface fluxes (see Fig. 12).This is because soil moisture is an important model output that is tied to sensible heat flux, latent heat flux and radiant fluxes (Henderson-Sellers, 1996).A parameter that exerts great influence on soil moisture should have an important impact on related fluxes.This finding is consistent with those of Lettenmaier et al. (1996).
P36 (aerodynamic roughness length) is another important parameter for sensible heat, latent heat, upward longwave radiation, net radiation and soil temperature (see Fig. 12).Through its influence on friction velocity, P36 affects the magnitude of aerodynamic resistance and near-surface drag force for the simulation of sensible heat, latent heat, and radiant fluxes, and therefore indirectly affects estimates of soil temperature (Dorman and Sellers, 1989).P17 (the inverse of square root of leaf dimension), P30 (longwave reflectance of living leaf) and P34 (longwave transmittance of living leaf) are sensitive to the simulations of surface temperature and air temperature.Accordingly, they are important for sensible heat and net radiation.The sensitivity of other parameters, including P18 (quantum efficiency of vegetation photosynthesis) and P4 (minimum soil suction), to latent heat can be explained by their influence on evapotranspiration.
However, not all the parameters in the screening results can be explained based on physical interpretations (e.g., P12 in the screening result for latent heat). Possible reasons are (1) limitations of the SA methods and the sample sizes, so that insensitive parameters might be regarded as sensitive ones; (2) a lack of authenticity of the model structure, as the physical processes might not be described perfectly; (3) local conditions or a lack of appropriate observations for sensitivity evaluation (e.g., saturated hydraulic conductivity P2 is not sensitive because there are no runoff observations); (4) input uncertainty caused by observation error, which may have a non-negligible influence on the sensitivity analysis; and (5) non-uniqueness, since the set of sensitive parameters screened for a complex model may not be unique.

Conclusions
In this study, we first identified the most sensitive parameters for sensible heat, latent heat, upward longwave radiation, net radiation, soil temperature and soil moisture using five different qualitative SA methods. We investigated the proper sampling design and sample size necessary for screening the parameters effectively. Based on the SA results, 2-8 parameters are deemed most sensitive in CoLM, depending on the flux type. We employed a quantitative SA method to confirm the screening results. The results of the quantitative method are consistent with those of the qualitative methods. Moreover, the screening results are generally consistent with the physical interpretations of the model parameters.
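The total-effect index that underlies the quantitative check can be illustrated with a small Monte Carlo sketch. It uses the Jansen estimator on a hypothetical three-parameter linear toy model rather than on CoLM, and it evaluates the model directly instead of through a response surface as RSMSobol does. The function names, the toy model and the sample size are assumptions made for illustration only.

```python
import numpy as np

def total_effect_indices(model, n_params, n_samples, rng):
    """Jansen estimator of Sobol' total-effect indices on the unit hypercube."""
    a = rng.random((n_samples, n_params))
    b = rng.random((n_samples, n_params))
    y_a = model(a)
    var_y = np.var(np.concatenate([y_a, model(b)]))
    st = np.empty(n_params)
    for i in range(n_params):
        ab = a.copy()
        ab[:, i] = b[:, i]            # resample only parameter i
        st[i] = 0.5 * np.mean((y_a - model(ab)) ** 2) / var_y
    return st

def toy_model(x):
    """Hypothetical model: the third parameter has no influence."""
    return x[:, 0] + 2.0 * x[:, 1] + 0.0 * x[:, 2]

rng = np.random.default_rng(0)
st = total_effect_indices(toy_model, n_params=3, n_samples=20000, rng=rng)
relative_importance = st / st.sum()   # normalized so that the values sum to 1
print(np.round(st, 3), np.round(relative_importance, 3))
```

For this toy model the analytical total effects are 0.2, 0.8 and 0, so the estimator should rank the second parameter first and assign no importance to the third.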
By using meteorological and land surface observation data from A'rou in the Heihe River basin in northwestern China, this study demonstrates the feasibility of employing different qualitative global SA methods to find the most important parameters in a complex model, an approach similar to that used by Massmann and Holzmann (2012). Having compared different methods, we confirmed that global SA methods are more suitable for screening out the most sensitive parameters from the insensitive ones in complex models. Because the rankings of screened parameters differ somewhat among SA methods, we suggest that multiple SA methods be applied to a complex problem, which is also supported by Neumann (2012).
For a 40-parameter CoLM, we were able to screen out the most important parameters using only about 400 samples, which is similar to Confalonieri et al. (2010). The kind of parameter screening approach studied here should be applicable to other complicated models. However, caution must be exercised in interpreting these results. The parameters identified in this study were obtained with data of limited length and at a single site with particular geographical conditions. Results from a different location or a different condition can be quite different from the ones shown in this study. The screened parameters are also tied to available land surface fluxes used in the study. Parameters such as saturated hydraulic conductivity (P2) were not considered important parameters because we did not examine parameter sensitivity to runoff generation. To truly understand the parameter sensitivity for CoLM, we need to conduct a more comprehensive SA study by including more geographical locations, more observation data types and longer datasets. In future research, parameter screening of CoLM will be extended to regional and even global scale by using more available data.
Even though we identified the most important parameters for CoLM, we did not perform model calibration to obtain the most appropriate estimates for these parameters. Model calibration for complex multi-flux, high-dimensional LSMs such as CoLM can be extremely complicated. To calibrate models in such cases, future studies must explore more mathematical tools, including the surrogate modeling approach, to save computational resources and thereby make a multi-objective optimization strategy feasible for the calibration of multi-physics models.
soil temperature and moisture data contain the data of 10, 20, 40, 80 and 120 cm, respectively.

Fig. 2. The sensitivity score of sensible heat given by SOT. The length of each needle, which ranges from 0 to 100, represents the sensitivity score of each parameter.

Fig. 3. The sensitivity score of sensible heat given by MARS. The length of needles represents the sensitivity score.

Fig. 4. The sensitivity score of sensible heat given by DT. The length of needles represents the sensitivity score.

Fig. 5. The sensitivity score of sensible heat given by the Morris method. The length of needles represents the sensitivity score.

Fig. 6. The qualitative sensitivity analysis results of different methods for sensible heat. The sensitivity scores are normalized to [0, 1]; 1 means most sensitive and 0 means least sensitive.

Fig. 7. The qualitative sensitivity analysis results of different methods for latent heat. The sensitivity scores are normalized to [0, 1]; 1 means most sensitive and 0 means least sensitive.

Fig. 8. The qualitative sensitivity analysis results of different methods for upward longwave radiation. The sensitivity scores are normalized to [0, 1]; 1 means most sensitive and 0 means least sensitive.

Fig. 9. The qualitative sensitivity analysis results of different methods for net radiation. The sensitivity scores are normalized to [0, 1]; 1 means most sensitive and 0 means least sensitive.

Fig. 10. The qualitative sensitivity analysis results of different methods for soil temperature. The sensitivity scores are normalized to [0, 1]; 1 means most sensitive and 0 means least sensitive.

Fig. 11. The qualitative sensitivity analysis results of different methods for soil moisture. The sensitivity scores are normalized to [0, 1]; 1 means most sensitive and 0 means least sensitive.

Fig. 12. The relative importance of parameters obtained by RSMSobol' total effect analysis.

Fig. 13. The relationship between the number of screened parameters and cumulative relative importance for different sensitivity analysis methods.

Table 1. Comparison of different SA methods.

Table 2. Adjustable parameters and their ranges.

Table 3. The objective functions.

Table 4. The forcing data and validation data taken from A'rou observation station.

Table 5. The depth of each layer.

Table 6. The experiment designs to confirm the proper sampling methods and sample size for SA methods.

Table 7. The cumulative importance of the 10 most sensitive parameters screened by different qualitative SA methods.
Columns: SA method, Sensible heat, Latent heat, Upward longwave, Net radiation, Soil temperature, Soil moisture.