Regionalization of hydrological model parameters using gradient boosting machine

This discussion paper examines the use of gradient boosting machine learning to model the dependency of model parameters and estimate model parameters for four climate regions in China. The study also enhances the DTVGM model by incorporating the Penman-Monteith-Leuning (PML) equation. The discussion paper is well-presented, the study design is nicely conceived, and the results and discussion are presented fairly, with references to supporting figures, tables or cited literature, where appropriate. I only have minor comments to enhance what is already a high-quality manuscript.


Responses to Referee #1's comments
The manuscript is interesting and clearly written. I have a single comment: what are the practical reasons for regionalizing the parameters of the hydrological model using regression algorithms, given that the available data cover the study region uniformly? In my opinion, regionalization is therefore not needed (the parameters are already available).
Response: We sincerely appreciate the referee's positive comments on the manuscript.
Our response to the reviewer's concern about regionalizing the model parameters is as follows and has been revised in the discussion part of the manuscript (Sec. 4.4, Page 29).
"In this study, although parameters were calibrated and thus available at each gridcell, the parameter values at around 450 gridcells were not reliable owing to poor model performance (i.e., KGE < 0) (Knoben et al., 2018; Koskinen et al., 2017; Sutanudjaja et al., 2018). Therefore, we only used the calibrated parameters with KGE ≥ 0 (i.e., representing better model performance) for the regionalization of parameters. When we re-ran the model with the regionalized parameters, the model performance improved at 53% of the gridcells that had KGE < 0 prior to regionalization. In particular, the KGE values at 37% of these gridcells became positive, indicating a substantive improvement in modeling performance.
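The screening and re-evaluation step described above can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes the original Kling-Gupta efficiency of Gupta et al. (2009), KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2). Here r is the linear correlation, alpha the ratio of simulated to observed standard deviations, and beta the ratio of simulated to observed means. The response does not state which KGE variant was used. The helper names (`split_by_kge`, `improvement_stats`), the gridcell identifiers and the KGE values are hypothetical.

```python
import math
from statistics import mean, pstdev

def kge(sim, obs):
    """Kling-Gupta efficiency in the original form of Gupta et al. (2009) (assumed variant)."""
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (pstdev(sim) * pstdev(obs))   # linear correlation
    alpha = pstdev(sim) / pstdev(obs)       # variability ratio
    beta = ms / mo                          # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def split_by_kge(kge_by_cell, threshold=0.0):
    """Hypothetical helper: cells with KGE >= threshold train the regionalization; the rest are 'poor'."""
    train = [c for c, k in kge_by_cell.items() if k >= threshold]
    poor = [c for c, k in kge_by_cell.items() if k < threshold]
    return train, poor

def improvement_stats(kge_before, kge_after, poor_cells):
    """Hypothetical helper: fraction of poor cells whose KGE improved, and whose KGE became positive."""
    n = len(poor_cells)
    improved = sum(1 for c in poor_cells if kge_after[c] > kge_before[c])
    positive = sum(1 for c in poor_cells if kge_after[c] > 0)
    return improved / n, positive / n

# Hypothetical KGE values before (calibrated) and after (regionalized parameters)
before = {"c1": 0.62, "c2": -0.35, "c3": -1.10, "c4": 0.48}
after = {"c1": 0.60, "c2": 0.12, "c3": -0.80, "c4": 0.50}
train, poor = split_by_kge(before)
print(train, poor)                                # ['c1', 'c4'] ['c2', 'c3']
print(improvement_stats(before, after, poor))     # (1.0, 0.5)
```

In this toy case both poor cells improve, but only one crosses zero. That mirrors the distinction drawn above between "improved" (53%) and "became positive" (37%).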
Even though the parameters were well calibrated and available at each gridcell, one might ask whether, and which, topographic and edaphic properties mediate these hydrological parameters. Our machine learning (i.e., GBM) based regionalization enables the estimation of six key hydrological parameters from site-specific characteristics. Following the regionalization, our variable importance results quantitatively indicate that the runoff generation parameters are mainly controlled by slope, saturated soil moisture content, and elevation. Moreover, terrain attributes significantly regulate the runoff processes in relatively humid regions, while saturated soil moisture content becomes a limiting factor in arid areas. The regionalization of parameters will improve our mechanistic understanding of runoff generation processes and the associated key hydrological parameters under different topographic and edaphic conditions."
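The sketch below illustrates what GBM-based regionalization with variable importance involves. It fits a gradient boosting model of depth-1 regression trees (stumps) under squared-error loss. Each split's reduction in squared error is credited to the predictor it used, and these totals are normalized into importances. This is a stand-in under stated assumptions, not the study's implementation: the GBM library, tree depth, learning rate and importance definition used by the authors are not given here. The predictor names echo those reported above, but the data are synthetic.

```python
import random

def fit_stump(X, residuals):
    """Return (gain, feature, threshold, left_value, right_value) for the best single split."""
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            x_prev, x_next = X[order[k - 1]][j], X[order[k]][j]
            if x_prev == x_next:
                continue  # no threshold separates equal values
            right_sum = total - left_sum
            # Reduction in squared error versus predicting one mean for all points
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (x_prev + x_next) / 2, left_sum / k, right_sum / (n - k))
    return best

def fit_gbm(X, y, n_trees=100, learning_rate=0.1):
    """Boost stumps on squared-error loss; return the model and normalized importances."""
    f0 = sum(y) / len(y)  # initial prediction: the mean
    pred = [f0] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is simply the residual
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        gain, j, thr, left, right = best
        importance[j] += gain
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if x[j] <= thr else right) for p, x in zip(pred, X)]
    total = sum(importance) or 1.0
    return (f0, learning_rate, stumps), [v / total for v in importance]

def predict(model, x):
    f0, lr, stumps = model
    return f0 + lr * sum(left if x[j] <= thr else right for j, thr, left, right in stumps)

random.seed(0)
names = ["slope", "sat_soil_moisture", "elevation", "noise"]  # hypothetical predictors
X = [[random.random() for _ in names] for _ in range(200)]
# Synthetic parameter: driven mostly by slope, weakly by the others, not by "noise"
y = [3.0 * x[0] + 1.0 * x[1] + 0.3 * x[2] + random.gauss(0, 0.05) for x in X]
model, importance = fit_gbm(X, y)
for name, imp in sorted(zip(names, importance), key=lambda t: -t[1]):
    print(f"{name:18s} {imp:.3f}")
```

Split-gain importance is one common definition. Permutation importance or relative influence as implemented in a particular GBM package may rank predictors somewhat differently.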

Responses to Referee #2's comments
All the comments and suggestions have been replied to below and have been addressed in the revision.
(1) In several places, the model is referred to as the "China-wide hydrological model" (see L105). This is somewhat confusing. Is this a formal name for the hydrological model? If so, I would expect to see a citation after the phrase to reference the use of this name. If this is not the formal name, it may be more helpful to say "We ran a hydrological model developed by Beck et al (2020), for country of China in a spatially distributed…" In Section 2.1, the title could be changed to "Application of the hydrological model across China."
Response: Thanks for pointing this out! "China-wide hydrological model" is not a formal name for the model. We have rewritten the sentence to address this issue: "We ran a hydrological model, i.e., the Distributed Time-Variant Gain Model with the Penman-Monteith-Leuning equation (DTVGM-PML) developed in this study, for the country of China in a spatially distributed manner" [Line 106, Page 5]. We have also changed the title of Section 2.1 to "Application of the hydrological model across China". Other corresponding information has been revised [Line 103, Page 5; Line 130, Page 6; Line 549, Page 30].
(2) In my reading, there was some confusion about how you were able to compute evaluation metrics for all 15,640 grid cells (see Figure 5, for example). If you know the "truth" for runoff and ET at every grid cell, then why do you need a regionalization model?
Response: Our response to the reviewer's concern about regionalizing the model parameters is as follows and has been revised in the discussion part of the manuscript [Sec. 4.4, Page 29].
"Hydrologic models often rely on regionalization approaches to transfer information from small to large spatial scales (e.g., from gridcell to subbasin, watershed, and regional scale) (Beck et al., 2020; Mizukami et al., 2017), and from gauged to ungauged catchments (He et al., 2011; Hrachowitz et al., 2013; Pagliero et al., 2019; Parajka et al., 2013).
In this study, although parameters were calibrated and thus available at each gridcell, the parameter values at around 450 gridcells were not reliable owing to poor model performance (i.e., KGE < 0) (Knoben et al., 2018; Koskinen et al., 2017; Sutanudjaja et al., 2018). Therefore, we only used the calibrated parameters with KGE ≥ 0 (i.e., representing better model performance) for the regionalization of parameters. When we re-ran the model with the regionalized parameters, the model performance improved at 53% of the gridcells that had KGE < 0 prior to regionalization. In particular, the KGE values at 37% of these gridcells became positive, indicating a substantive improvement in modeling performance.
Even though the parameters were well calibrated and available at each gridcell, one might ask whether, and which, topographic and edaphic properties mediate these hydrological parameters. Our machine learning (i.e., GBM) based regionalization enables the estimation of six key hydrological parameters from site-specific characteristics. Following the regionalization, our variable importance results quantitatively indicate that the runoff generation parameters are mainly controlled by slope, saturated soil moisture content, and elevation. Moreover, terrain attributes significantly regulate the runoff processes in relatively humid regions, while saturated soil moisture content becomes a limiting factor in arid areas. The regionalization of parameters will improve our mechanistic understanding of runoff generation processes and the associated key hydrological parameters under different topographic and edaphic conditions."
(3) Also in Figure 5, it would be helpful to show the comparison of the model performance with and without the PML addition so that one can see in quantifiable terms how the addition of the equation improves the calibration and validation performance.
Response: Thanks for the valuable suggestion! We have shown the comparison of model performance with and without the PML addition in Supplementary Fig. S2 (see Fig. R1). We have also added the following text to describe the results: "Many previous studies have highlighted the importance of incorporating vegetation change information into hydrological models to achieve better performance in hydrological simulations (Donohue et al., 2007, 2010; Gerten, 2013; Ivanov et al., 2008; Lei et al., 2014; Thompson et al., 2011). Additionally, it has been demonstrated that coupling the PML equation into hydrological models can improve hydrological simulations under vegetation greening conditions (Bai et al., 2018; Li et al., 2009; Zhang et al., 2009; Zhou et al., 2013)."
Figure R1. Comparison of model performance in runoff (a: KGE, and b: PBIAS) and ET (c: KGE, and d: PBIAS) simulation between DTVGM and DTVGM-PML in the calibration and validation periods. KGE denotes the Kling-Gupta efficiency. PBIAS denotes the percent bias.
… were somewhat confusing. This could be my lack of familiarity with the TSS, but it may be helpful to look over those lines to see if you could improve the explanation there.
Response: Thanks for your comments. We have provided more detailed information on the Taylor skill score (TSS) in the revised paper as follows: "The Taylor skill score, a comprehensive metric combining the correlation coefficient, standard deviation, and root mean square error, has been widely used in model evaluation (Mohan and Bhaskaran, 2019; Taylor, 2001)." [Lines 214-216, Page 10]
Figure R2. Taylor skill scores (TSS) of each parameter generated from the multiple linear regression (MLR) and the gradient boosting machine (GBM). The Taylor skill scores were computed using parameters from all grid cells across China.
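For readers unfamiliar with the Taylor skill score, one common form from Taylor (2001) is S = 4(1 + R) / ((s + 1/s)^2 (1 + R0)). Here R is the correlation between regionalized and reference values, s is the ratio of their standard deviations, and R0 is the maximum attainable correlation. The sketch below is illustrative only and rests on three assumptions. It uses this form with R0 = 1, and it treats the calibrated parameters as the reference against which MLR and GBM estimates are scored. The exact variant behind Figure R2 is not specified here, and all parameter values below are hypothetical.

```python
from statistics import mean, pstdev

def taylor_skill_score(pred, ref, r0=1.0):
    """Taylor (2001) skill score S = 4(1+R) / ((s + 1/s)^2 (1+R0)), s = std(pred)/std(ref)."""
    mp, mr = mean(pred), mean(ref)
    cov = sum((p - mp) * (q - mr) for p, q in zip(pred, ref)) / len(ref)
    sp, sr = pstdev(pred), pstdev(ref)
    r = cov / (sp * sr)   # correlation between estimated and reference values
    s = sp / sr           # normalized standard deviation
    return 4.0 * (1.0 + r) / ((s + 1.0 / s) ** 2 * (1.0 + r0))

# Hypothetical values of one parameter at a few gridcells
calibrated = [0.21, 0.35, 0.48, 0.30, 0.62, 0.55, 0.40, 0.27]
mlr = [0.30, 0.36, 0.41, 0.35, 0.50, 0.47, 0.39, 0.33]   # smoother, under-dispersed estimates
gbm = [0.24, 0.33, 0.45, 0.31, 0.58, 0.52, 0.42, 0.28]
print("MLR TSS:", round(taylor_skill_score(mlr, calibrated), 3))
print("GBM TSS:", round(taylor_skill_score(gbm, calibrated), 3))
```

Under this form, S equals 1 for a perfect match. It decreases as the correlation drops or as the spread of the estimates departs from that of the reference. An under-dispersed regression estimate is therefore penalized even when its correlation is high.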