Interactive comment on “Quantifying the impacts of human water use and climate variations on recent drying of Lake Urmia basin: the value of different sets of spaceborne and in-situ data for calibrating a hydrological model” by

The present study aims to quantify (estimate) the impact of human water consumption—as for irrigation, livestock, domestic, manufacturing, and thermal energy production—versus (natural) climatic variability on the water balance and storage of the Lake Urmia (LU) basin and consequently the lake desiccation during the

The time period of the analysis
In their analysis, the authors only considered the period 2003-2013. It is not clear why both earlier and the most recent years are excluded from the analysis. It is well known that there have been significant changes in the basin and the lake water level since the late 1990s. Using a statistical change point analysis, Khazaei et al. [in review] identified the year 2000 as the beginning of the period with significant changes in the lake dynamics. So, it is crucial to include all the years after 2000; data availability is not an issue, as ULRP now provides researchers with the required data. For instance, annual inflows to the lake, which you also used for calibrating the model, are highly variable during this period (see the supplement figure). Given such variability, the modelling results are likely sensitive to this variable, and it is important to use the entire time period.
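To illustrate what a change point analysis of the kind mentioned above does, here is a minimal sketch of the Pettitt test, one common non-parametric choice for detecting a single shift in location. It is an assumption that this test suits the LU series; it is not necessarily the method used by Khazaei et al. [in review], and the function name `pettitt_test` is hypothetical. Applied to annual lake levels or inflows, the returned index marks where the series shifts.

```python
import math


def pettitt_test(x):
    """Pettitt (1979) test for a single shift in location.

    Returns (t, K, p): the shift occurs between x[t-1] and x[t], K is the
    maximum |U_t| statistic, and p is the usual approximate two-sided p-value.
    """
    n = len(x)
    best_t, best_k = 0, 0
    for t in range(1, n):
        # U_t sums sign(x_i - x_j) over all pairs with i before and j after the split.
        u = sum((x[i] > x[j]) - (x[i] < x[j]) for i in range(t) for j in range(t, n))
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    p = 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2))
    return best_t, best_k, min(p, 1.0)


# Usage with a synthetic series (hypothetical values, not LU observations):
series = [10.0] * 10 + [5.0] * 10
t, k, p = pettitt_test(series)
print(t, k, p)
```

For an annual series starting in year y0, the detected shift falls at year y0 + t; the brute-force double loop is fine for the few decades of data involved here.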

Irrigated area
You have assumed that the irrigated area in 2013 remained the same as in 2012. This assumption is known to be incorrect, as the irrigated area has increased since 2012. Chaudhari et al. [2018] estimated the cropland area of the basin using both the MODIS and HYDE 3.1 products (note also the difference between these two products in Fig. 3). Land use change is a major driver in this basin. Expansion of the irrigated area has increased evapotranspiration losses and consequently led to less runoff in the basin and, in turn, lower inflows to the lake. Therefore, I would expect a useful model to be highly sensitive to this input, and improper handling of these input data could be a major source of uncertainty.

Groundwater
It is not clear how representative the data from the 248 groundwater (GW) wells (section 2.23) are of deep GW withdrawals in the basin. Failing to include deep GW abstraction can bias your results toward underestimating both groundwater and net abstractions in the basin (Figure 7), which seems to be the case. ULRP [2015] estimated the total groundwater extraction in the basin in 2013-2014 at around 2,200 MCM, of which around 900 MCM was extracted from shallow groundwater and about 1,000 MCM from deep groundwater. In your results, the estimated GW abstraction for the same year in the most input-comprehensive variant, RS_Q_GW_NA, is around 1,200 MCM, well below the ULRP estimate. Groundwater extraction in the basin has also had an enormous impact on the runoff flowing into the lake.
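For scale, the gap can be expressed as a percent bias of the modelled abstraction $A_{\mathrm{model}}$ relative to the ULRP total $A_{\mathrm{ULRP}}$, treating the latter (itself an uncertain estimate) as the reference:

```latex
\mathrm{PBIAS} = 100 \times \frac{A_{\mathrm{model}} - A_{\mathrm{ULRP}}}{A_{\mathrm{ULRP}}}
              = 100 \times \frac{1200 - 2200}{2200} \approx -45\,\%
```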

Water consumption
Lake Urmia is a highly regulated basin, so embedding water withdrawal/consumption in the basin's model is crucial. In doing so, however, you have used water withdrawal records for the year 2009 only. There is high year-to-year variability in water withdrawal/consumption in the basin, implicitly indicated by the high variability of the lake inflow presented in your Figures 7 and 8. The year 2009 has one of the lowest lake inflows (Figure 7), possibly implying high water consumption in the basin. Given this variability, relying on a single year is not sufficient and could lead to significant bias/uncertainty in your modelling results. Further, when using water withdrawal/consumption records, be aware of possible inconsistencies between the definitions and estimates of water withdrawal/consumption/use by different water agencies/authorities or studies, and the related metrics or model fluxes. Such inconsistencies could lead to methodological fallacies and unreliable analysis results [Madani and Khatami, 2015].

Model calibration and over-parametrisation
Based on Table 2, in the variant RS_Q_GW the parameters α and β vary between 0.45 and 0.47 and between 0.47 and 0.52, respectively. However, when the NA data are introduced in the variant RS_Q_GW_NA, the parameter ranges expand significantly, to 0.29 to 0.56 for α and 0.39 to 0.63 for β. This implies that the model is not benefiting from the additional information content. For instance, there is no significant improvement from RS_Q_GW to RS_Q_GW_NA, as evidenced by Figure 8 and the metric values in Table 4; for 2003, 2004, 2007, and 2009, the RS_Q_GW variant even estimates the annual inflow to the lake better than RS_Q_GW_NA (Figure 8b). That is, the model is insensitive to the added NA data; instead, the new information is compensated for by the model parameters. This indicates over-parameterisation in your model, which is expected for annual multipliers, and it undermines the reliability of the results. While I understand the rationale behind year-specific parameter values, this is a serious issue. Instead, you should perform a more efficient and effective parameter search (rather than manual calibration) to find one or, more desirably, a number of acceptable parameter sets. Then, to ensure their reliability, you can evaluate the model performance year by year, i.e. against a given metric for each year separately. Also, it would be helpful to include a schematic of the model structure showing its fluxes, storages, and their interconnections. This can help readers better understand the mechanisms and process representation of the model.
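The year-by-year evaluation suggested above could look like the following minimal sketch. It assumes that sub-annual (e.g. monthly) observed and simulated series are available for each year; NSE is only a placeholder metric (any alternative discussed in this review could be substituted), and the names `nse_by_year` and the record layout are hypothetical.

```python
from collections import defaultdict


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; undefined (ZeroDivisionError) if obs is constant."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def nse_by_year(records):
    """Evaluate each year separately.

    records: iterable of (year, observed, simulated) tuples, e.g. monthly
    lake inflows. Returns {year: NSE} in ascending year order.
    """
    grouped = defaultdict(lambda: ([], []))
    for year, o, s in records:
        grouped[year][0].append(o)
        grouped[year][1].append(s)
    return {year: nse(obs, sim) for year, (obs, sim) in sorted(grouped.items())}


# Hypothetical monthly values for two years (illustrative only):
records = [(2003, 1.0, 1.0), (2003, 2.0, 2.0), (2003, 3.0, 3.0),
           (2004, 1.0, 2.0), (2004, 3.0, 2.0)]
print(nse_by_year(records))
```

A parameter set that scores well over the whole record but poorly in particular years would be exposed by this per-year view, which a single aggregate metric can hide.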

Model calibration/evaluation and performance metrics
It is a well-established fact that CC is an inadequate measure for model evaluation [Willmott, 1981]. It is especially redundant to use CC together with NSE, as CC is already included in the NSE metric; see the NSE decomposition by Murphy [1988] and Gupta et al. [2009]. Further, NSE puts more emphasis on larger values; e.g. Pushpalatha et al. [2012] showed that it focuses on the top 20% of flows. Therefore, calibration against NSE introduces bias, and it is more useful to combine NSE with a bias metric than with RMSE, which adds little because NSE is itself a normalised mean squared error. Furthermore, other metrics such as Willmott's refined index of agreement [Willmott et al., 2012] and KGE [Gupta et al., 2009] have been shown to be better than NSE. Also, on P 5 L 25-29, you already explained that the standard WGHM is not calibrated for the LU basin. Yet you report the results of the standard model in Figures 7-8 and discuss them throughout the results and discussion. Including the standard variant adds to the bulk of the manuscript without benefiting it. Further, it is also well known that (hydrologic) models cannot be validated [Konikow and Bredehoeft, 1992; Oreskes et al., 1994]. Therefore, using the term validation is both semantically and theoretically wrong. As a matter of good practice, it has been recommended to use the term evaluation instead of validation [Beven and Young, 2013]. The same comment applies to terms such as optimal values and optimal fit throughout the manuscript: there is no optimal set.
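To make the redundancy argument concrete, the sketch below computes NSE directly and via the decomposition of Gupta et al. [2009], NSE = 2αr − α² − β_n², where α = σ_sim/σ_obs, β_n = (μ_sim − μ_obs)/σ_obs and r is the Pearson correlation, so r is already contained in NSE. It also includes KGE, percent bias, and Willmott's refined index d_r with c = 2. This is a minimal pure-Python illustration with hypothetical function names, not a tested library; note that α and β here are decomposition terms, not the calibration parameters α and β discussed in the previous section.

```python
import math


def _moments(obs, sim):
    """Means, population standard deviations (ddof=0) and Pearson r."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    return mo, ms, so, ss, cov / (so * ss)  # fails if either series is constant


def nse(obs, sim):
    mo = sum(obs) / len(obs)
    return 1 - sum((s - o) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)


def nse_decomposed(obs, sim):
    """NSE = 2*alpha*r - alpha**2 - beta_n**2 (exact with population moments)."""
    mo, ms, so, ss, r = _moments(obs, sim)
    alpha, beta_n = ss / so, (ms - mo) / so
    return 2 * alpha * r - alpha ** 2 - beta_n ** 2


def kge(obs, sim):
    mo, ms, so, ss, r = _moments(obs, sim)
    return 1 - math.sqrt((r - 1) ** 2 + (ss / so - 1) ** 2 + (ms / mo - 1) ** 2)


def pbias(obs, sim):
    """Percent bias: positive when the simulation overestimates the total."""
    return 100 * (sum(sim) - sum(obs)) / sum(obs)


def willmott_dr(obs, sim, c=2.0):
    """Refined index of agreement (Willmott et al., 2012)."""
    mo = sum(obs) / len(obs)
    err = sum(abs(s - o) for o, s in zip(obs, sim))
    spread = c * sum(abs(o - mo) for o in obs)
    return 1 - err / spread if err <= spread else spread / err - 1


obs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
sim = [1.5, 2.0, 2.5, 4.5, 6.0]
print(nse(obs, sim), nse_decomposed(obs, sim), kge(obs, sim), pbias(obs, sim), willmott_dr(obs, sim))
```

In this toy case, the simulation has a correlation of about 0.96 yet a 10% positive bias; NSE folds both into one number, whereas reporting NSE (or KGE) alongside PBIAS keeps them separate.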

In this study, you have not explored the model parameter space beyond the four calibration variants, and the calibration was done manually. First, there is therefore no way to justify that the manually calibrated parameter values are the best-performing ones (despite what you state, e.g., on P 1 L 16). Further, neither a sensitivity nor an uncertainty analysis is performed, which is nowadays a requirement for publishing a modelling analysis in the hydrologic community. Without any sensitivity/uncertainty analysis, the reliability of the modelling results is questionable. Beyond the model structure and parameters, the role of data uncertainty is important and should be discussed. For instance, the annual inflows are not the exact inflows to the lake; they are estimates at the last (most downstream) gauging stations, sometimes as far as 50 km from the lake.
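As a minimal sketch of an automated alternative to manual calibration, the code below draws parameter pairs at random from uniform ranges and keeps every "behavioural" set whose NSE exceeds a threshold, in the spirit of GLUE-type analyses. Everything here is an assumption for illustration: `toy_inflow` is a hypothetical two-parameter stand-in, not WGHM; the ranges, sample size, and 0.7 threshold are arbitrary; and the forcing numbers are invented. With real data, the retained sets and the spread of their values would form the basis of a sensitivity and uncertainty analysis.

```python
import random


def toy_inflow(alpha, beta, precip, withdrawal):
    """Hypothetical stand-in model (NOT WGHM): inflow = alpha*P - beta*W."""
    return [alpha * p - beta * w for p, w in zip(precip, withdrawal)]


def nse(obs, sim):
    mean_obs = sum(obs) / len(obs)
    return 1 - (sum((o - s) ** 2 for o, s in zip(obs, sim))
                / sum((o - mean_obs) ** 2 for o in obs))


def sample_behavioural(obs, precip, withdrawal, n=5000, threshold=0.7, seed=1):
    """Uniform Monte Carlo sampling; keep (alpha, beta, score) with score >= threshold."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        alpha, beta = rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)
        score = nse(obs, toy_inflow(alpha, beta, precip, withdrawal))
        if score >= threshold:
            kept.append((alpha, beta, score))
    return kept


# Hypothetical annual forcing (illustrative numbers only, not LU basin data).
precip = [300, 250, 400, 350, 200, 280, 330, 260, 310, 290]
withdrawal = [100, 120, 90, 110, 130, 115, 105, 125, 95, 118]
synthetic_obs = toy_inflow(0.5, 0.5, precip, withdrawal)
behavioural = sample_behavioural(synthetic_obs, precip, withdrawal)
alphas = [a for a, _, _ in behavioural]
print(len(behavioural), min(alphas), max(alphas))
```

Because the synthetic observations are generated with α = β = 0.5, the spread of the retained values gives a simple picture of how wide a range of parameter values can reproduce the same series.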

Discussion of modelling results and equifinality
There are fundamental issues in the discussion in the first paragraph of section 3.2. First, adding new data to model calibration/evaluation does not necessarily decrease the modelling uncertainty. In fact, each new dataset introduces a new source of uncertainty, as the data themselves are uncertain; this is especially so in a case like the LU basin, where data uncertainty is very high for both ground-based and remote sensing data. Model parameter equifinality is not necessarily a problem [Savenije, 2001]; in fact, it can help us improve our modelling in the face of uncertainty [Beven, 2009]. As all (hydrologic) models are wrong [Box, 1976], model ensembles, as multiple working hypotheses, are better suited than a calibrated model with a single parameter set to describe/predict hydrologic systems given the uncertainties [Beven, 2012; Beven et al., 2012; Chamberlin, 1890; Clark et al., 2011; 2012]. Contrary to your statement, parameter equifinality does not call for additional data. Adding data may even exacerbate parameter equifinality, and one cannot make up for parameter equifinality by adjusting the parameter values. For instance, the model structure may not be able to benefit from the additional information content, in which case the new data are redundant (which seems to be the case when comparing the results of variants RS_Q_GW and RS_Q_GW_NA). What parameter equifinality implies is a more thorough search of the parameter and model space, as well as more rigorous model evaluation schemes such as the limits of acceptability approach [Beven and Binley, 2014]; even then, parameter equifinality will remain. In other words, each model variant is prone to parameter equifinality. Further, parts of the discussion in this section (e.g. the limitations of global hydrologic models in paragraph 2) are not new lessons and are well established in the literature. Also, section 3 is very long and sometimes goes into too much detail.
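A minimal sketch of a limits-of-acceptability check: each simulated value is compared with bounds derived from the uncertainty of the corresponding observation, and a run is accepted only if enough time steps fall within them. The ±20% relative error on gauged inflows, the `min_fraction` rule, and the scaled-score convention (−1 and +1 at the limits) are assumptions for illustration; in practice, the limits should come from an explicit assessment of observation uncertainty, e.g. of gauges located far upstream of the lake.

```python
def acceptable(obs, sim, rel_err=0.2, min_fraction=1.0):
    """Accept a run if at least `min_fraction` of time steps lie within
    obs * (1 - rel_err) .. obs * (1 + rel_err). Assumes obs > 0."""
    inside = [o * (1 - rel_err) <= s <= o * (1 + rel_err) for o, s in zip(obs, sim)]
    return sum(inside) / len(inside) >= min_fraction


def scaled_scores(obs, sim, rel_err=0.2):
    """Deviation scaled by the half-width of the limits: 0 at the observation,
    -1 and +1 at the lower and upper limits, |score| > 1 outside them."""
    return [(s - o) / (rel_err * o) for o, s in zip(obs, sim)]


obs = [100.0, 200.0, 50.0]  # hypothetical observed inflows
sim = [110.0, 190.0, 55.0]  # hypothetical simulated inflows
print(acceptable(obs, sim), scaled_scores(obs, sim))
```

Every parameter set that passes such a test is retained, so the outcome is an ensemble of acceptable models rather than a single "optimal" one.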
I think your discussion would be better served by presenting the top 3-5 main lessons learned as bullet points early in the section and then briefly explaining each one. The rest, especially modelling technicalities, which may be valuable for the reproducibility of the study, could be provided as a supplement. In doing so, it would be particularly helpful to restructure the discussion of results by first explicitly discussing the limitations and uncertainties associated with the modelling design and the consequent results.

On the role of human activities, climatic changes, and drought
On P 2 L 12-16, you stated that the decreasing trend in precipitation (P) and increasing trend in temperature (T), and thus increased evaporation, very likely contributed to the decrease in the lake volume. This is also reported as one of the main reasons for the lake's degradation on P 3 L 12. This statement is debatable. In our recent analysis [Khazaei et al., in review], we showed that the decrease in P and the increase in T are not considerable in explaining the shrinkage of the lake, nor can the increase in T be associated directly with an increase in lake evaporation. The major driver of change in the basin, as stated before, is land use change and the substantial expansion of cropland areas. This has led to increased irrigation and hence less runoff available as lake inflow, and it has also caused a major increase in evapotranspiration.
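Whether a trend in P or T is "considerable" is usually assessed with a formal test. Below is a minimal sketch of the two-sided Mann-Kendall test, a common non-parametric choice. It is not necessarily the method used in the studies cited here, it ignores serial correlation (which would call for a pre-whitening or variance-correction variant), and the function name `mann_kendall` is hypothetical.

```python
import math
from collections import Counter


def mann_kendall(x):
    """Two-sided Mann-Kendall trend test; returns (S, Z, p)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i in range(n - 1) for j in range(i + 1, n))
    # Variance of S with the standard correction for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return s, z, p


# Usage with a synthetic, strictly increasing series (hypothetical values):
print(mann_kendall(list(range(1, 21))))
```

A p-value above the chosen significance level (e.g. 0.05) means the test finds no evidence of a monotonic trend, which is the sense in which a trend can be called not significant.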
The next sentence in the manuscript (the last sentence of that paragraph) does not explicitly indicate the greater role of human activities in the lake desiccation compared with atmospheric climate change, which is the common finding of most studies in this area [AghaKouchak et al., 2015; Aneseh et al., 2018; Stone, 2015; Torabi Haghighi et al., 2018; Vaheddoost and Aksoy, 2018]. On P 24 L 34 you concluded that "climate change must be constrained to prevent strong decreases of precipitation and runoff". It is not clear to me what you mean by constraining climate change here. Also, as discussed previously, human activities play a more substantial role in the lake's fate than atmospheric changes.
Further, on P 3 L 2-3, you discuss the role of drought in the lake's water level decline. First, the term drought is ambiguous; it should be specified which type of drought is meant: atmospheric, hydrologic, agricultural, ecologic, or anthropogenic. Second, the analysis by AghaKouchak et al. [2015] indicated no significant trend in droughts, at the 0.05 significance level, during the past three decades. They argued that the region has undergone even more severe multi-year droughts in the past that did not cause a major change in the lake's surface area. They therefore cautioned against overrating the role of drought in the drying of the lake and the disruption of its water balance.
A technical note: you have used CC and NSE (P 9 L 11-12) to cross-compare precipitation and temperature records from different sources. First, I assume that by CC you mean the Pearson CC, which should be stated explicitly. Second, both CC and NSE are sensitive (non-resistant) measures, i.e. a few large outliers can significantly change their values, especially for skewed distributions.
It is better to use (more) resistant alternatives such as the Spearman rank correlation (instead of the Pearson correlation) and Willmott's refined index of agreement [Willmott et al., 2012], or, ideally, to first normalise the data using a transformation such as Box-Cox and then compute the distance between the time series.
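The sensitivity of the Pearson correlation to outliers, and the resistant alternatives suggested above, can be illustrated with a small sketch. In the example, a single gross outlier flips the sign of the Pearson coefficient while the Spearman coefficient stays positive. The helper names are hypothetical and the Box-Cox helper uses a fixed λ for simplicity; `scipy.stats.spearmanr` and `scipy.stats.boxcox` (which estimates λ by maximum likelihood when none is given) are tested alternatives. Box-Cox requires strictly positive data, so precipitation series containing zeros would need a small shift, and temperatures would need to be expressed in kelvin or otherwise shifted.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def box_cox(values, lmbda):
    """Box-Cox transform with a fixed lambda; requires strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]


x = [float(v) for v in range(1, 11)]  # hypothetical reference series
y = x[:9] + [-50.0]                   # same series with one gross outlier
print(pearson(x, y), spearman(x, y))
```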