Modelling hydrological responses under climate change using machine learning algorithms – semi-arid river basin of peninsular India

Catchment-scale conceptual hydrological models used in climate change impact assessment apply calibration parameters derived entirely from observed historical data. This study used two machine learning algorithms, Ensemble Regression and Random Forest, to develop dynamically calibrated factors that can form the basis for analysing hydrological responses under climate change. Based on various performance measures, the Random Forest algorithm was identified as a robust method for modelling the calibration factors from precipitation, evapotranspiration and uncalibrated runoff, even with limited data for training and testing. The developed model was then used to study the runoff response to climate-driven variability in precipitation and temperature. A statistical downscaling model based on K-means clustering, Classification and Regression Trees and Support Vector Regression was used to develop precipitation and temperature projections from MIROC GCM outputs under the RCP 4.5 scenario. The proposed modelling framework was demonstrated on the Krishna River Basin (KRB), a semi-arid river basin of peninsular India. Basin outlet runoff was predicted to decrease by 13.26% in future scenarios: the effect of a 0.6 °C increase in temperature outweighs that of a 13.12% increase in precipitation, resulting in an overall reduction in water availability over the KRB.


INTRODUCTION
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Hydrological models are promising tools for studying various water resources engineering problems such as flood prediction and design, drought assessment, water quantity and quality assessment, and hydrological responses under climate variability (Sood & Smakhtin 2015). Hydrological models are most commonly classified as empirical (data-driven), conceptual or physically based models (Aghakouchak & Habib 2010). Empirical models have evolved into data-driven models built with various regression and machine learning algorithms, such as Artificial Neural Networks (ANN) (Hu et al. 2018) and least squares Support Vector Regression (SVR) (Bharti et al. 2017). Such empirical models are based primarily on observations, without considering the hydrological processes involved (Jaiswal et al. 2020), whereas physically based models apply the principles of conservation of mass, momentum and energy to represent hydrological processes, using field-measured parameters and other basin characteristics (Aghakouchak & Habib 2010). Conceptual models are based on a simplified mathematical conceptualization of the hydrological system, represent its components at the catchment scale, and use model parameters to simulate discharges (Jaiswal et al. 2020). Physically based models are complex and require fine spatiotemporal data, whereas purely data-driven empirical models are limited entirely to the observations. Compared with physically based and empirical models, conceptual models have gained much interest in the research community due to their capability to include various hydrological processes (e.g. runoff, evapotranspiration, surface storage, etc.)
and their ease of implementation with available data (Ammann et al. 2020). Various conceptual models have been developed, such as HYMOD (Parra et al. 2018) and the HBV model (Vormoor et al. 2018). In conceptual hydrological models, the model parameters or calibration factors have the most prominent influence on the accuracy of streamflow estimates (Arnold et al. 2012). Auto-calibration approaches for hydrological models have accordingly progressed in recent years, for example the Soil and Water Assessment Tool Calibration and Uncertainty Programs (SWAT-CUP) (Ha et al. 2017) and the Shuffled Complex Evolution algorithm (SCE-UA), which has the limitation of yielding multiple optimal parameter sets with corresponding accuracies (Pohlert et al. 2007). Calibration of hydrological model parameters using remotely sensed variables, such as satellite evapotranspiration data, has also gained interest in recent years (Immerzeel & Droogers 2008; Rientjes et al. 2013). To this end, various catchment processes have been conceptualized in hydrological modelling through model parameters whose values are determined by calibration (Mianabadi et al. 2019). Calibration factors have been implemented for processes such as closure of the water balance (Jarsjö et al. 2008), radiation (Choudhury 1999) and vegetation dynamics (Donohue et al. 2007). Conceptual models that include such calibration factors have been used to study hydrological responses to climate, land-use and water-use changes as a function of various hydroclimatic variables and observed historical streamflows (e.g. Asokan et al. 2010).
Hydrological variables such as precipitation, evapotranspiration and streamflow have shown pronounced changes under climate signals, and model parameters should therefore also account for such changes (Srinivasa Raju & Nagesh Kumar 2018). In hydrological climate change impact assessments, most conventional hydrological models apply calibration parameters established from observed historical streamflow data and other hydrological variables (e.g. Meenu et al. 2013). Such studies couple hydrological models with statistical downscaling models and study the hydrological responses using the historically established model parameters (Hengade et al. 2018; Saraf & Regulwar 2018). Time-invariant model parameters, which are independent of the hydroclimatic variables and estimated solely from observed data, are limited in their ability to capture the temporal variability of water-energy balance variables under climate change (Rehana et al. 2020a). The present study therefore establishes a relationship between the calibration factors and various hydroclimatic variables and uncalibrated runoff. Such a functional form of the calibration factors can easily be applied to future scenarios using projected hydroclimatic variables. Developing a calibration approach that relates precipitation, evapotranspiration and uncalibrated streamflow can improve both model performance and the understanding of hydrological processes (López et al. 2017). Estimating model parameters dynamically, so that they account for the expected hydrological variability under climate change, is therefore promising, and such approaches can be combined with climate change projections to study the hydrological response of river basins.
The study proposes a modelling framework that relates the parameter of a conceptual hydrological model to the hydrological variables of precipitation, evapotranspiration and simulated runoff. Such relationships ease the estimation of model calibration factors for future scenarios with climate change projections obtained from statistical downscaling models. The major challenges in implementing such a framework are limited data and a poor understanding of the complex relationships between the calibration factors and the process-based hydrological variables. To overcome these limitations, the present study models the parameters of conceptual models from hydrological variables using machine learning algorithms, which can capture such complex relationships and form a basis for studying hydrological responses under climate variability. The proposed modelling framework has been demonstrated on a semi-arid river basin, the Krishna River Basin (KRB), India.

Basin description
The Krishna River Basin (KRB) is the fifth largest river system in India and the second largest river basin of peninsular India, with a total catchment area of 258,948 km² (Figure 1). The KRB, located at 73°17′-81°9′E and 13°10′-19°22′N, receives heavy precipitation over the Western Ghats, with precipitation decreasing towards the upper and lower parts of the basin. The Krishna River originates in the Western Ghats of India and travels about 1,400 km across the four states of Karnataka, Maharashtra, Andhra Pradesh and Telangana before joining the Bay of Bengal at Hamsaladeevi in Andhra Pradesh, on the east coast of India. The major tributaries include the Ghataprabha, Malaprabha, Bhima, Tunga-Bhadra and Musi. About 44% of the KRB lies in Karnataka, with about 26, 15 and 15% of the basin in Maharashtra, Telangana and Andhra Pradesh, respectively. The annual average precipitation over the basin is about 784 mm, 90% of which occurs during the south-west monsoon from June to October (http://india-wris.nrsc.gov.in/wrpinfo/?title=Krishna). Most of the basin has a semi-arid climate, and the states covering the basin (Maharashtra, Karnataka and Telangana) are largely drought-prone (http://india-wris.nrsc.gov.in). The major water use in the KRB is irrigation, at about 61.9 billion cubic metres per year (BCM/year), with domestic and industrial water uses of 1.6 and 3.2 BCM/year, respectively (Rooijen et al. 2009).

Distributed hydrological model: PCRaster
In this study, hydrological modelling of the KRB was performed using the distributed hydrological model PCRaster (water flow module based on the POLFLOW code; Wit 2001; Jarsjö et al. 2008) (http://pcraster.geo.uu.nl/pcraster/4.2.0/documentation/pcraster_manual/sphinx). The KRB was delineated and a local drainage direction (ldd) map was created from the topographical data (Figure 2). The basin was discretized with a rectangular grid (1 × 1 km), and various hydrological processes can be provided as input for each grid cell. The Digital Elevation Model (DEM) data had a resolution of 30 arc-seconds (approximately 1 km) and were taken from the Global 30 Arc-Second Elevation (GTOPO30) data set provided by the US Geological Survey. Spatially distributed daily precipitation (P) and temperature (T) data for the KRB were obtained from the India Meteorological Department (IMD). The gridded daily precipitation data at a spatial resolution of 0.25° × 0.25° (Pai et al. 2014) and temperature data at a resolution of 1° × 1° (Srivastava et al. 2009) were cropped to the basin for the common period 1951-2014. The temperature data were interpolated from 1° × 1° to 0.25° × 0.25° using the inverse distance weighting method. Discharge data for about 25 gauging locations for the period 1966-2015 were obtained from the Krishna & Godavari Basin Organization (KGBO), Central Water Commission (CWC), Hyderabad, Government of India (www.kgbo-cwc.ap.nic.in) (Figure 1(b)).
The basic hydrological processes involved in partitioning precipitation (P) into actual evapotranspiration (AET), groundwater (GW), runoff (Q) and change of storage (ΔS) at the catchment scale can be written as follows:

$$P = AET + Q + (GW_{out} - GW_{in}) + \Delta S \quad (1)$$

For sufficiently long time scales, the groundwater inflow (GW_in) and outflow (GW_out) volumes, along with the change in storage ΔS, can be neglected. These assumptions reduce Equation (1) to a river-basin-scale water balance in which P, AET and runoff (R) are the major hydrological variables, and the precipitation surplus, or Available Water (AW), can be expressed as follows:

$$AW = P - AET = R \quad (2)$$

Note that the study employs a conceptual catchment-scale hydrological model at an annual time scale, neglecting groundwater storage in the water balance equation. Over long periods, changes in soil water storage, groundwater recharge influenced by geological features, and human water withdrawals are considered negligible (Wu et al. 2017). However, the storage component cannot be neglected at sub-annual time scales, and the simulated hydrological responses may differ when groundwater storage is taken into account (Gunkel & Lange 2017).
If the basic hydrological unit of the distributed hydrological model is a grid cell i with annual total precipitation P_i and actual evapotranspiration AET_clim,i, the precipitation surplus AW_cal,i can be estimated as follows:

$$AW_{cal,i} = P_i - AET_{clim,i} \quad (3)$$

where P_i and AET_clim,i are annual totals in mm/year. Because direct, long-term AET data are lacking, most studies depend on either pan evaporation or empirical estimates of PET (Han et al. 2014). One widely applied method of estimating the AET flux is to first estimate PET from the energy available for vaporization and then apply a limiting factor to account for water availability (Anabalón & Sharma 2017). ET models based on P and PET work with readily available, operationally modelled meteorological variables such as precipitation and temperature (Budyko 1974). In this context, Budyko's (1974) hypothesis has gained much interest in hydrology; it relates the key hydrological variables P, PET and AET, with a model parameter 'v' representing climate variability, basin characteristics, vegetation, terrain, etc. Several studies have used parametric formulations of the Budyko framework to estimate AET, both globally (e.g. Bai et al. 2020) and in Indian case studies (e.g. Singh & Kumar 2015; Goyal & Khan 2017). Singh & Kumar (2015) proposed a bottom-up probabilistic Budyko framework to estimate the vulnerability of India's available water under climate change. Sinha et al. (2018) used the Budyko framework to study anthropogenic stress and climatic variance under warming shifts across 55 catchments in peninsular India. A basic parametric formulation of the Budyko framework can be written as follows: Several studies have modified this formulation, and the most widely applied modification is that of Zhang et al.
(2004), a calibration-free formulation of the Budyko equation relating P, PET and AET (Equation (5)), in which PET can be estimated with empirical formulations such as those of Penman (1950), Thornthwaite (1948) and Hargreaves & Allen (2003). The present study adopted the Thornthwaite model, one of the simplest PET models, which requires only the average air temperature and the geographical location of the region of interest as inputs. Among the variables needed for detailed PET estimates such as the Penman model (e.g. wind speed, humidity and radiation), temperature is the most dependable gridded observed meteorological variable available for India, so the temperature-based Thornthwaite model was adopted for the Indian context. AET can then be estimated at a monthly time scale from the PET estimates. The AET estimated using Equation (5) is region-specific and ideally developed for given climatic and basin characteristics, so model parameters have to be introduced into Equation (5) to obtain runoff that agrees with observations. Because AET is the most complicated component of the hydrological cycle, governed by climate, vegetation, soil moisture and the amount of available water, applying calibration factors to AET is more promising. The calibration factor was therefore introduced on AET, since the basic conceptual hydrological model is based on closure of the water balance among the prominent hydrological variables of precipitation (P), AET and runoff (R), R = P − AET. A calibration factor (X_Cal) was introduced on AET when calibrating against the observed runoff (R_obs) using the water balance equation R_obs = P − X_Cal·AET. In this way, the model parameter on AET accounts for closure of the water balance at the catchment outlet.
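As a minimal sketch of the temperature-based PET step, the basic Thornthwaite (1948) formula is written out below in Python. It assumes that mean monthly temperatures (°C) and mean daylight hours for each month (which encode the latitude) are already available for a grid cell; the function and parameter names are hypothetical, and the sketch omits the separate high-temperature correction that some implementations apply above about 26.5 °C.

```python
def thornthwaite_pet(monthly_temp_c, daylight_hours, days_in_month=None):
    """Monthly PET (mm/month) with the basic Thornthwaite (1948) formula.

    monthly_temp_c: 12 mean monthly air temperatures (deg C) for one grid cell.
    daylight_hours: 12 mean daylight hours per month (a function of latitude).
    days_in_month: 12 month lengths; a non-leap year is assumed by default.
    """
    if days_in_month is None:
        days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    # Annual heat index I; months at or below 0 deg C contribute nothing.
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_temp_c if t > 0)
    if heat_index == 0:
        return [0.0] * len(monthly_temp_c)
    # Empirical exponent a, a cubic function of the heat index.
    a = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index + 0.49239)
    pet = []
    for t, hours, ndays in zip(monthly_temp_c, daylight_hours, days_in_month):
        if t <= 0:
            pet.append(0.0)
            continue
        # Unadjusted PET (mm) for a 30-day month with 12-hour days
        unadjusted = 16.0 * (10.0 * t / heat_index) ** a
        # Adjust for the actual day length and number of days in the month
        pet.append(unadjusted * (hours / 12.0) * (ndays / 30.0))
    return pet
```

The monthly PET values would then feed the Budyko-type AET formulation (Equation (5)), which is not reproduced here.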
With the calibration factor (X_Cal) included, the AW can be written as follows:

$$AW_{cal,i} = P_i - X_{Cal}\,AET_{clim,i} \quad (6)$$

The discharge at the basin outlet (R_Cal,outlet) can be estimated by accumulating the flow generated at grid cell i (RAW_cal,i) and at all upstream grid cells (RAW_cal,V), following the flow direction of the river and using the area of each grid cell (A_Cell), as follows: where R_Cal,outlet is the uncalibrated total runoff at the basin outlet at steady state, which is generally not consistent with the observed runoff, R_Obs,outlet. The observed and uncalibrated runoff can then be compared to estimate the model parameter (X_Cal) following Jarsjö et al. (2008): where R_Obs,outlet and R_Cal,outlet are the long-term annual average observed and simulated runoff at the basin outlet in m³/s, respectively, and ΣP and ΣAET_clim are the long-term accumulated annual average observed precipitation and AET (Equation (5)) over the basin in mm/year. Such a single basin-averaged calibration factor has been widely applied to study hydrological responses to soil, land cover and water use by Asokan et al. (2010) and Jarsjö et al. (2008). However, a single time-invariant calibration factor may fail to capture the variability of precipitation and AET when studying hydrological responses under climate change. A dynamic model parameter that captures the observed variability of precipitation and ET is therefore proposed, which can then be used to study hydrological responses under climate change. Such a dynamic calibration factor varies as a function of the hydrological variables (P, AET and uncalibrated runoff) at each time step. To model the complex relationship between the model parameter and the hydrological variables P, AET and R, the present study used two machine learning algorithms, Ensemble Regression (ER) and Random Forest (RF).
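The grid-to-outlet accumulation and the basin-wide calibration factor described above can be illustrated with a small Python sketch. It is not PCRaster code: the drainage network is represented by a hypothetical dictionary mapping each cell to its downstream cell (standing in for the ldd map), every cell is assumed to drain to the outlet, and the per-cell flow is the annual surplus times the cell area. The factor is obtained by requiring the calibrated outlet flow, Σ_i (P_i − X_Cal·AET_clim,i)·A_i, to equal the observed outlet flow, which rearranges to X_Cal = 1 + (R_Cal,outlet − R_Obs,outlet) / Σ_i AET_clim,i·A_i in consistent units. This rearrangement is our own and may differ in form from the expression of Jarsjö et al. (2008).

```python
SECONDS_PER_YEAR = 365.25 * 86400.0
# 1 mm of water over 1 km2 is 1e3 m3; converts mm*km2 per year to m3/s
MM_KM2_PER_YEAR_TO_M3S = 1e3 / SECONDS_PER_YEAR


def local_runoff(cells, x_cal=1.0):
    """cells: {cell_id: (P mm/yr, AET mm/yr, area km2)} -> {cell_id: runoff m3/s}."""
    return {cid: (p - x_cal * aet) * area * MM_KM2_PER_YEAR_TO_M3S
            for cid, (p, aet, area) in cells.items()}


def accumulate(downstream, local):
    """Add each cell's flow to every cell downstream of it (accuflux-like).

    downstream: {cell_id: receiving cell_id, or None at the outlet}; must be acyclic.
    """
    acc = dict(local)
    pending = {cid: 0 for cid in downstream}  # upstream neighbours not yet processed
    for cid, down in downstream.items():
        if down is not None:
            pending[down] += 1
    ready = [cid for cid, n in pending.items() if n == 0]  # headwater cells
    while ready:
        cid = ready.pop()
        down = downstream[cid]
        if down is not None:
            acc[down] += acc[cid]
            pending[down] -= 1
            if pending[down] == 0:
                ready.append(down)
    return acc


def calibration_factor(cells, r_obs_outlet):
    """Single basin-wide X_cal that closes the water balance at the outlet."""
    r_cal_outlet = sum(local_runoff(cells).values())  # all cells drain to the outlet
    aet_flow = sum(aet * area * MM_KM2_PER_YEAR_TO_M3S for _, aet, area in cells.values())
    return 1.0 + (r_cal_outlet - r_obs_outlet) / aet_flow


# Toy three-cell chain A -> B -> C (outlet); illustrative values, not basin data
cells = {"A": (800.0, 500.0, 100.0), "B": (700.0, 500.0, 100.0), "C": (600.0, 450.0, 100.0)}
downstream = {"A": "B", "B": "C", "C": None}
r_uncal = accumulate(downstream, local_runoff(cells))["C"]
x_cal = calibration_factor(cells, r_obs_outlet=1.5)
r_calibrated = accumulate(downstream, local_runoff(cells, x_cal))["C"]
```

With these toy numbers the uncalibrated outlet flow exceeds the assumed observation, so X_Cal comes out above 1, which scales AET up until the outlet flow matches.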

Machine learning algorithms for modelling the calibration factors
The annual calibration factors (X_Cal) were estimated and used as the predictand, while the uncalibrated runoff (P − AET) from the PCRaster model, together with P and AET, were used as predictor variables in training and testing the machine learning models. The present study used two machine learning models, Ensemble Regression (ER) and Random Forest (RF), to predict the model parameters.

Ensemble regression (ER) model
Ensemble Regression (ER) methods are machine learning paradigms in which multiple models, often referred to as weak learners, are trained to solve a problem and combined to obtain better results (Friedman 2001). ER models rest on the hypothesis that a diverse set of models can make better predictions than an individual model, and they have gained interest in hydrological model assessment in recent years (Sajedi-Hosseini et al. 2018). A training set {(x_i, y_i)}_{i=1}^{N} is considered, where x_i is the set of predictors and y_i the observed predictand value at the ith time step. Initially, all data points are given equal weights, a_i = 1/N, and an initial model F_0(x) is developed to predict values of the form y = F_0(x). At every iteration m, the residuals (r_im) between the observed (y_i) and modelled predictand values (F_{m−1}(x_i)) are calculated:

$$r_{im} = y_i - F_{m-1}(x_i)$$

A base learner (h_m) is fitted to these residuals using the training set {(x_i, r_im)}_{i=1}^{N} and a loss function L, in the direction of the steepest gradient; that is, the weight a_i of point i is increased for larger residuals. The model is then updated sequentially as follows:

$$h_m = \arg\min_{h} \sum_{i=1}^{N} L\big(y_i,\ F_{m-1}(x_i) + h(x_i)\big), \qquad F_m(x) = F_{m-1}(x) + h_m(x)$$

where arg min_h refers to minimization of the base-learner error. The least-squares loss function L(y, F(x)) = ½(y − F(x))² is used to update the residuals.
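A minimal Python sketch of least-squares gradient boosting in the spirit of Friedman (2001) follows. It assumes one-split regression trees (stumps) as base learners and adds a shrinkage factor (`learning_rate`) that the text does not mention; setting it to 1 recovers the plain update F_m = F_{m−1} + h_m. With squared loss the negative gradient is the ordinary residual, so the sketch fits each learner to the residuals directly instead of keeping explicit point weights a_i. Rows of `x` would hold the predictors (P, AET, uncalibrated runoff) and `y` the calibration factors; all names are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares regression stump: the best single split over all predictors."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(len(x[0])):
        for thr in sorted({row[j] for row in x})[:-1]:  # both sides stay non-empty
            left = [ri for row, ri in zip(x, r) if row[j] <= thr]
            right = [ri for row, ri in zip(x, r) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # no split possible: fall back to the mean residual
        mean = sum(r) / len(r)
        return lambda row: mean
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Return a predictor built from n_rounds stumps, each fitted to the residuals."""
    f0 = sum(y) / len(y)  # F_0: the constant that minimises squared loss
    pred = [f0] * len(y)
    learners = []
    for _ in range(n_rounds):
        # r_im = y_i - F_{m-1}(x_i), the negative gradient of 1/2 (y - F)^2
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residuals)
        learners.append(h)
        pred = [pi + learning_rate * h(row) for pi, row in zip(pred, x)]
    return lambda row: f0 + learning_rate * sum(h(row) for h in learners)
```

A smaller `learning_rate` usually needs more rounds but tends to generalize better, which matters with training sets as short as the 34 annual values used here.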

Random forest (RF) model
In recent years, data mining algorithms such as Random Forest (RF) have been widely applied in water resources engineering, including hydrological studies (Moore et al. 1991), soil moisture forecasting (Prasad et al. 2018) and water quality assessment (Bui et al. 2020). An RF is an ensemble of decision trees, each of which splits the data space in a way similar to human decision-making, with each node representing a predictor variable (Breiman 2001). Splits are generally based on an inequality condition on a predictor, chosen using a measure of split quality such as the Gini index, which measures how diverse the data are, until a terminal node is reached (Breiman 2017); for regression trees, the reduction in variance plays this role. The RF tree size is determined by the number of nodes, which serves as a basis for minimizing the variance of each split. RFs can perform both regression and classification using many decision trees built by bootstrap aggregation (bagging), a random resampling of the data with replacement. Bootstrapping draws random samples from the training data with replacement, and a decision tree is trained on each sample. Bagging then forms the overall prediction by averaging the predictions of the individual trees trained on the bootstrapped samples. The steps involved in designing the RF models are as follows:

1. Draw a bootstrap sample S* of size N from the training data.
2. Grow an RF tree T_b on the bootstrapped data by recursively repeating the following steps for each terminal node of the tree:
   • select S variables at random from the N variables;
   • choose the best split point among the S variables;
   • split the node into two daughter nodes;
   • continue until the minimum node size n_min is reached.
3. Output the ensemble of trees {T_b}_{b=1}^{B}.
4. Make a prediction at a new point x using the regression formulation

$$\hat{f}_{rf}^{B}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$$

The RF model has been identified as robust because it captures complex relationships between features and labels efficiently, and averaging over largely uncorrelated decision trees stabilizes the prediction.
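The four steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the RF library used in the study: trees split on the reduction in squared error (the usual regression criterion), `n_try` plays the role of S (the number of predictors tried at each split), `min_node` stands for n_min, and the forest prediction is the average of the B tree predictions. All names are hypothetical.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _sse(v):
    m = _mean(v)
    return sum((t - m) ** 2 for t in v)


def grow_tree(rows, y, n_try, min_node, rng):
    """Step 2: grow one regression tree; leaves are floats, internal nodes are tuples."""
    if len(y) < 2 * min_node or len(set(y)) == 1:
        return _mean(y)
    best = None  # (score, feature, threshold, left_idx, right_idx)
    for j in rng.sample(range(len(rows[0])), n_try):  # S randomly chosen predictors
        for thr in sorted({r[j] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[j] <= thr]
            right = [i for i, r in enumerate(rows) if r[j] > thr]
            if len(left) < min_node or len(right) < min_node:
                continue
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, thr, left, right)
    if best is None:
        return _mean(y)
    _, j, thr, left, right = best

    def child(idx):
        return grow_tree([rows[i] for i in idx], [y[i] for i in idx], n_try, min_node, rng)

    return (j, thr, child(left), child(right))


def predict_tree(node, row):
    """Follow the splits from the root down to a leaf value."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if row[j] <= thr else right
    return node


def random_forest(rows, y, n_trees=50, n_try=1, min_node=2, seed=0):
    """Steps 1-4: bootstrap, grow B trees, and average their predictions."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # step 1: sample with replacement
        trees.append(grow_tree([rows[i] for i in idx], [y[i] for i in idx],
                               n_try, min_node, rng))
    # Step 4: the regression prediction is the mean over the B trees
    return lambda row: sum(predict_tree(t, row) for t in trees) / len(trees)
```

With three predictors (P, AET, uncalibrated runoff), `n_try=1` would mimic a common default of roughly one third of the predictors per split.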
The trained and tested machine learning algorithms will then be used to predict the calibration factors, which can be used to estimate the calibrated runoff at the catchment scale. To quantify the performance of models and to understand the dependence between calibration factors and hydrolo-

Statistical downscaling models relate large-scale climate variables to surface hydrological variables (e.g. precipitation) using statistical methods and are widely used to derive climate change projections (Wilby & Dawson 2013). The present study employed a multisite statistical downscaling model to predict climate change projections of precipitation and temperature (Figure 3). The downscaling model comprises data preprocessing to remove systematic bias between modelled and observed climate data (bias correction), a data reduction method (principal component analysis, PCA), estimation of predictand states (K-means clustering), a fitting algorithm relating the predictors to the predictand states (CART) and a transfer function (Support Vector Regression). National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis data from 1948 to the present at a resolution of 2.5° × 2.5° were used as the large-scale observed climate variables. MIROC-ESM, developed by the Atmosphere and Ocean Research Institute (The University of Tokyo), the National Institute for Environmental Studies and the Japan Agency for Marine-Earth Science and Technology, Japan, was used as the GCM. The spatial resolution mismatch between the NCEP and GCM data was resolved by inverse distance weighting interpolation.
doi: 10.2166/h2oj.2020.034

The GCM predictor data were bias corrected with a quantile mapping method that compares the cumulative distribution functions (CDFs) of the NCEP and GCM predictor data for the historical and future scenarios, removing the systematic bias in the climate model outputs (Li et al. 2010). Both the bias-corrected GCM data and the reanalysis data were then standardized by subtracting the mean and dividing by the standard deviation, both computed from all data points of the time series. After this preprocessing, statistically significant climate variables (predictors) were used to predict precipitation at the grid points of the basin. The predictors showing significant correlation with precipitation included surface air temperature, wind speed and humidity, which have also been identified as potential predictors in other studies over India (Rehana & Mujumdar 2012; Salvi et al. 2013). NCEP/NCAR reanalysis data for surface air temperature, mean sea level pressure, specific humidity at the 500 mb pressure level, and zonal and meridional wind velocity at the surface were extracted for latitudes 12.5-20°N and longitudes 72.5-82.5°E, covering the entire Krishna Basin.
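A minimal sketch of empirical quantile mapping followed by standardization is given below in Python. It assumes plain empirical CDFs with linear interpolation: each GCM value receives its non-exceedance probability in the historical GCM sample and is replaced by the value at the same probability in the NCEP sample. Values outside the historical GCM range are clamped to the NCEP extremes, and the procedure of Li et al. (2010) may differ in detail (for example, by also using the future-period GCM CDF); the function names are hypothetical.

```python
import bisect
import statistics


def _probability(sorted_sample, value):
    """Non-exceedance probability of value, interpolated between order statistics."""
    n = len(sorted_sample)
    if value <= sorted_sample[0]:
        return 0.0
    if value >= sorted_sample[-1]:
        return 1.0
    k = bisect.bisect_right(sorted_sample, value)  # sorted[k-1] <= value < sorted[k]
    lo, hi = sorted_sample[k - 1], sorted_sample[k]
    return (k - 1 + (value - lo) / (hi - lo)) / (n - 1)


def _quantile(sorted_sample, prob):
    """Inverse of _probability: the value at non-exceedance probability prob."""
    pos = prob * (len(sorted_sample) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_sample) - 1)
    frac = pos - i
    return sorted_sample[i] * (1 - frac) + sorted_sample[j] * frac


def quantile_map(gcm_hist, ncep_hist, gcm_values):
    """Bias-correct gcm_values by matching the historical GCM CDF to the NCEP CDF."""
    gcm_sorted, ncep_sorted = sorted(gcm_hist), sorted(ncep_hist)
    return [_quantile(ncep_sorted, _probability(gcm_sorted, v)) for v in gcm_values]


def standardize(series):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean, sd = statistics.mean(series), statistics.pstdev(series)
    return [(v - mean) / sd for v in series]
```

In the workflow described above, each bias-corrected GCM predictor and the corresponding NCEP predictor would pass through `standardize` before PCA.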

K-means clustering
K-means clustering is an unsupervised machine learning algorithm that partitions n observations into K clusters, assigning each observation to the cluster with the nearest mean. In this study, the K-means algorithm is used to account for cross-correlation among the rain stations and to group months with similar rainfall. The technique takes the observed rainfall values for all grids in the basin in a given month, clusters them, and provides a single representative value, referred to as the rainfall state for that month. Consider X = {x_1, x_2, x_3, …, x_n} as the set of data points and V = {v_1, v_2, …, v_C} as the set of cluster means. Each data point is assigned to the cluster with the nearest mean, and the K-means algorithm iteratively minimizes the squared error function

$$J(V) = \sum_{j=1}^{C} \sum_{i=1}^{c_j} \lVert x_i - v_j \rVert^2$$

where ‖x_i − v_j‖ is the Euclidean distance between x_i and v_j, C is the number of clusters, c_j is the number of points in the jth cluster, and the inner sum runs over the points assigned to cluster j.
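The clustering step can be sketched with Lloyd's algorithm in plain Python. Each point is assumed to be one month's vector of gridded rainfall, and the returned label serves as that month's rainfall state; the random initialization from data points, the fixed seed and the function names are assumptions, not details taken from the study.

```python
import random


def _sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initial means
    for _ in range(max_iter):
        labels = _assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # New cluster mean; keep the old centroid if the cluster is empty
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # converged: the means no longer change
            break
        centroids = updated
    return centroids, _assign(points, centroids)


def objective(points, centroids, labels):
    """J(V): sum of squared distances from each point to its cluster mean."""
    return sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
```

Because K-means only finds a local minimum of J(V), practical use would compare several seeds and keep the run with the smallest `objective`.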

Classification and Regression Trees (CART)
Classification and Regression Trees (CART) is a decision tree learning technique used in predictive modelling. A decision tree is a machine learning model that categorizes data by successively partitioning it according to conditions on, or ranges of, the features; the classification variant applies when the target takes a finite set of values. Here, CART categorizes rainfall into states by building a statistical relationship between the continuous principal components extracted from the predictor data and the rainfall states estimated with K-means clustering. The established relationship is assumed to hold for the future predictors, which are then used as input to the CART model to estimate the future rainfall states. An advantage of CART over linear classification models is that it can capture non-parametric and nonlinear relationships while yielding simple models. Cross-validation is carried out to reduce the risk of overfitting. At every iteration, the Gini impurity/diversity index (E) (Loh 2011) of the data is calculated as follows:

$$E = 1 - \sum_{j} P(w_j)^2$$

where P(w_j) is the probability of the data being in the jth class for each value of the attribute. The attribute split with the minimum impurity index (E) is selected; E equals 0 when all the patterns at a node have the same class label. Splitting continues until the impurity falls below a threshold η (E < η) or the maximum number of iterations is reached.
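The impurity calculation and the choice of a split can be illustrated for a single continuous predictor (for example, one principal component) with K-means rainfall states as class labels. This Python sketch covers one split only, not the full recursive CART procedure or its cross-validation; the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity E = 1 - sum_j P(w_j)^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Threshold on one continuous predictor that minimises weighted child impurity.

    Returns (threshold, impurity); threshold is None if no split improves on the parent.
    """
    best_thr, best_e = None, gini(labels)
    for thr in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        # Impurity of the split: child impurities weighted by child size
        e = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if e < best_e:
            best_thr, best_e = thr, e
    return best_thr, best_e
```

For example, principal-component scores [0.1, 0.2, 0.8, 0.9] with rainfall states [1, 1, 2, 2] give the threshold 0.2 with impurity 0, i.e. two pure daughter nodes.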

Support vector regression (SVR)
Separate regression models are built for each weather state by splitting the predictor and observed data into individual datasets according to the state category. SVR fits a function whose errors lie within a threshold (ε) while keeping the function as flat as possible, which maximizes the margin rather than solely minimizing the error. Linear SVR was used as the regression technique to predict precipitation and temperature while avoiding overfitting. Given training data {(x_1, y_1), …, (x_n, y_n)} with n patterns, a function f(x) is identified that accounts for the deviation from the observed target values y_i (Lima et al. 2012). The input variables x are mapped into a higher-dimensional feature space by a mapping function φ:

$$f(x) = \langle W, \varphi(x) \rangle + b$$

where ⟨·,·⟩ denotes the inner product and W and b are the regression coefficients, estimated by minimizing the error between f(x) and the observed values of y; for the linear SVR used here, φ is the identity. SVR measures this error with the ε-insensitive loss, where ε is a hyper-parameter:

$$|y - f(x)|_{\varepsilon} = \max\big(0,\ |y - f(x)| - \varepsilon\big)$$

Using the training data (x_i, y_i), W and b are estimated by minimizing the objective function

$$F = \frac{1}{2}\lVert W \rVert^2 + C \sum_{i=1}^{n} |y_i - f(x_i)|_{\varepsilon}$$

where C and ε are hyper-parameters. The objective function F is minimized with the Lagrange multiplier method, and the final regression equation, with kernel function K(x, x′), takes the form

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, K(x_i, x) + b$$

where α_i and α_i* are the Lagrange multipliers; for linear SVR, K(x_i, x) = ⟨x_i, x⟩.
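The linear ε-insensitive SVR objective above can be sketched in plain Python by subgradient descent on the primal problem. This swaps in a simple optimizer for the Lagrangian (dual) solution described in the text, so it illustrates the objective rather than the solver used in the study; the step-size schedule, the iteration count and the function names are assumptions.

```python
def fit_linear_svr(x, y, C=1.0, eps=0.1, lr=0.1, n_iter=5000):
    """Minimise 1/2 ||w||^2 + C * sum(max(0, |y_i - f(x_i)| - eps)), f(x) = w.x + b."""
    n_features = len(x[0])
    w, b = [0.0] * n_features, 0.0
    for t in range(1, n_iter + 1):
        grad_w, grad_b = list(w), 0.0  # gradient of the 1/2 ||w||^2 term
        for xi, yi in zip(x, y):
            r = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(r) > eps:  # outside the eps-tube: the loss term is active
                s = 1.0 if r > 0 else -1.0
                for j in range(n_features):
                    grad_w[j] -= C * s * xi[j]
                grad_b -= C * s
        step = lr / t ** 0.5  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict_svr(w, b, xi):
    """Evaluate the fitted linear function f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, xi)) + b
```

Because of the ½‖W‖² term, the fitted slope is deliberately flatter than a least-squares fit whenever the ε-tube can still contain most points; larger C pushes the solution towards fitting the data more closely.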

RESULTS AND DISCUSSION
The study applied the PCRaster distributed hydrological model to 348 grid cells covering the KRB at 0.25° × 0.25° resolution at the annual scale. The water balance (P − AET) was computed for each grid cell from 1965 to 2014 to estimate the precipitation surplus, which was converted to runoff using the area of each grid cell (Equation (7)). The uncalibrated runoff at the basin outlet was estimated with the PCRaster accuthresholdflux function, using a local drainage network created from the DEM data. The observed and uncalibrated runoff, together with precipitation and AET, were used to estimate the basin-averaged model parameter X_Cal at the annual scale. These calibration factors, obtained from closure of the water balance (Equation (10)), were treated as the observed predictand, and P, AET (Equation (5)) and the uncalibrated runoff from PCRaster (Equation (8)) were used as the independent (predictor) variables in modelling the dynamic calibration factors with the ER and RF models. The ER and RF models were trained on the 1966-2000 data and tested on independent P, AET and uncalibrated runoff data for 2000-2014. The trained and tested ER and RF models were compared using various performance measures, as given in Table 1. The error between the observed and predicted calibration factors was lower for the RF model than for ER.
The RMSE and MAE values for the RF (ER) model were 0.04 (0.02) and 0.14 (0.09), respectively. Furthermore, efficiency measures such as NSE and KGE also showed that the RF model predicts the calibration factors more robustly than the ER model with the limited data points available for training (34) and testing (15). The RF model also fitted better than ER in terms of the similarity measure (likelihood) and bias (APB), as given in Table 1 and Figure 4.
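For reference, common definitions of the error and efficiency measures named above are sketched in Python. RMSE, MAE and NSE follow their standard definitions, and KGE uses the original formulation based on the correlation r, the variability ratio α and the bias ratio β. APB is assumed here to be the absolute percent bias, 100·|Σ(obs − sim)|/Σobs, because the text does not define it, and the likelihood measure is not reproduced; the function names are hypothetical.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def rmse(obs, sim):
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    return _mean([abs(o - s) for o, s in zip(obs, sim)])


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from their mean."""
    mo = _mean(obs)
    return 1 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, bias ratio beta."""
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    r = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)]) / (so * ss)
    alpha, beta = ss / so, ms / mo
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def apb(obs, sim):
    """Absolute percent bias (assumed definition), in %."""
    return 100 * abs(sum(o - s for o, s in zip(obs, sim))) / sum(obs)
```

NSE and KGE both equal 1 for a perfect simulation, while RMSE, MAE and APB equal 0, so the comparison in Table 1 reads in opposite directions for the two groups.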
The calibration factors in Figure 4(a) and 4(b) lie in the range 0.9-1.5; they are multiplying factors applied to the AET estimated with Equation (5) in order to implement Equation (6) and estimate the actual water availability in terms of runoff. The runoff obtained after applying the calibration factor to AET is the calibrated runoff, which is comparable with the observed runoff at the catchment scale. Figure 4(c) and 4(d) show the observed and simulated runoff values with the ER and RF models for the training and testing periods. Because of its robustness in modelling the calibration factors, the RF model was used to predict the calibration factors for the current and future scenarios, and the trained and tested RF model was applied with projections of precipitation and temperature to estimate the projected runoff response over the basin. Figure 5 compares the NSE coefficients between the observations and the NCEP and GCM simulations of the statistical downscaling model for precipitation and temperature, for the training period 1951-1989 and the testing period 1990-2005. The basin-averaged NSE values between the observed and NCEP-based precipitation for the training and testing periods were 0.65 and 0.44, respectively, whereas those between the observed and MIROC-based precipitation were 0.57 and 0.32, respectively. Temperature predictions were more accurate than precipitation predictions because of the lower variability in the data. The basin averaged NSE values estimated between observed and NCEP data for  representing atmospheric radiation at 4.5 W m⁻² at the end of 2100. Both precipitation and temperature were predicted to increase over the KRB for the period 2021-2080.
The annual average precipitation was predicted to increase by about 13.12%, with a temperature increase of 0.6°C, for the period 2061-2080 compared to the observed period of 1951-1989 with MIROC GCM outputs. Overall, precipitation and temperature are predicted to increase over the KRB, affecting the water-energy balance variables (Rehana et al. 2020a). These GCM-based increases in precipitation and temperature are comparable with earlier findings over the KRB based on Regional Climate Model (RCM) projections from the Coordinated Regional Climate Downscaling Experiment (CORDEX) (e.g. Rehana et al. 2020a, 2020b). Those studies reported an approximate increase of 2.19% in precipitation and of 1.29°C in temperature for the periods 2021-2040 and 2041-2060, respectively, relative to the observed period of 1966-2003 under the RCP 4.5 scenario with various CORDEX model outputs. The projections of precipitation and temperature were used as input to the hydrological modelling framework shown in Figure 2 to estimate the runoff for the current and future scenarios. The annual AET for the observed periods of 1951-1989 and 1990-2005 […] (Table 2). Overall, the AET has been predicted to increase under an increase […] (Table 2). The increase in AET over the basin was found to be more pronounced than the increase in precipitation over the KRB under the GCM climate change projections.

CONCLUSIONS
The present study proposed a modelling framework to study the hydrological response under climate change over the KRB. A conceptual hydrological model was adopted to study the runoff response under variability of P and AET. A time-variant calibration approach was developed in which the model calibration factor is estimated from precipitation, evapotranspiration and uncalibrated runoff using two machine learning algorithms, Ensemble Regression and Random Forest. The Random Forest algorithm was identified as a robust method to model the calibration factors with the limited data available for training and testing. The developed model was then used to study the runoff response under climate-change-driven variability of precipitation and temperature. A statistical downscaling model based on K-means clustering, CART and SVR was developed to project future precipitation and temperature. Precipitation and temperature were predicted to increase by 13.12% and 0.6°C, respectively, for the period 2061-2080 compared to the observed period of 1951-1989 with MIROC GCM outputs. Because AET is governed by changes in both P and PET, the projected temperature increase raises PET and, consequently, AET over the KRB in the future scenarios. The projected increase in AET, driven by rising PET, outweighs the increase in P, resulting in a decrease of runoff for the future scenarios over the KRB. The projected decreases in basin outlet runoff for the periods 2021-2040, 2041-2060 and 2061-2080, relative to the observed period 1951-1989, were 15%, 14.69% and 10.09%, respectively.
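The overall runoff decrease of 13.26% quoted for the basin outlet is consistent with the simple mean of the three period-wise decreases listed above; the short check below assumes that this is how the overall figure was formed. The `percent_change` helper is the conventional relative-change definition, included as an assumption about how the precipitation change (e.g. 13.12%) is expressed against the 1951-1989 baseline.

```python
# Projected decrease (%) in basin outlet runoff relative to 1951-1989, taken from the text
period_decrease = {"2021-2040": 15.0, "2041-2060": 14.69, "2061-2080": 10.09}

mean_decrease = sum(period_decrease.values()) / len(period_decrease)
print(f"Mean decrease: {mean_decrease:.2f}%")  # prints "Mean decrease: 13.26%"

def percent_change(baseline, future):
    """Conventional relative change (%) of a future-period mean against a baseline mean."""
    return 100.0 * (future - baseline) / baseline
```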
Overall, the basin outlet runoff was predicted to decrease by 13.26% on average over the three future periods under climate change, because the effect of the temperature increase (0.6°C) outweighs that of the precipitation increase (13.12%) over the KRB. As the conceptual model is based on the basic hydrological variables P and AET, the runoff at the basin outlet was predicted to decrease under the projected increase in AET. Overall, severe water shortage was predicted under climate change over the KRB, because the projected increase in AET exceeds that in precipitation. The proposed modelling framework considered one hydrological model with one GCM simulation under a single concentration pathway scenario for demonstration purposes. The hydrological response can be expected to vary with the choice of hydrological model, calibration parameters, statistical downscaling model, concentration pathway scenario and machine learning algorithm. A detailed assessment of the uncertainties arising at each stage of the modelling framework is a potential research problem.
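The mechanism summarised above, in which runoff falls even though precipitation rises, can be illustrated with a storage-free annual water balance in which the calibration factor multiplies AET. The form Q = P - k·AET is an assumption about Equation (6) based on the description of the calibration factor as a multiplier on AET; in the framework, k is predicted by the RF model and varies in time (roughly 0.9-1.5 in Figure 4), whereas here it is held fixed, and all numbers are hypothetical.

```python
def calibrated_runoff(p_mm, aet_mm, k):
    """Annual runoff (mm) from a storage-free annual water balance, Q = P - k * AET.
    k multiplies AET (ASSUMED form of Equation (6))."""
    return p_mm - k * aet_mm

# Hypothetical illustration (not basin data): P rises 10 %, AET rises 15 %, k fixed at 1.1
q_now = calibrated_runoff(800.0, 600.0, 1.1)     # 800 - 660 = 140 mm
q_future = calibrated_runoff(880.0, 690.0, 1.1)  # 880 - 759 = 121 mm
print(q_future < q_now)  # True: runoff falls although precipitation increases
```

Because k scales AET, any relative increase in AET larger than that in P is amplified in the runoff, which is the behaviour reported for the KRB.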
The present study considered a calibration-free formulation of Budyko's hypothesis to estimate the annual AET at the catchment scale. The study intended to demonstrate how a hydrological calibration factor can be modelled from various hydro-climatological variables and then used for climate change impact assessment. For this purpose, a catchment-scale conceptual hydrological model with a single calibration factor was adopted. However, the proposed methodology can be implemented with any standard hydrological model, accounting for the number of calibration factors involved and a suitable data-driven model. Furthermore, the present modelling framework can be extended by introducing multiple calibration factors accounting for land use, soil, etc. (Jarsjö et al. 2008) in the hydrological modelling, which is a potential area of research. The main emphasis of the study was to develop a framework for studying hydrological responses under climate change using a conceptual hydrological model integrated with a calibration framework based on machine learning algorithms. The proposed framework can be extended to other river basins with sufficient data availability and appropriate validation, using various machine learning algorithms. Because it relies on the most readily available operational hydrometeorological variables, namely precipitation, temperature and runoff, the proposed model is a promising tool for studying catchment-scale hydrological responses under climate change. However, validation of the modelled results is critical when implementing it in other catchments. Specifically, the calibration-free Budyko formulation has to be validated against water-balance-based or reliable, state-of-the-art satellite-based AET estimates.
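This section does not restate which calibration-free Budyko formulation is used. Assuming it is Budyko's original parameter-free curve, AET/P = sqrt(φ · tanh(1/φ) · (1 − exp(−φ))) with aridity index φ = PET/P, annual AET can be sketched as follows; the function name is illustrative.

```python
import math

def budyko_aet(p_mm, pet_mm):
    """Annual AET (mm) from Budyko's original, parameter-free curve (ASSUMED formulation).
    phi = PET / P is the aridity index."""
    if p_mm <= 0 or pet_mm <= 0:
        raise ValueError("P and PET must be positive")
    phi = pet_mm / p_mm
    ratio = math.sqrt(phi * math.tanh(1.0 / phi) * (1.0 - math.exp(-phi)))
    return ratio * p_mm
```

The curve stays below both the water limit (AET ≤ P) and the energy limit (AET ≤ PET), and AET/P increases with φ. A warmer climate that raises PET therefore raises AET even when P is unchanged, which is the mechanism invoked in the conclusions above.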
Furthermore, the study was limited to annual water balances, neglecting storage changes; it can be extended by including appropriate storage components to study water balances at time scales other than annual. Owing to the limited availability of the meteorological variables needed to estimate PET, the present study adopted the Thornthwaite model, which could be replaced by the standard Penman model. Overall, the proposed inclusion of time-variant calibration factors, accounting for various hydrological variables, in a conceptual hydrological model is more reliable for catchment-scale water management under climate change than using calibration parameters based entirely on historical observations.
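The Thornthwaite method needs only mean monthly air temperature, which is why it suits data-limited settings. A minimal sketch of the standard monthly form, PET = 16 (L/12)(N/30)(10T/I)^a mm, with heat index I = Σ(T/5)^1.514 over months with positive T and a cubic in I for the exponent a, is given below. Default day length (12 h) and month length (30 days) give the unadjusted form; the paper's exact implementation is not stated, and the separate relation Thornthwaite's method uses for mean monthly temperatures above about 26.5°C (common in peninsular India) is omitted here.

```python
def thornthwaite_pet(monthly_temp_c, daylength_h=None, days=None):
    """Monthly PET (mm) by the Thornthwaite method (sketch).
    monthly_temp_c: 12 mean monthly air temperatures (deg C).
    daylength_h, days: optional mean day length (h) and days per month;
    the defaults (12 h, 30 days) give the unadjusted form."""
    if len(monthly_temp_c) != 12:
        raise ValueError("expected 12 monthly temperatures")
    daylength_h = daylength_h or [12.0] * 12
    days = days or [30] * 12
    # Annual heat index from months with positive mean temperature
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_temp_c if t > 0)
    a = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index + 0.49239)
    pet = []
    for t, L, N in zip(monthly_temp_c, daylength_h, days):
        if t <= 0:
            pet.append(0.0)  # PET is set to zero for months at or below 0 deg C
        else:
            pet.append(16.0 * (L / 12.0) * (N / 30.0) * (10.0 * t / heat_index) ** a)
    return pet
```

With a uniform warming of every month, the exponent a grows with the heat index, so annual PET rises; this is how the projected temperature increase propagates into higher PET and, through the Budyko relation, higher AET.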