Hybrid Deep Learning for Week-Ahead Evapotranspiration Forecasting

Reference evapotranspiration (ET) is an integral hydrological factor in soil-plant-atmosphere water balance studies and in the management of drought events. This paper proposes a new hybrid deep learning (DL) approach that combines a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) with Ant Colony Optimization (ACO) for multi-step (week 1 to 4) daily ET forecasting. The method assimilates a comprehensive dataset of 52 diverse predictors, i.e., satellite-derived Moderate Resolution Imaging Spectroradiometer (MODIS) products, ground-based data from Scientific Information for Landowners (SILO) and synoptic-scale climate indices (CI). To develop a robust CNN-GRU model, a feature selection stage based on ACO is implemented to improve the ET forecast model for three selected sites in Australia's Murray Darling Basin. The results demonstrate the excellent forecasting capability of the hybrid CNN-GRU model against the counterpart benchmark models, evidenced by a relatively small mean absolute error and high efficiency. Overall, this study shows that the proposed hybrid CNN-GRU model successfully captures the complex and non-linear relationships between the predictor variables and daily ET.


Introduction
Reference evapotranspiration (ET) plays a significant role in agriculture, ecosystems, and ecological models (Yin et al., 2017). In the hydrological cycle, evapotranspiration links atmospheric and surface water flows by transporting water vapor to the atmosphere, which eventually returns as precipitation. Moreover, in terms of water quantity, evapotranspiration is the second-largest component of the hydrological cycle after precipitation (Zou et al., 2019). Considering the importance of ET in the global energy balance (Zeng et al., 2019), it is essential to understand the dynamics of the hydrological cycle and to continuously monitor the stochastic variability in ET in order to promote sustainable freshwater usage (Zou et al., 2019).
The most critical parameters that affect evapotranspiration variability are humidity, wind speed, air temperature, and sunshine duration (Pejić et al., 2015). As the temperature increases, the capacity of the soil to hold water and the transport of water vapor increase, which eventually increases evapotranspiration rates. Moreover, because of the complexities of the land-plant-atmosphere system, actual evapotranspiration is rather challenging to measure. Using the Penman-Monteith method (Allen et al., 1998), reference evapotranspiration is determined as a basis for actual evapotranspiration (Yin et al., 2017). Reference evapotranspiration corresponds to the evapotranspiration rate of a reference crop. ET is potentially responsible for more than 90% of global water losses (Morison et al., 2020).
Importantly, ET receives significant attention as it has the greatest influence on agricultural water usage.
Notably, due to climate change, ET has fluctuated in different regions of the world (Piticar et al., 2016). Given the uncertainty in forecasting ET and its complex associations with changing climates in different regions, forecast models for ET under climate change continue to gain attention. Therefore, robust ET forecasts are essential for managing water resources under a changing climate and drought conditions, particularly to maintain agricultural water use efficiency.
An understanding of land surface processes and vegetation, as affected by weather and climate, is mainly grounded in numerical modelling of surface energy fluxes and the hydrological cycle (Bonan, 2008).
Researchers have made efforts to develop different modelling approaches for estimating evapotranspiration in agricultural systems (Abdullah et al., 2015; Chen et al., 2020; Deo and Şahin, 2015; Mehdizadeh, 2018). The Penman-Monteith (PM) and Shuttleworth-Wallace (SW) models are used to predict evapotranspiration for seasonally varied vegetation (Zhu et al., 2013). These physically sound and rigorous models consider ET's relationships with the net radiation, heat flux and soil-surface temperature. They incorporate mass transfer and energy balance and are widely used around the world. The Priestley-Taylor and Flint-Childs (PT-FC) model is radiation based, whereas the advection-aridity (AA) model is based on meteorological variables. These models can be developed with a small set of measurements (Wei et al., 2019). Although prior ET forecasting studies have adopted conventional machine learning methods, deep learning methods for time series analysis are also attracting considerable attention (Olah, 2015; Zeng et al., 2019).
The selection of appropriate predictor variables remains a vital part of any model's development, as it reduces computational complexity and improves both the forecasting accuracy and the interpretation of the model's characteristics and the nature of the predictors used (Prasad et al., 2018). In this research work, predictor variables from three distinct datasets, namely MODIS satellite images, ground-based SILO data and climate indices, are used to train the model. Remotely sensed data, i.e., MODIS, have been identified as potential predictors for forecasting solar radiation (Ghimire et al., 2019a) and land surface temperature (Deo and Şahin, 2017). Besides this, the SILO dataset, a Queensland Government database, is widely used due to its reliability and accuracy (BOM, 2020). In addition, different studies have demonstrated the potential of applying climate indices (Ali et al., 2019; Nguyen-Huy et al., 2018). Hence, integrating a deep learning hybrid model with satellite-derived products, climate indices, and SILO variables can significantly enhance ET forecasts. However, remotely sensed satellite data are yet to be used to forecast ET, especially with a deep learning model. This paper aims to develop a forecast model that considers the dynamic characteristics of ET time series by means of a hybrid deep learning approach. The proposed model is able to learn the temporal dependencies of the multivariate ET series (and its predictor variables) to generate robust forecasts. The paper is organised as follows: Section II reviews the related work. Section III gives an overview of the deep learning model development. Section IV presents the experimental setup and explores the robustness of the proposed model through visual and statistical analysis methods.
Finally, the concluding remarks and future prospects of the research work are presented in the last section.

Related Works
ET forecasting has a rich literature, but most studies address the problem using statistical and conventional machine learning models, including artificial neural networks (ANN) (Traore et al., 2010), Random Forest (Wu et al., 2020) and extreme learning machines (ELM) (Abdullah et al., 2015). Patil and Deka (2016) presented a comprehensive ET assessment method that used an ELM model in the Thar Desert, India. The ELM method was compared with the Hargreaves equation, ANN, and least-squares support vector machines (LS-SVM). The results revealed that ELM was a simple yet effective method that substantially improved ET forecasting. Huo et al. (2012) concluded that temperature and relative humidity were the most important predictors in their modelling. Abdullah et al. (2015) showed that ELM had an outstanding generalisation performance in predicting PM-ET using four complete and incomplete meteorological input combinations for their case study in Iraq.
In contrast to the computational methods described above, deep learning (DL) techniques have become the most popular approach towards modelling sequential data (Ghimire et al., 2019c).
Intelligent models based on deep learning use feature extraction to reveal the compounded associations between predictors and a target variable (Ghimire et al., 2019a). Hence, predicting potential evapotranspiration using this class of algorithms is a useful approach for water resources management. However, deep learning is yet to be explored within the present study region of Australia's Murray Darling Basin.
Numerous DL approaches, viz. Convolutional Neural Network (CNN), long short-term memory (LSTM), Gated Recurrent Unit (GRU) and Recurrent Neural Network (RNN), have been implemented in different fields of time series forecasting including rainfall (Hu et al., 2018), air quality (Du et al., 2018), stream flow (Damavandi et al., 2019) and evapotranspiration (Tikhamarine et al., 2019). Liu et al. (2016) used a CNN approach to independently detect three types of climate extremes, namely the river system dynamics, tropical cyclones, and weather fronts. The study revealed that different sets of physical predictor variables were able to classify different climatic events. Zhang et al. (2017) implemented a deep belief network for the prediction of precipitation using seven environmental parameters from the previous day. They found a better forecast accuracy than various ML and statistical algorithms.
However, there were several days in which the forecasts were unreasonably good. In both hydrology and climatology, studies using 1D CNN methods are still scarce (Haidar and Verma, 2018), although CNN appears to outperform traditional machine learning models in many studies and to reach state-of-the-art performance. Despite their high capabilities, deep learning models such as the CNN have not been significantly explored in hydrological science.
This study adopts the gated recurrent unit (GRU) neural network, a modified version of the LSTM that has attracted research attention in many recent problems (Zhang et al., 2018). There appear to be only a few studies on GRU-based models, especially within hydrology (Gao et al., 2020).
The Convolutional Neural Network (CNN) is a useful feature extraction method for improving the overall predictive process (Ghimire et al., 2019b). Therefore, an integration of CNN with GRU can plausibly lead to robust pre-processing of extensive data and provide a viable option to improve the model's forecasting skill. There have also been recent studies that integrated CNN with LSTM for improved performance, such as Ghimire et al. (2019c), who showed the superior skill of a CNN-LSTM model in solar radiation prediction. The integration of deep learning models (i.e., CNN-GRU) for reference evapotranspiration forecasting, however, is yet to be tested explicitly, with no previous studies verifying the utility of this method for Australia's Murray Darling Basin, which is the focus of the present study.
The objectives of this paper are (1)

Problems and Motivations
ET forecasting has been a prime issue in decisions on irrigation scheduling (Yin et al., 2017). In general, the estimation of ET is a significant component of water resources management, irrigation planning, hydrological modelling and assessment (Feng et al., 2017). This study aims to develop an ET forecast model to anticipate the change in ET over a specific period. The ET forecast model can be described as follows: if t is the current day, the daily ET is forecasted at t + 7 for week 1, at t + 14 for week 2, and so on for the subsequent weeks. ET forecasting, however, depends on related meteorological variables such as air temperature, evaporation, wind speed, precipitation, and humidity. Figure 2 shows the time series plot of selected predictor variables as an example.
Besides, reference evapotranspiration (ET) is the most important climatic element in the water balance after temperature and precipitation, and thus has implications for the soil water balance, e.g., for the analysis and prediction of plant-available water and irrigation. ET is challenging to measure directly, and in most cases it is estimated from meteorological data (Thomas, 2000). Moreover, the ET forecasting task is challenging because the weather changes rapidly and many factors influence ET.
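The forecast horizons described above can be illustrated with a minimal sketch that pairs a window of past daily values with the value a fixed number of days ahead. The function name `make_supervised`, the window length `n_lags` and the horizons of 21 and 28 days for weeks 3 and 4 are assumptions made for illustration; the toy series stands in for a daily ET record.

```python
from typing import List, Tuple

# Assumed lead times in days for weeks 1-4 (t + 7 and t + 14 are from the text,
# t + 21 and t + 28 are an extrapolation made for this sketch).
HORIZONS = (7, 14, 21, 28)


def make_supervised(series: List[float], n_lags: int,
                    horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of `n_lags` past daily values with the value `horizon` days ahead."""
    X, y = [], []
    # t is the index of the last observed day in the input window
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1: t + 1])
        y.append(series[t + horizon])
    return X, y


if __name__ == "__main__":
    daily_et = [float(v) for v in range(40)]  # toy stand-in for a daily ET record
    for h in HORIZONS:
        X, y = make_supervised(daily_et, n_lags=3, horizon=h)
        print(f"horizon={h:2d} days: {len(X)} samples")
```

Longer horizons leave fewer training pairs from the same record, which is one practical reason why skill at week 4 is harder to establish than at week 1.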
Considering this issue, an ET forecasting approach based on a hybrid deep learning architecture is implemented in this paper. Many researchers have studied hybrid learning models, which are a practical approach for forecasting hydrological time series.
This paper applies the CNN-GRU model, which is a promising approach (Ahmed et al., 2021a; Ahmed et al., 2021b). Some recent studies have demonstrated enhanced performance from integrating CNN with LSTM. In this paper, the integration of three distinct data sets is additionally examined for robustness and performance. The remotely sensed MODIS dataset captures the land surface state, while the SILO data provide meteorological observations from the land surface. Climate mode indices provide input features related to the teleconnections between ET and atmospheric-oceanic states to improve the objective model's skill in ET forecasting.

Deep ET Forecasting Framework
This section describes the developed hybrid deep-learning method, CNN-GRU, for forecasting the ET.
The Ant Colony Optimisation (ACO) algorithm was used to select appropriate features for the model.
The proposed method combines a one-dimensional CNN and a GRU and considers the temporal dependencies of the ET data. Given the correlations between the ET time series and its predictor variables up to a specific antecedent period, the historical memories of the predictor and target variables are crucial, as shown in Figure 4 and Figure 5. The framework has three components: the first is the ACO for feature selection, the second is multiple one-dimensional convolutional layers for extracting correlated features from the predictor variables, and the third is the Gated Recurrent Unit (GRU) network, which captures long-term time series dependencies. No method is available to evaluate the contribution of the predictors in advance of model development (Tiwari et al., 2016). However, there are two standard ways of choosing the lagged ET memories and predictor variables for the optimum model (Prasad et al., 2017): the partial autocorrelation function (PACF) and cross-correlation function (CCF) approaches (reference). The PACF of the ET series, used to identify the significant lags, is shown in Fig. 4. The figure shows that antecedent monthly lags are significant.
Cross-correlation was used to determine the statistical similarity between the predictors and the target variable. Fig. 5 shows the cross-correlation between the predictors and ET for the Menindee weather station. A set of relevant input combinations was then determined from the cross-correlation coefficient (rcross) of each predictor with ET. In this figure, the blue line indicates the 95% confidence level for a statistically significant rcross. The correlation of the predictor variables with ET was significant for all stations at lag zero (rcross: 0.25-0.85). From the predictor variables, 15 input variables were selected using the ACO feature selection algorithm.
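As an illustration of the PACF step, the sketch below computes partial autocorrelations with the Durbin-Levinson recursion and flags lags whose magnitude exceeds an approximate 95% bound. The function names and the bound of 1.96 divided by the square root of the series length are assumptions for this example and are not taken from the study's implementation.

```python
import math
from typing import List


def autocorr(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r(0..max_lag)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(x: List[float], max_lag: int) -> List[float]:
    """Partial autocorrelation at lags 0..max_lag via the Durbin-Levinson recursion."""
    r = autocorr(x, max_lag)
    out = [1.0]
    phi_prev: List[float] = []  # AR coefficients of order k - 1
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi
    return out


def significant_lags(x: List[float], max_lag: int) -> List[int]:
    """Lags whose |PACF| exceeds the assumed 95% bound 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    p = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(p[k]) > bound]
```

The lags returned by `significant_lags` would then be candidate antecedent ET inputs for the forecast model.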
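The cross-correlation screening can be sketched in a few lines. The helper names below and the significance bound of 1.96 divided by the square root of the sample size are assumptions for illustration; the toy predictor is simply a shifted copy of the target.

```python
import math
from typing import List


def cross_corr(x: List[float], y: List[float], lag: int) -> float:
    """Correlation between the predictor x(t - lag) and the target y(t), for lag >= 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    cov = sum((x[t - lag] - mx) * (y[t] - my) for t in range(lag, n))
    return cov / (sx * sy)


def significance_bound(n: int) -> float:
    """Approximate 95% bound for a zero correlation (assumed form: 1.96 / sqrt(n))."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    predictor = [math.sin(0.3 * t) for t in range(200)]
    target = [0.0, 0.0] + predictor[:-2]  # toy target lagging the predictor by 2 steps
    bound = significance_bound(len(predictor))
    for lag in range(5):
        r = cross_corr(predictor, target, lag)
        print(lag, round(r, 3), abs(r) > bound)
```

A predictor would be retained at a given lag when its rcross lies outside the bound, which mirrors the role of the blue line in the figure.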
To exploit the temporal dependence of the predictor variables, multiple one-dimensional CNN layers are trained to extract the input features. The network consists of three types of layers: the input layer, the fully connected output layer, and one or more hidden layers between them. In this work, three CNN layers, a max-pooling layer, and dropout layers were used as needed. Let us assume a matrix A =

Multiple 1-Dimensional CNN (CNN)
This study uses Convolutional Neural Networks (CNN) for feature extraction from the input data to create the hybrid CNN-GRU model for multi-step forecasting. CNNs resemble traditional neural networks.
However, the connectivity between and within their neuronal layers differs. In traditional neural networks, each neuron in one layer is linked to all neurons in the next layer, while neurons within a single layer share no connections. CNNs are like Feed Forward Neural Networks (FFNNs), with a three-part model architecture built on convolution, pooling, and fully connected layers.
The fully connected layer is employed to estimate the objective variable from the input features of the predictor variables. CNN has proven to be a reliable modelling tool for extracting hidden features in inputs and generating filters that capture data features in the predictor variables (Oehmcke et al., 2018). To extract the patterns in the objective variable (i.e., ET) and the associated predictor variables, each convolutional layer is established following Nunez et al. (2018), where Q_k is the weight of the kernel associated with the k-th feature map, f is the activation function, and the star sign (*) denotes the convolution operator. The rectified linear unit (ReLU) is used as the activation function, and adaptive moment estimation (Adam) is selected as the optimisation algorithm using the grid search approach. The ReLU is described as f(x) = max(0, x). A one-dimensional convolutional operator was adopted for the one-dimensional dataset, which essentially simplifies the modelling procedure for real-time forecasting.
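A single feature map of a one-dimensional convolutional layer can be sketched as h_k = f(Q_k * x + b_k), followed by max pooling. The input sequence x and the bias b_k are symbols assumed for this illustration, and the kernel values are arbitrary; this is a plain-Python sketch rather than the network used in the study.

```python
from typing import List


def relu(v: float) -> float:
    """Rectified linear unit, f(x) = max(0, x)."""
    return max(0.0, v)


def conv1d(x: List[float], kernel: List[float], bias: float = 0.0) -> List[float]:
    """One feature map: slide the kernel over x (no padding, stride 1), add bias, apply ReLU.

    As in most deep learning libraries, the kernel is not flipped (cross-correlation form).
    """
    k = len(kernel)
    return [relu(sum(kernel[j] * x[i + j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def max_pool1d(x: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling with window `size`."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0]
    feature_map = conv1d(signal, kernel=[-1.0, 0.0, 1.0])  # responds to rising values
    print(feature_map)
    print(max_pool1d(feature_map, size=2))
```

Stacking several such layers with different kernels yields the multiple feature maps that are passed on to the recurrent stage.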

GRU for feature dependency learning
Our newly proposed hybrid CNN-GRU model utilises a gated recurrent unit (GRU) neural network as the predictive tool after features are extracted by the CNN algorithm. The GRU is a variant of the long short-term memory (LSTM) network presented by Cho et al. (2014). The GRU can capture long- and short-term dependencies while alleviating the vanishing gradient problem. Despite their similarities, the GRU differs from the LSTM in several respects. For instance, the GRU has two gates, namely the update gate and the reset gate, whereas the LSTM has three gates (i.e., the input gate, forget gate, and output gate). Fig. 3 shows the structure of the gated recurrent unit network.
In a GRU network, two inputs, the input vector x(t) and the previous output vector h(t-1), are present in each layer. The output of each gate is obtained through logical operations and a non-linear transformation of the inputs. The association between predictors and predictand is defined through the gate equations, in which r(t) is the reset gate vector, z(t) is the update gate vector, and W and U are the parameter matrices and vectors. Here h denotes the hyperbolic tangent activation and g the sigmoid activation.
Finally, given the architecture of the GRU, the network is trained with backpropagation through time. Based on previous studies, the Adam optimiser was implemented as it has shown enhanced performance.
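The gate computations can be sketched as a single GRU time step in plain Python. The parameter dictionary layout (`W_*`, `U_*`, `b_*`), the bias vectors and the convention that z(t) weights the candidate state are assumptions for this illustration; some formulations swap the roles of z(t) and 1 - z(t).

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x: Vector, h_prev: Vector, p: Dict[str, list]) -> Vector:
    """One GRU time step; p holds W_* and U_* matrices and b_* bias vectors."""

    def gate(name: str, act: Callable[[float], float], h_in: Vector) -> Vector:
        wx = matvec(p["W_" + name], x)
        uh = matvec(p["U_" + name], h_in)
        return [act(a + b + c) for a, b, c in zip(wx, uh, p["b_" + name])]

    z = gate("z", sigmoid, h_prev)  # update gate z(t)
    r = gate("r", sigmoid, h_prev)  # reset gate r(t)
    # Candidate state: the reset gate scales how much of h(t-1) is used
    h_cand = gate("h", math.tanh, [ri * hi for ri, hi in zip(r, h_prev)])
    # Assumed convention: z(t) weights the candidate, 1 - z(t) the previous state
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With all parameters set to zero both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state, which is a convenient sanity check.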

ACO algorithm for feature selection
Ant Colony Optimization (ACO), proposed by Dorigo and Di Caro (1999), is biologically inspired by the behaviour of ant colonies. In this study, we applied an ACO algorithm for feature selection. The theory of the ACO algorithm is as follows: when ants locate a food source, they deposit a chemical substance called a pheromone to mark the path. When another ant searches for food, it follows the path of laid pheromone. This ant in turn lays pheromone on the path so that other ants can find the same route. When an ant must decide between paths, it prefers the path with the higher pheromone level, which indicates that more ants have travelled that path. Because ants complete shorter paths more quickly, shorter paths accumulate more pheromone than longer paths. If no ant travels a path, its pheromone evaporates over time. Therefore, the pheromone intensity decreases (Sweetlin et al., 2017), and over time, all ants follow the shorter path to the food. Finally, the "evaporation of pheromone" and the "probabilistic selection of paths" provide the information ants need to find the shortest path to food. These concepts provide flexibility for solving optimisation problems. In brief, an ant can use the information left by other ants to choose a better solution. Fig. 3 shows the steps in the ACO algorithm.
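The two mechanisms named above, pheromone evaporation and probabilistic path selection, can be sketched for feature selection as follows. This is a simplified illustration, not the algorithm configuration used in the study: the function `aco_select`, its parameters and the toy scoring function are all assumptions, and in practice the score would come from a model's validation skill.

```python
import random
from typing import Callable, List, Sequence


def aco_select(n_features: int, subset_size: int,
               score: Callable[[Sequence[int]], float],
               n_ants: int = 10, n_iter: int = 30,
               evaporation: float = 0.2, seed: int = 0) -> List[int]:
    """Return the best feature subset found; a higher `score` is better."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset: List[int] = []
    best_score = float("-inf")
    for _ in range(n_iter):
        trails = []
        for _ in range(n_ants):
            candidates = list(range(n_features))
            subset: List[int] = []
            # Probabilistic selection: features with more pheromone are more likely
            for _ in range(subset_size):
                weights = [pheromone[f] for f in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(pick)
                subset.append(pick)
            s = score(subset)
            trails.append((subset, s))
            if s > best_score:
                best_subset, best_score = sorted(subset), s
        # Evaporation (with a small floor), then deposit in proportion to each score
        pheromone = [max((1.0 - evaporation) * p, 1e-6) for p in pheromone]
        for subset, s in trails:
            for f in subset:
                pheromone[f] += max(s, 0.0)
    return best_subset


if __name__ == "__main__":
    informative = {0, 1, 2}  # toy ground truth: only these features are useful
    toy_score = lambda subset: float(len(informative & set(subset)))
    print(aco_select(n_features=8, subset_size=3, score=toy_score))
```

Features that repeatedly appear in high-scoring subsets accumulate pheromone, so later ants concentrate their search around them.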

Experiments
In this section, we explain the three distinct datasets and the framework for applying the models. We used three distinct datasets, viz., MODIS satellite data, ground-based SILO data and climate mode indices.

Datasets
The predictor variables in this analysis were collected from three separate data sources, viz., MODIS, SILO and climate mode indices.
MODIS satellite datasets are used to capture the land surface status and flow parameters at regular temporal resolutions; the ground-based SILO repository provides meteorological data in a ready-to-use format for biophysical modelling; and the climate mode indices datasets relate to teleconnections between ET and atmospheric-oceanic states to boost the objective model's skill. The three experimental data sets are described in Table 1. (Adnan et al., 2016), this study used all these indices because rainfall is closely correlated with lagged SOI values, which showed high predictability of rainfall from August to November. We also consider the Madden-Julian Oscillation (MJO), which causes considerable variance in tropical weather on weekly and monthly time scales (BOM, 2020).

FAO-56 Penman-Monteith Equation:
As stated earlier, the Penman-Monteith (PM) method is a sound and rigorous model that considers the relationships of ET with the net radiation, the heat fluxes and the surface temperature; it was therefore selected for the ET calculation. Mathematically, the FAO-56 Penman-Monteith method is stated as follows:

$$ET = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,U_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,U_2)}$$

Here ET = reference evapotranspiration (mm day^-1); Δ = slope of the vapour pressure-temperature curve (kPa °C^-1); R_n = net radiation (MJ m^-2 day^-1); G = soil heat flux (MJ m^-2 day^-1); γ = psychrometric constant (kPa °C^-1); T = average daily air temperature at 2 m (°C); U_2 = mean daily wind speed at 2 m (m s^-1); e_s = saturation vapour pressure (kPa); and e_a = actual vapour pressure (kPa).
The calculation procedure of ET is outlined in the FAO-56 manual (Allen et al., 1998). Moreover, the variables used to calculate the ET value were collected from SILO database.
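The FAO-56 Penman-Monteith calculation can be sketched as a single function once the intermediate quantities are available. The function name and the input values below are illustrative assumptions; deriving the slope, vapour pressures and net radiation from raw SILO variables follows the FAO-56 manual and is not shown here.

```python
def fao56_reference_et(delta: float, rn: float, g: float, gamma: float,
                       t_mean: float, u2: float, es: float, ea: float) -> float:
    """Daily reference ET (mm/day) from the FAO-56 Penman-Monteith equation.

    delta: slope of the vapour pressure curve (kPa/degC)
    rn, g: net radiation and soil heat flux (MJ m-2 day-1)
    gamma: psychrometric constant (kPa/degC)
    t_mean: mean daily air temperature at 2 m (degC)
    u2: mean daily wind speed at 2 m (m/s)
    es, ea: saturation and actual vapour pressure (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


if __name__ == "__main__":
    # Illustrative mid-latitude summer-day inputs (assumed values, not study data)
    et = fao56_reference_et(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                            t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
    print(round(et, 2), "mm/day")
```

Keeping the radiation and aerodynamic terms separate makes it easy to inspect which of the two drives ET on a given day.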

Experimental Setup and Performance
To achieve a vigorous hybrid deep learning model for ET forecasting, a prime task was to Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), mean absolute error (MAE). Due to the geographic differences between the study stations, we also employed relative error-based metrics, i.e., the relative RMSE (RRMSE) and relative MAE (RMAE). In Eqs. (8)-(12), the two series are the observed and forecasted ET values for the i-th test point, the barred quantities are their averages, and N is the number of observations.
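The evaluation metrics named above can be sketched with commonly used definitions. The exact forms used in the study are not reproduced in this text, so the definitions below are assumptions, in particular RRMSE as RMSE divided by the observed mean and RMAE as the mean absolute relative error, both in percent.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def rrmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Relative RMSE in percent (assumed form: RMSE / mean of observations)."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


def rmae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Relative MAE in percent (assumed form: mean of |error| / |observation|)."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)
```

Because RRMSE and RMAE are scaled by the observations, they allow stations with different ET magnitudes to be compared on a common footing.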

Single Data Set Forecasting Results
The single data set ET forecasting results for the week 1-4 ahead periods with the three distinct data sets are provided in Table 2, which reports the RMSE of the comparative deep learning models, the shallow machine learning models and our proposed model. The results show that deep learning models are more useful for ET forecasting than traditional shallow machine learning algorithms for week 1-4 ahead forecasting.
To further test the single data set forecast performance of the CNN-GRU and baseline models on the three real-world datasets, we examine the ET forecasting capacity of CNN-GRU and of two selected baseline models over the testing period. Figs. 6(a)(b)(c) compare the observed ET with the ET forecasted by the CNN-GRU, GRU and MLP models at the week-1 ahead horizon. As shown in the figures, our model's output is better than that of the GRU and MLP models, particularly at the peaks and troughs of the time series. We also note that the single data set forecasting performance of the baseline models is sensitive to the dataset used. For example, with the MODIS dataset, the CNN-GRU model has better forecast performance than with the SILO and Climate Mode Indices datasets (see Fig. 6(a)). Interestingly, the forecasting performance is similar across all three types of datasets, which indicates the prospect of integrating the datasets.
In a nutshell, for single data set ET forecasting at different lead times (week 1-4), the proposed model provides the highest robustness. Moreover, the comparative deep learning models are also found to be useful for ET forecasting. Integrating multi-sourced data sets for ET forecasting is not simple, but it is a promising approach to increasing forecasting performance.

Multi-step Forecasting Results Analysis
The multi-step (week 1-4) ahead ET forecasting performance with the three integrated datasets is tabulated in Tables 3 and 4. A site-specific signature in the model accuracy was also evident, with the results for Menindee registering the lowest value of RMSE generated by the CNN-GRU model. In terms of r, the CNN-GRU model also returned the highest value for Menindee, suggesting that the CNN-GRU model is a potential tool for forecasting ET at 1-4 weeks ahead. Unsurprisingly, and consistent with other studies (e.g., Wen et al., 2019), this study indicates that as the forecasting period lengthened, the model's performance declined markedly, with the r-values reduced by 0.30%, 1.10%, 9.15%, 11% and 15% for the 1-4 weeks of ET forecasting.
Besides model evaluation so far, a further appraisal of our objective model is achieved using between the forecasted ET and the observed ET values was significantly higher for the MLP and GRU models forecasting, which concur with earlier metrics suggesting a potential inadequacy of ET forecast.
The blue circle identifies the improvement from the classical MLP to the GRU and, finally, to the objective CNN-GRU model, as indicated in the figure. Fig. 11 shows the wavelet coherence spectra between the observed and forecasted ET for week 1 using the CNN-GRU and GRU models. The findings reveal variation and periodicity across several time-scale bands of the ET trends over the testing period. Wavelet coherence, scaled from 0 to 1, was determined following Torrence and Webster (1999). The arrows indicate the relative phase relationship within significant zones of higher correlation. For the ET forecasted by the CNN-GRU model, the all-year band showed a more robust association, with slight discrepancies at different points. In contrast, the standalone GRU shows a low correlation. The band between 0.0050 and 0.0156 normalised frequency displayed no consistent trend during the sample period. The overall assessment indicates precise forecasting of ET using the objective model. Notably, in this study, the ACO algorithm was used to improve the predictive models. Therefore, Fig. 12  Our study also suggests that groundwater recharge, deep percolation, and plant uptake, which are essential factors that concentrate evapotranspiration in different layers (Zhu et al., 2013), can be ideal variables for better understanding ET characteristics when predicting future changes. The hybrid deep learning approach (i.e., CNN-GRU), incorporating MODIS satellite-derived data, ground-based SILO data and climate mode indices (representing synoptic climate features), can be a good modelling tool for predicting ET or other hydrological variables at multi-step lead times, including future use in water resource management and sustainable farming.

Conclusion and Future Prospects
Using hybrid deep learning, this study develops an early warning evapotranspiration forecast at four weekly steps ahead. The study builds a hybrid predictive model (i.e., CNN-GRU) with Ant Colony Optimisation for forecasting reference evapotranspiration. The novel approach combined the convolutional neural network with the gated recurrent unit network for optimum efficiency.
The practicality of the model was tested using three distinct datasets for Australia's Murray Darling Basin. As shown by graphical presentations and statistical metrics of the forecasted and observed ET, the findings reveal a superior CNN-GRU performance relative to an ensemble of other competing models.
The study has the following contributions:
1. The research presents a novel approach that uses the hybrid CNN-GRU model and the ACO algorithm, especially for the Australian Murray Darling Basin, and sets out constructive research methodologies.
2. The competing model (i.e., CNN-LSTM) and eight benchmarking models (e.g., GRU, LSTM, and MARS) were developed to assess the objective model's predictive performance using statistical score metrics and visual analysis of observed and forecasted ET in the test phase. The results revealed that the root mean squared error for all competing models was substantially larger than that of the objective model, which registered 0.126 mm, 0.929 mm, 0.962 mm and 0.982 mm for week 1 to week 4, respectively.
3. The ACO was found to be a realistic approach to obtaining the best features from a set of predictor variables, and the hybrid CNN-GRU model significantly improved the forecasting performance for evapotranspiration. Thus, the proposed CNN-GRU model yielded an appropriate degree of accuracy for weeks-ahead ET forecasting compared with classic standalone models.