The optimal alternative for quantifying reference evapotranspiration in climatic sub-regions of Bangladesh

Reference evapotranspiration (ETo) is a basic element for hydrological designing and agricultural water resources management. The FAO56 recommended Penman–Monteith (FAO56-PM) formula recognized worldwide as the robust and standard model for calculating ETo. However, the use of the FAO56-PM model is restricted in some data-scarce regions like Bangladesh. Therefore, it is imperative to find an optimal alternative for estimating ETo against FAO56-PM model. This study comprehensively compared the performance of 13 empirical models (Hargreaves–Samani, HargreavesM1, Hargreaves M2, Berti, WMO, Abtew, Irmak 1, Irmak 2, Makkink, Priestley-Taylor, Jensen–Haise, Tabari and Turc) by using statistical criteria for 38-years dataset from 1980 to 2017 in Bangladesh. The radiation-based model proposed by Abtew (ETo,6) was selected as an optimal alternative in all the sub-regions and whole Bangladesh against FAO56-PM model owing to its high accuracy, reliability in outlining substantial spatiotemporal variations of ETo, with very well linearly correlation with the FAO56-PM and the least errors. The importance degree analysis of 13 models based on the random forest (RF) also depicted that Abtew (ETo,6) is the most reliable and robust model for ETo computation in different sub-regions. Validation of the optimal alternative produced the largest correlation coefficient of 0.989 between ETo,s and ETo,6 and confirmed that Abtew (ETo,6) is the best suitable method for ETo calculation in Bangladesh.


Data and methods
Study area description and data sources. Bangladesh, situated in Southeast Asia, geographically it encompasses between 20° 30′ N and 26° 45′ N latitudes and 88° 0′ E to 92° 45′ E longitudes (Fig. 1). Banglapedia 48 divided Bangladesh into seven climatic sub-regions based on climatology and geography as shown in the Supplementary Material of Figure S2. The seven sub-regions are (1) south-eastern zone; (2) north-eastern zone; (3) northern part of the northern zone; (4) north-western zone; (5) western zone; (6) south-western zone and (7) south-central zone. Bangladesh experiences a sub-tropical humid monsoon climate with seasonal differences 4 . www.nature.com/scientificreports/ Western Bangladesh has usually become drier compared to other regions in Bangladesh 49 . Here, climatic variability is a regular scenario. Long-term daily average relative humidity, minimum temperature, maximum temperature, wind speed (at 2 m height), net radiation, evapotranspiration across the country are, respectively, 80%, 21.39 °C, 29.94 °C 1.32 ms −1 , 10.44 MJm −2 day −1 , and 3.72 mm day −1 .
Bangladesh Meteorological Department (BMD) runs only 43 meteorological stations across the country. The meteorological stations are unevenly distributed all over the country and most of the stations are located in the south-eastern regions. These meteorological stations are available for the climatic dataset, although some of the stations are newly established after the 1990s in Bangladesh and they do not have long-term data records (www.bmd.gov.bd). When more climatic variables are required, the dataset from a smaller number of stations was available. Due to these drawbacks, 20 stations were chosen for ET o estimation over the 38 years from 1980 to 2017. These selected 20 stations embody the seven climatic sub-regions of the country. Daily minimum (T min ) and maximum temperature (T max ) (°C), mean relative humidity (Hr) (%), wind speed (Uz) (Knots) and sunshine hour (h day −1 ) datasets of 20 stations were sourced from the BMD. Net radiation (Rn) and wind speed at 2 m height (U2) cannot directly be measured by weather stations. Daily Rn and U2 were estimated using the procedures recommended by Allen et al. 1 with the available meteorological datasets. A brief geographical and meteorological description of the selected stations is found in the Supplementary Material of Table S2. However, missing data in almost all the 20 stations was found. After the initial screening test, missing data of the 20 stations were less than 5% for the period of 1980-2017. Missing data for each station were filled by the existing records for the respective days from the adjacent neighbor stations. It is worthy to note that sunshine hour dataset in this study is continuous with no missing data. More details about the fill-up of missing meteorological datasets is given in the Supplementary Material (Table S3). The BMD follows the guideline of World Meteorological Organization (WMO) for weather dataset collection and record archiving. However, quality control of the dataset was primarily undertaken thoroughly by checking namely, positive values of parameters, for example, T min is lower than T max , and humidity is less than 100%. The homogeneity tests of the dataset were conducted to exhibit any anomaly in the dataset 50 . All of the datasets were passed through the quality control by the staff of the BMD. where, ET o is the reference evapotranspiration (mm day −1 ),R n isthe net radiation atcrop surface (MJm −2 day −1 ), Gis the soil heat flux density (MJ m −2 day −1 ), T is the average daily air temperature at 2 m height (°C), U 2 is the wind speed at 2 m height (ms −1 ), e s is the saturation vapour pressure (kPa),e a is the actual vapour pressure (kPa), e s − e a is the saturation vapour pressure deficit (kPa), is the slope of vapour pressure curve (kPa °C −1 ), γ is the psychrometric constant (kPa °C −1 ). Allen et al. 1 recommended G = 0. The detailed procedures of ET o estimation is found in FAO 56 paper 1 . R n is calculated by the Eqs. (2)(3)(4)(5)(6)(7)(8)(9)(10)(11): where, R ns is the net solar or shortwave radiation (MJ m −2 day −1 ), R nl is the net outgoing longwave radiation (MJ m −2 day −1 ), R s is the global solar or shortwave radiation (MJ m −2 day −1 ), N and n are, respectively, the maximum and actual possible sunshine duration, R a is the extraterrestrial radiation (MJ m −2 d −1 ), Gsc is the solar constant (0.0820 MJ m −2 min −1 ), d r is the inverse relative distance Earth-Sun, ω s is the sunset hour angle (rad), φ is latitude (rad), δ is solar declination (rad), J is the number of the day in the year between 1 (1 January) and 365 or 366 (31 December), σ is the Stefan-Boltzmann constant (4.903 × 10 −9 MJ K −4 m −2 day −1 ), α is albedo (α = 0.23), T max k and T min k are, respectively, the maximum and minimum absolute temperatures during 24-h, and R so is the clear sky solar radiation (MJ m −2 day −1 ). Allen et al. 1 recommended 0.25 for a s and 0.50 for b s . U z is measured wind speed at Z m above ground surface (ms −2 ) and z is respective station elevation above sea level (m). According to Allen et al. 1 , Saturation Vapour Pressure (e s ), Actual Vapour Pressure (e a ), Slope Vapour Pressure Curve (∆) and Psychrometric Constant (γ) are calculated by the following Eqs. (13)(14)(15)(16)(17)(18)(19), respectively: where, es is the mean saturation vapour pressure (kPa), e 0 (T max )ande 0 (T min ) are the saturation vapor pressure at maximum and minimum temperature, respectively. e a is the actual vapour pressure function (kPa) and Hr is the mean relative humidity. T ave , T max and T min are the mean, maximum and minimum air temperature, respectively, in °C and exp [·] is 2.7183 (base of natural logarithm) raised to the power [··]. P is the atmospheric pressure (kPa), λ is the latent heat of vaporization (2.45 MJ kg −1 ), C p is the specific heat at constant pressure (1.013 × 10-3 MJ kg −1 °C −1 ), ε is the ratio molecular weight of water vapour/dry air (0.622).  40 , and Turc 57 models were chosen to compare to the FAO56-PM model. The 13 empirical models were chosen based on the available input meteorological variables, universal acceptance and their applicability worldwide (Table S1). The Hargreaves-Samani (HS), Hargreaves M1 (HM1), Hargreaves M2 (HM2) and Berti models used in this study, as the HS, HM1 and HM2 models require only the temperature and extraterrestrial radiation datasets and Berti model requires only temperature data, making these models less complex. Therefore, the 13 empirical models used in the present study can be classified into the three classes: four temperature-based model (Hargreaves-Samani, Hargreaves M1, Hargreaves M2, and Berti), one mass transfer-based models (WMO), eight radiation-based models (Makkink, Priestly-Taylor, Jensen-Haise, Abtew, Irmak1, Irmak2, Tabari, and Turc). The performances and application of these models had never been validated in Bangladesh so far. The studied models, input parameters, computed equations with references are outlined in Table 1.

Empirical models.
Performance evaluation of 13 empirical models. Performance evaluation of 13 empirical models, based on the accuracy of each model for estimating ET o , was undertaken by six statistical criteria. The six statistical criteria were the mean bias error (MBE) 58 ; mean absolute error (MAE), correlation of determination (R 2 ), (11) R so = (0.75 + 2 × 10 −5 Z) R a   Sen's slope of estimator. Sen's slope of estimator 66 was applied for calculating the change of ET o,s in Bangladesh per decade and the statistics is as followed: Q is denoted as the slope between xj and xk.
The spatial distributions of the monthly ET o and its trends are mapped. Spatial distributions of the monthly meteorological variables; ET o,s , ET o,i , trends and the other examined variables are mapped by the inverse distance weighted interpolation model in ArcGIS 10.5 software.
Random forest (RF) model. The RF is a heuristic decision tree-based supervised machine learning model 67 that is appropriate for addressing the existence of the over-fitting problem to the decision trees, and other machine learning algorithm 68 . The RF is most robust, can handle numerous heterogeneous covariates, and has been effectively employed into the hydrological field 69 , genetic engineering field 70 and hydro-meteorological field 71 . The RF model has been benefited from the two more powerful algorithms e.g., bagging and random binary trees, which are called the powerhouse of this model. For developing the RF model, the number of trees and features in each split is essential. RF is a classifier which comprises of an assortment of classifier trees f m (x) for m = 1, …, M which relies on the parameters and every single tree casts a unit vote for input x 71 . Each tree generates an individual class which then combined and the majority vote predicts the final results. Present study optimized its accuracy with 100 trees, 1 execution slot, 5 seeds and with maximum depth 1. As a tree-based ensemble learning model, this model has extensively used to evaluate the importance degree of any climatic dataset in various regions 25,72 . To the best of author's knowledge, the RF model has not yet been employed to explore the importance degree of 13 empirical models against the FAO56-PM model in Bangladesh. The RF model is used to know which model is most reliable and dominant for estimating ET o . More detailed about the RF model can be found elsewhere 46,71 .

Results
Spatial distribution of meteorological variables. Figure 2 represents the distribution of multi-year mean meteorological variables of T ave , T min , T max , Rn, U2, and Hr from 1980 to 2017. Distribution of T ave (Fig. 2a), T min (Fig. 2b) and T max (Fig. 2c) showed almost similar results. Sub-region VI showed the higher values of historical temperature, while sub-regions II and III entirely showed the lowest temperature. Sub-region V showed the lowest temperature for the distribution of T ave and T min and moderate temperature for T max . Sub-regions I, IV and VII showed the moderate values. The higher rate of net radiation was observed in the sub-region I and lower rate of net radiation was found in the sub-regions II and III (Fig. 2d). Average values of Rn were seen in the sub-regions IV, V, VI, and VII. The highest wind speed was found in the sub-region of VI and the lowest in the sub-regions II, III and V (Fig. 2e). Sub-region I experienced a comparatively higher rate of Hr as it located          Exploring the most suitable model for ET o computation using the RF algorithm. The importance degree analysis of the empirical models was calculated by the RF method at different sub-regions of Bangladesh. The RF model depicted a suitable significant model followed by other analyses against the FAO56-PM model for all of the sub-regions ( Table 6). The best performed model is highlighted by bold face and the least performed model is marked by italic face (Table 6) for estimating ETo. As can be seen from Table 6, the ET o,6 outperformed for ETo estimation in all the sub-regions of Bangladesh as it produced a comparatively higher importance degree. Whereas, ET o,5 found as the worst model producing the lowest importance degree among all the models.

Validation of the best alternative model for ET o,s .
Validation is very important for determining the most suitable model from several potential models. All the applied empirical models need to validate to ensure whether the pre-analyses gave the accurate results or not and to find the best suitable alternative empirical model against the FAO56-PM. Validation was undertaken by utilizing the linear correlation method. Following Peng et al. 11 , the linear correlation was calculated by the below Eq. (33): where, a and b denoted as fitted coefficients. Table 7 shows the fitted a, b and R 2 values of correlation between ET o,s and ET o,i in seven sub-regions and whole Bangladesh. A strong correlation between ET o,s and ET o,i was found in all sub-regions and whole Bangladesh. Values of R 2 is greater than 0.8 in every sub-regions and in whole Bangladesh for all the 13 empirical models indicating a strong correlation between ET o,s and ET o,i . The model which is highly correlated with the ET o,s is highlighted with light pink color. Among 13 models ET o,6 performed best, producing greater R 2 values than the other models. ET o,6 model is simple among all the models used in this study as this model utilized only Tmax and Rs. Rs is calculated from T max and T min . T max and T min are available everywhere in each region and can estimate easily. So, it can be affirmed that the ET o,6 is the best suitable, preferred, accurate, simple and reliable model, highly consisted with the results of pre-analyses (spatial and temporal distribution, performance evaluation and importance degree analysis), for ETo estimation in all the sub-regions and whole Bangladesh.     (Turc) in Greece. There also existed a strong correlation (r-value 0.996) between ET o,s , and ET o,i (Blaney-Criddle) explored by Tabari et al. 40 in Iran. Peng et al. 11 found the largest correlation between monthly ET o,s , and ET o, 6 53 in China. Present study found the Abtew (ET o,6 ) model as the most reliable model for estimating ETo, compared to the other empirical models in all the sub-regions of Bangladesh, as this model produced a higher importance degree.
Evaluation of the performance of different empirical models for estimating the ET o,s was finally validated by Eq. (33). This study explored that Abtew (ET o,6 ) model outperformed other models with R 2 values ranged from 0.856 to 0.989 at all the sub-regions of Bangladesh. This is in good agreement with the earlier performance appraisal results in which RMSE, MAE, MBE, and NSE are the lowest in ET o,6 model. The main reason is that this model has high precision, easy, consistent; requiring less climatic datasets and strong association with FAO56-PM. This model provides satisfactory outcomes and generally uses simple computable parameters and has easy model forms. Djaman et al. 33 also found a similar result as this study that the Abtew was the best alternative model against the FAO56-PM in New Mexico, USA. Li et al. 24 found Valiantzas 3 as the outperformed model among 13 empirical models for estimating ETo with R 2 values ranged from 0.882 to 0.993. Peng et al. 11 explored ET o, 6 56 as the best-performed model at all the sub-regions and EMC among ten empirical models for calculating ETo with R 2 values ranged from 0.87 to 0.99. Shiri 5 showed Priestley-Taylor outweighed the other 6 empirical models for estimating ETo with R 2 values ranged from 0.636 to 0.792. Mohawesh 38 found Penman as the best-performed model in different regions of Jordan for estimating ET o,s with R 2 values ranged from 0.66 to 0.78 74 . Similarly, the mass-transfer-based model was the optimal model in computing ETo compared to the other models in humid regions in Iran 40 and forest regions in Greece 32 . Based on the accuracy, reliability, simplicity and higher correlation with ET o,s , the most suitable method for ETo calculation in Bangladesh is ET o,6 model.

Conclusions
In this study, daily meteorological datasets from 20 weather stations from seven sub-regions in Bangladesh for the period of 1980-2017 were used. A widespread comparison between ET o,s (calculated by FAO56-PM) and ET o,i (calculated by HS, HM1, HM2, BT, WMO, ABT, IR1, IR2, MAK, PT, JH, TAB, TR, respectively) has been carried out. The possible roles of these 13 empirical models against the FAO56-PM were also explored in this study. Out of 20 stations, 5 stations showed an increasing trend of ET o,s ; 11 stations showed a decreasing trend and 4 stations showed no trend of ET o,s . Spatiotemporal distribution of ET o,s , and ET o,i revealed that the model proposed by Abtew model showing the closest distribution of ET o,i to ET o,s . RE, RMSE, MAE, MBE, and NSE were employed for evaluating the empirical models which were identified ET o,6 as the outperformed model with the lowest errors for calculating ETo in different sub-regions and whole Bangladesh. By contrast, ET o,5 (WMO) and ET o,13 (Turc) models selected as the poorer alternative models with the higher statistical errors. RF model also confirmed the Abtew as the outperformed model. The linear regression model showed that a strong linear correlation was found between FAO56-PM and Abtew model. Validation by using Eq. (33) explored the similar outcomes that the ET o,6 model outperformed than the other models. This study recommends the model proposed by the Abtew (ET o,6 ) as the best alternative model with high accuracy, reliability and lowest errors for all the subregions and whole Bangladesh for calculating ETo when full climatic datasets for FAO56-PM model are unavailable. Future study should be focused on the evaluation of machine learning ensemble models for estimating daily ETo in Bangladesh. This research is a vital scientific contribution to ETo quantification and influential empirical models in Bangladesh where the large set of meteorological datasets could not be acquired. This study provides an important guidance for agricultural water practices, hydrological processes and irrigation management in Bangladesh, also useful as well as the similar subtropical climate region elsewhere in the world.

Data availability
The datasets investigated in the present research are easily reached from the corresponding author on request.