Evapotranspiration Estimation using Six Different Multi-layer Perceptron Algorithms

Evapotranspiration is of vital importance in water resources planning and management. In this study, the applicability of six different multi-layer perceptron (MLP) training algorithms, namely Quasi-Newton, Conjugate Gradient, Levenberg-Marquardt, One Step Secant, Resilient Backpropagation and Scaled Conjugate Gradient, in modeling reference evapotranspiration (ET 0) is investigated. Daily climatic data of solar radiation, air temperature, relative humidity and wind speed from Antalya City are used as inputs to the MLP models to estimate daily ET 0 values obtained using the FAO 56 Penman-Monteith empirical method. The results of the MLP algorithms are compared with those of the multiple linear regression models with respect to root mean square error (RMSE), mean absolute error (MAE), Willmott index of agreement (d) and determination coefficient (R 2). The comparison results indicate that the Levenberg-Marquardt algorithm is faster and more accurate than the other five training algorithms in modeling ET 0 . The Levenberg-Marquardt algorithm, with RMSE = 0.083 mm, MAE = 0.006 mm, d = 0.999 and R 2 = 0.999 in the test period, was found to be superior to the other algorithms in modeling daily ET 0 .


Introduction
Accurate estimation of reference evapotranspiration (ET 0) is of vital importance for many studies such as hydrologic water balance, the design and management of irrigation systems, and water resources planning and management. The Penman-Monteith FAO 56 (PM FAO-56) model is recommended as the sole method for calculation of ET 0 and it has been reported to provide consistent ET 0 values in many regions and climates [1][2]. The main shortcoming of the PM FAO-56 method, however, is that it needs a large number of climatic variables, which are unavailable in many regions (especially in developing countries like Turkey).
Recently, multi-layer perceptron (MLP) neural networks have been successfully applied in ET 0 estimation. Kumar et al. used MLP models for the estimation of evapotranspiration and found that the MLP performed better than the PM FAO-56 method [3]. Trajkovic et al. applied radial basis function neural networks in ET 0 estimation [4]. Kisi investigated the accuracy of the MLP with the Levenberg-Marquardt training algorithm and reported that the MLP can be successfully employed in modeling ET 0 from available climate data. The MLP models were compared with some empirical models and found to have better accuracy in estimating ET 0 [5]. Gorka et al. Rahimikhoob investigated the use of MLP for estimating ET 0 based on air temperature data under humid subtropical conditions and found that the MLP performed better than the Hargreaves method [6]. Marti et al. estimated ET 0 by MLP without local climatic data [7]. Marti et al. examined 4-input MLP models for ET 0 estimation through data set scanning procedures [8]. Several contributions on MLP modeling in ET 0 estimation were reviewed by Kumar et al. [3]. Shrestha and Shukla used a support vector machine for modeling ET 0 using hydro-climatic variables in a subtropical environment [9]. Gocic et al. applied an extreme learning machine for the estimation of reference evapotranspiration and compared it with empirical equations [10]. It is evident from the literature that there is no published work that compares the accuracy of different MLP training algorithms in modeling daily ET 0 .
The aim of this study is to investigate the accuracy of six different MLP algorithms, Quasi-Newton, Conjugate Gradient, Levenberg-Marquardt, One Step Secant, Resilient Backpropagation and Scaled Conjugate Gradient algorithms, in daily ET 0 estimation.

Materials
Daily weather data from Antalya Station (latitude 36° 42' N, longitude 30° 44' E), operated by the Turkish Meteorological Organization (TMO), were used in the study. The station is located in the Mediterranean Region of Turkey (Figure 1), 47 m above sea level. It has a Mediterranean climate (dry summers and wet winters). The maximum temperatures are 24°C for winter and 40°C for summer.
The data sample is composed of 7743 daily records of solar radiation (SR), air temperature (T), relative humidity (RH) and wind speed (U 2). The first 4645 records (60% of the whole data) were used to train the MLP models, the next 1549 records (20% of the whole data) were used for validation and the remaining 1549 records (20% of the whole data) were used for testing. Statistical parameters of the weather data are reported in Table 1. In this table, x mean , S x , C sx , C v , x min and x max denote the mean, standard deviation, skewness, coefficient of variation, minimum and maximum, respectively. It is clear from the table that the relative humidity shows a skewed distribution. According to the correlation analysis, SR appears to be the most effective parameter on ET 0 . T mean and RH are the second and third most effective parameters on ET 0 .
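The 60/20/20 partition and the Table 1 statistics described above can be sketched in Python with NumPy. This is an illustrative sketch only, not the code used in the study: the function names are hypothetical, the split is assumed to be chronological (first, next, remaining records), and the skewness is assumed to be the moment coefficient, since the text does not state which estimator was used.

```python
import numpy as np

def chronological_split(n_records, train_pct=60, val_pct=20):
    """Return (train, val, test) slices for a time-ordered record.

    Integer arithmetic is used so that 7743 records give 4645 / 1549 / 1549.
    """
    n_train = n_records * train_pct // 100        # truncated: 4645 for 7743
    n_val = (n_records * val_pct + 50) // 100     # rounded:   1549 for 7743
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, n_records)      # whatever remains
    return train, val, test

def describe(x):
    """Mean, standard deviation, skewness, coefficient of variation, min, max."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)                           # sample standard deviation (assumed)
    centered = x - mean
    # Moment coefficient of skewness (assumed estimator).
    skew = np.mean(centered**3) / np.mean(centered**2) ** 1.5
    return {
        "mean": mean,
        "std": std,
        "skew": skew,
        "cv": std / mean,                         # coefficient of variation
        "min": x.min(),
        "max": x.max(),
    }

train, val, test = chronological_split(7743)
print(train.stop - train.start, val.stop - val.start, test.stop - test.start)
```

Each slice can be applied directly to the input matrix and to the FAO-56 ET 0 series so that the three subsets stay aligned in time.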

Multi-layer perceptron
The multi-layer perceptron is inspired by the biological nervous system, though much of the biological detail is neglected. MLP networks are massively parallel systems composed of many processing elements. The MLP structure used in the present study is shown in Figure 2.
The network consists of layers of parallel processing elements, called neurons. Each layer in the MLP is connected to the preceding layer by interconnection weights. During the training (calibration) process, randomly assigned initial weight values are progressively corrected: calculated outputs are compared with the known outputs, and the errors are back-propagated to determine the weight adjustments necessary to minimize the errors.
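The weight-correction loop described above can be illustrated with a minimal NumPy sketch of a one-hidden-layer network. It assumes a sigmoid hidden layer, a linear output node and plain gradient descent on the mean squared error; plain gradient descent is not one of the six training algorithms compared in this study, and all names, the learning rate and the weight initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(n_in, n_hidden, rng):
    """Randomly assigned initial weights; biases start at zero."""
    return {
        "W1": rng.normal(scale=0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Sigmoid hidden layer followed by a single linear output node."""
    H = sigmoid(X @ params["W1"] + params["b1"])
    y_hat = H @ params["W2"] + params["b2"]
    return H, y_hat.ravel()

def backprop_step(params, X, y, lr=0.05):
    """One gradient-descent update; returns the loss before the update."""
    n = X.shape[0]
    H, y_hat = forward(params, X)
    err = (y_hat - y).reshape(-1, 1)                   # output error, shape (n, 1)
    # Gradients of 0.5 * mean(err**2), propagated backwards layer by layer.
    grad_W2 = H.T @ err / n
    grad_b2 = err.mean(axis=0)
    delta_h = (err @ params["W2"].T) * H * (1.0 - H)   # error at the hidden layer
    grad_W1 = X.T @ delta_h / n
    grad_b1 = delta_h.mean(axis=0)
    params["W1"] -= lr * grad_W1
    params["b1"] -= lr * grad_b1
    params["W2"] -= lr * grad_W2
    params["b2"] -= lr * grad_b2
    return 0.5 * float(np.mean(err**2))
```

Calling `backprop_step` repeatedly on the same training set corresponds to running successive epochs; the second-order and conjugate-gradient algorithms differ in how the update direction and step size are derived from the same back-propagated errors.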
In the present study, six different training algorithms, Quasi-Newton (QN), Conjugate Gradient (CG), Levenberg-Marquardt (LM), One Step Secant (OSS), Resilient Backpropagation (RB) and Scaled Conjugate Gradient (SCG), were used for adjusting the MLP networks. Detailed theoretical information about the MLP can be found in Haykin [11].
Choosing the optimal number of hidden nodes is a difficult task in developing MLP models. In this study, an MLP with one hidden layer was used and the optimal number of hidden nodes was determined by trial and error. The sigmoid and linear activation functions were used for the hidden and output nodes, respectively. Two different iteration numbers, 1000 and 5000, were used for the MLP training because the variation of error was too small after 5000 epochs. A MATLAB code including the neural networks toolbox was used for the MLP simulations. Four weather parameters were used as inputs to the MLP models to estimate ET 0 . Root mean square error (RMSE), mean absolute error (MAE), Willmott index of agreement (d) and determination coefficient (R 2) statistics were used for evaluation of the applied models. The RMSE, MAE and d can be defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(ET_{i,\mathrm{FAO56}} - ET_{i,\mathrm{est}}\right)^{2}}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|ET_{i,\mathrm{FAO56}} - ET_{i,\mathrm{est}}\right|$$

$$d = 1 - \frac{\sum_{i=1}^{N}\left(ET_{i,\mathrm{FAO56}} - ET_{i,\mathrm{est}}\right)^{2}}{\sum_{i=1}^{N}\left(\left|ET_{i,\mathrm{est}} - \overline{ET}_{\mathrm{FAO56}}\right| + \left|ET_{i,\mathrm{FAO56}} - \overline{ET}_{\mathrm{FAO56}}\right|\right)^{2}}$$

in which N and ET denote the number of data sets and reference evapotranspiration, respectively. The subscripts FAO56 and est mark the FAO-56 PM values and the model estimates, and the overbar denotes the mean.

The hidden node number that gave the minimum RMSE in the validation period was selected for each MLP model. In Table 2, (4, 10, 1) indicates an MLP model comprising 4 input, 10 hidden and 1 output nodes. The QN, CG, LM and RB algorithms have the same optimal hidden node numbers for 1000 and 5000 epochs. The hidden node numbers of the OSS and SCG algorithms decrease with increasing epoch numbers. In fact, the runs of the LM, QN and CG algorithms were automatically stopped after 24, 830 and 354 epochs, respectively. These epochs can be considered sufficient for training the QN, CG and LM algorithms because the error gradients are too small after them. For this reason, the structure, training duration and accuracies of these three algorithms are the same for 1000 and 5000 epochs.
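The four evaluation statistics used in this study can be computed as in the following Python/NumPy sketch. The function names are illustrative, and R 2 is assumed here to be the squared Pearson correlation between the FAO-56 values and the estimates, since the text does not spell out its formula.

```python
import numpy as np

def rmse(obs, est):
    """Root mean square error."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.sqrt(np.mean((obs - est) ** 2)))

def mae(obs, est):
    """Mean absolute error."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.mean(np.abs(obs - est)))

def willmott_d(obs, est):
    """Willmott index of agreement (1 means perfect agreement)."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    obs_mean = obs.mean()
    num = np.sum((obs - est) ** 2)
    den = np.sum((np.abs(est - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - num / den)

def r_squared(obs, est):
    """Determination coefficient as squared Pearson correlation (assumed)."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.corrcoef(obs, est)[0, 1] ** 2)

obs = [1.0, 2.0, 3.0, 4.0]
est = [2.0, 3.0, 4.0, 5.0]      # every estimate is 1 mm too high
print(rmse(obs, est), mae(obs, est), willmott_d(obs, est))
```

In this toy example the constant 1 mm bias leaves the correlation perfect while RMSE, MAE and d all register the error, which is one reason to report several statistics together.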

Application and results
Training, validation and test results of the MLP algorithms are given in Table 2. Training duration is also provided in this table. From Table 2, it is clear that the LM algorithm performs better than the other algorithms in daily ET 0 estimation in the validation stage. There is only a slight difference between the QN and LM algorithms. The accuracy ranks of the algorithms for 1000 epochs are: LM, QN, SCG, OSS, CG and RB. In the case of 5000 epochs, the ranks are: LM, QN, SCG, RB, OSS and CG. The multiple linear regression (MLR) model results are also included in Table 2 for the test stage. It is evident from the table that the LM algorithm has almost the same accuracy as the QN and that they perform better than the other four algorithms in the test stage. In the case of 1000 epochs, the accuracy ranks of the algorithms in the test period are: LM, QN, SCG, CG, OSS and RB. In the case of 5000 epochs, however, the ranks are: LM, QN, SCG, OSS, RB and CG, as found in the training period. All the algorithms are found to be better than the MLR in estimating daily ET 0 .
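For reference, an MLR baseline of the kind used in this comparison can be fitted by ordinary least squares. The sketch below is a generic NumPy version under that assumption; the text does not describe how the MLR model was actually fitted, and the function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns [b0, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])     # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply the fitted coefficients to new inputs."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]
```

With the four inputs of this study, `X` would hold the SR, T, RH and U 2 columns, the coefficients would be fitted on the training records, and `predict_mlr` would be evaluated on the test records.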
The scatterplots of the ET 0 estimates for 1000 epochs are illustrated in Figure 5. It is clear from the fit line equations and R 2 values in the figure that all the algorithms gave better estimates than the MLR model. It is evident from the scatterplots that the slope of the LM fit line (0.9962) is closer to 1 than those of the other algorithms. The CG and OSS algorithms have much more scattered estimates than the QN, LM, RB and SCG. Figure 6 shows the ET 0 estimates of the six MLP algorithms for 5000 epochs. Here also the estimates of the LM algorithm are closer to the corresponding FAO-56 ET 0 values than those of the other five algorithms. The CG algorithm gave the worst estimates.
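The fit line equations and R 2 values shown in such scatterplots can be obtained as in the short sketch below. It assumes a first-degree least-squares fit of the estimates against the FAO-56 values and R 2 taken as the squared correlation; the helper name is illustrative.

```python
import numpy as np

def fit_line(obs, est):
    """Slope, intercept and R^2 of the line est = slope * obs + intercept."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    slope, intercept = np.polyfit(obs, est, 1)    # highest-degree coefficient first
    r2 = np.corrcoef(obs, est)[0, 1] ** 2
    return float(slope), float(intercept), float(r2)
```

A slope close to 1 together with an intercept close to 0 indicates estimates that are unbiased over the whole ET 0 range, which is why the slope is used above as a supplementary measure alongside R 2.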
Ladlani et al. modeled daily FAO 56 PM ET 0 in the north of Algeria using two different ANN methods, radial basis neural networks (RBNN) and generalized regression neural networks (GRNN) [12]. Climatic data of daily mean relative humidity, sunshine duration, maximum, minimum and mean air temperature and wind speed were used as inputs to the applied models. The optimal RBNN and GRNN models provided R 2 values of 0.934 and 0.945, respectively. Adamala et al. applied second order neural networks (SONN) and compared them with the MLP method in estimating daily FAO 56 PM ET 0 in India [13]. They used daily climate data of minimum and maximum air temperatures, minimum and maximum relative humidity, wind speed and solar radiation as inputs to the models and found that the best SONN and MLP models gave R 2 values of 0.998 and 0.995, respectively. Yassin et al. used MLP and gene expression programming (GEP) in estimating FAO 56 PM ET 0 in Saudi Arabia [14]. They used daily data of maximum, minimum and mean air temperatures, maximum, minimum and mean relative humidity, wind speed at a 2 m height and solar radiation as inputs to the models [15][16][17][18][19]. They found R 2 values of 0.998 and 0.954 for the best MLP and GEP models in estimating ET 0 . It is clear from Table 2 that the MLP models (R 2 values range from 0.995 to 0.999) accurately estimate daily FAO 56 PM ET 0 of Antalya station from the R 2 viewpoint [20][21][22][23][24].
Overall, the LM and QN algorithms generally performed better than the other algorithms in estimating daily FAO 56 PM ET 0 . Like the QN method, the LM algorithm was designed to approach second-order training speed [25][26][27][28]. They can converge much faster than first-order algorithms such as CG, OSS, RB and SCG. However, the main disadvantage of these approaches is that they require a large memory space when the training set contains a large number of patterns. The LM algorithm is regarded as one of the most efficient algorithms for training small and medium-sized problems [29][30].
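The memory requirement mentioned above can be made concrete with the standard form of the LM weight update, given here as a sketch and not as a description of the toolbox internals. The symbols are introduced for illustration: w is the weight vector, J the Jacobian of the pattern errors with respect to the weights, e the error vector and μ the damping parameter. The weight count assumes bias terms on the hidden and output nodes.

```latex
% Levenberg-Marquardt weight update (standard form)
\mathbf{w}_{k+1} = \mathbf{w}_{k}
  - \left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}
    \mathbf{J}^{\mathsf{T}}\mathbf{e}

% Sizes for a (4, 10, 1) network trained on 4645 patterns, assuming bias terms
n_w = 4 \cdot 10 + 10 + 10 \cdot 1 + 1 = 61, \qquad
\mathbf{J} \in \mathbb{R}^{4645 \times 61}, \qquad
\mathbf{J}^{\mathsf{T}}\mathbf{J} \in \mathbb{R}^{61 \times 61}
```

For small μ the step approaches a Gauss-Newton step, and for large μ it approaches a short gradient-descent step. The Jacobian has one row per training pattern (here 4645 × 61 = 283,345 entries), so its storage grows linearly with the number of patterns, while the matrix to be inverted grows with the square of the number of weights.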

Conclusion
This study investigated the accuracy and training speed of six different MLP training algorithms, Quasi-Newton, Conjugate Gradient, Levenberg-Marquardt, One Step Secant, Resilient Backpropagation and Scaled Conjugate Gradient, in estimating daily reference evapotranspiration. The results of the MLP algorithms were compared with those of the multiple linear regression models with respect to root mean square error, mean absolute error and determination coefficient. The LM algorithm was found to be faster and more accurate than the other five training algorithms in estimating daily ET 0 . Only a slight difference exists between the QN and LM algorithms. The worst estimates were obtained from the CG algorithm. Comparison with multiple linear regression indicated that all the considered algorithms performed better than the MLR in estimating daily ET 0 .