Recent increase in the observation-derived land evapotranspiration due to global warming

Estimates of change in global land evapotranspiration (ET) are necessary for understanding the terrestrial hydrological cycle under changing environments. However, large uncertainties still exist in our estimates, mostly related to the uncertainties in upscaling in situ observations to large scale under non-stationary surface conditions. Here, we use machine learning models, artificial neural network and random forest informed by ground observations and atmospheric boundary layer theory, to retrieve consistent global long-term latent heat flux (ET in energy units) and sensible heat flux over recent decades. This study demonstrates that recent global land ET has increased significantly and that the main driver for the increased ET is increasing temperature. Moreover, the results suggest that the increasing ET is mostly in humid regions such as the tropics. These observation-driven findings are consistent with the idea that ET would increase with climate warming. Our study has important implications in providing constraints for ET and in understanding terrestrial water cycles in changing environments.


Introduction
Evapotranspiration (ET), the turbulent exchange of moisture and heat fluxes between the land and atmosphere, is a key process affecting the terrestrial hydrological cycle and climate variability (Gentine et al 2016, Miralles et al 2020). As a key component of the water and energy balances, change in surface ET is an important metric for quantifying the evolution of the global water cycle in a changing environment. It has been reported that, owing to climate change and global warming, global land ET might have increased by 10% during the period 2003-2019 (Pascolini-Campbell et al 2021). In addition to the radiative effect, vegetation biophysical controls also play a key role in regulating land ET (Wagle et al 2015, Zhou and Wang 2016); canopy structure and vegetation physiological responses to rising atmospheric CO2 concentrations modify ecosystem transpiration and surface energy fluxes (Williams and Torn 2015, Yang et al 2019, Forzieri et al 2020). Changes in surface vegetation affect the partitioning between latent heat flux (LE; ET in energy units) and sensible heat flux (H) through land-atmosphere interactions (Swann et al 2016, Lemordant et al 2018, Teuling et al 2019, Lansu et al 2020). However, it remains a challenge to represent the nonlinear response of LE and H to changes in surface conditions such as vegetation physiological regulation and human water/land management. Owing to the limitations of direct in situ observation, ET at continental and global scales is usually estimated using model simulation, remote sensing retrieval, and upscaling of in situ observations (Jung et al 2010, 2019, Haddeland et al 2011, Mueller et al 2011, Mu et al 2011, Miralles et al 2013). Among these, machine learning approaches that estimate surface fluxes from accessible data (e.g. flux tower, satellite remote sensing, and weather station observations) have emerged in recent years, mainly because of their accuracy in deriving observational surface fluxes (Jung et al 2011, Alemohammad et al 2017, Jung et al 2019, Wang et al 2021). Artificial neural network (ANN) and random forest (RF) are the most commonly used machine learning methods for estimating global LE and H based on FLUXNET, satellite remote sensing, and meteorological observational data. For example, Alemohammad et al (2017) developed an ANN model that used remote sensing information and other meteorological data as inputs to retrieve monthly LE and H on a global scale. Jung et al (2010) trained a model tree ensemble (MTE) on the global measurements of FLUXNET towers, and then retrieved LE and H using instantaneous satellite remote sensing and reanalysis data as driving data. Recently, Jung et al (2019) further provided an ensemble product of surface energy fluxes based on MTE (i.e. FLUX-COM), which uses two sets of driving data: pure remote sensing, and remote sensing plus meteorological data. However, the surface flux estimates of the neural network and the tree model can be inconsistent.
Existing studies have reported different trends in global land ET owing to the various methods and driving data used. Jung et al (2011) found that global land ET showed a decreasing trend during the 1982-2008 period, mainly due to the limitation of soil moisture supply. Another study found positive trends in land surface ET during the 1980-2011 period and suggested that interdecadal changes in ET can be significantly affected by the activity of the El Niño-Southern Oscillation (Miralles et al 2013). Pascolini-Campbell et al (2021) used gravity satellite observations and the water balance principle to estimate global land ET and found a 10% increase in global ET, which was mainly due to rising air temperature. These differing trends highlight the need to better constrain the estimates of global land ET and to understand the drivers of ET change. Besides a small sensitivity to the choice of algorithm, land surface conditions are usually assumed to be stationary, i.e. changes in CO2, nutrients or vegetation are not directly captured by remote sensing and reanalysis data. Moreover, the ET trends derived from satellite remote sensing and reanalysis data cover relatively short periods. The major advantage of remote sensing for ET is that it provides large-scale estimates with high spatial coverage, but it can also lose key signals in surface processes such as plant physiological regulation. Yet the vegetation physiological response to rising CO2 concentrations is thought to be critical for estimating long-term changes in land surface ET and terrestrial water cycles (Swann et al 2016, Scott and Biederman 2017, Lemordant et al 2018). Therefore, it is critical to better assess long-term changes in global land ET while also accounting for land surface conditions. In this study, we present modified ANN and RF models for retrieving global land ET. This strategy retrieves consistent LE (ET in energy units) and H based on the boundary layer energy budget and is driven by ground-based observations from flux towers and weather stations. A major advantage of such a retrieval is that it does not rely on any parameter assumptions, as it is directly informed by the boundary layer heat and moisture budget (Gentine et al 2016), and thus it can reflect the effects of CO2 fertilization or land condition changes on long-term surface fluxes. This study aims to provide an ensemble retrieval of land ET on a global scale and to improve our physical understanding of the driving factors of ET variability in a changing environment.

Model training data
The original observational dataset used to train the machine learning models was collected from the half-hourly/hourly and integrated daily products of the FLUXNET2015 FULLSET dataset (Pastorello et al 2020) (https://fluxnet.org/data/fluxnet2015-dataset/). To control the quality of the observational data, we only used the measurements and the high-quality gap-filled data from the 212 globally distributed flux towers. The flux towers cover different climate zones and vegetation types (supplementary figure S1(a) available online at stacks.iop.org/ERL/17/024020/mmedia), and thus the observational data are representative of various climatic conditions. Based on the same flux tower observational data, two typical machine learning models (ANN and RF models) were employed to retrieve global land LE and H at a daily scale from 1975 to 2017. After model testing, shortwave radiation at the top of atmosphere (SW_IN_POT), monthly moving average precipitation (monthly P), maximum temperature (max T), minimum temperature (min T), relative humidity (RH), and surface wind speed (WS) were selected as the variables for building the ANN and RF models (supplementary table S1). In this study, daily RH is calculated from the daily vapor pressure deficit (VPD) based on the Clausius-Clapeyron equation.
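The RH calculation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Tetens approximation to the Clausius-Clapeyron relation for saturation vapor pressure, VPD and vapor pressure in kPa, and air temperature in °C. The function names are hypothetical, and the temperature used to represent the day (for example the daily mean) is not specified in the text.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure (kPa) from the Tetens approximation (assumed form)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def relative_humidity_percent(vpd_kpa: float, temp_c: float) -> float:
    """RH (%) from VPD (kPa) and air temperature (deg C), clipped to [0, 100]."""
    e_sat = saturation_vapor_pressure_kpa(temp_c)
    # VPD = e_sat - e_actual, so RH = e_actual / e_sat = 1 - VPD / e_sat
    rh = 100.0 * (1.0 - vpd_kpa / e_sat)
    return max(0.0, min(100.0, rh))
```

If VPD is supplied in hPa, it would need to be divided by 10 before calling `relative_humidity_percent`.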

Strategy for estimating ET using weather observations
The diurnal courses of air temperature and humidity are directly related to the rate of change in LE and H. One of the main advantages of using the diurnal changes in temperature and humidity observed by weather stations to retrieve LE and H is that it does not rely on any assumptions about the relationship between environmental conditions and surface fluxes (Salvucci and Gentine 2013, Rigden and Salvucci 2015, Gentine et al 2016). Thus, the diurnal course of surface temperature naturally reflects the influences of vegetation photosynthesis and any changes in vegetation response to CO2. For instance, if stomatal opening is reduced due to changes in plant water use efficiency (WUE) or biomass, H will increase and LE will decrease. This will in turn lead to an increase in the daily temperature range in the boundary layer and a decrease in air humidity (figure 1). Wang et al (2020) found that the fertilization effect of vegetation has declined over recent decades, and thus the Fluxnet observational periods are long enough to capture the influence of vegetation physiological regulation on ET. The presented strategy can be used to estimate ET while also considering any changes in land surface conditions, as long as the environmental factors affect the variability of surface fluxes. In addition, the retrieval does not require remote sensing information on vegetation (such as leaf area index), and thus the surface fluxes can be retrieved over long time periods that are not covered by satellites. The relationships between surface fluxes and various environmental factors are highly nonlinear, and thus we embed our strategy into machine learning models, which have powerful nonlinear regression capability.

ANN model
ANN is a machine learning algorithm with a powerful ability for nonlinear regression. ANN can reproduce complex nonlinear relations between various environmental conditions (e.g. supplementary figure S2(a)). Previous studies have shown that ANN models perform well in retrieving surface water and heat fluxes (Zhao et al 2019, Chen et al 2020). In this study, we trained two multi-layer feedforward neural network models for predicting daily LE and H, respectively. Each ANN model has one input layer, five hidden layers, and one output layer, and daily LE and daily H are the outputs. During training, the input data are randomly divided into three subsets of 80%, 10%, and 10% for training, validation, and testing, respectively. Mean squared error (MSE) is the metric used to evaluate model performance during training and weight adjustment. The root mean square error (RMSE) and the Pearson correlation coefficient (R) are used to analyze the predicted LE and H. The optimal neural network model was found to consist of five hidden layers with 20 neurons in each hidden layer. The activation function of the hidden layers is the tangent sigmoid function, and the output layer is linear. The maximum number of training epochs and the training accuracy target are set to 500 epochs and 0.0001, respectively. Once either threshold is reached, training stops early to limit the risk of overfitting. Consistent with the inputs used to train the ANN model on the Fluxnet observational data, daily SW_IN_POT, monthly averaged P, daily max T, daily min T, daily RH, and daily WS are used to drive the trained model.
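The network shape and data split described above can be illustrated with a small standard-library sketch. It only shows the forward pass of a 6-input network with five tanh hidden layers of 20 neurons and a linear output, plus a random 80/10/10 split; the weights here are random placeholders, training (MSE minimization, early stopping) is not shown, and all function names are hypothetical rather than taken from the study.

```python
import math
import random

N_INPUTS = 6            # SW_IN_POT, monthly P, max T, min T, RH, WS
HIDDEN_LAYERS = 5
NEURONS_PER_LAYER = 20


def init_network(seed: int = 0):
    """Return one (weights, biases) pair per layer, filled with placeholder values."""
    rng = random.Random(seed)
    sizes = [N_INPUTS] + [NEURONS_PER_LAYER] * HIDDEN_LAYERS + [1]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one input vector: tanh on hidden layers, linear output layer."""
    activation = list(x)
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        activation = z if i == last else [math.tanh(v) for v in z]
    return activation[0]


def split_indices(n_samples: int, seed: int = 0):
    """Random 80/10/10 split into train, validation, and test index lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = n_samples * 8 // 10
    n_val = n_samples // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

In practice the inputs would typically be normalized before being passed to `forward`; that step is an assumption here and is not described in the text.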

RF model
RF uses a bootstrap resampling method to extract multiple sample subsets from the original sample, constructs one decision tree per subset, and then fuses the predictions of those decision trees (e.g. supplementary figure S2(b)). RF regression is a non-parametric regression method, which does not require statistical assumptions about the predictor and target variables. With this flexibility, the RF algorithm is suitable for detecting the nonlinear response of surface water and heat fluxes to changes in various environmental factors. In this study, we use the same Fluxnet data used to train the ANN models to train two RF models for predicting daily LE and H, respectively. As with the trained ANN models, SW_IN_POT, monthly P, max T, min T, RH, and WS are the input variables of the RF model, and daily LE and daily H are the output targets. R and RMSE are also used to evaluate the performance of the RF model during training. For RF training, 90% of the entire Fluxnet data was randomly selected as the training data set, and 10% was retained as the test data set. Generally, the complexity of the RF model is directly proportional to the number of decision trees (Ntree); the larger the Ntree, the longer the training time. To ensure the diversity of the trees, the RF model is trained using 800 trees in this study. The maximum tree depth and the maximum number of leaf nodes were left at their default values.
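The bootstrap-and-fuse idea behind RF can be sketched in a few lines. The example below is a deliberately simplified stand-in: it bags one-split regression stumps on a single feature, whereas a real random forest grows deep trees and also samples features at each split. The function names and the small `n_trees` default are illustrative assumptions (the study uses 800 trees).

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump; returns (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x identical in this sample: predict the mean everywhere
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return trees


def predict_bagged(trees, x):
    """Fuse the individual predictions by averaging."""
    return mean(predict_stump(t, x) for t in trees)
```

Averaging many trees trained on different bootstrap samples reduces the variance of the individual trees, which is why adding trees mainly costs training time rather than accuracy.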

Model-driven data from weather stations
This study used globally distributed weather station observations during the 1975-2017 period to drive the models (supplementary figure S1(b)), as the increasing trends in global air and land surface temperature have become more pronounced since the 1970s (Hartmann et al 2014). We collected daily records of precipitation, mean temperature, maximum and minimum temperature, dew point temperature, and surface wind speed from the Global Daily Summary (GSOD) product, which is available from the National Centers for Environmental Information (NCEI) (www.ncei.noaa.gov/data/global-summary-of-the-day/archive/). The meteorological records are strictly quality controlled, and the dew point temperature is used to calculate the actual vapor pressure. Details of the weather station data processing and of the semi-empirical model for calculating SW_IN_POT at each station location can be found in Wang et al (2021).

The predictions of LE and H
The ANN and RF models showed very similar performance for estimating LE and H at the daily scale (figure 2). Across the 212 flux towers, the correlations between predicted and observed daily LE from both models reach 0.83, and all are statistically significant at the p < 0.001 level. For H, the correlations between predicted and observed daily values reach 0.79 (p < 0.001). Therefore, the predictions of both models are highly and significantly correlated with the observed values. The RMSE between predicted and observed LE is less than 24.09 W m−2 for both models, and the RMSE between predicted and observed H is less than 28.18 W m−2. Spatially, the predictions of the ANN and RF models are highly correlated with each other, with R exceeding 0.90 over most global land areas, especially in the northern hemisphere (supplementary figure S3). The R and RMSE between the predicted and observed LE and H ensembles are also examined (supplementary figure S4). We found that at locations close to the ocean, the predicted results differ greatly from the actual observations. The biases of estimated LE and H are relatively large in the coastal areas of Australia, while the machine learning approach performs better in the northern hemisphere. In terms of latitudinal averages, both models predict high LE in tropical and subtropical regions and low LE in the high latitudes of the northern hemisphere (supplementary figure S5). The spatial pattern of H is similar to that of LE. For example, southwest North America, eastern Amazon, North Africa, and Australia are regions with relatively high H. As for the estimation bias, the predicted LE (H) of the two models shows larger RMSE in the tropics and nearby regions such as southeast Asia. Although the ANN and RF models perform very similarly in predicting surface fluxes, their spatial distributions differ slightly.
To reduce the uncertainties caused by different algorithms and model structures, we calculate the ensemble of global land LE and H from the ANN and RF models. The global mean daily LE predicted by the ANN and RF models ranges from 0 to 120 W m−2 d−1, equivalent to a mean daily ET of 0-4.23 mm d−1 and a mean annual ET of 0-1545 mm. The mean annual ET is comparable to the estimates of the MTE model (0-1400 mm) during 1982-2008, when remote sensing data are available (Jung et al 2010).
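The conversion between LE and ET quoted above can be checked with a short calculation. This sketch assumes a constant latent heat of vaporization of 2.45 MJ kg−1 and treats 1 kg of water per square metre as a 1 mm layer; the constant and the function name are assumptions, although this choice does reproduce the 4.23 mm d−1 and 1545 mm figures in the text.

```python
LATENT_HEAT_J_PER_KG = 2.45e6   # assumed constant latent heat of vaporization
SECONDS_PER_DAY = 86400.0


def le_to_et_mm_per_day(le_w_m2: float) -> float:
    """Convert a mean latent heat flux (W m-2) to ET in mm of water per day."""
    # W m-2 * s day-1 = J m-2 day-1; dividing by J kg-1 gives kg m-2 day-1 = mm day-1
    return le_w_m2 * SECONDS_PER_DAY / LATENT_HEAT_J_PER_KG


daily_et = le_to_et_mm_per_day(120.0)   # upper end of the reported LE range
annual_et = daily_et * 365.0
print(f"{daily_et:.2f} mm/day, {annual_et:.0f} mm/yr")
```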

Trends in the global surface fluxes
The long-term trends in ensemble LE, H, and evaporative fraction (EF), i.e. the ratio of LE to the surface available energy, are further estimated over our study period. Spatially, the trends in LE (H) predicted by the ANN and RF models are overall consistent with each other (supplementary figure S6). There are, however, some differences between the trends predicted by the two models. The downward/upward trends of LE and H predicted by the ANN model are slightly larger than those predicted by the RF model, for example in Australia. Here, we focus on the changes in the ET ensemble (figure 3). The ET ensemble mainly shows an upward trend of 0-7.00 mm yr−1 for most global land areas. EF can be used as a proxy for soil moisture (Gentine et al 2007, 2010), and thus the declining trends in the ET ensemble in West Asia and western Russia are mainly caused by the limitation of soil moisture or a decrease in surface conductance (figure 3(b)). Overall, the ET and EF ensembles show upward trends over most global land, except for some land areas such as western Russia, West Asia, the Mediterranean region, and the southwestern United States (figure 3). As for the latitudinally averaged trends, tropical and subtropical regions are the primary areas where ET and EF increased significantly, as these humid areas have sufficient water supply to meet the demand for evaporation in a warming world. Moreover, a positive trend in the global ET ensemble is detected over the past 43 years, with a mean upward rate of 1.11 mm yr−1. Therefore, our observation-driven ET trends are consistent with the idea that global land ET should increase in a warming climate.
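A trend such as the 1.11 mm yr−1 reported above is commonly obtained as the slope of a least-squares line through the annual series. The sketch below assumes ordinary least squares (the text does not state which trend estimator was used) and runs on a synthetic annual ET series constructed purely for illustration; it is not the study's data.

```python
def linear_trend(years, values):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic example: 43 annual values (1975-2017) rising by 1.11 mm per year
years = list(range(1975, 2018))
annual_et = [500.0 + 1.11 * (year - 1975) for year in years]
slope, intercept = linear_trend(years, annual_et)
```

Applying the same function to the series at every grid cell or station would give a spatial map of trends like the one discussed above.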

Influences of climate variability
Precipitation is the main source of water supply for land ET, and temperature is a main driver of atmospheric evaporative demand. Climate change and global warming may have a significant impact on global land water cycles. To investigate the respective influences of precipitation and temperature on ET changes, we designed two sets of experiments, in which the models are driven by weather station data with (a) the linear trend of precipitation removed (detrend P) and (b) the linear trends of max T and min T removed (detrend T). Mean air temperature over global land is dominated by an upward trend, especially in the mid-to-high latitudes of the northern hemisphere (supplementary figure S7(a)). The global mean ET and mean T show a significant correlation (R = 0.89, p < 0.001) (figure 4(a)). After removing the linear trend in air temperature, the ET (detrend T) presents a mean positive trend of 0.52 mm yr−1, while the ET (detrend P) shows a mean positive trend of 1.01 mm yr−1. Although the rates of change in precipitation are larger than those in temperature (figure S7), our results emphasize that the recent rise in temperature has played a more important role than the changes in precipitation in the increase of global land ET. This is because even a small increase in temperature has a positive effect on ET. Meanwhile, we note that precipitation is a second-order effect. We therefore treat P and T as two independent variables. In this case, the contribution of rising temperature to the increase in global land ET reached 87% over the past few decades.
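The detrending step used in such experiments can be sketched as removing the fitted linear trend from a forcing series while keeping its mean and its year-to-year variability. This is an illustration under assumptions: it uses an ordinary least-squares fit on an evenly spaced series, and the function names are hypothetical; the text does not give the exact procedure applied to the station data.

```python
def ols_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_t = (n - 1) / 2.0
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    return sxy / sxx


def detrend_keep_mean(values):
    """Subtract the fitted linear trend but preserve the series mean."""
    n = len(values)
    mean_t = (n - 1) / 2.0
    slope = ols_slope(values)
    return [v - slope * (t - mean_t) for t, v in enumerate(values)]
```

Driving a trained model once with the original forcing and once with a detrended variable, then comparing the resulting ET trends, isolates the influence of that variable's long-term change.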
The response of global land ET to climate change also shows large regional differences. Significant correlations between mean ET and mean T are observed in the hotspots, i.e. the region at mid-to-high latitudes of the northern hemisphere (RMHL) and the region in the tropics/subtropics (RTS) (figure 5(a)). Mean ET and T show a negative correlation on several land surfaces, i.e. southwestern North America, the Mediterranean region, West Asia, South Africa, and Western Australia. Since global temperatures have increased substantially over recent decades, these regions are typical areas where drought events are prone to occur under global warming and climate change. Thus, the declining trend in ET in these areas is mainly due to the limitation of soil moisture.
Figure caption (fragment): (c) The spatial pattern shows the trends in ET when removing linear trends in precipitation. The red curves mark the hotspot at mid-to-high latitudes of the northern hemisphere (RMHL) and the hotspot in the tropics and subtropics (RTS). The hotspots are areas where there is a relatively significant correlation between mean ET and mean T.
In terms of the magnitude of the upward trend, the influence of climate warming on ET is more significant in humid regions such as the Amazon region, West Africa, and Northwest India. The land ET (detrend T) shows a downward trend for most land areas (figure 4(b)). This is because the water vapor contained in the air usually follows the Clausius-Clapeyron relation, under which the atmosphere can hold about 7% more water for every 1 °C of temperature increase, which potentially leads to an increase in annual P and thus results in a wetting trend (Ban et al 2015, Papalexiou and Montanari 2019). However, the changes in the ET trend are very small when removing the trends in P (figure 4(c)). Therefore, the warming climate has an important impact on ET changes in the mid-to-high latitudes of the northern hemisphere and in the tropical region. As for the temporal changes of land ET in these hotspots, the correlations between mean ET and mean T in the RMHL and the RTS are 0.89 and 0.78, respectively, and both are significant (p < 0.001). The mean upward trend of land ET in the RMHL is 0.05 mm yr−1, while that in the RTS is 1.81 mm yr−1 (supplementary figure S8). Therefore, although temperature rises rapidly at mid-to-high latitudes of the northern hemisphere, the humid regions, including the tropics and their surrounding regions, are identified as the areas contributing more to the increase in global land ET. This is mainly because the humid regions can provide sufficient water supply to meet atmospheric evaporative demand and the physiological activities of vegetation.
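The roughly 7% per degree figure quoted for the Clausius-Clapeyron relation can be recovered from its standard form. The worked estimate below uses typical textbook values that are assumptions here, not values from the study: latent heat of vaporization L_v of about 2.5 × 10^6 J kg−1, gas constant for water vapor R_v of 461.5 J kg−1 K−1, and a near-surface temperature of 288 K.

```latex
\frac{1}{e_s}\frac{\mathrm{d}e_s}{\mathrm{d}T} = \frac{L_v}{R_v T^{2}}
\approx \frac{2.5\times10^{6}}{461.5 \times 288^{2}}
\approx 0.065\ \mathrm{K^{-1}}
```

That is, saturation vapor pressure rises by about 6.5% per kelvin at this temperature, consistent with the commonly cited value of about 7%; the rate is somewhat higher at colder temperatures and lower at warmer ones.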

Discussion
This study retrieved an ensemble of ET based on the boundary layer energy budget, using machine learning models driven by ground observations. In the model training stage, we found that the machine learning approaches for retrieving LE and H have relatively large uncertainty in Australia, especially in coastal areas. The reason for this uncertainty may be that the Australian land mass is surrounded by ocean, and thus the surface fluxes are easily affected by atmospheric circulation, such as the transport of water vapor from the ocean. The nonlinear machine learning models cannot characterize the surface fluxes well under such unstable conditions. Our observation-derived results show that global land ET increased over the 1975-2017 period, with positive linear trends ranging from 0 to 7.00 mm yr−1 and an upward trend in the global land mean ET of 1.11 ± 0.03 mm yr−1. The rate of increase in ET during the 1975-2017 period is lower than the positive trend of 2.30 ± 0.52 mm yr−1 during the 2003-2019 period reported by an existing ET estimate derived from the Gravity Recovery and Climate Experiment (GRACE) and the water balance method (Pascolini-Campbell et al 2021). As our study covers a considerably longer period, the lower estimated trend is reasonable.
The increased global land ET is a key metric of the accelerated hydrological cycle under global warming. An increased ET indicates more loss of land surface water, which can intensify the drought stress on terrestrial ecosystems, thereby affecting water resources, climate, and agriculture (Jagermeyr et al 2021). Our observation-driven results show that the positive trend of land surface ET is mainly driven by increasing temperature, which can also cause an increase in extreme precipitation, creating a negative feedback that offsets the warming trend. Yet, we found that the warming climate dominates the increase in global land ET, and that the contribution of precipitation variability to the increase in ET is limited. Thus, the acceleration of the global water cycle was initially caused by the global warming trend. It is worth noting that the increase in ET may be related to other factors, such as changes in VPD, in addition to rising mean temperatures (Zhang et al 2013, de Kauwe et al 2017). It seems intuitive that increasing VPD increases the atmospheric water demand. Although ET increases in response to an increase in atmospheric demand, plants can reduce ET by closing stomata in response to increased VPD. ET responses therefore depend on both climate change and plant photosynthesis strategy. Massmann et al (2019) found that tropical and temperate climate zones are more likely to show positive ET responses to an increase in VPD than northern and Arctic climates. Meanwhile, different ecosystems have different ET responses. Therefore, the deeper mechanisms of ET responses need to be further explored on different time scales.
Our ET estimates are derived from ground-based observations of FLUXNET towers and global weather stations, and thus they do not rely on any parameter assumptions and can capture signals of changes in surface conditions, such as the effects of vegetation change and human activities on terrestrial water cycles. It should be emphasized that the performance of observation-driven machine learning models for ET estimation can be influenced by the quality and the distribution of weather stations. Therefore, our models and results may have limitations in areas with few weather stations. Meanwhile, the models show limited performance over areas with significant moisture and heat exchange, such as the Amazon and the Australian coastal region.

Conclusions
In this study, we retrieved consistent global land LE and H ensembles over recent decades using machine learning approaches, informed by boundary layer energy budget and ground-based observations. The results can provide observational controls on the prediction of global land ET while also accounting for surface condition change. Furthermore, the responses of global land ET to climate change were quantitatively analyzed. Major conclusions are summarized as follows.
(a) Recent global land ET has increased significantly (p < 0.001), and the increase is mainly attributed to rising temperature, which is consistent with the thermodynamic hypothesis that global land ET would increase with climate warming. On a global average, rising temperatures contributed about 87% of the increase in global land ET over the past few decades.
(b) The long-term trends in global land ET present large spatial differences. Humid regions such as the tropics and their surrounding areas are identified as the areas with relatively larger increasing trends and with a greater contribution to the increase in global land ET.
(c) Our observation-driven results can provide constraints for long-term global land ET, and the observation-derived findings have important implications for further understanding the global terrestrial water and energy cycles in a changing environment.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).