Comparison of Remotely Sensed Evapotranspiration Models Over Two Typical Sites in an Arid Riparian Ecosystem of Northwestern China

: Accurate estimates of evapotranspiration (ET) are essential for the conservation of ecosystems and sustainable management of water resources in arid and semiarid regions. Over the last two decades, several empirical remotely sensed ET models (ERSETMs) had been developed and extensively used for regional-scale ET estimation in arid and semiarid ecosystems. These ERSETMs were constructed by combining datasets from different sites and relating measured daily ET to corresponding meteorological data and vegetation indices at the site scale. Then, regional-scale ET on a pixel basis can be estimated, using the established ERSETMs. The estimation accuracy of these ERSETMs at the site scale plays a fundamental and crucial role in regional-scale ET estimation. Recent studies have revealed that ET estimates from some of these models have significant uncertainties at different spatiotemporal scales. However, little information is available on the performance of these ERSETMs at the site scale. In this study, we compared eight ERSETMs, using ET measurements from 2013 to 2018 for two typical eddy covariance sites ( Tamarix site and Populus site) in an arid riparian ecosystem of Northwestern China, intending to provide a guide for the selection of these models. Results showed that the Nagler-2013 model and the Yuan-2016 model outperformed the other models. There were substantial differences in the ET estimation of the eight ERSETMs at daily, monthly, and seasonal scales. The mean ET of the growing season from 2013 to 2018 ranged from 465.93 to 519.65 mm for the Tamarix site and from 386.22 to 437.05 mm for the Populus site, respectively. The differences in model structures and characterization of both meteorological conditions and vegetation factors were the primary sources of different model performance. Our findings provide useful information for choosing models and obtaining accurate ET estimation in arid regions.


Introduction
Evapotranspiration (ET) refers to water transferring from the ground surfaces into the atmosphere, through soil evaporation (E) and plant transpiration (T) [1,2]. As an essential element of water use in arid and semiarid regions, ET plays a crucial role in the hydrological cycle. Water shortages and increasing human water demand frequently lead to fierce competition for water between humans and ecosystems in these regions [3,4]. As a result, the river flow decreases or even cuts off, and the structure and function of these ecosystems are deteriorated [5,6]. Therefore, accurate ET estimation is requisite for the conservation of ecosystems and sustainable management of water resources in these regions.
Over the last two decades, the integration of the eddy covariance approach and the remotesensing technique has been perceived as a feasible and effective way of estimating regional-scale ET [7][8][9]. Based on these techniques, several empirical remotely sensed ET models (ERSETMs) have been developed for ET estimation in arid and semiarid ecosystems. Most of these ERSETMs were constructed for riparian ecosystems under two simplified assumptions: (1) Plant transpiration predominates over ET with relatively weak soil evaporation in riparian ecosystems [10][11][12]; and (2) phreatophytes, which are the dominant vegetation types in riparian ecosystems, obtain water mainly from capillary fringe and groundwater, through root uptake, and have a reasonably steady supply of water [13,14]. Therefore, ET is considered to be principally determined by meteorological conditions and vegetation factors in these ERSETMs.
Generally, these ERSETMs were constructed at the site scale, and the main steps include the following: (1) measuring daily ET of different sites, using the eddy covariance flux towers; (2) compiling corresponding remotely sensed data and meteorological data for each site; (3) combining daily datasets from different sites and different vegetation types; and (4) establishing ERSETMs that relate measured daily ET to corresponding remotely sensed data and meteorological data, and validating the established ERSETMs [12,[15][16][17][18][19][20][21]. Once the construction process is completed successfully, regional-scale ET on a pixel basis can be estimated, using the established ERSETMs. Due to their reasonable estimation accuracy and the relatively low cost of input data acquisition, these ERSETMs have been extensively used for regional-scale ET estimation in arid and semiarid regions [22][23][24][25][26][27][28][29][30].
These ERSETMs differ significantly in model structures and characterization of both meteorological conditions and vegetation factors. Recently, studies have shown that ET estimates from some of these models are quite different from the estimating results of the traditional water balance method at different spatiotemporal scales. Nagler et al. [15] constructed an ERSETM based on the ET measurements from nine flux tower sites located in the Middle Rio Grande, Upper San Pedro River, and Lower Colorado River. They estimated ET over vast stretches of Western US riparian habitat, using the established ERSETM, and reported that the estimated annual ET rates were much lower compared with the values estimated from riparian water balance. Nouri et al. [31] compared ET estimation from an ERSETM [19] with a detailed soil water balance analysis in an urban parkland of Australia. They found that ET estimated by the ERSETM was not accurate at the monthly scale, but it provided good agreement on an annual time step due to the offset of errors. Moreover, Du et al. [32] employed the ERSETM presented in [19] and investigated the effects of distinguishing vegetation types on the ET estimates based on ET measurements from two eddy covariance sites. They noted that the ERSETM recalibrated based on the conventional method of combining vegetation types introduced more substantial uncertainties in ET estimation at different spatiotemporal scales. Undoubtedly, the estimation accuracy of these ERSETMs at the site scale plays a fundamental and crucial role in ET estimation at the regional scales and has a direct impact on the accuracy of regionalscale ET estimation. However, little information is available on the performance of these ERSETMs at the site scale.
In this study, we compared eight ERSETMs, using the ET measurements at two typical sites in an arid riparian ecosystem of Northwestern China. The primary purpose of this study was to (1) compare the performance of the eight ERSETMs at the site scale and (2) analyze the primary sources of different performance across these ERSETMs.

ERSETMs
In this study, we compared eight ERSETMs (Table 1) constructed for ET estimation in arid and semiarid ecosystems. Substantially, all of these ERSETMs are modifications of the crop coefficient method [7,33], in which the crop evapotranspiration is determined by the following equation: where is the reference evapotranspiration determined from the Penman-Monteith equation, using meteorological data [34], and is the crop coefficient that changes over a growth cycle, at different states of the crop development. In these ERSETMs, the modifications focused on the (meteorological conditions) and (vegetation factors) in the crop coefficient method and the combination of the two items.
On the one hand, the meteorological conditions in these ERSETMs were usually characterized by the daily maximum air temperature (Ta,m), nighttime land-surface temperature (Ts,n), and the reference evapotranspiration (ET0). According to the indicators used to characterize the meteorological conditions, these ERSETMs can be subdivided into two categories: temperature-based models and ET0-based models (Table 1). In these temperature-based models, the relationship between ET and temperature was expressed by using different functional forms. The Nagler-2005a model used a linear function to express the relationship between ET and temperatures above the threshold for vegetation growth [12]. In the Nagler-2005b model, the response of ET to temperature was described by using a logistic equation, by assuming three different stages. ET approaches zero for the temperature below a minimum temperature, remains relatively constant for the temperatures above a maximum temperature, owing to the increase of physiological resistance, and is proportional to the increase of vapor pressure deficit for the temperatures between them [15]. However, in the Scott-2008 model and the Bunting-2014 model, the relationship between ET and temperature was determined by an exponential function [16,17]. In these equations, the empirical coefficients a, b, c, d, e, and f are typically determined by the regression analysis. , and , are the daily maximum air temperature and the nighttime landsurface temperature, respectively. represents the vegetation indices, and * is the scaled , using Equation (2).
represents the leaf area index. is the reference evapotranspiration. Notably, during the initial construction of these ET0-based models, in the Nagler-2009 model and the Glenn-2015 model was calculated by using the Blaney-Criddle formula [35] and derived from the Penman-Monteith equation [34] in the Nagler-2013 model and the Yuan-2016 model.
On the other hand, the vegetation factors are related to the changes in the fraction of light absorption by the vegetation canopies over a growth cycle [12]. To characterize it, most of these ERSETMs used the expression (1−e −b×VI ) or (1−e −b×VI* ), which is modified from the Beer-Lambert Law, while others used the VI, VI*, or the leaf area index (LAI) directly (Table 1). Among them, VI can use the Normalized Difference Vegetation Index (NDVI) or the Enhanced Vegetation Index (EVI) [19], and VI * normalizes the VI dataset from different sites, using the following formula: where and are the VI values at bare soil area and full vegetation area, respectively. * converts the lowest and highest VI of each VI dataset to 0 and 1, respectively.
Additionally, the combination of meteorological conditions and vegetation factors is different in these ERSETMs. Generally, the combination of the two items usually includes two types: multiplication and addition. Most of these ERSETMs used the combination of multiplication, except for the Scott-2008 model, in which the addition approach was used (Table 1). Notably, the Glenn-2015 model is somewhat unique among these models, and it used a logistic equation to describe the behavior of ET concerning ET0 × VI [20] (Table 1).
All the differences mentioned above among these ERSETMs highlight the necessity to examine the performance of these ERSETMs.

Site Description
The study area is in the lower reaches of the Tarim River, which flows between the Taklimakan Desert and the Kuluk Desert and stretches from the Daxihaizi Reservoir to the Taitema Lake ( Figure  1). The study area has the characteristics of a relatively flat land surface [36] and a hyper-arid climate [37]. According to the meteorological observations from the Tikanlik Weather Station, the average annual temperature is 10.5 ℃, the annual precipitation is between 17 and 40 mm, and the potential evaporation is approximately 3000 mm yr −1 [6,37]. The vegetation is sparse and dominated by Tamarix ramosissima (T. ramosissima) and Populus euphratica (P. euphratica), which are typical phreatophytes, and belong to shrub and arbor forests, respectively [21,38]. Due to poor water conditions, the vegetation coverage and species richness of plant communities in the study area are relatively low [11].
According to the difference in vegetation types, we selected two typical sites to carry out this study. One site is dominated by T. ramosissima, and the other is dominated by P. euphratica. In this study, the former was called the Tamarix site, and the latter was called the Populus site.
The Tamarix

ET Data
The eddy covariance (EC) system was deployed on the flux tower of the Tamarix site and on the flux tower of the Populus site, respectively, to measure the energy and H2O/CO2 fluxes. The instruments have been described in detail in [10,11]. The measurements started in June 2011 over the Tamarix site and in June 2013 over the Populus site.
For both EC flux towers, fluctuations in vapor density, virtual air temperature, and wind speed were sampled at a frequency of 10 Hz. In addition, soil heat flux and meteorological data were measured and recorded every 30 min.
We used EC data from the end of April 2013 to the end of October 2018 in this study. For the Tamarix site, data for an extended period (form 29 January 2018 to 31 October 2018) were missing due to the malfunction of instruments. We did not perform gap-filling during the data process.
Raw EC data (10 Hz) were corrected and processed, using the EdiRe software (School of Geosciences, University of Edinburgh). In EdiRe, the correction procedures applied to the raw EC data included spike removal, spectral correction, coordinate rotation correction, sonic virtual temperature correction, lag correction of H2O/CO2 relative to the vertical wind component, and WPL density correction. All of the raw EC data were processed to 30 min averages.
The energy balance was forced to close by augmenting both latent and sensible heat fluxes while retaining the observed Bowen ratio [39], and the corrected latent heat flux (LE, W m -2 ) was used to calculate half-hourly ET (mm), using Equation (3): where is the latent heat of vaporization of water (2.45 kJ g −1 ); and is water density (1 g cm −3 ). Daily ET (mm d −1 ) was obtained by summing up all of the 48 half-hourly ET for one day. In the calculation process, when the half-hourly ET was negative, it was set to zero.

Ta,m and ET0 Data
Due to the malfunction of instruments or other errors, the daily meteorological variables measured by both flux towers were missing for some periods. In this study, we used the daily meteorological variables recorded at the Tikanlik Weather Station, which has a distance of approximately 30−40 km from the two study sites ( Figure 1). The meteorological variables include mean relative humidity, wind speed, pressure, the actual duration of sunshine, daily maximum air temperature, and daily minimum air temperature. These meteorological data were mainly used to calculate the reference evapotranspiration (ET0) through the FAO Penman-Monteith approach (Equation (4)) [34]: where, is reference evapotranspiration (mm d −1 ); is the net radiation at the crop surface (MJ m −2 d −1 ); is the soil heat flux density (MJ m −2 d −1 ), and it is assumed to be zero on daily time steps; is mean daily air temperature at 2 m height (℃); is the wind speed at 2 m height (m s −1 ); is saturation vapor pressure (kPa); is actual vapor pressure (kPa); − is saturation vapor pressure deficit (kPa); ∆ is the slope of vapor pressure curve (kPa ℃ −1 ); and γ is psychrometric constant (kPa ℃ −1 ). Detailed calculations of the parameters in Equation (4) based on the above meteorological data were presented in [34].
We also calculated ET0 by using the available daily meteorological variables observed by both flux towers during the study period, respectively, and compared them with the results calculated by using data from the Tikanlik Weather Station. For each site, the comparison results showed a high degree of correlation between ET0 calculated from these two sets of data (R 2 =0.927 for the Tamarix site and R 2 =0.934 for the Populus site; data not shown here).

NDVI Data
In this study, we downloaded Landsat 8 L1T scenes covering the 2013-2018 growing season (the growing season of T. ramosissima and P. euphratica is from DOY 121 to DOY 300 in the study area [10,21]) from the USGS GloVis [40]. Although the two study sites are in an overlapping part of two adjacent scenes (paths 141 and 142, row 32), most of the images have a high cloud cover, and only 65 cloud-free images were used in our analysis. To convert the original digital number (DN) values into surface reflectance, we performed radiometric calibration first and then atmospheric correction for all images, using the FLAASH module embedded in ENVI 5.3.1 (Exelis Visual Information Solutions, Boulder, CO, USA). Detailed processing was introduced in [32]. Afterword, the NDVI for each image was determined by using Equation (5) [41]: where and are the reflectance of the near-infrared band and red band, respectively. Finally, the NDVI time-series was constructed by stacking all the NDVI images.
Furthermore, we used the Savitzky-Golay filter technique [42,43] to smooth the NDVI timeseries, in order to minimize the perturbations due to anisotropic bidirectional effects, atmospheric instability, and system failure [44][45][46]. This method was selected because it performs better than other methods for NDVI noise-reduction in the arid ecosystems of Northwestern China [45]. The half-width of the smoothing window and the degree of the smoothing polynomial must be determined according to the NDVI observations when using this approach to smooth NDVI time-series. A smoother filtered NDVI time-series is usually generated by a higher half-width value of the smoothing window. It is typically set to an integer between 2 to 4 for the degree of the smoothing polynomial [43]. In this study, both of them were set as three.
Subsequently, to compare these ERSETMs comprehensively, the NDVI for each day throughout the whole growing season was required. Therefore, we interpolated the filtered NDVI of the growing season from 2013 to 2018 for both sites. The interpolation was performed on a pixel basis and included the following steps primarily: (1) The filtered NDVI was fitted to DOY by using a quadratic parabola; and (2) daily NDVI was estimated by using the fitted quadratic parabola function. In particular, the quadratic function was forced to pass through the first and last data points during the fitting process, in order to achieve a better interpolated daily NDVI time-series.
Finally, for the interpolated and the filtered NDVI time-series, the NDVI values of the pixels covering the footprint area of each flux tower were extracted and spatially averaged for each day. The averaged NDVI values were used for the analyses of each site. As an example, Figure 2a

Calibration of ERSETMs
In this study, we calibrated these ERSETMs in Table 1 for the Tamarix site and the Populus site separately. There are a few things to note concerning the model calibration. Initially, due to the high degree of correlation between Ts,n and Ta,m [47,48], the Ts,n in the Scott-2008 model and the Bunting-2014 model was replaced with the Ta,m following [12]. Furthermore, because the comparison of these ERSETMs in this study was made for the two study sites separately, we used the VI dataset of each site directly and did not scale them in the calibration of the Nagler-2005a model, Nagler-2005b model, Bunting-2014 model, and the Nagler-2009 model, following [16]. Additionally, in the calibration of the Nagler-2009 model and the Glenn-2015 model, ET0 computed using the Penman-Monteith equation was used rather than ET0 calculated from the Blaney-Criddle formula, since the former is preferred and considers the meteorological factors influencing ET more comprehensively [49][50][51].
Therefore, we used the daily ET, Ta,m, and ET0 estimated by using the Penman-Monteith equation, and the NDVI data to calibrate these ERSETMs.
We partitioned the dataset into two parts for each site, with approximately 80% and 20% of the data. The former was used as the calibration dataset and the latter as the validation dataset. Taking into account the relative integrity of the observed daily ET and the filtered NDVI, we used the 2017 data of the Tamarix site (9 samples) and the 2016 data of the Populus site (11 samples) as the validation dataset for the Tamarix site and the Populus site, respectively, and data from the other years (41 samples for the Tamarix site and 38 samples for the Populus site) were used as their respective calibration dataset.
We calibrated the eight ERSETMs by using the calibration dataset of the Tamarix site and the Populus site, respectively. The calibration process was conducted by the regression analysis, using OriginPro 2020 (Learning Edition) (OriginLab Corporation, Northampton, MA, USA). A nonlinear curve fit program embedded in OriginPro 2020 was used to determine the empirical parameters of each model. However, the calibration steps of the temperature-based models and the ET0-based models are different.
For the calibration of the temperature-based models, we referred to the method presented in [12,15]. Primarily, the parameter b in the relationship between ET and VI was determined for the Tamarix site and the Populus site, respectively. In this study, we used 0.221 and 9.598 for parameter b in the temperature-based models of the two study sites, respectively. These values were from [32], in which the Nagler-2013 model has been calibrated for the Tamarix site and the Populus site separately, using data from both the growing season and the non-growing season. Therefore, these values can better express the relationships between ET and VI for both sites. Moreover, the relevant parameters for describing the relationships between ET and Ta,m were determined by fitting ET with Ta,m using a logistic curve following [15] and using an exponential relationship following [16,17], respectively ( Figure 3). Notably, data for the non-growing season were also included in Figure 3. In this study, the parameter c in the Nagler-2005a model was determined as 20 for both sites (parameter c represents the threshold temperature for vegetation growth, and temperatures below about c ℃ supported negligible ET [12]); parameters c, d, and e in the Nagler- Nevertheless, the calibration of the ET0-based models is relatively simple. For their calibrations, the corresponding model equation was directly fitted to the calibration data of the two study sites, respectively. Notably, parameter b in the Nagler-2013 model of the two study sites also used the values as mentioned above.
In the process of nonlinear curve fit, each equation was subjected to statistical verification through the coefficient of determination (R 2 ), reduced chi-square values, and P-values for the regression coefficient. Table 2 shows the calibration results of these ERSETMs compared in this study.
where , and , are observed and simulated ET, respectively, during day I; is the mean of observed ET values; is the mean of simulated ET values; and n is the total number of observation days. Figure 4 shows the validation results of the eight ERSETMs for the Tamarix site (Figure 4a-h) and the Populus site (Figure 4i-p). Generally, the simulated ET agreed well with the observed ones for both sites. The eight ERSETMs explained 86.6%-95.0% and 51.0%-75.8% of ET variability over the validation dataset of the two study sites, respectively. However, the performance of the eight ERSETMs showed substantial differences between the two sites.

Statistical Performance of ERSETMs
Primarily, the validation results of the eight ERSETMs for both sites (Figure 4)   Ultimately, Table 3 and Table 4 present the statistical performance of the eight ERSETMs for the above two datasets, respectively. In general, for both sites, the ET0-based models performed better than the temperature-based models, which had a relatively lower NSE and relatively higher RMSE, MAE, and MaxError values. In the ET0-based models, the Nagler-2013 model and the Yuan-2016 model yielded better performance. Similarly, the eight ERSETMs showed relatively better performance at the Tamarix site than at the Populus site, and the higher NSE values were achieved at the Tamarix site.  Figure 6 displays the daily variations of the observed and simulated daily ET for the 2017 growing season of the Tamarix site and the 2016 growing season of the Populus site. All eight ERSETMs captured the variation process of daily ET, which increased continuously during the first half of the growing season and decreased gradually during the second half of the growing season. However, there were some differences between the models. Compared with the ET0-based models, the temperature-based models showed a relatively worse performance. The Nagler-2005a model provided some negative daily ET values for both sites at the end of the growing season. In addition, the Scott-2008 model and the Bunting-2014 model produced some extreme values from DOY 180 to DOY 240, especially for the Tamarix site. As a whole, the Nagler-2013 model and the Yuan-2016 model had better performance and provided a very consistent variation process with the observed daily ET for both sites.  Figure 7 depicts the variability of the mean monthly ET for the growing season from 2013 to 2018 estimated by the eight ERSETMs. All of the eight ERSETMs captured the variation characteristics of the monthly ET and revealed a similar monthly variation, which presented a process of increasing first and then decreasing with the maximum value in July and minimum value in October. However, there were substantial differences in the monthly ET estimation among these models. On the whole, the differences were relatively significant in May, June, and October. Both sites showed that the ET0based models achieved larger mean monthly ET values than the temperature-based models in May and June. Notably, compared with the other models, the Nagler-2005a model presented the smallest mean monthly ET values for October because there were some negative daily ET values in October estimated by this model (Figure 5a,i and Figure 6).

Differences in Monthly and Seasonal Scales
The mean ET of the growing season from 2013 to 2018 revealed significant differences among the eight ERSETMs (Figure 8). The ET0-based models showed the higher mean ET values of the growing season, which ranged from 510.41 to 519.65 mm for the Tamarix site and from 415.70 to 437.05 mm for the Populus site, respectively. However, the mean ET of the growing season estimated by the temperature-based models ranged from 465.93 to 484.23 mm for the Tamarix site and from 386.22 to 396.01 mm for the Populus site, respectively. Similarly, the interannual variability of the ET for the growing season from 2013 to 2018 also demonstrated considerable differences among the eight ERSETMs ( Figure 9). The growing season ET for all years estimated by the ET0-based models was generally higher than that estimated by the temperature-based models, and this result mainly due to the relatively sizeable monthly ET values achieved by the ET0-based models in May and June ( Figure  7).

Characterization of Meteorological Conditions
The eight ERSETMs compared in this study used Ta,m, Ts,n, and ET0 to characterize the meteorological conditions. Generally, our results showed better performance of the ET0-based models compared to the temperature-based models (Figures 4 and 5; Tables 3 and 4). ET is affected by a variety of meteorological factors, including wind speed, solar radiation, and air humidity [59]. Although air temperature is an indirect reflection of solar radiation, water molecules could not move into the atmosphere if the wind is absent or if the atmosphere has already been saturated with humidity [59]. Therefore, all meteorological factors affecting ET processes should be considered as comprehensively as possible in these ERSETMs. Similarly, Yuan et al. [11] found that ET was significantly correlated with ET0 on an hourly scale in the study area, but was not significantly correlated with any of the main meteorological factors, including the air temperature. ET0 computed using the Penman-Monteith equation [34] in this study takes all the main variables affecting ET into consideration. However, the Blaney-Criddle formula used in the initial construction of the Nagler-2009 model and the Glenn-2015 model is a temperature-dominated method. Studies indicated that the Penman-Monteith method outperformed other ET0 models [60] and has been regarded as the standard model for ET0 computation and the calibration of other empirical ET0 models [61][62][63]. To sum up, ET0 estimated using the Penman-Monteith equation is a better characterization for the meteorological conditions in these ERSETMs.

Effects of Vegetation Factors
By and large, all eight ERSETMs performed relatively better at the Tamarix site than at the Populus site for both the validation dataset and the daily estimates throughout the whole growing season (Figures 4 and 5; Tables 3 and 4). This result appears to be associated with the coverage of vegetation and the NDVI data quality. On the one hand, compared with the Tamarix site, the Populus site has low vegetation coverage (Figure 1d,e). Studies also showed that MODIS ET presented relatively considerable uncertainty in sparse vegetation areas [1,64]. Similarly, Yang et al. [65] reported that the TSEB (Two-Source Energy Balance) model showed significant uncertainties for ET partitioning, particularly over sparse vegetation regions. On the other hand, the NDVI data quality of the Populus site was relatively more inferior than that of the Tamarix site ( Figure 2). Similarly, studies showed that ET estimated from the SEBAL algorithm was most sensitive to the vegetation index (NDVI) [66], particularly in areas of sparse canopy cover [67]. In addition, the difference in monthly ET estimated from the eight ERSETMs for both sites was larger in May, June, and October; however, it was relatively smaller from July to September (Figure 7). Therefore, the characterizations of vegetation factors specifically affect the ET estimation from these ERSETMs, and the requisite attention may be needed for ET estimation in sparse vegetation areas using these ERSETMs.
Furthermore, the Nagler-2009 model and the Yuan-2016 model are substantially the same. The distinction between the two models is the characterization of the vegetation factors: The former used NDVI, while the latter used LAI (Table 1). Our results demonstrated that the Yuan-2016 model presented better performance than the Nagler-2009 model for both sites. This result suggests that LAI seems to be a better characterization of vegetation factors than NDVI for this specific model structure. LAI is directly correlated with the light absorbed by the vegetation foliage and has an essential effect on the photosynthetic activity and ET processes [68]. Similarly, studies also reported that LAI controlled the spatial variations of ET [11] and the seasonal variations of T/ET [69,70] in the arid ecosystems of Northwestern China.

Effects of Model Structures
In summary, the eight ERSETMs contain four types of model structures. The temperature-based models and ET0-based models include two model structures, respectively (Table 5).

Model Structures Models
Temperature-based models  (Table 1), they have different model structures (Table 5). In terms of the combination of meteorological conditions and vegetation factors, the Scott-2008 model used the additive approach, while the Bunting-2014 model adopted the multiplicative approach (Table 5). However, the performance of the Scott-2008 model was generally better than that of the other three temperature-based models (Figures 4 and 5; Tables 3-5 ). Similarly, in the original construction of the Scott-2008 model, they reported that the calibration results were improved by changing the model structure of the Nagler-2005b model (MS2) to the one they used (MS1) [16]. Thus, these results indicate that the MS1 seems to be relatively more appropriate for the temperature-based models.
Compared with the temperature-based models, the ET0-based models achieved better estimating results. Overall, the Nagler-2013 model and the Yuan-2016 model were superior to the other six models. The structure of the two models (MS3) is similar to the crop coefficient method (Equation (1)). However, the in the crop coefficient method was substituted by using different functions of vegetation indices in the Nagler-2013 model and the Yuan-2016 model. The former used an exponential function of NDVI or EVI modified from the Beer-Lambert Law, while the latter used a linear function of LAI (Table 1). Therefore, these results imply that the MS3 may be a better model structure for the construction of ERSETMs in arid riparian ecosystems.

Considerations for the Applications of ERSETMs
From an application perspective, as reported by Karimi and Bastiaanssen [71], the absence of regional-scale validation of ET estimation is one of the drawbacks in defining the reliability of remotely sensed ET products. However, these ERSETMs extrapolate ET observed or estimated at the site scale to regional scales based on the empirical relationship constructed at the site scale, and they are different from the commonly used ET algorithms based on the energy-balance equation. For these ERSETMs, their estimation accuracy at the site scale is the basis for accurate ET estimation at the regional scales and has a direct influence on the regional-scale ET estimation [32]. Therefore, the comparison of model performance at the site scale is meaningful for their application for regionalscale ET estimation.
By comparing the performance of the eight ERSETMs, we found that the Nagler-2013 model and the Yuan-2016 model outperformed the other models. Furthermore, we provided model parameters of the Nagler-2013 model and the Yuan-2016 model for the main vegetation types (Tamarix ramosissima and Populus euphratica) in arid riparian ecosystems. Our findings are meritorious for the selection of ERSETMs and ET estimation of Tamarix ramosissima and Populus euphratica in similar ecosystems. Additionally, these results also have potential application value for weather and climate modeling [2], ecosystem and environmental assessment [23,72], and sustainable water resources management [73,74] in arid regions.
Notably, the Nagler-2013 model and the Yuan-2016 model have their advantages and disadvantages when applied to regional-scale ET estimation. The Nagler-2013 model uses readily available NDVI or EVI, but there are more model parameters, while the Yuan-2016 model has fewer model parameters, but the collection of measured LAI data is relatively tricky. However, the development of global LAI products [68,[75][76][77] may provide a specific application foundation for the Yuan-2016 model. As mentioned in Section 4.1.2, characterization indicators of vegetation factors and their data quality also have an impact on the performance of ERSETMs. Therefore, in practical applications, the availability of remote-sensing data and their quality should be considered comprehensively in selecting an appropriate model between the two models.

Conclusions
In this study, we compared eight ERSETMs, using ET measurements of two typical sites in an arid riparian ecosystem of Northwestern China. The results showed that the Nagler-2013 model and the Yuan-2016 model outperformed the other models. The differences in model structures and characterization of both meteorological conditions and vegetation factors were the primary sources of different model performance. For the characterization of meteorological conditions in these ERSETMs, the reference evapotranspiration (ET0) estimated by using the Penman-Monteith equation was more appropriate than the daily maximum air temperature. The characterization indicators of vegetation factors and their data quality have a specific influence on model performance. The better model structure for constructing ERSETMs in arid riparian ecosystems was to replace in the crop coefficient method, using an appropriate function of vegetation indices. Our findings provide useful information for choosing models and obtaining accurate ET estimation in arid regions.