Land surface models significantly underestimate the impact of land-use changes on global evapotranspiration

Despite numerous assessments of the impact of land-use change (LUC) on terrestrial evapotranspiration (ET) that have been conducted using land surface models (LSMs), no attempts have been made to evaluate their performance in this regard globally. Errors in simulating LUC impacts on ET largely stem from LUC data interpretation (LI, i.e. mapping of gridded LUC data into annual plant function types) and model structure (MS, i.e. parameterization of land-surface processes). The objective of this study was to benchmark ET estimates from four LSMs using the Zhang-curve, a prototype of the Budyko framework that has been validated against global hydrological observations and used widely to quantify the impacts of LUC on ET. A framework was further proposed to quantify and attribute errors in estimated ET changes induced by LI or MS. Results showed that all LSMs underestimated ET changes by about 55%–78%, and 37%–48% of the error was attributable to LI, but only 11%–32% of the error was attributable to MS across the four LSMs. From a hydrological perspective, our analysis provided insights about the errors in estimated impacts of LUC on ET by LSMs. The results demonstrated that LUC data interpretation accounted for a larger fraction of errors than LSM structure. Therefore, there is an urgent need for the defining and development of consistent protocols for interpreting global LUC data for future assessments.


Introduction
Land-use change (LUC) has significantly affected global water and energy cycles, climate change, and ecosystem sustainability through direct changes to the timing and magnitude of evapotranspiration (ET) (Foley et al 2005, Bonan 2008, Sterling et al 2013, Cheng et al 2017a, Wang et al 2021. The impacts of LUCs on global terrestrial ET have been commonly evaluated using land surface models (LSMs) (Piao et al 2007, Liu et al 2008, Sterling et al 2013 can also be coupled with General Circulation Models for estimating LUC effects on climate change and possible feedbacks between them (Pitman et al 2009, Ghent et al 2011, de Noblet-ducoudre et al 2012, Brovkin et al 2013. A number of model comparison projects have been conducted to investigate LUC impacts on terrestrial ET, such as the Trends in Net Land-Atmosphere Exchange (TRENDY) (Le Quéré et al 2016), the Land-Use and Climate, Identification of robust impacts project (LUCID) (Lejeune et al 2017), the Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP) (Huntzinger et al 2013), and the Land Use Model Intercomparison Project (LUMIP) (Lawrence et al 2016). However, considerable discrepancies remain in their values of estimated ET and LUC-induced ET changes (de Noblet-ducoudre et al 2012, Kumar et al 2013, Mao et al 2015, Pan et al 2020. Therefore, it is crucial to benchmark and to attribute errors in estimated LUC-induced ET changes to better understand their impacts on the hydrological cycle, climate change, and sustainable development.
Quantifying errors in estimated LUC-induced ET changes by LSMs globally is very difficult, as direct large-scale ET measurements are not presently available. Although a few studies have attempted to evaluate the performance of LSMs for estimating the impacts of LUC on ET using field observations (Chen et al 2018, Cai et al 2019, our understanding of the causes of the large discrepancies in the estimated surface fluxes among LSMs at the global scale is still very limited. Previous studies based on catchment hydrological observations have demonstrated that forest coverage change is a key factor for quantifying the impact of LUC on water yield or ET at mean annual scale (Zhang et al 2001, Brown et al 2005, Zhou et al 2015, Ning et al 2020. Impacts of forest cover changes on mean annual ET are well understood based on paired catchment experiments. The hydrological impacts of forest cover changes on catchment water balance have been widely recognized, and it is generally accepted that afforestation will lead to increased ET and decreased streamflow (Zhang et al 2001, Cheng et al 2017b. The Budyko framework has been shown to provide accurate estimates of vegetation cover change impacts on ET (Li et al 2009, Yang et al 2009, Zhang et al 2012, Lei et al 2014, Jaramillo et al 2018. Therefore, the Budyko framework can be considered as an independent and robust framework to benchmark the impacts of LUC on regional water balance or ET simulated by LSMs. Changes in ET resulting from LUCs can be estimated using LSMs. However, large differences exist in the estimates of LUC impacts among LSMs. These differences can be attributed to differences in (a) interpretation of LUC data (hereafter denoted as LI) and (b) model structure (hereafter denoted as MS). They are sourced from two modeling steps (figure 1). One of the first steps in estimating the impact of LUC on ET in LSMs is to convert the input data of LUC (i.e. land-use types, LUTs) into the broadly defined plant function types (PFTs) that are commonly used in LSMs. However, it is known that the definition and number of PFTs vary among different LSMs (Poulter et al 2011). For example, the number of PFTs varies from 5 to 25 among the 12 LSMs used in CMIP5 (Li et al 2018). This means that a given input LUT can be represented differently in terms of PFTs by LSMs. Furthermore, LUC are represented as changes in fractional PFTs in the LSMs based on transition rules that differ among the models (de Noblet-ducoudre et al 2012, Brovkin et al 2013, Hartley et al 2017, Peng et al 2017, Di Vittorio et al 2018. For instance, cropland expansion would be represented by first reducing fractional grassland PFTs and then forest PFTs in some models (e.g. MOSES2), while other models would proportionally reduce all natural PFTs (e.g. ORCHIDEE). Different transition rules would lead to differences in the estimated LUC impacts on ET. Model structure can also affect estimated impacts of LUC on ET. LSMs simulate exchange of water, carbon, and energy between the surface and the atmosphere using models with different structural complexity. Hence, it is expected that differences in LSM structure will lead to differences in their estimates of LUC impacts on ET. In this study, model structure included process representation, parameterization, and spatial implementation of the model. No attempts were made to examine more specific aspects of process representation, for example, methods for modeling different components of ET (e.g. the Priestley-Taylor method used in LPJ-wsl and the Modified Penman-Monteith method used in ORCHIDEE for transpiration), and different parameterization of PFT attributes (e.g. rooting depth, maximum assimilation rate).
Previous studies have focused only on either LI or MS, and have shown that both can lead to large discrepancies in simulated ET among different LSMs (e.g. de Noblet-ducoudre et al 2012, Chen et al 2018, Di Vittorio et al 2018. However, to our knowledge, no studies have attempted to separate the contributions of LI and MS with respect to estimated discrepancies. To better evaluate the simulated effects of LUC on ET and their errors by LSMs, a framework based on the Zhang-curve (one prototype of the Budyko framework) was proposed to benchmark estimated ET changes by four LSMs of TRENDY. The primary objectives of the present study were to: (a) determine whether LSMs can capture LUCinduced ET changes estimated by the Zhang-curve (hereafter ∆ET Z ); (b) quantify the total relative errors in estimated ET changes by LSMs (hereafter ∆ET m ); (c) attribute the relative errors to either LI or MS.

Proposed benchmarking framework
2.1. The Zhang-curve for estimating ET Based on paired catchment observations and the Budyko theoretical framework, Zhang et al (2001) proposed a semi-empirical function for quantifying the impacts of LUCs (e.g. forest to grassland) on ET: where ET Z is mean annual evapotranspiration (mm); ET Z,F and ET Z,G are ET for forest LUTs and grassland LUTs areas, respectively; f is the forested land-use fraction; P is mean annual precipitation (mm yr −1 ); and w 1 (w 2 ) are mean annual potential evapotranspiration (mm yr −1 ) and empirical plant-available water coefficients of forest (grassland), respectively. Based on observations from over 200 forest and grassland catchments across a wide range of climatic zones (see appendix A in Zhang (1999) and w 1 (w 2 ) were determined as 1410 (1100) mm yr −1 and 2.0 (0.5), Figure 2. Flow chart of the benchmarking framework. LUH-HYDE3.1 refers to the LUC data of TRENDY v5 LSMs. LI_LUH and LI_LSM i refer to the land use data interpretation (LI) of LUH-HYDE3.1 and by LSMs, respectively. PFTs refer to the coverage of plant function types of LSMs. ∆f refers to the forested land use change provided by LUH-HYDE3.1. ∆p refers to the forested PFTs changes simulated by LSMs. MS_Zhang and MS_LSM i refer to the model structure (MS) of the Zhang-curve and LSMs, respectively. ∆ETz refers to the ET difference estimated by using the Zhang-curve and forested land use change.∆ET ′ z refers to the ET difference that was estimated using the Zhang-curve and changes in forested PFTs. ∆ETm refers to the ET difference estimated by LSM.
respectively. Parameters are spatially invariant, and the Zhang-curve has been widely validated and applied globally (Brown et al 2005, McVicar et al 2007, Zhang et al 2011, Ponce-Campos et al 2013. In this study, changes in mean annual ET of LUC cells estimated by the Zhang-curve were regarded as reference values for benchmarking the estimated changes in ET from LUC by LSMs.

Benchmarking and model error decomposition
Based on the Zhang-curve, a benchmarking framework was proposed to quantify total relative error in LUC-induced ET changes as estimated by LSMs, and to attribute the total relative error to either LUC data interpretation errors (i.e. LI) or to model structure errors (i.e. MS). The proposed benchmarking framework is illustrated schematically in figure 2.
For two cells with the same climate (i.e. rainfall and potential evaporation) but different forest cover, the difference in mean annual ET can be estimated from equation (2) as: where f 1 and f 2 are the fractions of forested LUTs in cell 1 and cell 2, respectively. In the LSMs, the difference in ET between these two cells is estimated based on PFTs rather than LUTs. ET values are calculated as: where ET m,1 and ET m,2 are estimated ET values for the two cells, respectively, p 1 and p 2 are fractions of all forest PFTs in cell 1 and cell 2, and ET m,F and ET m,G are estimated ET values for forest and grassland PFTs, respectively. The difference in estimated ET between the two cells can be calculated as: The difference between estimated ET change using the Zhang-curve and that of the LSM can be expressed as: Equation (7) can be expressed as: where ∆pδ z is the estimated ET change between forest and grassland PFTs using the Zhang-curve and is denoted as ∆ET ′ z , ε LI represents the difference in estimated changes in ET due to differences in mapping of LUTs to PFTs. ε MS represents the difference in estimated changes in ET due to differences in model structure.
In this study, the linear slope between ε TOT and ∆ET Z of all study cells was considered to be the total relative error (i.e. RE TOT ). Total relative error (%) can be estimated as: Similarly, linear slopes between ε LI and ∆ET Z and between ε MS and ∆ET Z are considered to be the relative errors accounted for by LI and MS (i.e. RE LI , RE MS ), respectively, and are calculated as: It is expected that ∆ET z is greater than zero (Zhang et al 2001  For two neighboring land cells with different land-use cover (i.e. forest and grassland), climatic conditions can be assumed to be the same or to be very similar. Hence, the simulated ET change between these two cells can be attributed to LUC. In this study, two neighboring land cells (one under forest cover and the other under grassland) were considered as paired-cells. ∆ET m between the two cells was calculated as the ET of the forested cell minus that of the grassland cell over the period of 1996-2005 based on the modeling outputs of scenario 3 of the TRENDY project with changing CO 2 , climate, and LUC data.

LUC and climate data
LUC data used in this study were the same as those used in the TRENDY simulations (i.e. LUH-HYDE3.1) (Hurtt et al 2011, Klein Goldewijk et al 2011. They were annual, half-degree, fractional data of five LUTs including cropland, pasture, urban, primary land, and secondary land for the historical period from 1500 to 2005 (see table S1 available online at stacks.iop.org/ERL/16/124047/mmedia for a description of some LUTs). The LUH-HYDE3.1 dataset also provided mean aboveground biomass density (kg C m -2 ) of secondary land and the location of potential forest (figure S1(a)). Potential forest is defined as the area with a potential aboveground biomass density larger than 2.0 kg C m -2 . In this study, annual forest coverage of LUTs in each individual cell was the sum of primary land located in the potential forest area and secondary land located in the potential forest area with mean aboveground biomass density over 2 kg C m -2, and it was used to calculate ∆f between two neighboring cells. Derived mean annual global forested area in terms of LUTs during 1996-2005 is shown in figure S1(b). Forested areas for the four LSMs are the sum of fractional forest PFTs (table S2, figures S1(c)-(f)), and were used to calculate ∆p between two neighboring cells.
The precipitation data for equation (2) was the same as the climate forcing for the TRENDY LSMs simulations (i.e. the CRU dataset) at a resolution of 0.5 • averaged over the period of 1996-2005.

Identification of paired cells and application
Selection of paired cells was based on LUTs (LUH-HYDE3.1). Paired cells were identified using two criteria to minimize the difference in hydro-climatic conditions. One criterion was the window size of 2 • × 2 • , and the other was a maximum elevation difference of 100 m. The GMTED2010 global digital elevation model dataset was used to implement the elevation criterion (Danielson and Gesch 2011). Cells with a threshold of 80% forested (grassland) LUTs over the period of 1996-2005 were considered as fully forested (grassland) cells. Identified pairs of cells that satisfied the above criterion are shown in figure 3 (n = 1133). Most paired cells were in the transition zones of different climate types (supplementary figure S2). The uncertainty caused by different thresholds for identification of paired cells were further assessed, including threshold of forest (nonforest) LUTs coverage (85%, 80%, 75%, and 70%), window size (2.5 • × 2.5 • , 2 • × 2 • , 1.5 • × 1.5 • ), and elevation differences (±100, ±200, ±300, ±400, ±500 m). Our benchmarked results and conclusions were robust and relatively insensitive to choices of the thresholds used for selecting the paired cells (see figures S3-S5).

Results
To demonstrate the application of the proposed benchmarking framework, the results of one pairedcell example in North America are presented in figure 3. These results include the estimated LUCinduced ET change (mm yr −1 ) by the Zhang-curve (∆ET z ) and the ET changes estimated by the four LSMs (∆ET m ). Also presented are the relative errors (%) due to land-use change data interpretation (RE LI ) and model structure (RE MS ). For this paired-cell example, all LSMs estimated an increase in ET as expected due to LUC changing from grassland to forest. However, the ET changes estimated by the LSMs were much smaller than the changes estimated by the Zhang-curve (262 mm) with underestimations ranging from 32% to 71%. A closer examination shows that LI was responsible for 18%-53% of the underestimation, while MS accounted for 3%-18% of the underestimation. Figure 4 shows the relationship between errors in estimated ET change and the reference ET change (i.e. ∆ET Z ) for all of the paired cells. The negative linear Figure 3. Spatial distribution of the identified paired cells (forest cells with green circles and grassland cells with blue crosses) and the decomposed relative errors contributed by land use data interpretation (RELI) and model structure (REMS) from four LSMs for an example paired-cell location in North America.∆ETz is the ET difference between forest cell and grassland cell that estimated using Zhang-curve and forested land use change. ∆ETm is the ET difference estimated by LSMs. ∆ET ′ Z is the ET difference estimated using Zhang-curve with forest PFTs changes. slopes in figure 4(a) (i.e. total relative error, RE TOT ) were −0.78, −0.55, −0.58, and −0.59 for CABLE, ISAM, LPJ-GUESS, and ORCHIDEE, respectively, indicating that all four LSMs underestimated LUCinduced ET change. Three of the four LSMs showed a total relative error of about −60%, while CABLE had the largest relative error (-78%). Clearly, all LSMs considerable underestimated LUC-induced ET change. Figures 4(b) and (c) show the relationship between ε LI and ∆ET Z (figure 4(b)) and ε MS and ∆ET Z (figure 4(c)). The linear slopes shown in the subpanels are the relative error contributions attributed to LI and MS (i.e. RE LI , RE MS ). Results showed that LI accounted for about 46%, 37%, 40%, and 48% of the errors for CABLE, ISAM, LPJ-GUESS, and ORCHIDEE, respectively, while MS accounted for about 32%, 18%, 18%, and 11% of the respective errors (see figure 4(d)).
The overall total and relative errors are summarized in figure 4(d). For all four LSMs, the relative error caused by LI (ranging from −37% to −48%) accounted for about 60% of total error on average, while the relative error caused by MS (ranging from −11% to −32%) accounted for about 40% of the total error on average. Therefore, LI was responsible for a much greater fraction of the total error than MS.

Contributions of land-use change data interpretation
Results suggest that LI resulted in larger errors than MS in quantifying LUC-induced ET changes using LSMs (RE LI , the green bars in figure 4(d)). LI-induced errors were largely caused by underestimation of the forested area when mapping LUT-based LUC into PFT-based LUC. The vegetated fraction interpreted by LSMs was obviously different from the original land-use data (LUH-HYDE3.1), although the general pattern and LUC of LUH-HYDE3.1 were captured by the four LSMs. For the 1131 paired-cells used in the study, mapping of the LUH-HYDE3.1 data into PFTs by the LSMs resulted in lower forested fraction and higher non-forested fraction compared with the initial input LUTs (figure S6). The change in forested area interpreted by the four LSMs in terms of PFTs was 0.9-1.3 × 10 6 km 2 , significantly lower than the change values derived from LUH-HYDE3.1 data in terms of LUTs (1.6 × 10 6 km 2 ). Because forest cover change is the most sensitive and important characteristic of LUC affecting ET, underestimation of LUCinduced ET change is likely to be caused by lower forested areas represented by the different modelling  (see table S1). Generally, LSMs with greater numbers of PFTs (e.g. LPJ-GUESS) had relatively lower RE LI . However, for some specific LSMs, paired cells with fewer numbers of PFTs had relatively lower RE LI (figure S7). For the identified paired-cells in different regions, locations classified as arid savannah climate were clearly different (e.g. African savannahs in figures S2, S7, and S8) than other climate classifications, possibly because vegetation dynamics of savannah ecosystems are poorly represented by LSMs (Whitley et al 2017). In situ observation and remote sensing data can be used to improve the accuracy of PFT classification and parameterization of LSMs in these regions (Wullschleger et al 2014, Baudena et al 2015, Whitley et al 2017. For the Qinghai-Tibet plateau, all LSMs interpreted identified cells in this region as fully covered with nonforest (figures S1(b)-(f). Thus, different LI resulted in a mismatch between PFT-based and LUT-based LUC, resulting in errors in estimated ET changes.
A previous study of LUCID experiments also showed that the impacts of deforestation on ET differed considerably among LSMs (de Noblet-ducoudre et al 2012) due to differences in LI. Peng et al (2017) and Di Vittorio et al (2018) found that different transition rules resulted in differences in forest areas, and thus different carbon emission changes. How LUC data are interpreted still varies greatly among different modeling groups. This study suggested that mapping of LUC data in terms of PFTs by the four LSMs introduced large errors in estimated LUC-induced ET fluxes. Recently, a few numerical experiments (e.g. the deforestation experiment of LUMIP) have conducted to evaluate the LUC impacts on climate with more consistent deforestation experiment setup (Lawrence et al 2016). Ma et al (2020) highlighted the need for consistent transition rules for LUC data interpretation, and recommended a set of transition rules based on the new version of LUH-HYDE3.1 data. Therefore, efforts should be made to consistently represent LUC data in LSMs in the future for assessment of vegetation water use and its changes.

Contributions of model structure
This study also showed that model structure induced all four LSMs to significantly underestimate LUCinduced ET changes during afforestation (figure 4). In this study, model structure included process representation (e.g. equations for estimating ET), parameterization, and spatial implementation of the model.
No attempts were made to examine more specific components of process representation. Basically, underestimations caused by MS were due to inadequate parameterization of key hydrological processes. Hydrological processes in the LSMs are usually simplified as one-dimensional and free-draining flow, and are therefore unable to capture lateral water exchange (Clark et al 2015). The consequences of this unrealistic representation can lead to failure in accurately capturing increases in continental water storage and residence time (Fan et al 2019). Based on a 1D free-draining model, Miguez-Macho and Fan (2012) showed that ET and headwater streamflow in the Amazon were zero during the dry season, but this result was not consistent with observed data from a flux tower and hydrological gauge stations in the region. The differences in the parameterization of key processes pertinent to ET (such as LAI dynamics, soil moisture parameterizations, and rooting depth) may also induce underestimation of LUC-induced ET differences. For example, Decker (2015) showed that there was a positive ET bias in the default CABLE LSM over the PFTs with short canopies, and the bias was significantly reduced by using improved soil moisture parameterizations. Many LSMs have an unrealistically shallow rooting depth (generally less than 2 m) that can profoundly impact the accuracy of the hydrological simulation (Lee et al 2005), and can result in lower plant available water for forest. Thereby it may underestimate the LUC-induced ET difference. However, it remains a grand challenge to identify the specific drivers or sources of errors caused by differences in model structures among different LSMs.
The proposed diagnostic framework is established based on catchment water and energy balance theory (i.e. Zhang-curve), which is one of top-down approaches and can achieve parsimony of model parameters with satisfactory performance worldwide. It can be applied to other modeling experiments as well. However, this benchmarking framework is limited to total ET and mean annual time scale and cannot provide more information about errors at processes level (i.e. different ET components) and at finer time scale (e.g. monthly or annually). More physical process of ET can be incorporated into the current framework for deeper understanding in the future.

Conclusions
Many modeling studies have been conducted to quantify the impact of LUC on global ET and the water cycle. However, few studies have attempted to benchmark apparent inconsistent results and to quantify the relative contributions of LUC data interpretation (LI) and model structure (MS) of LSMs. By making using of the Budyko framework that has been widely validated and used to quantify the impacts of forest cover changes on ET, estimated LUC-induced ET changes by four LSMs from the TRENDY project were benchmarked and a framework was proposed to attribute the error to either LI and MS.
Our results showed that all four LSMs captured the direction of ET change with LUC from non-forest to forest, but all four LSMs considerably underestimated the change (55%-78%). This finding was consistent with previous studies using observations from paired flux sties (Chen et al 2018, Cai et al 2019. More importantly, we found that both LI and MS contributed to underestimates of ET changes, and LI was the major cause for this underestimation. The results of this study demonstrate a new way to benchmark and quantify errors in estimated LUC-induced ET changes by LSMs from a hydrological perspective. The results demonstrate that there is an urgent need for consistent protocols to interpret LUC data for future assessments of LUC impacts on ET.

Data availability statement
All data, models, or code generated or used during the study are available from the corresponding author by request.
The data that support the findings of this study are available upon reasonable request from the authors.