Leaf area index in Earth system models: how the key variable of vegetation seasonality works in climate projections

Earth system models (ESMs) are widely used in scientific research to understand the responses of various components of Earth systems to natural and anthropogenic forcings. ESMs embody terrestrial ecosystems on the basis of the leaf area index (LAI) to formulate various interactions between the land surface and atmosphere. Here, we evaluated the LAI seasonality of deciduous forests simulated by 14 ESMs participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5) and CMIP6 to understand the efficacy of recent ESMs in describing leaf dynamics in the northern extratropics from 1982 to 2014. We examined three indicators of LAI seasonality (annual mean, amplitude, and phase) and three phenological dates (start (SOS), end (EOS), and length of growing season (LOS)) of the models in comparison to the third-generation LAI of Global Inventory Modeling and Mapping Studies (GIMMS LAI3g) and the Climate Research Unit gridded time series dataset. CMIP6 models tend to simulate larger annual means (1.7 m2 m−2), weaker amplitudes (0.9 m2 m−2), and delayed phases (226 DOY) compared to the GIMMS LAI3g (1.2 m2 m−2, 1.2 m2 m−2, and 212 DOY, respectively), yet are similar to the CMIP5 models (2.2 m2 m−2, 1.0 m2 m−2, and 225 DOY). The later phase is attributed to a systematic positive bias in EOS of the CMIP5 and CMIP6 models (later by 22 and 18 d, respectively) compared to the GIMMS LAI3g (261 DOY). Further tests on phenological responses to seasonal temperature revealed that the majority of CMIP5 and CMIP6 ESMs inaccurately describe the sensitivities of SOS and EOS to seasonal temperature and the recent changes in mean SOS and EOS distributions (2005–2014 minus 1982–1991). This study suggests that phenology schemes of deciduous forests, especially for autumn leaf senescence, should be revisited to achieve an accurate representation of terrestrial ecosystems and their interactions.


Introduction
Mid-and high-latitude vegetation shows clear seasonality in canopy structure from spring emergence to autumn senescence, in accordance with seasonal changes in environmental conditions (Richardson et al 2013, Piao et al 2020a. In Earth system models (ESMs), canopy structure is parameterized by leaf area index (LAI), which describes changes in the canopy structure resulting from surface interactions with the environment (Arora and Boer 2005). In addition, LAI is also the key variable controlling the magnitude of biosphere-climate interactions because it adjusts the surface energy balance, the hydrological cycle, the carbon budget, and boundary-layer meteorology (Bonan 2008, Jeong et al 2009, Richardson et al 2013, Piao et al 2020a, 2020b) (figure 1). ESMs utilize LAI to represent these complicated and dynamic processes that occur in terrestrial ecosystems, analyzing responses to environmental conditions, and adjusting surface interactions with the Earth system. The dynamic representation of the LAI in ESMs enables the prognostic simulation of terrestrial carbon fluxes and their influence on the concentration of atmospheric CO 2 , which is necessary for carrying out concentration-driven ESM experiments with a fully coupled carbon cycle (Hurrell et al 2013, Eyring et al 2016. Thus, an accurate representation of LAI seasonality in the models is necessary for realistic simulations of terrestrial ecosystem dynamics and the corresponding interactions with the surrounding environment in climate simulations. Vegetation seasonality remains a challenging topic in ESM research, primarily because of the highly intricate nature of vegetation phenology (Arora andBoer 2005, Richardson et al 2012). Despite the inherent complexities of various phenological processes, there have been numerous advances in vegetation seasonality modeling, from prescribed seasonal changes to dynamic phenology that is highly responsive to the simulation of meteorological conditions (Schwartz 2003). Currently, most ESMs describe the seasonal cycle of vegetation activity using processbased approaches. However, in current ESMs, the simulation of LAI seasonality is not yet sufficiently accurate to describe the Earth system. Several studies have found that simulations of vegetation seasonality frequently display delayed or prolonged growing seasons, especially for site-based models (earlier onset of 10 d over North America; Richardson et al 2012), uncoupled land surface models (later onset and offset of 20 and 44 d, respectively, over the Northern Hemisphere; Murray-Tortarolo et al 2013), and coupled ESMs of Coupled Model Intercomparison Project Phase 5 (CMIP5) (later onset and offset by 6 and 36 d, respectively; Anav et al 2013). This inaccurate representation of LAI dynamics disrupts energy balances, hydrological cycles, and carbon fluxes on the land surface in the model (Richardson et al 2013, Zhao et al 2020. Considering the role of vegetation phenology and its ramifications (Peñuelas et al 2009), accurate simulations of vegetation seasonality are a prerequisite for creating a more accurate representation of past, present, and future climate systems.
Modeling groups and various institutes worldwide have endeavored to improve ESMs on the basis of the CMIP. CMIPs were initiated to broaden our understanding of the climate system and its changes by natural and anthropogenic forcing in a multimodel context. During the last two decades, three generations of CMIPs (CMIP3, CMIP4, and CMIP5) have produced standardized and publicly available multi-model outputs, opening a new horizon in climate science by yielding numerous contributions to model evaluations, improvements in the field of physics, and future climate projections (Stocker et al 2013, Eyring et al 2016. As a continuation of CMIP5, the development of CMIP6 is ongoing in order to promote our understanding of Earth systems (Eyring et al 2016). The latest ESMs participating in CMIP6 have applied improved physics regarding the atmosphere, ocean, and land surface (Hajima et al 2020, Mauritsen et al 2019, Arora et al 2020, Boucher et al 2020, Danabasoglu et al 2020, Mudryk et al 2020, Seland et al 2020, which in turn may enhance the performance of LAI seasonality simulations. The evaluation of CMIP6 LAI outputs by comparing them with the previous CMIP5 LAI and remotely sensed LAI datasets (e.g. Global Inventory Modeling and Mapping Studies (GIMMS) LAI 3g ) can illustrate recent improvements in the simulated LAI seasonality, highlighting the remaining deficiencies of LAI simulation in the latest ESMs. Therefore, the evaluation of the CMIP6 LAI seasonality is crucial to finding the key to further improve LAI dynamics.
This study evaluated the LAI seasonality that was simulated in the CMIP6 ESMs in comparison with the CMIP5 ESMs outputs and remotely sensed GIMMS LAI 3g for deciduous forests over the northern extratropics (>30 • N) from 1982 to 2014. First, we compared the LAI seasonality in the CMIP5 and CMIP6 ESMs to the GIMMS LAI 3g with regard to three seasonal indicators (annual mean, amplitude, and phase) and three phenological dates (start, end, and length of the growing season; SOS, EOS, and LOS, respectively). In addition, we examined the interannual sensitivities and long-term responses of phenological dates to seasonal temperature variations in the CMIP5 and CMIP6 ESMs compared with the GIMMS LAI 3g and Climate Research Unit (CRU) TS4.01 near-surface air temperature. By illustrating biases in the mean patterns and temperature sensitivities, this study addresses the recent improvements and current challenges of the simulation of LAI seasonality in CMIP6 ESMs.

Data
This study used the GIMMS LAI 3g v4 dataset as the observational data to evaluate the simulated LAI in ESMs. GIMMS LAI 3g is the longest available satellite LAI dataset , which covers the entirety of the Earth's surface with a high spatiotemporal resolution (1/12 • and 15 d intervals) . Along with its long-term data, GIMMS LAI 3g acquires its high quality by multi-sensor-based bias corrections and 15 d maximum composites to alleviate the influence of non-vegetative effects. In addition, it substitutes possibly contaminated LAI data with snow or cloud to average the seasonal profile or for spline interpolation (Zhu et al 2013, Pinzon andTucker 2014). Many studies have utilized GIMMS LAI 3g to understand the seasonal evolution of vegetation activity (Park et al 2018, Garonna et al 2018, Piao et al 2020a, Zhao et al 2020 and evaluate CMIP5 or land surface models  due to its reliability and the availability of long-term data. We used a 1 km global land cover classification dataset, generated by the Department of Geography at the University of Maryland (UMd) (Hansen et al 2000), to confine our data analysis to natural deciduous forests in the northern mid-and highlatitude regions (>30 • N). Natural deciduous forests are classified into the following categories: deciduous needleleaf forest, deciduous broadleaf forest, mixed forest, and woodland. We utilized the nearsurface (2 m) air temperature of the CRU TS4.01 climate dataset to identify the response of phenological dates to air temperature for the GIMMS LAI 3g dataset. CRU TS4.01 is a high-resolution (0.5 • latitudelongitude grid) monthly climate dataset that covers all land areas, excluding Antarctica, from 1901 to 2016 (Harris et al 2020). It provides a credible climate dataset based on networks of surface weather station observations and is thus appropriate for examining changes in seasonal temperature.
We used the monthly LAI and near-surface air temperature outputs of a total of 14 models, with seven models participating in the CMIP5 and the remaining seven latest model versions participating in the CMIP6 (tables 1 and 2). By comparing the characteristics of the seven pairs of CMIP5 and CMIP6 models, we aimed to present the improvements in vegetation seasonality in the CMIP6 models, as compared to the previous CMIP. We analyzed near-surface air temperature and LAI from 1982 to 2014, which is the period of overlap among the CRU TS4.01 , GIMMS LAI 3g (1982, and CMIP6 historical scenario outputs . Historical simulations are driven by observationbased forcing, such as greenhouse gas concentration, gridded land-use data, volcanic aerosols, sea surface temperature, and sea ice concentrations (Eyring et al 2016), thus allowing the comparison of ESM simulations with observed changes in the climate system. In the case of CMIP5, the historical scenario only spans from 1850 to 2005, which is 9 year shorter period than the CMIP6 historical scenario. To ensure the maximum overlap in the CMIP6, CRU TS4.01, and GIMMS LAI 3g outputs, we supplemented the 9 year period (2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014) of missing data with the output of representative concentration pathway 8.5 (RCP8.5) scenario in line with previous studies (Quetin andSwann 2018, Mudryk et al 2020). This is justified by the finding that the RCP8.5 scenario closely (within 1% for 2005-2020) tracks anthropogenic greenhouse gas emissions (Schwalm et al 2020).
Several CMIP5 and CMIP6 models provide multiple simulations based on the same experimental configuration for ensemble analyses. However, some of the models have only released their first realization, denominated as r1i1f1; thus, we only utilized the first realization, following previous research . In order to maintain consistency in our data analysis, the GIMMS LAI 3g , UMd land cover classifications, and CMIP model outputs were regridded to Imposed Nitrogen a F, G, C, and P denote forest-, grass-, crop-, and pasture-type PFTs, respectively. a 0.5 • latitude-longitude resolution grid of the CRU TS4.01 dataset. The GIMMS LAI 3g and UMd land cover were harmonized to a 0.5 • resolution by averaging and majority sampling. For the LAI and temperature outputs of the CMIP models, linear interpolation was applied to transfer the relatively coarser model grid (table 1) to a finer resolution of 0.5 • .

Method
We analyzed three indicators of LAI seasonality (annual mean, amplitude, and phase) to evaluate the ESM simulation performance (figure 1). The amplitude and phase were calculated based on yearly sinusoidal components following the method of Stine et al (2009). The sinusoidal amplitude is approximately half of the LAI difference between the minimum and maximum in annual LAI cycles. The phase represents the middle of LAI seasonality, which is close to the timing of peak states, given as a degree ranging from 0 • to 360 • . In order to confine our study regions to forests, we excluded the regions where the annual maximum LAI was lower than 0.5 m 2 m −2 . Along with seasonality indicators, annual dates for start of growing season (SOS) and end of growing season (EOS) were calculated based on the logistic curve fitting method used by Park et al (2018). This was used over the northern extratropics (>30 • N) from 1982 to 2014. The SOS and EOS were computed for regions where the annual maximum LAI exceeded 0.5 m 2 m −2 , which is consistent with seasonality indicators. In addition, the length of the growing season (LOS) was defined as the difference between the SOS and EOS. The SOS, EOS, and LOS are phenological indicators used to understand the expansion of vegetation activity with the advancement of spring and autumn (Jeong et al 2011).
We used a Taylor diagram to show the mean spatial characteristics of seasonal indicators (annual mean, amplitude, and phase) and phenological dates (SOS, EOS, and LOS) simulated in the CMIP5 and CMIP6 models. The Taylor diagram represented both pattern correlations (as azimuthal angles) and normalized spatial standard deviations (as radial distances) to GIMMS LAI 3g . The pattern correlations to GIMMS LAI 3g were calculated from the mean patterns of each variable for each model from 1982 to 2014. The normalized standard deviations were defined as the ratio of the spatial standard deviation of each model to the spatial standard deviation of the GIMMS LAI3g. This diagram facilitates the evaluation of the simulation performance of the CMIP5 and CMIP6 models with regard to the relative spatial variability and pattern similarity. If a certain model output shows a pattern correlation and a normalized standard deviation that is close to one, the simulated distribution closely resembles the distribution of the GIMMS LAI3g with similar magnitudes of spatial variability.
We calculated T SOS and T EOS as the mean temperature of three months around the 33 year mean SOS and EOS, respectively, because temperature is the primary variable of vegetation seasonality in the northern extratropics (Jeong et al 2011). For the observation dataset and model outputs, we defined ∆SOS, ∆EOS, ∆T SOS , and ∆T EOS as the difference between the mean values within the earliest (1982)(1983)(1984)(1985)(1986)(1987)(1988)(1989)(1990)(1991) and the latest (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014) decade. The relationships between ∆SOS and ∆T SOS and between ∆EOS and ∆T EOS were then used to determine the responses of SOS and EOS to temperature variation.

Mean LAI seasonality and phenology
To examine the mean features of simulated LAI seasonality, we compared GIMMS LAI 3g with the multi-model averaged patterns of the three seasonality indicators in the CMIP5 and CMIP6 models. Figure 2 shows the mean distributions of the annual mean, amplitude, and phase of LAI in the GIMMS LAI 3g and the arithmetic differences to those simulated by the CMIP5 and CMIP6 models . The spatial average of the annual mean LAI was 1.2 m 2 m −2 in the GIMMS LAI 3g (figures 2(a) and (j)). However, the CMIP5 and CMIP6 models showed higher values of 2.2 and 1.7 m 2 m −2 , which were approximately 180% and 140% that of GIMMS LAI 3g , respectively (figures 2(d), (g) and (j)). The exaggeration of the annual mean LAI was evident in the CMIP5 models in most of the study regions, as compared to GIMMS LAI 3g . In contrast, the CMIP6 models showed a considerable improvement in terms of annual mean LAI exaggeration because the proportions of positive differences largely decreased in the study regions but remained noticeable in certain areas such as Scandinavia and Alaska. Compared to the overestimated annual mean, there was little difference between the spatially averaged amplitude in GIMMS LAI 3g (1.2 m 2 m −2 ) and in CMIP5 and CMIP6 (1.0 and 0.93 m 2 m −2 , respectively) (figure 2(k)). Regarding spatial patterns, CMIP5 and CMIP6 frequently displayed lower amplitudes along the latitude of 60 • N and higher amplitudes in the subarctic regions (figures 2(e) and (h)). This indicates that the CMIP5 and CMIP6 models tend to underestimate the seasonal amplitude but exaggerate the annual mean of LAI in deciduous forests compared to GIMMS LAI 3g .
With regard to the phase of vegetation, positive and widespread differences were shown in the study region in both CMIP5 and CMIP6 (figures 2(c), (f) and i). The spatially averaged phase in the study region was 208.8 • in GIMMS LAI 3g , while the simulated phases differed by 13.1 • and 13.5 • in CMIP5 and CMIP6, respectively (figure 2(l)). A phase delay of 10 • indicated that seasonality lags by approximately 10 d compared to the GIMMS LAI 3g . This delayed phase was clear in the boreal regions along the latitude of , and the arithmetic differences to those simulated by CMIP5 (d)-(f) and CMIP6 (g)-(i) models. Probability density functions of annual mean, amplitude, and phase (j)-(l) of the GIMMS LAI3g and the CMIP5 and CMIP6 simulations, with their means (vertical lines) are denoted in black, orange, and green, respectively. 60 • N, and exceeded 30 • in a few regions such as Alaska and Siberia, i.e. it was delayed by one month. Notably, the observed SOS has advanced by a few days over the past decades (Jeong et al 2011, Park et al 2018. However, this one-month delayed seasonality in boreal forests exhibits a significant difference in vegetation seasonality, which can distort vegetationclimate interactions during the growing season, and may mislead future projections of the impact of climate change on terrestrial ecosystems. This delayed LAI seasonality could be a result of the inaccurate representation of the growing season, i.e. a late SOS or EOS, in the CMIP6 model. Figure 3 shows the mean patterns of SOS and EOS for 1982-2014, indicating the EOS is the primary cause of the delayed seasonality. The spatial averages of the SOSs were 152.3, 152.3, and 158.0 DOY in GIMMS LAI 3g , CMIP5, and CMIP6, respectively (figure 3(j)). In the spatial pattern displayed by the CMIP5 models ( figure 3(d)), the SOS difference was relatively small, at less than 20 d in most of the high-latitude regions in Eurasia and North America. In contrast, in the CMIP6 models (figure 3(g)), the difference was larger than the results of the CMIP5, particularly in certain high-latitude regions in Eurasia. The ranges of the SOS differences from the 10th to 90th percentile were −11.4 to 11.7 DOY in the CMIP5, and −7.7 to 17.2 DOY in the CMIP6. This indicates that the CMIP6 models tend to simulate later start in the growing season dates compared to the CMIP5 with regard to the mean distribution. The majority of the CMIP5 and CMIP6 models showed relatively small biases in the mean values of SOS that were less than 20 d, regardless of the latitude (supplementary figure 3(a) (available online at stacks.iop.org/ERL/16/034027/mmedia)) and forest type (supplementary figure 4(a)). Hence, the difference in SOS is not sufficient to fully explain the delay in LAI seasonality.  at high latitudes. The 10th to 90th percentile ranges of the EOS difference were 5.0-34.4 d in CMIP5 and −5.0-34.0 d in CMIP6. Although the positive differences decreased slightly in CMIP6, both the CMIP models showed similar severe delays in the EOS, which led to the belated LAI seasonality simulated in the ESMs. This delay in autumn phenology, as shown in the CMIP6 models, is consistent with the results of previous studies on land surface models (Richardson et al 2012 and CMIP5 models . Most CMIP5 and CMIP6 models showed a positive bias in mean EOS that exceeded 20 d, especially at high latitudes (>50 • N; supplementary figure 3(b)). In the case of forest type, the belated EOS was distinct in deciduous needleleaf forests, mixed forests, and woodlands (supplementary figure 4(b)), which are largely distributed at latitudes higher than 50 • N (supplementary figure 2). This implies that both the CMIP5 and CMIP6 models do not appropriately describe the autumn senescence of vegetation at high-latitude cold climate zones.
The delay in the EOS resulted in a relatively longer LOS in both CMIP5 and CMIP6 models compared to the GIMMS LAI 3g , with an extension of more than several pentads in certain high-latitude regions (figures 3(f) and (i)). The spatial averages of the LOS in the GIMMS LAI 3g , CMIP5, and CMIP6 were 107.8, 129.4, and 120.5 DOY, respectively. The LOS differences were mostly positive in the Northern Hemisphere in the CMIP5 outputs, and the 10th and 90th percentiles were 6.5 and 36.2 d, respectively. In the case of CMIP6, the positive differences were relatively weak compared to those of CMIP5 in the highlatitude Eurasia region, with a narrow range shown by the 10th to 90th percentile (− 4.1-28.8 d). The results reveal that the simulated phenology shown by the CMIP6 models tends to have a longer LOS resulting from a later EOS, as compared to the GIMMS LAI 3g ; however, the bias to the satellite-derived LAI seasonality is decreased. The longer and later vegetation growing season contributes to the higher annual mean, weaker seasonal fluctuation, and delayed seasonality of LAI, as shown in figure 2. These large biases in the mean LAI seasonality could be caused by either a mean bias in a key climate variable (e.g. nearsurface air temperature) or the misrepresentation of vegetation characteristics and processes (e.g. carbon allocation, phenology, or plant functional types). However, the CMIP5 and CMIP6 did not show a significant mean bias in near-surface air temperature, in contrast to LAI seasonality (supplementary figure 5). This suggests that the positive biases in the mean SOS and EOS were largely derived from a misrepresentation of deciduous forests in terms of vegetation properties and processes.
In addition to the spatial patterns observed in the CMIP5 and CMIP6 ensembles, we examined the LAI seasonality characteristics for each CMIP5 and CMIP6 model. Figure 4 shows a Taylor diagram based on the mean spatial patterns of seasonality indicators of phenological dates of the CMIP5 and CMIP6 models. This diagram demonstrates that the majority of the models had pattern correlations lower than 0.8, indicating that ESMs cannot accurately represent the spatial patterns of vegetation seasonality, even in historical scenarios. The number of model outputs in a relatively reliable range (with a pattern correlation of >0.6, and a normalized standard deviation of >0.5 and <1.5) totaled 20, which is approximately 25% of the total number of outputs. A comparison of the lower (30-60 • N) and higher latitudes (>60 • N) indicates that the CMIP5 and CMIP6 models tended to show a better simulation performance over the lower-latitude regions (supplementary figures 6(a) and (d)) in terms of the mean spatial patterns. These diagrams (figures 4(a), 6(a) and (d)) clearly illustrate that most of the ESMs faced difficulty in simulating the mean characteristics in vegetation seasonality in terms of both spatial variability and patterns. Figures 4(b) and (c) display colored tables containing pattern correlations of each indicator in the CMIP5 (upper panel) and CMIP6 (lower panel) models to focus on whether each model could reproduce the mean spatial patterns of LAI seasonality found in the GIMMS LAI 3g . These panels show that the ESMs simulated the SOS with the highest correlations for most of the models. Most CMIP5 and CMIP6 models showed correlations of higher than 0.5, with improved values in CMIP6. In contrast, the results of the EOS were not in agreement with the GIMMS LAI 3g in both the CMIP5 and CMIP6; most of the models showed low correlations (less than 0.7). The exception was IPSL-CM6A-LR, which showed the best simulation performance of all phenological indicators, with a credible normalized standard deviation close to one and higher pattern correlations than its previous version, IPSL-CM5A-MR. This improvement was largely due to better simulations of the EOS over the lower-latitude regions (30-60 • N; supplementary figures 6(b) and (c)). However, figures 4(b) and (c) also indicates that recent models are not always better for simulating seasonality than previous versions. In the case of the multi-model ensembles, the pattern correlations were lower in CMIP6 than in CMIP5 regarding annual mean, amplitude, and EOS. This decrease in correlation was distinct for the amplitude simulated in CESM2 and NorESM2-LM, which used the community land model 5 (CLM5) as the land surface model (table 1). CESM2 and NorESM2-LM utilized the same land models and had almost identical land processes. As a result, they showed little difference in their partial correlations, despite differing evolutions of seasonal meteorology. However, compared to previous versions (CESM1-BCG and NorESM1-ME), the pattern correlations decreased significantly for most of the seasonal indicators in both CESM2 and NorESM2-LM in both higher-and lower-latitude regions (supplementary figures 6(b), (c), (e) and (f)), despite improvements in updated land surface and phenological processes (tables 1 and 2). This clearly illustrates the difficulties in vegetation modeling wherein changes in plant physiology and nutrient dynamics can result in unintended consequences in terms of vegetation seasonality.

Responses of phenology to seasonal temperature
One of the main drivers of changes in phenological dates is the near-surface air temperature for deciduous forests distributed over the northern extratropics (Jeong et al 2011). An accurate representation of responses in phenology to seasonal temperature change is closely connected to a long-term projection of changes in terrestrial ecosystems under climate change as well as its interactions with the Earth system. To briefly evaluate the phenological responses to temperature in the CMIP5 and CMIP6 models, we compared the interannual sensitivities of the SOS and EOS to the corresponding seasonal temperature (T SOS and T EOS ) for each ESM in the deciduous forests (figure 5). The observation-based datasets (GIMMS LAI 3g and CRU TS) show a negative relationship between SOS and T SOS , averaging −4.5 d K −1 ( figure 5(a)). Thus, higher spring temperatures were related to earlier SOS in deciduous forests. However, the CMIP5 and CMIP6 ensembles showed a relatively weak SOS sensitivity compared to the observation (−2.2 and −2.6 d K −1 on average, respectively). In particular, BCC-CSM1-1 and NorESM1-ME exhibited the weakest sensitivities among the CMIP5 models (−1.0 and 0.0 d K −1 , respectively), but the newest versions indicated a stronger phenological response to changes in near-surface air temperature (−2.3 and −2.5 d K −1 in BCC-CSM2-MR and NorESM2-LM, respectively). The sensitivity of SOS to T SOS tended to be weaker in the CMIP5 and CMIP6 models, but ESMs are able to simulate the advancement of SOS due to spring warming ( figure 5(a)).
In the case of the sensitivity of EOS to T EOS ( figure 5(b)), the observed sensitivity was positive (0.5 d K −1 ) on average, indicating that higher autumn temperatures lead to a later EOS. The EOS sensitivities to T EOS tended to show a weaker magnitude than the SOS sensitivity to T SOS because the EOS is affected by not only autumn temperature but also environmental conditions such as insolation, leaf aging, and water availability (Liu et al 2016). The CMIP5 and CMIP6 ensembles showed a slightly larger magnitude of the EOS sensitivity (0.7 and 0.8 d K −1 on average, respectively), but the sensitivities of each model showed a large multi-model spread, from −0.3 d K −1 of NorESM1-ME to 1.9 d K −1 of CanESM2 in CMIP5, and from −0.6 d K −1 of CESM2 to 2.9 d K −1 of MIROC-ES2L in CMIP6. Notably, large proportions of deciduous forests in BCC-CSM1-1, CESM-BGC, and NorESM1-ME showed a weaker sensitivity compared to the other models, and the median sensitivities were close to zero (−0.1, 0.1, and 0.0 d K −1 , respectively). In particular, in the CMIP6 models, CESM2 and NorESM2-LM consistently displayed weak EOS sensitivity to autumn temperature, with medians of 0.0 d K −1 for both. This irresponsive EOS in the two models seems to be related to a daylength-based trigger of growth offset in CLM4 and CLM5, and will be discussed in the subsequent sections.
To determine whether the CMIP5 and CMIP6 models can represent recent changes in phenological dates and temperature, we examined differences in the mean distributions of SOS, EOS, T SOS , and T EOS between the two periods of 1982-1991 and 2005-2014 (i.e. ∆SOS, ∆EOS, ∆T SOS , and ∆T EOS ). Figure 6(a) shows the mean and one-standard-deviation confidence ellipses of ∆SOS and ∆T SOS (2005-2014 minus 1982-1991). The ellipse denotes the two-dimensional standard deviation of the ∆T SOS and ∆SOS distributions; if a model displayed a change in temperature and phenological dates identical to that of CRU TS and GIMMS LAI 3g , its ellipse will be the same as the ellipse of the observation. However, only a few models overlapped considerably with the observation ellipse of ∆SOS and ∆T SOS ( figure 6(a)). MIROC-ES2L had averages of ∆SOS and ∆T SOS closest to the observation ellipse, and a few models (CESM1-BGC, MPI-ESM-MR, and MPI-ESM1-2-LR) showed similar averages regarding the ellipse of the observation. Most models tended to exaggerate ∆T SOS , that is, showing stronger warming around the SOS than the observation, which resulted in the distant ellipses and averages of models from the observation. In the case of ∆EOS and ∆T EOS (figure 6(b)), five CMIP5 models (BCC-CSM2-MR, CESM1-BGC, MIROC-ESM, MPI-ESM1-2, and NorESM1-ME), and three CMIP6 models (CEMS2, MIROC-E2L, and MPI-ESM1-2-LR) exhibited means in the observation ellipse. This represents the response of the EOS to autumn warming in the models with relatively higher similarities to the observation, contrary to the inaccurate representation of mean EOS distributions (figures 3 and 4). Compared to the mean of the observation, most models tended to have a lower ∆EOS but a higher ∆T EOS , similar to the higher ∆T SOS shown in figure 6(a). Among the ESMs, MIROC-ES2L was shown to have the closest mean values to the observation (figures 6(a) and (b)), which indicates that MIROC-ES2L similarly simulated changes in seasonal temperature and involved phenological responses shown in CRU TS and GIMMS LAI 3g for both spring and autumn.

Discussion
An accurate representation of LAI seasonality is essential in the simulation of land-vegetation interactions and long-term changes in climate (Richardson et al 2012). Despite its importance, CMIP6 ESMs tend to simulate the mean LAI seasonality with biases of larger annual means, weaker seasonal amplitude, and belated phases (figure 2) with delayed SOS and EOS (figure 3), as reported in previous CMIP5 model studies . Although there have been many improvements in land surface models since CMIP5 (tables 1 and 2), this study reveals that even the latest CMIP6 models do not sufficiently describe the LAI seasonality and phenological dates in terms of the mean distributions ( figure 4). Notably, the mean biases of EOS were more distinct than those of SOS, and this was common for the majority of the CMIP5 and CMIP6 ESMs. This systematic delaying bias of EOS exceeded 20 d on average at higher latitudes (>50 • N; figure 3 and supplementary figure 3) and was also more distinct in deciduous needleleaf forests and woodlands, which are mostly distributed over the cold climatic zones (supplementary figures 2 and 4). The lower pattern correlation at higher latitudes (supplementary figure 6) implies that vegetation processes related to senescence should be reexamined in boreal forests in order to more accurately represent LAI seasonality.
The delayed autumn phenology shown in the CMIP5 and CMIP6 models could have implications regarding regional circulation and global climate through biases in energy balances as well as carbon fluxes. With regard to photosynthesis, delayed leaf senescence is relatively less prominent in autumn due to lower light availability (Jeong 2020, Zhang et al 2020, as compared to spring leaf emergence. However, delayed autumn phenology and a longer growing season in ESMs result in excessive maintenance respiration and evapotranspiration with a lower surface albedo, which can lead to biases in surface carbon fluxes, soil moisture, and energy partitioning of net radiation into sensible and latent heat fluxes (Bonan 2008, Piao et al 2020a. Considering the importance of interactions between vegetation and climate, the inaccurate representation of EOS in the northern extratropics could lead to significant biases in seasonal climate patterns on both regional and global scales (Richardson et al 2012). In full-couple carbon cycle simulations, biases in spring or autumn phenology lead to biases in surface carbon fluxes and atmospheric carbon concentration, which can accumulate over time and result in large uncertainties regarding long-term climate predictions (Richardson et al 2012, Friedlingstein et al 2014. For climate simulations that do not include the prognostic carbon cycle, biases in LAI seasonality can lead to differing evolutions of seasonal meteorology and further longterm climate issues through the distortion of surface hydrology and energy balances. The results of this study imply that phenology schemes, particularly for autumn phenology, need to be improved in order to obtain a more accurate representation of vegetation seasonality and predictions for long-term climate change. The LAI seasonality and phenology in the land surface models are comprehensive and consequential phenomena related to various terrestrial ecosystem processes (e.g. start or end of leaf carbon allocation, carbon allocation fluxes, and PFTs representation). Therefore, they cannot be simulated realistically unless all the ecosystem processes are in tune. For example, in the case of seasonal deciduous forests in CLM, the spring green-up stage is initiated when the accumulated soil temperature exceeds a threshold that is determined by the annual mean 2 m air temperature of each grid point (Lawrence et al 2011). After this leaf onset, stored carbon transfers to the leaf during a fixed period of 30 d, which results in an increase in LAI depending on the specific leaf area of the PFTs. In autumn, senescence is triggered when the day length is shorter than a critical value, and the leaves lose their carbon over 15 d (i.e. the LAI decreases). Changes in photosynthesis and respiration due to meteorological conditions and nutrient dynamics can influence the phenological dates because LAI depends on leaf carbon content. Any biases in either meteorology, PFTs, or the senescence threshold can result in biased LAI seasonality, as shown in this study. Murray-Tortarolo et al (2013) reported positive biases in the phenological dates of land surface model simulations based on imposed PFTs and meteorological forcing. This implies that the belated biases in the LAI seasonality, especially in the EOS, are likely to result from vegetation schemes rather than PFTs and model meteorology.
Our understanding remains insufficient to describe vegetation phenology and its relationships with environmental conditions due to complexities in the physical processes of leaves (Arora and Boer 2005). For example, CLM uses a fixed length of green-up periods (30 d), but the green-up duration varies with temperature (Park et al 2015. The present land surface model is not sufficiently accurate because of knowledge gaps, particularly for autumn leaf senescence (Jeong 2020). Another example is the sensitivity of the EOS to temperature. EOS is influenced by a variety of complex factors such as daytime and nighttime temperature, precipitation, photoperiod, and leaf aging (Jeong 2020, Keenan and Richardson 2015, Liu et al 2016, Chen et al 2020. However, as mentioned above, CLM determines the start of the leaf senescence stage of seasonal deciduous forests solely by using daylength. As a result, the EOS sensitivity to T EOS was close to zero for all the CLM-based ESMs in CMIP5 and CMIP6 (CESM1-BGC, CESM2, NorESM1-ME, and NorESM2-LM in figure 5(b)). In the case of the CTEM of BCC-CSM2-MR and CanESMs, the start of senescence is triggered by consecutive days of net carbon loss due to unfavorable weather conditions, such as shorter day lengths, colder temperatures, and drier soil moisture, based on a carbon benefit approach (Arora and Boer 2005). In contrast, ORCHIDEE of IPSL-CM6A-LR utilizes leaf age, along with the temperature threshold, to parameterize litter fall and senescence (Krinner et al 2005). Although the CTEM and ORCHIDEE tended to exaggerate the EOS sensitivity to temperature more than that shown in the GIMMS LAI 3g and CRU TS, it is suggested that the CLM4 and CLM5 should include other variables to describe the EOS responses to temperature variation.
The responses of spring and autumn phenology to seasonal temperature variation is another notable issue still possessed by state-of-the-art ESMs as well as the mean biases. Both CMIP5 and CMIP6 ESMs tend to underestimate the SOS sensitivities compared to GIMMS LAI 3g ( figure 5(a)). This suggests that even if ESMs project a future warming pattern of spring temperature with high accuracy, the weaker SOS sensitivities in the ESMs will result in an underestimation of the SOS change due to warming and its interactions, for example, transpiration and cloudiness (Chen et al 2016, Ma et al 2016. Therefore, the phenological sensitivity of SOS and EOS should be improved to achieve better projection of terrestrial ecosystems and their role in climate systems in the future. Seasonal warming is another notably important factor that can have a significant influence on phenology simulations (figure 6). Many of the ESMs showed a stronger warming in spring and autumn than CRU TS. This stronger warming in the models can lead to earlier SOS and later EOS than those in the GIMMS LAI 3g , even in the case where the land surface models are ideal for describing changes in phenology. Therefore, realistic predictions of seasonal temperature changes are a prerequisite for accurate representation of the response of vegetation seasonality to climate change. For example, CESM2 and NorESM2-LM are based on the same land surface model (CLM5), and their mean sensitivities of SOS to temperature are almost the same (−2.6 and −2.5 d K −1 , respectively). However, there is a large difference in ∆SOS and ∆T SOS because of the different spring warming patterns between the two models ( figure 6). This demonstrates the importance of accurate projections of future climate change to foresee the changes in spring and autumn phenology.
In this study, we focused on the overall evaluation of LAI seasonality and its responses to seasonal temperature in CMIP6 in comparison with the previous CMIP5 and GIMMS LAI 3g . However, it should be noted that our results could be affected by uncertainties in either snow contamination of the satellite data or differences in spatiotemporal resolutions between models. The satellite-retrieved LAI dataset is a valuable tool for evaluating LAI seasonality in ESMs over a wide range, but its potential uncertainties such as snow contamination in high-latitude regions, should be discussed. In this study, we utilized the GIMMS LAI 3g based on a 15 d maximum composite and logistic fitting methods, to minimize the uncertainties in LAI seasonality related to snow. The maximum composite and logistic fitting were effective methods for reducing non-vegetative influences, but the non-vegetative influence of snow could not be completely removed from the analysis. Considering that uncoupled land model simulations have shown a delay in LAI seasonality similar to this study (Murray-Tortarolo et al 2013) and more realistic snow seasonality in CMIP6 models (Mudryk et al 2020), the influence of snow cover could be relatively minor compared to the large biases in LAI seasonality.
Another uncertainty can arise from the differences between the maximum-composite satellite LAI and the monthly averaged LAI from the CMIP models. GIMMS LAI 3g is based on the 15 d maximum composite to minimize potential uncertainties, which is not identical to monthly averaging and can create a bias between the satellite and modeled LAI. This methodological difference can partly contribute to the delayed biases of autumn phenology, but cannot fully explain the EOS differences between the GIMMS LAI 3g and ESMs that exceed 40 d. Nevertheless, applying the maximum composite to daily LAI output from the models will be a way to ensure consistency between the model output and satellite data for precise analyses. The harmonization of different spatial resolutions among CRU TS, GIMMS LAI 3g , and ESMs (table 1), or the limited number of years (33 years from 1982 to 2014), could also have caused uncertainties in our analyses.
This study assumed that seasonal temperature exerts a one-way influence on LAI seasonality, while LAI exerts an influence on temperature by modulating evapotranspiration and surface albedo (Bonan 2008). For example, delayed LAI seasonality over high latitudes can lead to a high temperature bias by lowering the surface albedo, and can amplify the biases in autumn phenology by exaggerating autumn temperature. Therefore, the sensitivities of phenological dates to seasonal temperature (figure 5) could be affected by the interactions and feedback processes between the vegetation seasonality and environmental conditions. Here, we aimed to illustrate the overall aspects of the CMIP6 LAI on the basis of monthly output, which is not a thorough investigation of the two-way consequence because of the misrepresentation of LAI seasonality. In future studies, a daily examination of LAI and temperature is required to clarify the consequences of delayed LAI seasonality and its feedbacks.
The vegetation fraction is an important factor that affects land-surface albedo and evapotranspiration (Brovkin et al 2013), which in turn influences LAI seasonality in the ESMs. The CMIP6 ESMs require a standard land-use scenario, land use harmonization 2 (LUH2), to facilitate intercomparison analyses of the ESMs (Eyring et al 2016, Ma et al 2020. In spite of the same land-use scenario, the implementation of LUH2 could lead to a discrepancy in vegetation or forest fractions due to different translation rules, or inconsistent PFTs of each ESM (table 2). Under the same land-use forcing, the discrepancy in forest fractions among ESMs is likely to be minor compared to other uncertainties. Considering the importance of vegetation fractions, however, the discrepancy and its consequent influences need to be clarified in intercomparison studies focusing on vegetation fraction and PFTs between the ESMs.

Conclusion
This study reveals that CMIP6 ESMs show a large bias in LAI seasonality and phenological dates with regard to the mean distributions and responses to temperature in deciduous forests over the northern extratropics, which is similar to the CMIP5 ESMs. These vegetation seasonality biases can result in inaccurate representations of vegetation-climate interactions such as surface energy balance, hydrological processes, and carbon fluxes. Our findings highlight that deciduous phenology, particularly leaf senescence, should be improved for a more accurate depiction of terrestrial ecosystems and their interactions with the Earth system.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.