DeepPhenoMem V1.0: Deep learning modelling of canopy greenness dynamics accounting for multi-variate meteorological memory effects on vegetation phenology

. Vegetation phenology plays a key role in controlling the seasonality of ecosystem processes that modulate carbon, water and energy fluxes between biosphere and atmosphere. Accurate modelling of vegetation phenology in the interplay of Earth’s surface and the atmosphere is thus crucial to understand how the coupled system will respond to and shape climatic changes. Phenology is controlled by meteorological conditions at different time scales: on the one hand, changes in key meteorological variables (temperature, water, radiation) can have immediate effects on the vegetation development; on the 20 other hand, phenological changes can be driven by past environmental conditions, known as memory effects. However, the processes governing meteorological memory effects on phenology are not completely understood, resulting in their limited performance of phenology simulated by land surface models. A deep learning model, specifically a long short-term memory network (LSTM), has the potential to capture and model the meteorological memory effects on vegetation phenology. Here, we apply the LSTM to model the vegetation phenology using meteorological drivers and canopy greenness at high temporal 25 resolution collected taking advantage of digital repeat photography by the PhenoCam network. We compare a simple multiple linear regression model, a no-memory-effect, and a full-memory-effect LSTM model to predict the whole seasonal greenness trajectory and the corresponding phenological transition dates of 50 sites and 317 site-year during 2009-2018, across deciduous broadleaf forests, evergreen needleleaf forests and grasslands. The deep learning model outperforms the multiple linear regression model, and the full-memory-effect LSTM model performs better than no-memory-effect model for all three plant 30 function types (median R 2 of 0.878, 0.957, and 0.955 for broadleaf forests, evergreen needleleaf forests and grasslands)


Introduction
Vegetation phenology characterizing key plant development stages such as leaf unfolding and leaf senescence, plays a pivotal role as primary regulator of ecosystem processes and land-atmosphere interactions (Peñuelas et al., 2009;Richardson et al., 2013;Piao et al., 2019).In response to the global change, vegetation phenology has shown divergent shifts in the diverse biomes (Menzel et al., 2006;Cleland et al., 2007;Wolkovich et al., 2012;Fu et al., 2015;Zhang et al., 2022), whilst at the same time exerting a substantial influence on ecosystem productivity and functions through the impact on biogeochemical processes, especially photosynthesis and carbon sequestration (e.g., Richardson et al., 2010) as well as ecosystem respiration (e.g., Migliavacca et al., 2015).Additionally, as green leaves are the primary interface for the exchange of energy, mass, and momentum between the terrestrial surface and planetary layer (Richardson et al., 2012), vegetation phenology plays a fundamental role in controlling seasonal dynamics of water and heat fluxes between the land and the atmosphere (Peñuelas et al., 2009;Richardson et al., 2009;Puma et al., 2013;Jin et al., 2017;Buermann et al., 2018;Koebsch et al., 2020;Wu et al., 2022).Given the significance of vegetation phenology within the Earth system, an accurate representation of vegetation phenology in land surface models (LSMs) is crucial to enhance our understanding of ecosystem processes and their dynamics in response to climate change.
Over decades, much of the modelling efforts have been made to improve the development of accurate phenological models at species-specific and vegetation-type scale (White et al., 1997;Chuine, 2000;Jolly et al., 2005;Delpierre et al., 2009), including understanding the physiological mechanisms and environmental driving factors controlling phenology (Fu et al., 2020).Currently, vegetation phenological models mainly include statistical models and process-based models.These models are developed to simulate phenological events by integrating meteorological variables which are supposed to drive the processes of vegetation phenology, utilizing ground observations or phenological proxies derived from remote sensing vegetation index data.The most popular approach to represent phenology in land surface models is based on the accumulated growing-degree-days (GDD) (Lawrence et al., 2019;Asse et al., 2020;Pollard et al., 2020).The GDD model assumes that vegetation phenological events occur when the accumulated growing-degree-day sum fulfils a given requirement (i.e., a threshold of accumulated temperature over a certain time period).Considering the physiological processes, plants experience dormancy before entering the growing season, and thus chilling is considered to be essential to break dormancy in phenological https://doi.org/10.5194/egusphere-2024-464Preprint.Discussion started: 7 March 2024 c Author(s) 2024.CC BY 4.0 License.models (Chuine, 2000;Zhang et al., 2022).Besides temperature, also the photoperiod, and soil water availability have been shown to be important drivers for vegetation phenology (Adole et al., 2019;Borchert et al., 2005;Luo et al., 2020).
Consequently, models based on GDD models have been improved by incorporating photoperiod and soil water availability effects, which have been applied in many LSMs, such as Biome-BGC (BioGeochemical Cycles) model (Thornton et al., 2002;Thornton and Rosenbloom, 2005), JSBACH (Jena Scheme for Biosphere-Atmosphere Coupling in Hamburg; (Mauritsen et al., 2019)), ORCHIDEE (ORganising Carbon and Hydrology In Dynamic Ecosystems; (Krinner et al., 2005a)) and so on.
Large uncertainties and biases in modelling phenology following this ad-hoc concepts have been identified within LSMs and Earth system models (Richardson et al., 2012;Jeong et al., 2012;Murray-Tortarolo et al., 2013;Lawrence et al., 2019;Peano et al., 2021), resulting in inaccurate estimations of primary productivity and the terrestrial ecosystem carbon and water cycle (Migliavacca et al., 2012).
To improve such phenology model performance, one has to consider more complex interactions of meteorological conditions that drive the vegetation phenological development.Phenology is triggered by meteorological conditions at various time scales.Instantaneous meteorological conditions like day-to-day variations in temperature, water, radiation can directly impact vegetation development.Additionally, longer-term and past meteorological conditions from the previous month / year, have legacy effects on phenological changes.For example, a plant might face delays in budburst if the chilling requirements are not fulfilled (Ren et al., 2021).These lasting or delayed impacts are often referred to as memory effects, representing the impact of previous climate conditions on the present or future vegetation development.Studies have revealed that besides the well-known memory effect from temperature like GDD or chilling, other meteorological variables like precipitation or drought also have the memory effects on vegetation growth and subsequent phenological appearance (Walter et al., 2011;Ogle et al., 2015;Ettinger et al., 2018;Liu et al., 2018b;Lian et al., 2021).Due to the complexity and the interplay of various meteorological factors, current modelling efforts face a challenge in incorporating these multi-variate memory effects (that refers to the different memory effects that can be associated to different meteorological drivers) in a mechanistic manner.
Recently, data-driven methods including deep learning techniques have been used to investigate the influence of climatic factors on land surface processes (Forkel et al., 2017;Reichstein et al., 2019;Besnard et al., 2019;Kraft et al., 2019;Callaghan et al., 2021;Zhou et al., 2021), demonstrating their potential in capturing long-term temporal dependencies (Sutskever et al., 2014;Bahdanau et al., 2016).Deep learning models aim to consider the full spectrum of meteorological inputs making predictions and thus hold promise in capturing long-term temporal dependencies from multiple variables (Besnard et al., 2019;Kraft et al., 2019).A recent study already indicated that the deep learning technique is capable of improving the predictability skill of vegetation phenology with respect to conventional methods (Zhou et al., 2021).The long short-term memory network (LSTM), a type of deep learning neural network, designed specifically for sequence prediction problems that deal with the memory effect (Hochreiter and Schmidhuber, 1997).Also, time series of consistent and widely distributed phenological nearsurface observations are now long enough so that they reflect short-as well as long-term sensitivities to meteorological conditions.Specifically, the continuous daily dataset from the PhenoCam initiative (Richardson et al., 2018), obtained from images of near-surface digital cameras, offers opportunities to apply deep learning methods for developing vegetation phenology models that account for the memory effects of multiple meteorological variables (Richardson et al., 2018;Seyednasrollah et al., 2019).
We propose to use the PhenoCam data and the LSTM framework to develop a deep learning model that not only is capable of predicting specific transition dates, but also in forecasting the state of the canopy greenness throughout the entire year.Most current phenology modelling studies focus on phenological transition dates only, such as the onset of budburst, flowering, or leaf senescence (Chuine, 2000;Delpierre et al., 2009;Fu et al., 2020).These phenological models effectively capture the processes that lead to individual phenological events, but fail to overlook the dynamical nature characterizing the continuous phenological development throughout the entire annual cycle, where the phenological state itself could influence the subsequent phenological development (Fu et al., 2014).Conversely, models targeting the whole seasonal trajectory (White et al., 1997) possess the capability to provide continuous predictions of the phenological development state that decisively influence other biogeochemical and -physical processes at the land surface.
More specifically, in this study, we focus on the modelling of the whole seasonal trajectory of canopy greenness, but also the prediction of transition dates in the annual cycle of canopy greenness.Overall, our key objective is to develop a robust deep learning vegetation phenology model on the basis of a LSTM to characterise the memory effects of multiple meteorological variables on canopy greenness using the PhenoCam observations.We also build a statistical model as the baseline to evaluate the performance of our machine learning model.Our study focuses on addressing the following research

PhenoCam data
The phenological data used in this study are acquired from the PhenoCam dataset v2.0 (https://daac.ornl.gov/VEGETATION/guides/PhenoCam_V2.html,(Seyednasrollah et al., 2019)).These data are derived from digital images photographed by automated and high-frequency digital cameras at half-hour intervals.The green chromatic coordinate (GCC) is calculated as the ratio of the green channel digital numbers to the total digital values of the digital Red-Green-Blue images within a predefined region of interest.The daily GCC time series are derived by calculating the 90th percentiles of GCC each day, that help mitigate adverse illumination effects caused by atmospheric influences.The daily GCC time series are used in this study to represent the canopy greenness development and senescence.Specifically, we select the GCC time series from Type I observation sites that follow a standard protocol to ensure data quality and continuity (Richardson et al., 2018).To ensure robust data, we exclude yearly GCC data for years with more than 20 days of missing digital images.
We further select sites with continuous observations available for more than 5 years.Missing data in the GCC time series are https://doi.org/10.5194/egusphere-2024-464Preprint.Discussion started: 7 March 2024 c Author(s) 2024.CC BY 4.0 License.interpolated using cubic spline interpolation method (Hall and Meyer, 1976).Additionally, we apply a locally weighted scatterplot smoothing method to reduce noise in the GCC time series (Cleveland, 1979).Our study focuses on three main plant functional types (PFT): deciduous broadleaf forest (DB), evergreen needleleaf forest (EN), and grassland (GR).Ultimately, a total of 50 sites and 317 site-year observations during 2009-2018 are used in the analysis.Of these, 28 sites with 178 site-year observations are DB sites, 13 sites with 82 site-year observations are EN sites and 9 sites with 57 site-year observations are GR sites.The spatial distribution of the study sites and test sites for each PFT is shown in Fig. 1.Fig. 1 Geographical distribution of study sites for deciduous broadleaf forest (orange circle), evergreen needleleaf (blue triangle) and grassland (green square).The test sites are specifically highlighted within a red frame.

Explanatory variables
The daily meteorological variables were obtained from the station-level Daymet dataset (https://daymet.ornl.gov/).These variables include daily minimum temperature (Tmin), daily maximum temperature (Tmax), daily daylength (DL), daily precipitation (P), daily water vapour pressure of the air ( ! ), and daily shortwave radiation (R).We extracte time-series of these meteorological variables for each studied Phenocam site.In our study, six dynamic variables are used, including Tmin, Tmax, DL, R, vapour pressure deficit (VPD) and soil water availability (SW).The daily VPD is the difference between the saturation pressure of water ( " , calculated from the daily mean temperature) and the actual water vapour pressure of the air ( ! ), calculated following Eq.( 1) & (2) (Alduchov and Eskridge, 1997).The daily SW is computed using a proxy of the sum of precipitation over the previous month (Eq.( 3)).Additionally, two static variables, mean annual temperature (Tmean), and mean annual precipitation (Pmean), are derived from the records of these two variables in PhenoCam dataset.
where  # is the soil water availability on day ,  #,-is the precipitation on day ( − ),  is the number of days away from the day of .

LSTM modelling approach
Our goal is to predict the whole seasonal trajectory of canopy green chromatic coordinate (GCC) from the time series of the eight predictor variables using one model per PFT for multiple sites.To make this task feasible, we subtract the winter baseline value (the mean of the minimum GCC values in available years) of GCC at each site and PFT, making the measurements more comparable across sites.Further, the predictor variables and targets (GCC) are globally normalized using a min-max transformation for each PFT.
To ensure that our models learn relationships that can be generalized, we evaluate them on unseen data in space and time.
For the spatial generalisation, we hold out 10% of all studied sites as unseen test sites.Furthermore, we use all data from the year of 2018 for each PFT as our temporal test dataset.The division of the data is illustrated in Fig. S1.Additionally, we divide the dataset into samples consisting of two years of input predictor variables (accounting for potential memory effects of meteorological variables from previous one year to current-year), along with one year of GCC observations corresponding to the second year of input.
To capture the relationship between the meteorological variables and the GCC, we employ a LSTM network (Hochreiter and Schmidhuber, 1997).We choose this method for several reasons.Firstly, we expect a highly non-linear relationship between the meteorological variables and observed canopy greenness, necessitating a flexible, nonparametric model such as a neural network (Hornik et al., 1989).Secondly, the representation of dynamic meteorological memory effects on canopy greenness, requires a model that can represent temporal interactions across scales.For this purpose, we utilize a recurrent neural network (RNN), specifically the LSTM which have demonstrated strong performance in prediction related problems involving time series data (Wu et al., 2017;Besnard et al., 2019;Kraft et al., 2019Kraft et al., , 2022)).
In our study, we employ a single-layer LSTM with 128 nodes, followed by one fully connected (output-) layer and preceded by one fully connected (input-) layer.The mean squared error between the predicted and the observed GCC is optimized using the gradient based AdamW (Loshchilov and Hutter, 2019), an algorithm that adds decoupled weight decay to the Adam (Kingma and Ba, 2017) optimizer.We train an ensemble of models to reach a more stable prediction.To this end, we create multiple datasets by leaving the data of one site out, respectively.On each of these datasets we train an LSTM model and use we decay the learning rate (a hyperparameter that determines the size of the steps taken during the optimization of a model) of, initially, 0.01 by a factor of 0.9 after each epoch.For testing we use the mean of the ensembled LSTMs as the prediction for the GCC.
To quantify the importance of the memory effects in the model, we additionally train our model on the same dataset with all data being randomly shuffled in the time dimension (Besnard et al., 2019;Kraft et al., 2019).In this dataset the "instantaneous" relation between the inputs and outputs of the current day is unimpaired but the effects of previous days cannot be learned, as these days are random.This LSTM model does not consider the memory effects, referred to as no-memoryeffect LSTM model M0.In contrast, the original model, which has access to the full history of the input variables, is referred to as the full-memory-effect LSTM model Mfull.The framework of our LSTM model in predicting canopy greenness GCC using six dynamic and two static predictors is illustrated in Fig. 2.
Fig. 2 The framework of canopy greenness modelling using LSTM.The LSTM is composed of a forget gate, input gate, output gate and a candidate and hidden state.The LSTM networks are adapted from Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

Model evaluation
To assess the modelling ability of deep learning model, we develop a baseline model using multiple linear regression (MLR) between the eight predictor variables and GCC (Eq.( 4)).The MLR model is trained and tested in the same training and testing dataset as the LSTM models.where  is our target,  .-/,  .!0 , , , , ,  .1!/ , and  .1!/ are predictor variables,  is the residual.
For model evaluation, we primarily use root mean square error (RMSE) and the coefficient of determination (R 2 ) for model evaluation.These metrics are calculated based on the predicted and observed GCC at each site for each PFT.We compare the model performance of all models in the testing dataset, and select the best model for each PFT by R 2 .For the best models, we evaluate their performance in simulating: 1) GCC observations; 2) GCC temporal variation; 3) phenological transition dates in testing datasets.GCC temporal variation includes three time-scales variation: daily variation, monthly mean GCC variation, and interannual variation of the anomalies of median GCC.For daily variation, we also calculate the daily anomalies for observations and LSTM models.For the monthly GCC variation, we further compare the mean monthly canopy development rate ( 233 ) during studied years between observed and predicted GCC time series.The rate of canopy development for each month (from February to December) is calculated at the monthly scale from the GCC time series using Eq. ( 5).
where  is time (month),  # and  #,+ is the mean  for a given year at month  and  − 1, respectively.
The phenological transition dates are estimated intending to define the start of season (SOS) and the end of season (EOS) for one year.We choose the dates corresponding to a 30% of the seasonal amplitude (from the 5th percentile to the 95th percentile) through greening rising and falling to represent the start of season (SOS) and end of season (EOS).SOS and EOS transition dates are estimated from both the predicted and observed GCC time series.

Model sensitivity analysis
In order to gain insights into the physical implications of deep learning models, we conduct two simple experiments to assess the model sensitivity to meteorological drivers.First, we increase (warming) and decrease (cooling) temperature (Tmin and Tmax) by 4 °C throughout the year while keeping all other predictors unchanged.Another experiment involves the same temperature adjustments for Tmin and Tmax, but with the VPD varying based on mean temperature while keeping other predictors constant.We then compare the annual canopy greenness cycles under warming and cooling conditions with the actual observations.Furthermore, to better understand how vegetation phenology shifts in response to a warming environment according to the LSTM models, we estimate the SOS and EOS in these two experiments.We evaluate the differences between the treatment of 1°C warming and the observations for SOS and EOS.

Model performance
A comparison of the model performance is conducted between the statistical MLR model and the LSTM models (including the no-memory-effect LSTM model M0 and the full-memory-effect LSTM model Mfull) on the test dataset for deciduous broadleaf (DB), evergreen needleleaf (EN) and grassland (GR) (Table 1).The LSTM models achieve a better performance for predicting the GCCs than the MLR model for all PFTs (Table 1).The coefficient of determination R 2 between modelled and observed canopy greenness GCC are much higher in LSTM models than MLR, with the median R 2 increased from MLR to LSTM from 0.779 to more than 0.806 for DB, from 0.777 to more than 0.830 for EN, and from 0.646 to more than 0.914 for GR.In summary, LSTM models achieve better performance for predicting GCC compared to the baseline MLR model in all three PFTs.Furthermore, comparing the two different LSTM models, the full-memory-effect model Mfull exhibits superior performance in simulating GCC than no-memory-effect model M0 across all three PFTs (Fig. 3).The median R 2 of all studied sites in the full-memory-effect model exceeds 0.85, specifically 0.878 for DB, 0.957 for EN, and 0.955 for GR.This represents an improvement in model performance of 8.9% for DB, 15.3% for EN, and 4.5% for GR, compared to the no-memory-effect model with R 2 of around 0.806 (DB), 0.830 (EN) and 0.914 (GR).Similarly, there is a reduction in bias of 12.5% (RMSE decreased from 0.036 in M0 to 0.032 in Mfull) for DB, 15.4% (RMSE decreased from 0.015 in M0 to 0.013 in Mfull) for EN, and 37.5% (RMSE decreased from 0.011 in M0 to 0.008 in Mfull) for GR in full-memory-effect models.These findings suggest considering memory effects from multiple meteorological factors can enhance the model performance in simulating GCC compared to models without considering memory effects.

255
The performance of LSTM models on unseen test sets in space and time shows that full-memory-effect LSTM model Mfull outperforms no-memory-effect LSTM model M0 both in unseen site(s) and unseen year across all three studied PFTs (Fig. 4).As for the performance on unseen site(s), Mfull consistently exhibits higher median R 2 values compared to M0, with improvements of 9.3%, 13.5%, and 3.5% for DB, EN, and GR respectively.Similarly, in terms of model performance across unseen years, Mfull demonstrates substantial enhancement in predictive accuracy, with median R 2 values increasing from 260 0.805 to 0.874 (an 8.6% improvement) for DB, from 0.824 to 0.956 (a 16% improvement) for EN, and from 0.914 to 0.964 (a 5.5% improvement) for GR.These findings underscore the robustness of the model in accurately forecasting GCC time series, even when face with previously unobserved spatial and temporal contexts.

Modelling the temporal variability of GCC in unseen sites: daily to interannual time-scales
Generally, LSTM models can capture the GCC canopy greenness temporal dynamics, with an initial increase followed by a decrease during the growing season.This is illustrated in Fig. 4, which displays the observed and predicted daily variability of GCC, daily GCC anomaly, seasonal variability of GCC and its development rate, and interannual variability of GCC anomalies in the unseen sites from 2009 to 2018 using LSTM models.The predicted daily variability of GCC shows a high correlation with observations across multiple years, with a significant correlation coefficient (r) above 0.9 (Fig. 5 a -c) in Mfull.Conversely, M0 shows increased noise in daily GCC variability, displaying larger biases in predicting both GCC peaks and minimum values, particularly evident for DB (Fig. 5 a).From the predictions of daily GCC anomalies which remove the mean seasonal cycle (Fig. 5 d -f), we can further find Mfull performs better than M0 in capturing the daily fluctuation though they are still not good at predicting the daily GCC anomalies as we expected.The R 2 between observed and predicted GCC anomalies are higher in Mfull than in M0 for DB (M0: 0.0007, Mfull: 0.02), EN (M0: 0.03, Mfull: 0.22), and GR (M0: 0.07, Mfull: 0.32).However, a discrepancy between observed and predicted absolute GCC at the daily scale is observed for EN in the unseen site, where the predicted GCC is overestimated compared to the observation in both Mfull and M0 (Fig. 5 b).
The overall seasonal cycle of monthly GCC shows good agreement between the observation and prediction by LSTM models.The observed GCC starts increasing in March (GR) or April (DB and EN), peaks in June (DB and GR) or July (EN), and gradually decreases until November (DB and GR) or December (EN) (Fig. 5 g -i).For EN and GR, both Mfull and M0 can effectively capture this seasonal pattern.However, in the case of DB, our Mfull model predicts a similar seasonal dynamic pattern of GCC to observations, depicting greening up before June or July followed by greening down until November or December, while M0 predicts a peak in greenness occurring in July which diverges from observation (Fig. 5 g).Similarly, the development rate of monthly GCC shows similar performance as seasonal cycle of monthly GCC.The largest increase in observed GCC occurs in May during the greening-up period for all three vegetation types, while the speed of greenness

Modelling the vegetation canopy phenological transition dates in unseen sites
Figure 6 illustrates that Mfull can capture the interannual variability of phenological transition dates, outperforming M0.
Regarding the start of season (SOS) (Fig. 6 a, b, c), Mfull consistently exhibits the same shift direction (sign of the anomaly) as observations in the majority of years.Specifically, we observe concordance in the direction of advance or delay between prediction and observation in 5 out of 7 years (71% of the years) for DB.Similarly, the predicted SOS shifts agreed well with the observations in more than 80% years for EN (89% of the years) and GR (80% of the years).Moreover, a high correlation is evident between the Mfull predicted interannual variability of SOS anomaly and the observed interannual variability of SOS anomaly.The correlation coefficient between observed and predicted interannual variability of SOS anomaly reaches up to more than 0.9 for EN and GR.In further, the observed delay of SOS with the trend of 2.1 days per year is also reproduced well by the Mfull (predicted trend is 1.2 days/year) in howland1 for EN during 2010 to 2018 (Fig. 6 b).Conversely, compared to Mfull, M0 exhibits poor performance in capturing the shift direction and lower correlation between the predicted interannual variability of SOS anomaly and the observed interannual variability of SOS anomaly.
As for EOS, good agreements between the predicted interannual dynamics of EOS anomalies by Mfull and the observed ones are found in DB and EN, although not in GR (Fig. 6 df).Firstly, Mfull predicted shift directions of EOS in most years (over 80%) are consistent with the observed shifts for each PFT.For DB, there are 86% of years showing the same shift direction of advance or delay between observed and predicted EOS.For EN and GR, the percentages are 89% and 80% respectively.On the other hand, we find a positive correlation between observed and Mfull predicted EOS anomalies for DB and EN.The observed delayed trend (2.7 days/year) for DB in harvardbarn2 is also well predicted (1.6 days/year) by the Mfull.Interestingly, Mfull also captures the larger advancement of EOS observed in 2018 compared to the mean EOS for EN, indicating its capability to capture extreme interannual anomalies.Compared Mfull, M0 shows a relatively poor performance, displaying larger bias in its predictions.

The model sensitivity analysis
The model sensitivity analysis indicates that full-memory-effect model Mfull can simulate well the GCC response to temperature.Fig. 7 illustrates the temperature sensitivity of GCC in the LSTM model Mfull for all three PFTs studied here.
Comparison of warming (red line, Fig. 7 a, b, c) and cooling (blue line, Fig. 7 a, b, c) alone treatments (increasing or decreasing 4 °C) to the unchanged temperature (± 0 °C) control (grey line, Fig. 7 a, b, c) reveals that warming led to greener and longer greenness season, while cooling caused to the less greening and shorter vegetation season for studied three PFTs (Fig. 7 a, b,   c).When VPD varies with temperature, high temperature along with the high VPD does not have a significant effect on the greenness and length of vegetation period.During the greenness rising and falling period, the canopy greenness is very similar to the actual GCC cycle (the control) for the three PFTs, but the peak greenness is lower than the actual GCC peak values, especially for DB and GR (Fig. 7 d, e, f).In the cooling and lower VPD treatment, it shows similar trends with the cooling condition but unchanged VPD.Cooling and lower VPD result in declining canopy greenness and shortened vegetation periods during the growing season.Furthermore, we also examine the temperature sensitivity of phenological events (SOS and EOS) (Fig. 8).A one-degree increase in temperature throughout the year resulted in an earlier start of the season (Fig. 8 a) and delayed end of the season (Fig. 8 b), regardless of low or high VPD.Under warm conditions alone, SOS appears to be one day earlier on average, while under warm and high VPD conditions, it shiftes to two days earlier.The one-degree temperature increase has a similar effect 350 on EOS compared to the one-degree increase accompanied by varied VPD.Through student t-tests on means of the two distributions, no statistically significant (p = 0.12 (SOS), p = 0.69 (EOS)) differences in means are found, indicating that temperature is the most influential meteorological factor affecting the start and end of the season.

Meteorological memory effects on vegetation canopy greenness
In our study, we have presented a new way to simulate canopy greenness dynamics by applying a data-driven LSTM model accounting for multi-variate meteorological memory.We find that multi-variate meteorological memory is of importance in developing vegetation phenological models.The impact of meteorological factors on vegetation phenological development encompasses both instantaneous and memory effects.Through a comparison of models accounting solely for instantaneous effects (MLR and M0), with those considering both instantaneous and memory effects of multiple meteorological variables (Mfull), we have demonstrated that models involving memory effects do outperform models without memory effects (Table 1 & Fig. 3).This suggests that considering both instantaneous and memory effects provides a more comprehensive explanation for vegetation development compared to solely instantaneous effects.
But what specific advantages does the full-memory-effect model offer over the no-memory-effect model?We will explore this question from several perspectives.Firstly, full-memory-effect model exhibits good performance in spatial and temporal extrapolation of canopy greenness.By comparing the model performance of Mfull and M0 in unseen site(s) and unseen years (Fig. 4), it becomes clear that the full-memory-effect model outperforms the no-memory-effect model in both unseen site(s) and unseen years for all three PFTs.This indicates that incorporating memory effects into the model enhances performance in unseen sites and years, a challenging task in modelling, especially for unseen sites.
Secondly, the inclusion of memory effects in models improves performance in predicting variabilities across different time scales for unseen sites.At the daily scale, the full-memory-effect model reduces noise on each day and predicts daily anomalies more accurately than the no-memory-effect model (Fig. 5).This underscores that daily changes in canopy greenness are influenced not only by instantaneous climate but also by the memory effects of previous climate on the canopy.Our results align with previous studies indicating that cumulative thermal summation, rather than daily temperature alone, determines vegetation phenology (Hänninen, 1990;Chuine, 2000).Regarding seasonal dynamics and interannual variability, our study finds that memory effects vary among PFTs.For deciduous broadleaf trees, the full-memory-effect model demonstrates a significant advantage in predicting seasonal and interannual dynamics (Fig. 5 g -l).It can capture well the seasonal dynamic pattern and the greening trend, which are the no-memory-effect model fails to predict.This suggests that changes in canopy greenness over long time scales for deciduous broadleaf trees are sensitive to relatively long-term meteorological changes.
This may be attributed to lagged effects of precipitation (Joshi et al., 2022), drought (Peng et al., 2019) and other factors (Gömöry et al., 2015;Ding et al., 2020;Joshi et al., 2022;Zhou et al., 2022;Liu et al., 2018) from previous months on canopy greenness.However, for evergreen and grasslands, both the full-memory-effect model and the no-memory-effect model show similar performance in predicting seasonal dynamics and interannual variability.It is noteworthy that the memory effect of precipitation in our study is already included in the no-memory-effect model, as the meteorological variable of soil water availability is calculated based on the weighted mean of precipitation in the previous month (due to not available soil moisture data).This implies that such memory effects may offset the performance difference between the full-memory-effect model and the no-memory-effect model.
Lastly, incorporating multi-variate meteorological memory effects into the LSTM model improves performance in predicting vegetation phenology (Fig. 6).Our results suggest that phenological shifts are influenced by meteorological memory effects, consistent with the notion that vegetation phenology is highly variable and responsive to long-term variation in climate (Sparks and Carey, 1995).Specifically, winter chilling (Chuine et al., 2016;Ettinger et al., 2020;Zhang et al., 2022), and the growing season temperature (Liu et al., 2018) can impact on the spring phenology and autumn development.However, unlike models primarily accounting for temperature memory effects alone, such as GDD (Hänninen, 1990;Chuine, 2000), our LSTM memory effect model shows promise in the incorporation of multiple memory effects from different meteorological variables.
It should be noted that although our study emphasizes the importance of memory effects of multiple meteorological variables, the specific contributions of different meteorological factors to memory effects on vegetation development remain unclear.Further in-depth studies of memory effects are still needed to discern the relative importance of memory for each meteorological factor, and their memory length across various developmental stages.

Machine learning modelling of vegetation phenology
In our study, we explore the potential of a deep learning approach using LSTM to predict vegetation phenology based on canopy greenness, specifically GCC annual cycles, using only meteorological variables as inputs.The results indicate that the superior performance of our deep learning model compared to a multiple linear regression model (Table 1), highlighting that deep learning models are capable of capturing nonlinear relationships between inputs and targets.This holds promises for improving the performance of current vegetation phenology models and a significant step toward a better representation of phenology on earth system models using deep learning approaches.However, comparing our deep learning model with process-based model is still challenging, as their modelling targets are different in many cases.Our deep learning models focus on the whole annual cycle of canopy greenness, whereas most process-based models concentrate on specific phenological events.
Our findings demonstrate that full-memory-effect LSTM model can generally explain daily-scale variations and seasonal dynamic changes, as evidenced by the high correlations between predicted and observed GCC (Fig. 5).This is likely because the full-memory-effect LSTM model can better learn the complex relationship between climatic dynamics and canopy greenness dynamics.However, our deep learning model has had less success in accurately predicting absolute GCC values and peak values (Fig. 5, Fig. S2).For instance, in the case of "howland1" site for evergreen needleleaf, the full-memory-effect LSTM model can predict the dynamics of canopy greenness well, but the absolute GCC values are overestimated (Fig. 5 b).
Indeed, although full-memory-effect model show a good performance, it also overestimates canopy greenness in some sites (Fig. 5 b, Fig. 8 a & c, Fig. S2), and underestimates in other sites (Fig. 9 b & d), Fig. S2).Possible reasons for this discrepancy could be: 1) different climatic drivers for different species among PFTs, 2) incomparable GCC data among sites, and 3) inadequate learning of site-specific characteristics.We aim to build a more general model in this study, but it should be noted that even within species, GCC can respond to climate differently to meteorological conditions (Denéchère et al., 2021).
Additionally, the combination of GCC data from all sites for a specific PFT in building the model may introduce errors and bias, as GCC data is not consistently calibrated and the colour signals can be sensitive to various parameters, such as camera type, species (foliage colours are different colours of green) or spectral properties of incoming light (Wingate et al., 2015;Richardson et al., 2018).Moreover, the use of static variables (climatological mean temperature and precipitation) to indicate spatial differences may not sufficiently capture site-specific information, leading to overestimation or underestimation of specific sites.Regarding interannual variability, we find that the predicted changes (increase or decrease) in peak GCC are consistent with observations in most years (Fig. 5 g -i), indicating that the model's capability to reproduce basic response of canopy greenness to climate changes.Furthermore, our models can also capture well the interannual variability in GCC (Fig. 5 k), and also the trends of interannual variability in anomalies of peak GCC were generated well by our data-driven models (Fig. 5j).Overall, these results demonstrate the ability of the LSTM model to reliably predict temporal variability.In terms of spatial performance, we find that the LSTM model has a good agreement with observations in most studied sites for all three PFTs (Fig. S3).This means our model is able to capture spatial variation within each PFT providing support that the model might represent a general model for each PFT.The modelling results of vegetation phenology reveal that the full-memory-effect LSTM model is also capable of predicting the shift in phenological transition dates (advance or delay of start of season (SOS) and end of season (EOS)) in most years in the unseen dataset (Fig. 7).The ability to predict the advancement or delay in phenology is crucial for estimating other key processes in the ecosystem functioning, such as ecosystem productivity, as the advancement of spring phenology and the delay of autumn phenology are typically associated with higher productivity (Richardson et al., 2010).Overall, our model's skill to accurately predict the average advance or delay in phenology is encouraging, although, it remains challenging to predict the exact phenological dates given the potential and systematic overestimation or underestimation in the GCC cycle (Fig. S4).

Can the deep learning model of vegetation phenology learn physically plausible relationships?
The sensitivity analysis of our deep learning model sheds light on its ability to learn meaningful physical insights.The model responds to warmer temperatures by predicting an earlier spring onset and later autumn senescence, which is in line with findings from other studies (Menzel et al., 2006;Jeong, 2020).These results underscore the capability of our deep learning framework to retrieve fundamental physical information solely from data.Our current study primarily focuses on developing the deep learning model.We also conduct a basic sensitivity analysis that has the potential to dismantle the LSTM model and learn from the identified relationship in the data how the response (i.e.GCC.A more extensive and comprehensive sensitivity analyses of the LSTM model, as well as interventional experiments, could offer insights into understanding phenology by identifying which predictors are influential and when.Especially, such approach might help to uncover the control of autumn phenology and its modelling -a long-standing challenge faced by process-based models that may struggle due to inadequate predictors inclusion or response functions (Delpierre et al., 2009;Liu et al., 2019).Moreover, the hybrid models by the integration of physic knowledge into the deep learning models might enhance our understanding of how climate change impact on phenology and associated consequences for the ecosystem, another key challenge in phenology modelling.

Conclusions
In this study, we develop a novel deep learning modelling framework incorporating multiple meteorological memory effects to predict the whole seasonal trajectory of canopy greenness and transition dates for each plant functional type using LSTM.
Our key findings can be summarized as follows: 1) The general deep learning model, trained for each PFT using LSTM, demonstrates the ability to generalize to unseen sites, indicating that the deep learning approach effectively captures the underlying mechanics of canopy greenness development.
2) The incorporation of multi-variate meteorological memory effects proves crucial in canopy greenness modelling.The LSTM model, accounting for these memory effects, can reproduce general temporal dynamics of canopy greenness across various time scales, from daily to inter-annual variability.Furthermore, it captures phenological shift directions, enhancing the model's comprehensive representation.
3) Our sensitivity analysis demonstrates the LSTM model's capability to learn plausible relationships, revealing its proficiency in acquiring fundamental physical knowledge about vegetation greenness and phenological development.
Our deep learning model accounting for multi-variate memory effects holds promise for improving our understanding of vegetation responses to climatic variability.In future, the integration of deep learning phenology models into coupled landsurface and earth system models, may further enhance our ability to comprehend and simulate complex interactions and feedback within these systems.
questions: (1) Can deep learning model perform better than statistical model?And can deep learning model accounting for memory-effects of multiple meteorological variables outperform models without accounting for such memory effects?(2) Does deep learning model successfully capture temporal variations on different time scales of canopy greenness and vegetation phenology?(3) Can deep learning model provide meaningful interpretations of the underlying physical and biological relationships between vegetation greenness, phenology, and a changing climate? 2 Materials and methods https://doi.org/10.5194/egusphere-2024-464Preprint.Discussion started: 7 March 2024 c Author(s) 2024.CC BY 4.0 License.
the left-out data to perform early stopping.Specifically, we stop the training when the performance no longer improves on the left-out validation set of 150 epochs (an epoch refers to one complete pass through the entire training dataset).Furthermore, https://doi.org/10.5194/egusphere-2024-464Preprint.Discussion started: 7 March 2024 c Author(s) 2024.CC BY 4.0 License.
Fig. 3 Model performance of GCC time-series estimation using no-memory-effect model (M0: a, b, c) and full-memory-effect model (Mfull: d, e, f) in testing dataset for deciduous broadleaf (DB), evergreen needleleaf (EN) and grassland (GR).The colour indicates the density of points (light blue is lower density, dark blue is higher density).The solid grey lines denote the 1:1 line.
Fig. 4 Coefficient of determination (R 2 ) comparisons between no-memory-effect model (M0: blue box) and full-memory-effect model (Mfull: red box) in all test sets, unseen site and unseen year for deciduous broadleaf (DB), evergreen needleleaf (EN) and grassland (GR).
Fig. 7 The sensitivity of canopy greenness (GCC) to temperature change only and both temperature and VPD change under 4 °C warming / cooling (red line / blue line) in the all year using Mfull for deciduous broadleaf (in howland2), evergreen needleleaf (in laurentides) and Fig. 8 Temperature sensitivity of start of season (SOS, Fig. 8 a) and end of season (EOS, Fig. 8 b) under warm alone (red) and warm and high VPD conditions (blue) over all three PFTs.