Int J Appl Earth Obs Geoinformation Characterizing spring phenology of temperate broadleaf forests using Landsat and Sentinel-2 time series

Vegetation phenology has a great impact on land-atmosphere interactions like carbon cycling, albedo, and water and energy exchanges. To understand and predict these critical land-atmosphere feedbacks, it is crucial to measure and quantify phenological responses to climate variability, and ultimately climate change. Coarse- resolution sensors such as MODIS and AVHRR have been useful to study vegetation phenology from regional to global scales. These sensors are, however, not capable of discerning phenological variation at moderate spatial scales. By o ﬀ ering increased observation density and higher spatial resolution, the combination of Landsat and Sentinel-2 time series might provide the opportunity to overcome this limitation. In this study, we analyzed the potential of combined Sentinel-2 and Landsat time series for estimating start of season (SOS) of broadleaf forests across Germany for the year 2018. We tested two common statistical modeling approaches (logistic and generalized additive models using thin plate splines) and the two most commonly used vegetation indices, the Normalized Di ﬀ erence Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI). We found strong agreement between SOS estimates from logistic and spline models (r EVI =0.86; r NDVI =0.65), whereas agreement was higher for EVI than for NDVI (RMSD EVI =3.07, RMSD NDVI =5.26 days). The choice of vegetation index thus had a higher impact on the results than the ﬁ tting method. The EVI-based SOS also showed higher correlation with ground observations compared to NDVI (r EVI =0.51, r NDVI =0.42). Data density played an important role in estimating land surface phenology. Models combining Sentinel-2A/B, with an average cloud-free observation frequency of 12 days, were largely consistent with the combined Landsat and Sentinel-2 models, suggesting that Sentinel-2A/B may be su ﬃ cient to capture SOS for most areas in Germany in 2018. However, in non-overlapping swath areas and mountain areas, observation frequency was signi ﬁ cantly lower, underlining the need to combine Landsat and Sentinel-2 for consistent SOS estimates over large areas. Our study demonstrates that estimating SOS of temperate broadleaf forests at medium spatial re- solution has become feasible with combined Landsat and Sentinel-2 time series.


Introduction
Vegetation phenology, the timing of seasonal plant development through phases of active growth and dormancy (Hänninen, 1990), is a widely used indicator of climate change impacts on terrestrial ecosystems (Parmesan and Yohe, 2003). Climate change is expected to alter ecosystem functions and processes globally, including spatial shifts of species ranges and temporal shifts of phenological events (Jeong et al., 2011;Rosenzweig et al., 2008). It is therefore important to improve our understanding of current phenological processes to further assess the response of ecosystems to climatic changes.
Remote sensing has become an essential tool for studying phenology at various spatial scales. Coarse-resolution data from the Moderate-resolution Imaging Spectroradiometer (MODIS; e.g. Zhang et al., 2003) and the Advanced Very High Resolution Radiometer (AVHRR; e.g. White et al., 2009) have been used for mapping phenological dynamics of forests globally. However, their spatial resolution is not sufficient to discern phenological patterns in heterogenous landscapes (Fisher and Mustard, 2007;Hufkens et al., 2012b). Conversely, medium resolution sensors have been lacking the temporal resolution to accurately characterize vegetation phenology. To circumvent this issue, studies have estimated inter-annual variations of spring phenology by pooling several decades of Landsat data (Melaas et al., 2013;Senf et al., 2017). To obtain annual SOS from Landsat data, Melaas et al. (2018) made use of increased observation densities in areas where acquisition paths overlapped. However, this method does not allow for spatially continuous phenology estimations.
Estimating spring phenology over shorter time periods has only recently become feasible with the launch of the two Sentinel-2 satellites in 2013 and 2017. Landsat 8 and Sentinel-2 combined offer a global average observation interval of 2.9 days, disregarding cloud coverage (Li and Roy, 2017), which substantially increases the potential for capturing annual phenology at the landscape scale. Underlining this, a recent study in the U.S. by Bolton et al. (2020) showed the increased value of phenological estimates from combined Landsat 8 and Sentinel-2 time series for resolving landscape-scale phenological patterns compared to MODIS-based estimates.
A variety of different methods have been developed for estimating phenological parameters from satellite time series. A common approach is to fit a single (global) function to a vegetation index time series, and subsequently use the fitted model parameters to describe the phenological response (Verma et al., 2016). For example, logistic functions have been widely used for deriving spring and autumn phenology from regional to global scales (Cai et al., 2017;Verma et al., 2016;Zhang et al., 2003). Asymmetric signals are, however, poorly captured by logistic functions (Verma et al., 2016). Alternatively, local smoothing methods such as spline-based methods are more flexible to capture various phenological shapes, but they are more sensitive to noise and observation gaps (Buitenwerf et al., 2015;Melaas et al., 2016). Since the performance of a method is intrinsically linked to the type and quality of the vegetation index time series, it seems logical to also test the most common methods and vegetation indices on combined Landsat and Sentinel-2 time series.
The objective of this study was to assess the potential of combined Landsat and Sentinel-2 time series for characterizing spring phenology in temperate broadleaf forests. Specifically, we addressed the following research questions: 1) How do start of season (SOS) estimates from Landsat and Sentinel-2 time series vary with commonly used models and vegetation indices? To address this question, we assessed the applicability of the frequently used logistic model and an approach based on generalized additive models using thin plate regression splines to derive SOS from the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI). 2) How do SOS estimates from Landsat and Sentinel-2 compare to ground observations? To address the challenge of validating satellite-based land surface phenology, we evaluated our estimates against ground observations of leaf unfolding. 3) How do SOS estimates from combined Landsat and Sentinel-2 time K. Kowalski, et al. Int J Appl Earth Obs Geoinformation 92 (2020) 102172 series compare to SOS estimates from single-sensor time series? To address this question, we estimated SOS from Landsat and Sentinel-2 time series separately and compared these estimates to SOS from combined time series.

Study area
We selected Germany as our study site as it covers a wide climate gradient within the temperate forest biome, and because it is one of the European countries with the largest number of phenological observations (Templ et al., 2018). About one third of Germany's total land area is covered by forests, of which nearly 42 % are broadleaf forests (Federal Ministry of Food and Agriculture, 2018). Germany lies in the temperate forest biome, where the annual cycle of plant phenology is mainly driven by temperature. The eastern part of Germany is characterized by a continental climate with high temperature extremes (average annual summer temperature of 18.2°C, average annual winter temperature of 0.7°C; meteorological station: Lübben, 51°55′36″N, 13°52′47″E). The western part is under oceanic influence, with milder winters and summers (average annual summer temperature of 17.6°C, average annual winter temperature of 2.6°C; meteorological station: Werl, 51°34′35″N, 7°53′16″E; DWD Climate Data Center (CDC), 2019). In mountainous regions and towards the Alps, temperatures are lower, varying with elevation (Zöller et al., 2017).

Sampling design
Our analysis is based on a large sample of Landsat and Sentinel-2 pixels close to phenological ground observations (Fig. 1). Because it is not feasible to directly match a single forest pixel to ground-based observations of leaf unfolding, we chose a two-stage sampling design to capture the phenological variability in the vicinity of the ground observations. The primary sampling units of the first sampling stage were the locations of the ground observations (plots) and a 2 km buffer zone. For each plot, up to six species-specific phenological observations were recorded. The exact location of each individual observation is unknown but restricted to a distance of 2 km from the reported center location of the plot (German Weather Service, 2018). Within each 2 km buffer, we masked all non-broadleaf forest pixels using the Landsat-based land cover classification by Pflugmacher et al. (2019), and further masked pixels disturbed within the last 32 years using the forest disturbance map by Pflugmacher et al. (2020). Finally, we imposed a minimum mapping unit of 11 contiguous pixels, i.e. an area of 9900 m 2 , to the remaining broadleaf forest pixels. In the second sampling stage, we selected thirty overlapping Landsat and Sentinel-2 pixels within each buffer (Fig. 1). Overall, we selected 767 plots (primary sampling units) and 23,010 pixels (secondary sampling units).

Landsat and Sentinel-2 data processing
We used all available Landsat 7, Landsat 8, and Sentinel-2A/B images acquired in 2018. We downloaded the Landsat Tier-1 Collection 1 data from the USGS archive, and the Sentinel-2 Level-1C data from the Copernicus Open Access Hub. To build harmonized time series from different sensors, we used the Framework for Operational Radiometric Correction for Environmental Monitoring (FORCE v.2.0;Frantz, 2019). In FORCE, the atmospheric correction is based on radiative transfer theory using a combined image-, database-and object-based estimation of aerosol optical depth (Tanre et al., 1979). The MODIS precipitable water product is used to characterize gaseous absorption (Gao and Kaufman, 2003). For the topographic correction, FORCE uses a modified image-based C-correction with a 1-arc-second digital elevation model (DEM) derived from the Shuttle Radar Topography Mission (SRTM; NASA JPL, 2013). To minimize the effects of sensor differences, we adjusted the blue, red and near-infrared spectral bands of Landsat 7 and 8 to match Sentinel-2 bands using reduced major axis regression (RMA; Smith, 2009). RMA models were built using a subset of pixel samples where data pairs from Landsat 7/8 and Sentinel-2A/B were acquired on the same day during 2016-2019. Model parameters (slope and intercept) were used to adjust the blue, red and near-infrared bands of Landsat 7/8. For each sample, we calculated EVI (Liu and Huete, 1995) and NDVI (Tucker, 1979) time series. We excluded all observations flagged as cloud, cloud shadow, or snow in the FORCE pixel quality layer.

Landsat and Sentinel-2 phenological models
We fitted two models for each of the 23,010 pixel samples: (i) a parametric logistic model (LOG), and (ii) a Generalized Additive Model (GAM) using thin plate splines (Fig. 2). Following Melaas et al. (2013), we used a logistic function of the form: where VI is the vegetation index (NDVI or EVI) at day of year (DOY) t. v min and v max are the minimum and maximum of the function, respectively. The parameter g is the rate of change (i.e. vegetation green-up rate), and s is the location of the inflection point. Following previous studies (Melaas et al., 2013;Zhang et al., 2003), we used the parameter s as SOS estimate (Fig. 2). SOS was defined as the point in time where the modeled change rate of the vegetation index was highest (Verma et al., 2016). The logistic model was fitted using the nls.multstart package in R (Padfield and Matheson, 2018;R Core Team, 2017). The implemented algorithm searches combinations of starting parameters and uses the Levenberg-Marquardt algorithm to identify the best model based on the Akaike Information Criterion (AIC). As initial search values, we set v min and v max to the minimum and maximum of the respective vegetation index time series and the initial SOS value s to the mean of all DOYs of observations where the vegetation index value was larger than the median. For the rate of change g, initializing models based on a range of starting parameters (intervals of 0.1 between 0 and 1) increased convergence of the models considerably. K. Kowalski, et al. Int J Appl Earth Obs Geoinformation 92 (2020) 102172 The GAM was fitted using the mgcv package in R (Wood, 2003). The generic formula of a one-dimensional GAM is: where VI is the observed vegetation index value at time t, β 0 is the intercept of the function at = t 0, s represents a smoothing function of covariate t, in this case the DOY, and ε t is a random error term with The smoothing function is defined by where the final smoothing function s t ( ) is the sum of all K basis functions b t ( ) k multiplied by its corresponding weight β k . We used thin plate regression splines as smoothing function, following recommendations by Wood (2003). Like the logistic function, the SOS of the GAM is derived by finding the DOY where the slope (i.e. the rate of change) of the function is highest (Fig. 2). The rate of change of VI t ( ) at a given point P t VI ( | ) is found by approximating the first derivative using finite differences. To decrease a potential negative bias of observations influenced by unfavorable atmospheric conditions, we iteratively assigned lower weights to observations lying below the previous model fit (Fig. 3). Weights were assigned as the quotient of observed and predicted EVI or NDVI value, raised to the power 4. All weights larger than 1, i.e., lying above the previous model fit, were set to 1. We found that 3 iterations removed the effects of outliers sufficiently (Fig. 3).
We fitted both models to NDVI and EVI time series between January 1 st and the time of maximum vegetation greenness in summer. We included additional observations within 20 days following the vegetation index maximum or, if no observation followed within 20 days, the next observation to account for variation in the data and to improve model fit. To minimize the effect of outliers, which can be caused by, e.g., snow cover during winter months (Beck et al., 2006), we calculated the mean base value of observations between DOY 1 to DOY 50 and replaced all values during that time with the base value.
We obtained four SOS estimates for each sample (i.e., GAM NDVI , GAM EVI , LOG NDVI , and LOG EVI ) by estimating SOS with both models and indices. In the same manner, we estimated SOS from single-sensor time series of Landsat and Sentinel-2, resulting in 8 additional estimates per sample. We calculated the mean difference between SOS estimates (Bias), the root-mean-square deviation (RMSD), and the Pearson correlation coefficient to evaluate the agreement between model results.

Evaluation of SOS estimates
For evaluating the satellite-based SOS estimates, we used phenological observations of the main broadleaf tree species (Aesculus hippocastanum, Alnus glutinosa, Betula pendula, Fagus sylvatica, Fraxinus excelsior, and Quercus robur) in Germany as in-situ reference. Data were provided by the Pan European Phenology project (Templ et al., 2018). We derived the average date of leaf unfolding (BBCH-code = 11; Meier, 2018) from the phenological observation (Fig. 1). In total, we obtained ground-based SOS estimates for 767 plots. For comparing SOS estimates from Landsat and Sentinel-2 time series to the phenological ground observations, we averaged all 30 pixel samples for each plot.

Effect of model and vegetation index on SOS estimates
We found a moderate to strong correlation between SOS from logistic models and GAMs (r EVI = 0.86 and r NDVI = 0.65; Fig. 4). Estimates from different models were highly consistent using EVI, whereas NDVI-based SOS estimates varied more strongly with the choice of model. Overall, the choice of vegetation index had the strongest impact on SOS estimates. SOS from NDVI and EVI correlated moderately (r GAM = 0.65, r LOG = 0.7) but they were less consistent than the EVIbased estimates from different models as indicated by the higher RMSD and Bias (Fig. 4). Notably, NDVI-based models estimated SOS on average 3 days earlier than EVI-based models.
The majority of SOS estimates (0.05 th -0.95 th quantile) ranged between DOY 98 and DOY 131, corresponding to SOS estimates in the time period from the beginning of April to mid-May. Spatially, SOS estimates varied along a North-South gradient with later SOS estimates in northern than in central and southern regions of the study area (Fig. 5). In regions associated with lower temperatures such as mountain ranges, SOS tended to be later than in surrounding areas.

Evaluation of phenological estimates
Model choice influenced SOS estimates less than the two vegetation indices. We therefore report results for evaluation of SOS estimates from one model (GAMs).
The correlation between satellite-based SOS estimates and groundbased observations of leaf unfolding was moderate (Fig. 6). EVI-based SOS estimates showed the highest correlations with ground observations (r = 0.51). In comparison, NDVI-based SOS estimates showed a weaker correlation with ground observations (r = 0.42). SOS estimates from ground observation were on average 6 days earlier than the EVI models and 3 days earlier than the NDVI models. After removing the systematic bias, the average difference (RMSD) between ground observations and satellite observations was 4.51 days for the EVI model and 4.84 days for the NDVI models, suggesting that EVI models were slightly more consistent with ground observations than NDVI models.

Comparison of single-sensor and multi-sensor SOS estimates
SOS estimates using only Sentinel-2 data were moderately to strongly correlated with SOS estimates from combined time series (r EVI = 0.81, r NDVI = 0.58). Sentinel-2 time series had an average observation frequency of one cloud-free observation every 12 days, whereas the combined Landsat and Sentinel-2 time series had an average observation frequency of 8 days. From Landsat data, one cloud- free observation was available every 21 days on average, which substantially diminished the correlation with the combined time series estimates (r EVI = 0.37, r NDVI = 0.23). In line with these results, the comparison of correlation coefficients, where SOS estimates were grouped by observation frequency, revealed that the agreement between SOS estimates weakened as observation frequency decreased (Fig. 7). For Sentinel-2, correlations to SOS estimates from combined time series were strong (r > 0.9) for observation frequencies higher than 10 days. The highest observation frequencies of 5 days led to the most consistent SOS estimates. SOS estimates from Landsat were, on average, 3 days later than estimates from combined time series. Sentinel-2-based SOS estimates were unbiased for EVI time series and 1 day earlier for NDVI time series. Comparing aggregated SOS estimates from either Landsat or Sentinel-2 to ground observations of leaf unfolding, we found weak to moderate correlations (r = 0.15-0.47). SOS estimates from Landsat showed weaker correlations (r = 0.15-0.24) than Sentinel-2-based estimates (r = 0.37-0.47). SOS estimates from Sentinel-2 and estimates from combined time series had nearly identical bias and RMSD when comparing them to ground observations (Bias = 2.8-6.23 days; RMSD = 6.5-7.86 days). For estimates based on Landsat time series, both measures were consistently higher (Bias = 6.52-9.01 days; RMSD = 10.29-11.05 days).
The results indicate that the high observation frequency of Sentinel-2 may be sufficient to estimate SOS from EVI time series in Germany for 2018. However, there are several areas that seem to benefit from adding Landsat observations. For example, observation frequency was considerably lower outside swath overlap areas of Sentinel-2 and in mountain ranges in central and southern Germany. In the latter case, this is related to increased cloud cover which resulted in observation frequencies of less than 15 days for the study period from January to June (Fig. 8).

Estimating spring phenology from Landsat and Sentinel-2 time series
We evaluated two models (logistic models and GAMs using thin plate regression splines) and vegetation indices (EVI and NDVI) to estimate spring phenology of temperate broadleaf forests from combined Landsat and Sentinel-2 time series. The results show that the logistic model and the GAM produced comparable SOS estimates, and that the choice of vegetation index had a higher influence on SOS estimates than the choice of model. The finding that two very different modeling approaches, with different sensitivities to data gaps and noise, come to the same conclusion speaks for the robustness of SOS estimates from combined Landsat and Sentinel-2 data. In other words, the choice of model becomes less important with very dense time series, at least in temperate forests.
The comparison of EVI and NDVI time series provides a useful frame of reference for the comparison of the model types, but it also confirms other studies that the choice of vegetation index matters. Using MODIS data, Hufkens et al. (2012a) demonstrated the impact of vegetation indices for retrieving phenological parameters from remote sensing K. Kowalski, et al. Int J Appl Earth Obs Geoinformation 92 (2020) 102172 time series. A great majority of studies relied on NDVI for deriving spring phenology (e.g. Beck et al., 2006;Jeong et al., 2011;Jönsson et al., 2018). Here, we found a higher suitability of EVI for deriving phenological estimates in temperate broadleaf forests. This result is in agreement with MODIS-based studies (Klosterman et al., 2014;Liang et al., 2011;Tan et al., 2008), but we showed that it also holds true for combined Landsat and Sentinel-2 time series. The NDVI has well-studied drawbacks compared to EVI such as its earlier saturation over areas with high biomass and its sensitivity to snowmelt dynamics. Additionally, the EVI is less sensitive to soil background (Huete et al., 2002;Tan et al., 2008). These factors might have influenced phenological estimates from NDVI time series in our study area. Besides NDVI and EVI, there might be additional vegetation indices better suited for estimating phenology. Jin and Eklundh (2014), for example, developed a plant phenology index that linearizes the relationship with canopy development. However, NDVI and EVI are easy to calculate and available as ready-to-use data, making them widely applicable.

Evaluation of phenological estimates
We found a moderate agreement between satellite-based SOS and ground observations and a consistent offset between the two data sets. Moderate agreements between satellite-based SOS and ground observations have been reported also by other studies (Melaas et al., 2016;Rodriguez-Galiano et al., 2015), albeit the reported offsets vary. Analyzing different levels of leaf development at a broadleaf site in North America, Melaas et al. (2016) found that satellite-based SOS, also derived from the inflection point in the transition period, were most highly correlated to ground observations when the leaf length reached 25%. In this study, we only had ground observations of the unfolding of the first leaves, which explains why the satellite-based SOS estimates were systematically later. However, this is not problematic as these systematic differences can be accounted for.
The lack of a strong agreement between satellite and ground-based SOS is not entirely surprising, as ground observations target single trees and specific phenophases, which are not representative of the phenological dynamics of an entire forest community as observed from space (Fisher and Mustard, 2007;White et al., 2009). Our results, with support of these studies, suggest that it is challenging to evaluate satellitebased SOS estimates with ground-based observations of species-specific phenophases. Underlining this, Liang et al. (2011) found that an upscaling approach from detailed ground observations to landscape-level phenology enhanced the comparability to satellite-based land surface phenology. Scientists and managers planning future phenological observation sites should recognize challenges related to scalability and representativeness of ground observations to improve their integration with satellite-based phenological monitoring networks.
The spatial variation of spring phenology related to regional-scale meteorological gradients was reproduced by estimates from combined Landsat and Sentinel-2 time series. In agreement with spatial patterns reported here, other studies (e.g. Doktor et al., 2005;Scheifinger et al., 2002) showed influences of prevalent large-scale weather conditions during spring on phenological timing in our study area. In central and southern Germany, SOS varied spatially with topography displaying temperature-elevation gradients and atmospheric inversions in mountain valleys (Scheifinger et al., 2002;Senf et al., 2017).
Our results suggest that combining Landsat and Sentinel-2 is a crucial step for estimating annual spring phenology of temperate forests from medium resolution sensors. Estimates from Landsat alone were generally poor as a result of the comparably low observation frequency. Melaas et al. (2016) also found that observation frequency of Landsat alone is not sufficient to map annual land surface phenology in temperate forests. In comparison, our SOS estimates from Sentinel-2 were generally consistent with the combined time series indicating that Sentinel-2 alone may be sufficient for annual SOS estimates. However, observation frequency was not evenly distributed across space. Outside swath overlap areas of Sentinel-2 and in mountain ranges, observation frequency was considerably lower, even from combined time series. Therefore, the inclusion of Landsat data added valuable information to the time series, especially in more data-sparse areas. Kovalskyy and Roy  K. Kowalski, et al. Int J Appl Earth Obs Geoinformation 92 (2020) 102172 (2015) showed that cloud cover substantially decreased available observations, e.g., across the United States. This was also the case for our study area as average observation frequencies were substantially lower than theoretical revisit intervals (Li and Roy, 2017). These results highlight the need for dense time series to estimate SOS annually. The frequency of cloud-free observations usually not only varies spatially but also from year to year. In this study, we analyzed only a single year, since 2018 was the first year for which both Sentinel-2A and 2B were available. Since 2018 was a hot and dry year with exceptionally low precipitation and cloud cover, we expect the results to be more optimistic than for more cloudy years. The year 2018 had an average of 2015 sun hours, the highest average since 1951 (thirty-year average = 1600 sun hours; German Weather Service, 2020). Also, the following year 2019 showed substantially higher than average sun hours in Germany (1834 sun hours). To understand how our results translate into future medium resolution phenology observations remains to be seen as Sentinel-2 time series accumulate, but we are also likely to see an increase in the capacity of medium resolution sensor acquisitions in the coming years (e.g., with the launch of Sentinel-2C).

Conclusion
Phenological shifts caused by climate change are altering structure, composition and function of ecosystems worldwide (Chuine, 2010;Parmesan and Yohe, 2003). Apart from ground observations of phenological stages, satellite remote sensing has become an important data source for deriving information on the phenological development of vegetation (Tang et al., 2016). In this study, we used dense time series of medium resolution data from Landsat and Sentinel-2 to model and characterize spring phenology of temperate broadleaf forests in Germany for a single year. By comparing two commonly used models and vegetation indices, we showed that the choice of vegetation index had a higher impact on the resulting SOS estimates than choice of model. We found a preference for EVI when comparing estimates to ground observations, even though we note that comparing land surface phenology to species-specific phenophases is challenging. We further showed that decreasing data density altered SOS estimates considerably, emphasizing the need for dense observation intervals and multi-sensor time series. Our results demonstrate the potential of combined Landsat and Sentinel-2 time series for estimating SOS annually.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.  K. Kowalski, et al. Int J Appl Earth Obs Geoinformation 92 (2020) 102172 FORCE. We also thank Markus Ungersböck for providing the phenological ground observations through the PEP725 project.