Spring onset forecast using harmonic analysis on daily mean temperature in Germany

The onset of spring is crucial for planning agricultural practices and also influences animal activity. Properly forecasting spring onset in advance is therefore clearly beneficial. In this work, we present a new method based on harmonic analysis of daily mean temperature to predict the date of spring onset in the next year. The proposed algorithm takes the memory of the seasonal cycle into account and can easily be applied using local past records. This study is based on gridded observational surface air temperature (SAT) in Europe. The SAT data are first decomposed into harmonics by wavelet transform, and each harmonic's time-dependent amplitude and phase are extracted. The time evolution of amplitude and phase is then modelled by an AR(2) process, whose coefficients we fit from the data with a new method. This provides a prediction of the time-dependent amplitudes and phases in the next year, from which we compose the future seasonal cycle and identify the predicted spring onset by threshold crossing. We compare our model with a classical climatological seasonal cycle forecast. While this benchmark forecast leads to only very small variations of the predicted onset date, our method yields predictions which vary by about 15 days from year to year. We verify the correctness of our predictions by a correlation measure and the root mean square error.


Introduction
The variation of the growing season is an important indicator of climate change. In the past decades, many studies have reported a lengthening of the growing season at mid-to-high latitudes, mainly caused by an earlier onset of spring (Menzel and Fabian 1999, Ahas et al 2002, Walther et al 2002, Walther and Linderholm 2006, Christidis et al 2007). In Germany, the date of spring onset has been found to advance by an average of 0.09-0.21 days per year (Menzel et al 2003). However, the spring onset has not changed evenly with time but shows strong fluctuations; in some locations, the spring onset date fluctuated by more than 40 days between adjacent years. Spring onset affects both physical and ecological systems. For example, an earlier spring onset may on the one hand lead to a longer growing season and increase the potential number of harvests; on the other hand, it will increase insect pests and diseases (Patterson et al 1999) and reduce the yields of certain crops. Seasonal activities of animals, including the migration and breeding dates of birds (Sparks and Menzel 2002), are also affected by changes of spring onset. Besides, spring onset is related to snowmelt and thus leads to changes in the hydrological cycle (Yang et al 2002). Therefore, predicting the spring onset is of significant socioeconomic importance.
In many studies, the onset of spring was analysed and modelled using satellite data, phenological observations (Menzel and Fabian 1999, D'Odorico et al 2002) and meteorological observations. For example, Xin et al (2015) tested 9 phenological models to predict the timing of grassland spring onset and found that most of them produce biased errors and misrepresent the timing of vegetation spring onset. Model development is always constrained by the lack of long-term records from ground phenological observation. Besides, some studies determined spring onset by the time of leafing (unfolding and colouring) and flowering (first bloom). This approach is of course spatially limited and depends on the plant species, so that it leads to diverse results. In contrast, temperature records are more easily available and can be a good predictor of the start date of spring: temperature is one of the main drivers of spring onset, and many phenological characteristics in spring are closely linked with surface air temperature (SAT), e.g. Sparks and Menzel (2002), D'Odorico et al (2002).
For daily temperature, the timing of spring onset is conventionally defined by temperatures rising above a certain threshold (Menzel et al 2003, Walther and Linderholm 2006), for example the first day on which the daily mean temperature exceeds 5 degrees Celsius for 5 consecutive days (Jones et al 2002). However, this definition can yield multiple candidate dates for the spring onset, owing to the significant fluctuations of the daily mean records. The most common solution is to ensure monotone increase by filtering/pre-processing the daily temperature time series (Qian et al 2009). In particular, spring onset can be defined through the seasonal cycle, owing to its smoothness and physical meaning. Usually, the 30-year climatological mean temperature is used as the seasonal cycle. Some studies applied moving averages or third-degree polynomial filtering (Christidis et al 2007) to the data to obtain a smoothed seasonal cycle. Besides, there are several more recently developed methods, such as singular spectrum analysis (Ghil 1989, Plaut et al 1995), the seasonal-trend decomposition procedure based on loess (Cleveland et al 1990), ensemble empirical mode decomposition (EEMD) (Wu and Huang 2009) and nonlinear mode decomposition (NMD). They have proven to be reliable and effective ways to extract smooth and physically meaningful seasonal cycles (Deng and Fu 2018).
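To make the consecutive-day threshold definition concrete, here is a minimal sketch; the function name and the synthetic data are ours, not from the cited works, and the multiple-crossing problem mentioned above appears whenever short warm spells precede the sustained one.

```python
import numpy as np

def onset_consecutive(temps, threshold=5.0, run=5):
    """Index of the first day on which the daily mean temperature exceeds
    `threshold` for `run` consecutive days; None if no such run exists.
    Illustrative sketch of the threshold-based definition in the text."""
    above = np.asarray(temps) > threshold
    for i in range(len(above) - run + 1):
        if above[i:i + run].all():
            return i
    return None

# noisy daily means: two short warm spells before a sustained one
temps = np.array([4.0, 6.0, 6.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 2.0])
onset = onset_consecutive(temps)  # first run of 5 warm days starts at index 4
```

With noisy raw data, several short warm spells may each look like a candidate onset; this is the ambiguity the seasonal-cycle definition avoids.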
In this study, the spring onset is defined as the date when the seasonal cycle intersects a specific temperature threshold, here 5 degrees Celsius. Based on this, we propose a novel prediction model of spring onset. Rather than predicting spring onset directly, our forecast model first predicts the seasonal cycle for the next year and then computes the spring onset as its threshold crossing. The novelty of our method is that it considers the inter-annual variability and internal memory of the seasonal cycle. The predictions are verified against dates obtained from known temperature records and compared with a traditional method.
In the following, section 2 introduces the original temperature data used in this paper. We describe the details of our forecast model in section 3. In section 4 we apply our method to predict spring onset and compare its performance with a traditional method. Finally, section 5 presents a discussion and summary of this work. We show that our method is able to forecast spring onset by utilising the information memorised in the seasonal cycle.

Data
The E-OBS ensemble daily gridded observational data for surface mean temperature covering Germany, provided by the European Climate Assessment & Dataset (ECA&D, https://www.ecad.eu/download/ensembles/download.php), were used in this study. The data cover 1 January 1950 to 31 December 2017 and are gridded onto a regular 0.25 degree latitude-longitude grid. This gridded dataset was derived through interpolation of observed station data over land-only Europe. Such regular datasets help to identify regional patterns of coherent variability.
The variations in minimum temperature are larger than those in maximum temperature; however, since we use a seasonal cycle to detect spring onset, no large differences are found when working with maximum, minimum or mean temperature. The only difference concerns the threshold temperature (Menzel et al 2003). Here we use daily mean temperatures.
We choose central Europe as our study region because the changes of spring onset are most remarkable in Europe compared to other places (Christidis et al 2007). Various factors influence the climate in Germany, such as the North Atlantic Oscillation (NAO) and high-latitude blocking, which make it a diverse and interesting region for study.

Methods
In this work, the date of spring onset s is defined by the intersection of the seasonal cycle with the threshold. Therefore, s is not predicted directly from the past time series itself but from parameters of the seasonal cycle. First, we introduce how to extract the instantaneous amplitude A(t) and phase ϕ(t) of the cycle from the measured records. Then the time series of these cycle characteristics are modelled as a second-order autoregressive process, denoted AR(2). We determine its parameters using detrended fluctuation analysis (DFA) (Peng et al 1994). We are then able to predict the amplitude A′(t) and phase ϕ′(t) in the next year. Amplitude and phase then enable us to calculate the seasonal cycle of the following years, from which the predictand, namely the date of the spring threshold crossing, can be determined. The last part of this section describes how to verify the prediction against the observations.

Harmonic Analysis (HA)
We assume that the daily air temperature records can be separated into three components: the mean temperature T̄, the seasonal cycle, and the noise ε(t). The seasonal component can further be decomposed into several harmonics with different frequencies:

T(t) = T̄ + Σ_{k=1}^{n} A_k(t) cos(2π k f_1 t + ϕ_k(t)) + ε(t),   (1)

where the fundamental frequency is f_1 = 1/365 and n is the number of harmonics, which varies between regions. For the temperature data in Germany, the shape of the annual cycle deviates from a standard sinusoid, due to the long winter and short summer. With n = 2 we obtain good fits; the number of significant peaks in the power spectral density can help to decide on this number n.
To extract the periodic components from the raw data, we first remove the mean value. The harmonics of the seasonal cycle are reconstructed by projecting the residual onto the time-frequency plane, a so-called time-frequency representation (TFR). TFRs allow us to visualise the evolution of the signal's spectral content in time and to reconstruct the component at a specific frequency by inverse transformation. There are many types of TFRs; here we use the wavelet transform with the lognormal wavelet. The instantaneous amplitude A_1(t) and phase ϕ_1(t) of the annual cycle are obtained from the daily SAT data using the wavelet transform with the central frequency corresponding to one year. Similarly, the instantaneous amplitude A_2(t) and phase ϕ_2(t) of the semi-annual cycle are extracted with a central wavelet period of 1/2 year.
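The extraction step can be sketched with a plain complex Morlet-type wavelet at a fixed central frequency. Note this is our simplification: the paper uses a lognormal wavelet, and all function and variable names here are ours.

```python
import numpy as np

def instantaneous_amp_phase(x, freq, dt=1.0, n_cycles=6):
    """Instantaneous amplitude and phase of the harmonic at `freq`
    (cycles per time unit) via convolution with a complex Morlet-type
    wavelet. Sketch only; the paper uses a lognormal wavelet instead."""
    sigma = n_cycles / (2 * np.pi * freq)                 # temporal width of the envelope
    t = np.arange(-4 * sigma, 4 * sigma + dt, dt)
    wavelet = np.exp(2j * np.pi * freq * t - t**2 / (2 * sigma**2))
    wavelet /= np.abs(wavelet).sum() / 2                  # so |output| matches the harmonic's amplitude
    analytic = np.convolve(x, wavelet, mode="same")
    return np.abs(analytic), np.angle(analytic)

# synthetic daily series: annual plus semi-annual harmonic plus noise
rng = np.random.default_rng(1)
days = np.arange(20 * 365)
signal = 10 * np.cos(2 * np.pi * days / 365) + 2 * np.cos(4 * np.pi * days / 365)
x = signal + rng.normal(size=days.size)
A1, phi1 = instantaneous_amp_phase(x - x.mean(), 1 / 365)  # annual-cycle amplitude and phase
```

Away from the edges, A1 recovers the amplitude of the annual harmonic while the narrowband filter suppresses both the noise and the semi-annual component; the edge distortion visible here corresponds to the end effect of the TFR discussed below.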
In order to forecast the seasonal cycle, we have to know the mean temperature and the amplitudes and phases of the two harmonics in the next year (equation (1)). Thus, we build the model as follows: (a) Since the first harmonic, i.e. the annual cycle, is the dominant component of the seasonal cycle, we extract its amplitude A_1(t) and phase ϕ_1(t) at high temporal resolution and predict A′_1(t) and ϕ′_1(t) for the next year. The prediction method is explained in the next part.
(b) For the second harmonic, whose variance contribution is very weak (see figure 1), we lower the time resolution to better suppress the noise. Although the semi-annual cycle is not prominent and does not change much, it is still crucial in adjusting the shape of the seasonal cycle. Therefore, we keep its amplitude and phase constant after the last data point:

A′_2(t) = A_2(t_0),   ϕ′_2(t) = ϕ_2(t_0)   for t > t_0,   (2)

where t_0 denotes the time at which we start to forecast and a time unit refers to one day. (c) As for the mean temperature T̄, considering its low-frequency fluctuation and slowly increasing trend, we directly take the mean temperature of the last year as the value for the next year (persistence).
Finally, the seasonal cycle of the next year is reconstructed as:

T′(t) = T̄′ + A′_1(t) cos(2π f_1 t + ϕ′_1(t)) + A′_2(t) cos(4π f_1 t + ϕ′_2(t)).   (3)

Figure 1 shows amplitudes and phases at a location near Potsdam. The curves before t_0 are extracted by the TFR; those after t_0 are the predictions. When deriving the coefficients before t_0, no observed data ahead of t_0 are used. Except for A′_1(t) and ϕ′_1(t), all parameters can easily be inferred by harmonic analysis (HA).
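Putting the two-harmonic reconstruction of equation (3) and the threshold definition together, a minimal sketch (function and variable names are ours):

```python
import numpy as np

def compose_cycle(T_mean, A1, phi1, A2, phi2, days):
    """Seasonal cycle composed from the two harmonics as in equation (3),
    with f1 = 1/365. Arguments may be scalars or daily arrays."""
    f1 = 1.0 / 365.0
    return (T_mean
            + A1 * np.cos(2 * np.pi * f1 * days + phi1)
            + A2 * np.cos(4 * np.pi * f1 * days + phi2))

def spring_onset(cycle, threshold=5.0):
    """First day index (within the first half year) at which the smooth
    cycle exceeds the threshold; smoothness makes the crossing unique."""
    half = len(cycle) // 2
    above = np.where(cycle[:half] > threshold)[0]
    return int(above[0]) if above.size else None

days = np.arange(365)
cycle = compose_cycle(9.0, 10.0, np.pi, 1.0, 0.0, days)  # coldest near New Year
onset = spring_onset(cycle)
```

Because the composed cycle is smooth and rises monotonically through early spring, the 5 degree crossing is unique, in contrast to the raw daily data.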

Forecast AR(2) process
This section describes how we predict the amplitude A_1(t); the procedure to predict the phase ϕ_1(t) is the same, and amplitude and phase are predicted independently. As we can see in figure 1(a), the time series of the first amplitude A_1(t) fluctuates strongly on large timescales; however, it also has strong short-range correlations, i.e. it is smooth on short timescales. We want to make use of these short-range correlations and find a prediction model for the amplitude of the harmonics in the next year. The simplest way of modelling short-range correlations would be persistence of the amplitude for the next year, i.e. keeping the old value of the year before, as we did for the semi-annual harmonic. A more accurate model is one that describes the oscillations apparent in the time series. The simplest model for noisy oscillations is an AR(2) model,

x_t = Φ_1 x_{t−1} + Φ_2 x_{t−2} + a_t,   (4)

where Φ_1, Φ_2 are the two AR parameters and a_t is Gaussian white noise. This stochastic model is fully characterised by its auto-correlation function. For a large range of parameters, the correlation function of the AR(2) model oscillates with a typical period length while being damped, and therefore relaxes to zero. It turns out that for our data the relaxation time is about 3 years, so that one-year predictions should be feasible. We want to fit this model to the time series of amplitudes and see how well it matches. The typical period length of the amplitude oscillations is apparently very long, so fitting the correlation function at short times is unlikely to give a good result. Instead, we should look at a logarithmic time axis that takes all timescales into account. Such an approach was presented in Kantz (2019a, 2019b) and Massah and Kantz (2016). The relation between the fluctuation function and the autocorrelation function, and therefore a way to calculate theoretical correlation functions, is known (Höll and Kantz 2015). Following those ideas, we give an example below.
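As a point of reference, the AR(2) recursion of equation (4) can be fitted and iterated forward as sketched below. We use textbook Yule-Walker estimates here, not the fluctuation-function-based fit described in the text, and all names and data are ours.

```python
import numpy as np

def fit_ar2_yule_walker(x):
    """Estimate AR(2) parameters from the sample autocorrelation
    (Yule-Walker). A standard stand-in for the DFA-based fit in the text."""
    x = x - x.mean()
    def acf(lag):
        return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)
    r1, r2 = acf(1), acf(2)
    phi1 = r1 * (1 - r2) / (1 - r1**2)
    phi2 = (r2 - r1**2) / (1 - r1**2)
    return phi1, phi2

def forecast_ar2(x, phi1, phi2, steps):
    """Iterate the deterministic part of the AR(2) recursion `steps` ahead;
    the forecast relaxes towards the mean, as described in the text."""
    mu = x.mean()
    y = list(x - mu)
    for _ in range(steps):
        y.append(phi1 * y[-1] + phi2 * y[-2])
    return np.array(y[-steps:]) + mu

# synthetic noisy oscillation with known parameters (damped oscillatory regime)
rng = np.random.default_rng(0)
true_phi1, true_phi2 = 1.8, -0.97
x = np.zeros(5000)
for t in range(2, x.size):
    x[t] = true_phi1 * x[t-1] + true_phi2 * x[t-2] + rng.normal()
phi1, phi2 = fit_ar2_yule_walker(x)
pred = forecast_ar2(x, phi1, phi2, steps=10)
```

In the oscillatory regime (Φ_1² + 4Φ_2 < 0) the autocorrelation function is a damped oscillation, which is exactly the behaviour exploited for the amplitude and phase series.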
The AR(2) parameters Φ_1, Φ_2 are fitted (figure 2(a)) and used to predict the amplitude in the next 3 years (figure 2(b)). When we want to predict the amplitude or phase in some year, e.g. 2013, all past data (blue line) before 2013 are used to fit the AR(2) model. What we call the expected data (grey line) are the amplitudes obtained by the TFR from the time series containing the following years, so they serve as verification. As the inset in figure 2(b) shows, the differences between predictions and verification occur only at the beginning and the end; they are caused by the end effect of the TFR, so the end value of the reconstructed time series usually deviates slightly from the true value. Both this end effect and the relaxation time of the AR(2) model limit us to a prediction horizon of the order of the correlation time. Here the relaxation time of the AR(2) model is about 3 years, so we show the 3-year-ahead prediction (red line) in figure 2(b). The predicted value relaxes towards its mean state, and the error grows within the relaxation time. In the following calculations we actually only make one-year-ahead predictions. Empirically, the error of the one-year-ahead prediction is not too large, which makes this prediction effective and applicable. We will, however, not evaluate the prediction skill of the seasonal cycle, but the skill of the spring onset prediction based on this cycle.

Climatological Annual Cycle (CAC)
We want to compare our harmonic analysis (HA) prediction model with a traditional approach. The annual cycle is usually obtained by averaging each calendar day over a long time period. Considering the descending trend of s, which is probably caused by global warming (Linderholm 2006, Qian et al 2009), we take a 30-year moving window to calculate the climatological annual cycle (CAC). That is to say, for each year to be predicted, its annual cycle is the mean of the 30 years before that year. Here 'annual cycle (AC)' is commonly used in the literature and has the same meaning as the aforementioned 'seasonal cycle'. The CAC method considerably reduces the noise compared to the raw data, so that the intersection of the threshold with the CAC can be found easily. Nevertheless, CAC does not achieve the same amount of smoothing as HA and other methods which involve additional temporal filtering. The temporal filtering of the other methods could result in a slight bias in the onset of spring relative to the unfiltered CAC method.
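The CAC benchmark amounts to a moving 30-year climatology; a minimal sketch under our own naming and with synthetic data:

```python
import numpy as np

def cac_cycle(temps_by_year, target_year_index, window=30):
    """Climatological annual cycle for the year at `target_year_index`:
    the calendar-day mean over the `window` preceding years.
    `temps_by_year` is an (n_years, 365) array (leap days dropped)."""
    start = target_year_index - window
    if start < 0:
        raise ValueError("not enough preceding years")
    return temps_by_year[start:target_year_index].mean(axis=0)

# synthetic record: identical annual cycles plus year-to-year noise
rng = np.random.default_rng(2)
days = np.arange(365)
base = 9.0 + 10.0 * np.cos(2 * np.pi * (days - 200) / 365)
record = base + rng.normal(scale=2.0, size=(40, 365))
clim = cac_cycle(record, target_year_index=40)  # forecast cycle for "year 40"
```

Averaging 30 years reduces the daily noise by roughly a factor of sqrt(30), but by construction the resulting forecast barely changes from year to year, which is the weakness discussed in the results below.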

Verification of predicted spring onsets
Following the above method and equation (3), we are able to forecast the spring onset s as the date when our predicted seasonal cycle crosses the 5 degree Celsius threshold. How do we evaluate this forecast? The date of spring onset has no uniform definition. Since our definition is based on the seasonal cycle, we apply two decomposition methods to extract seasonal cycles from the data of the year for which we perform the prediction, and find their intersections with the threshold as the 'true' spring onset date s. One method is ensemble empirical mode decomposition (EEMD), the other is nonlinear mode decomposition (NMD), see section 1. These two methods are both effective and popular for extracting the seasonal cycle of temperature data (Deng and Fu 2018), and the extracted seasonal cycles are much smoother than with the CAC method. It should be noted that although we have the dataset for all years and could derive the spring onset dates from it directly, we suppose that we do not know the future data, as in a real situation. Rather, we wait until the year being forecast has passed and then extract the annual cycle of that year, to ensure that we do not use any future data beyond the predicted year.
As an example, we show temperature data and annual cycles for Potsdam in figure 3(a). The intersections of the raw data (light colours) with the threshold are too numerous to decide which one should serve as the reference. Therefore, using the AC extracted by NMD and EEMD to define spring onset is more realistic and effective than directly using daily temperature. The spring onset dates s_NMD and s_EEMD obtained by the two methods at Potsdam are given in figure 3(b) for each year. They show a long-term advancing trend, a 7-8 year quasi-period (Jajcay et al 2016) and a high-frequency fluctuation of around 3 years. s_NMD and s_EEMD are highly correlated, mutually reinforcing their reliability. In order to evaluate our predictions, both s_NMD and s_EEMD serve as references.
In the same way, we calculate the s_NMD and s_EEMD series at every grid point. Figure 4 shows their mean value and standard error. In most regions, spring begins in March. The western regions always enter spring earlier than the eastern ones. The date of spring onset is related to elevation and landform: at the southern and southeastern edges of Germany, where the Bavarian Alps and the Bohemian Forest are located, the dates of spring onset are significantly later than elsewhere, and the same holds for the northeastern part near the Baltic Sea. As for the standard error, the most significant fluctuations appear in the northwestern area near the North Sea; the higher standard error makes the prediction more difficult there. In the southern area, the dates of spring onset are more stable.
In the next section, the spring onsets predicted by HA and CAC are both compared with the references s_NMD and s_EEMD to assess their performance. Performance is measured by the error, correlation coefficients and the root mean square error (RMSE). Detailed steps of the spring onset prediction can be found in the flowchart summarised in the supplementary materials (see figure S1 and its corresponding description).

Forecast spring onset of the year 2017
Here we give an example of how we conduct and check the predictions for one specific year. We apply the HA method of section 3 to the observation data before 2017 (from 1 January 1950 to 31 December 2016) to predict the annual cycle (AC) for 2017 and then calculate the spring onset date s_P at each grid point. Although the prediction of the AC starts on 1 January using training data until 31 December, the actual spring onset is not before March and hence has a lead time of more than 2 months. The prediction error is calculated as the difference between the predicted date and the reference date:

e_P−NMD = s_P − s_NMD,   e_P−EEMD = s_P − s_EEMD.   (5)

Similarly, the dates predicted by the CAC method, denoted s_C, are also compared with the referential dates to give the errors:

e_C−NMD = s_C − s_NMD,   e_C−EEMD = s_C − s_EEMD.   (6)

The distribution patterns of the variables mentioned above are shown in figure 5. Comparing the references (figures 5(a) and (b)) with the predictions (figures 5(c) and (d)), both CAC and HA predict the correct pattern; they closely resemble the mean-state pattern (figures 4(a) and (b)). Spring begins very late in the mountains (Bavarian Alps, Bohemian Forest and Black Forest) and earlier in the west than in the east. However, the specific dates deviate. Except for the mountains, spring should begin before the end of March in most regions. The predictions by the HA method are very close to the correct dates, whereas the CAC method predicts that spring begins mostly in April. Furthermore, we present the errors defined in equations (5) and (6). For the HA method, the errors are much smaller and take both positive and negative values limited to 5 days. This indicates that the error of the HA method is universally much lower than that of the CAC method. Let us stress that the fact that CAC predicts later-than-observed dates is specific to the year 2017; as we will illustrate later, CAC predicts almost the same dates every year, so that in other years it might be systematically too early.
This is in contrast to HA, which is adaptive to the real situation and hence is able to capture the large fluctuations of reality.

30-year predictions of HA and CAC
To further test our prediction model, we repeated the one-year-ahead predictions of section 4.1 over 30 years (from 1988 to 2017). At each grid point (i, j), the referential and predicted dates are arranged into the referential time series s_NMD and s_EEMD and the predicted time series s_P and s_C. Figure 6 shows these four series at Potsdam. The dates predicted by HA are adaptive and vary along with the references, whereas CAC is more static and predicts roughly the same date every year. The linear regression coefficient of the CAC results is −0.2449 days per year, which is due to the long-term warming trend. Previous works (Qian et al 2011) also found that the advancing trend of spring onset is caused not only by global warming but also by the phase change of the seasonal cycle. Besides, figure 6 shows that in some specific years (1991, 1996-1998, 2007, 2013 and 2014) the HA prediction fails completely. This may be due to a wrong prediction of ϕ_2. These 7 years can be taken as outliers; in those years, the prediction error can become very large and seriously affect the correlation results. For example, the Pearson correlation coefficient (PCC) of s_P and s_NMD in figure 6 is 0.17, but increases to 0.53 if those 7 years are removed. At other grid points, HA also fails to predict successfully in those years.
Referring to s_NMD and s_EEMD, s_P and s_C are evaluated through a correlation coefficient map and the root mean square error (RMSE):

RMSE = √( (1/N) Σ_{i=1}^{N} (s′_i − s_i)² ),

where s_i denotes the referential date in s_NMD or s_EEMD, and s′_i denotes the prediction in s_P or s_C. The correlation map is calculated from the grid-by-grid PCC between the reference series s_NMD, s_EEMD and the predicted series s_P, s_C, respectively. Since there are two referential sets and two prediction sets, four correlation maps are shown in figure 7 (all calculated without the 7 outlier years). The first column (figures 7(a) and (c)) shows the correlations between the CAC predictions and the references. In most regions, they are close to zero or even negative, which means there is no significant positive correlation between the references and the CAC prediction. The PCC is significant when larger than 0.41 at the 0.95 confidence level (Student's t-test). For the HA prediction, almost all PCC values are significantly positive and larger than 0.4, indicating that the HA method is able to predict the high-frequency variations of spring onset correctly.
In addition, we have to clarify again that figure 7 is calculated for only 23 years. Without the outlier years, the mean PCC of the HA method is 0.50. When all 30 years are included, the mean PCC of HA is only 0.21, but still positive, and HA still works better than the CAC method (see figure S2). Another index to measure prediction performance is the RMSE (figure 8), which quantifies the absolute error of each prediction. Considering the bias of the CAC method mentioned before (visible in figure 6), we bias-correct all forecasts before calculating the RMSE. The mean RMSE is about 15 days, which is comparable to CAC. According to figure 4(b), the variations in the north are stronger than in the south, especially in the northwest, and the patterns of the RMSE closely resemble figure 4(b): the RMSE values are large in the northwest (about 22 days) and small in the southeast (around 12 days).
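The two verification scores used above can be computed per grid point as follows; a plain sketch with our own function names, including the bias correction applied before the RMSE comparison.

```python
import numpy as np

def pcc(pred, ref):
    """Pearson correlation coefficient between predicted and referential onset dates."""
    p = np.asarray(pred, float) - np.mean(pred)
    r = np.asarray(ref, float) - np.mean(ref)
    return float(p @ r / np.sqrt((p @ p) * (r @ r)))

def rmse(pred, ref, bias_correct=False):
    """Root mean square error; optionally remove the mean bias first,
    as done for the forecasts in the text before the RMSE comparison."""
    d = np.asarray(pred, float) - np.asarray(ref, float)
    if bias_correct:
        d = d - d.mean()
    return float(np.sqrt(np.mean(d ** 2)))
```

The PCC rewards reproducing year-to-year variations regardless of a constant offset, while the bias-corrected RMSE measures the remaining amplitude error; this is why a static climatology can score a comparable RMSE yet a near-zero correlation.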

Conclusion and discussion
In this article, we present a new forecast model to predict the date of spring onset for the next year, based on harmonic analysis of the past temperature records. The method proceeds in the following steps: (1) use HA to extract amplitude and phase time series from the original temperature records up to the end of the year before the year being forecast; (2) fit the amplitude and phase time series each by an AR(2) model and use these models to predict the amplitude and phase in the next year; (3) find the spring onset date in the predicted seasonal cycle.
Two kinds of referential dates are used to verify the predictions, and both confirm the validity of the HA method. Compared with the traditional CAC method, which can only reflect the long-term trend caused by global warming, HA is able to predict the high-frequency variations. The one-year-ahead forecasts are repeated over 30 years. Although HA has no obvious advantage in terms of RMSE, it shows superiority in the correlation map: there is no significant correlation between the CAC predictions and the references, whereas the predictions of our HA method are highly correlated with the referential dates.
Why does the method based on the predicted seasonal cycle give better predictions of spring onset? It might be because large-scale air circulation influences the seasonal cycle and thereby the date of spring onset (Qian et al 2011). Many studies found that large-scale circulation patterns such as the NAO and the northern annular mode (NAM) might affect the phase of the temperature annual cycle (Palus et al 2005, Stine et al 2009, D'Odorico et al 2002). For example, in the northwest of Germany spring begins early because it is near the warm North Sea: it hardly ever snows, and the westerlies bring warm air in winter (Ekman 1999). When the NAO is strengthened, the westerlies are stronger and spring begins earlier. Our method of capturing and predicting the seasonal cycle may thus implicitly predict variations of the NAO.
Our spring onset forecast is based on the prediction of the annual cycle for the year following the last data point used to fit the parameters of this annual cycle. We explored predictions where the training data end in January, December, or even November (not shown here). Since the relaxation time of the fitted AR(2) models was found to be about three years, we suggest that the gap between the end point used to fit the model and the date to be predicted should be no more than five months. While this is satisfied by our prediction of spring onset, we plan to test this method also for forecasting the difference between winter and summer temperatures, the warmest day, the summer length, etc. One could use the predicted seasonal cycle of temperature for many different forecast tasks, thereby opening a new window towards seasonal predictions. The dynamical reasons for the high predictability of the seasonal cycle remain unclear and need further investigation; one step will be to measure the skill of the seasonal cycle forecast as a function of lead time.