Comparison of parametric and non-parametric time-series analysis methods on a long-term meteorological data set

In the present explorative study, different time-series analysis methods, such as moving average, deterministic methods (linear trend with seasonality), and the non-parametric Mann-Kendall trend test, were applied to monthly precipitation data from January 1871 to December 2014, with the aim of comparing the results of these methods and detecting the signs of climate change. The data set was provided by the University of Pannonia, and it contains monthly precipitation data of 144 years of measurements (1,728 data points) from the Keszthely Meteorological Station. This data set is special because few stations in Hungary can provide such long and continuous measurements with detailed historical background. The results of the research can provide insight into the signs of climate change in the past for the region of West Balaton. Parametric methods (linear trend and t-test for slope) are the simplest way to gain insight into the changes in a variable over time. These methods require normally distributed residuals, which can limit their application. Non-parametric methods are distribution-free, and investigators can get a more sophisticated view of the tendencies of the variable in the time series.


Introduction
Climate change is one of the serious problems that mankind must face in the 21st century. The IPCC report (2013), while itself not a scientific publication, is based on more than 9,200 scientific publications, and leaves little doubt about the human role in the process (95% is the probability that human influence has been dominant in the present changes of the climate system). One of the main conclusions of the Summary for Policymakers (IPCC 2013) is that "it is extremely likely that human influence has been the dominant cause of the observed warming since the middle of the 20th century." It is mainly supported by Chapter 10 of AR5: "detection and attribution: from global to regional" (de Larminat 2016). Climate change will probably affect all parts of the Earth, and in Central Europe, the Carpathian Region will be influenced as well. The hydrological cycle is an element of the climate system that is expected to change, and the signs of these changes can already be detected. Precipitation strongly influences the water cycle from local to global scale. Any modification in the amount or distribution of rainfall has a significant impact on water availability, and therefore on water management.
The effects of climate change on the Carpathian Region (including Hungary) have been investigated extensively by Judit Bartholy and colleagues. This group of researchers applied Regional Climate Models to estimate projections of future climate for the Carpathian Region. Several publications underlined that the amount of precipitation will decline in the summer half-year and that there is high uncertainty for the rainfall in the winter half-year (Bartholy et al. 2004, 2008, 2015; Horányi et al. 2010; Kis et al. 2014; Pongrácz et al. 2011, 2014). Recent model predictions of Kis et al. (2017) state that the spatial distribution of precipitation is not likely to change remarkably in the Carpathian Region during the period of 1961−2100, but the annual distribution of precipitation is projected to be restructured. However, the hydroclimate of the region is quite variable in space and time, as the shallow groundwater fluctuations are driven by the Mediterranean cyclones from the Gulf of Genoa and by local/regional climate variables (Garamhegyi et al. in press). Besides the model predictions, it is interesting to search for analogies of the projected climate in the history of the Earth for a better understanding of the processes. Prista et al. (2015) worked out chronostratigraphic analogies for IPCC scenarios, and stated that the Pliocene (mid-Piacenzian warm period) is the best analogue for a warming climate in Europe.
For the tendencies of the past, Szalai et al. (2005) stated that the annual precipitation amount decreased by 11% between 1901 and 2004, according to the analysis of the Hungarian Meteorological Service. The biggest decline occurred in the spring; it was 25% for the aforementioned period. Bodri (2004) suggested a slow decrease in precipitation with a noticeable increase in precipitation variability for the 20th century. While Northern and Western Europe receive more precipitation in parallel with the warming tendency, Hungary, much like the Mediterranean region, gets less rainfall. The water balance shows a deficit, in that the difference between water income and outflow is increasing. Between 1901 and 2009, the highest precipitation decline over the territory of Hungary occurred in the spring, at nearly 20% (Lakatos and Bihari 2011). Earlier analyses of several precipitation extreme indices suggested that the regional intensity and frequency of extreme precipitation increased in the Carpathian Basin in the second half of the last century, while the total precipitation decreased.
The aim of this study is to analyze the long-term data series of the meteorological measurements of precipitation amount at Keszthely (Western Hungary, N 46°44′, E 17°14′; Fig. 1) between 1871 and 2014 from the point of view of climate change, and to compare different statistical methods (conventional "regression on time" method and non-parametric Mann-Kendall trend test) on the results of time-series analysis based on this data set.
Several examples can be found in the literature for the application of the Mann-Kendall trend test, for example, Patle and Libang (2014) reported a trend analysis of annual and seasonal rainfall in the northeast region of India, and Salmi et al. (2002) analyzed the trends of atmospheric pollutants in Finland. Meteorological applications can be read in Rahman and Begum (2013), who determined trends of rainfall of the largest island in Bangladesh. Ganguly et al. (2015) investigated the tendencies of rainfall in Himachal Pradesh (northern India) between 1950 and 2005. Gavrilov et al. (2015, 2016, 2018) examined trends of air temperature by the Mann-Kendall test in Vojvodina, Serbia. Salami et al. (2014) applied this non-parametric trend test for the analysis of hydrometeorological variables in Nigeria. Mapurisa and Chikodzi (2014) made an assessment of trends of monthly and seasonal rainfall sums in southeastern Zimbabwe. Karmeshu (2012) investigated the temperature and precipitation changes in the northeastern United States. Hydrological utilization is provided by Hamed (2008). Burn and Hag Elnur (2002) estimated the trends and variability of 18 hydrological variables by the Mann-Kendall trend test. Hirsch et al. (1991) used the method for the investigation of stream water quality. Chaudhuri and Dutta (2014) analyzed the trends of pollutants, temperature, and humidity in India. Zarei et al. (2016) examined drought indexes in Iran applying the Mann-Kendall trend test. Gocić and Trajković (2013a) analyzed precipitation and drought data sets in Serbia using the non-parametric trend test. Several other applications of the Mann-Kendall trend test related to climate change can be found in the literature, for example, Jaagus (2006).

Data and methods
Monthly amounts of precipitation were analyzed from 1871 to 2014, initially measured in the area of the ancient Georgikon Academy of Agriculture at Keszthely, then at the meteorological station of the Hungarian Meteorological Service. The data set was provided by the Department of Meteorology and Water Management of the University of Pannonia Georgikon Faculty (Keszthely). This data set is special because few stations in Hungary have continuous measurements over more than 140 years with detailed historical background (Kocsis and Anda 2006). The meteorological station of Keszthely was among the few important stations with which Hungarian meteorological observations began. A detailed history of the meteorological measurements is given by Kocsis and Bem (2007).
Kocsis et al.
Figure caption: Lake Balaton in Europe (upper left) with its artificial channel Sió (upper right) and natural water catchment (lower panel) according to Mika et al. (2010)

Linear regression and moving average
Simple linear regression of Y on t is a method to determine the tendency (Eq. 1):
$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t \qquad (1)$$
where $\beta_0$ is the intercept of the trend line, $\beta_1$ is the slope of the trend line, t is the index of the time step, and $\varepsilon_t$ is the residual. The significance of the slope can be tested by several methods. In this study, the significance of the slope coefficient $\beta_1$ was tested by t-test. The regression model must be checked for normality of residuals, constant variance, and linearity of the relationship (Helsel and Hirsch 2002). This method is often called "regression on time" and the estimation method is the ordinary least squares (OLS) estimator. In the hypothesis test, a significance level of α = 5% was used in a one-tailed test, as it is supposed that the precipitation amount is likely to decrease; therefore, $\beta_1$ is expected to be negative.
Another method for tendency detection is the moving average. As the seasonal component probably affects the time series, a 12-term moving average was applied, which should eliminate part of the seasonal effect, and the mean seasonal deviation was calculated for each season.
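The "regression on time" step can be illustrated with a minimal sketch in plain Python. It is not the software used in the study; the function names are hypothetical, and the one-tailed p value uses a standard normal approximation in place of the t distribution, which is an assumption that is only reasonable for long series such as the 1,728 monthly values analyzed here.

```python
import math

def regression_on_time(y):
    """OLS fit of y_t = b0 + b1*t + e_t with t = 1..n.

    Returns (b0, b1, t_stat), where t_stat = b1 / standard error of b1.
    """
    n = len(y)
    t = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    # Residual sum of squares and standard error of the slope
    sse = sum((yi - (b0 + b1 * ti)) ** 2 for ti, yi in zip(t, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_stat = b1 / se_b1 if se_b1 > 0 else float("nan")
    return b0, b1, t_stat

def lower_tail_p_normal(stat):
    """P(Z <= stat) for a standard normal Z.

    Assumption: used as a large-sample stand-in for the t distribution.
    """
    return 0.5 * math.erfc(-stat / math.sqrt(2))

if __name__ == "__main__":
    # Illustrative values only, not the Keszthely record
    demo = [61.0, 48.5, 70.2, 39.9, 55.1, 44.0, 58.3, 41.7, 36.2, 49.0]
    b0, b1, t_stat = regression_on_time(demo)
    # One-tailed test of H0: b1 >= 0 against Ha: b1 < 0
    print(b0, b1, t_stat, lower_tail_p_normal(t_stat))
```

For short series the exact t distribution with n − 2 degrees of freedom should be used instead of the normal approximation.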
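As a rough illustration of the smoothing step, the sketch below computes a centered 12-term moving average and mean seasonal deviations from it. Two points are assumptions rather than details given in the text: the moving average is centered (a 2×12 average with half weights at both ends), and the "correction" consists of shifting the raw seasonal means so that they sum to zero. The function names are hypothetical.

```python
def centered_ma12(y):
    """Centered 12-term moving average; None where the window is incomplete."""
    n = len(y)
    out = [None] * n
    for t in range(6, n - 6):
        # Half weight on the two end points, full weight on the 11 inner points
        window = 0.5 * y[t - 6] + sum(y[t - 5:t + 6]) + 0.5 * y[t + 6]
        out[t] = window / 12
    return out

def corrected_seasonal_deviations(y, period=12):
    """Mean deviation of each month from the moving average, shifted to sum to zero.

    Needs at least two full years so that every month has a smoothed value.
    """
    ma = centered_ma12(y)
    sums = [0.0] * period
    counts = [0] * period
    for t, (yt, mt) in enumerate(zip(y, ma)):
        if mt is not None:
            sums[t % period] += yt - mt
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    shift = sum(raw) / period
    return [r - shift for r in raw]

if __name__ == "__main__":
    # Synthetic series: slow linear decline plus a fixed monthly pattern
    pattern = [-20, -22, -18, -8, 8, 22, 16, 14, 4, 0, 6, -2]
    series = [56.0 - 0.01 * t + pattern[t % 12] for t in range(60)]
    print(corrected_seasonal_deviations(series))
```

Because the centered window is symmetric, a linear trend passes through the moving average unchanged, so the deviations recover the monthly pattern in this synthetic case.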

Mann-Kendall trend test
The Mann-Kendall trend test is widespread in climatological and hydrological time-series analysis; since it is simple and robust, it can cope with missing values and values below the detection limit (Gavrilov et al. 2016). This non-parametric test is commonly used to detect monotonic tendencies in series of environmental data as well (Pohlert 2017). No assumption of normality is required (Helsel and Hirsch 2002). Hamed and Rao (1998) developed a modified Mann-Kendall test for autocorrelated data. Application of this modified method is presented, for example, by Amirataee et al. (2016). Yue et al. (2002) investigated the power of the Mann-Kendall test in hydrological series.
The Mann-Kendall trend test is based upon the work of Mann (1945) and Kendall (1975), and is closely related to Kendall's rank correlation coefficient. The methodology is introduced following the detailed descriptions given by Gilbert (1987) and Hipel and McLeod (1994) as follows: In the case of determining the presence of a monotonic trend in a time series, the null hypothesis (H0) of the Mann-Kendall test is that the data come from a population where random variables are independent and identically distributed. The alternative hypothesis (Ha) is that the data follow a monotonic trend over time. The Mann-Kendall test statistic is given as (Eq. 2):
$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k) \qquad (2)$$
where j > k and k = 1, 2, ..., n − 1, j = 2, 3, ..., n and n is the number of data. $\operatorname{sgn}(x_j - x_k)$ is calculated as follows (Eq. 3):
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} +1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases} \qquad (3)$$
Kendall (1975) proved that S is asymptotically normally distributed with the following parameters (mean and variance; Eq. 4):
$$E(S) = 0, \qquad \mathrm{VAR}(S) = \frac{1}{18}\left[ n(n-1)(2n+5) - \sum_{p=1}^{g} t_p (t_p - 1)(2 t_p + 5) \right] \qquad (4)$$
where g is the number of tied groups in the data set, $t_p$ is the number of data in the pth tied group, and n is the number of data in the time series. A positive value of S indicates an increasing trend, whereas a negative value of S indicates a decreasing trend with time. For n > 10, the standard normal variate Z can be used for the hypothesis test (Eq. 5):
$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\mathrm{VAR}(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S + 1}{\sqrt{\mathrm{VAR}(S)}} & \text{if } S < 0 \end{cases} \qquad (5)$$
In the hypothesis test, a significance level of α = 5% was used in a one-tailed test, as it is supposed that the precipitation amount is likely to decrease; therefore, τ is expected to be negative, and the empirical significance level was determined. S is closely related to Kendall's rank correlation coefficient (τ; Eq. 6):
$$\tau = \frac{S}{D} \qquad (6)$$
where D is the possible number of data pairs from the n members of the data set (Eq. 7):
$$D = \binom{n}{2} = \frac{n(n-1)}{2} \qquad (7)$$
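The quantities described above (S, its tie-corrected variance, Z, and τ) can be put together in a short sketch. This is an illustrative plain-Python implementation, not the XLSTAT routine used in the study; the function name and the returned tuple layout are hypothetical, and τ is computed with D = n(n − 1)/2, without a tie adjustment in the denominator.

```python
import math
from collections import Counter

def mann_kendall(x):
    """Mann-Kendall test for a series without missing values.

    Returns (S, VAR(S), Z, tau, p_lower), where p_lower is the one-tailed
    p value for the alternative of a decreasing trend.
    """
    n = len(x)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            d = x[j] - x[k]
            s += (d > 0) - (d < 0)  # sgn(x_j - x_k)
    # Tie correction: groups of size 1 contribute zero to the sum
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    tau = s / (n * (n - 1) / 2)
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z), normal approximation
    return s, var_s, z, tau, p_lower

if __name__ == "__main__":
    # Illustrative values only; the normal approximation is meant for n > 10
    demo = [62, 58, 60, 55, 57, 51, 53, 49, 50, 47, 48, 44]
    print(mann_kendall(demo))
```

The double loop is quadratic in n, which is still manageable for a series of 1,728 values (about 1.5 million pairs).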

Modified Mann-Kendall trend test for serially dependent data (seasonal Mann-Kendall trend test)
If seasonal cycles are present in the time series, it is suggested to use a trend test that removes the effect of seasonality (Gilbert 1987). Hirsch et al. (1982) and Hirsch and Slack (1984) developed the method and introduced the seasonal Mann-Kendall test for data that are serially dependent. In the modified Mann-Kendall trend test, a series of x observations recorded over K seasons for L years (without any tied values) is expressed as the following matrix (Rahman et al. 2017; Eq. 8):
$$X = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{K1} \\ x_{12} & x_{22} & \cdots & x_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1L} & x_{2L} & \cdots & x_{KL} \end{bmatrix} \qquad (8)$$
Let $x_{il}$ be the datum for the ith season of the lth year. The data of each season (in practice, each month) over all observed years are used to compute the Mann-Kendall statistic S. Let $S_i$ be computed for the ith season as follows (Gilbert 1987; Eq. 9):
$$S_i = \sum_{k=1}^{n_i - 1} \sum_{l=k+1}^{n_i} \operatorname{sgn}(x_{il} - x_{ik}) \qquad (9)$$
where l > k and $n_i$ is the number of data (over years) for the ith season and (Eq. 10):
$$\operatorname{sgn}(x_{il} - x_{ik}) = \begin{cases} +1 & \text{if } x_{il} - x_{ik} > 0 \\ 0 & \text{if } x_{il} - x_{ik} = 0 \\ -1 & \text{if } x_{il} - x_{ik} < 0 \end{cases} \qquad (10)$$
The computation method of $\mathrm{VAR}(S_i)$ is given by Gilbert (1987). The S′ statistic for the seasonal Kendall test is computed as (Eq. 11):
$$S' = \sum_{i=1}^{K} S_i \qquad (11)$$
and VAR(S′) is as follows (Eq. 12):
$$\mathrm{VAR}(S') = \sum_{i=1}^{K} \mathrm{VAR}(S_i) \qquad (12)$$
Finally, the Z-test statistic is computed and used in the hypothesis test (Eq. 13):
$$Z = \begin{cases} \dfrac{S' - 1}{\sqrt{\mathrm{VAR}(S')}} & \text{if } S' > 0 \\ 0 & \text{if } S' = 0 \\ \dfrac{S' + 1}{\sqrt{\mathrm{VAR}(S')}} & \text{if } S' < 0 \end{cases} \qquad (13)$$
The presence of positive autocorrelation in the data increases the chance of detecting trends when none actually exist, and vice versa (Hamed and Rao 1998). This effect of autocorrelation in the data is often ignored. Hamed and Rao (1998) proposed a modified non-parametric trend test that is suitable for autocorrelated data, and gave a detailed description of the modified Mann-Kendall trend test for autocorrelated data. In this study, this type of Mann-Kendall test was also applied.
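The seasonal variant can be sketched by applying the Mann-Kendall sums month by month and adding them up. The sketch assumes a complete monthly series that starts in the first season, and it takes VAR(S′) as the plain sum of the seasonal variances, i.e., it ignores covariance between seasons; both are simplifying assumptions, and the function names are hypothetical. The Hamed and Rao (1998) correction for autocorrelation is not implemented here.

```python
import math
from collections import Counter

def mk_s_and_var(x):
    """Mann-Kendall S and tie-corrected VAR(S) for one season's values."""
    n = len(x)
    s = 0
    for k in range(n - 1):
        for l in range(k + 1, n):
            d = x[l] - x[k]
            s += (d > 0) - (d < 0)
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    return s, (n * (n - 1) * (2 * n + 5) - tie_term) / 18

def seasonal_mann_kendall(x, period=12):
    """Seasonal Kendall test; returns (S', VAR(S'), Z, p_lower)."""
    s_total, var_total = 0, 0.0
    for i in range(period):
        # x[i::period] holds the values of season i over the years
        s_i, var_i = mk_s_and_var(x[i::period])
        s_total += s_i
        var_total += var_i
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # one-tailed, decreasing trend
    return s_total, var_total, z, p_lower

if __name__ == "__main__":
    # Synthetic example: fixed monthly pattern with a small yearly decline
    pattern = [30, 32, 35, 45, 60, 75, 70, 68, 58, 52, 55, 40]
    series = [pattern[t % 12] - 0.4 * (t // 12) for t in range(12 * 15)]
    print(seasonal_mann_kendall(series))
```

Because values are only compared within the same month, a stable seasonal cycle on its own contributes nothing to S′.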

Sen's slope estimator
After detecting the non-parametric trend, Sen's (1968) slope estimator was applied. This is a non-parametric method that estimates the change per time unit (direction and magnitude). Sen's method uses a linear model to estimate the slope of the trend, and the variance of the residuals should be constant in time (da Silva et al. 2015). First, N′ slope estimates were calculated (Q; Eq. 14):
$$Q = \frac{x_{i'} - x_i}{i' - i} \qquad (14)$$
where $x_{i'}$ and $x_i$ are the data values at times (or during time periods) i′ and i, respectively, and where i′ > i. N′ is the number of data pairs for which i′ > i. In the case where there is only one datum in each time period (Eq. 15),
$$N' = \frac{n(n-1)}{2} \qquad (15)$$
where n is the number of time periods (Gilbert 1987; Gocić and Trajković 2013b). The N′ values of Q are ranked from smallest to largest and the median of the Q values gives the slope of the tendency. The advantage of this method is that it limits the effect of outliers on the slope (Shadmani et al. 2012), and it is robust and free from restrictive statistical constraints (Lavagnini et al. 2011). Sen's slope estimator is widely applied in hydrological and meteorological research, for example, Marofi et al. (2012), Huang et al. (2013), Guo and Xia (2014), Talaee (2014), Zamani et al. (2016), Amirataee et al. (2016), and Liuzzo et al. (2016).
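A minimal sketch of the slope estimator described above, assuming one datum per time step and no missing values (the function name is hypothetical):

```python
from statistics import median

def sens_slope(x):
    """Median of all pairwise slopes (x[j] - x[i]) / (j - i) with j > i."""
    n = len(x)
    slopes = [
        (x[j] - x[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)

if __name__ == "__main__":
    # One outlier (100) barely moves the median of the pairwise slopes
    print(sens_slope([1, 2, 3, 100, 5]))
```

The list holds n(n − 1)/2 slopes, so memory use grows quadratically with the length of the series; for 1,728 monthly values this is roughly 1.5 million numbers.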

Seasonal slope estimator
The seasonal slope estimator is a generalization of Sen's slope estimator discussed above. A description of the method is given following Gilbert (1987). The $N_i'$ individual slope estimates are calculated first for the ith season as (Eq. 16):
$$Q_i = \frac{x_{il} - x_{ik}}{l - k} \qquad (16)$$
where l > k, $x_{il}$ is the datum from the ith season of the lth year, and $x_{ik}$ is the datum from the ith season of the kth year. This computation is made for each of the K seasons. Then, the N1′ + N2′ + ... + NK′ = N′ individual slope estimates are ranked and their median is taken (Gilbert 1987). Addinsoft's XLSTAT (2017) was used for carrying out the computations.
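The seasonal generalization can be sketched by pooling the pairwise slopes computed within each season. The sketch assumes a complete series that starts in the first season, and the function name is hypothetical. Note that the slopes here are expressed per year (the spacing between two values of the same month), so a conversion is needed if a per-month figure is wanted.

```python
from statistics import median

def seasonal_sens_slope(x, period=12):
    """Median of the pairwise slopes computed within each season (per year)."""
    slopes = []
    for i in range(period):
        season = x[i::period]  # values of season i, one per year
        m = len(season)
        slopes.extend(
            (season[l] - season[k]) / (l - k)
            for k in range(m - 1)
            for l in range(k + 1, m)
        )
    return median(slopes)

if __name__ == "__main__":
    # Synthetic example: strong monthly cycle, decline of 0.4 units per year
    pattern = [30, 32, 35, 45, 60, 75, 70, 68, 58, 52, 55, 40]
    series = [pattern[t % 12] - 0.4 * (t // 12) for t in range(12 * 10)]
    print(seasonal_sens_slope(series))
```

Comparing only like months keeps the seasonal cycle from inflating or masking the slope estimate.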

Results of "regression on time" and moving average
A total of 1,728 monthly precipitation data were analyzed. Mean monthly precipitation at Keszthely is 56 mm with a standard deviation of 37 mm. As a declining tendency in precipitation has been reported for the territory of Western Hungary, a decreasing trend was supposed. A linear tendency ($\hat{y}_t = 59 - 0.003\,t$) can be detected in a one-tailed t-test ($\beta_1 < 0$) at α = 5%, and the alternative hypothesis can be accepted at a p value of 3.1% (Figs 2 and 3). The slope was −0.003 mm per time step (month).
There are multiple reasons for which these fitted values and corresponding p values are not entirely trustworthy. There is a significant correlation between the residuals (Fig. 4), which is not at all surprising, as we expect to have a yearly periodicity in the precipitation.
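One simple way to check residual correlation of this kind is the sample autocorrelation at selected lags, with lag 12 corresponding to the yearly periodicity. The sketch below is illustrative only: the function name is hypothetical, and the ±1.96/√n band is the usual approximate 95% limit for white noise, not necessarily the criterion used for Fig. 4.

```python
import math

def autocorr(e, lag):
    """Sample autocorrelation of the series e at the given lag."""
    n = len(e)
    m = sum(e) / n
    den = sum((v - m) ** 2 for v in e)
    num = sum((e[t] - m) * (e[t + lag] - m) for t in range(n - lag))
    return num / den

if __name__ == "__main__":
    # Synthetic residuals with a yearly cycle left in them
    resid = [10 * math.sin(2 * math.pi * t / 12) for t in range(120)]
    band = 1.96 / math.sqrt(len(resid))  # approximate 95% limit for white noise
    for lag in (1, 6, 12):
        r = autocorr(resid, lag)
        print(lag, r, abs(r) > band)
```

A value at lag 12 well outside the band indicates that seasonality remains in the residuals, which undermines the independence assumption behind the t-test.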
A 12-term moving average can be used as a smoothing method that partly eliminates the effect of the seasonality in the data series. The tendency of the 12MA (moving average) is not so clear in Fig. 5.
Trend analysis can be followed by the decomposition of the time-series data to trend, average seasonality, and random component. The tendency is modified by seasonal effect that can be described by corrected mean seasonal deviation. Corrected mean seasonal deviation gives the average volume of how much the seasonality increases or decreases the value given by the main trend (Table 1). Corrected mean seasonal deviations were computed using the values of moving averages, which filter the effect of seasonality and causality.

Result of non-parametric methods
A parametric method, such as "regression on time," is commonly used to determine the main tendency of a time series, but the requirements on the residuals, namely that they should be normally distributed and uncorrelated, are not fulfilled. Another choice for detecting tendency is the non-parametric Mann-Kendall trend test. In this case, non-parametric methods can give more appropriate results for the trend. When the sign of the change is specified (one-tailed test, τ < 0), a significant decrease can be seen with a p value of 3.24%. Sen's slope estimator gives a slope of −0.003 mm per month, similar to the linear trend. As the time series contains a seasonal component, the values are not serially independent. A seasonal Mann-Kendall trend test was also applied, and the one-tailed test indicated a significant negative tendency at a p value of 3.86%. Sen's slope was −0.033 mm per time step (month) when the effect of seasonality was taken into account.
The data in the time series are autocorrelated and not serially independent. The modified Mann-Kendall trend test suggested by Hamed and Rao (1998) for autocorrelated data was used to detect the supposed declining tendency of the data set (one-tailed test, τ < 0). This method showed that no significant negative trend can be detected (p value was 50%). Therefore, if autocorrelation of the data is taken into account, no significant tendency can be statistically proven, and it can be supposed that the monthly precipitation amount did not change significantly.

Discussion
A slow decrease in precipitation, together with the noticeable increase in precipitation variability, is characteristic for the 20th century (Bodri 2004). The tendency of the annual precipitation amounts between 1960 and 2009 showed a slight decrease in Hungary and a declining trend in Western Hungary which is higher than average, whereas in the northeastern part of the country precipitation amount increased (Lakatos and Bihari 2011). Lakatos and Bihari (2011) used the conventional separation of annual and seasonal precipitation amounts for research of changes in their study. The 144-year-long continuous data set of monthly precipitations had not been analyzed previously.

Conclusions
According to the "mainstream" opinion in climatology in Hungary, a decreasing tendency is supposed in monthly precipitation amounts; both parametric and non-parametric methods indicate a significant negative trend in the time series of monthly precipitation amounts at Keszthely. However, the residuals of the linear regression do not follow a normal distribution and they are autocorrelated. Therefore, the regression model does not fulfill the requirements of the diagnostic checking stage. Moving averages can be used as a smoothing technique that partly filters the effect of the seasonality and should provide information about the main tendency. The non-parametric Mann-Kendall trend test can be chosen as well, and has the advantage that it has no strict requirements for application. When analyzing monthly precipitation amounts, the effect of seasonality leads to serial dependence of the data. This fact must be taken into account; therefore, the seasonal Mann-Kendall trend test can be used. In a one-tailed test, this method resulted in a significant negative tendency of monthly precipitation between 1871 and 2014 at a p value of 3.86%. Sen's slope estimator gave a decrease of 0.033 mm in precipitation sum per time step (month) over the examined period when seasonality was taken into account. The modified Mann-Kendall trend test for autocorrelated data was also used and showed that there is no significant negative tendency in the time series. This result suggests that the previously detected significant negative tendencies are probably spurious, because those methods do not account for the autocorrelation in the data. As an outlook, the time series assessed in the study should be taken into account in climate studies dealing with low-frequency signals (Sen and Kern 2016).