On Certain Stationarity Tests for Hydrologic Series

Abstract The main aim of the article was application of some statistical tests for investigation of weak stationarity of hydrologic time series. The tests were applied to mean monthly flow and maximum annual flow on three rivers: two Polish and one American river. Firstly, the modified Mann–Kendall test for autocorrelated data was used to detect trend. After detrending we used “unit root tests” based on the DF test and “stationarity tests” based on the KPSS test. The tests were investigated and compared in some aspects: analysis of residuals, application to seasonal series, AIC and Schwarz values.


INTRODUCTION
A time series {X t , t  T}, T = {1, 2, ...} is weak stationary if its mean is constant and the autocovariance function between two variables in the series depends only on the time distance between them.Weak stationarity is an important feature because the basic properties of the stationary series do not change with time.On the other hand, nonstationarity demonstrates certain "irregularity", which can be the result of "man's activities in the river basin or nature's large accidental or slow modifications" (Chow [8]).For weak stationary series the spectral theorem can be adapted because every weak stationary process X t generates a linear space, which is a Hilbert space of type L 2 .This facilitates the description of the time series in the frequency domain.The problem of weak stationarity and nonstationarity has long been known in hydrology.For example, Rao and You [29] proposed application of the spectral methods to detect nonstationarity of some rivers in the USA, Wang et al. [33] analyzed trend and stationarity of rivers in western Europe, Clarke [9] discussed variability for some South American drainage basins.In Polish textbooks this problem has been addressed, for example, by Banasik and Byczkowski [2] and Węglarczyk [35].
We present some statistical tests for investigation of weak stationarity.The first group of tests called "unit root tests" is based on the Dickey Fuller test, whereas the other on the Kwiatkowski-Phillips-Schmidt-Shin test.The construction of the tests enables an insight into the structure of the series.
Nonstationarity of hydrologic series can take various forms.One of them, trend, can be identified by using the known Mann-Kendall test or its modification.When analyzed series are seasonal, they manifest also nonstationary behavior.One of the presented tests will eliminate seasonality.
Another problem that often arises in hydrologic series is nonhomogeneity of the sample.Homogeneity is understood, as suggested by Węglarczyk [35], as invariability of some feature of the population the sample comes from.This feature may be understood as: conditions of experiment (genetic homogeneity) or some characteristics or the distribution of the random variable (statistical homogeneity).Weak stationarity is hence a kind of homogeneity in which the first two moments and covariance function do not change with time.
In the paper, there are investigated mean monthly flow time series on the Dunajec and the Nida Rivers in the period from November 1960 to October 1983 and on the Red River in the period from June 1901 to September 2010.We also consider maximum annual flow on the Dunajec, the Nida  and the Red River .The Red River flow data come from the USGS (National Water Information System) database available on the web site.
The tests presented in the article may also be used if annual maximum flow time series is genetically heterogeneous and the sample should be divided into two separate samples, the first with 'winter' and the second with 'summer' maxima.Then the method of alternative events presented in Ozga-Zielińska et al. [28] should be used for each sample before investigation of stationarity.

BASINS
The Nida River is a river in central Poland, a tributary of the Vistula River.It has a length of 151 kilometers.The river catchment has the area of 3352 km 2 to the analyzed Pińczów cross-section.To the first half of the previous century it created a unique multichannel system called "an inland delta", with many rare plants and animal species.In the 1980's, due to anthropogenic factors, the old channels disappeared and many wetlands vanished.The retention capacity of the catchment has dramatically decreased (Strużyński [32]).
The Dunajec River flows in southern Poland.It is a 247 km long, right tributary of the Vistula River.The catchment area to the Nowy Targ-Kowaniec cross-section is 681 km 2 .Great land slope on poorly permeable flysch parent rock causes quick runoff and high waves.The part of the river to the Nowy Targ-Kowaniec cross-section has a great flood potential.
The Red River of the North (North Dakota/Minnesota) is a North American River.It is about 885 kilometres long.The river caused a number of catastrophic floods in 1950, 1997 and 2009.It is vulnerable to flooding because of many ice jams in springs and differences in the slope.The total drainage area is 17400 km 2 .The data come from cross-section at Fargo (USGS 05054000).

DEFINITIONS AND METHODS
A process X t is weak stationary (stationary in a wide sense) if the mean value of X t is constant and the autocovariance function between X t and X s depends only on the difference s -t A typical example of a weak stationary process is white noise  t , independent, identically distributed random variables with zero mean and constant variance.On the other hand, the process X t = X t-1 +  t (random walk) is not stationary because its variance grows with t.Also the process that stays above or below the mean for long periods and the process that is heteroscedastic are both nonstationary.
We assume the basic model as where tr t X is a deterministic trend and Z t is a stochastic (noise) component.The most popular postulate about the form of tr t X is a linear dependence on t, tr t X = at + b.Let the lag operator B be defined, according to Box and Jenkins, as BX t = X t-1 .Denote also two differencing operators: one-lag differencing  = 1 -B and seasonal differencing  S = 1 -B S , (S > 1), In the more general case, the stochastic component is AR( p): (B)Z t =  t where (B) is a polynomial of order p in the lag operator B. We will consider also (B) = (B S ) * (B), where  * ,  are polynomials of degrees p 1 , p 2 , respectively.Such a particular process we denote SAR(p 1 , p 2 ).Let us remind that process X t is called integrated of order (d, D) denoted is weak stationary.We shall distinguish trend-stationarity, difference-stationarity and seasonal difference-stationarity of the process X t .In the trend-stationary model the process Z t must be stationary.This happens if all roots of  lie outside the unit circle because in this case the polynomial  is invertible.This follows from the fact that the operator (B) in the Hilbert space } , { T t X t  is invertible if all roots of the characteristic equation (x) = 0 lie outside the spectrum of the operator.For the operator (B) the spectrum is a subset of the unit circle and Z t , as a linear operator on white noise  t , is stationary.When some roots lie on the unit circle, we have a particular form of nonstationarity, reffered to as the unit root problem.In the difference-stationary model the polynomial  has one unit root and all other roots outside the unit circle.Then Z t is stationary.In the seasonal difference-stationarity the polynomial  has S unit roots.Then  s Z t is stationary.It should be noted that there are various methods of investigating seasonal series.In the above method, called seasonal adjustment, the series is seasonal differenced.In other methods the deterministic component is a sum of seasonal components or a combination of sines and cosines.
Let x 1 , ..., x n be the values of the sample.The classical statistical technique to determine the dependence among the successive values of the series is the use of the autocovariance function R(k) with the estimator r k and the autocorrelation function with the estimator ACF.In a weak stationary series they depend only on k.The estimators are consistent and asymptotically unbiased and have the form A similar role is played by the partial autocorrelation function (PACF).It describes the direct dependence between x i and x i+k , excluding intermediate values.
For all the tests considered in the paper, we fixed the significance level at 0.05.

PRELIMINARY ANALYSIS OF THE SERIES
Analysis of the autocorrelation functions is the initial procedure in the process of investigation.Figure 1 visualizes the estimators ACF and PACF of mean monthly flow on the Dunajec.The Ljung-Box (LB) test was applied to determine which autocorrelations are significant.Each series possessed 12-months seasonality.For the Nida Fig. 1.The ACF and the PACF of the mean monthly flow on the Dunajec river and the Red River the functions have similar shape.On the Dunajec and the Nida there was also a 6-month negative correlated dependence that we could not observe on the Red River. Figure 2 presents the ACF and PACF functions of maximum annual flow on the Red River.The lags with columns over the straight line are significant.Both functions demonstrate weak dependence in the series.For the Dunajec and the Nida the results are similar.The next step in the initial analysis was the investigation of the deterministic trend.The purpose was to reveal if the data increase or decrease with time.The problem of detecting trend has received attention during the last several years because of climate change.In hydrology it was studied, for example, by Hirsch and Slack [15], who proposed a seasonal trend test.McLeod and Hipel [26] applied Brillinger's [5] proposition based on spectral analysis, Khaliq et al. [17] investigated temporal changes in hydrological regimes.Among Polish authors Kundzewicz et al. [18,19], used some methods for detecting trend for maximum flow and Węglarczyk [34] discussed detecting trends in annual lake levels.The classical method is the Mann-Kendall test (Mann [25], Kendall [16]).The basic assumption for this test is independence and randomness of the observations.The MK test statistic has the form The variance of S equals Var(S) = .18 In most hydrologic series data the assumption on independence is not fulfilled and then variance Var(S) is underestimated.Bayley and Hammersley [4] proposed correction of the variance for autocorrelated series and considered "effective number of observations".On the basis of their proposition, Hamed and Rao [14] suggested the correction of variance to * * 18 where  S is the autocorrelation function of the ranks of the observations.We applied the modified Mann-Kendall test to mean monthly flow and to annual maximum flow.The correction of variance was done only for the data for which the partial autocorrelation was significant.Then the coefficient * S n n drifted between 1.04 for maximum flow on the Dunajec to 10.5 for mean monthly flow on the Red River.In the mean monthly flow on the Nida and the Dunajec no trend was observed, but there was ascending trend on the Red River.In maximum annual flow on the Dunajec and the Nida the test developed a descending trend, on the Nida stronger than on the Dunajec.The trend for maximum annual flow on the Red River was ascending.

UNIT ROOT TESTS
Let us consider the process that fulfills the equation The behavior of the process depends on .If ||  1 then the process X t is nonstationary (is random walk for  = 1).If || < 1, the process is stationary.This is why the tests based on the above equation are called "unit root tests".However, many hydrological time series have a more complicated structure and the enriched version of (2) should be used.The modified versions of equation ( 2) may contain a constant or a trend in a linear form as deterministic component.
The first unit root test was proposed by Dickey and Fuller [12].They considered three forms of tr t X : as zero, as constant, as linear function on t.This is equivalent to equation ( 2) and, respectively, to the following ones Now, applying equation ( 1), the conditions for the noise function Z t for the above cases can be found.We do it in the most general way (4).Assuming trend as at + b we get from (1) In order to fulfill (4), the process must have  = b + a -b,  = a -a and Z t -Z t -1 =  t .Hence, the DF test assumes the noise function Z t as AR(1) process with (B) = 1 -B.The equation equivalent to (2) is sometimes written as Dickey and Fuller tested the hypotheses If the null is rejected then the process is stationary.If not, the process is integrated in degree greater than zero or is not integrated at all, but this case does not occur in practice.
The maximum likelihood estimator of  is the least squares estimator where   ˆ is the standard deviation of  ˆ.The shape of the limiting distribution and consequently the critical values for the test depend on the form of the trend term.The test is left-sided and the percentiles of the limiting distributions are available, for example, in Dickey and Fuller [12] and MacKinnon [24].In Charemza and Deadman [6], the tables contain two critical values, which determine three intervals: rejecting, accepting and doubtful ones.These tables were applied to our hydrologic series.The DF test was used to mean monthly and annual maximum flow time series.We applied the version with or without trend, depending on the result of the Mann-Kendall test.In all the cases the test rejected the null; the DF test indicates stationarity.To validate the model the residuals have to be checked.They must form a white noise process.Using the Durbin-Watson (DW) statistic and the LB test the autocorrelation of the residuals was investigated.Sometimes they gave, however, contradictory results: the DW test did not indicate the autocorrelation because for all the data considered it equaled about 2, and the LB test showed that the residuals were correlated, especially for lags greater than 1.Charemza and Deadman [6] claim that the DW statistic is biased when the explanatory variable is the delayed response variable.Hence the LB test is more appropriate and we should make use of it.The residuals do not form white noise, hence we come to conclusion that the augmented version of the DF test, with the noise function as AR( p), should be used.The test was proposed by Said and Dickey [30] and is called the Augmented Dickey-Fuller (ADF) test.The differences of X t are represented as The process may contain constant or trend, as in the DF test.The critical values for limiting distribution of the ADF statistic are the same as for the DF test.
Many authors emphasize some disturbances between size and power of the ADF test.The choice of a large value of p causes a loss of power.But if p is too small, the size increases and the null hypothesis is rejected too often.This relationship was considered in Agiakloglou and Newbold [1].The problem of choosing the right p was extensively studied by Schwert [31], Hall [13], Cheung and Lai [7].There are some methods to find the lag length p in practice.We have chosen two criteria.The first is based on the Akaike Information Criterion (AIC) and/or Schwarz Bayesian Criterion (SBC).The other one criterion is that the lag p should be the lowest value, for which the residuals are uncorrelated.This criterion is based on the significance test for residuals: if the residuals are autocorrelated for lag length p, the model is augmented to lag p +1 and the procedure is repeated till autocorrelation vanishes.
The ADF test was applied for monthly river flow and it rejected the null for any p > 1.On the Nida, Dunajec and Red River we observed that if p was less than 12, the residuals were always correlated for lag = 12.This was caused by annual autocorrelation for monthly flow.When p = 13 was chosen, the residuals were not correlated anymore because annual seasonality was adjusted.The values for AIC and SBC were also lowest for p = 13 than for any p < 13.Hence the choice of p = 13 for the ADF test in the case of monthly flow seems to be reasonable.The three series were stationary as treated with noise function AR (13), however the Red River was trend-stationary.
For maximum annual flow on the Nida the DF test rejected the null, but the residuals indicated the autocorrelation of degree 2 and the model was inappropriate.The ADF test did not reject the null for any p > 1.For the Dunajec the results were similar.This implies that the maximum flow on the two rivers is not stationary.Then the first differences of the maximum flows were investigated.Both the DF and ADF tests indicated stationarity, hence each process had a unit root.For the Red River the DF test rejected the null, but the residuals were correlated.Then the ADF test was applied for various p > 1 and the autocorrelation vanished for p = 3. Hence maximum flow time series on the Dunajec and the Nida are not stationary and on the Red River are trendstationary.The noise process is AR(3) for the Red River.
Another test, based on DF tests, takes seasonality into account.It was proposed by Dickey, Hasza and Fuller [11] and is called DHF test.It tests whether the process is seasonal difference-stationary.The hypotheses for the model The DHF statistic is calculated analogously to the DF statistic.The test with S = 12 was applied to mean monthly flow and indicated stationarity for all three rivers.
We compared the AIC and SBC values when the ADF and DHF tests were used.They both were lower for the DHF test because it uses less parameters.

STATIONARITY TESTS
In the stationarity test, proposed by Kwiatkowski, Phillips, Schmidt and Shin [ where D t is deterministic and V t ,  t are independent stationary processes with zero mean.Under the null, the variance of V t must be zero.Under the alternative, the variance of V t is greater than zero If H A is true then the process W t is integrated and the sum W t +  t is the sum of an integrated and stationary components.The KPSS test uses the term of long-run variance.It checks if the autocorrelations decline at a hyperbolic rate to zero because then the process is stationary.The statistic is related to the short/long memory property.The problem of short/long memory is beyond the subject matter of this article and should be discussed in reference to analyzed flows in a separate study.The test is right-sided and the asymptotic critical values are given in [21].The process of verification is as follows.Firstly, the mean (or the trend) is extracted from the data x t and using the residuals z t the partial sum process S t and the autocorrelation-consistent long-run variance estimator s 2 ( p) are constructed, with the weighting function of the sample covariance k j (p) = 1 -1  p j .The truncation lag parameter was proposed by Schwert [31] as p = int , 100 where k  {4, 12}.
Secondly, the test statistic is calculated.It is given by KPSS = .) ( For the mean monthly flow time series on the Dunajec and the Nida the proposed truncation lags were p 1 = 5 and p 2 = 15.In all the cases the test did not reject the null hypothesis.This confirms stationarity.For the flows on the Red River, the parameters were p 1 = 7, p 2 = 22.The test applied for both p 1 and p 2 rejected the null, hence the series is nonstationary.For maximum annual flow on the Dunajec and the Nida the truncation lags were p 1 = 2 and p 2 = 8.For the Dunajec the test rejected the null and the series was nonstationary.For the Nida the result was similar, but the test statistic was on the level of rejection.For the Red River we obtained p 1 = 4 and p 2 = 12, but the test did not reject the null.Hence, the maximum flow both on the Nida and on the Dunajec is nonstationary and on the Red River is stationary.
The unit root tests and stationarity tests complement each other.For mean monthly flow on the Dunajec and the Nida and maximum annual flow on all the rivers the KPSS confirmed stationarity.But for the mean monthly flow on the Red River the results are contradictory, the ADF and DHF tests indicated, while the KPSS test neglected stationarity.
Hence, for the mean monthly flow on the Red River one more test should be used.We applied the test proposed by Leybourne and McCabe [22], called the LMC test.The basic equation is similar to (5), with a polynomial of X t with roots outside the unit circle instead of X t .The test is constructed similar to the KPSS test, but the authors considered another approach of the long-run variance.The implementation of the LMC test was used to the mean monthly flow series on the Red River according to Leybourne and McCabe [23].The LMC statistic has the same asymptotic property as the KPSS, hence the critical values are the same.The LMC test also rejected the null hypothesis on stationarity.
Demetrescu and Hassler [10] investigated the distribution of the KPSS statistic.They justified that it depends on the way the long-run variance is estimated.They recommended also the version of the KPSS test with seasonal dummies as being best sized among stationarity tests when applied to seasonal series.The discussion of this issue is wide and requires individual study.
In Table 1, the results of the ADF, DHF, KPSS and LMC tests for mean monthly flow are presented.If the test suggested stationarity, it was marked by "+" and by "-" in the opposite case.For the Dunajec and the Nida the versions with const and for the Red River with ascending trend were used.We see that the application of the "unit root tests" and "stationarity tests" to mean monthly flow on the Red River gave contradictory results.This might be caused by size distortions of the KPSS and LMC tests.This problem was discussed, for example, by Muller [27] and Kurozumi [20].They stated that in the presence of high autocorrelation the two tests reject the true null too often.The reason lies in difficulties in distinguishing between a serially correlated stationary time series and a unit root process.Analogous results are gathered in Table 2 for maximum annual flow.In this case all tests gave similar results.The results of the DF test were given in the tables because models were inappropriate.

CONCLUSIONS
If the series exhibits seasonality, both the ADF and DHF unit root tests presented may be used, but better properties has the DHF test.It is dedicated to seasonal series and has lower AIC and Schwarz values.
The application of the KPSS test to seasonal case led to distorted result.Its version with seasonal pattern should be analyzed and used, which is planned in a separate discussion.
When no seasonality occurs, for example, for maximum annual flow, the ADF and KPSS tests gave similar results.
The nonstationarity of maximum annual flow on the Dunajec and the Nida indicates that the noise process Z t contains another deterministic or stochastic pattern, for example, stronger trend.Its detailed identification is the task for the future.The observed nonstationarity, that occurs besides trend, indicates negative natural or anthropogenic influence on the hydrologic systems of the rivers.The descending trend on the Nida River confirms decreasing in the retention catchment, discussed by Bartnik et al. [3].

Fig. 2 .
Fig. 2. The ACF and the PACF of the maximum annual flow on the Red River

T a b l e 1 Results
21]and called the KPSS test, the hypotheses are formulated as