DETECTION OF NONSTATIONARITIES OF SEVERAL SMALL CZECH RIVERS BY STATISTICAL METHODS

The paper presents some results of a research devoted to a statistical analysis of discharges in pilot catchments in the Czech Republic with the aim to detect changes in their stochastic behaviour that might be caused by climate changes. The nonstandard statistical methods are applied to detect changes in annual means as well as in seasonal behaviour of runoffs series.


INTRODUCTION
An impact of global warming on hydrological variables is often discussed in papers on climate change, see [1] or [2] among others.Many papers present a prediction of future discharges for large European rivers using climatological and hydrological models, see e.g.[3], [4].[5] and [6].The same approach for several Czech rivers has been applied by [7].Results obtained by hydrological models are indeed interesting but we may also be interested in the question whether the real discharge series exhibit some changes that might be attributed to climate change.To answer this question historical records are analysed by statistical methods, see for example [8] or [9].In last twenty years many new statistical procedures known under the name statistical change-point detection methods have been developed for studying stationarity of time series and applied to climatological data, see e.g.[10] or [11].
Studying stationarity of discharge series by statistical change point detection methods has a long history.One of the first applied papers in this field was devoted to detection of change in the mean of the Nile river annual mean discharges, see [12].Nevertheless, applications of statistical procedures for detecting changes in stochastic behaviour of temperature series are much more frequent than applications studying homogeneity of discharge series.A large variability of discharge series in comparison with their relatively shorter length may cause that the statistical procedures are not able to detect trends or changes even if there might be some, see [13] and [14].Moreover, there are only very few streams not affected by construction of reservoirs or dams.
The usual aim of a statistical analysis is to detect trends and changes in annual or monthly means time series, while statistical inference based on daily means is less frequent.However, in case we are interested in changes in extremes (floods or draughts) or changes of seasonal behaviour, then an analysis of daily values is necessary.The aim of our paper is to apply changepoint methods suggested by [15] to detect changes in stochastic behaviour of runoffs of several Czech rivers.
In our pilot study we analysed annual as well as daily mean discharge series of several small Czech rivers that are practically not affected by human activity.In such a case the changes detected in their stochastic behaviour are most probably caused by climate change.The data were analysed from a hydrological point of view by [16], [17], [18] and [19] in the scope of a grant KLISPO, supported by the National agency of the research in Agriculture, Ministry of Agriculture, Czech Republic (2007)(2008)(2009)(2010)(2011).The research project KLISPO was mainly focused to following three questions:question of the reliability of supply purpose of water management reservoirs under new climate conditions,question of contrary requirements of supply and retention (flood protection) purposes of reservoirs including economical aspects and efficiency,question of operational safety of hydraulic structures and dams in new changed hydrological conditions.In the part devoted to stationarity of studied discharge series the authors conclude that in several cases some important characteristics, e.g.50 years return levels, have changed.However, they also conclude that these changes have no a general trend that entitles us to apply results obtained from one station to all stations or at least to some area.Therefore, they suggest analysing all discharge series separately.Finally, they conclude that a prediction of future behaviour for 50 coming years is not reliable.
The analysed discharge series are presented in Tab 1.The first column gives the name of the river together with the name of the station.The second and the third columns present the time span when the observations were available and the fourth column gives the years when the data were missing.The fifth column shows the number of the years for which the data were analysed.We would like to mention here that we worked with calendar years, i.e., any year started January 1 st and ended December 31 st .Moreover, we omitted the data corresponding to February 29 nd so that we had 365 daily values for every year.Fig. 1 shows the positions of all stations whose discharge series were analysed.
Tab. 1. Name of rivers, stations, beginning and the end of measurement, missing years, number of analysed years.

Change in annual means
Any study of stationarity of discharge series usually starts with an analysis of annual mean runoffs with the aim to detect a possible change in their mean value.The mean of the studied series may change in many ways but we are mainly interested in detection of a monotone trend.In the scope of mathematical statistics decision problems are solved by hypotheses testing.We may start with two most simple tests: a test for existence of a linear trend and a two-sample test for equality of two means.In the second test we split the series into two partsbefore and after a subjectively chosen year and we test whether the mean before and the mean after this chosen year are equal.Observing the values of the annual means ( 1 ), … (  ) the test statistic for the first test has the form , where  ̂ denotes the usual estimate of errors standard deviation based on a residual sum of squares and the test statistic for the second test has a form } 1/2 .Under the null hypothesis claiming that there is not a change, the statistics   and   (k) have a Student distribution with n-2 degrees of freedom.Supposing the two-sided alternative and taking the significance level equal to α, we reject the null hypothesis if the absolute value of the considered test statistic is larger than α/2 % upper quantile, i.e., (1-α/2) % quantile of a Student distribution with n-2 degrees of freedom.The first test has a largest power if there is a linear trend in the mean, while the second one if there is a sudden shift in the mean.On the other hand, the both tests are often able to detect any monotonic trend, but indeed with a smaller power.
As we mentioned before, for a two-sample test we have to decide how to split the series.In our applications bellow we have split the series after the year 1997.If we do not wish to split the series in a specifically chosen year we can take a change-point analysis approach, for more details see [20].In change point analysis we calculate a value of a two-sample test statistic for any time point (for any year that splits the series) so that instead of obtaining one value of a test statistic only, we get a sequence of values of test statistics, i.e., for any possible time point (any year) we have one value.Then our final test statistic is either a maximum, i.e.,     (), or a mean, i.e., ∑    ()/ of two-sample test statistics.Often also a maximum or mean of the weighted test statistics, where the weights are chosen to be wk Exact critical values of the introduced change-point test statistics are unknown but we can use critical values obtained by a so-called permutation principle.Permutation test was suggested by Fisher in 1930 and it has been successfully applied for many two-sample tests.[21] showed that it is possible to apply the permutation test for change-point problems as well.The permutation approach consists in combining two-samples together and randomly permuting the combined sample many times.Then, for every permuted sample a value of the considered test statistic is computed so that a large set of values is obtained where their number corresponds to the number of performed permutations.An α% upper empirical quantile may serve as a α% critical value of the test.

Change in mean annual cycle
It is not only mean of annual mean discharges that may change in time but also a mean of annual cycles.An annual cycle may be represented by a hydrogram of daily mean values for one calendar year, i.e. by a vector of 365 daily values, supposing that the 29 February values were omitted.We may be interested whether a mean vector of these vectors remains the same over the observation period.
The test statistic that may be applied for testing equality of two mean vectors has the form ), and ∑ ̂ is a pooled sample covariance matrix.The test statistic W(k) is a so-called Hotelling test statistic.After a proper standardization it has a F distribution with m and n-2 degrees of freedom with m denoting a dimension of the analysing vectors, for more details see [22].
The problem is highly dimensional and it is difficult to look for a change in at least one of 365 components.Therefore, it is convenient to reduce the dimensionality of the problem by replacing every "year" vector by a smaller number of parameters.[15] recommended replacing the 365dimensional vectors by 2L coefficients of the Fourier series expansion approximation with L smallest Fourier frequencies or by K scores of the method of principal components.Using both methods [15] showed that for European rivers using Fourier series approximation the good choice is to take L =1, 2, 3 (2 L parameters) or to take 6 ≤ K ≤ 12 for the method of principle components.Instead of testing equality of means of two 365-dimensional vectors we test equality of means of two vectors with either 2 L or K dimension.There is another natural method for analysing annual cycles.It replaces a vector of daily averages by a vector of monthly averages.In this case we look for a change in mean of 12dimensional vector and the test statistic W( k ) may be again applied.
We may be interested why the method of the principal components approximation may be suitable for detecting a possible change.The reason is that we consider the linear combination of the original daily values where the "larger" weights are given to the coordinates that have a larger variability and this larger variability may be caused by the change in their mean.This was proved by [23].The method of the Fourier series approximation may be suitable if annual cycles may be relatively well approximated by a Fourier series with the smallest Fourier frequencies.As the annual cycles are almost periodic functions with one or two peaks the approximation by a Fourier series may be quite good.
For the corresponding change-point problem we may use either a maximum or a mean of the statistics {W(k)}, respectively a maximum or a mean of the weighted statistics {wk W(k) } with the weights wk = k (n-k)/ n 2 .In our paper we consider: The approximate critical values and the corresponding p-values may be again obtained by the permutation principle.

Annual means
We have applied three testsa test for existence of a linear trend, a test comparing a mean in the period before the year 1997 with a mean after the year 1997 and a change-point test for detection a change in the mean value.The year 1997 was chosen subjectively to compare the last ten years of measurement with the previous period.Considering a significance level α = 0,05, no linear trend in mean annual averages was detected.Comparing the mean of the period before the year 1997 and after it, an increase of annual mean discharge has been detected for the Mumlava river (p-value is smaller than 0,001), see Fig. 2. If the change-point test has been applied then in addition to the Mumlava river (with all p-values smaller than 0,01) the Kyjovka river exhibits a significant change (a decrease) that started after the year 1971 (with all p-values smaller than 0,01).The Jizerka river is the longest analysed river and the tests showed that the change occurred in the beginning of measurements.Moreover, the p-values for the change-point analysis using all three test statistics (maximum, maximum of weighted statistics and mean of weighted statistics) are between 0,01 and 0,03.It means that choosing α = 0,01 instead of α = 0,05 the null hypothesis would not be rejected.Therefore, we may have some doubts whether there was a change or not.
We conclude that for two rivers (the Mumlava, the Kyjovka) a statistically significant change in mean has been detected at the significance level α = 0,01 and for one river (the Jizera) at the significance level α = 0,05.Fig. 2 represents the behaviour of annual mean discharge series of the Mumlava and Fig. 3 represents the same discharge series of the Kyjovka.

Annual cycles
For the first analysis we again split the series into two parts, i.e., before the year 1997 and after it.We applied three methodsmethod of principal components approximation, method of the Fourier series approximation and the method of monthly means.For the method of the principal components approximation we compared means of K=5 scores, for the method of the Fourier series approximation we compared 2 L=6 Fourier coefficients.Tab. 2 shows the values of the test statistics computed for all three methods together with their p-values computed from the F distribution.We present the results for those rivers where at least one of computed p-values fall under 0,1.We conclude that (with an exception of the Vydra) p-value of at least one of the tests is smaller than 0,05.Moreover, there are only two rivers (the Jizerka, the Morava) where the p value of one test is small while the p values two other methods are relatively large.For all other rivers the results of all tests are in agreement.

Tab. 2. Values of test statistics with corresponding pvalues for all three methods when we split the series in 1997
Interpreting the results of a change point analysis is more difficult.We applied for all three methods three test statistics.Tab. 3 presents a number of rejections at the significance level α = 0,1 together with the upper bound of the pvalues for all three methods for those stations where the null hypothesis of stationarity has been rejected at least once at the significance level α = 0,1.We may again conclude that for the Blanice and the Mumlava stationarity of seasonal behaviour has been clearly rejected.It seems plausible that seasonal behaviour of the Jizerka, the Morava, the Otava and the Vydra is also nonstationary.It is interesting to see that the results of change-point analysis agree quite well with the results of two-sample analysis.That indicates that the statistical significant change in seasonal behaviour most probably occurred around the year 1997 or in the other words seasonal behaviour has changed for many stations in last 8 -12 years of observations.

Tab. 3. Number of rejections at a significance level α=0,1 for all tests and all three methods when
change point analysis has been applied.

BASIC FEATURES OF NON-STATIONARY BEHAVIOR
Unfortunately, the above described statistical methods do not indicate the main causes of potentially nonstationary seasonal behaviour.To understand better these features we present Fig. 4 -9 of the smoothed annual mean cycle before 1997 and after it for all rivers where the null hypothesis of stationarity of a mean annual cycle has been rejected.(We smoothed the means by a kernel smoothing technique using the Epanechnikov window with a bandwith hn =0,05).Despite the differences in seasonal behaviour of all presented rivers, we may see some common features.In the period after 1998 the timing of spring high discharge is shifted to the beginning of the calendar year and moreover the winter runoffs are generally higher than in the period before 1997.
We calculated a set of locations (days) of annual maxima for the period before 1997 and after it.Tab. 4 presents medians, lower and upper quartiles of these two sets.We see that with a very few exceptions all these locations characteristics for a period before 1997 attain larger values.The same is true if we calculated locations of annual maxima for smoothed versions of annual cycles.
Tab. 4. Medians, lower and upper quartiles of locations (days) of spring maximal discharges for the period before 1997 and after the year 1998.For every year and all the rivers we also calculated mean runoffs for the period January-February.Tab. 5 presents medians, upper and lower quartiles for the period before 1997 and after.With some exceptions the statistical location characteristics of "winter runoffs" are for the period after the year 1998 larger than for the previous period.This might be caused by relatively warmer winters in this period.

CONCLUSION
In the last time many hydrologists and climatologists have tried to predict future behaviour of run-offs series with the help of different climate scenarios.Our goal was to perform a statistical analysis of real data, i.e., annual and daily mean values to detect changes in their stochastic behaviour.We studied non-stationarities of several small Czech rivers discharge series using a twosample test for equality of means of variables and vectors and its change point analogue.Stationarity of annual mean discharge series has been rejected for the Kyjovka and the Mumlava.Further, we applied the statistical tests to detect non-stationarities in seasonal cycle described by daily mean run-offs.As the vectors of daily values have too many components we replaced them by vectors of Fourier coefficients, vectors of principal components scores or vectors of monthly means.The tests detected a change in seasonal behaviour of the Blanice and the Mumlava (with very small p values) and of the Jizerka, the Morava, the Otava and the Vydra (with relatively small p values).For the Čeladenka, the Doubrava, the Orlice the non-stationarity of seasonal behaviour is ambiguous.In most cases the change consists in shift of spring peak run-offs and increase of winter (January-February) discharge.
As we could see there is no clear common trend in stochastic behaviour of analysed discharge series.On the other hand it seems that for some small Czech rivers the on-going climate change may affect distribution of discharge during a calendar year.Our analysis also shows that all sites and river profiles in Czech Republic have to be analysed separately as a general scenario does not seem to exist.We hope that our small study pilot study can help for assessing reliability of constructed and prepared flood protection measures.

Tab 5 .
Medians, lower and upper quartiles of mean discharges of January-February for the period before 1997 and after the year 1998.