Wavelet regression: An approach for undertaking multi-time scale analyses of hydro-climate relationships

Wavelet analysis
Wavelet transformation has been shown to be a powerful technique for characterizing the frequency, intensity, scale, and duration of variations in hydro-climatic processes [1][2][3]. Wavelet analysis can also reveal localized time and frequency information without requiring the signal time series to be stationary, which is a requirement of the Fourier transform and other spectral methods [4].
The first task of WR is to approximate the variation patterns of a hydrological variable and its related climatic factors by using wavelet decomposition and reconstruction at different time scales.
The principle of wavelet decomposition and reconstruction is as follows [5,6]. A given signal $X(t)$, such as streamflow, temperature, or precipitation, can be built up as a sequence of projections onto father and mother wavelets indexed by both $k$ ($k = 1, 2, \ldots$) and $s$ ($s = 2^j$, $j = 1, 2, \ldots$). The coefficients in the expansion are given by the projections
$$s_{J,k} = \int X(t)\,\Phi_{J,k}(t)\,dt, \qquad d_{j,k} = \int X(t)\,\Psi_{j,k}(t)\,dt, \quad j = 1, 2, \ldots, J \tag{1}$$
where $J$ is the maximum scale sustainable by the number of data points, $\Phi_{j,k}(t) = 2^{-j/2}\,\Phi\!\left(\frac{t - 2^j k}{2^j}\right)$ is the father wavelet, and $\Psi_{j,k}(t) = 2^{-j/2}\,\Psi\!\left(\frac{t - 2^j k}{2^j}\right)$ is the mother wavelet. Generally, the father wavelet is used for the lowest-frequency smooth components, which requires the wavelet with the widest support; the mother wavelet is used for the highest-frequency detailed components. In other words, the father wavelet is used for the major trend components, and the mother wavelet is used for all deviations from the trend [6,7].
Once a mother wavelet is selected, the wavelet transform can be used to decompose a signal according to scale, allowing separation of the fine-scale behavior (detail) from the large-scale behavior (approximation) of the signal [6,7]. The relationship between scale and signal behavior is as follows: a low scale corresponds to a compressed wavelet and rapidly changing details, namely high frequency, whereas a high scale corresponds to a stretched wavelet and slowly changing coarse features, namely low frequency. Signal decomposition is typically conducted iteratively using a series of scales such as $a = 2, 4, 8, \ldots, 2^L$, with successive approximations being split in turn so that one signal is broken down into many lower-resolution components [6].
The representation of the signal $X(t)$ can now be given by
$$X(t) = S_J + D_J + D_{J-1} + \cdots + D_j + \cdots + D_1 \tag{2}$$
where $S_J = \sum_k s_{J,k}\,\Phi_{J,k}(t)$ and $D_j = \sum_k d_{j,k}\,\Psi_{j,k}(t)$, $j = 1, 2, \ldots, J$. In general, we have the relationship
$$S_{j-1} = S_j + D_j \tag{3}$$
where $\{S_J, S_{J-1}, \ldots, S_1\}$ is a sequence of multi-resolution approximations of the function $X(t)$ at ever-increasing levels of refinement. The corresponding multi-resolution decomposition of $X(t)$ is given by $\{S_J, D_J, D_{J-1}, \ldots, D_j, \ldots, D_1\}$.
Selecting a proper wavelet function is a prerequisite for wavelet analysis. The actual criteria for wavelet selection include self-similarity, compactness, and smoothness [8,9]. Choosing the Symmlet family [6], we experimented with alternative choices of scaling functions and found that the results from 'sym8' are robust. Therefore, 'sym8' is used for approximating the variation patterns of the hydrological variable and its related climatic factors at different time scales.
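The additive structure of a multi-resolution decomposition, in which each approximation splits into a coarser approximation plus a detail component, can be illustrated with a minimal Python sketch. The Haar wavelet is swapped in here only because it can be written in a few lines with no external library; it is not the 'sym8' wavelet used in this work. The function name `haar_mra` and the sample series are illustrative assumptions, not part of the original method.

```python
def haar_mra(x, levels):
    """Haar multi-resolution decomposition (illustrative stand-in for sym8).

    Returns (S_J, [D_1, ..., D_J]); every component has the same length as x.
    Assumes len(x) is divisible by 2**levels.
    """
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx_prev = list(x)  # S_0 is the signal itself
    details = []
    for j in range(1, levels + 1):
        block = 2 ** j
        # S_j: the Haar approximation at scale 2**j is the mean of each block
        approx = []
        for start in range(0, n, block):
            mean = sum(x[start:start + block]) / block
            approx.extend([mean] * block)
        # D_j = S_{j-1} - S_j, which is the same as S_{j-1} = S_j + D_j
        details.append([a - b for a, b in zip(approx_prev, approx)])
        approx_prev = approx
    return approx_prev, details


# Hypothetical 8-point series (made-up values, not data from the study).
series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
s_J, d = haar_mra(series, levels=3)

# Summing the smooth component and all details recovers the signal.
reconstructed = [s_J[i] + sum(d_j[i] for d_j in d) for i in range(len(series))]
```

In this sketch the coarsest approximation plays the role of the trend component and the detail components carry the deviations from it at scales of 2, 4, and 8 samples. A smoother wavelet such as a Symmlet changes the filters, not this additive structure.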

Regression analysis based on the results of wavelet analysis
The second task of WR is to fit, for each pattern at the chosen time scale, a regression equation describing the relationship between a hydrological variable and its related climatic factors, based on the results of wavelet analysis.
Because the hydrological variable (e.g. streamflow or groundwater table) is affected by climatic factors (e.g. temperature and precipitation), we generally take the hydrological variable as the dependent variable, $Y$, and the climatic factors as the independent variables, $X_1, X_2, \ldots, X_k$. The linear regression equation is as follows:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k \tag{4}$$
where $b_0$ is a constant, and $b_1, b_2, \ldots, b_k$ are partial regression coefficients, which can be fitted by the least squares method [10]. The significance of the regression Eq. (4) should be tested by the F-test at a chosen significance level [11].
If a linear regression equation cannot adequately describe the relationship between the hydrological variable and its related climatic factors, a nonlinear regression equation must be fitted instead.
Using the above approach, we can fit the most suitable regression equation for each pattern at the chosen time scale to describe the relationship between a hydrological variable and its related climatic factors, based on the results of wavelet analysis. We call a regression equation obtained in this way a wavelet regression equation (WRE).
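The least squares fit and the F statistic for a linear regression with an intercept and $k$ predictors can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, the normal equations are solved by plain Gaussian elimination, and the F statistic still has to be compared with a critical value of the F distribution with $(k, n - k - 1)$ degrees of freedom, which is not computed here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(y, predictors):
    """Fit Y = b0 + b1*X1 + ... + bk*Xk by least squares.

    `predictors` is a list of k series, each as long as y.
    Returns ([b0, b1, ..., bk], F statistic). Assumes n > k + 1 and RSS > 0.
    """
    n, k = len(y), len(predictors)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)]
           for a in range(k + 1)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k + 1)]
    coeffs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    y_mean = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual SS
    ess = sum((fi - y_mean) ** 2 for fi in fitted)          # explained SS
    f_stat = (ess / k) / (rss / (n - k - 1))
    return coeffs, f_stat
```

In the WR setting, `y` would be the reconstructed component of the hydrological variable at one time scale and `predictors` the reconstructed components of the climatic factors at the same scale, so one such fit is made per scale.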

The test of wavelet regression equation
We suggest using the coefficient of determination and the Akaike information criterion (AIC) to test the goodness of fit of the above WRE.
In order to identify the uncertainty of the wavelet regression equation for a given time scale, the coefficient of determination, also known as the goodness of fit, was calculated as follows [11]:
$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \tag{5}$$
where $R^2$ is the coefficient of determination; $\hat{Y}_i$ and $Y_i$ are the value simulated by the WRE and the observed value of the hydrological variable, respectively; $\bar{Y}$ is the mean of $Y_i$ ($i = 1, 2, \ldots, n$); $RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ is the residual sum of squares; and $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is the total sum of squares.
The coefficient of determination is a measure of how well the simulated results represent the actual data. A larger $R^2$ indicates higher certainty (lower uncertainty) of the WRE.
To compare the relative goodness of different WREs, we also used the Akaike information criterion (AIC) [12]. The formula of AIC is as follows:
$$AIC = 2k + n \ln\!\left(\frac{RSS}{n}\right) \tag{6}$$
where $k$ is the number of parameters estimated in the model; $n$ is the number of samples; and $RSS$ is the same as in formula (5). A smaller AIC indicates a better WRE [12]. For small sample sizes (i.e., $n/k < 40$), the second-order Akaike information criterion ($AIC_c$) should be used instead:
$$AIC_c = AIC + \frac{2k(k+1)}{n - k - 1} \tag{7}$$
where $n$ is the sample size. As the sample size increases, the last term of the $AIC_c$ approaches zero, and the $AIC_c$ tends to yield the same conclusions as the AIC [13].
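The two criteria can be computed from the observed and simulated series in a few lines of Python. The sketch below assumes the least squares form of the AIC, $2k + n \ln(RSS/n)$, and the usual small-sample correction term $2k(k+1)/(n-k-1)$; the function name `evaluate_wre` is a hypothetical label, not a name from this work.

```python
import math


def evaluate_wre(observed, simulated, k):
    """Return (R2, AIC, AICc) for a fitted regression equation.

    observed  : observed values Y_i
    simulated : values simulated by the equation
    k         : number of parameters estimated in the model
    Assumes RSS > 0, TSS > 0 and n > k + 1.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    rss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - rss / tss
    aic = 2 * k + n * math.log(rss / n)
    # Small-sample correction; preferred over AIC when n / k < 40.
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return r2, aic, aicc
```

When several WREs are compared, the one with the smallest AIC (or $AIC_c$ for short records) is preferred. Annual series of a few decades will usually fall in the $n/k < 40$ range, so the corrected value is normally the relevant one.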

An application case
As is well known, observation data from hydrological and climatic stations are typically stochastic and non-stationary. How can we reveal the patterns hidden in such data? Several application cases [14][15][16][17] in Northwest China have shown that WR is an effective approach that performs well. Fig. 1 shows the observation data of annual average temperature (AAT), annual precipitation (AP) and annual runoff (AR) in the Yarkand River basin of Northwest China. It is evident that all the data series of AAT, AP and AR fluctuate, and that it is difficult to identify the patterns hidden in the raw data. In order to uncover these patterns, we now analyze hydro-climate relationships at different time scales using wavelet regression.
The five time scales are designated s1 to s5. Fig. 2(a) presents the five variation patterns of AR. The s1 curve retains a large amount of residual from the raw data, and drastic fluctuations exist in the study period. These characteristics indicate that, although the runoff varied greatly throughout the study period, there was a hidden increasing trend. The s2 curve still retains a considerable amount of residual, as indicated by the presence of 4 peaks and 4 valleys. However, the s2 curve is much smoother than the s1 curve, which makes the hidden increasing trend more apparent. The s3 curve retains much less residual, as indicated by the presence of 2 peaks and 2 valleys. Compared to s2, the increase in runoff over time is more apparent in s3. Finally, the s5 curve presents an ascending tendency, whereas the increasing trend is obvious in the s4 curve. Fig. 2(b) and (c) present the five variation patterns of AAT and AP, which are similar to those of AR at the five time scales.
Based on the results of wavelet analysis, five linear regression equations describing the hydro-climate relationships of AR with AAT and AP were fitted at the five time scales (Table 1). They show the multi-time scale responses of annual runoff to regional climatic change in the Yarkand River basin of Northwest China.
Table 1 shows that the wavelet regression equations (WREs) at the five time scales from s1 (2-year scale) to s5 (32-year scale) are significant at the level of α = 0.001, and the WRE at the 1-year scale (s0) is significant at the level of α = 0.01. By comparing their AIC values, we can also compare the fitting effects of the WREs at different time scales. The order is as follows: the fitting effect of the WRE at s5 (32-year scale) is the best, that at s4 (16-year scale) is second, that at s3 (8-year scale) is third, that at s2 (4-year scale) is fourth, that at s1 (2-year scale) is second to last, and that at s0 (1-year scale) is the worst.

Summary
Combining wavelet analysis and the regression method, we developed an integrated approach, wavelet regression (WR), which can be used to show the multi-time scale responses of a hydrological variable to climate change. The principle of the approach is that wavelet analysis is used to reveal the variation patterns of a hydrological variable and its related climatic factors at different time scales, and the regression method is then used to describe the relationship between the hydrological variable and its related climatic factors for each pattern at the chosen time scale. To illustrate the application of the approach, the hydro-climate relationships between annual runoff (AR)

Additional information
Previous studies showed that hydro-climate processes are complex systems [18][19][20][21][22], and that the observed data of climate and hydrology are non-stationary and stochastic [23][24][25][26]. How can we find the patterns hidden in stochastic and non-stationary data? Multi-time scale analysis is one approach worth attempting [27,28].
To show the multi-time scale responses of a hydrological variable (e.g. runoff, evaporation, or groundwater level) to climate change, we developed an integrated approach combining wavelet analysis and the regression method, which is called wavelet regression (WR). The main idea of the approach is that wavelet analysis is used to reveal the variation patterns of a hydrological variable and its related climatic factors at different time scales, and the regression method is then used to describe the relationship between the hydrological variable and its related climatic factors for each pattern at the chosen time scale. The procedure of this approach is shown in Fig. 3, and the effectiveness of the approach has been verified in several case studies [14][15][16][17][29][30][31][32].

Conflicts of interest
The author declares that they have no conflicts of interest.