A statistical approach for reconstructing natural streamflow series based on streamflow variation identification

Natural streamflow reconstruction is important for assessing long-term trends, variability, and patterns of streamflow, and is critical for addressing the implications of climate change for adaptive water resources management. This study proposes a simple statistical approach named NSR-SVI (natural streamflow reconstruction based on streamflow variation identification). As a hybrid model coupling Pettitt's test with an iterative algorithm and the iterative cumulative sum of squares (ICSS) algorithm, it determines the reconstructed components and implements the recombination using only the information on change points in observed annual streamflow records. Results showed that NSR-SVI is suitable for reconstructing natural series and can provide stable streamflow processes under different human influences to better serve the hydrologic design of water resource engineering. In addition, combining the proposed approach with the cumulative streamflow curve provides an innovative way to investigate the attribution of streamflow variation, and its performance was verified by comparison with relevant results from a nearby basin.


INTRODUCTION
Water resources management and hydraulic engineering design depend closely on the quality of the hydrologic data used in planning and design. The data series is required to belong to a single statistical population, which is called the assumption of stationarity (Xiong & Guo ). Unfortunately, this requirement is often difficult to meet because of non-stationarity in hydrological series triggered by climate change and large-scale human activities (Seidou & Ouarda ). Non-stationarity can make current and future streamflow differ from the historical streamflow employed in design, implying that the original design, operation, and management strategies of water resources projects and river ecological protection may no longer be appropriate in the current changing environment and may consequently impose a greater risk. As reported in the literature, many rivers in the world have been greatly altered by water resources projects that control flow to meet human needs (Naiman et al. ). Observed hydrological data from various countries and regions have demonstrated significant inconsistency or non-stationarity, influenced by water infrastructure, channel modifications, drainage works, land-cover change, and land-use change (Milly et al. ). This implies that the risk from river streamflow variations may be indistinguishable in today's world and creates a demand for natural streamflow and stable streamflow series under different human-impacted environments in engineering design and water management.
The variation in streamflow is often exhibited by a regime shift, known as a break, abrupt change, discontinuity or inhomogeneity in different areas, meaning a shift of the flow system from one regime to another, and the location of the regime shift in time is called the change point at which the parameters of the underlying distribution or the parameters of the model used to describe the time series abruptly change (Beaulieu et al. ). The changing parameters, including mean, variance, trend (regression), intercept, frequency, correlation coefficient, system information, and combinations thereof, are summarized by Beaulieu et al. () and are also regarded as types of abrupt shifts. As the types of an abrupt shift are complex and interwoven, it may be difficult to identify change points with one method (Zhang et al. ). Indeed, different methods may be required for different climate/hydrological elements or the same climate/hydrological element at different time scales. However, different methods may yield conflicting conclusions when applied to the same series. Thus, a need has arisen for a careful discussion and comparison of these methods to offer general guidance for With this consideration in mind, the objectives of the present study are to (1)

Study area
Two basins, that is, the Mahuyu River basin and the Tuwei River basin in the core of the Loess Plateau, China, were selected (Figure 1(a)). Therein, the Mahuyu River basin without trend or abrupt change in rainfall was chosen as the target basin to test the hybrid model, and the Tuwei River basin was used as a case area to verify the applicability of the proposed model.
The Mahuyu River, a second-order tributary of the Yellow River, originates from the Naopan Mountain in Hengshan County, Shaanxi Province, and covers an area of 372 km² (Figure 1(b)). It has a total length of 41.8 km from the estuary, with an average slope of 5.9. The Mahuyu River basin is characterized by steep hillslopes with incised channels. The basin is exposed to an extratropical semiarid continental monsoon climate. The average annual precipitation in the catchment varies between 200 and 500 mm, of which 75% occurs during the flood season from June to September. The Mahuyu gauge is the catchment outlet.
The Tuwei River, a branch of the Yellow River, flows from the northwest to the southeast (Figure 1(c)). The river has a total length of 139.6 km and covers an area of 3,253 km² (Wang et al. ). The upper-middle reaches of the river are the sandy shoal region and loess hilly-gully region that are typical landscapes in this basin. The annual precipitation is 402 mm, and the pan evaporation is 1,853 mm (Yang et al. ). Gaojiachuan gauge is the control gauge of the Tuwei River basin.
The features of the two basins, common in the Loess Plateau, such as sparse vegetation and loose soil, cause serious soil erosion (Huang et al. ). To prevent soil erosion, many check-dams have been built in both basins. However, the rivers are intercepted by the check-dams and the water is retained in the reservoirs, which leads to continually decreasing streamflow in the downstream channel. It is well known that the Loess Plateau is subject to severe water resource shortages and a fragile ecological environment (Li et al. ; Zhao et al. ). Abrupt changes in the streamflow series could have a serious influence on water security and environmental protection in this area. From this perspective, this study provides a new statistical method for restoring the natural streamflow and stable streamflow series under certain human-impacted environments to serve water resources management and ecological protection in the Loess Plateau.

Data
The time series data of regional average annual rainfall covering
where sgn is the sign function. A 'downward shift' in the level from the beginning of the series is indicated by a large $K^{+}_{t,N} = \max_{1 \le t < T} U_{t,N}$ ($K^{+}_{t,N}$ denotes positive $K_{t,N}$), and an 'upward shift' is indicated by a large $K^{-}_{t,N} = -\min_{1 \le t < T} U_{t,N}$ ($K^{-}_{t,N}$ denotes negative $K_{t,N}$; Kropp & Schellnhuber ). The change point of the series is located at $K_{t,N}$ if the significance probability p is equal to or greater than 0.95.
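The Pettitt statistic described above can be illustrated with a minimal sketch. It assumes the standard rank-based form $U_{t,N} = \sum_{i \le t} \sum_{j > t} \mathrm{sgn}(x_i - x_j)$ and Pettitt's usual approximation $p \approx 2\exp(-6K^2/(N^3+N^2))$ for the p-value; the criterion of 0.95 quoted in the text is assumed here to correspond to $1 - p$. The function names `sgn` and `pettitt` are hypothetical, and no pre-whitening is applied.

```python
import math

def sgn(v):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (v > 0) - (v < 0)

def pettitt(x):
    """Single change-point search with Pettitt's rank statistic.

    Returns (t_hat, k_plus, k_minus, p_approx), where t_hat is the
    1-based position of the last value before the candidate shift.
    """
    n = len(x)
    # U_t for t = 1 .. n-1: compare every value up to t with every value after t.
    u = [sum(sgn(x[i] - x[j]) for i in range(t) for j in range(t, n))
         for t in range(1, n)]
    k_plus = max(u)            # large value suggests a downward shift
    k_minus = -min(u)          # large value suggests an upward shift
    k = max(k_plus, k_minus)
    t_hat = max(range(len(u)), key=lambda i: abs(u[i])) + 1
    # Assumed approximation of the two-sided p-value, capped at 1.
    p_approx = min(1.0, 2.0 * math.exp(-6.0 * k ** 2 / (n ** 3 + n ** 2)))
    return t_hat, k_plus, k_minus, p_approx

# Synthetic series with a drop after the fifth value (illustrative data only).
t_hat, k_plus, k_minus, p_approx = pettitt([10] * 5 + [1] * 5)
```

The double sum is quadratic in the series length for each candidate t, which is acceptable for annual records of a few decades.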
Pettitt's test is widely considered a good exploratory tool for detecting change points because it requires no assumption about the distribution of the data and is not sensitive to outliers or skewed distributions (Xie et al. ). However, it has some limitations for application in hydrology. For example, the test works well for detecting a single change point, and the assumption of independence, or lack of serial correlation, should be met before the test is used (Busuioc & von Storch ). In addition, the test often fails or is invalidated when only short hydrological or climatic time series are available for testing.
Thus, the over-whitening procedure was employed in this

ICSS for detecting change points in variance
The ICSS algorithm, proposed by Inclan & Tiao (), can detect multiple breakpoints in the variance of a time series and is often used to detect multiple shifts in climatic and hydrological data (Yang et al. ). The algorithm includes two procedures: the centered cumulative sum of squares (CCSS) and the iterative procedure. The CCSS serves as the test statistic $D_k$ of the algorithm, used to estimate the number of variance changes and the times at which they occur, and is calculated as follows: where N is the length of the series x(t
Figure 3:
Step 1: The observed streamflow series is divided into trend and residual series by piecewise linear regression, and the partial autocorrelation coefficient function (PACF) test is applied to the residual. If the result is significant, the series needs to be segmented at the change points in variance obtained by ICSS. Otherwise, the observed series is independent and meets the requirement of Pettitt's test.
Step 2: The segmented O-W procedure is employed to remove the serial correlation in the segmented original series, yielding the significant autocorrelation-removed series (SAR series).
Step 3: Depending on the change point(s) in mean detected by Pettitt's test together with the BS iterative algorithm, the SAR series is segmented, and the segmental mean is obtained. Then, the segmental mean is removed from the SAR series, and a new series with a mean of 0 is generated, namely, a mean-removed series (MR series).
Step 4: The standard deviations of the segments before and after the change point in variance are calculated from the MR series. The ratio of the standard deviations before and after the point is then computed, which determines whether the latter part should be magnified or reduced. Then, a reconstructed sequence with zero mean, that is, a change-in-variance-removed sequence (CVR series), is obtained, in which the latter segment has the same standard deviation as the former part of the series.
Step 5: The natural streamflow or stable streamflow under different environments can be achieved by combining the mean of the SAR series during certain periods and the CVR series.
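Steps 3 to 5 can be sketched in a few lines. The sketch assumes that the input is already a SAR series (the O-W procedure of Step 2 is not reproduced), that there is a single change point in variance, and that change points are given as 0-based indices of the first value of each later segment. The function names and the sample data are hypothetical, not taken from the study.

```python
import statistics

def remove_segment_means(series, mean_breaks):
    """Step 3: subtract each segment's mean to obtain the MR series."""
    bounds = [0] + list(mean_breaks) + [len(series)]
    mr, means = [], []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment_mean = statistics.mean(series[start:end])
        means.append(segment_mean)
        mr.extend(v - segment_mean for v in series[start:end])
    return mr, means

def equalize_variance(mr, var_break):
    """Step 4: rescale the later part to the standard deviation of the earlier part."""
    sd_before = statistics.pstdev(mr[:var_break])
    sd_after = statistics.pstdev(mr[var_break:])
    if sd_after == 0:
        raise ValueError("later segment has zero variance; cannot rescale")
    ratio = sd_before / sd_after   # > 1 magnifies, < 1 reduces the later part
    return mr[:var_break] + [v * ratio for v in mr[var_break:]]

def reconstruct(series, mean_breaks, var_break, base_segment=0):
    """Step 5: add the mean of the chosen base period back to the CVR series."""
    mr, means = remove_segment_means(series, mean_breaks)
    cvr = equalize_variance(mr, var_break)
    return [v + means[base_segment] for v in cvr]

# Illustrative series: mean drops from 10 to 5 and the spread doubles after index 4.
natural = reconstruct([10, 12, 8, 10, 5, 9, 1, 5], mean_breaks=[4], var_break=4)
```

Choosing a different `base_segment` gives the stable series for another level of human impact, which mirrors the choice of base period in Step 5.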

Removal of serial autocorrelation
The autocorrelation in the streamflow series was detected by a PACF which can provide lag-n autocorrelation coefficients without interactive influences. The lag-n autocorrelation coefficients of the residual series before and after the autocorrelation removal are presented in  (Figure 4(a)). Thus, the data series needs to be

Change point detection in mean
Pettitt's test method coupled with the BS iterative algorithm was employed in this study for detecting multiple change points in the mean. The SAR streamflow series at Mahuyu gauge was checked, as shown in Figure 5. Because the ICSS algorithm requires that the tested series have no change in trend, the segmental mean was removed from the SAR streamflow series at Mahuyu gauge (Figure 6(a)). The MR series was used for the change point detection in variance (Figure 6(b)). According to the location of the change point in variance, the variation in variance of the SAR streamflow series can be obtained, as shown in Figure 8(a). Then, variance magnification was implemented in the latter part, and a CVR series was generated (Figure 8(b)).
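The variance check applied to the MR series can be sketched as follows. The sketch assumes the standard Inclan & Tiao definition $D_k = C_k / C_N - k/N$, with $C_k$ the cumulative sum of squares of the first k values, and the asymptotic 95% critical value of 1.358 for $\sqrt{N/2}\,|D_k|$. Only one pass is shown; the iterative procedure that searches the sub-segments is not reproduced, and the function names are hypothetical.

```python
import math

def ccss(x):
    """Centered cumulative sum of squares D_k for k = 1 .. N."""
    n = len(x)
    cumulative, total = [], 0.0
    for v in x:
        total += v * v
        cumulative.append(total)
    return [cumulative[k] / cumulative[-1] - (k + 1) / n for k in range(n)]

def variance_change_candidate(x, critical=1.358):
    """Return (k_star, statistic, significant) for a single ICSS pass.

    k_star is the 1-based position where |D_k| is largest; the critical
    value is an assumed asymptotic 95% threshold.
    """
    d = ccss(x)
    n = len(x)
    k_index = max(range(n), key=lambda k: abs(d[k]))
    statistic = math.sqrt(n / 2.0) * abs(d[k_index])
    return k_index + 1, statistic, statistic > critical

# Zero-mean synthetic series whose amplitude triples after the 20th value.
sample = [1, -1] * 10 + [3, -3] * 10
k_star, statistic, significant = variance_change_candidate(sample)
```

The input is expected to have zero mean within each segment, which is why the segmental mean is removed before this step.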

Reconstruction of the natural streamflow
Three base periods representing different levels of human impact were determined depending on the detected change points in mean, that is, 1962-1971, 1971-1997, and 1997-2010, considering     was selected in this study (Figure 9(a)), and mean of streamflow during this period was combined with the CVR series to reconstruct the natural streamflow in 1962-2010, namely NSR-SVIed streamflow, as shown in Figure 9(b).

Model verification
To verify the statistical reliability of the reconstructed result, the Kolmogorov-Smirnov test was introduced. The result showed that the latter segment of the data (after 1971) was not significantly different from the earlier segment, indicating that the reconstructed natural streamflow data had the same distribution and satisfied the consistency or stationarity assumption required in hydrological design.
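A two-sample Kolmogorov-Smirnov comparison of the kind used here can be sketched as below. The sketch assumes a 5% significance level and the large-sample critical value $1.358\sqrt{(n+m)/(nm)}$; the study does not state which level or implementation was used, so both are assumptions, and the function names and sample values are hypothetical.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    distance = 0.0
    for v in a + b:
        cdf_a = bisect_right(a, v) / n   # share of a not exceeding v
        cdf_b = bisect_right(b, v) / m   # share of b not exceeding v
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance

def same_distribution(a, b, c_alpha=1.358):
    """Return (D, critical, accepted); accepted means no significant difference."""
    d = ks_statistic(a, b)
    critical = c_alpha * math.sqrt((len(a) + len(b)) / (len(a) * len(b)))
    return d, critical, d <= critical

# Illustrative values only: an earlier segment and a reconstructed later segment.
d, critical, accepted = same_distribution([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

For the short segments typical of annual records, an exact small-sample p-value would be preferable to the asymptotic threshold used in this sketch.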
In addition, the precipitation-runoff scatter diagram was employed for comparison, and the diagrams of the measured and NSR-SVIed streamflow series are drawn together in Figure 10. As is well known, the violin plot, which combines a boxplot with a kernel density plot, has an advantage in revealing the distribution of data. Figure 11 shows the violin plots of the two natural streamflow series obtained by the NSR-SVI approach and the precipitation-runoff (P-R) model. It can be seen from the figure that the proposed method yielded a distribution similar to that of the P-R model, implying that the statistical approach depending on flow variation performs satisfactorily in natural streamflow reconstruction relative to the P-R model.
The above three comparisons demonstrate that the proposed statistical method depending on flow variation is suitable for the reconstruction of natural streamflow series and can serve the hydrologic design of water resource engineering. In terms of application prospects, it can serve as a feasible method for areas where hydrological models, water volume restoration, and similar methods cannot be used due to the lack of data, and may also be considered as a cross-validation approach in a basin with sufficient data.

Model application
As mentioned above, the NSR-SVI method has been And the latter PV_m-impacted streamflow was redressed by the equation $MR_2 = MR_1 \times MP_2 / MP_1$ in Figure 12 streamflow, and the streamflow under precipitation variation, that is, PV-impacted streamflow, can be constructed, as shown in Figure 12(e). To verify the validity of PV-impacted streamflow, the precipitation-runoff scatter diagram was employed and the result is shown in Figure 13. 1978, respectively. Thus, the above analysis indicates that the calculated PV-impacted streamflow is reasonable and the method used is feasible.
To further assess the contribution of precipitation variation on the streamflow decrease, the cumulative streamflow curve, a simple but comprehensive graphical method, was applied in this study. Figure

CONCLUSIONS
In this study, a new statistical approach depending on the conjunctive use of the over-whitening procedure, Pettitt's test method, and ICSS algorithm, namely NSR-SVI, was proposed to detect streamflow variation (i.e., abrupt changes in mean and variance) and to reconstruct natural streamflow.
Results showed that (1) the segmented over-whitening procedure was applicable for removing the serial correlation in hydrological time series without significant harm to the trend component; (2) Pettitt's test method coupled with the iterative algorithm and ICSS accurately detected multiple change points in mean and variance and gave their locations; (3) the NSR-SVI approach, depending only on flow variation, was suitable for the reconstruction of natural streamflow series and can provide a series of stable streamflow under various anthropogenic interferences, so as to better serve areas where hydrological models, water volume restoration, and rainfall-runoff models cannot be used due to the lack of data. It can also be used as a cross-validation approach for hydrologic design in a basin with abundant data.
Additionally, the joint application of NSR-SVI with the cumulative curve was investigated in this study to quantify the contributions of precipitation variation and anthropogenic interference to the streamflow decrease in the basin. The comparison with the nearby Kuye River basin indicates that the quantified contributions of precipitation variation and anthropogenic interference were 44% and 56%, respectively, in the Tuwei River basin, consistent with the results in published literature. This implies that the proposed NSR-SVI approach combined with the cumulative streamflow curve can provide an innovative way to investigate the attribution of catchment streamflow variation.