Identification of long-term trends and seasonality in high-frequency water quality data from the Yangtze River basin, China

Comprehensive understanding of the long-term trends and seasonality of water quality is important for controlling water pollution. This study focuses on the spatio-temporal distribution, long-term trends, and seasonality of water quality in the Yangtze River basin using a combination of the seasonal Mann-Kendall test and time-series decomposition. Weekly water quality data from 17 environmental stations were used for the period January 2004 to December 2015. Results show gradual improvement in water quality during this period in the Yangtze River basin and greater improvement in the Uppermost Yangtze River basin. The larger cities, with high GDP and population density, experienced relatively higher pollution levels due to discharge of industrial and household wastewater. Pollution levels are higher in the Xiang and Gan River basins, as indicated by higher NH4-N and CODMn concentrations measured at the stations within these basins. Significant trends in water quality were identified for the 2004–2015 period. Operation of the Three Gorges Reservoir (TGR) enhanced pH fluctuations and possibly attenuated CODMn and NH4-N transport. Finally, seasonal cycles of varying strength were detected for time series of pollutants in river discharge. Seasonal patterns in pH indicate that maxima appear in winter and minima in summer, with the opposite true for CODMn. Accurate understanding of long-term trends and seasonality is a necessary goal of water quality monitoring efforts, and the analysis methods described here provide essential information for effectively controlling water pollution.


Introduction
Since 1979, China has experienced sustained and rapid economic growth with a concurrent and growing problem of environmental pollution [1][2][3]. With rapid urban development and accelerating industrialization, the demand for water is increasing, and meanwhile water quality is declining because of pollution. All these changes have made water security a major contemporary [...] region to 1,600-1,900 mm in the southeastern region, and the annual mean temperature in the southern and northern parts of the middle and lower Yangtze River basin is 19 and 15°C, respectively [26]. The Yangtze River basin supplies about 50% of the country's runoff and 40% of Chinese freshwater resources [27].

Datasets
A total of 17 environmental monitoring stations were selected for analysis in this study: 7 stations (Numbers 1, 5, 7, 11, 13, 15 and 16) are located along the trunk stream, 2 stations (8 and 9) were set up to monitor the Danjiangkou Reservoir, and the rest were built to assess the influence of major tributaries (see Fig 1 and Table 1). Weekly water quality data (including pH, CODMn, NH4-N, and DO) at these 17 stations were obtained and processed from a surface water quality monitoring system. The monitoring system covers seven river systems: the Yangtze, the Yellow, the Pearl, the Huai, the Hai, the Songhua and the Liao Rivers. So far, 148 monitoring stations have been established, comprising 122 stations for monitoring rivers and 26 stations for monitoring lakes. All these stations measure five water quality parameters, namely pH, total organic carbon (TOC), chemical oxygen demand (CODMn), ammonia-nitrogen (NH4-N), and dissolved oxygen (DO), every four hours. These data are processed by the government to generate weekly data, which are then published through the Data Center of China's Ministry of Environmental Protection. Because water quality data are generally not made public in China, only the published weekly pH, CODMn, NH4-N, and DO data were selected for this study. Data were collected at all stations for the period 2004-2015, except for three stations (Qingfengxia, Xingang and Chuchuo), for which data were only available for 2006 to 2015. Weekly water grades are also described according to the environmental quality standards for surface waters in China (GB3838-2002, see S1 Table); Grades I-V mean that water is "Excellent", "Good", "Satisfactory", "Bad", and "Very bad", respectively. Data related to river discharge, land use, gross domestic product (GDP), population density, and precipitation in the Yangtze River basin were also collected in order to analyze the causes of spatio-temporal changes in water quality.
Two hydrological stations (Yichang and Hukou), located in the junction zones between the upper, middle and lower Yangtze River, were chosen to obtain daily runoff data (Fig 1). The Yichang hydrological station is next to the Nanjingguan environmental station, and the Hukou hydrological station is next to the Hexishuichang environmental station. Land use data for 2010 were extracted from Landsat TM/ETM images with a spatial resolution of 1 km × 1 km at the national scale [28], and land use types were categorized as forest, wetland, desert, agriculture, grass, settlement, and others. Population and GDP data for 2010 were provided by the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC), at 1 km × 1 km resolution. Average annual precipitation from 2000 to 2009 came from the China meteorological data center.

Methods
Data pre-processing and summary statistics. After data extraction, weekly water quality data and daily runoff data were categorized and transformed into an Excel-format database. MATLAB (R2010a) was used to identify and assess data quality [29]. Data were then analyzed using descriptive statistics to quantitatively describe the main features of the dataset, and time-series analysis (such as box plots) was performed on the weekly data (which are derived from samples taken at 4-hourly time steps) to allow a direct comparison between the hydrochemical information at different stations. Summary statistics including mean, median, and standard deviation were calculated for each pollutant. Box plots were used to evaluate changes in water quality, and ArcGIS 10.0 was applied to display the spatial and temporal features of water quality at different stations. In addition, Pearson's correlation coefficient [30] was used to evaluate the relationship between water quality and river discharge.
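The summary statistics and correlation described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than the MATLAB workflow of the study; the function names `summarize` and `pearson_r` and all numeric values are hypothetical and serve only as an illustration.

```python
import math
import statistics


def summarize(values):
    """Return mean, median, and sample standard deviation of a series."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (not data from the study).
codmn = [2.1, 2.4, 3.0, 3.3, 2.8, 2.2]                          # mg/l
discharge = [8000.0, 12000.0, 25000.0, 30000.0, 20000.0, 9000.0]  # m3/s

stats = summarize(codmn)
r = pearson_r(codmn, discharge)
print(stats, round(r, 3))
```

A significance test for r (for example a t-test with n - 2 degrees of freedom) would be needed before interpreting the correlation; it is omitted here for brevity.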
To further understand water quality variability across the whole Yangtze River basin, the full period 2004-2015 was divided into three sub-periods (2004-2007, 2008-2011, and 2012-2015) so that changes in water quality could be characterized in more detail. We first checked the trend for the whole period, and then calculated probability distribution functions (PDFs) of the regional mean (the average across the 17 stations) pH, CODMn, NH4-N, and DO concentrations for each of the three sub-periods. A two-tailed Kolmogorov-Smirnov test was applied to assess whether the distributions for different sub-periods are significantly different.
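A two-sample Kolmogorov-Smirnov comparison of two sub-periods can be sketched as follows. This is a standard-library illustration, not the authors' implementation: the function name `ks_two_sample` is hypothetical, the p-value uses the usual asymptotic Kolmogorov series with a small-sample correction factor (an assumption of this sketch), and the sample values are invented. Library routines such as `scipy.stats.ks_2samp` provide a tested alternative.

```python
import math


def ks_two_sample(a, b):
    """Two-sample, two-sided Kolmogorov-Smirnov test.

    Returns (D, p): D is the largest gap between the two empirical CDFs,
    p is an asymptotic approximation of the two-sided p-value.
    """
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled sorted values and track the ECDF gap.
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        while i < n1 and a[i] <= x:
            i += 1
        while j < n2 and b[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    ne = n1 * n2 / (n1 + n2)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.1:
        # The series below converges slowly here; the p-value is ~1.
        return d, 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(2.0 * series, 0.0), 1.0)


# Hypothetical regional-mean CODMn values (mg/l) for two sub-periods.
early = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.5, 3.2]
late = [2.2, 2.5, 2.0, 2.7, 2.4, 2.1, 2.6, 2.3]

d, p = ks_two_sample(early, late)
print(d, p)
```

A small p-value (for example below 0.05) suggests that the two sub-period distributions differ; with weekly data, serial correlation can make the nominal p-value optimistic.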
Seasonal Mann-Kendall trend analysis. The seasonal Mann-Kendall test (SMK) proposed by [31] was employed to detect monotonic trends in the weekly water quality data. It is a nonparametric test and is used to detect potential trend changes in water quality [32][33][34][35]. Let $X = (X_1, X_2, \ldots, X_n)^T$ be a time series of independent water quality observations, with $X_i = (X_{i1}, X_{i2}, \ldots, X_{ij})$. Here, n is the number of years and j is the number of weekly values in each year, so for weekly 'seasons' the first-week data are compared only with the first-week data of every other year, the second-week data only with the second-week data of every other year, and so on. The null hypothesis ($H_0$) is that there is no monotonic trend over time. The statistic for the gth season is:
$$S_g = \sum_{i=1}^{n-1} \sum_{k=i+1}^{n} \operatorname{sgn}(X_{kg} - X_{ig}), \quad g = 1, 2, \ldots, j$$
where sgn is the sign function. According to Hirsch et al. (1982), the SMK statistic, $\hat{S}$, for the entire series is calculated according to
$$\hat{S} = \sum_{g=1}^{j} S_g$$
For detailed treatment of this analysis, see [31]. Here, the significance level p is set at 0.05 and 0.10, with corresponding critical Z statistics of 1.96 and 1.65, respectively. A positive value of Z indicates an 'upward trend' and a negative value of Z indicates a 'downward trend'.
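The seasonal comparison scheme can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' code: the function name `seasonal_mann_kendall` is hypothetical, the variance formula assumes no tied values and independence between seasons, and the input is synthetic. Missing weeks are represented by `None`.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def seasonal_mann_kendall(data):
    """Seasonal Mann-Kendall test.

    data[i][g] is the value in year i and season (week) g; None marks a gap.
    Returns (S, Z), where S sums the per-season Mann-Kendall statistics and
    Z is the continuity-corrected standard normal score.
    Assumption: no correction for ties or for correlation between seasons.
    """
    n_seasons = len(data[0])
    s_total = 0
    var_total = 0.0
    for g in range(n_seasons):
        series = [row[g] for row in data if row[g] is not None]
        n = len(series)
        # Compare each year only with later years of the same season.
        s_total += sum(
            sign(series[k] - series[i])
            for i in range(n - 1)
            for k in range(i + 1, n)
        )
        var_total += n * (n - 1) * (2 * n + 5) / 18.0
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    return s_total, z


# Synthetic example: 12 years x 4 seasons with a steady decline each year.
data = [[10.0 - 0.3 * i + g for g in range(4)] for i in range(12)]
s, z = seasonal_mann_kendall(data)
print(s, round(z, 2))
```

With |Z| above 1.96 the trend would be judged significant at the 0.05 level, and the sign of Z gives its direction.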
Time-series decomposition. Analysis of long-term trends and seasonal variability was carried out using the dynamic harmonic regression (DHR) technique, extensively described in [36]. Compared with a modified Clausius-Clapeyron equation, a multiple linear regression that includes both a linear and a harmonic dependence on time, and digital filtration (DF), DHR better fits the data and captures both the seasonal variations of the pollutant concentrations and the smaller-scale interannual variations in the long-term trends [37]. This method has been employed successfully in many analyses of non-stationary environmental time series [38][39][40]. DHR decomposes an observed time series into component parts such as trend and seasonality [41]:
$$y_t = T_t + C_t + S_t + e_t$$
where $y_t$ is the water quality time series, $T_t$ is a longer-term trend (or low-frequency component), $C_t$ is a sustained cyclical component with a period separate from the seasonal component (e.g., a diurnal cycle), $S_t$ is the seasonal component (e.g., annual water quality seasonality), and $e_t$ is the residual of the water quality predictions. The trend ($T_t$) was extracted by the integrated random walk (IRW) model, which is a special case of the generalized random walk (GRW) [42]. The seasonal components ($S_t$) were defined and calculated as follows:
$$S_t = \sum_{i=1}^{R} \left[ a_{i,t} \cos(\omega_i t) + b_{i,t} \sin(\omega_i t) \right]$$
where $a_{i,t}$ and $b_{i,t}$ are time-varying parameters, R is the number of harmonic terms, and the $\omega_i$ values are the fundamental and harmonic frequencies associated with the periodicity in the observed water quality series, chosen by reference to its spectral properties. For instance, a period of 52 corresponds to weekly sampling over an annual cycle. The generalized random walk was applied to model the phase and amplitude parameters, which were estimated recursively using the Kalman filter and a fixed interval smoother.
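The idea of separating a trend from seasonal harmonics can be illustrated with a deliberately simplified sketch. It is not DHR: the coefficients are constant, the trend is linear, and the fit uses ordinary least squares instead of a Kalman filter and fixed interval smoother. The function names, the choice of two harmonics, and the synthetic series are all assumptions made for illustration.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def harmonic_decompose(y, period=52, n_harmonics=2):
    """Split y into a linear trend, seasonal harmonics, and residuals.

    Static simplification of DHR: all coefficients are constant in time.
    """
    n = len(y)

    def design_row(t):
        row = [1.0, float(t)]
        for i in range(1, n_harmonics + 1):
            w = 2.0 * math.pi * i / period  # i-th harmonic frequency
            row += [math.cos(w * t), math.sin(w * t)]
        return row

    rows = [design_row(t) for t in range(n)]
    p = len(rows[0])
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    trend = [beta[0] + beta[1] * t for t in range(n)]
    seasonal = [sum(b * x for b, x in zip(beta[2:], r[2:])) for r in rows]
    resid = [v - tr - s for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, resid


# Synthetic weekly series: 12 years, slow decline plus an annual cycle.
y = [2.5 - 0.002 * t + 0.4 * math.cos(2.0 * math.pi * t / 52) for t in range(624)]
trend, seasonal, resid = harmonic_decompose(y)
print(round(trend[0], 3), round(seasonal[0], 3))
```

Allowing the harmonic amplitudes and the trend to evolve over time, as DHR does, is what lets the method follow changing seasonal strength; the static fit above cannot capture that.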
The squared correlation coefficient between the calculated seasonal component and the detrended water quality data was used to determine the significance of seasonality; the significance of the trend was decided on the basis of the squared correlation coefficient between the calculated trend and the deseasonalized water quality data. Finally, the coefficient of determination ($R^2$) was applied to evaluate the performance of the model:
$$R^2 = 1 - \frac{\operatorname{var}(e_t)}{\operatorname{var}(y_t)}$$
where var denotes the variance estimate, $y_t$ is the analyzed weekly water quality series, and $e_t$ is the model residual.
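The two diagnostics can be sketched as follows, assuming the common definition of the coefficient of determination as one minus the ratio of residual variance to series variance (an assumption of this sketch; the source text only names "var"). The function names and numeric values are hypothetical.

```python
def variance(x):
    """Population variance of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def r_squared(y, resid):
    """Coefficient of determination: 1 - var(residuals) / var(series)."""
    return 1.0 - variance(resid) / variance(y)


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical observed series and model residuals.
y = [1.0, 2.0, 3.0, 4.0]
resid = [0.5, -0.5, 0.5, -0.5]
r2 = r_squared(y, resid)

# Hypothetical detrended data and fitted seasonal component.
detrended = [0.4, 0.1, -0.4, -0.1]
seasonal = [0.35, 0.0, -0.35, 0.0]
strength = squared_correlation(seasonal, detrended)
print(round(r2, 3), round(strength, 3))
```

Expressed as a percentage, the squared correlation between the seasonal component and the detrended data is one way to read a "strength of seasonality" value.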
Here, considering the regional representativeness of stations in different river segments and the distance between hydrological and environmental stations, two environmental stations (Nanjinguan and Hexishuichang) and two hydrological stations (Yichang and Hukou), which represent a large fraction of the total data, were selected to present and discuss the DHR analysis. The runoff at Yichang station was strongly affected by the Three Gorges Dam (TGD), but it may also reflect changes in annual runoff. The runoff data are daily data. Before discussing the correlations between runoff and water quality, we calculated the monthly annual runoff and water quality.
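Bringing daily runoff and weekly water quality onto a common time step can be done by averaging within calendar months. The sketch below shows one plausible reading of that step; the function name `monthly_means`, the use of simple arithmetic means, and the sample values are assumptions, not details confirmed by the text.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_means(records):
    """Average (date, value) records by calendar month.

    Returns a dict mapping (year, month) to the mean of the available
    values; records whose value is None are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in records:
        if value is None:
            continue
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical daily runoff (m3/s) for January and February 2004.
start = date(2004, 1, 1)
daily = []
for offset in range(60):
    day = start + timedelta(days=offset)
    daily.append((day, 100.0 if day.month == 1 else 200.0))

runoff_monthly = monthly_means(daily)
print(runoff_monthly)
```

Applying the same function to the weekly water quality records gives series with matching monthly keys, which can then be paired for a correlation analysis.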

Overall water quality and temporal/spatial distribution
Analysis shows that overall water quality improved from 2004 to 2015 (Figs 2 and 3), consistent with the results from [19]. The univariate statistics of the water quality parameters are presented in Table 2, and the distributions of pH, DO, CODMn, and NH4-N from 2004 to 2015 at the 17 stations are shown in the boxplots in S2 Fig. As the figure shows, the median pH and DO concentration at Stations 1 and 6 are generally higher than at the other stations, while the median CODMn and NH4-N concentrations are lower, revealing that water quality in the Uppermost Yangtze River basin is better than in other areas. Except for Station 1, median DO and pH increase from Station 2 to Station 7, whereas median CODMn and NH4-N concentrations decrease. Station 2 (Minjiang Bridge) had the highest median CODMn concentration (3.47 mg/l), the lowest median DO concentration (6.47 mg/l), and a relatively high NH4-N concentration. We also found that the water at this station was generally below Grade III during this period, consistent with previous work demonstrating that the middle and lower Min River was seriously polluted during 2003-2008 [43]. Station 14 (Chuchuo) had the highest median NH4-N concentration (0.97 mg/l) and the lowest median pH (6.94), followed by Station 10 (Xingang), revealing high ammonia nitrogen pollution in the Xiang River and Gan River. The shift and shape of the PDF curves across the three sub-periods in the figure show a noticeable decrease in CODMn and NH4-N over the period (the PDF curve shifted from right (black curve) to left (red curve)), suggesting overall improvement in Yangtze River basin water quality, especially over the 2012-2015 period; a tendency towards smaller pH values was also found in recent years.
Overall water quality improvement is shown based on an evaluation of 17 environmental stations, which differs slightly from the State of Environment Report of China [44], which indicates that Grade I-III water increased slightly from 88.1% in 2014 to 89.4% in 2015 and that Grade V+ water remained flat at 3.1%. However, the report also showed overall improvement in the condition of the Yangtze River in recent years.
[...], which are mainly located in the upper and middle reaches of the Yangtze River basin (Fig 4A). A total of 10 stations (9 of them significant at the 95% confidence level) exhibited a positive trend in CODMn concentration, mainly on major tributaries (e.g., station 10 on the Xiang River and station 14 on the Gan River) and the lower reach of the trunk stream (Fig 4B). For NH4-N concentration, about half of the stations showed decreasing trends, most of which are distributed along the middle and lower reaches of the trunk stream (Fig 4C). Increasing seasonal trends in DO concentration were found at 12 stations (Fig 4D).

Time-series decomposition results
Long-term trends. Results of the DHR analysis, conducted for two representative environmental stations, are illustrated in Fig 6. Significant long-term trends at these two [...] As shown in Fig 6C, significant long-term declines were found in stream water CODMn concentrations at Nanjingguan station, from about 2.5 mg/l in January 2004 to 1.5 mg/l in December 2015. One reason might be attenuation by the TGR, since nutrient and sediment removal rates in reservoirs significantly affect transport in river systems [46]. Moreover, the impact of the TGR might be unstable because conditions in the TGR (e.g., operating rules or water supply) changed during the study period. Construction of the TGR comprised three engineering phases (1992-1997, 1998-2003 and 2003-2009). Also, operating rules have been adjusted for different flooding situations. By comparison, CODMn at Hexishuichang station initially decreased to around 1.8 mg/l in February 2007, then experienced two peaks (August 2010 and June 2012), and finally plateaued with only slight variation over the remaining years (Fig 6D).
Clear seasonal variability was found in NH4-N concentrations at both stations, and both showed a net decrease between January 2004 and December 2015 (Fig 6E and 6F). NH4-N at Nanjinguan station experienced one peak (in June 2010) and then exhibited only slight variability from November 2013, while Hexishuichang showed three peaks (two large peaks in August 2007 and July 2011, and a smaller one in 2014). At Nanjinguan station, the long-term decreasing trend for CODMn generally parallels the trend in NH4-N, with a significant Pearson's correlation coefficient between the CODMn and NH4-N trends (Pearson's r = 0.62, p<0.01). Discharge at Yichang station initially decreased until February 2007, and then varied only slightly over the remaining period of record (Fig 6I). There are no similar overall trends for pH, DO, CODMn, or NH4-N concentrations at this station. However, during the period from January 2007 to February 2014, a similar decreasing trend was found for DO, CODMn, and NH4-N. Discharge at Hukou station varied only slightly from January 2004 to February 2011 and declined markedly from March 2011 (Fig 6J), and similarly there was no overall trend in pH, DO, CODMn, or NH4-N concentrations.
Seasonality. Initial review of the weekly time-series plots (Fig 6) indicated a seasonal pattern for parameters such as pH. Through DHR analysis, seasonal cycles of varying strength were extracted for all pollutant time series (Table 4 and [...]). Table 4 shows that the strongest seasonal cycle was identified for NH4-N at Nanjinguan station (50.00%), while the weakest was for pH at Nanjinguan station (9.55%). Seasonal patterns in pH had small phases at both stations, with maxima in winter and minima in summer, which is consistent with previous work demonstrating that pH in June-August was slightly lower than in January-March in the middle Yangtze River [47]. Similar results have been reported at other study sites. For example, Zeb et al. indicated that pH in winter was higher than in summer in the Khyber Pakhtunkhwa (KPK) province of Pakistan [48].
CODMn at both stations exhibited a seasonal cycle whose amplitude varied throughout the study period. Concentration peaks occurred in summer and minima in winter at Nanjinguan station, whereas at Hexishuichang station peaks occurred in spring and minima at the end of summer and the start of autumn. Generally, NH4-N and DO exhibited similar seasonal cycles at both stations. The crest and trough of DO reflected higher values in winter and lower values in summer, which agrees with results from [49]. DO concentration tended to track river temperature, with high concentrations in winter and early spring (times of lower water temperature) and low concentrations in summer and fall (times of higher water temperature).
Similar and clear seasonal cycles of varying strength were found for discharge at both stations (S3E and S3F Fig), with discharge maxima in summer and minima in winter, reflecting seasonal precipitation (wet conditions from May to October and dry conditions from December to April) influenced by both East and South Asian monsoon activity in the Yangtze River basin. Discharge patterns mirror pH in seasonality, although the amplitude of discharge was variable, especially at Hukou station.
[...] [50,51]. Pollutants in industrial and household wastewater discharges were high in these cities (Fig 8 and S4 Fig), especially in Shanghai, Hangzhou, Chongqing and Chengdu. Moreover, household wastewater discharge increased between 2011 and 2013 as people migrated from the countryside to cities [52]. Regional high pollution may be a consequence of drainage from municipal sewage, agricultural wastewater, and livestock production facilities. Water quality is strongly affected by land use through changes in pollutant input and the water cycle. On the one hand, with the growth of agricultural and settlement land, point source and nonpoint source pollution are likely to increase because of additional sewage treatment plants and greater use of agricultural fertilizer, as in other countries [53].

Discussions
On the other hand, land use may accelerate pollutant transport by changing the form of runoff. For example, most rain that falls on a parking lot runs off immediately [54], often draining into storm sewers that carry it, together with pollutants, to a stream or ditch without filtration. In addition, industrial wastewater discharge was nearly unchanged in Nanchang and Changsha between 2005 and 2013, but industrial pollutant and ammonia nitrogen discharges increased, especially in the last three years (Fig 8). These pollutants entered the Xiang and Gan Rivers, possibly explaining the high concentrations and increasing trends at stations 10 and 14. Yan et al. indicated that agricultural activities and industrial and domestic wastewater caused increasing loads of nitrogen to be discharged into the Yangtze River between 1970 and 2003 [55]. In general, water quality at stations on northern tributaries (including 6, 8 and 9) was better than at stations on southern tributaries (e.g., station 10 on the Xiang River and station 14 on the Gan River), and NH4-N at stations on the trunk stream was relatively low. Meanwhile, heavy metals and nitrogen are the major pollutants in the lower Xiang [56,57] and Gan Rivers [58]. Although these changes are likely to contribute to deteriorating water quality, enormous efforts have been made in China in recent decades, including an ever-improving legal system and the spread of sewage treatment plants, leading to improving water quality. Finally, Fig 4 and Table 3 show a measurable decrease in pollutants between the points where water flows into dams and reservoirs and where it flows out of them. For example, water quality downstream of the dams is better than upstream; mean NH4-N at stations 10 and 14 was 0.53 mg/l and 0.65 mg/l, respectively, but decreased markedly to 0.27 mg/l and 0.20 mg/l at stations 11 and 15 after flowing out of Dongting and Poyang Lakes.
Possibly, high runoff from other tributaries with low pollutant concentrations dilutes outflows from lakes and reservoirs, and dams positively influence the retention of pollutants along the river. Additionally, attenuation of pollutant concentrations in lakes is another well-established explanation for the data [59], especially under appropriate planning and contamination monitoring policies.

Conclusions
This study characterizes the spatio-temporal distribution, long-term trends, and seasonality of water quality in the Yangtze River basin using statistical methods and time-series decomposition. The dataset comprised weekly water quality data (pH, CODMn, NH4-N, and DO) from 17 environmental stations for the period January 2004 through December 2015. The weekly water quality data allowed analysis of long-term trends and provided new insights into the changing amplitude and phase of the seasonality of the pollutants within the Yangtze River basin. Water quality gradually improved during this period in the Yangtze River basin, but regional differences are still obvious. For example, high ammonia nitrogen pollution can be seen in the Xiang and Gan River basins because of high pollutant loads from industrial activities and sanitary sewage around these rivers. In addition, significant seasonal trends were identified in weekly pH, DO, CODMn, and NH4-N concentrations over the 2004-2015 period, and seasonal cycles of varying strength were extracted for the pollutant time series, indicating seasonality in water quality in the Yangtze River basin. These results could help in fully understanding the long-term trends and seasonality of water quality in the Yangtze River basin, providing essential information for effectively controlling water pollution and managing water resources.