Effects of Land Use, Topography and Socio-Economic Factors on River Water Quality in a Mountainous Watershed with Intensive Agricultural Production in East China

Understanding the primary effects of anthropogenic activities and natural factors on river water quality is important for the study and efficient management of water resources. In this study, analysis of variance (ANOVA), principal component analysis (PCA), Pearson correlations, multiple regression analysis (MRA) and redundancy analysis (RDA) were applied as an integrated approach in a GIS environment to explore the temporal and spatial variations in river water quality and to estimate the influence of watershed land use, topography and socio-economic factors on river water quality based on 3 years of water quality monitoring data for the Cao-E River system. The statistical analysis revealed that TN, pH and temperature were generally higher in the rainy season, whereas BOD5, DO and turbidity were higher in the dry season. Spatial variations in river water quality were related to numerous anthropogenic and natural factors. Urban land use was found to be the most important explanatory variable for BOD5, CODMn, TN, DN, NH4+-N, NO3−-N, DO, pH and TP. The animal husbandry output per capita was an important predictor of TP and turbidity, and the gross domestic product per capita largely determined spatial variations in EC. The remaining unexplained variance was related to other factors, such as topography. Our results suggested that pollution control of animal waste discharge in rural settlements, agricultural runoff in cropland, industrial production pollution and domestic pollution in urban and industrial areas were important within the Cao-E River basin. Moreover, the percentage of the total overall river water quality variance explained by an individual variable and/or all environmental variables (according to RDA) can assist in quantitatively identifying the primary factors that control pollution at the watershed scale.


Introduction
The deterioration of river water quality has become a primary environmental concern due to unsustainable anthropogenic activities; the demand for freshwater has been rapidly increasing in many developing countries, especially in China [1][2][3]. River water quality is controlled by complex anthropogenic activities and natural factors at both the river and watershed scales [1], [7][8][9]. Understanding the temporal and spatial variations in river water quality and estimating the primary regional factors that affect water quality can assist researchers in establishing priorities for sustainable water management [2], [10][11][12]. The relationships between water quality parameters and land use/cover, population density and point source discharge have been frequently studied [1], [7], [9], [13][14][15]. Simultaneously, topography and animal waste discharge are considered important factors that affect watershed river water quality [7], [13][14][15][16]. Particularly, several countries, e.g., China, contain large mountainous areas [17] and extensive animal production that lack strict management techniques [16], [18,19].
Recently, Pearson correlations have been widely employed to determine the relationships between environmental variables and river water quality [1], [7]. This method is simple and provides quantitative information; however, it does not lend itself to visualization. Multiple regression analysis (MRA) is a useful tool that is commonly used to determine the watershed characteristics that best explain the spatial variability of an individual river water quality variable [7], [20]; however, it provides no general information regarding pollution types. Moreover, principal component analysis (PCA) has become a widely accepted method in river water quality assessment and source apportionment studies in the last decade [2]. This method is commonly used to identify the types of specific pollution sources without explanatory variables; however, PCA can only provide preliminary information regarding pollution types. Redundancy analysis (RDA), a multivariate statistical analysis method, has proved useful for qualitative analysis of the interactions between river water quality and watershed characteristics in highly complex systems [13], [21,22]. RDA can reveal the influences of environmental factors on overall water quality, not only on single water quality variables [20]. These commonly used statistical methods for pollution source identification are complementary (i.e., each has its own benefits and limits). A summary of the statistical methods commonly used for pollution source identification in recent years is given in Table S1. However, comprehensive applications of different statistical methods to evaluate the effects of environmental variables on water quality have not been fully explored in river studies in China. Furthermore, few studies have used RDA to quantitatively evaluate the effects of watershed characteristics on the overall river water quality.
The Cao-E River is located upstream of the Qiantang Estuary, which is a major riverine system in Zhejiang Province, China, and is the primary source of industrial, agricultural and domestic water supplies [10]. Furthermore, a blue algae bloom occurred in the Qiantang Estuary in July 2004 [23,24].
In this study, ANOVA, Pearson correlation analysis, MRA, PCA and RDA were applied to investigate the effects of subwatershed land use, topography and socio-economic factors (including animal waste discharge) on river water quality at the watershed scale for the Cao-E River system in a GIS environment. The objectives of this study were threefold: (1) to examine the temporal and spatial variations in water contamination within the study area, (2) to investigate the relationships between the river water quality parameters and land use, topography and socio-economic factors, and (3) to identify the primary pollution sources that affect the water quality parameters in the Cao-E River basin. The results can be helpful for water conservation in the Cao-E River basin, and they provide a valuable tool for water quality agencies to develop assessment strategies for effective water quality management and rapid solutions for water pollution problems at the watershed scale.

Ethics statement
No specific permits were required for the described field studies; the sampling did not cause any disturbance to the environment or to the protected species at the sampling sites.

Study area
The Cao-E River (29°08′–30°15′ N, 120°30′–121°15′ E) is located in Zhejiang Province, East China. The Cao-E River is one of the main tributaries of the Qiantang Estuary, which flows into the East China Sea. The river system contains a main stream and several major tributaries (i.e., the CT River, the XC River, the CL River, the HZ River, the XS River and the YT River; Figure 1). The main stream is ~197 km long with an average slope of 3.0%. The Cao-E River basin has a typical subtropical monsoon climate with an average annual rainfall of 1500 mm and an average ambient temperature of 16.2°C [25]. The watershed is within a mountainous region. Specifically, mountains and hills cover 2/3 of the area; the remaining area is covered by plains with intensive agricultural production [26].

Data Sources
Water quality data. The stream water samples (the number of samples used for statistical analysis was 36) were collected from July 2003 to June 2006 at 20 sampling sites throughout the watershed (Figure 1). Sampling stations that were largely influenced by point source pollution were excluded from further analysis. These sites met the following conditions: (1) there was an obvious point source discharge outlet near the sampling site, and (2) the standard deviation of the water quality parameter (BOD5) at the site was more than three times the standard deviation for all sampling sites. The samples were collected once a month between 9:00 and 16:00. The water samples were collected at approximately 0.3 m below the water surface in the central stream, placed into plastic bottles (2.5 L), transported to the laboratory and stored at 0–4°C for subsequent chemical analysis. The chemical measurements were performed in the laboratory within 24 h after collecting the water samples. Dissolved oxygen (DO), water temperature (T), pH and electrical conductivity (EC) were measured when the water samples were collected with a hand-held multi-parameter instrument (Multi 340i SETs, The Merck Co. Ltd., Germany). The turbidity was measured with a hand-held Turbiquant 1000IR (Multi 340i, The Merck Co. Ltd., Germany). The following chemical and biological water quality parameters were measured using standard methods [27]: chemical oxygen demand (CODMn), 5-day biochemical oxygen demand (BOD5), total nitrogen (TN), total phosphorus (TP) and dissolved phosphorus (DP). The ammonium nitrogen (NH4+-N) content was measured using an Astoria analyzer system (AAS, Brown Rupee CO. Ltd., Germany) after filtration through a 0.45-µm filter (Hailing Medicine, Zhejiang Province, China). The nitrate nitrogen (NO3−-N) content was measured from the absorbance of the sample at 220 nm (A220) and 275 nm (A275) using a visible and ultraviolet spectrophotometer (Shanghai Spectrum Instruments, Shanghai, China) after filtration through a 0.45-µm filter; the results were determined from the corrected absorbance A220 − 2×A275 [27].
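Two of the numerical rules described above can be written out as a minimal Python sketch. The function names, the example readings and the use of the sample standard deviation pooled over all sites are assumptions made for illustration; they are not taken from the study's own processing scripts.

```python
from statistics import stdev


def corrected_nitrate_absorbance(a220: float, a275: float) -> float:
    """Return the corrected UV absorbance A220 - 2 * A275 used for nitrate."""
    return a220 - 2.0 * a275


def is_point_source_site(has_outlet, site_bod5, all_bod5, factor=3.0):
    """Apply both exclusion conditions to one sampling site.

    has_outlet : True if an obvious point source outlet is near the site.
    site_bod5  : BOD5 values measured at this site.
    all_bod5   : BOD5 values pooled over all sites (assumed pooling).
    factor     : multiple of the pooled standard deviation (3 in the text).
    """
    return has_outlet and stdev(site_bod5) > factor * stdev(all_bod5)


# Hypothetical readings, for illustration only.
absorbance = corrected_nitrate_absorbance(a220=0.50, a275=0.02)
pooled = [2.0, 3.0, 2.5, 1.5, 4.0, 3.5, 2.0, 3.0]
flagged = is_point_source_site(True, [1.0, 40.0, 2.0, 50.0], pooled)
```

The corrected absorbance would still need to be converted to a concentration with a calibration curve, which is not shown here.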
Land use/cover, topography and socio-economic data and GIS analysis. Latitude and longitude were measured using a hand-held GPS. The catchment boundaries were delineated on a 30-m spatial resolution digital elevation model (DEM); the stream network was represented using the Soil and Water Assessment Tool (SWAT) model. ArcGIS 9.3 Desktop GIS software was used to calculate the watershed characteristics for each sampling site. All datasets were converted to a common digital format (WGS 1984) and a common coordinate system (Albers Equal Area). Hydrologic units were not solely defined by the watershed above an individual point [28]; a hydrologic unit on the Cao-E River may cover 2/3 or more of the entire watershed above that sampling station (most hydrologic units on tributaries encompass the entire watershed above a sampling station).
The socio-economic data, including land use change, the gross domestic product, animal husbandry output and human population in the watershed, were obtained from the Shengzhou (SZ), Xinchang (XC), Shangyu (SY) and Shaoxing (SX) County Statistical Yearbooks for the period 2003-2006 and were adjusted on the basis of surveys in several representative towns within the watershed. The temporal variations in land use were considered to be minimal. For the entire watershed and throughout the study period, the annual changes in forest and water areas were less than 0.05%, the annual rate of decrease in cropland area was less than 1% and the annual rate of increase in urban area was less than 2.5%. Thus, Landsat Thematic Mapper imagery from 2006 was used to map the land cover in the study area [29]. The land cover was categorized into the following 4 classes: forest, water, cropland and urban land. The annual increases in human population density, gross domestic product per capita and animal husbandry output per capita were less than 1.5%, 10% and 2%, respectively, for the entire watershed and throughout the study period. Therefore, the averages of the socio-economic data over the entire study period were used in this study. The slope map was derived from the DEM using a GIS spatial analyst tool (ESRI, 2006). The land use, slope, DEM and administrative maps (with attribute data, including the gross domestic product, animal husbandry output and human population for every town) were used to calculate the respective land use area (forest, water, urban land and cropland), mean elevation, mean slope, area, human population density, gross domestic product per capita and animal husbandry output per capita within each sub-watershed using a GIS spatial analyst tool. The abbreviations and statistics of the sub-watershed characteristics in the Cao-E River basin are summarized in Table 1.
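As an illustration of the kind of zonal summary described above, the following sketch computes land use percentages for one sub-watershed from two aligned grids. It is not the ArcGIS workflow used in the study: the class codes, the grid values and the function name are hypothetical, and the calculation assumes equal-area cells (which an equal-area projection would provide).

```python
from collections import Counter

# Hypothetical class codes for the 4 land cover classes named in the text.
LAND_CLASSES = {1: "forest", 2: "water", 3: "cropland", 4: "urban"}


def land_use_percentages(landcover, subwatershed, target_id):
    """Percentage of each land cover class inside one sub-watershed.

    landcover    : 2-D list of class codes (one code per grid cell).
    subwatershed : 2-D list of sub-watershed ids, aligned with landcover.
    target_id    : id of the sub-watershed to summarize.
    """
    counts = Counter()
    for lc_row, ws_row in zip(landcover, subwatershed):
        for lc, ws in zip(lc_row, ws_row):
            if ws == target_id and lc in LAND_CLASSES:
                counts[LAND_CLASSES[lc]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified cells in this sub-watershed")
    return {name: 100.0 * counts[name] / total for name in LAND_CLASSES.values()}


# Tiny illustrative grids: sub-watershed 7 covers four cells.
landcover = [[1, 1, 3], [4, 3, 2]]
subwatershed = [[7, 7, 7], [7, 8, 8]]
shares = land_use_percentages(landcover, subwatershed, target_id=7)
```

Per capita indicators would follow the same pattern, with town-level totals summed within each sub-watershed before dividing by population.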

Data analysis
Analysis of variance (ANOVA) was used to compare river water quality variations between different seasons (i.e., rainy and dry) via the Student-Newman-Keuls multiple-range test (P < 0.05). Relationships among the considered variables were tested using Pearson's correlation coefficient with statistical significance set at P < 0.05. Major gradients and principal patterns in the water quality data between wet and dry seasons were detected using principal component analysis (PCA). PCA transforms a dataset consisting of p variables (analytical constituents) that are interrelated or correlated to various degrees into a new dataset containing p new orthogonal and uncorrelated variables; these variables are called the principal components (PCs) [30], [35]. The PCs are linear functions of the original variables, and the sum of their variances is equal to that of the original variables [30]. The PCs are ordered by descending variance, with PC1 having the largest variance. Algebraically, for p original variables (i.e., x1, x2, …, xp), PC1 can be written as PC1 = a11x1 + a12x2 + … + a1pxp, where a11, a12, …, a1p are the weights of the original variables; additional PCs up to p can be formulated in a similar manner. The variances of the PCs are the eigenvalues, and the coefficients, or weights, are the eigenvectors extracted from the covariance or correlation matrix. In this study, the correlation matrix was used in all cases. The goal of PCA is for the first k PCs (where k ≪ p) to retain most of the information in all of the p original variables, which effectively reduces the practical dimensionality of the dataset. More specifically, if the correlations are high among many of the original variables, the first few PCs contain (or explain) a large percentage of the total variance and may be used to describe multivariate patterns or water quality variations across the watershed nearly as well as the complete set of p original variables. These patterns are often related to specific sources of contamination [30].
Stepwise regression analysis was performed to determine the environmental or socio-economic factors (P < 0.10 for TP and P < 0.05 for the other variables) that best explained the spatial variability of an individual river water quality variable [20]. Variables that were strongly intercorrelated with others (variance inflation factor > 10) in the initial analysis were removed, and a further analysis was carried out with the remaining environmental variables for MRA and RDA [20], [31,32]. The variance inflation factor (VIF) is a common means of detecting multicollinearity; a general rule is that the VIF should not exceed 10 [49]. RDA was performed using Canoco 4.5 software to evaluate the relationships between river water quality and the environmental variables [33]. RDA was chosen because previous inspection of the data revealed a linear response rather than a unimodal response in the primary river water quality variables [34]. A Monte Carlo permutation test was used to verify the significance of the models [29]. Two series of matrices (i.e., for river water quality and environmental variables) from the RDA results were visualized as ordination diagrams using the CanoDraw software package for Windows. The river water quality parameters and the environmental/socio-economic variables were represented with arrows. Correlations between the watershed characteristics and/or the river water quality parameters were obtained by projecting the watershed characteristics onto each river water quality parameter, with higher values indicating higher correlations [34]. A summary of how the 5 statistical methods were applied in this study is given in Table S2. The statistical analyses, except RDA, were completed using the SAS 9.1 software for Windows.
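The PCA procedure described above (eigen-decomposition of the correlation matrix, PCs ordered by descending variance) can be sketched with NumPy as follows. This is only an illustration on synthetic data; the study itself used SAS 9.1, and the function names and the random test data are assumptions.

```python
import numpy as np


def pca_correlation(data):
    """PCA on the correlation matrix of an (n_samples, p) data array.

    Returns eigenvalues in descending order, the matching eigenvectors
    (one per column) and the percentage of variance explained by each PC.
    """
    X = np.asarray(data, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()
    return eigvals, eigvecs, explained


def retained_components(eigvals, threshold=1.0):
    """Number of PCs whose eigenvalue exceeds the threshold."""
    return int(np.sum(eigvals > threshold))


# Synthetic data: three correlated variables plus one independent variable.
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 1))
data = np.hstack([base + 0.3 * rng.normal(size=(40, 3)), rng.normal(size=(40, 1))])
eigvals, eigvecs, explained = pca_correlation(data)
n_retained = retained_components(eigvals)
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to p, so an eigenvalue above 1 marks a PC that carries more variance than a single standardized variable.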
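The VIF screening step can be illustrated with a short NumPy sketch. It relies on the identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix. Dropping the worst variable one at a time is an assumption made here; the text states only that variables with VIF > 10 were removed. The variable names and data are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column of X, taken from the inverse correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))


def drop_collinear(X, names, limit=10.0):
    """Remove the highest-VIF variable repeatedly until all VIFs <= limit."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vif = variance_inflation_factors(X)
        worst = int(np.argmax(vif))
        if vif[worst] <= limit:
            break
        X = np.delete(X, worst, axis=1)   # drop the most collinear column
        del names[worst]
    return X, names


# Synthetic predictors: x2 nearly duplicates x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
initial_vif = variance_inflation_factors(X)
X_kept, kept_names = drop_collinear(X, ["x1", "x2", "x3"])
```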

Results
Physico-chemical water quality in the Cao-E River basin
ANOVA was used to compare river water quality variations between the different seasons (i.e., rainy and dry). Temporal and spatial variations in the physico-chemical parameters are shown in Table 2 and Figure 2. More specifically, BOD5, TN, DO, pH, T and turbidity exhibited significant temporal variations (Table 2, p < 0.10). Moreover, TN, pH and T were generally higher in the rainy season (April to September), whereas higher values for BOD5, DO and turbidity occurred in the dry season (October to March). Most of the water quality parameters, except pH and T, exhibited considerable spatial variations. BOD5, CODMn and NH4+-N were higher in urban areas compared with surrounding rural areas, whereas DO exhibited a reverse response to the extent of urbanization. TN, DN, NO3−-N, TP and DP were higher in the lower part of the basin, where more land has been developed, and were lower in the upper-eastern part of the Cao-E River basin. Turbidity and EC were higher in the lower part of the basin (Figure 2), where the reach is often affected by tides and/or point source pollution [36,37].
Major gradients and principal patterns in the water quality data between the wet and dry seasons were detected using PCA. The PCA based on only the dry season water quality data indicated that 4 significant factors (i.e., PCs with an eigenvalue > 1) explained 87.8% of the total variance (Table 3). The first factor accounted for 50.2% of the total variance and had a strong positive correlation with BOD5, CODMn, TN, DN, NH4+-N and T. The PCA of the wet season water quality data showed that the eigenvalues for the 2 most significant factors were 8.442 and 1.817, together accounting for 78.9% of the total variance. Factor 1 accounted for 64.9% of the total variance and was positively correlated with BOD5, CODMn, TN, DN, TP and DP (Table 3).
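The link between the reported eigenvalues and the percentages of variance can be checked with a few lines of Python. With a correlation-matrix PCA, each eigenvalue divided by the number of variables gives the fraction of variance explained. The count of 13 water quality variables is an assumption made here (it is not stated explicitly in the text), although it does reproduce the reported wet season percentages.

```python
def percent_variance(eigenvalues, n_variables):
    """Percentage of total variance for each eigenvalue of a correlation-matrix PCA."""
    return [100.0 * ev / n_variables for ev in eigenvalues]


# Wet season eigenvalues reported in the text; 13 variables is an assumption.
wet_season = percent_variance([8.442, 1.817], n_variables=13)
factor1_share = round(wet_season[0], 1)       # share of the first factor
cumulative_share = round(sum(wet_season), 1)  # share of the first two factors
```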

Relationships between watershed characteristics and river water quality
Relationships between the river water quality variables and their corresponding sub-watershed land use, topography and socio-economic factors were explored using a Pearson correlation test, MRA and RDA. All environmental variables, including land use, topography and socio-economic factors, were utilized to    Table 4).
MRA demonstrated that no individual environmental factor was able to describe the overall water quality; however, most of the physico-chemical water parameters could be sufficiently predicted using 1 to 3 environmental factors (Table 5). Specifically, BOD5 could be predicted using UB, AR and GDPpc; CODMn with UB and AR; TN, DN, pH and DO with UB; NH4+-N with UB and HPd; DP with HPd; turbidity with AHOpc; EC with GDPpc; TP with UB and AHOpc; and NO3−-N with UB and SLPmean. The RDA with the overall water quality as the dependent variable suggested that FRT, UB, CRP, ELV, SLPmean, AR, HPd, GDPpc and AHOpc individually explained 15.7–50.8% of the spatial variation in river water quality, and a combination of the topography, land use and socio-economic factors explained 86.1% of the variation in overall water quality (Table 6). The ordination diagram of overall water quality and topography, land use and socio-economic factors indicated that the first RDA axis displayed a pollution gradient (e.g., CODMn and TN increased along the axis), which was positively correlated with UB and HPd and negatively correlated with FRT and SLPmean (Figure 3), accounting for 52.4% of the total variance in water quality (Table 7). The second axis was related to AHOpc and explained only 16.5% of the total variance (Table 7).
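To show how percentages such as those in Tables 6 and 7 arise, the following NumPy sketch implements RDA in its textbook form: a multivariate least-squares regression of the standardized response matrix on the explanatory variables, followed by a decomposition of the fitted values into canonical axes, with a simple permutation test. It is not a reimplementation of Canoco 4.5; the standardization choice, the permutation scheme, the function names and the synthetic data are all assumptions.

```python
import numpy as np


def rda(Y, X):
    """Return (fraction of response variance explained, share per canonical axis)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)  # standardize responses
    Xc = X - X.mean(axis=0)                            # center predictors
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)        # multivariate regression
    Y_fit = Xc @ B                                     # fitted (constrained) values
    total = np.sum(Yc ** 2)
    singular = np.linalg.svd(Y_fit, compute_uv=False)  # canonical axes of the fit
    return float(np.sum(Y_fit ** 2) / total), singular ** 2 / total


def permutation_p_value(Y, X, n_perm=199, seed=0):
    """Permutation test: shuffle the rows of Y and compare explained fractions."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed, _ = rda(Y, X)
    hits = sum(rda(Y[rng.permutation(len(Y))], X)[0] >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Synthetic example: 20 sites, 2 explanatory variables, 3 response variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
W = np.array([[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]])
Y = X @ W + 0.3 * rng.normal(size=(20, 3))
explained, axis_share = rda(Y, X)
p_value = permutation_p_value(Y, X)
```

Running `rda` with a single explanatory column gives the variance explained by that variable alone, which corresponds to the per-variable percentages reported above; the axis shares correspond to the per-axis percentages.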

Seasonal effects
Seasonal variations in flow caused by the subtropical monsoon climate can partially explain the temporal variations in river water quality [7], [38]. The relatively high TN concentrations in the rainy season were attributed to a flushing effect [36]. BOD5, which corresponds to point source pollution, was lower in the rainy season due to dilution by precipitation [2]. The relatively high T in the rainy season reflected the climatic conditions during this period (see Table 2). The lower DO and higher pH values in the rainy season may be related to variations in environmental factors such as flow rate, water temperature and aquatic plant growth [50], [39][40][41]. The reasons are complex, and further research is needed. Turbidity was higher in the dry season, possibly because the reduced flow during this period made the river water more easily influenced by tides. Moreover, Odokuma and Okpokwasili [40] reported that other parameters, such as phosphate, showed significantly higher values in the rainy season than in the dry season in the New Calabar River, Nigeria. The differences between those results and ours were mainly due to regional climate differences. However, the absence of a significant difference in the other physico-chemical parameters between the dry and wet seasons indicated mixed and irregular influences, e.g., point sources and diffusion may have played an important role [1], [42].

Land cover effects
In the study area, land-use types can play important roles in the distributions of the major river water quality parameters, as reflected by their considerable variability (Figure 2). This result is corroborated by the strong positive correlations between urban land and BOD5, CODMn, TN, DN, NH4+-N and TP (Table 5) and the strong negative correlations with DO and pH. Moreover, most of the nutrient variables exhibited lower concentrations in the forest-dominated region, whereas the cropland-dominated region had high nutrient concentrations. The region downstream of the urban area exhibited higher concentrations of BOD5, CODMn, NH4+-N, TP and DP and lower pH and DO (Figure 2); this finding agrees with many related studies [21], [43,44]. Studies have also shown that certain river water quality parameters (e.g., TN and TP) are primarily determined by agricultural land use at the watershed scale in many parts of the world [1], [20]. However, this study found that agricultural land use plays a minor role in explaining the spatial variations in river water quality in the Cao-E River basin (explaining 16.2% of the total overall water quality variance). This difference is primarily due to the stronger effects of urban runoff, incompletely treated industrial waste and domestic pollution on river water quality compared with agricultural runoff. In addition, vegetated buffers adjacent to the croplands can substantially mitigate agricultural runoff of nutrients and other contaminants via deposition, absorption and denitrification [20]. Urban areas are primarily located along river networks in the Cao-E River basin; therefore, urban effects on the river water quality were expected. Urban land use comprised a much smaller percentage of the Cao-E River basin than cropland. However, as observed from the regression models (for BOD5, CODMn, TN, DN, NH4+-N, NO3−-N, TP, DO and pH) and the RDA results (urban land explaining 50.8% of the total overall water quality variance), urban areas play a pivotal role in influencing water quality (Tables 5 and 6). This observation is most likely due to two factors: (1) the deficient capacity of urban sewage treatment plants leads to domestic pollution and industrial wastewater inputs into the Cao-E River system, and (2) the high percentage of impervious surfaces and over-fertilization of grassy areas can increase discharge rates, sedimentation and pollutant runoff to streams [44].

Topography effects
Topographical factors played important roles in explaining spatial variations in river water quality within the Cao-E River basin (SLPmean and ELV explained 36.3% and 29.8% of the overall water quality variance, respectively). Most related studies have shown that higher SLPmean and/or ELV lead to higher erosion rates, which subsequently increase the rate at which particulate matter enters a water body [13], [21], [45]. However, our results indicated an inverse relationship. Here, SLPmean and ELV were both negatively correlated with TN, DN, NO3−-N, TP and DP; SLPmean exhibited a highly significant negative correlation with NO3−-N values (Figure 3 and Table 5). The slope effects on water chemistry varied. Watershed physical characteristics, such as soil properties (soil texture and soil drainage), morphological variables (drainage density and elongation) and particularly surficial debris, largely affect the chemistry of river waters [37], [46]. In addition, these negative correlations existed primarily because ELV and SLPmean were strongly correlated with land cover (Table 8). High-elevation and/or steep catchments are primarily occupied by forest, whereas cropland is primarily located in regions that are relatively flat and have a low ELV. Therefore, the forested mountain area, which has higher SLPmean and/or ELV, may export fewer nutrients than flat land (e.g., cropland). Similar results were published regarding water quality spatial variations in the Han River basin in South Korea [7] and non-point source effects on stream nutrient concentrations in the Seattle region of the USA [47].

The effects of socio-economic factors
Socio-economic factors commonly affect river water quality to different extents, especially in developing countries such as China. Our study shows that socio-economic factors had significant effects on river water quality during the study period (Table 5) and that HPd, AHOpc and GDPpc explain 44.6%, 26.9% and 26.2% of the total overall river water quality variance, respectively (Table 6). Furthermore, HPd is a fundamental parameter for predicting DP, as is AHOpc for predicting TP and turbidity (Table 5). This result can be attributed to the high population density, the massive animal production and the lack of strict management regulations for animal waste in the study area. The population density exceeds 380 people km−2, the annual animal output value exceeds 2.0 billion Yuan and the annual hog production exceeds 1.0 million head for the entire basin [48]. Most human and animal waste is discharged into the surrounding water bodies with incomplete treatment. Moreover, EC was positively correlated with GDPpc, which was most likely related to point source pollution [37].
In addition, some studies [4][5][6] have pointed out that climatic conditions are also an important factor influencing river water quality. We will therefore examine this factor in future work.

Conclusions
The primary results of this study can be summarized as follows: 1) In-stream water quality monitoring in the Cao-E River basin showed that TN, pH and T were generally higher in the rainy season, whereas BOD5, DO and turbidity were higher in the dry season. 2) Spatial variations in river water quality are typically associated with several anthropogenic and natural factors. Urban land cover was determined to be the most important explanatory variable for BOD5, CODMn, TN, DN, NH4+-N, NO3−-N, TP, DO and pH. Moreover, animal husbandry output per capita was an important predictor for TP and turbidity, and gross domestic product per capita largely determined the spatial variations in EC. The remaining unexplained variance resulted from other factors, such as topography. 3) Pollution control for animal waste discharge in rural settlements was important in the study area. Moreover, agricultural runoff, industrial pollution and domestic pollution in urban and industrial areas were also important factors within the Cao-E River basin. 4) The percentage of total overall river water quality variance explained by individual variables and/or all environmental variables in the study area (as determined using RDA) can assist in quantitatively identifying primary pollution control factors at the watershed scale.
This study improved our understanding of the anthropogenic activities and natural factors that affect river water quality and can assist in the design of efficient strategies for controlling river water pollution at the watershed scale. Moderate-resolution remote sensing data were adopted in this study; future investigations will require high-resolution DEM maps and additional land use classes to better evaluate the effects of specific environmental variables on overall river water quality.

Supporting Information
Table S1 Summary of commonly used statistical methods on pollution source identification in recent years. (DOC)