The Impacts of Land Use Patterns on Water Quality in a Trans-Boundary River Basin in Northeast China Based on Eco-Functional Regionalization

The relationships between land use patterns and water quality in trans-boundary watersheds remain elusive due to the heterogeneous natural environment. We assess the impact of land use patterns on water quality at different eco-functional regions in the Songhua River basin during two hydrological seasons in 2016. The partial least square regression indicated that agricultural activities associated with most water quality pollutants in the region with a relative higher runoff depth and lower altitude. Intensive grazing had negative impacts on water quality in plain areas with low runoff depth. Forest was related negatively with degraded water quality in mountainous high flow region. Patch density and edge density had major impacts on water quality contaminants especially in mountainous high flow region; Contagion was related with non-point source pollutants in mountainous normal flow region; landscape shape index was an effective indicator for anions in some eco-regions in high flow season; Shannon’s diversity index contributed to degraded water quality in each eco-region, indicating the variation of landscape heterogeneity influenced water quality regardless of natural environment. The results provide a regional based approach of identifying the impact of land use patterns on water quality in order to improve water pollution control and land use management.


Introduction
Water quality integrates important geomorphic, hydrologic, and some of the biological processes of a watershed which make it one of the essential elements of a healthy watershed [1]. The deterioration of surface water quality is a considerable issue in river basin management throughout the world, which has become a serious threat to the chemical integrity of the aquatic environment. Surface water can be polluted by anthropogenic activities in two ways: (1) by point sources, such as sewage treatment discharge; and (2) by non-point sources such as overland runoff from urban and agricultural areas (buffer zones) [2]. Non-point source pollution is more difficult to verify than point sources due to the intricate and diffuse nature of the interactions between runoff and landscape [3]. Land use patterns have been deemed as a significant regulator of contaminants in surface flows and interflows, which make it a critical research topic for clarifying the correlations of surface water quality and non-point source pollutants [4]. Previous studies have reported about the relationship between water quality and the composition of land use types, such as cropland and urban, were related with stream pollutants positively, while forest and grasslands that were less influenced by anthropogenic activities had negative correlations [5,6]. The spatial configuration of landscapes in the watershed played an important part in identifying hydrological processes, natural habitats, energy flows, and nutrient cycles [7]. Thus, the variation of landscape is one of the main factors affecting non-point pollution. A large numbers of landscape metrics, for example the quantification of the landscape, have been developed to characterize landscape patterns and used for clarifying the linkages between the landscape and the water quality [7][8][9]. Previous research usually correlated surface water quality with either simple measures of land use types or landscape configuration in a watershed. However, either of them are not comprehensive enough to indicate the influences of land use management on water quality in a watershed. Therefore, conducting an analysis on the relationships between water quality indicators and both watershed land use types and landscape characteristics can produce a more comprehensive result. Generally, multivariate linear regression (MLR) has widely been used to quantify the relationships between land use patterns and water quality parameters, especially stepwise multiple regression [7,10,11]. However, classical regression approach presents several problems when analysing the relationships between land use/landscape metrics and water quality parameters. First, many land use types and landscape metrics are highly correlated, which produces redundancies and leads to inaccurate results. Additionally, sample size (the number of different study plots or individuals) should be larger than the number of predictors in order to assure the significance of the regression analysis [12]. Thus, the application of technique such as partial least-square regression (PLSR) can overcome the inherent limitations caused by classical multivariate regression when handling multi-collinear and noisy data [13]. PLSR is a method to analyse the response variable by using a set of independent variables having best predictive power [14]. The output of PLSR is a combination and generalization of the principal component analysis (PCA) technique and the multiple linear regression technique. PLSR can deal with variables that are highly collinear by explicitly assuming dependency among the variables and evaluating the underlying structures and is specifically suitable for cases in which the number of samples is less than the possible variables [15].
The Songhua River basin plays an important role in industry, agriculture and forestry. Intense anthropogenic activities such as agricultural production and urbanization have long been affecting the water quality of Songhua River. In order to obtain the goal of national water environment treatment, the regionalization of freshwater ecological functions has been carried out in the eight major watersheds in China including the Songhua River. The definition and method of water eco-functional regionalization was first proposed by Omernik [16]. In the past few decades, many countries have conducted their study on water eco-functional regionalization. Most states in the USA have already completed the IV level water eco-functional regionalization. In addition, EU has established the regionalization systems of different scales in the WFD, of which the macroscale eco-functional regionalization is based on the similar descriptions of typology, biology and ecology. China started to carry out the study on freshwater eco-functional regionalization during the Eleventh Five-year Plan with the support of National Major Science and Technology Program for Water Pollution Control and Treatment of China. It is of great importance to implement watershed management on the basis of eco-functional regionalization. Thus, the objectives of this study are as follows: (1) analyse the spatial and temporal distribution characteristics of the surface water quality in each eco-region; (2) quantify the relationships between land use/landscape characteristics and water quality parameters using PLSR in a sub-basin scale, and identify the main factors determining the water quality during normal flow and high-flow periods in each eco-region [17].

Study Area and Sampling Sites
The Songhua River Basin (41 • 42 ~51 • 38 N, 119 • 52 ~129 • 31 E) is located in the northeast of China and is one of China's seven major river basins. This river basin occupies a large part of Heilongjiang Province, Jilin Province and the Northeastern part of Inner Mongolia Autonomous Region with an area of 55.7 × 104 km 2 . The Songhua River Basin is a trans-boundary watershed consists of three sub river basins: the Nenjiang River Basin in the North, the Second Songhua River Basin in the south, and the lower Songhua River Basin (the mainstream of Songhua River) in the northeast. Trans-boundary watershed is a watershed that crosses at least one political border, either border within a nation or an international boundary. The Nenjiang River originates on Yihehuli Mountain in the Great Khingan Mountains while the Second Songhua River originates at Tianchi Lake in the Changbai Mountains. Then, the two rivers travel all the way down and meet in Songyuan in Jilin Province. With a gentle slope and wide surface, the lower Songhua River carries the combined flow from Nenjiang and Second Songhua rivers, flowing northwestward 939 km before entering the Amur River.
The Songhua River Basin lies within a north temperate monsoon climate zone, and the temperature and rainfall varies significantly during the year, with the warmest month being July (20 • C~25 • C) and the coldest January (−20 • C). Annual precipitation averages 500 mm, showing a spatial tendency of higher in mountain area and lower in plain. 60~80% of the annual precipitation occurs from July to September, and only 5% occurs from December to February [18]. The Songhua River Basin is an important agricultural and industrial area in Northeast China. The dominant crops in the basin include soybean, corn, sorghum, and wheat. Industries mainly include petrochemical, machine manufacturing and paper making, distributed in the urban belt of main industrial cities like Harbin, Changchun, and Jilin along Songhua River [19].
In order to eliminate the impact of land use patterns on water quality covered up by heterogeneous natural environment elements, all 86 sampling sites were classified according to the level I water eco-functional regionalization of Songhua river basin. This regionalization grouped the 86 study sites into five different eco-regions: (1) mountainous normal flow (n = 10); (2) plain low-flow (n = 23); (3) hilly high flow (n = 32); (4) mountainous high flow (n = 12); (5) plain normal flow (n = 9). The five eco-regions were identified using a combination of climatic, hydrologic and topographic factors. The three factors referred to annual temperature, elevation and multi-average runoff depth respectively. Figure 1 shows the characteristics of the three factors. They were normalized and processed in ArcGis 10.2 (Esri, Redlands, CA, USA) by using the cell statistics and raster calculator in the spatial analysis toolset [20]. The main environmental characteristics of these five eco-regions were shown in Table 1.    Forty-three of those sample sites are part of the local government's long term monitoring network (Table A1). Not all samples were collected during the icebound season as some of the streams were frozen to the bottom. The water samples at each sites were sampled in polyethylene bottles pre-rinsed three times with distilled water and kept below 4 • C for laboratory analysis. Three samples were collected at each sampling site and a total of 256 samples were collected. Twelve representative parameters were selected for measurement, which are important indicators of water contamination influenced by anthropogenic activities. The parameters included pH, electrical conductivity (EC, µs·cm −1 ), dissolved oxygen (DO, mg·L −1 ), chemical oxygen demand (COD, mg·L −1 ), permanganate index (COD Mn , mg·L −1 ), ammonia nitrogen (NH 3 -N, mg·L −1 ), nitrate nitrogen (NO 3 -N, mg·L −1 ), total nitrogen (TN, mg·L −1 ), total phosphorus (TP, mg·L −1 ), fluoride (F − , mg·L −1 ), chloride (Cl − , mg·L −1 ), and sulfate (SO 2− 4 , mg·L −1 ). The values of pH, DO and EC were directly measured in situ with a multi-parameter water quality monitoring instrument (Thermo Fisher Scientific, Waltham, MA, USA). The values of other parameters mentioned in the paper were analyzed in the laboratory following the national standard methods [21]. Additionally, water samples except for fluoride, chloride and sulfate analysis were acidified with sulfuric acid to adjust the pH < 2.

Land Use and Landscape Metrics
The stream network and sub-basin boundaries were extracted from a digital elevation model (DEM, 30 m × 30 m data) using the hydrology toolset in ArcGIS 10.2. As a result, there were a total of 144 sub-basins formed in the study area, 53 sub-catchments were selected to study the relationships between land use patterns and water quality. Land use patterns were acknowledged to have little change within 5 years. Thus we chose the land use data of Songhua river basin in 2015, which was provided by Data Centre for Resource and Environmental Sciences, Chinese Academy of Sciences (RESDC) (http://www.resdc.cn). The land use types were classified into six categories: (1) agricultural land, mostly planted with corn, rice and soybean; (2) forest; (3) vegetated land; (4) water bodies, including rivers, reservoirs and ponds; (5) urban areas, including residential, commercial and industrial lands; and (6) unused land, including gravel, bare ground, and bare rock ( Figure 3). The six land use types were commonly used in previous studies and should all be considered to better understand the land use pattern and its relationship with water quality in the Songhua River Basin [5,7,22]. In order to investigate the relationships between landscape configurations and water quality, the landscape metrics in 53 sub-catchments representing the patch size, shape, structure, and landscape diversity were chosen at the landscape levels ( Table 2). They have been commonly used in previous studies in dealing with land use patterns in explaining water quality. In addition, these landscape metrics are important in understanding the ecological functioning and human perception in a landscape [8,23,24].

Statistical Analysis
All the water quality data were tested for normality by eco-regions using the Shapiro-Wilk test, since the number of sampling sites in each eco-region was below 50. Parameters not normally distributed were log transformed to increase variable normality [25]. One-way analysis of variance (ANOVA) with the post hoc Tukey's test was used to compare water quality variations between different eco-regions and seasons at significance level of p < 0.05. Boxplots of all sampling sites in the five eco-regions for the 12 water quality parameters were performed to study their seasonal and spatial variability. ArcGIS 10.2 was used to map the distributions of concentration of each water quality parameter by spatial interpolation in order to better understand the variation of water quality among sites. Besides water quality, one way ANOVA with the post hoc Turkey's test was used to compare the variance of landscape metrics between different eco-regions. Welch's Anova test was also done in case the heterogeneity of variance appears. Both ANOVA test were done at the significance level of p < 0.05. The test of normality of the landscape metrics was done using the Shapiro-Wilk test initially.
The partial least square regression (PLSR) was used to explore the relationships between land use patterns and water quality variations in different eco-regions and identify the key predictors for degraded water quality. This was carried out by converting explanatory variables onto orthogonal 'latent' components, which stand for independent variables in a regression. The calculated latent components in PLSR maximize the covariance between the response and explanatory variables through the simultaneous decomposition of X and Y matrices of vectors. The PLSR models were performed in SIMCA-P [26]. Cross-validation was the criterion used to determine the minimum number of latent components needed to acquire the most predictive PLSR model.
Within SIMCA-P, Q 2 (the fraction of the total variation of the dependent variables that can be predicted by a component) stands for the cross-validation of components, when Q 2 is larger than 0.5, the model is expected to introduce good predictive ability, while indicating no significance when Q 2 is smaller than 0.05. Q 2 was computed using the equation below: where PRESS is the abbreviation for the predicted residual sum of squares, and SS stands for residual sum of squares.
In the PLSR modelling, the variable importance in projection (VIP) is a criterion of estimating which independent variables can elucidate the dependent variables most significantly. VIP is calculated by the following equation: In Equation (2), p is the number of independent variables, m represents the number of components extracted from independent variables, k stands for the number of dependent variables, t h represents the components of independent variables, R 2 (y k ,t h ) represents the square of regression coefficients of y k and t h , w 2 hj is the weight of independent variables contributing to component t h . It is generally recognized that the independent variables with VIP values above 1 are of great significance for dependent variables; variable with VIP values below 0.8 are of minor importance; it is of medium significance when VIP values are between 0.8 and 1. The regression coefficients indicate the direction and strength of the impact of each variable in the PLSR model [14]. In order to avoid over fitting which leads to low statistical significance PLSR models for each response factors, not all anthropogenic factors must be included in a PLSR model. Therefore, the following PLSR analysis procedure was followed to obtain a most predictive model. Firstly, for each given water quality parameter, all predictor variables were included in the model. Next, a series of new PLSR models were conducted in which each new PLSR process was implemented with a variable excluded in order to minimize the value difference between the explained variation in the response (R 2 ) and the predictive ability of the model (Q 2 ) This procedure was repeated until as few predictors were remained [27]. Finally, the PLSR model with the largest Q 2 was chosen as the optimal model. A test of collinearity of the explanatory variables was done in prior of the application of PLSR method (Table A2).

Characteristics of Water Quality
Spatial and seasonal variations of water quality parameters in five eco-regions were illustrated by box-plots ( Figure 4). Interpolation maps were illustrated by ArcGis 10.2 in order to help better understanding the spatial variations of water contaminants ( Figure A1). According to the one-way ANOVA, all variables showed significant spatial differences among the five eco-regions (p < 0.05). Most of the variables showed significant temporal differences between different seasons. Large pH values often occurred in high-flow period, except in Zone 1 where high pH was found in normal flow period. EC was found higher in icebound season in every eco-region, however there was no significant difference between high-flow and normal flow period. The low concentrations of DO were mostly observed in ice bound season in every eco region, except in Zone 4. The concentrations of COD showed obvious temporal differences only in Zone 2 and Zone 5, large values were found in high-flow period in Zone 2, while higher concentrations were observed in Zone 5 in normal flow periods. The concentrations of COD Mn were mostly found higher in high-flow periods, whereas low concentrations were found in the ice bound season, and only Zone 5 presents large values in the mean flow period. The concentrations of NH 3 -N were higher in ice bound season in every eco-region except in Zone 1.  percentiles; the small square represented mean; the line in box represented median; values above or below whiskers were outliers). Zone 1-5 refers to the five eco-region listed in Table 1. Figure 5 showed the distribution of land use for each sampling sites at sub-basin scale in 2015. Arable land and forest were the dominant land use in Zone 1, ranging from 15.26% to 65.60% and 6.61% to 62.78% respectively. A relatively less proportion of urban land use was occurred in Zone 1, except for S7 which occupied 7.52% of the sub-basin. Arable land was the dominant land use in Zone 2, ranging from 14.13% to 86.28%. Zone 2 had a relatively larger proportion of urban land use with an average of 4.06% and a maximum of 9.68%. However, the ratio of forest area was relatively lower than other eco-regions with a maximum of 59.52%, while others were lower than 10%. Therefore Zone 2 was obviously disturbed by anthropogenic activities. Arable land was the dominant land use in Zone 3, ranging from 2.76% to 83.46%, with an average of 50.62%. Likewise, a significant amount of forest was discovered in some of the sub-basins, ranging from 1.41% to 93.86% with an average of 38.96%. The proportion of urban land use was the highest among the five zones, with an average of 5.16% and a maximum of 15.26%. A large proportion of forest was observed in Zone 4, with an average of 64.94% and a maximum of 90.46%. The land use ratio of arable land ranked only second to forest, with a maximum of 29.05%. The proportion of urban land use was lower than other eco-region with an average of 2.44%. The dominant land use in Zone 5 was arable land, ranging from 17.53% to 63.02%, with an average of 54.95%. The ratio of forest in most of the sub-basins was lower than 30%, with an average of 30%, the highest was 68.55% in S79. The average proportion of urban land use was 3.08%.

Land Use and Landscape Characteristics at Sub-Basin Scale
The dominant land use types in Songhua river basin were arable land and forest. It was observed that sub-basins with a larger ratio of urban land had a lesser ratio of grassland and forest but a larger ratio of arable land. This observation might indicate that urbanization on one hand had decreased the amount of forest and grassland to support more living residents in the city, on the other hand retained or even increased the amount of arable land for more food was needed. According to the one-way ANOVA, most of the landscape metrics except for LSI and AI were showed significant variations among the five eco-region (p < 0.05) ( Table 3

Linkages between Water Quality Parameters and Land Use, Landscape Metrics in Each Eco-Regions
The PLSR approach was applied to quantify the relationship between water quality and land use/landscape metrics for each eco-region mainly in the high flow and normal flow seasons. The ice-bound season was not included due to the little surface runoff. The summary of each optimal PLSR models were provided in Table 4, including the R 2 and Q 2 of each model as well as the number of components extracted in each model was to reach the minimum difference between R 2 and Q 2 and a larger Q 2 . The value of Q 2 should be larger than 0.5 to make the model predictive and significant. As Table 4 shows, most of the optimal models for water quality parameters extracted two components in both seasons, whereas certain models for water quality parameters extracted three components. In addition, individual models for water quality parameters only extracted one component in some eco-regions which leaded to low values of Q 2 . As we had observed, in Zone 1, Zone 2 and Zone 3, the optimal model for SO 2− 4 in high flow seasons and the models for Cl − in Zone 4 in both seasons, as well as the model for Cl − in high flow season extracted only one component. This indicate that increasing the number of components to the PLSR models cannot continually improve the explained variance and leads to lower predictive ability (i.e., larger gap between model R 2 and Q 2 values), the subsequent components are not strongly correlated with the residuals of the predicted variable [12]. Therefore, the models for SO 2− 4 in Zone 1, Zone 2 and Zone 3 in high-flow season and the models for Cl − in Zone 4 in both seasons and in Zone 5 in high-flow season were of low significance and predictive power.
The regression coefficient (RC) and the variable importance for the projection (VIP) are intuitive and comprehensive expressions of the relative importance of the variables when indicating how important land use types and landscape metrics are to the specific water quality parameters. Tables 5 and 6 illustrated the key variables (VIP > 1) of each optimal model with their regression coefficients (RCs) in high-flow and normal flow seasons respectively. Tables A3 and A4 presented all the regression coefficients of all the explanatory variables in each optimal model. The PLSR weights can be used to better understand the quantitative relation between the explanatory variables and response, because they are linear combinations of the original variables that define the scores. It is another important symbol to indicate the importance of individual land use/landscape metrics to water quality parameters. The weight plots illustrated the key predictors (VIP > 1) of each optimal model and highlighted the predictors with the highest weights in each model ( Figure 6).  Table 5. The relative importance of the key variables in the optimal models in high-flow season.  Table 2.

Y Significant Predictors
In Zone 1, water quality variables were influenced mostly by landscape metrics during high-flow and normal flow seasons. During high-flow season, only pH, EC, DO and SO 2− 4 were most influenced by land use metrics, of which the key variables with highest VIP values were WA UN, UR and AR respectively. The key variables with the highest VIP values for other water quality contaminants were mostly landscape metrics. ED was the most important predictor for COD, COD Mn and TP, while SHDI contributed the most to NO 3 -N, TN and Cl − . NH 3 -N appeared to be influenced the most by SHEI, whereas AI was the most predictive variables of F − . During the normal flow season, the most important predictors for pH and EC were the same as the high-flow season, while DO was most influenced by FO instead of UR. The key variables with the highest VIP values for NH 3 -N and NO 3 -N were UR and FO respectively. ED was the most important predictor for COD Mn and TP, while CONTAG contributed the most to COD. LPI, SHEI and IJI were the most important predictors for TN, F − and Cl − respectively.
In Zone 2, land use metrics predominated in the key variables with the highest VIP values in optimal models. During high-flow season, GR was the most important variables of the variations of DO, COD Mn , NO 3 -N, F − and Cl − , while IJI contributed the most to COD and NH 3 -N. FO, UR, AR and UN were the most important predictors for pH, EC, TN and TP. During normal flow season, pH, DO and COD Mn were all most influenced by GR, whereas UR contributed most to EC, COD, NO 3 -N and Cl − . Landscape metrics such as IJI, LSI and LPI were observed as the most important indicators of the models (VIP > 1). Lower IJI contributed to higher NH 3 -N and TN; higher LPI dedicated to higher TP, while LSI was positively correlated with F − , but negatively correlated with SO 2− 4 . In Zone 3, land use metrics contributed to more water quality variables than landscape metrics. During high-flow season, all water quality parameters were most influenced by AR, except for pH, COD Mn and NO 3 -N. FO was the most important predictors for pH and COD Mn , while LPI contributed the most to NO 3 -N. During normal flow season, AI and PD were observed as the most important predictors for COD Mn and SO 2− 4 , whereas other water quality parameters were most influenced by land use metrics.
In Zone 4, during high-flow season, pH and SO 2− 4 were most influenced by GR, whereas WA contributed the most to EC and COD. FO was observed the most important predictor for DO and COD Mn , and was positively related to DO and COD Mn . UR, AR and UN contributed most to NO 3 -N, F − and Cl − respectively. The rest of the water quality parameters were most influenced by ED, which was the only landscape metrics with the highest VIP values. During the normal flow season, landscape metrics such as SHEI, ED and PD contributed more to water quality parameters than that in high-flow season. SHEI was the most important variable for pH and COD; ED contributed more to NH 3 -N and TP; NO 3 -N and TN was impacted most by PD. Other water quality parameters were contributed most by land use metrics. The most important predictor for DO and COD Mn was FO which was the same as high-flow season. UR contributed the most to EC and SO 2− 4 , while F − was most influenced by GR. In Zone 5, during the high-flow season, AR was the most significant predictor for DO, NO 3 -N and TP; COD Mn and NH 3 -N were most impacted by UR and UN respectively. The rest of the water quality parameters were influenced by landscape metrics in most cases. SHDI contributed most to COD and TN, while F − and SO 2− 4 were most impacted by LSI; LPI and SHEI was the most significant predictors for pH and EC respectively. During the normal flow season, F − was most influenced by LSI, whereas pH and Cl − were contributed most by UR. FO and UN were the most significant predictors for COD MN and SO 2− 4 respectively. Other water quality parameters were most influenced by AR. It should be noted that all the predictors in the final optimal model were to some extent correlated with the specific water quality parameter. However only predictors with VIP values above 1 were considered to be of major importance. Overall, it was observed that the key variables with highest VIP values of the optimal models were spatially and temporally different. In addition, landscape metrics contributed more in Zone 2, Zone 3 and Zone 4, while land use metrics dominated in the other two eco-regions.

Key Land Use Types Predicting Water Quality
Many studies have reported that agricultural and urban land uses contribute to degrading water quality in adjacent aquatic systems, whereas vegetated areas such as grassland and forest have a positive contribution to water quality [6,28]. The results of this study were generally consistent with such previous findings. However not all water quality parameters in each eco-region showed a strong relationship with agricultural land use and urban land use.
In the high-flow season, although arable land was not always the variable with the highest VIP values in the optimal models of water quality indicators, it was predictive and associated with most water quality variables in each eco-region mainly in Zone 3 and Zone 5. Arable land was positively correlated with organic matter and some of the nutrients, while it correlated negatively with DO. This indicated that arable land actually served as a source for pollution in these eco-regions. Previous studies had noted that arable land had a positive impact on degraded water quality due to agricultural activities, such as fertilizer and pesticide application as well as livestock raising, which were often the major non-point source [4]. Forest was found to be closely related to most water quality variables than other land use types in Zone 4. It was identified that forest was positively correlated with DO and COD Mn during both seasons, while negatively correlated with NH 3 -N, TN and TP. It was unexpected that higher forest land use contributed to higher COD Mn , which to some extent contradicted previous research. Therefore, a possible speculation of this phenomenon is that the increase of forest would cause the accumulation of refractory organic matter derived from decaying plant material, afterwards flows into the surface water with overland runoff [29][30][31]. In addition, the terrain in Zone 4 is steeper than that in other eco-region which means that higher slope exists in this region. Previous study claimed that with an increasing slope, higher water flow rates would contribute to soil erosion and to the rates of particulate matter that picks up pollutants [32]. Although the dominant land use type in Zone 2 was arable land, the land use type contributed the most to the water quality in this region came out to be grassland. A negative relationship was identified between grassland and COD Mn , NO 3 -N, TP and F − , while DO was found positively correlated with grassland, which indicated the fixation and absorption effects of grassland for pollutants in overland runoff [5]. However the concentrations of non-point source pollutants, such as COD Mn and TN in Zone 2 were relatively high among the 5 eco-regions and was appeared to be related positively with arable land. This suggested that effect of grassland in decontaminating overland runoff has been weaken, even in upstream of the sub-basin. The main leading might be the looseness of surface soil structure caused by intensive grazing, which had been discovered frequently throughout our field survey. In addition, Zone 2 was known as the black soil area, the rich humus in the soil contributed to the increase of COD Mn and TN in surface water, especially in rainy season.
In the normal flow season, arable land was no longer the variable associating with the most water quality parameters in Songhua River Basin. Forest and urban land use had become the main predictors for water quality in some eco-regions such as Zone 1 and Zone 4. This phenomenon should be attributed to the decrease of precipitation, which reduced the volume of overland runoff generated by arable land. Another reason might be the first flush effects in storm events, which made surface runoff pollutants entering rivers more apparently in high flow season than that in other seasons [33]. Thus, fewer non-point pollutants was transported into surface water in normal flow season. However, forest was known to have the ability of intercepting the degraded water and filtering out nutrients. Thus forest land uses became one of the most influential factors towards the concentration of nutrients. Urban land use revealed its contribution to COD and NH 3 -N indicated that the water quality in Zone 1 was polluted by possible point sources mainly from domestic and industrial sewage since these pollution sources mostly distributed in built-up areas [5,7,32]. TN was the only water quality parameter that was dominated by arable land in Zone 2, while others were contributed mostly by grassland and urban land use. Indicating that point sources were more likely to be the major source of water contamination compared to that in high-flow season as urban areas was mostly covered with impervious surface and the drainage was continually routed to wastewater treatment plants and then discharged to local rivers as point sources [34].

Key Landscape Metrics Predicting Water Quality
Anthropogenic activities not only influence the composition of land use, but also change the landscape pattern. Previous studied had indicated that non-point source pollution loading, including soil erosion, sediment, and nutrient runoff, in rural watersheds was closely relevant to the landscape structure [35]. For example, a previous study had reported that landscape metrics consistently explained a 65% to 86% of the total variation in nutrients and suspended sediment to stream [36]. Landscape pattern metrics have been developed through the use of spatial tools, in order to attain the goal of quantification land use patterns and understand spatial heterogeneity and landscape structure [37].
In this study, degraded water quality was positively related with landscape fragmentation metrics such as PD and ED and was negatively related with LPI in most eco-regions. Previous studies have reported that PD and ED showed positive relationships with TN, TP, COD, and BOD concentrations in reservoirs in South Korea which was dominated by vegetated areas [7]. That was consistent with what it was found in this study, especially in Zone 1 and Zone 4, which are both mountainous regions with large proportions of forest. The value of PD and ED increased when growing numbers of small patched land cover types appeared in the watershed. The degree of forest fragmentation could be reflected by PD and ED [13]. Therefore, highly fragmented forest might not function efficiently to increase the permeation and decrease overland runoff and erosion from agricultural and urbanized areas, consequently let pollutants, sediments and nutrients flowed into the surface water without efficient interception. The CONTAG metric reflects the level of aggregation of land use types. Generally, a low CONTAG metric value means that land use types are highly fragmented, while a high value of CONTAG represents an aggregated landscape [38]. In this study, CONTAG was observed to be negatively related with non-point pollutants in the mountainous normal flow region (Zone 1) in both high-flow and normal flow seasons. Previous research had reported that in streams non-point pollutants are mainly derived from soil erosion and sediment yield [13]. Thus the CONTAG metric might be a factor contributing to soil erosion and sediment yield, especially in mountainous areas. In addition, intensive soil erosion occurred in the eastern part of the mountainous normal flow region. Thus, land use management in this region should focus on the aggregation of land use types in order to reduce in-stream non-point pollutants.
The AI metric represents the degree of physical connectedness and aggregation of land use within watersheds, and it is higher when land uses are more clustered and aggregated. In this study, the AI metric contributed to more water quality parameters in the normal flow season, and a negative relationship was identified between AI and water quality in two seasons. Therefore, degraded water quality usually occurs in watersheds with scattered land uses and plentiful land use patches.
The IJI metric represents the degree to which patch types are interspersed (not necessarily dispersed); lower values represent landscapes with poorly interspersed patch types (i.e., uneven distribution of patch type adjacencies), whereas higher values characterize landscapes in which the patch types are well distributed (i.e., equally adjacent to each other). The IJI metric was a more effective indicator for water quality parameters in high-flow season, which was positively correlated with COD, nutrients, F − and Cl − , while negatively correlated with DO. This suggested that human activities might disperse patches and therefore increase the potential of contaminants flowing into the river.
The LSI metric is an alternative to patch shape indices based on the average patch characteristics, which measures the perimeter-to-area ratio for landscape as a whole [38]. The higher the index indicates the more complex of the patches in the landscape. In this study, the LSI metric was an effective indicator for water quality variables of F − , Cl − and SO 2− 4 in high-flow season in Zone 1, Zone 4 and Zone 5, and pH, EC and NH 3 -N were found positively related to LSI in normal flow season in Zone 1 and Zone 5, besides F − and SO 2− 4 . This suggested that the shape of the patches in the landscape contributed more to anions concentration. Higher LSI indicates more edges presented in a landscape, which leads to negative impacts on water quality.
The SHDI and SHEI are diversity metrics influenced by richness and evenness respectively. Richness refers to the present patch type's number; evenness refers to the distribution of area among different types [38]. Larger values of SHDI and SHEI imply greater landscape diversity. Although SHDI and SHEI were not always the predictors with the highest VIP, they were both positively correlated with degraded water quality in the two seasons. The SHDI metric indicated more water quality variables in each eco-region than SHEI, except in Zone 3 and Zone 4 which only SHEI was observed contributing to water quality variables. This suggested that the degradation of water quality is likely to occur when the number of land use types increased and when different land use types are distributed homogenously.

Importance of an Eco-Functional Regionalization Land Use Pattern Impacts
The results of this study demonstrated that the key predictors and seasonal variations of the optimal models of each eco-region were considerably different (Tables 4-6). It was reported in previous researches that the relationship between land use pattern and water quality to some extent was influenced by regional differences because the gradients of anthropogenic land use were frequently overlapped on an underlying gradient in primitive characteristics of natural terrain [39]. For instance, urban areas often located in flat land instead of mountainous region because the former is more likely to develop rapidly. This is consistent with the results in our study that a relatively high proportion of urban areas generally appears in the plain catchments. Allan emphasized that anthropogenic land uses often covary with natural landscape, a factor that can result in overestimation of land use influence on river water quality. In this study, the five eco-regions were classified based on an analysis involving climatic, hydrologic and topographic factors. The relationships between land use patterns and water quality were analyzed without the disturbance of natural terrain since each eco-region has the similar topographic features. The results showed that the significant predictors of water quality parameters in each eco-region were apparently different. In addition, previous study had proposed that landscape heterogeneity within a large river basin might bring in model errors which implied the influences of agricultural lands in hilly watersheds might be covered up by strong influence of urban areas in the plain watersheds [11]. This indicated the importance of regional basis when exploring the relationship between land use patterns and stream water quality. Statistical models should be conducted on a relatively homogenous region in accordance to natural terrain, climatic and hydrological condition in order to get a reliable result. In addition, it also meets the requirement of the Chinese government that the watershed environment should be administrated based on the eco-functional regionalization.

Model Performance and Limitations
Multicollinearity among various land use/landscape metrics is one of the obstacles when establishing precise relationship between land use patterns and stream water quality. PLSR is of great advantage in this study because it is particularly conducive when the number of predictive indicators is similar to or higher than the number of observations and/or there is strong collinearity among the predictors [12]. Additionally, the PLSR methodology to some extent excludes the confounding relationships among both the independent and dependent variables and encourages a more impartial view of the contribution of land use patterns to stream water pollutants [27]. Therefore, this approach could be applied to the investigation of water contamination in other watersheds.
The method used in this study has some limitations. First, the water quality data during the low-flow and snowmelt period was not collected. Some researchers have claimed that a low-flow period should also be involved in the analysis, as farming activities such as sowing and fertilization occur frequently during this period, which causes negative impacts on river water quality [4,5]. As for the snowmelt period, some researchers take it into account when studying the watersheds located in cold-climate regions, as nutrient export produced by snowmelt runoff may differ significantly from runoff during storm events [25,40]. Thus, it is desirable to involve in situ sampling in low-flow and snowmelt period when investigating the relationships between land use patterns and water quality in the Songhua River basin in order to better understand the temporal variability in the watershed located in cold climate region and with intense agricultural activities. Second, some of the models were of low predictive power within the five eco-regions. such as the models (Q 2 < 0.5) of SO 2− 4 in Zone 1, Zone 2 and Zone 3 in high-flow season and the models of Cl − in Zone 4 in both seasons and in Zone 5 in high-flow season. This indicated that the variations of SO 2− 4 and Cl − might not be exclusively explained by land use and landscape metrics. This was consistent with some previous studies claiming that road density and geology characteristics could also contribute to the variation of ions besides land use patterns [10,41]. Therefore, further researches are inevitable to include other non-land use variables to build empirical models in order to obtain a higher predictive accuracy and confidence level.

Conclusions
The results of this study demonstrated that water contamination of the Songhua River basin in China presented high temporal and spatial variations within the five eco-regions. A partial least square regression (PLSR) approach was applied in exploring the relationships between land use patterns and specific water pollutants during both high-flow and normal flow seasons on a watershed scale. The results suggested that in different eco-regions, water quality parameters were influenced by different land use/landscape metrics. Arable land was observed a predictive variable contributing to degraded water quality mainly in hilly high flow region (Zone 3) and plain normal flow region (Zone 5), indicating that agricultural activities were the main factor contributing to degraded water quality in regions with a relative higher runoff depth and lower altitude. Grassland influenced most water quality parameters in the plain low flow region (Zone 2), suggesting the effect of intensive grazing in plain areas. Forest was observed to be negatively related to degraded water quality in mountainous high flow regions (Zone 4), indicating the importance of forest in mountainous areas. Landscape metrics were predictive variables contributing to degraded water quality in each eco-region during both hydrological seasons. The water quality degradation in both seasons were positively related to landscape metrics of patch density (PD), edge density (ED), landscape shape index (IJI), interspersion juxtaposition index (LSI), Shannon's diversity index (SHDI) and Shannon's evenness index (SHEI), and negatively associated with largest patch index (LPI), contagion (CONTAG) and aggregation index (AI). Additionally, PD and ED were observed to have major impacts on water quality contaminants, particularly in mountainous high flow region (Zone 4), which indicated the fragmentation of forest may not function effectively to decrease overland runoff and consequently let pollutants flow into the river; the landscape metrics of CONTAG was observed to be related to non-point pollutants in mountainous normal flow region (Zone 1) which suggested the correlations between CONTAG and soil erosion and sediment yield; LSI was found an effective indicator for anions in Zone 1, Zone 4 and Zone 5 mainly in high flow season; SHDI metric indicated more water quality variables in each eco-region than SHEI, except in Zone 3 and Zone 4 which only SHEI was observed to contribute to water quality variables. Indicating the variation of landscape heterogeneity contributed to the degradation of water quality in all eco-regions. The results of our study suggested that natural environmental factors such as topography, climate and hydrology may have some impacts on how land use patterns correlate with water quality. Thus, analyzing the relationships between land use patterns and water quality on an eco-functional regionalization basis is considerable. Therefore, further studies are needed to explore how the characteristics of eco-regions affect the casual relationship between land use patterns and water quality.
Author Contributions: P.C. and F.M. conceived and designed the research; P.C. and L.Z. performed the research; M.J. and P.C. contributed to data analysis of this work; P.C., F.M., Y.W., L.Z., Q.Y. and M.J. wrote the paper and approved the final version to be published.

Funding: This research was funded by National Major Science and Technology Program for Water Pollution
Control and Treatment of China, grant number (2015ZX07201-008).

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A    The key predictors with the VIP values above 1 in the optimal models are in bold; "--" means no valid model was found for this water quality variable; Abbreviation of land use/landscape variables are listed in Table 2.  The key predictors with the VIP values above 1 in the optimal models are in bold; "--" means no valid model was found for this water quality variable; Abbreviation of land use/landscape variables are listed in Table 2.