Environmental and Spatial Factors Affecting Surface Water Quality in a Himalayan Watershed, Central Nepal

Various spatial interrelationships among sampling stations are not well explored in the spatial modeling of water quality literature. This research explores the relationship between water quality and various social, demographic, and topographic factors in an urbanizing watershed of Nepal with a comparison of different connectivity matrices to conceptualize spatial interrelationships. We collected electrical conductivity and dissolved oxygen data from surface water bodies using a handheld probe and used the data to establish relationships with land use, topography, and population density-based explanatory variables at both watershed and 100-m buffer scales. The linear regression model was compared with different eigenvector-based spatial filtering models. These spatial filtering models were constructed using five different spatial conceptualizations based on different graph types generated from the geographic coordinates of the sampling sites. Population density, elevation, and percentage of sand in the watershed and riparian regions are most important in explaining dissolved oxygen concentration and electric conductivity. A human signature as population density and increased sand and gravel cover can be detected in this watershed impacting water quality. Among different graph types compared, the relative graph type provided the highest model strength signifying a stronger upstream-downstream relationship of dissolved oxygen, while knearest graph types with four neighbors provided the strongest model performance, indicating the impact of local factors on electrical conductivity. The relationships between socio-environmental factors and water quality and their spatial interrelationships identified in this work shed light on the source, mobilization, and transport of dissolved oxygen and electrical conductivity and can assist the water quality management endeavor.


Background
A stream's water quality is a result of a complex interaction of natural and anthropogenic processes in the watershed. Land-use change, population density, geology, and topography affect water quality in rivers (Baker, 2003;Lintern et al., 2018a). Human-modified land use is generally associated with degraded water quality and undermines ecosystem sustainability, including degradation of the freshwater ecosystem (Allan, 2004;Foley et al., 2005;Zampella et al., 2007). The anthropogenic impacts on surface water quality are not always straightforward, as complex interactions among various social, environmental, climatic, and political factors determine the consequences of these changes (Baker, 2003;Turner and Rabalais, 2003). These impacts are usually manifested as increased water temperature, increased nutrients (e.g., nitrogen and phosphorus), salt compounds, reduction in oxygen availability, and increased conductivity (Lintern et al., 2018a). The high concentration of nutrients and increased water temperature typically results in reduced oxygen levels in the water, as increased temperature reduces the solubility of oxygen, and remaining dissolved oxygen is also consumed rapidly by aquatic organisms, signifying eutrophication and deteriorated water quality (Cox, 2003).
Researchers have been using watershed characteristics at different scales to understand the spatial patterns of different water quality parameters across the stream network (Allan, 2004;King et al., 2005). Different landscape characteristics such as landcover types, topography, and other relevant explanatory features are extracted at scales including the entire watershed, riparian buffer, or some intermediate scales. The scale effects are not universal, as some factors are likely to affect water quality at the riparian scale, while others tend to do that at a watershed scale (Mainali et al., 2019). These relationships are different among different sites, seasons, and parameters studied as well. For example, Uriarte et al. (2011) reported that turbidity and dissolved oxygen (DO) responded to land-use change at a larger watershed scale while nitrogen was affected at the riparian buffer scale. While Mainali and Chang (2018) found a generally stronger influence on water quality at the stream buffer scale, the impact of scale in their model performance varied according to the parameters studied and seasons at which water quality data were collected. Some studies like Pratt and Chang (2012), Sliva and Williams (2001), and Zampella et al. (2007) reported a more significant influence of the whole watershed than a 100m buffer in their analyses.
Regression modeling approaches are commonly used to explore landscape factors affecting water quality at different scales. As water quality information is tied to a location, regression modeling approaches are expected to incorporate spatial interrelationships among different locations from which water quality information is collected. If spatial relationships are not considered, regression modeling might violate the assumption of independence of the residuals of such models. Several spatial regression models overcome the limitation of ordinary least square (OLS) models in analyzing the relationship between water quality and landscape variables. These models include spatial lag and error models (Anselin, 1988), spatial eigenvector-based models (Borcard and Legendre, 2002;Tiefelsdorf and Griffith, 2007), geographically weighted regression (GWR) models (Brunsdon et al., 1998), and spatial stream network-based models (Peterson and Hoef, 2010;Ver Hoef et al., 2006). In this work, we use an eigenvector-based spatial filtering-based regression method to explore the relationships between water quality and landscape matrices. We use eigenvector-based spatial filters to capture the spatial heterogeneity in the data and remove any clustering of residuals, which might lead to residual spatial autocorrelation (Getis and Griffith, 2002). Spatial filtering techniques generate a new set of explanatory variables representing the response variable's spatial structure. A selected set of those eigenvectors are then used as spatial predictors along with other predictor variables in the regression models. This approach has been recently used to modelaverages and trends in water quality Chang, 2018, 2020).
In the water quality modeling literature, different spatial conceptualizations of sampling sites, and their role in model outputs are not adequately explored (Mainali et al., 2019). Most studies use the spatial filtering approach with standard neighborhood criteria and weight matrix parameters without any attempt to modify them. In this work, we aim to explore how spatial conceptualizations of sampling sites rendered as different graph types in spatial-filtering affect the model output of DO and conductivity. We generate spatial eigenvector-based filters using five different graph types -Delaunay, Gabriel, Relative, Minimum Spanning Tree, and k-nearest-and use respectively fitted spatial filters in the regression model to compare their effectiveness in modeling dissolved oxygen and conductivity against the simple linear regression models.
This work uses the Setikhola watershed in central Nepal as a case study to explore the relationships between water quality and landscape features in the Nepal Himalaya. In Nepalese Himalaya, different water quality parameters respond to the differences in land use, land management, natural vegetation, and atmospheric deposition that are usually directly affected by elevation (Jenkins et al., 1995). As in most of the other parts of the world, nutrient loss from forested lands is lower than non-forested lands in the Himalayan region (Pandey et al., 1983). Collins and Jenkins (1996) reported that although the agriculture catchments showed higher ammonium content during the wet season, they were unlikely to damage aquatic biota in Nepal's mostly non-commercial agriculture practices. However, fertilizer input per hectare has since substantially increased, from 31 kg in 1995 to 131 kg in 2015 (Bista et al., 2016). As a result, surface water pollution due to agricultural runoff has also increased, especially in the mid-hill and lowland Terai region of Nepal (Bista et al., 2016;Sharma et al., 2005). Urbanization has also significantly increased in Nepal. In the study watershed, the urbanized area more than doubled from 1990 to 2013 (Rimal et al., 2015). The impact of urbanization on water quality is sparsely studied in Nepal and is mostly focused in the capital city of Kathmandu (Kannel et al., 2007a;2007b;S. Hammoud et al., 2018;Vaidya and Labh, 2017). The spatially explicit information related to water quality and the role of different landscape characteristics is not explored in the study watershed.
We assessed the spatial patterns of dissolved oxygen (DO) and electrical conductivity using the data collected from the field in December 2018 and January 2019. DO and electrical conductivity were chosen because they are important indicators of water pollution and the ecological integrity of surface water bodies (Cox, 2003;Lintern et al., 2018a). Data related to conductivity provide us information about the ability of water to pass electrical current, a measure of the availability of anions usually sourced from various chemicals, including alkali, chlorides, sulfides, and carbonate compounds. Conductivity is also related to temperature, as a warmer temperature tends to have higher conductivity (US EPA, 2013). Conductivity values are important indicators of biological integrity, as changes in conductivity usually indicate that pollution from discharge or other sources is entering the water bodies. The survival of aquatic organisms like fishes, algae, and macrophytes is directly related to oxygen availability in water. DO provides information about the human impacts in the water bodies, as increased temperature from anthropogenic activities leads to the reduction of dissolved oxygen. Polluted water has lower DO concentration because aquatic plants and bacteria in the polluted water consume oxygen, as does the decay of organic materials, which leads to eutrophic conditions (USGS DO, 2006).
A recent review by Mainali et al. (2019) reported that different spatial conceptualizations of the sampling sites to incorporate the neighborhood impacts on water quality remain unexplored in water quality modeling literature. In this work, we compare various spatial conceptualizations of sampling sites by leveraging the graph theory literature and statistical packages available in R software. We attempt to answer the following research questions: (1) How do DO and conductivity spatially vary in this watershed? (2) How different landscape features like the land cover, topography, and population density affect the water quality in the study watershed? and (3) How do different spatial conceptualizations of the sampling sites affect model results in this watershed?

Study area
Our study area is the Setikhola watershed, which includes the Pokhara valley and adjoining hills and mountains in Central Nepal ( Fig. 1). It is an example of an urbanization gradient in Nepal (Rimal et al., 2015). The City of Pokhara is one of the biggest cities in Nepal, a famous tourist destination, and a gateway to the popular Annapurna Conservation Area. The valley floor is a metropolis with a population greater than 500,000, while the hills are dominated by subsistence agriculture. The high elevation regions are mostly near-wilderness with forests, prairies, and snow-covered mountains, protected as a part of the Annapurna Conservation Area (ACAP, 2017). The area of this watershed is about 990 km 2 and includes 381 km of the river; three major lakes cover approximately 9 km 2 (Baral Gauli et al., 2016). There is a total of 6 lakes with 3 major lakes and other smaller lakes (Fig. 1b).
The elevation of the watershed ranges from 700 m to more than 8000 m above sea level. This watershed is located in one of the wettest regions of Nepal, with a total annual rainfall of about 4000-5400 mm, most of which falls during the monsoon season, June-August (CBS, 2013). The flow of rivers and the volume of lakes respond to the cyclic pattern of rainfall. The flow rate of the river was recorded at 40 AE 37 m 3 /s during June and July of 2012 (Pokharel et al., 2018). The lake system of the valley floor was recently added to the list of important wetlands as a Ramsar site (Baral Gauli et al., 2016). The water bodies of the proposed study area are home to dozens of waterbird species, native fishes, endangered otters, and amphibians (Bhandari and GC, 2008;Husen and Sherpa, 2017;Kafle et al., 2008). Many endangered raptors, including the slender-billed vulture, also inhabit this area and depend on the water resources directly and indirectly.
Most of the recent biodiversity-related studies in this region only focused on terrestrial systems like forests and rangelands, typically overlooking aquatic biodiversity (Thapa et al., 2015). The water system is an important habitat for different aquatic organisms, provides ecosystem services to people living around it, and is also a major economic driver in this valley, including the tourist attractions in lakes and rivers, and fishery activities in the lakes (Gurung et al., 2005;Husen and Sherpa, 2017). With increasing development around the lake in surrounding towns, sand gravel mining activities along the riverbanks for building construction, water quality has become a great concern for sustaining various human activities and ecological functions. Therefore, understanding the factors affecting the quality of surface water is of paramount importance for both people and the ecosystem in this watershed.

Water quality data
We sampled 93 data points from rivers and lakes of the watershed. There were 48 river data points and 35 lake data points. These data points were aggregated to 61 points after combining duplicate sampling in the river and different locations in the lake (Table 1). The field data were collected from December 20, 2018, to January 20, 2019. This dry winter period was chosen to minimize the effect of meteorological factors on water quality. There was not any precipitation in the watershed during and preceding the data collection period based on the Pokhara airport precipitation data acquired from the Department of Hydrology and Meteorology, Nepal (DHM, 2020, Fig. 2). We collected pH, conductivity, DO, and temperature data using the YSI probe (Professional Plus #603190). The YSI probe was dipped just below the surface in  different water bodies. In this work, we use electrical conductivity and dissolved oxygen (DO) data because they were relatively stable across the different times of the day in the watershed in winter, thereby allowing spatial pattern analysis.

Landcover data
A landcover classification of a Landsat 8 image was performed using the Google Earth Engine (Google Earth Engine, 2020). A cloud-free image was selected for the year 2017 as there was not any cloud-free image available for the year 2018 or early 2019 when sampling was performed. We used the Classification and Regression Tree (CART) classification method to classify land cover into seven different classes (Urban Light, Urban Dense, Agriculture, Forest, Sand, Bare, Snow & Glaciers). The Urban Dense landcover refers to the region with densely developed areas like downtown which are also prevalent in some of the highway junctions in our study area. The Urban Light land cover refers to the urban sprawl in the agricultural regions. These lightly dense urban regions usually are interspersed with agricultural lands.
The overall accuracy of the landcover map was about 82 percentage. The accuracy was measured by creating an error-matrix with a total of 115 polygons. Based on landcover information collected in the field, a set of known landcover type polygons were created, covering the entire watershed. The landcover category of those polygons was compared with the classified image by creating a confusion matrix (Lewis and Brown,Fig. 2. Daily precipitation during and before the data collection (DHM, 2020). Field water quality data were collected during December 20, 2018, and January 20, 2019. The confusion matrix provides us information about the percentage of pixels correctly classified in different landcover types. The confusion matrix was used to calculate the user's accuracy and the producer's accuracy, which were averaged to derive an overall accuracy. The landcover map hence derived was then used to extract different watershed level percentage land cover types to use as explanatory variables (Fig. 3a).

Topographic data
We used the Department of Survey, Government of Nepal's 20-m contour data as our elevation dataset. This dataset was interpolated to the digital elevation model (Fig. 3b) using the topo-to-raster the interpolation technique with ArcGIS (ArcGIS 10.5.1, 2020). The elevation surface was converted into a slope raster using the surface analysis tool of ArcGIS 10.5.1 (Fig. 3c). The interpolated elevation surface was also used to delineate the watershed boundary for each sampling station. The watershed polygons were used to extract the percentage of different landcover types, human population density, and an average of elevation and slope.

Population data
The latest population estimate based on WorldPop data was used (WorldPop Nepal, 2015). This is a 100-m resolution population estimate for the year 2015 (Fig. 3d). The population raster was clipped with a watershed boundary shapefile.

Watershed delineation and predictor variables extraction
The subwatershed boundaries of the study area were delineated for each sampling point using the watershed hydrology tool of ArcGIS, which involved calculating flow direction, flow accumulation, and delineation of watershed boundary based on the user-defined outlet. We used the zonal statistics tool to calculate an average and standard deviation of elevation, slope, and population density. The zonal histogram tool was used to calculate the number of pixels of each landcover type for each watershed draining to the sampling points. That value was converted to the percentage of each landcover type. A buffer of 100m from the center of the stream was calculated using the buffer tool in ArcGIS. We chose a 100m buffer as most of the streams in this watershed are narrow and a 100-m distance from the stream captures the immediate surrounding of the stream. It is also a standard practice in the water quality modeling literature to start the buffer distance with 100 m (Allan, 2004;Mainali and Chang, 2018;Sliva and Williams, 2001). Those buffer polygons were clipped for each watershed. Predictor variables were extracted for the buffer of each watershed draining into the sampling point.

Exploratory data analysis
We mapped the spatial patterns of different water quality parameters and compared the differences between rivers and lakes. There were 48 river data points and 35 lake data points. To test whether there is significant spatial clustering, we carried out spatial cluster and outlier analysis (Anselin's Local Moran's I) statistics using ArcGIS. This clustering was used to map high and low-value clusters of the water quality parameters in the watershed.

Regression analysis
After all the explanatory data sets were extracted for each sampling point, we used R version 3.6.1 software to analyze the data (Bivand, 2019;R Core Team, 2019). Only stream data points were used during regression analysis to remove any noise from the lakes.
This spatial linear regression modeling approach attempts to explore two major things; first, the relationships between various water quality parameters and landscape features at different scales, and second explore the impact of various spatial conceptualizations of sampling sites. This modeling approach attempts to mimic the spatial interrelationships between sampling sites by comparing several spatial approaches. In doing so we use both environmental variables related to watershed characteristics and processes as well as the spatial variables as spatial eigenvectors derived from distance matrices among sampling sites. The spatial eigenvectors which are spatial predictors are derived for each graph type to account for the spatial interrelationships depicted in each graph type. This will provide us with the model strength as an R 2 value, various model coefficients representing the magnitude and directions of each environmental variable, and significant spatial predictor as spatial eigenvectors.
The response data sets (Dissolved Oxygen and Electrical Conductivity) were evaluated for their distribution using the Shapiro-Wilk test. We found that DO concentration was normally distributed while electric conductivity was not. Therefore, electrical conductivity was logtransformed before the regression modeling. The variation inflation factor (VIF) statistics were run to identify the predictor variables that were not autocorrelated. We chose predictor variables having VIF less than 10. Using the predictor variables, regression analysis was run for dissolved oxygen and conductivity both at the watershed and buffer scale.

Spatial regression models and different graph types
In this work, different spatial interrelationships among sampling sites were explored using graph theory. Graph theory uses the simple mathematical concept of nodes connected by the edges that have weights and directions. These edges connected by nodes can be used to decipher the processes and mechanisms of the underlying spatial phenomenon being studied (Dale and Fortin, 2010). Several graph types are being used in graph theory literature. These different graph types have different levels of connectivity and result in different adjacency wmatrix (Yan et al., 2019). We hypothesize that using different connectivity matrices resulted from these graph types allows us to examine the spatial relation among sampling stations to better understand the underlying process and mechanism of water quality parameters. A default spatial graph type of spatial filtering algorithm is the Delaunay graph type, a 6-node degrees graph type (each node connects to 6 other nodes). The other graph types used are the subgraphs of the Delaunay that have different node degrees: Gabriel-4, Minimum Spanning Tree-2, k-nearest neighbor-2, and relative -3 (Dale and Fortin, 2010). All the graph types used in this analysis are undirected maps where edges link two vertices symmetrically (Fig. 4). Some of the graph types, like Relative and Minimum Spanning Tree, mimic the stream network to a certain extent.
Spatial-filtering algorithms were implemented using the spatialreg package in R version 3.6.1 (Bivand, 2019;R Core Team, 2019). The first step of this process involved creating a weight matrix based on neighborhood criteria using different graph types (Fig. 3). Each weight matrix was then decomposed and transformed using a set of mathematical functions to create eigenvalues and corresponding n-1 eigenvectors (Chun et al., 2016;Tiefelsdorf and Griffith, 2007). A set of fitted spatial filters that mimics the spatial structure of the response variable and can reduce the residual spatial autocorrelation was then selected to use as predictor variables along with other environmental variables in spatial regression for each graph type (Tiefelsdorf and Griffith 2007).
The eigenvector-based spatial filtering can be expressed as the following equation.
In equation (1), Y is a dependent variable, X is a matrix of independent variables. E k denotes the selected matrix of fitted spatial-filtering based eigenvectors, β is a set of regression coefficients for predictor variables, β ε is a set of regression coefficients for selected eigenvectors, and ε is random noise (error) (Chun et al., 2016;Mainali and Chang, 2018). While identifying coefficients the program uses a t-test to identify significant variables and the coefficients derived from the regression model. Fig. 4. Schematic representation of spatial patterns of the data points based on different graph types (Data points were created randomly using R software version 3.6.1).

Spatial patterns DO
The DO values of the watershed range from 4.7 to 10.38 mg/L with an average concentration of about 7.00 mg/L. The DO concentrations are higher at the main stem of Seti River while they are lower in other tributaries and lakes (Fig. 5a). There are clusters of high DO values in the high elevation regions, but no low-low clusters (Fig. 6a). The median difference of DO is significant (p ¼ 0.00048, H-test) between rivers and lakes (Fig. 7a), with higher DO in rivers than lakes. The DO values along the Setikhola stem are the highest. This result shows that the main stem of Setikhola River has an excellent DO range to support aquatic life, while DO in lakes and other tributaries are lower.

Conductivity
The conductivity of this watershed ranged from 16.1 to 354 μs/cm with a mean of about 150 μs/cm. Pokharel et al. (2018) reported an average of 166 μs/cm conductivity in the Seti-Khola River. In Fig. 5b we can see that some of the western tributaries have significantly lower conductivity than the rest of the watershed. Conductivity also substantially differed between rivers and lakes in this watershed, with significantly higher values in rivers than lakes (p¼ 1.11*10 À8 , H-test) (Fig. 7b).

Correlation analysis
The elevation standard deviation was significantly correlated with both DO and conductivity at both scales, while slope was positively correlated with DO at buffer scale only (Table 2). But slope standard deviation was correlated significantly with DO at the buffer scale while with conductivity at both scales. The average population density was significant for conductivity at the buffer scale only, while the standard deviation was significant at both scales. The forest landcover was significantly positively correlated with DO at the buffer scale, while agriculture was significantly positively correlated with both DO and conductivity at the buffer scale but not at the watershed scale. The percentage of the sand cover was significantly negatively correlated with the conductivity at the buffer scale.

Regression results
The R 2 value of the DO model ranged from 0.25 to 0.5 while R 2 values of conductivity ranged from 0.3 to 0.85 (Tables 3 and 4). The higher R 2 values for both spatial and aspatial models were reported using the 100m buffer scale. Fig. 8 displays spatial interrelationships among different sampling locations. The Relative and Minimum Spanning Tree graph types are the closest representation of the stream network, while knearest graph types have revealed the local clusters based on the immediate neighbors. The relative graph type yielded the highest model  performance for DO, while the k-nearest graph type yielded the highest model performance for conductivity (Fig. 8). The independent coefficients associated with the landscape matrix with the different graph types are the same. It refers to the fact that the intensity and direction of impacts of those parameters on water quality values are similar. The only difference being the spatial eigenvectors (vec). We can see from Tables 3 and 4 that there are a different set of spatial eigenvectors loaded in different graph types even though the coefficients associated with the landscape level predictor variables are the same. In Tables 3 and 4 we have reported the variables which are significant with p-value less than or equal to 0.05.

Dissolved oxygen regression model
Different spatial conceptualizations yielded various model strengths for DO. The R 2 values with explanatory variables at the watershed scale ranged from 0.25 to 0.48, while it is generally higher at the buffer scale with values ranging from 0.35 to 0.5 (Table 3, Fig. 9a). All models were statistically significant with a 95 percent confidence interval (p 0.05). As shown in Fig. 9a, spatial filtering-based regression always increases model performance, but the highest model performance for DO models was achieved when the relative graph type was used in both watershed and buffer scales. Only the standard deviation of elevation was a significant predictor at a watershed level. The standard deviation of the population and percentage of sand/gravel were significant predictors at the 100-m buffer scale (Table 3). The best model was derived using the relative graph spatial conceptualization at the buffer scale, with predictor variables % sand, and spatial eigenvector number 6 and 16.

Conductivity regression model
The conductivity model strengths were generally higher than DO. All models were significant at p 0.05. The model strengths of conductivity also varied according to different spatial conceptualization. The R 2 values ranged from 0.3 to 0.85 at the watershed scale while the buffer scale model strength ranged from 0.62 to 0.84 R 2 values (Table 4, Fig. 9b). Buffer scale models were usually weaker for conductivity models except for the aspatial linear model. The k-nearest graph model   strength was comparable between watershed and buffer scale models which also yielded the highest model strengths at both scales. In the regression model, the population standard deviation was always positively related to conductivity. When k-nearest spatial conceptualization was used, the average elevation was also positively associated with conductivity at the watershed scale. But at the buffer scale, elevation standard deviation, population standard deviation, and percentage bare land positively explain the variation of conductivity while percentage sand predicts it negatively (Table 4). The k-nearest graph at the watershed and the buffer scales had an R 2 value close to 0.85. However, the watershed scale model is simpler, with elevation and population standard deviation along with eigenvectors 3, 5, and 8 as the predictor variables.

Spatial patterns of dissolved oxygen and conductivity
Our DO range falls within the range reported elsewhere in Nepal and other Asian countries (Adhikari et al., 2017;Su et al., 2012;Yadav et al., 2019). Pokharel et al. (2018) reported an average of 8.0 mg/L in the Seti-Khola River from the data collected in July 2012. DO values greater than 4.0 mg/L are considered fair to support aquatic life, while higher than 6.5 is good, above 8.0 is excellent (Washington Ecology, 2002). In our study, DO is generally higher in the mainstream high-flow river, which is consistent with other studies that report increasing river flows are associated with high DO (Post et al., 2018). A relatively random spatial pattern for DO except for a high-high cluster of the high elevation result suggests that the factors affecting DO concentration are also randomly distributed in the watershed. The high-high cluster in the high elevation region might be associated with proximity to forest, cooler water temperatures coming from the snow and glaciers, the steeper slope, leading to higher turbulence resulting in rapid re-aeration. (de Mello et al., 2018;Su et al., 2013).
The conductivity range we reported is within a standard limit (max of 1500 μs/cm) according to the Nepal government (Water Quality Standard Nepal, 2005). Our conductivity values are within the range of previous studies like Pokharel et al. (2018) who reported an average of 166 μs/cm in this watershed. The higher range of conductivity in the high-flowing river like main SetiKhola and its bigger tributaries, and lower values in the smaller tributaries and lakes, suggest that conductivity is a function of watershed size and probably in-stream activities such as the dissolution of salts from bedrock. In a larger watershed, water delivered to the surface water comes in contact with more soil surface, thereby washing more ions and increasing conductivity (Water on the Web, 2020). We also cannot rule out the possibility that the differences in conductivity in different parts of the watershed might be a consequence of the differences in underlying geology: rock types with abundant dissolvable ions tend to increase water conductivity in the stream (Water on the Web, 2020).  Water quality in the study lakes was poorer than in the fast-flowing rivers that recycle nutrients and oxygen quickly. Both DO and conductivity were lower in the lakes. Notice, however, that there were some tributaries where conductivity was lower than the lakes, probably because of their small watershed size and/or underlying geology. In many cases, lakes have different water quality conditions from rivers because of their stagnant nature, physicochemical conditions, and responses to receiving waters that are typically affected by a combination of natural and human impacts (Low et al., 2016). Lakes hold nutrients and increase concentration over time, which can lead to eutrophication. All the lakes in this region also suffered some form of eutrophication, with such impacts more visible in small lakes (Field visit 2018/2019). According to local people, the macrophyte growths in bigger lakes are periodically removed to make room for boats. The aquaculture practices in the lakes, like fish farming in some of the lakes, and other factors such as the presence of the river in the watershed, land use, geology, and climate affect the intensity of human impacts in the lake (Nielsen et al., 2012;Zang et al., 2011).

Relationship between landscape matrix and water quality
We report that the riparian forest cover is positively correlated to DO, which is in line with other studies (e.g., (de Mello et al., 2018). The urban land cover did not directly correlate with either DO or conductivity. It is probably because dense urban land only covers a small area and is not evenly distributed across the entire watershed. The strong correlation of forest land cover with DO at the buffer scale suggests that all other human-modified landcover types are detrimental to DO, as expected according to other studies (Zhou et al., 2012). The effect of land cover in DO is manifested through increasing temperature, which leads to increased biological oxygen demand and depleted oxygen in the water bodies (Schindler et al., 2017). Various other studies have also found agricultural land use affecting DO significantly, which is consistent with our finding (Yadav et al., 2019). A negative effect of the built-up area and population growth on DO are also reported in various parts of the world (Su et al., 2013).
DO in surface water measures the ability of water to support life; it can be affected by various watershed factors. Different studies have found varying levels of success in modeling DO utilize landscape characteristics and statistical approaches (Su et al., 2013). found a maximum of 0.83 R 2 when they compared various spatial statistical models for the Qintiang river of China, while de Mello et al. (2018) reported 0.72 in the Sarapui River basin of Brazil (Chang, 2008). reported R 2 values in the range of 0.7 in the study of the Han River Basin, Korea. Although lower than these studies, we were successful in deriving the model with a reasonable R 2 value of 0.5 using a combination of somewhat limited socio-environmental (population standard deviation, agriculture, sand, and bare land cover) and spatial-filter based variables. The remaining variations might be explained by geology, soil types, and climatic variables, which are unavailable in the study region. Our result suggests that the percentage of sand coverage at the stream banks is a significant determinant of DO. This finding suggests that the sand and gravel mining rampant in the riparian area of this watershed might be reducing oxygen availability in the water bodies. Some previous studies have shown that sand and gravel mining can affect the aquatic ecosystem and also degrade overbank areas (Sreebha and Padmalal, 2011). However, the exact mechanisms by which the gravel and sand mines impact surface water quality remain to be explored.
Conductivity can be modeled with watershed characteristics better than other water quality parameters because of easier movements of soluble ions to the water, which are unique to different landscape characteristics under consideration (Lintern et al., 2018b). We found a high of 0.8 R 2 value in the current study. Conductivity can be affected by various watershed levels and in-stream factors like the concentration of phosphorus and nitrogen in the water, area of wetland surrounding water bodies, and climatic factors like precipitation (Fracz and Chow-Fraser, 2013). We also found several of these factors affecting the conductivity concentration of the river reaches. The presence of agriculture or sand cover and high population density reduces conductivity significantly in our watershed, which aligns with the study by Wenner et al. (2003) who reported that degraded streams usually had lower conductivity.

Impacts of spatial scales
Various studies have found different results in terms of the scale at which landscape matrices affect water quality. Studies have found a stronger effect of watershed characteristics than buffer scale characteristics on water quality in their models (Houlahan and Findlay, 2004;Pratt and Chang, 2012;Zhou et al., 2012). In contrast, Mainali and Chang (2018) reported a 100-m buffer as the best scale in explaining various water quality parameters in a larger river basin in South Korea. Similarly, we found generally higher model strength at the buffer scale for DO while similar model strengths between 100-m buffer and watershed scale for conductivity. Our results also indicate that there was a higher influence of different factors at the buffer scale than the watershed scale; land cover in the immediate surrounding of the river like sand and agriculture are significantly making water quality worse by reducing DO and conductivity.

Impacts of different spatial conceptualizations
We report that spatial filters significantly increase model performance, and spatial conceptualizations matter when creating spatial filters because they produce different model outputs. When spatial eigenvectors are created, the weights are provided based on the values of the neighborhood, which are different in different graph types. For DO, the highest model strengths were with Relative Graph type while it was the k-nearest for conductivity. Relative and Minimum Spanning Tree are the graph types closest to the real river network of our watershed; a difference between Relative and Minimum Spanning tree is in the connections between stations on the west side of the watershed. Relative graph type is closer to the real river network as the edges in this graph more closely follow the river network. The highest model strength in Relative Graph type suggests that DO is more directly affected by upstream-downstream relations along the river network. Many previous studies also showed that DO concentration was predominantly governed by various upstream factors like solute concentrations (Bailey and Ahmadi, 2014) and inclusion of upstream-downstream relationships improved the model performance of DO (Money et al., 2009).
The k-nearest spatial conceptualization refers to the neighbors defined around its immediate surroundings in all directions. The higher conductivity model strength using k-nearest spatial conceptualization suggests that conductivity is more affected by local than upstream factors. The local clustering of conductivity could be better captured by knearest clustering than other graph types. Previous studies also reported that the electric conductivity of the river is influenced by neighbors in all directions, or upstream values (Lintern et al., 2018b;Peterson and Hoef, 2010). It is also to be noted that the model strengths using other graph types are also significant, and only slightly lower than the k-nearest spatial conceptualization.

Conclusions
The spatial patterns of DO and conductivity, their relationships with socio-environmental factors, and various spatial and statistical interrelationships identified in this work elucidate the source, mobilization, and transport of DO and conductivity and can guide water quality management efforts. In this watershed, we report that the spatial clustering pattern of DO is affected by upstream factors, thereby revealing distinct DO concentrations in the main-stem and tributaries. Conductivity also revealed distinct spatial variations in main-stem and other tributaries and exhibited local clustering across tributaries.
The spatial regression models were successfully developed and compared using water quality data collected in the field, and various geographic information systems based social and environmental data. Among the factors considered in the analysis, we found the population density, agricultural land cover, and sand cover negatively impact the water quality as revealed by their relationships with DO and conductivity. The inter-scale comparison revealed a generally stronger impact of a 100-m riparian scale over the entire watershed in explaining the variation of DO and conductivity.
Our work provides a novel example of using graph theory in elucidating relationships among water quality measurement sites and their affinity with landscape processes. The model strengths are usually different according to the different spatial conceptualization of interrelations among sampling stations, as demonstrated by the graph types. Among different graph types compared, the relative graph types provided the highest model strength, signifying stronger up-stream downstream relation with DO, while k-nearest graph types with four neighbors provided the strongest model performance, indicating the greater impact of local factors in electrical conductivity.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.