Spatial Analysis of a Chesapeake Bay Sub-Watershed: How Land Use and Precipitation Patterns Impact Water Quality in the James River

Changes in land cover throughout the Chesapeake Bay watershed, accompanied by variability in climate patterns, can impact runoff and water quality. A study was conducted using the Soil and Water Assessment Tool (SWAT) for the James River watershed in Virginia, the southernmost tributary of the Chesapeake Bay, from 1986 to 2018, in order to evaluate factors that affect water quality in the river. This research focuses on statistical analysis of land use, precipitation, and water quality indicators. Land cover changes derived from satellite imagery and geographic information system (GIS) tools were compared with water quality parameters throughout that timeframe. Marked decreases in forest land cover were observed throughout the watershed, as well as increased residential development. Our findings suggest strong links between land cover modification, such as residential development, and degraded water quality indicators such as nitrogen, phosphorus, and sediment. In addition, we note direct improvements in water quality when forest land areas are preserved throughout the watershed.


Introduction
The James River, a 348-mile-long waterway with headwaters in the western mountains of Virginia [1] runs through the width of the state and empties into the Chesapeake Bay as one of its 5 largest tributaries [2]. An estimated 4 million people rely on the James River for drinking water and recreational activities, as well as residential, industrial, commercial, and military infrastructure [1]. The Chesapeake Bay, the USA's largest estuary, supports thousands of animal and plant species, as well as the approximately 18 million people who live in its watershed [3].
Human interaction with the landscape-including waste disposal and urban sprawl; fertilization of lawns, golf courses, fields, and crops; as well as litter and paved surfacescan affect runoff into the James River, and other rivers that ultimately empty into the Chesapeake Bay. A 28% increase in population is expected from 2000 to 2040 throughout the whole James River watershed basin, with population projected to exceed 4.7 million [1]. The region is predicted to expand, so evaluating the factors influencing water quality is critical for expansion and development. Suburban neighborhood growth in this area generally involves clearcutting of existing forest. Deforestation destroys habitats that would naturally slow and filter the runoff before reaching the river. With an increase in urban sprawl from the growing population, we see a rise in impervious land cover. This, in turn, causes an increase in surface runoff, collecting pollutants and transporting them to downstream locations via higher river discharges. Additionally, contaminants from residential, agricultural, and industrial sources, as well as the transportation sector, can work their way directly into the hydrosphere. These include fertilizer and animal manure from food production, power plant discharges, sewage, mining, and industrial contaminants. As a result, certain land use types related to agricultural and urban landscapes, as well as precipitation patterns, are factors to consider as contributors to excess nutrient and pollutant runoff into the James River, and ultimately, the Chesapeake Bay.
Water conservation efforts generally focus on the land immediately surrounding the body of water and can fail to account for the larger watershed surface area and runoff potential. It is therefore necessary to continue to highlight the impact of inland runoff on water quality, in order to reinforce the urgent need for land-based conservation procedures to protect our water resources. For this reason, this study offers a comprehensive hydrological analysis, comparing changes in precipitation patterns and land cover over time.
The hydrological model used in this study is the Soil and Water Assessment Tool (SWAT) model (Texas A&M University, College Station, TX, USA), because it allows the watershed to be divided into sub-watersheds, and incorporates data on soils, land use, slope, precipitation, and other variables in order to simulate runoff and subsequent nutrient transport [4]. This model is ideal for water quality analysis and predictions, as it incorporates land use, soil, and climate data [5].
There is a need for research using the SWAT model on the James River watershed, since no long-term published literature is available using this well-established hydrological model. To our knowledge, the only study using such an approach in the region was conducted on a nearby river, the Rappahannock River, and focused on the pilot forecast model Chesapeake Bay Forecast System (CBFS), which utilized the SWAT model [6]. Findings indicated that the SWAT model in this region performed well, especially for modeling streamflow (R 2 = 0.74), nitrates (R 2 = 0.65), phosphates (R 2 = 0.51), and sediment (R 2 = 0.64) [6]. This performance could be further enhanced with more data sites and longer duration of data for calibration, as well as performing and comparing similar SWAT models in other basins throughout the Chesapeake Bay watershed [6]. The extensive temporal data available for observations along the James River make it an ideal basin in which to further the SWAT modeling research in the Chesapeake Bay watershed.
Other methods have been used to detect water quality patterns and sources of pollution in the Chesapeake Bay watershed. Zhang (2017) [7] found that the James River showed increases in phosphorus and sediment when discharge rates were higher, potentially due to increases in nonpoint sources of runoff from changes in land cover (especially urbanization) and climate. Ryberg et al. (2017) [8] showed that increasing rates of precipitation and urban/suburban sprawl were contributing to excess phosphorus being discharged into the Chesapeake Bay from continued land development, despite positive improvements in farming and waste management practices. Suburban-type land use, where streets and storm drains swiftly transport runoff into the watershed, has been shown to be a plausible contributor to increasing nitrogen levels washing into the bay [9]). The James River showed an annual increase in nitrogen of 37 kg/km 2 over the 1990-2010 time period, likely due to suburban and urban sprawl, as well as an increasingly warmer and wetter climate [10]. These investigations show the importance of researching water quality in the Chesapeake Bay watershed. The impact of land cover changes over a long-term basis is relevant, as it can show planners and lawmakers why protecting our land is a valued component of water conservation.
As the Hydrologic and Water Quality System (HAWQS) (Texas A&M University, College Station, TX, USA) is a relatively new online platform for SWAT modeling introduced in 2017, prior availability of studies using the HAWQS is limited at this point. The HAWQS was used in a study by Chen et al. (2020) [11] to model the Upper Mississippi River Basin in order to compare various techniques within the SWAT model. Moreover, Yen et al. (2016) [12] used the HAWQS to demonstrate a SWAT model setup for the Illinois River Basin. Although successful modeling results were obtained in these and other studies, there is only a limited number of HAWQS research papers available, suggesting that further studies using the HAWQS, such as this one, are warranted.
Application of SWAT modeling, with rigorous calibration and validation protocols to characterize the factors affecting water quality, is generally lacking in the James River watershed-a region that is rapidly urbanizing. To help bridge this knowledge gap, the SWAT model is used in this paper in order to determine what land use factors and precipitation patterns contributed to increased or decreased pollution by nonpoint sources throughout the James River watershed from 1986 to 2018 (the timeframe of maximum data availability). The goal is to investigate the drivers for water quality fluctuations in the James River watershed by using a GIS-based spatial analysis of land use, precipitation, and water quality data on subdivided sections of the river over time. The overarching research question for this study is: do precipitation, runoff, and land cover changes exhibit a spatiotemporal relationship with water quality parameters, such as increased nutrients and sedimentation, throughout the James River watershed?

Model Setup
The SWAT model was used within the HAWQS system (version 1.1) to carry out the James River watershed analysis. The SWAT model configures the watershed into hydrologic response units (HRUs) in order to route the flow through the watershed [5]. HRUs have uniform land cover, slope, and soil characteristics within the sub-basins so as to simplify the modeling calculations within the program and enhance the accuracy of the runoff predictions [5]. Setting up the SWAT model in the HAWQS first requires delineating the watershed. The mouth of the James River empties into the Chesapeake Bay, but it begins in the Allegheny Mountains near the western state line of Virginia, traveling across the entire state before draining into the Chesapeake Bay [1] (Figure 1). The HUCs along the mainstem of the river are indicated in the list above, as are the tributaries of the James River. The total area of the watershed is 26,791.74 km 2 .
Part of the SWAT model process includes demarcating the watershed into hydrological response units (HRUs), which represent subdivided areas with common values for land, soil, and slope [12]. With the 8 HUCs of the James River watershed, the SWAT model was originally divided into 1449 unique HRUs. In order to simplify the model's processing and analysis, the number of HRUs was reduced in this case to 438 by eliminating the smaller land and soil thresholds, set at below 7% of the total area. During HRU selection, 7% was found to be the best value in order to eliminate only minor land use and soil categories that were not relevant to the analysis, while still retaining consistent ratios for the dominant categories used for comparison with historical land use changes during this study. An exemption was set to retain land use categories related to residential and/or urban developed areas, as these are relevant to this study. The next step in setting up the SWAT model within the HAWQS was to create a scenario. A scenario includes selecting the meteorological data to use, time period for analysis, and model warm-up period prior to generating the final model output. In this case, the meteorological data chosen were the Parameter-elevation Regressions on Independent Slopes Model (PRISM) data. This option was chosen because it had full availability throughout the time period, as well as full coverage of the study area. The start date of the SWAT simulation was 1 January 1986, and the end date was 31 December 2018, as that was the final possible date available from the system. A warm-up period of two years was selected, in keeping with the recommendation of studies using the SWAT model, resulting in the output of the model run beginning 1 January 1988 [13]. Using two years for a warm-up period for the modeling allowed enough time for the model to change the input, and thus reduced output uncertainty [14]. Although the SWAT model and input data are daily, a monthly output option was selected for this project, since its focus is on long-term and seasonal patterns. Table A4 in the Appendix A shows the sources of the full default data that were included by the HAWQS system in the SWAT model run, and Table A5 in the Appendix A shows the SWAT model's setup parameters. The SWAT version run was SWAT 2012 rev. 670.

Model Calibration and Validation
Calibration is an important step in any modeling process. While several types of uncertainty, such as input and output data uncertainty, as well as model structure and parameters, are inherent in any modeling process, calibration helps enhance accuracy and minimize the impacts of the uncertainty on the model output results [14]. The developers of the HAWQS system calibrated the SWAT models for 79 United States watersheds as part of their initial setup of the system [13]. The James River was among those watersheds included in the initial calibration, as was the Appomattox River-one of the tributaries of the James River watershed. The monthly calibration began with streamflow, which is crucial to the model performance in order to ensure that the correct volume of water is modeled prior to the water quality components. Sediment load was calibrated, followed by water quality data [13]. The results of the statistical analyses related to the calibration are included on the HAWQS website [13]). The SWAT-CUP program (Texas A&M University, College Station, TX, USA) was used for calibration, with the option of the sequential uncertainty fitting algorithm (SUFI-2) selected to optimize the model results. SUFI-2 archives results within 95% uncertainty range by accounting for uncertainty as well as the sensitivity of model parameters [13].
For the James River, Virginia watershed, the sites used for calibration were the James River at Cartersville, Virginia (USGS station ID 2035000), and the Appomattox River at Matoaca, Virginia (USGS station ID 2041650). Table 1 describes the detailed location of the calibration sites. The calibration was performed on the data from the years 1983-2001. The USGS data from the James River at Cartersville are a robust and longstanding source of water quality and quantity data for the James River before it enters the Richmond, VA area and the tidal portion of the river. The calibrated values of the Appomattox River site apply only to HUC02080207 prior to both calibrated HUCs draining into HUC02080206. These two sites were deemed sufficient for calibration purposes since they include the most prolific data available prior to reaching the tidal zone of the river, especially since a limitation of the SWAT model is that it does not model tidal processes.
The calibration and verification for the James River watershed was performed for flow, total suspended solids (TSS), total nitrogen, and total phosphorus. These variables were chosen because they have the longest period of availability with respect to water quality observations. A limitation of using the HAWQS is that the full calibration dataset is not provided in a format suitable for publication. Therefore, the calibration analysis was performed outside of the system, and only flow is shown here, due to irregularities in the observed data. Although calibrated by the HAWQS beginning in 1983, our model output began in 1988 because of our chosen timeframe for analysis including a two-year warmup period. Figure 2 shows the time series comparison between observed and modeled flow for the period 1988-2001. The time series for flow shows that the model is in close agreement with USGS-observed data for both sites-the James River at Cartersville, and the Appomattox River at Matoaca.  Table 2 shows the analyzed statistical values that were obtained within the HAWQS system when comparing the model with the measured data. Table 2. Calibration parameters for Nash-Sutcliffe efficiency (COE or NS) and percent bias (PBIAS) for two sites in the James River watershed [13]. The analysis included Nash-Sutcliffe efficiency (COE or NS), as well as percent bias (PBIAS), which are both indicators of model performance [13]. These values from Table 2 were evaluated in order to determine how well the model performed, and whether the results could be deemed acceptable. The performance indicators used to determine model suitability are shown in Table 3. Table 3. Statistics range values for Nash-Sutcliffe efficiency (COE or NS) and percent bias (PBIAS) to compare for an indication of model performance [13].

Performance
Rating COE PBIAS Streamflow PBIAS Sediment PBIAS TN and TP Table 3 is adapted from the HAWQS calibration manual, and shows the evaluators of performance for both COE (or NS) and PBIAS [15]. A comparison of the calibration results from Table 2 with the indication of performance from Table 3 indicates that the model performed well for both flow and total nitrogen, was unsatisfactory for TSS, and ranged from good to satisfactory for all other parameters. The performance rating of "very good" for flow in the mainstem of the James River is optimal for the SWAT model. Therefore, these values were deemed acceptable to use for the final SWAT model run. As a result of this calibration process, the new values for the model input are shown in Table A1 of the Appendix A. These values include the adjusted file name, adjusted parameter, and the value for each site used in the final model run. The results of the SWAT check program are shown in Figure A1 of the Appendix A.

Supplemental Analysis
In addition to running the SWAT model, further data analysis of the land use patterns was conducted outside of, and separate from, the SWAT model or the HAWQS system, in order to analyze their potential impact on runoff changes and possible changes in sediment and nutrient yields to the James River watershed. Land cover charts were obtained from six years within the model range -1992, 2001, 2004, 2008, 2011, and 2016-and were loaded into ArcGIS (ESRI, Redlands, CA, USA) in order to determine the area of each land cover class by HUC sub-basin in the James River watershed. Table A6 in the Appendix A shows the file information and source of the GIS files used for this analysis. The 1992 enhanced land cover data were used as the earliest data for the timeframe, since they included the 1980s land cover in their enhancement [16]). The 2016 land cover data were the latest available data at the time of analysis.

Land Cover
Although difficult to compare directly across time periods due to the changes in naming classification from the National Land Cover Database, it was necessary to quantify the land cover changes across the time periods. Given the variation in nomenclature, for this project the land use categories were grouped into five classes in order to encompass the highest percentage of area. The five classes were: forest, hay, residential, wetlands, and cultivated crops.
The classes in each of the above categories varied over the range of the years, but our analysis attempted to account for the top five areas with the largest percentages of land cover. At least 95% of all of the land cover possibilities were accounted for. The exception is the "open water" classification, which was not included in the analysis comprising the bulk of the difference in HUC02080206-which is nearest the bay, where the river is widest.
For an example of how the grouped land cover classes work, "forest" land cover includes all forest types, while "residential" represents all developed land cover types. Table A7 in the Appendix A shows the full descriptions of the land cover categories from the most recent 2016 classification. This process resulted in six years of comparable land cover data spanning the timeframe of the SWAT analysis to be used for comparison with the model output of the water quality and quantity data in the watershed (as shown in Table A2 in the Appendix A).
To show the changes across the time period of analysis, we calculated the percentage of total HUC land area for each land cover class. Table 4 represents the percentage land cover by each of the five grouped classes for the HUCs along the main stem of the James River during the beginning of the time period (1992) and at the end of the time period (2016).  Table 4 shows the trend throughout the watershed that residential land cover increased, and forest decreased, in every single HUC between 1992 and 2016. HUC02080201 did not change much in forest cover, which is understandable due to the protection of national forests in this part of the study area. HUC02080205, which includes the greater Richmond area, showed a residential cover increase from 6.30% in the 1990s to 10.68% in 2016, due to urban and suburban sprawl. By 2016, high-intensity development increased along the river in association with the urban areas, while the low, medium, and open-space development dramatically fanned out from the city, which also marked decreases in the forest cover in those areas. HUC02080206, which is the closest to the Bay, did see an increase in wetlands cover from 7.53% to 13.91% due to improvements in conservation methods, but also showed a marked increase in residential cover from 14.98% to 21.59% due to urban growth in the Hampton Roads area.

Principal Components Analysis
For the bulk of the statistical analysis, exploratory principal components analysis (PCA) and a correlation matrix were completed on the SWAT model output data along with the land use data. Table A2 of the Appendix A shows the definitions of the water quality and quantity categories from the SWAT model output, while Table A3 in the Appendix A shows the full compiled data used in the PCA and correlation matrix setup. PCA is useful in that it reveals the most important associations and interactions between variables in large datasets. By highlighting the variables that have the greatest influence, PCA reduces the large dataset, isolating those variables which require further study. PCA was run on the dataset, yielding three main principal components. Table 5 shows the numerical output for the principal components analysis. The far-left column displays the top three components in the model in decreasing order of magnitude. In addition to the principal components, eigenvalues, which indicate the highest amount of variance in the model output explained by each component, were calculated, and these are shown in the second column (total eigenvalues). The highest eigenvalue is the principal component (PC1), while the second highest is the second principal component (PC2).
The most important components in the model are those with total eigenvalues greater than 1 [17]. The third column (% Variance) displays the percentage of variance explained by each component. The far-right column shows the cumulative variance of the model after adding each successive principal component. Note that the top three principal components explain over 85% of all variance in the model. Component 1 (PC1) explains 51.95% of the variance in the model, while Components 2 and 3 (PC2 and PC3) explain 25.26% and 8%, respectively. Furthermore, the total eigenvalues for the first 3 principal components are 7.273 (PC1), 3.536 (PC2), and 1.12 (PC3).
Since there is such a pronounced difference between the first two principal components and the much lower third principal component, only the first two components were deemed vital to the James River Basin analysis. Figure 3 shows the PCA biplots (aka loading plots) for the James River watershed using the summed (yearly total) SWAT model results. PCA biplots are composed of an array of vectors, each representing a variable in the dataset. PCA results using average data displayed the same biplot pattern and results, indicating that the relationships between model variables are the same whether the sum or the average data are used. Additionally, the length of each vector represents the magnitude of the impact that each variable has on the principal components. Note that in Figure 3, the length of the vectors representing precipitation (PRECIP) and surface runoff (SURQ) indicates that these variables have less of an impact on sediment yield (SedYield) in the James River watershed compared to the land use variables (the forest, cultivated crops, wetlands, and residential categories).
In addition to the length of the vector representing variables, the clustering of vectors is also an important indicator of relationships in the data. Note that in Figure 3 there are three distinct clusters represented visually in the model output. Each of these clusters indicates correlation between the variables represented by the individual vectors. The first cluster near the top of the graph (in blue) shows that the variables hay land cover, soluble phosphorus (SolP), surface nitrogen (N_Surface), and soil water (SW) are closely related. The second clustering of variables in green indicates that precipitation, sedimentbound phosphorus (SedP), and both organic nitrogen and phosphorus are correlated with one another. The third cluster in red shows that the variables surface runoff (SURQ), sediment yield, and three land cover types (residential, wetlands, and cultivated crops) are closely linked.
Additional results, including a communalities table (Table A9) and a varimax rotation (Table A10) are available in the Appendix A.

Correlation Matrix
In the next part of the analysis, a correlation matrix was generated from the dataset (Appendix A, Table A3) in order to better identify and quantify the relationships between the variables in the James River watershed. The correlation matrix identified the strongest, most significant correlations between the SWAT output variables, and indicated whether these relationships were statistically significant (at either the 5% or 1% levels). The correlation analysis used two-tailed tests to see if there were significant associations in either direction (positive or negative correlations). Table 6 shows the correlation matrix for all variables in the SWAT model output. Overall, the same data trends that were identified in the PCA biplots were also evident in the correlation matrix. However, a more detailed assessment and precise quantification of the nature of the associations of all of the variables in the SWAT model was possible after correlation analysis. For example, the hydrological processes and impacts for the James River watershed are more discernable from the correlation results. These results are discussed in greater detail in the following sections.

Discussion
The results of this study indicate distinct correlations between hydrologic processes and land cover, land cover and water quality, precipitation and water quality, and nutrient processes.

Hydrologic Processes and Land Cover
The red cluster on the biplot in Figure 3 shows that the variable sediment yield (Sed Yield) is positively correlated to surface runoff, as well as three land use classes: residential, wetlands, and cultivated crops. This indicates that for sediment transport into the James River, land use has a greater impact than other factors, such as runoff (SURQ) or precipitation amounts (PRECIP).
In analyzing hydrologic processes, precipitation had its highest correlation with surface runoff according to the correlation matrix, as shown in Table 6. As expected, precipitation was positively correlated to surface runoff for the entire watershed (r = 0.623, p = 0.01). Soil moisture was not significantly correlated to precipitation values; this emphasizes the importance that soil type, texture, and fertilization management practices have on soil water retention [18].
For other variables in the model, surface runoff was most strongly correlated to sediment yields for the James River watershed (r = 0.644), and this relationship was significant at the 1% level. This positive correlation was stronger than it was for precipitation and sediment yield (r = 0.458). This indicates that sedimentation of the James River was more closely linked to runoff, as opposed to actual precipitation amounts. Generally, higher runoff was associated with higher sediment yields. This demonstrates the erosive power of water (particularly at higher velocities) as it flows over the land surface, as highlighted by Gellis et al. (2009) [19] in their thorough review of the Chesapeake Bay's watershed erosion and sediment transport from land cover changes.

Land Cover and Water Quality
In general, this SWAT model output highlights several hydrologic processes at work, as well as how they relate to water quality and land cover/land use. Hydrologic units covered primarily in forest land cover are associated with lower levels of runoff and sediments being transported into the James River. Forested land cover appears to have a greater impact on surface runoff and sedimentation of the James River than any other variable from the SWAT model, whereby undisturbed forest cover is associated with less runoff and river sedimentation. The same negative correlation was observed between hay fields and the sediment yield (SedYield) and surface runoff (SURQ) variables, but this relationship does not appear to be as strong as it is for the forest land cover classification, which shows the positive impacts forests have on water quality. Forested land plays a very important role in hydrologic cycling-including root uptake of water, evapotranspiration, and shade-and any alterations could have drastic consequences [20].
The four land cover variables (residential, forest, wetlands, and cultivated crops) accounted for the greatest amount of variation explained by the principal components, as shown by the communalities in Table A10 of the Appendix A. The communalities are analogous to an r-squared value and indicate the proportion of each variable's variance that can be explained by the respective principal component [17]. The communalities indicate that land cover is a significant factor for water quality and watershed modeling, as it describes the movement of water over the landscape within a watershed. The top two land cover variables were residential (0.926) and forest (0.922). It is interesting to note that these land cover types behave in entirely different or opposing ways when it comes to runoff over the surface and resulting water quality downstream. Another interesting finding is that there was a greater proportion of the variance of land cover categories (from 0.879 to 0.926), sediment (0.855), and nutrient variables (0.888 for soluble phosphorus, 0.805 for sediment-bound phosphorus, and 0.669 for surface nitrogen) explained by the principal components compared to actual hydrologic processes, such as precipitation (0.864), surface runoff (0.794), and soil water (0.781). This indicates that human modification of the land surface has an even greater impact on nutrient and sediment transport and water quality than the physical processes involved [21].
Land cover also plays an important role in sediment transport in the watershed. All land cover classifications were correlated to sediment yield, and were significant at the 1% level, as shown in the correlation matrix in Table 6. The strongest correlation was the negative relationship between sediment yield and the "forest" classification (r = −0.796), indicating that an increase in forest cover could decrease sediment yield, and vice versa. This illustrates an important relationship between forests and the hydrological cycle. Forested lands serve as effective land cover for reducing runoff velocity, increasing infiltration, and maintaining water quality. This is especially true when forests buffer riparian areas and serve to reduce the erosive power of water, leading to reduced sedimentation and sediment yields in a stream.
Conversely, there were positive correlations between sediment yield and the other land cover classifications: "cultivated crops" (r = 0.748), "wetlands" (r = 0.782), and "residential" (r = 0.839). Each of these relationships were significant at the 1% level. The relationship between "residential" and sediment yield was the strongest relationship between land cover and hydrologic processes or water quality variables in the model. When natural land cover is replaced with impervious paved surfaces, precipitation is directed into runoff more quickly than over other land surface cover.
Statistically significant correlations were also found between nutrient levels and four of the land cover classifications. In general, forested environments were negatively correlated to nutrient levels (r = 0.662 for nitrogen and r = 0.585 for phosphorus), again demonstrating the protective impact of forests on water quality. However, increases (decreases) in the other three land use types (residential, cultivated crops, and wetlands) all saw statistically significant increases (decreases) in nutrient levels, with the correlation coefficients being highest for the "residential" category (r = 0.671 for nitrogen and r = 0.569 for phosphorus). This is indicative of lower water quality from fertilizers and increased runoff in urban and suburban settings. Other positive correlations were found between these nutrients and the "wetlands" and "cultivated crops" land classes, with similar (moderately positive) correlation values and significance levels. Cultivated croplands had r values of 0.560 and 0.440 for nitrogen and phosphorus, respectively, indicating nutrient levels were found in the runoff associated with agricultural land use. Ultimately, the James River emerges into the tidal region of the Chesapeake Bay, where the "wetlands" land use classification is prevalent. Here, all nutrients are washed downstream to the mouth of the river, and wetlands intercept that water on its way to the bay, as well as collecting sediment. Wetlands were positively correlated with nutrient levels for nitrogen (r = 0.555) and phosphorus (r = 0.447) in the SWAT model output.
The only statistically significant negative correlation for organic nutrients in the analysis involved the "forest" land class. The "forest" land cover classification was negatively correlated with both organic nitrogen (r = −0.662) and organic phosphorus (r = −0.585), and all correlations were significant at the 1% level. This demonstrates the protective impact of forests on nutrient levels in water, as they reduce runoff and erosion and increase infiltration rates along the forest floor [22].

Precipitation and Water Quality
The green cluster in the biplot in Figure 3 shows that nutrient variables for the James River watershed-such as organic phosphorus (OrgP), sediment-based phosphorus (SedP), and organic nitrogen (ORGN)-are positively correlated with precipitation, more so than surface runoff (SURQ) or any type of land cover. This indicates that for organic nitrogen and phosphorus, precipitation is a greater factor for nutrient transport into the James River watershed than either surface runoff (SURQ) or land cover classification (forest, residential, wetlands, or cultivated crops). Precipitation was positively correlated with both organic nitrogen (r = 0.595) and organic phosphorus (r = 0.600), and all correlations were significant at the 1% level, as shown in the correlation matrix in Table 6. Soil water was also positively correlated with nutrients, but with slightly lower correlation values. Soil water was correlated with organic nitrogen (r = 0.411) and organic phosphorus (r = 0.536), with only the latter being significant at the 1% level.

Nutrient Processes
Dauer et al. (2000) [23] showed different factors that affect the benthic community in Chesapeake Bay, including land use, point and nonpoint source pollution, and nutrient loadings. In fact, the "residential" land use category appears to have a greater impact on sediment yield (SedYield) and surface runoff (SURQ) than any other land cover in the model. This is displayed by the long vectors for those variables on the biplot. Developed areas with increased areas of impervious surfaces and soil compaction are more likely to increase the surface runoff, sedimentation, and nonpoint source pollutants being transported into the watershed downstream. Furthermore, urban runoff has been found to be more contaminated than agricultural runoff [21].
On the other hand, the blue cluster in Figure 3 shows that surface nitrogen is more closely (and directly) related to soil moisture (SW) than it is to either precipitation, surface runoff, or land cover, likely due to the agricultural application of fertilizer. This demonstrates the important role of soil moisture and lateral flow below the surface for nutrient transport through the vadose zone in the watershed.
Nitrogen is an important water quality indicator to monitor, as it can accumulate in the bay from precipitation runoff and lead to algal blooms [24]. Phosphorus is another critical measurement, since it is also related to algal blooms, as well as sediment in the water, which is also harmful to aquatic life [24]. Organic nitrogen and phosphorus have a strong, positive correlation (r = 0.967, 1% significance), indicating that the same process that causes nitrogen to rise also causes phosphorus to rise. Furthermore, both nutrients are commonly applied together as fertilizer in agricultural and residential settings, so it is unsurprising that such a strong correlation was observed. For organic nitrogen and phosphorus, significant correlations were found between nutrient levels and precipitation (r = 0.595 for nitrogen and r = 0.600 for phosphorus), soil moisture (r = 0.411 for nitrogen and r = 0.536 for phosphorus), and sediment yield (r = 0.802 for nitrogen and r = 0.684 for phosphorus).
The strongest correlations in the model output were between organic nutrient loads and sediment yields (r = 0.802 for nitrogen and r = 0.684 for phosphorus), which were significant at the 1% level. This likely indicates sediment binding and transport as a major mechanism for nutrient circulation through the watershed [25]. Furthermore, both organic nitrogen and phosphorus were positively correlated with sediment-bound phosphorus (r = 0.921, and r = 0.920, respectively), and both were significant at the 1% level.
In summary, organic nutrients were found to increase (decrease) as residential, wetland, or cultivated crop land covers increased (decreased), or as sediment yields increased (decreased). Conversely, nutrient levels were found to increase (decrease) as forest land cover decreased (increased) in the James River watershed. Finally, organic nutrients were also shown to have positive, significant correlations with precipitation and soil water. This indicates that nutrients circulate widely through the James River watershed as a result of numerous physical processes related to soil, water, and plant characteristics, as well as human-induced land use modification.

Conclusions
This study is one of very few studies that utilizes the HAWQS system, which proved to be an efficient and effective way to successfully run the SWAT model on the James River watershed, with convenient access to relevant precipitation and land cover data. PCA and a correlation matrix were used with yearly total SWAT output from 1988 to 2018 and land cover satellite data groupings within the model time range, in order to discern relationships among hydrologic and terrestrial data. This method was effective, but the land cover analysis was cumbersome due to the changes in the data collection system over time. However, the advantage of doing a separate land cover analysis outside of the SWAT system is the ability to compare the changes directly with the model's output results. For future work, this study could be expanded to model the impacts of potential climate change and land use change scenarios in the event of increased precipitation and subsequent upsurges in runoff. One limitation of this study is that it does not adequately account for tidal processes, so there is limited applicability near the Chesapeake Bay in the coastal plain, but the SWAT model is more widely applicable for the remainder of the James River Basin upstream. Other tributaries in the Chesapeake Bay or other watersheds could be studied using similar methods in order to determine feasibility and repeatability.
This analysis demonstrates the multifaceted ways in which water quality in the James River watershed is tied to climate, hydrologic processes, and human modification of the surface land cover. The PCA and other statistical analyses helped to tease out these complex relationships and establish quantitative correlation values between the different variables. In general, residential development is associated with increased runoff and erosion, while forested lands show a corresponding decrease in runoff, erosion, and sediment transport. Sedimentation in the James River watershed is primarily dependent on surface runoff and land cover alterations of the surface, such as paving and cultivation, and is correlated to a lesser degree with precipitation levels, with a peak in spring when precipitation and snowmelt combine to increase runoff across the basin. One of the biggest takeaways from this study is the importance of forested land. An increase in forest land cover results in a general decrease in nutrient loads, while an increase in residential, cropland, or wetland cover is associated with an increase in nutrient levels in the watershed. These findings support our hypothesis, and demonstrate that cutting down trees to build neighborhoods, strip malls, infrastructure, etc., could degrade the water quality in the James River if protective measures are not taken to minimize sedimentation and nutrient transport. These findings also suggest that a future climate change scenario that includes an increase in precipitation, coupled with increased urbanization and enhanced runoff, may lead to further degradation in water quality in the James River.
This work indicates the importance of taking a holistic view of the James River watershed for any water quality conservation plan. Any plan that does not account for the impacts of land use change on hydroclimatic processes will fall short of meeting its goals of helping the James River and the Chesapeake Bay to thrive. Future water quality improvement plans should address land use conservation, sustainable agricultural practices, and the root causes of climate change. Some suggestions include homeowner education, sustainable neighborhood construction, conversion to organic farming and precision agriculture where only the necessary amount of fertilizer is added, forest management and protection incentives on private lands, bans on clearcutting for developmental purposes, limiting tree cutting in riparian areas, planting replacement trees, focusing on improving existing infrastructure rather than expansion, and strong climate change goals [22,26]. The most important objective is to raise awareness throughout the watershed that our actions on land can have a direct and lasting impact on the future of our clean water along the James River, and for miles downstream.

Conflicts of Interest:
The authors declare no conflict of interest.  The first column indicates the SWAT file name extension that was adjusted; (V) indicates that the existing value was replaced by the new number; and (%) indicates that the existing value was multiplied by the given value plus one [13]). See Appendix A Table A2 for the definitions of the variable acronyms. Table A2. Table defining variables used in output from the SWAT model for water quality and quantity indicators [5].     [13]. (Note that the dates in table are provided by the HAWQS website during the system initiation and are not consistent with updates for weather data).  Deciduous Forest: areas dominated by trees generally greater than 5 m tall, and making up more than 20% of total vegetation cover. More than 75% of the tree species shed foliage simultaneously in response to seasonal change.

42
Evergreen Forest: areas dominated by trees generally greater than 5 m tall, and making up more than 20% of total vegetation cover. More than 75% of the tree species maintain their leaves all year. Canopy is never without green foliage.

43
Mixed Forest: areas dominated by trees generally greater than 5 m tall, and making up more than 20% of total vegetation cover. Neither deciduous nor evergreen species make up more than 75% of total tree cover. Shrubland 52 Shrub/Scrub: areas dominated by shrubs; less than 5 m tall with shrub canopy typically making up more than 20% of total vegetation. This class includes true shrubs, young trees in an early successional stage, and trees stunted from environmental conditions. Herbaceous 71 Grassland/Herbaceous: areas dominated by gramanoid or herbaceous vegetation, generally making up more than 80% of total vegetation. These areas are not subject to intensive management such as tilling, but can be utilized for grazing. Planted/Cultivated 81 Pasture/Hay: areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay crops, typically on a perennial cycle. Pasture/hay vegetation accounts for more than 20% of total vegetation.

82
Cultivated Crops: areas used for the production of annual crops, such as corn, soybeans, vegetables, tobacco, and cotton, as well as perennial woody crops such as orchards and vineyards. Crop vegetation accounts for more than 20% of total vegetation. This class also includes all land being actively tilled. Wetlands 90 Woody Wetlands: areas where forest or shrubland vegetation accounts for more than 20% of vegetative cover and the soil or substrate is periodically saturated with or covered with water.

95
Emergent Herbaceous Wetlands: Areas where perennial herbaceous vegetation accounts for more than 80% of vegetative cover and the soil or substrate is periodically saturated with or covered with water.
Model verification for the SWAT model was accomplished by using the SWAT Check tool. After running the model using the HAWQS, the initial results from the SWAT model were presented using SWAT Check in order to ensure the overall validity of the model run.
The SWAT Check results from the James River model display the overall hydrologic budget of the watershed for the output period 1988-2018 ( Figure A1). Figure A1. SWAT Check results displaying the overall hydrologic budget of the James River watershed [13].
The average precipitation and potential evapotranspiration were checked and verified against annual climate normal data from the University of Virginia (UVA) [29], and were indeed approximate matches for the watershed area [29]. In general, the water balance of a watershed should indicate that the water coming into the watershed as precipitation is approximately equal to the evaporation, transpiration, runoff, and groundwater recharge combined. The water balance of the James River watershed is displayed in Table A8. The total of the hydrological components is 1156.55 mm, which is approximately equal to the incoming basin-wide precipitation of 1159 mm, minus any uptake by plants or soil redistribution. The overall percentages of the water balance of the James River watershed, derived from the SWAT Check figures, are depicted in Table A8. Of the incoming precipitation, approximately 50% is used for evaporation and transpiration, almost 30% is direct runoff along the surface, and almost 15% is lateral flow, return flow, or subsurface runoff. Less than 6% is added below the surface to the water table.
The curve number is an indication of the level of imperviousness of the soils and surface cover. The average curve number of 77.7 for the James River watershed area shown in Figure A1 would imply more runoff and less infiltration than a smaller curve number would suggest. Any increases in the impervious surfaces in the watershed or soil compaction or saturation would lead to an increase in the curve number and a greater percentage of runoff in the water balance, which would continue to alter the natural flow and recharge of water in the watershed. Communalities: In the SPSS (Statistical Package for the Social Sciences) output, a communalities table was generated, and the results are presented below in Table A9. Communalities indicate the proportion of each variable's variance that can be explained by the respective principal component [17]. They are analogous to an r-squared value. There are several variables that explain a large proportion of the variance in the principal components analysis. The four variables forest, residential, wetlands, and cultivated crops explain over 90% of the variance, while another nine variables explain more than 75%. Varimax Rotation: In addition to identifying which variables have the greatest influence, principal components analysis yields important information regarding the amount of variance explained by each component for each variable in the model. After identifying the principal components, a varimax (variable maximum) rotation was used to reframe the coordinates and maximize the variances so that each variable was matched to a single factor [17]. The varimax rotation results are shown in Table A10 below. This varimax rotation is useful in that it redefines what the components represent by showing which variables are most highly correlated to each component. The first principal component was most highly correlated with the land cover classification categories derived from the satellite data and GIS analysis. For the first principal component, land cover classes-including forest (−0.951), residential (0.935), wetlands (0.903), and cultivated crops (0.898)-had the highest values. Interestingly, the forest category was negatively correlated with the first principal component, whereas the other land classes were positively correlated with it. For the second principal component, the variables that represent nutrient levels (soluble phosphorus = 0.825, surface nitrogen = 0.794), soil water movement (0.793), and the hay land cover (0.764) had the highest values. Variables with moderate correlations with the second principal component included organic phosphorus (0.589) and sediment-bound phosphorus (0.458). Therefore, the second principal component was defined more by soil water processes, nutrient transport, and the hay land cover class. The third principal component was defined primarily through the related hydrological processes of precipitation (0.850) and surface runoff (0.767).