Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan

https://doi.org/10.1016/j.envsoft.2006.02.001

Abstract

Multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), were applied for the evaluation of temporal/spatial variations and the interpretation of a large complex water quality data set of the Fuji river basin, generated during 8 years (1995–2002) of monitoring of 12 parameters at 13 different sites (14 976 observations). Hierarchical cluster analysis grouped the 13 sampling sites into three clusters, i.e., relatively less polluted (LP), medium polluted (MP) and highly polluted (HP) sites, based on the similarity of water quality characteristics. Factor analysis/principal component analysis, applied to the data sets of the three different groups obtained from cluster analysis, resulted in five, five and three latent factors explaining 73.18, 77.61 and 65.39% of the total variance in water quality data sets of LP, MP and HP areas, respectively. The varifactors obtained from factor analysis indicate that the parameters responsible for water quality variations are mainly related to discharge and temperature (natural) and organic pollution (point source: domestic wastewater) in relatively less polluted areas; organic pollution (point source: domestic wastewater) and nutrients (non-point sources: agriculture and orchard plantations) in medium polluted areas; and organic pollution and nutrients (point sources: domestic wastewater, wastewater treatment plants and industries) in highly polluted areas in the basin. Discriminant analysis gave the best results for both spatial and temporal analysis. It provided an important data reduction as it uses only six parameters (discharge, temperature, dissolved oxygen, biochemical oxygen demand, electrical conductivity and nitrate nitrogen), affording more than 85% correct assignations in temporal analysis, and seven parameters (discharge, temperature, biochemical oxygen demand, pH, electrical conductivity, nitrate nitrogen and ammoniacal nitrogen), affording more than 81% correct assignations in spatial analysis, of three different sampling sites of the basin. Therefore, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. Thus, this study illustrates the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.

Introduction

A river is a system comprising both the main course and the tributaries, carrying the one-way flow of a significant load of matter in dissolved and particulate phases from both natural and anthropogenic sources. The quality of a river at any point reflects several major influences, including the lithology of the basin, atmospheric inputs, climatic conditions and anthropogenic inputs (Bricker and Jones, 1995). On the other hand, rivers play a major role in assimilating or transporting municipal and industrial wastewater and runoff from agricultural land. Municipal and industrial wastewater discharge constitutes a constant polluting source, whereas surface runoff is a seasonal phenomenon, largely affected by climate within the basin (Singh et al., 2004). Seasonal variations in precipitation, surface runoff, interflow, groundwater flow and pumped inflows and outflows have a strong effect on river discharge and, subsequently, on the concentration of pollutants in river water (Vega et al., 1998). Therefore, the effective, long-term management of rivers requires a fundamental understanding of their hydro-morphological, chemical and biological characteristics. However, because spatial and temporal variations in water quality are often difficult to interpret, a monitoring program that provides a representative and reliable estimation of the quality of surface waters is necessary (Dixon and Chiswell, 1996).

The application of different multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), helps in the interpretation of complex data matrices to better understand the water quality and ecological status of the studied systems, allows the identification of possible factors/sources that influence water systems and offers a valuable tool for reliable management of water resources as well as rapid solution to pollution problems (Vega et al., 1998; Lee et al., 2001; Adams et al., 2001; Wunderlin et al., 2001; Reghunath et al., 2002; Simeonova et al., 2003; Simeonov et al., 2004). Multivariate statistical techniques have been applied to characterize and evaluate surface and freshwater quality, and they are useful in verifying temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality (Helena et al., 2000; Singh et al., 2004; Singh et al., 2005).

In the present study, a large data matrix, obtained during an 8-year (1995–2002) monitoring program, is subjected to different multivariate statistical techniques to extract information about the similarities or dissimilarities between sampling sites, the water quality variables responsible for spatial and temporal variations in river water quality, the hidden factors explaining the structure of the database, and the influence of possible sources (natural and anthropogenic) on the water quality parameters of the Fuji river basin.

Study area

The Fuji river basin study area, drained by the Fuji River, is located in the central part of Japan (Fig. 1). The basin area is 3570 km² and the main stream length is 128 km. The Fuji River is located to the west of Mount Fuji, drawing a curve along the mountain, and is one of three prominent rapid watercourses in Japan. The river originates, as the Kamanashi River, from Mount Komagatake in the north of the Southern Alps, and as the Fuefuki River from the north of Yamanashi Prefecture. These two

Spatial similarity and site grouping

Cluster analysis was used to detect similarity groups among the sampling sites. It yielded a dendrogram (Fig. 2), grouping all 13 sampling sites of the basin into three statistically significant clusters at (Dlink/Dmax) × 100 < 60. Since we used hierarchical agglomerative cluster analysis, the number of clusters was also decided by the practicality of the results, as there is ample information (e.g., land use, location of wastewater treatment plants, etc.) available on the study sites. The cluster 1
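
The cut rule quoted above, (Dlink/Dmax) × 100 < 60, can be sketched in Python with SciPy. This is a minimal illustration under stated assumptions, not the authors' procedure: the site values are synthetic, the two columns are hypothetical stand-ins for the 12 monitored parameters, and z-score standardization with Ward linkage is assumed because this excerpt does not name the distance measure or linkage method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

# Synthetic site-by-parameter matrix (made-up values, NOT the Fuji river data).
# Rows: nine hypothetical sites. Columns: BOD (mg/L), nitrate nitrogen (mg/L).
X = np.array([
    [1.0, 0.5], [1.2, 0.6], [0.9, 0.4],   # low BOD, low nitrate
    [1.5, 3.0], [1.7, 3.2], [1.4, 2.9],   # low BOD, high nitrate
    [8.0, 1.7], [8.5, 1.8], [7.8, 1.6],   # high BOD, moderate nitrate
])

# Standardize each parameter so that no column dominates the distances
# (assumption: z-scores; the excerpt does not state the scaling used).
Xz = zscore(X, axis=0)

# Hierarchical agglomerative clustering (assumption: Ward linkage on
# Euclidean distances).
Z = linkage(Xz, method="ward")

# Rescale merge heights to (Dlink / Dmax) * 100, as on the dendrogram axis.
d_max = Z[:, 2].max()
d_link_pct = 100.0 * Z[:, 2] / d_max

# Cut the tree at 60% of the maximum linkage distance.
labels = fcluster(Z, t=0.60 * d_max, criterion="distance")
n_clusters = len(set(labels))

print("rescaled merge heights:", np.round(d_link_pct, 1))
print("cluster label per site:", labels)
print("number of clusters:", n_clusters)
```

With these synthetic values the three groups of sites merge internally well below the 60% line and join each other only above it, so the cut returns three clusters. On real monitoring data the number of clusters at a given cut depends on the scaling and linkage choices, which is one reason to check the grouping against site knowledge such as land use.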

Conclusions

In this case study, different multivariate statistical techniques were used to evaluate spatial and temporal variations in surface water quality of the Fuji river basin. Hierarchical cluster analysis grouped the 13 sampling sites into three clusters of similar water quality characteristics. Based on the information obtained, it is possible to design an optimal future sampling strategy, which could reduce the number of sampling stations and associated costs. Although the factor analysis/principal

Acknowledgements

The authors sincerely thank Yuki Hiraga for her help in the database development and the Fuji Xerox Setsutaro Kobayashi Memorial Fund for providing funding support. We would also like to acknowledge the help and support provided by the 21st Century Center of Excellence (COE), Integrated River Basin Management in Asian Monsoon Region, University of Yamanashi.
