Assessment of water quality using chemometric methods - a case study of Rusałka Lake, NW-Poland

Chemometric methods, namely cluster analysis, factor analysis and discriminant analysis, were applied to identify and assess the quality of lake water. Samples were collected once a month from September 2012 to September 2015 from Rusałka Lake, located in the city of Szczecin. Twenty-five water quality indices were evaluated: Chl a, Eh, temperature, pH, COD-Mn, COD-Cr, BOD5, DO, NO3-, NO2-, NH4+, TN, SRP, TP, Ca2+, Mg2+, Cl-, SO42-, HCO3-, Fetot, Mntot, Pb, Zn, Cd, Cu. Cluster analysis was performed to determine similarities in the variation of the examined water quality indices and to compare seasonal variation between the inflow and outflow areas of the lake. Factor analysis revealed that water quality is shaped by intense anthropogenic activity. Discriminant analysis was used to establish which of the studied variables discriminate between the inflow and outflow zones and between seasons. The chemometric approach provided useful information on the parameters affecting water quality in the analyzed lake. The data obtained can lead to a better understanding of the changes occurring in small flow-through lakes under high anthropopressure.


Introduction
The chemical composition of lake waters is strongly dependent on climatic changes (precipitation rate, soil erosion, weathering processes, insolation, etc.) and on the constantly increasing anthropopressure in urban and industrial areas. All of these cause rapid transformation of lake ecosystems. The problem particularly concerns flow-through urban lakes. They are often incorporated into municipal stormwater and sewage systems, where they act as retention reservoirs, sedimentation ponds and even biological sewage treatment plants. Increasing environmental pollution and the identification of new pollution sources require regular testing. Such testing should make it possible to assess water quality in urban lakes and to understand which changes in biohydrochemical processes affect that quality (e.g. Joniak et al. 2013).
A chemometric approach to analytical data in environmental sciences, in particular to biohydrochemical data, has become widespread (e.g. Einax 1995, Kowalkowski et al. 2006, Sojka et al. 2008, Härdle & Hlávka 2015, Najar & Khan 2012, Ogwueleka 2015, Kumar et al. 2015). The methods most often used in this approach belong to the group of multivariate statistical techniques, among them cluster analysis (CA), discriminant analysis (DA) and factor analysis (FA). Cluster analysis, as a classification method, allows us to gather information about relations within a dataset (Einax 1995, Kumarasamy et al. 2014). CA enables the detection of specific relationships between samples and the studied water quality indices, as well as their direct impact on aquatic ecosystems. Discriminant analysis is used to determine which variables best divide a given set of cases into naturally occurring groups (Einax 1995, Kumarasamy et al. 2014, Thomas et al. 2015). The main applications of factor analysis are the reduction of the number of variables and the detection of structure in the relationships between variables, i.e. variable classification. Factor analysis is therefore used as a method of data reduction or structure detection (Einax 1995, Shaw 2003, Stanisz 2007, Basilevsky 2009). These methods make it possible to reduce the amount of data needed to describe environmental changes and to determine the most important factors shaping the variability of the studied ecosystems.
The present study was an attempt to determine the biohydrochemical status of Rusałka Lake and the factors influencing its water quality in the inflow and outflow areas in the years 2012-2015 using chemometric analyses.

Characterization of Rusałka Lake
Rusałka Lake is located in the central part of the city of Szczecin. It is a small (ca. 2.9 ha) artificial flow-through reservoir created by damming the Osówka stream. The waters of the Osówka flow into Rusałka Lake through an underground canal after mixing with the waters of the Warszowiec stream. The estimated water retention time is 14 to 46 days. The direct catchment area includes green and recreational areas, allotments, public buildings, and single- and multi-family residential buildings. The lake is included in the municipal stormwater drainage system, where it acts as a settling pond for organic suspensions carried in with the water and as a biological sewage treatment plant. The maximum depth of the lake is 3.0 m (e.g. Poleszczuk & Bucior 2009, Poleszczuk et al. 2012).

Material and methods
Research on the waters of Rusałka Lake was conducted from September 2012 to September 2015. Water samples were collected in the inflow and outflow areas from the surface layer, at a depth of about 0.5 m, at intervals of 30 days (once a month). Twenty-five water quality indices were evaluated: Chl a, Eh, temperature, pH, COD-Mn, COD-Cr, BOD5, DO, NO3-, NO2-, NH4+, TN, SRP, TP, Ca2+, Mg2+, Cl-, SO42-, HCO3-, Fe, Mn, Pb, Zn, Cd, Cu. Sampling, transport and all analyses were carried out in accordance with the standards described in APHA (Rice et al. 2012).
The results for the water quality indices were analyzed using cluster analysis (CA), factor analysis (FA) and discriminant analysis (DA). CA was conducted using Ward's method as the agglomeration technique and squared Euclidean distance as the distance measure (Najar & Khan 2012, Ogwueleka 2015, Kumar et al. 2015, Loganathan et al. 2015, Wang et al. 2015). FA used principal component analysis (PCA) as the extraction method, followed by varimax rotation. CA and FA were conducted on standardized data (Najar & Khan 2012, Miller & Poleszczuk 2016). Canonical discriminant functions were calculated on the raw data using forward and backward methods (Wunderlin et al. 2001, Singh et al. 2004, Mustapha et al. 2013, Thomas et al. 2015). To test the suitability of the collected data for multivariate statistical techniques, the Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett's test of sphericity were performed (e.g. Stanisz 2007). All calculations were performed using STATISTICA 12.0 PL.
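The factor-analytic part of this workflow can be illustrated with a minimal sketch. It is not the STATISTICA procedure used in the study: the function names, the synthetic monthly data, and the choice of raw varimax (without Kaiser normalization) are assumptions made only for illustration. The eigenvalue cut-off of 1 follows the criterion reported in the results.

```python
import numpy as np
from scipy import stats


def standardize(X):
    """Z-score each column (one water quality index per column)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, dof)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                      # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)       # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)


def pca_loadings(X, min_eigenvalue=1.0):
    """Unrotated loadings of components with eigenvalue above the cut-off."""
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]            # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > min_eigenvalue
    return eigvec[:, keep] * np.sqrt(eigval[keep]), eigval


def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation of a loading matrix (assumed variant)."""
    p, k = L.shape
    T = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        LR = L @ T
        grad = L.T @ (LR ** 3 - LR * (LR ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return L @ T


# Hypothetical data: 36 monthly samples, two seasonal signals, six indices.
rng = np.random.default_rng(0)
month = np.arange(36)
s1 = np.sin(2 * np.pi * month / 12)
s2 = np.cos(2 * np.pi * month / 12)
X = np.column_stack([s1, s1, s1, s2, s2, s2]) + 0.1 * rng.normal(size=(36, 6))

Z = standardize(X)
chi2, p_value = bartlett_sphericity(Z)
kmo_value = kmo(Z)
loadings, eigenvalues = pca_loadings(Z)
varifactors = varimax(loadings)
explained = 100 * eigenvalues / eigenvalues.sum()   # % variance per component
```

In this sketch each column of `varifactors` plays the role of one varifactor, and the entries of `explained` correspond to the percentages of variance quoted for each factor.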

Results and discussion
Information on the specific character of the processes occurring at the individual sampling stations can be obtained by analyzing similarities and differences in the variability of the studied biological and physicochemical indices, as illustrated by CA, FA and DA. Basic information about the collected data is presented in Table 1 as descriptive statistics.
CA made it possible to discuss the biohydrochemical processes that took place in the lake waters, in particular acid-base equilibria, changes in the oxidation-reduction potential, variation in nutrient composition, and changes in phytoplankton biomass and oxygen status (Fig. 2). Using the Sneath criterion, each diagram shows 3 main blocks of parameters that are characterized by a significant similarity in variability, with specific differences between the stations studied (Ogwueleka 2015, Kumar et al. 2015, Loganathan et al. 2015, Wang et al. 2015). Assuming that the concentration of chlorophyll a (Chl a) is a measure of phytoplankton biomass (Kawecka & Eloranta 1994), and that the concentration of dissolved oxygen can, in some circumstances, be considered a measure of photosynthetic activity that translates into the redox status of the water column, the variability of chlorophyll concentrations can be discussed in combination with the variability of the other water quality indices examined. In the inflow zone of Lake Rusałka, chlorophyll a concentrations changed in a similar way to the total manganese concentration, while in the outflow zone they changed in a similar way to the iron concentration (Fig. 2). This pattern has been reported as typical of reservoirs with more or less stagnant waters (Rajfur 2013).
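The grouping of indices into blocks can be sketched as follows. This is a minimal illustration under stated assumptions, not the STATISTICA analysis: the function name, the variable names and the synthetic data are hypothetical, and the cut-off of 2/3 of the largest linkage distance is an assumed form of the Sneath criterion, since the text does not state which threshold was applied. SciPy's Ward linkage works on Euclidean distances, so dendrogram heights will differ from those obtained with squared Euclidean distance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_indices(X, names, cut_fraction=2.0 / 3.0):
    """Group variables (columns of X) by Ward's method and cut the tree
    at cut_fraction * Dmax, where Dmax is the largest linkage distance."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each index
    tree = linkage(Z.T, method="ward")                 # rows of Z.T are variables
    d_max = tree[:, 2].max()
    labels = fcluster(tree, t=cut_fraction * d_max, criterion="distance")
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    return list(groups.values())


# Hypothetical data: 36 monthly samples; x1-x3 share one seasonal signal,
# y1-y3 share another, so two blocks of variables are expected.
rng = np.random.default_rng(0)
month = np.arange(36)
s1 = np.sin(2 * np.pi * month / 12)
s2 = np.cos(2 * np.pi * month / 12)
X = np.column_stack([s1, s1, s1, s2, s2, s2]) + 0.1 * rng.normal(size=(36, 6))
names = ["x1", "x2", "x3", "y1", "y2", "y3"]

blocks = cluster_indices(X, names)
```

Lowering `cut_fraction` (for example to 1/3) gives a stricter cut and usually more, smaller blocks.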
Changes in chlorophyll concentration at the inflow to Lake Rusałka were correlated with dissolved oxygen concentrations; at the outflow, however, they were correlated with changes in total alkalinity. The pH values and the concentrations of phosphorus and nitrogen compounds at the inflow, as well as the oxygen concentration and the concentrations of nitrogen compounds at the outflow, followed a distinctly different pattern of changes. A particularly notable finding is that changes in dissolved oxygen concentration were not associated with analogous changes in chlorophyll a concentration. Most likely, the drainage water was already so highly oxygenated through additional physical absorption of oxygen from the air that phytoplankton was not the "main supplier" of oxygen to the water column. In both the inflow and outflow zones of Lake Rusałka, oxygen concentrations were not correlated with TP and SRP concentrations, which changed similarly to the concentrations of nitrogen compounds.
At the inflow to Lake Rusałka, changes in oxygen concentrations were analogous to changes in Eh, nitrites, total iron concentration and temperature. In the outflow zone, this block of parameters was additionally joined by COD-Cr and heavy metal concentrations. The water quality indices that changed in a specific way, i.e. showed correlations unusual for lake waters (Dojlido 1995), were the concentrations of calcium, sulphates and magnesium. It should be noted that the magnesium concentration determined in this work was the sum of the concentrations of all di- and multivalent cations, in ionic and complexed form, except for calcium ions, that were titrated with EDTA during the determination of general hardness. This sometimes leads to a significant overestimation of magnesium levels (e.g. Domagała et al. 2001). The divalent cations mentioned and the divalent sulphate anion are often largely present in the water column embedded in micelles of colloidal organic matter (Pitter 1999).
There is a relationship between changes in oxygen and chloride ion concentrations. It can be explained by the fact that the studied lake was fed by storm water drainage carrying significant amounts of melt water in the post-winter period, when the solubility of oxygen in cold water is high, and that these waters contained large amounts of NaCl and CaCl2, the salts used to remove ice and snow in winter (e.g. Gutchess et al. 2016).
Changes in acid-base equilibria are characterized primarily by changes in pH and alkalinity. In the waters of Lake Rusałka, pH changes were correlated with changes in COD-Cr. Assuming that COD-Cr is a measure of the amount of phytoplankton and detritus, i.e. matter that took part in photosynthesis along the water runoff, this similarity becomes understandable (Dojlido 1995, Poleszczuk 2005). In Lake Rusałka, redox potential changes were similar to changes in total iron and nitrites. This correlation is typical of lake reservoirs, because it is the redox equilibria of the Fe2+/Fe3+ and NO2-/NO3- couples that are directly responsible for the potential measured potentiometrically at the redox electrode (Schüring et al. 2000).
Changes in the concentration of phosphorus compounds were quite specifically associated with changes in the values of the other tested indices (Pitter 1999). In the inflow zone, the concentration of phosphorus compounds was correlated with changes in pH and in total nitrogen and nitrate concentrations. In the outflow zone, in turn, it was correlated with the concentration of ammonium ions and with COD-Mn, which also supports the "passage" of phosphorus compounds from bottom sediments to the water column (Sigg & Stumm 1989). In Lake Rusałka, the concentrations of nitrogen compounds (NH4+ and TN) changed in a similar way to the concentrations of phosphorus compounds and pH in the inflow zone, while in the outflow zone these indices (NH4+ and TN) were correlated with oxygen concentrations, which varied markedly with season. Ammonium ions (NH4+) could therefore have arisen as a result of the oxidation of organic matter.
FA provided information on the most important water quality indices describing the studied lake by reducing the data with minimal loss of the original information (Tab. 2) (Najar & Khan 2012, Miller & Poleszczuk 2016). For the inflow area, FA yielded 8 significant factors after varimax rotation, i.e. varifactors (VF), with eigenvalues greater than 1, together describing 81.6% of the variance, whereas for the outflow area only 7 VF were obtained (74.14% of the variance explained). For the inflow area, VF1 (21%) was correlated with changes in Eh, COD-Cr, BOD5, Cl- and TP, pointing to the inflow of biodegradable organic matter with municipal wastewater. VF2 (11%) showed correlations with Chl a, temperature, pH, HCO3- and DO; it was a factor indicating the occurrence of phytoplankton blooms. VF3 (11%) was associated with the inflow of organic matter originating from the decomposition of biomass, which is supported by the correlation of this factor with pH, COD-Cr, Ca2+, HCO3- and NO3-.
VF4 (8%) indicated an inflow of hardly decomposable organic matter together with incoming suspensions (correlation with COD-Mn and Ca2+). VF5 (8%) indicated sorption of heavy metals by phytoplankton (correlation of Chl a and Pb) (Najar & Khan 2012, Rajfur 2013, Loganathan et al. 2015, Wang et al. 2015, Miller & Poleszczuk 2016). In the outflow zone, VF1 (23%), correlated with Chl a, Eh, BOD5, NH4+, TN and TP, and VF2 (13%), correlated with Chl a, DO, SRP and TP, described the growth of phytoplankton biomass and the increased competition between phytoplankton and other microorganisms involved in biodegradation processes. VF3 (8%), correlated with COD-Mn, Cl- and Mn, indicated the presence of hardly decomposable organic matter originating from surface runoff from nearby streets, while VF4 (7%), correlated with SO42-, HCO3-, Cd and Cu, pointed to processes at the sediment-water interface that could have deposited heavy metals in the bottom sediments. FA showed that 16 out of 25 water quality indices in the inflow area and 17 out of 25 in the outflow area contributed significantly to the variability of Rusałka Lake. It pointed to the potential origin of the pollutants flowing into the lake that destabilized biohydrochemical processes in the inflow zone. In the outflow zone, FA showed a system typical of lake waters in a reservoir whose self-purification process is disturbed (Einax 1995, Najar & Khan 2012, Loganathan et al. 2015, Wang et al. 2015, Miller & Poleszczuk 2016).
The discriminant analysis showed that, for the naturally occurring division of the data by season, the samples from the September 2012 - September 2015 research period were assigned 100% correctly (Tab. 3, Fig. 3). The classification of the tested samples by sampling station showed differences between the stations (Tab. 4).
In the inflow zone of Lake Rusałka, 81% of the variability of the examined water quality indices was classified correctly into its own group, while the remaining 19% corresponded to the variability predicted for the outflow zone. In the outflow zone, the percentage of correct classification was lower, at 76%, with as much as 24% of the variability corresponding to the changes typical of the inflow zone. Such results are satisfactory (Mustapha et al. 2013, Thomas et al. 2015) and testify to a good match between the measurement data and the analysis carried out. These differences, found within a single flow-through reservoir, may indicate: (I) disturbance of biohydrochemical processes related to the hydrotechnical works carried out on the inflow to the reservoir; (II) discharges of pollutants in the outflow zone of the lake; (III) release of contaminants deposited in the bottom sediments of the outflow zone.
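How such percentages of correct classification arise can be shown with a simplified sketch. It assumes a plain two-group Fisher linear discriminant with equal priors, evaluated on the same data used to fit it. The stepwise (forward and backward) variable selection used in the study is not reproduced, and the function name and the synthetic, deliberately well-separated data are hypothetical.

```python
import numpy as np


def fisher_lda_classification(X, y):
    """Two-group linear discriminant analysis with a classification matrix.

    X: samples x indices array; y: 0/1 labels (e.g. 0 = inflow, 1 = outflow).
    Returns the classification matrix (rows = actual, columns = predicted)
    and the percentage of correctly classified samples in each group.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)            # discriminant direction
    threshold = w @ (m0 + m1) / 2.0            # midpoint: equal priors assumed
    predicted = (X @ w > threshold).astype(int)
    matrix = np.zeros((2, 2), dtype=int)
    for actual, pred in zip(y, predicted):
        matrix[actual, pred] += 1
    percent_correct = 100.0 * np.diag(matrix) / matrix.sum(axis=1)
    return matrix, percent_correct


# Hypothetical data: 37 samples per station, three indices, clearly separated.
rng = np.random.default_rng(1)
inflow = rng.normal(loc=0.0, scale=1.0, size=(37, 3))
outflow = rng.normal(loc=8.0, scale=1.0, size=(37, 3))
X = np.vstack([inflow, outflow])
y = np.array([0] * 37 + [1] * 37)

matrix, percent_correct = fisher_lda_classification(X, y)
```

Each row of `matrix` corresponds to one station; the off-diagonal counts are the samples whose variability resembles the other station, which is how figures such as 19% and 24% would be read from Tab. 4.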
DA showed that different factors determined the seasonal variability at each sampling station, which may indicate their ecological individuality. For the inflow area, the factors determining seasonal variability were DO, Fe, TN, Ca2+, Pb, COD-Cr and temperature. This can be read as a seasonal dependence of water quality on chemical changes at the water-bottom sediment interface, as well as on the inflow of water contaminated with organic matter of various origins. For the outflow area, the most seasonally variable factors were Chl a, pH, SO42-, Ca2+, Cl- and NO3-, which may indicate a significant impact of phytoplankton blooms, as well as the inflow to this zone of waters polluted by municipal wastewater (Pitter 1999).
The results showed that the waters of Lake Rusałka are polluted in many respects. On this basis, the water quality of the lake can be classified (according to Polish legislation) as below good condition. The chemometric analyses indicated disturbances in the self-purification processes of this reservoir. Considering the constant inflow of pollution, one solution that could help improve the quality of the lake waters is the use of various lake reclamation methods (e.g. Kazner et al. 2012, Gałczyńska & Buśko 2016, 2018). Until the 1990s, the historical source of pollution was the significant amounts of various types of wastewater, including domestic and industrial wastewater, discharged into the Osówka stream (Poleszczuk et al. 2012). Currently, the catchment also plays a very important role in shaping the water quality of Lake Rusałka, which is a common phenomenon, particularly in small water reservoirs (Gałczyńska et al. 2011). During the research period, the catchment was subjected to strong anthropopressure in the form of hydrotechnical works on the Osówka and Warszowiec streams. In addition, the analyses pointed to either leaks in the sewage system or a direct discharge of sewage from one of the nearby residential buildings (e.g. Poleszczuk & Bucior 2009, Poleszczuk et al. 2012).

Conclusion
The use of several multivariate statistical analyses showed that water quality in Rusałka Lake is mainly affected by seasonal variability and constantly increasing anthropopressure. In addition, the methods used made it possible to reduce the number of water quality indices needed to determine the biohydrochemical state of Rusałka Lake. CA showed which biohydrochemical processes took place in the lake waters with respect to acid-base equilibria, changes in the oxidation-reduction potential, variation in nutrient composition, and changes in phytoplankton biomass and oxygen status. FA showed that 16 out of 25 water quality indices in the inflow area and 17 out of 25 in the outflow area contributed significantly to the variability of Rusałka Lake. It pointed to the potential origin of the pollutants flowing into the lake that destabilized biohydrochemical processes in the inflow zone. In the outflow zone, FA showed a system typical of lake waters in a reservoir whose self-purification process is disturbed. The chemometric analyses applied to determine the factors influencing water quality in Rusałka Lake provided satisfactory results despite the relatively short data series. Discriminant analysis showed that selected water quality indices, namely DO, Fe, TN, Ca2+, Pb, COD-Cr and temperature for the inflow area and Chl a, pH, SO42-, Ca2+, Cl- and NO3- for the outflow area, were responsible for seasonal variation in Rusałka Lake. The assessment of water quality should always include many indicators, which unfortunately involves considerable monitoring costs. The chemometric analyses conducted made it possible to identify the most important indicators of water quality, and thus provided information on the possibility of reducing the number of water quality indices determined in Lake Rusałka.