Data analysis of heavy metal content in river water: multivariate statistical analysis and inequality expressions

The purpose of this paper is to combine multivariate statistical methods that account for asymmetric distributions with chemical analysis by inductively coupled plasma–mass spectrometry (ICP-MS). We investigate heavy metal data for the Akcay River, which flows into the Mediterranean at the Finike coast in Turkey. Using multivariate statistical analysis, pollution indices, and density maps, we determine the chemical content of the river's surface water and the origin of its heavy metals, and we interpret the results numerically. With the help of special numbers representing specific chemical components and the statistical methods described above, we obtain many new relations and results, and we give comments, observations, and remarks on them. These results have high potential for use not only in engineering and the health sciences but also in applied mathematics, statistics, and other fields.


Introduction
In recent years, special numbers and their applications have become indispensable in many branches of science. Special numbers are frequently used in statistical methods and their applications, and special numbers that represent chemical components, such as those discussed in this study, are also common and useful in the applied sciences. Using multivariate statistical methods, special numbers, and chemical analysis with an inductively coupled plasma–mass spectrometry (ICP-MS) device, we study the heavy metal content of the Akcay River, which flows into the Mediterranean at the Finike coast in Turkey. We believe that the results obtained in this paper will be a resource for researchers working in related fields.
The need for clean water is increasing with population growth. Issues such as climate change, misuse of water, and heavy metal concentrations in water have increased the importance of surface water used in agriculture, animal husbandry, and industry. The recent literature illustrates the importance of heavy metal concentrations in water. In [1], it is stated that human activity has influenced water resources in recent years and that the world faces critical problems of water supply and drinking water quality. In addition, the heavy metal content of water and sediments, together with the associated differences, anomalies, and toxic effects, has received considerable attention [2][3][4][5][6][7][8]. The results of chemical analyses of surface waters provide data for statistical analysis, and recent studies have drawn attention to the use of statistical methods in engineering and to the interpretation of symmetric or asymmetric distributions of the resulting data.
In several studies, multivariate statistical techniques, such as cluster analysis (CA) and principal component analysis (PCA), and pollution indices, such as the water quality index (WQI), trophic state index (TSI), ecological risk index (ERI), and ecological risk assessment (ERA), were used to determine water quality and the causes of anomalies [7][18][19][20].
Finike district, located on the Mediterranean coast in the southwest of Antalya, is a highly fertile agricultural and touristic area [21]. The maps were drawn with the ArcGIS 10.6.1 software package. As the site location map in Fig. 1 shows, three watercourses cross the agricultural area.
Orman and Kaplan [23] reported the sulfur content of the soil and highlighted the tomato monoculture in greenhouses in the Kumluca and Finike districts. In addition, [24] studied the irrigation water used in greenhouses in the Kumluca and Finike districts, and [25] presented the phosphorus distribution and bioavailability in marine surface sediments from different coasts, one of which is Finike. The only study of heavy metal distribution in Finike was carried out in [26], and it focused on backshore sediments. No information is available on the health effects of heavy metals in the surface waters of the Akcay River on the people living in the Finike region. Establishing baseline information about the current state of pollution in the Akcay River is therefore vital for public health and future research. Heavy metal anomalies in water, regardless of their origin, may adversely affect human health, especially in areas where agriculture is common. The Akcay River and its tributaries were studied because they pass through the Finike region and are used for irrigation.
Figure 1 Location map of the study area (modified from [22]) and sample locations
This study focused on Akcay River waters and aimed (1) to investigate anomalies in the total concentrations of thirteen heavy metals (Sr, Fe, Ba, Cr, Mn, V, Ni, As, Zn, Cu, Ti, Co, Pb); (2) to determine the possible sources of this metal contamination; (3) to identify patterns in metal content between samples with the help of multivariate statistical analysis; (4) to assess the human health risk of metal contamination with pollution indices (contamination factor, CF; enrichment factor, EF; geo-accumulation index, Igeo); and (5) to draw density maps.

Main results with their materials and methods
The study area is located in Finike, an important district of Antalya in terms of tourism and agriculture (see Fig. 1). Located in the southeast of the Teke Peninsula, the district lies on a plain with fertile agricultural land. Agriculture and tourism are the primary sources of income for the people living there, and agricultural land and greenhouses therefore cover wide areas across the district. The slope of the plain is quite low, and it contains deposits formed by streams coming from the mountains. The Akcay River and its tributaries are the primary surface waters in the plain. The surface water of the Akcay River at the sampling points is generally clear and odorless, and no physical contamination was observed. However, color changes are visible where the river passes through settlements and especially industrial areas. Limestones are the dominant rocks in the region, and aluminum-rich terra rossa is observed on them. The region has a hot, dry climate in summer and a mild, rainy climate in winter.

Sample collection
Samples were collected from the Akcay River and two other streams in May 2018. To avoid artificial contamination, particular attention was paid to the choice of sampling materials. Each of the 38 water samples was collected in a polythene container. The sample locations are shown in Fig. 1. Samples could not be collected at equal intervals because of physical constraints such as land use, so each sample was taken at the nearest point that conditions allowed.

Laboratory analyses
The samples were prepared following the procedure defined in U.S. Environmental Protection Agency (EPA) method 3005A [27]. The laboratory analyses were carried out using the ELAN DRC-e model inductively coupled plasma-mass spectrometry (ICP-MS) device in the Research Center Laboratory at Akdeniz University; see [13,20].
Certified pure primary reference material was analyzed with the water samples from the site. The chemical analysis and validation of the samples collected from the field were carried out as specified in the study [28].

Statistical analysis
The results of the chemical analysis were examined with several multivariate statistical methods (correlation analysis, principal component analysis, cluster analysis, and regression analysis) to determine the distribution of metals in surface water. Principal component analysis (PCA) is a dimensionality reduction technique that represents the variability in a dataset by a reduced set of new variables formed as linear combinations of the input data. Before PCA, the Kaiser-Meyer-Olkin (KMO) test was performed to assess the adequacy of the metals data for factor analysis methods such as PCA [29]. Bartlett's test of sphericity was also used to assess the structure of variability among the metals data and their suitability for PCA; the data are suitable for factor analysis only if the test results indicate sufficient correlation between the variables [30]. Principal components with eigenvalues >1 were retained for interpretation. Hierarchical cluster analysis was performed to evaluate which sampling locations have similar metal contents, using squared Euclidean distance and average linkage between groups. These multivariate statistical analyses were performed with the SPSS 23 software package.
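The adequacy checks described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the SPSS procedure used in the study: it assumes a samples × metals matrix `X` (here hypothetical synthetic data), computes Bartlett's test of sphericity from the log-determinant of the correlation matrix, and computes the overall KMO measure from the correlations and the partial correlations obtained from the inverse correlation matrix. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test of H0: the correlation matrix of X is the identity.

    X is an (n_samples, p_variables) array. Returns (chi2, df, p_value).
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # The log-determinant is numerically safer than log(det(R))
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2.0
    return chi2, df, stats.chi2.sf(chi2, df)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations derived from the inverse correlation matrix
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Hypothetical data: 38 samples, 4 metals driven by one common source
rng = np.random.default_rng(0)
source = rng.normal(size=38)
X = np.column_stack([source + 0.5 * rng.normal(size=38) for _ in range(4)])
chi2, df, p_value = bartlett_sphericity(X)
print(f"Bartlett chi2={chi2:.1f}, df={df:.0f}, p={p_value:.2g}, KMO={kmo(X):.3f}")
```

A small Bartlett p-value rejects the identity-matrix hypothesis (the metals are correlated), and a KMO of at least 0.5 is the usual minimum for factor analysis.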

Pollution indices
The World Health Organization (WHO) uses various indices to identify heavy metal anomalies and to evaluate the results. The geo-accumulation index (Igeo), enrichment factor (EF), and contamination factor (CF) have become popular in recent studies (Table 1). In the literature, these indices have been widely used in analyses of both sediment and surface water [31][32][33]. Metal pollution in agricultural soils can also be identified with these indices [34]. Further, [35] employed all three methods to examine heavy metal pollution in both water and sediment, and [36] used them to evaluate the degree of contamination of the Rybnik water reservoir. For comparability, the same indices (Igeo, EF, and CF) were used in this study to evaluate heavy metal concentrations in the Akcay River and its tributaries in the Finike Plain. The index calculations took into account the permissible limits for heavy metals published by the WHO, EPA, and the Turkish Standards Institution (TSE), shown in Table 2.
Tables 1 and 2 also list chemical components together with the special numbers that define the classification inequalities, that is, their lower and upper limits.

Applications of results
Initially, the heavy metal concentrations of the surface water samples were compared with the limit values set by WHO, USEPA, and TSE, given in Table 2. Descriptive statistics of the metal concentrations for the 38 surface water samples, which were also used in the index calculations, are given in Table 3. The maximum Fe concentration exceeded the permissible limits of the WHO and USEPA standards; however, the water was classified as Class 2 according to Turkish surface water regulations. The spatial distributions of Sr and Fe concentrations are shown in Figs. 2(a) and 2(b).
Table 1 Pollution indices [33]
Index | Formula | Classification
Contamination factor (CF) | $CF_n = C_n^{\text{sample}} / C_n^{\text{background}}$ (*) | CF < 1: low contamination; 1 ≤ CF < 3: moderate contamination; 3 ≤ CF < 6: considerable contamination; CF ≥ 6: very high pollution
Enrichment factor (EF) | $EF_n = (C_n/\mathrm{Fe})_{\text{sample}} / (C_n/\mathrm{Fe})_{\text{background}}$ (**) | EF < 2: natural variability; 2 < EF < 5: moderate enrichment; 5 < EF < 20: significant enrichment; 20 < EF < 40: very high enrichment; EF > 40: extremely high enrichment
Geo-accumulation index (Igeo) | $I_{geo} = \log_2\left(C_n / (1.5\,B_n)\right)$ (***) | Igeo < 0.42: unpolluted; 0.42 < Igeo < 1.42: low unpolluted; 1.42 < Igeo < 3.42: moderately polluted; 3.42 < Igeo < 4.42: strongly polluted; Igeo > 4.42: extremely polluted
(*) CF of a metal is the ratio between the content of that metal and its background value in the sediment and water samples of the study area [35].
(**) (C/Fe)sample and (C/Fe)background are the heavy metal-to-Fe ratios in the study sample and in the background sample, respectively [35].
(***) C_n is the content of toxic metal n, B_n is the background value of toxic metal n, and 1.5 is a factor accounting for possible lithological variations [37].
Table 2 note sources: [39], [43], [38], [42], [40].
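The three indices in Table 1 are simple ratios, so they can be sketched directly. The Python functions below are a hypothetical illustration: the input concentrations are made-up numbers, and because Table 1 leaves some class boundaries open (for example, EF = 2 exactly), the `classify` helper assigns a value equal to a threshold to the higher class, which is an assumption.

```python
import math


def contamination_factor(c_sample, c_background):
    """CF_n = C_n(sample) / C_n(background)."""
    return c_sample / c_background


def enrichment_factor(c_sample, fe_sample, c_background, fe_background):
    """EF_n = (C_n/Fe)_sample / (C_n/Fe)_background."""
    return (c_sample / fe_sample) / (c_background / fe_background)


def geoaccumulation_index(c_sample, b_background):
    """Igeo = log2(C_n / (1.5 * B_n)); 1.5 allows for lithological variation."""
    return math.log2(c_sample / (1.5 * b_background))


# Class upper bounds and labels transcribed from Table 1
CF_CLASSES = ([1, 3, 6], ["low contamination", "moderate contamination",
                          "considerable contamination", "very high pollution"])
EF_CLASSES = ([2, 5, 20, 40], ["natural variability", "moderate enrichment",
                               "significant enrichment", "very high enrichment",
                               "extremely high enrichment"])
IGEO_CLASSES = ([0.42, 1.42, 3.42, 4.42], ["unpolluted", "low unpolluted",
                                           "moderately polluted", "strongly polluted",
                                           "extremely polluted"])


def classify(value, classes):
    """Return the label of the first class whose upper bound exceeds value."""
    thresholds, labels = classes
    for upper, label in zip(thresholds, labels):
        if value < upper:
            return label
    return labels[-1]  # value is at or above the last threshold


# Hypothetical example: a sample at three times its background value
cf = contamination_factor(30.0, 10.0)
print(cf, classify(cf, CF_CLASSES))  # 3.0 considerable contamination
```

The same `classify` helper serves all three indices because each classification in Table 1 is an ordered chain of thresholds.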

Distribution of metals in surface water
The chain of inequalities formed by the chemical components in this section has potentially important applications in chemistry, engineering, and other applied sciences. The heavy metal concentrations varied widely between locations, which may be due to differences in geological and geographical features [46]. Skewness values are used to determine whether the distribution of an element is symmetrical or asymmetrical: the series is right-skewed (positively skewed) when the skewness is > 0 and left-skewed (negatively skewed) when the skewness is < 0; in either case it is not symmetrical [46]. The descriptive statistics of the 13 elements for all 38 surface water samples are given in Table 3. Ranking the metals by their average concentrations, from highest to lowest, gives the following chain of inequalities: Sr > Fe > Ba > Cr > Mn > V > Ni > As > Zn > Cu > Ti > Co > Pb. The highest Sr concentration (1024.943) was found at location FJ26, an extremely high Fe concentration (652.081) was detected at location F24, the highest Cr concentration was found at location F21, and the highest Mn concentration was found at location F17. Locations FJ26, FJ25, FJ27, and F28 have the highest Sr contents.
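The ranking and skewness checks described above can be sketched as follows. This is a hypothetical illustration, not the study's SPSS output: the concentration values are made up, the chain is built from arithmetic means, skewness uses SciPy's bias-corrected sample estimator, and the small tolerance for calling a series "symmetric" is an assumption.

```python
import numpy as np
from scipy import stats


def inequality_chain(concentrations):
    """Order metals by mean concentration, highest first, e.g. 'Sr > Fe > Pb'."""
    means = {metal: float(np.mean(values)) for metal, values in concentrations.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    return " > ".join(ranked)


def skew_direction(values, tol=1e-9):
    """Label a series by the sign of its bias-corrected sample skewness."""
    s = stats.skew(values, bias=False)
    if s > tol:
        return "right-skewed"
    if s < -tol:
        return "left-skewed"
    return "symmetric"


# Hypothetical concentrations for three metals at four locations
data = {
    "Pb": [1.0, 2.0, 1.5, 1.2],
    "Sr": [900.0, 1000.0, 950.0, 880.0],
    "Fe": [120.0, 650.0, 90.0, 80.0],
}
print(inequality_chain(data))      # Sr > Fe > Pb
print(skew_direction(data["Fe"]))  # right-skewed
```

A single very high value at one location, as in the hypothetical Fe series, is enough to produce a right-skewed distribution.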

Correlation between variables
Because the sample size was 38 (n > 30), the data were treated as approximately normal by appeal to the central limit theorem, and Pearson's correlation was therefore used.
The correlation matrix of the analyzed metals is presented in Table 4. This analysis describes the associations among the elements themselves: elements with similar behavior show the highest coefficients, that is, strong correlations. According to the results of the correlation analysis (p < 0.01), As is strongly correlated with Cu, Sr, V, and Zn; Ba with Fe, Ni, and Sr; Co with Mn; Cr with Ni; Cu with Pb, Sr, V, and Zn; Fe with Mn; Mn with Ni; Ni with Zn; Pb with Zn; Sr with V and Zn; Ti with Zn; and V with Zn.
Table 4 Pearson's correlation coefficients between the 13 metals in surface water samples from the Akcay River
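The screening of metal pairs described above can be sketched with `scipy.stats.pearsonr`. The text does not state a numeric cut-off for a "strong" correlation, so `r_min=0.7` below is an assumption, as are the function name and the hypothetical data; only the p < 0.01 criterion comes from the text.

```python
import itertools

import numpy as np
from scipy import stats


def strong_correlations(data, alpha=0.01, r_min=0.7):
    """List metal pairs whose Pearson r is significant (p < alpha) and |r| >= r_min."""
    pairs = []
    for a, b in itertools.combinations(sorted(data), 2):
        r, p = stats.pearsonr(data[a], data[b])
        if p < alpha and abs(r) >= r_min:
            pairs.append((a, b, round(float(r), 3)))
    return pairs


# Hypothetical data: Mn tracks Fe exactly, Pb alternates independently of both
fe = np.arange(10, dtype=float)
data = {"Fe": fe, "Mn": 2.0 * fe + 1.0, "Pb": np.array([1.0, -1.0] * 5)}
print(strong_correlations(data))  # [('Fe', 'Mn', 1.0)]
```

Testing every pair of 13 metals means 78 tests, so a strict significance level such as 0.01 also helps limit false positives.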

Factor analysis
The Kaiser-Meyer-Olkin (KMO) test shows whether the sample data are suitable for factor analysis. Because the KMO value obtained for the data (0.617) satisfied KMO ≥ 0.5, the data were considered suitable for the analysis [47][48][49] (Table 5).
Retaining the factors with eigenvalues greater than 1 identified four factors, which together explained 77.135% of the cumulative variance (Table 6). The scree plot obtained from the data levels off after the fourth factor (Fig. 3).
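The eigenvalue > 1 (Kaiser) rule and the cumulative variance figure can be sketched by eigen-decomposing the correlation matrix. This is a hypothetical illustration with synthetic data, not a reproduction of Table 6; the function name is an assumption.

```python
import numpy as np


def kaiser_retained(X):
    """Apply the eigenvalue > 1 rule to the correlation matrix of X.

    Returns (number retained, eigenvalues in descending order,
    cumulative % of variance explained by the retained components).
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    cumulative_pct = np.cumsum(eigvals) / eigvals.sum() * 100.0
    k = int(np.sum(eigvals > 1.0))
    return k, eigvals, (float(cumulative_pct[k - 1]) if k > 0 else 0.0)


# Hypothetical data: two independent sources, each driving three metals
rng = np.random.default_rng(1)
s1, s2 = rng.normal(size=(2, 100))
X = np.column_stack([s + 0.2 * rng.normal(size=100) for s in (s1, s1, s1, s2, s2, s2)])
k, eigvals, cum = kaiser_retained(X)
print(k, np.round(eigvals, 2), round(cum, 1))
```

Because a correlation matrix has ones on its diagonal, its eigenvalues sum to the number of variables, so an eigenvalue above 1 means a component explains more variance than a single standardized variable.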
Principal component analysis (PCA) has an important place among the multivariate statistical analyses used in this study. It revealed the differences between the surface water samples in terms of the variables (Table 7). PCA was applied to the correlation matrix with varimax rotation, and four principal components were identified with the SPSS 23 software package.
According to the component matrix, the first principal component (PC1) represents As, Ba, Co, Cu, Fe, Pb, Sr, Ti, V, and Zn; the second (PC2) represents Cr, Ni, and Mn; the third represents none; the fourth represents Pb; and the fifth represents Sr (Table 7). Four PCs had eigenvalues >1, and together they explained 77.135% of the total variance in the heavy metal dataset (Table 7, Fig. 4). With a few exceptions, the heavy metals are grouped in the first component. The component plot is consistent with these findings and shows that most metals are concentrated in a single component.
Metals loading on the same component are thought to share a similar origin, whereas different components are interpreted as reflecting different origins.
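The PCA above used varimax rotation. As an illustration only, the widely used SVD-based form of Kaiser's varimax algorithm can be written in NumPy as below. The loading matrix in the example is hypothetical, and SPSS may differ in details such as Kaiser row normalization, which this sketch omits.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (p_variables, k_components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion
        col_ss = np.diag(np.sum(rotated ** 2, axis=0))
        target = rotated ** 3 - (gamma / p) * rotated @ col_ss
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # closest orthogonal matrix to the target direction
        new_criterion = np.sum(s)
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break  # criterion has stopped improving
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical unrotated loadings for four metals on two components
unrotated = np.array([[0.7, 0.5], [0.8, 0.4], [0.6, -0.5], [0.5, -0.6]])
rotated = varimax(unrotated)
print(np.round(rotated, 2))
# An orthogonal rotation leaves each metal's communality unchanged
print(np.allclose(np.sum(unrotated ** 2, axis=1), np.sum(rotated ** 2, axis=1)))
```

Rotation redistributes variance among the retained components so that each metal tends to load strongly on one component, which simplifies the source interpretation.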

Cluster analysis
Previous studies have used the results of chemical analyses of surface water samples to identify samples with similar characteristics [50,51]. Hierarchical cluster analysis (HCA) was performed to determine such groups. HCA in Q-mode (clustering the sample locations) produced three groups at an arbitrary similarity level of 50%. The first group consists of samples F25 to F30, the second group consists of samples F19 and F24, and the third group consists of the remaining samples (Fig. 5). In particular, the locations showing Sr and Fe anomalies were placed in two different groups, with only a few locations between them. This was interpreted as indicating that the heavy metals enter the river locally.
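The Q-mode clustering described above can be sketched with SciPy, using average linkage (SPSS's "between-groups linkage") on squared Euclidean distances. Two choices here are assumptions: the "50% similarity level" is mapped to cutting the tree at half of the largest merge height, and the variables are used as given (in practice they are often standardized first). The sample coordinates are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def q_mode_clusters(X, similarity=0.5):
    """Cluster samples (rows of X) by average linkage on squared Euclidean distance.

    Assumes similarity = 1 - h / h_max, so the tree is cut at
    merge height h = (1 - similarity) * h_max.
    """
    Z = linkage(X, method="average", metric="sqeuclidean")
    cut = (1.0 - similarity) * Z[:, 2].max()
    return fcluster(Z, t=cut, criterion="distance")


# Hypothetical two-metal profiles for six sampling locations
X = np.array([[0.0, 0.0], [0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0],
              [10.0, 0.0], [10.1, 0.1]])
print(q_mode_clusters(X))
```

Each returned integer is a group label for the corresponding sample; samples sharing a label fall in the same group at the chosen similarity level.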

Regression analysis
The model summary and ANOVA for the regression analysis are shown in Table 8. For a linear regression model fitted to the chemical analysis results for heavy metals in the samples from the study area, the coefficient of determination (R²), which measures the proportion of variation in the dependent variable explained by the independent variables, was 69.8%. This relatively high R² was taken to indicate that the number of samples and the analysis results were sufficient and significant; see [4]. The Durbin-Watson statistic was also computed after fitting the regression model to test whether the residuals were autocorrelated. Since the Durbin-Watson coefficient was 0.863, it was concluded that there was no autocorrelation in these data. According to the results of the ANOVA, the model was significant (p < 0.001, reported as Sig. = 0.000), and the data used in the statistical analysis were considered sufficient.
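The R² and Durbin-Watson values above come from SPSS. As a hedged sketch of what these two statistics measure, the NumPy code below fits an ordinary least squares model with an intercept and computes R² = 1 - SS_res/SS_tot and the Durbin-Watson statistic of the residuals. By construction the statistic lies between 0 and 4: values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. It also depends on the order in which the observations are listed. The function names and example data are hypothetical; statsmodels offers an equivalent `durbin_watson` function in `statsmodels.stats.stattools`.

```python
import numpy as np


def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2) over residuals in sample order."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def ols_r2_and_dw(X, y):
    """Fit y = b0 + X b by least squares; return (R^2, Durbin-Watson of residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, durbin_watson(resid)


# Hypothetical predictor and response for six samples
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r2, dw = ols_r2_and_dw(x, y)
print(f"R^2 = {r2:.3f}, Durbin-Watson = {dw:.3f}")
```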

Pollution indices
The contamination factor (CF), enrichment factor (EF), and geo-accumulation index (Igeo) were calculated from the minimum, maximum, and average values of the chemical analysis results for the surface water samples, using the limit values suggested by WHO (Table 9).
Descriptive statistics of the indices were calculated, and histograms of their distributions were prepared to examine skewness and kurtosis (Table 10, Fig. 6). According to these distributions, CF and EF have positively (right-) skewed asymmetric distributions.
Igeo, in general, has a symmetric distribution with no skew. The kurtosis values of CF and EF were 8.044 (> 3), indicating peaked, leptokurtic distributions, whereas the kurtosis of Igeo (-0.902 < 3) indicates a platykurtic distribution.
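The kurtosis comparisons above use 3 as the reference value of a normal distribution, which is the Pearson definition of kurtosis; many statistical packages instead report excess kurtosis (Pearson kurtosis minus 3), for which the normal reference is 0. The sketch below uses `scipy.stats.kurtosis` with `fisher=False` so that the threshold of 3 applies. The function name and input values are hypothetical.

```python
import numpy as np
from scipy import stats


def kurtosis_shape(values):
    """Classify a distribution by Pearson kurtosis (normal distribution = 3)."""
    k = float(stats.kurtosis(values, fisher=False))
    if k > 3:
        return k, "leptokurtic"
    if k < 3:
        return k, "platykurtic"
    return k, "mesokurtic"


# Hypothetical index values: mostly zero with two extremes vs. evenly spread
heavy_tailed = [0.0] * 20 + [10.0, -10.0]
evenly_spread = np.arange(1, 11)
print(kurtosis_shape(heavy_tailed))   # Pearson kurtosis 11.0 -> leptokurtic
print(kurtosis_shape(evenly_spread))  # about 1.78 -> platykurtic
```

With `fisher=True` (SciPy's default), the same heavy-tailed values would give an excess kurtosis of about 8, so the reference value must match the definition the software reports.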
Minimum and mean values and their classifications were also determined from the results of the chemical analysis (except Sr) of the water samples collected from the Akcay River. The health index calculations and the highest anomalies identified in this study (locations FJ26, FJ25, FJ27, and F28: Sr > location F24: Fe > location F19: Cr > location F17: Mn) are therefore expected to be of considerable scientific importance.
Compared with the pollution indices of surface waters in the surrounding area and elsewhere in the world, the maximum Fe value of the Akcay River was higher than those of Badovci Lake, Uzuncayir Dam Lake, the Silesian Basin, and the Kumluca River (Table 11). Table 12 summarizes studies of the chemical composition of surface water from different regions of the world. Fe, Cr, Co, Ba, As, and Cu had higher concentrations in the samples from the Akcay River than in those from Kumluca, which is close to the study area. Notably, the Fe value was higher than in the other samples from Turkey, and it was higher than in the samples from around the world except for the River Nile, Egypt (see Table 12). The study area also had higher Sr values than all the other samples. High levels of Sr have been reported to have pharmacological effects on human bones [69]. Because a negative correlation has been reported between high strontium content in potable water and the incidence of dental caries, the consumption of natural water with relatively high levels of stable strontium should be considered [59].

Conclusions
In this study, the heavy metal content of 38 surface water samples from the Akcay River in Finike district was investigated for 13 heavy metals (As, Ba, Co, Cr, Cu, Fe, Mn, Ni, Pb, Sr, Ti, V, Zn). The chemical contents of the samples were determined, and several pollution indices were calculated (contamination factor, enrichment factor, and geo-accumulation index). Multivariate statistical analyses (descriptive statistics, correlation analysis, factor analysis, regression analysis, cluster analysis, model summary, and ANOVA) were used to interpret the data.
According to the results of the chemical analysis, the elements were ordered by average concentration as follows: Sr > Fe > Ba > Cr > Mn > V > Ni > As > Zn > Cu > Ti > Co > Pb. Because the number of samples exceeds 30, Pearson's correlation analysis was performed. Sr, which showed the highest anomaly, had high positive correlations with V and Zn, and Fe had a high positive correlation with Mn. No significant negative correlation was found. The positively correlated heavy metals suggest similar origins.
The suitability of the data for factor analysis was tested with the Kaiser-Meyer-Olkin (KMO) test, which gave a KMO value of 0.617, indicating that the data were suitable for factor analysis. Factor analysis identified four factors, and the scree plot levels off after the fourth factor. The four factors explained 77.135% of the cumulative variance. Principal component analysis (PCA) yielded four components, consistent with the factor analysis. The heavy metals with the highest anomalies (As, Ba, Co, Cu, Fe, Pb, Sr, Ti, V, Zn) were grouped together in the first component. This grouping, which includes most of the heavy metals examined, is seen more clearly in the component plot.
Cluster analysis in Q-mode produced three groups at an arbitrary similarity level of 50%. The samples showing the Sr anomaly (F25, F26, F27, F28, F29, F30) fell into the same group, but this anomaly did not appear to affect the other tributaries of the Akcay River. Fe fell into a different group: samples F24 and F19, where Fe was dominant, showed similar properties owing to a local effect.
In the regression analysis, the model summary gave an R² value of 69.8%, and the results were considered highly significant. The Durbin-Watson coefficient was 0.863, and no autocorrelation was inferred. According to the ANOVA, the model was significant (p < 0.001, reported as Sig. = 0.000), and the data were considered sufficient.
The kurtosis values of the pollution indices were 8.044 (> 3) for CF and EF, indicating peaked, leptokurtic distributions, and -0.902 (< 3) for Igeo, indicating a platykurtic distribution. According to the skewness values, CF and EF have positively (right-) skewed asymmetric distributions, whereas Igeo has a symmetric distribution with no skew. The minimum and mean values of the indices were normal compared with world averages. However, when Fe values are considered in the contamination factor and enrichment factor calculations, the samples show "moderate pollution" locally.
According to these pollution index values, the Fe anomaly in the Akcay River was higher than in Badovci Lake, Uzuncayir Dam Lake, the Silesian Basin, and the Kumluca River. According to the chemical composition of the samples, the maximum Fe content was higher than the world averages, except for the samples from Turkey and the River Nile (Egypt), which is notable. Although the Fe content exceeded the limit values suggested by WHO and USEPA, the water was classified as Class 2 according to the Turkish surface water regulations (2015). The Sr values of the samples far exceeded the WHO standards, the values reported for natural mineral water samples from Europe, Ramadan City (Egypt), and the Kumluca River (Turkey), and the limit values of the indices.
For the health of living things, necessary precautions must be taken at locations F24 and F19, where the Fe content was high, and at locations F25, F26, F27, F28, F29, and F30 on a separate tributary of the Akcay River, where the Sr content was high. Multivariate statistical methods proved successful for analyzing the data.
In future work, we plan to study further applications of the chain of inequalities formed by the chemical components in this paper, their special numbers, and the symmetric distribution functions used, with applied mathematical methods.
Overall, the results obtained in this paper have high application potential in applied mathematics, statistics, chemistry, engineering, and other applied sciences.