Application of multivariate statistical techniques in the evaluation of large-scale water treatment plants in Baghdad.

This paper aims to evaluate the performance of large-scale water treatment plants and to demonstrate that they can produce high-quality treated water. Raw water and treated water parameters from a large monitoring databank covering 2016 to 2019, collected at eight water treatment plants (WTPs) located in different parts of Baghdad city, were analyzed using nonparametric and multivariate statistical tools such as principal component analysis (PCA) and hierarchical cluster analysis (HCA). The plants are Al-Karkh, Sharq-Dijlah, Al-Wathba, Al-Qadisiya, Al-Karama, Al-Dora, Al-Rasheed, and Al-Wehda. PCA extracted six factors that together account for 73.389% of the variance in the data set. Based on this selection criterion, the most significant parameters for evaluating the variation in drinking water quality are the mineral-related parameters (e.g., Ca+2, Mg+2, salinity, hardness), the nutrient parameters (i.e., dissolved nitrate, nitrite, and orthophosphate), and a physical parameter. HCA grouped the eight WTPs into three clusters according to the similarity of their raw water and treated water quality.


Introduction
Water supply and sanitation in Baghdad are characterized by poor water quality and lack of service. Three decades of war, along with limited environmental awareness, have destroyed Baghdad's water resources management system. Although 91% of the population has access to drinking water, there are significant variations between governorates and between urban and rural areas: only 77% of the population has access to improved drinking water sources in rural areas, compared with 98% in urban areas. Also, 6.2% of the population does not use an improved sanitation facility. Besides the inadequacy of potable water, the threat of waterborne diseases is a critical public health issue. These preventable health risks are attributed to poor water management services (UNICEF, 2012) [1].
Clean and safe water results from careful evaluation of source water quality variation and from monitoring of the plant's performance. Operators of water treatment plants (WTPs) should regularly analyze plant performance and ensure that the system operates with the most efficient equipment and technology. Evaluating a plant's performance helps identify the factors that prevent its treatment processes from producing water of acceptable quality. Adequate and representative performance indicator systems are required to ensure the proper functioning of each water treatment unit. Performance indicator (PI) methodologies, which are simple and understandable tools, have usually been proposed for the performance evaluation of WTPs. A PI system can be used to evaluate performance over previous periods to show improvement or deterioration, so that remedial measures can be taken before service is affected. The authors of [5] developed a performance indicator system to be used as a standardized methodology for performance assessment in drinking water treatment plants. They identified 80 PIs over seven domains: treated water quality; plant reliability; natural resources and raw materials; by-products management; safety; human resources; and economic and financial resources. [8] for evaluation WTPs.
In recent years, multivariate statistical tools, particularly hierarchical cluster analysis (HCA) and principal component analysis (PCA), have been successfully used in the water field to analyze the complex structure that underlies large analytical data sets. PCA is a powerful tool capable of handling massive amounts of data, reducing them to a smaller number of uncorrelated, significant variables while keeping as much information as possible and explaining the observed variance [20]. This study was performed to examine changes in the levels of different chemical and physical parameters due to the discharge of city wastes and to verify compliance with the Iraqi standards and WHO standards; the quality of Tigris River water for irrigation purposes was also evaluated. Finally, PCA and HCA were performed to reduce the data by grouping water quality variables into selected factors with common features, in order to describe the plants' behavior in terms of pollution sources and treatment efficiency.
IOP Publishing, doi: 10.1088/1757-899X/1105/1/012109

Sample collection and dataset
Monthly data records of physicochemical parameters of raw water and treated water for the period 2013-2016 were obtained from eight water treatment plants (WTPs) located in Baghdad. The main characteristics of the WTPs are summarized in Table 1, and the locations of the WTPs are presented in figure 1.

Data analysis
Approximately 1672 data points for both raw water and treated water were subjected to descriptive statistical analysis using Microsoft Excel 2019 and SPSS (Statistical Package for Social Sciences) 26.0. Table 3 and Table 4 present the mean and standard deviation of the measured parameters for the eight water treatment plants. A Pearson correlation test was performed to investigate possible relationships between the physicochemical parameters of the water. Principal factor analysis (PFA) was conducted to determine the significant parameters that describe the variation in the total data set. PCA is a dimension-reduction tool used to identify a smaller number of uncorrelated variables (principal components) from a large data set with minimum loss of the original information (Vialle et al., 2011) [21].
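The Pearson correlation screening mentioned above can be sketched as follows. This is only a minimal illustration and not the SPSS procedure used in the study: the function name `strong_correlations`, the 0.7 cut-off, and the parameter names are assumptions introduced here, and the demo values are synthetic.

```python
import numpy as np


def strong_correlations(data, names, threshold=0.7):
    """Return the Pearson correlation matrix and the parameter pairs whose
    absolute correlation reaches the (assumed) threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are parameters
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return corr, pairs


# Hypothetical demo with synthetic monthly records (not the study's data).
rng = np.random.default_rng(0)
tds = rng.normal(570.0, 40.0, size=48)
ec = 1.6 * tds + rng.normal(0.0, 10.0, size=48)   # built to follow TDS closely
ph = rng.normal(7.7, 0.2, size=48)                # independent of the others
demo = np.column_stack([tds, ec, ph])
_, demo_pairs = strong_correlations(demo, ["TDS", "EC", "pH"])
print(demo_pairs)
```

Pairs flagged this way are natural candidates to load on the same principal component later in the analysis.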
The Kaiser-Meyer-Olkin (KMO) test of sampling adequacy was used to assess how well the data are suited to factor analysis. A KMO value above 0.5 indicates that the correlation matrix can and should be factor analyzed.

Kaiser et al. (1974) [22] suggested guidelines for interpreting the KMO value: a value below 0.5 is unacceptable, a value in the 0.5 range is miserable, in the 0.6 range mediocre, in the 0.7 range middling, in the 0.8 range meritorious, and in the 0.9 range marvelous. Bartlett's test of sphericity was performed to examine the null hypothesis that the correlation matrix is an identity matrix. The PCA was calculated using the following steps:
1. Standardize the variable data so that all variables have equal weight in the analysis.
2. Calculate the covariance matrix.
3. Find the eigenvalues and the corresponding eigenvectors.
4. Eliminate any components that contribute only a small percentage of the variance in the data set.
5. Establish the factor loading matrix and apply a varimax rotation to it to identify the principal components.
In this study, owing to wide variations in data dimensionality, the data were standardized through z-scale transformation. Standardization avoids misclassification caused by differences in scale, eliminates the effect of different measurement units, and makes the data dimensionless. For the cluster analysis, Ward's method, which uses an analysis of variance (ANOVA) approach to evaluate the distances between clusters, was applied with the squared Euclidean distance as the measure of the difference between the analytical values of two samples.
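The five PCA steps listed above can be sketched in NumPy as follows. The study ran its analysis in SPSS, so this is only an illustrative sketch under stated assumptions: `pca_with_varimax` and `varimax` are hypothetical helper names, step 4 is implemented with an eigenvalue-greater-than-one cut-off (the criterion reported in the results section), the rotation uses the common iterative SVD form of varimax, and the demo data are synthetic.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD step of the standard iterative varimax algorithm
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < previous * (1 + tol):  # criterion stopped improving
            break
        previous = s.sum()
    return loadings @ rotation


def pca_with_varimax(data):
    # Step 1: z-score standardization (dimensionless, equal weights)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Step 2: covariance of standardized data (equals the correlation matrix)
    cov = np.cov(z, rowvar=False)
    # Step 3: eigenvalues and eigenvectors, sorted from largest to smallest
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Step 4: keep components with eigenvalue > 1 (assumed cut-off)
    keep = eigenvalues > 1.0
    explained = eigenvalues[keep].sum() / eigenvalues.sum()
    # Step 5: factor loadings of the retained components, then varimax
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues, explained, varimax(loadings)


# Hypothetical demo: 48 synthetic records of 5 parameters (not the study's data).
rng = np.random.default_rng(42)
demo = rng.normal(size=(48, 5))
demo[:, 1] += demo[:, 0]  # make two parameters correlated
eigenvalues, explained, rotated = pca_with_varimax(demo)
print(eigenvalues.round(3), round(float(explained), 3))
```

Because the data are standardized first, the eigenvalues sum to the number of variables, which is why an eigenvalue above one marks a component that carries more variance than a single original parameter.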

Results and discussion

4.1 Data analysis
The descriptive statistics were developed after the data consistency analysis to verify compliance with the IQS and WHO standards in Table 2. Table 3 and Table 4 show the results for the raw water and treated water parameters, respectively, for each WTP studied.

4.1.1 pH
For raw water, the pH ranged from 7.33 to 8.04, which shows that the water of the Tigris River is neutral to sub-alkaline. As most crops can withstand a pH range of 4.5 to 9.0 (Lund, 1971) [27], irrigating with such water is unlikely to cause problems as far as pH is concerned. The pH of drinking water was within the permissible limits of both the WHO and IQS standards.

4.1.2 Total dissolved solids
Apart from Al-Karkh WTP, TDS concentration increased at all the other WTPs. These substances are introduced to the water from agricultural, industrial, and domestic waste. The results of this study show that the overall average TDS concentration is 568.037 mg/l, which puts the water in class I (no problem) with respect to its impact on crop yield and in class II (increasing problem) with respect to its impact on soil permeability, according to FAO standards [28]. Irrigation with such water should pose no problem for crops; however, the way Iraqi farmers practice irrigation leads to a build-up of salts in the soil: they apply large amounts of water and pay no attention to the water requirements of different crops. As the temperature rises to 50 °C during the summer, evaporation of water from the soil becomes critical, leaving the salt behind (Mutlak, 1980) [29]. For drinking water, the TDS values in the current study met the Iraqi standards for safe drinking water.

4.1.3 Turbidity
Turbidity fluctuated widely over the study period, reaching its maximum value at Al-Wehda WTP (105.125 NTU). The higher values at Al-Wehda and Al-Rasheed, especially in winter, are probably due to erosion from upland areas caused by rainfall and to organic matter from sewage discharged during vegetable oil plant and electrical power plant bypass. Phosphorus from different sources can increase algae growth, resulting in increased turbidity; sources of phosphorus may include wastewater treatment facilities, nutrient runoff from cropland, and other sources. High turbidity can drastically reduce the aesthetic quality of the river, harming recreation and tourism. It can also raise the cost of drinking water treatment, affect irrigation, and damage fish and other aquatic life by reducing food sources, preventing the efficient growth of fish eggs and larvae, and impairing gill function. For drinking water, the results did not exceed the limits permitted by the WHO standard, as the highest turbidity was recorded at Al-Wehda (3.25 NTU). The rise and fall in turbidity levels depend on the content of turbidity-causing material in the river water, the age of the plant, the efficiency of its operation and maintenance, water consumption by citizens in quantities exceeding the plant's production capacity (so that the water does not have sufficient time to settle in the sedimentation basins), and the use of low-quality alum.

4.1.4 Alkalinity
Except at Sharq-Dijlah and Al-Karkh, the alkalinity of raw water was above the standard limits. Higher levels of alkalinity in surface waters can buffer acid rain and other acid waste and prevent changes in pH that are detrimental to aquatic life (Oram, 2020) [36]. For drinking water, the average alkalinity exceeded the permissible limit at Al-Karama water treatment plant, whereas the other averages were within the limit.

Total hardness
The hardness concentration increased slightly as the water passes through Baghdad, ranging from 282.167 to 340.250 mg/l. For drinking water, the results obtained in this study ranged from 284.188 mg/l to 337.542 mg/l and comply with the WHO standards.

Calcium and Magnesium
The overall average Ca+2 and Mg+2 concentrations in the Tigris River were 81.26 and 29.34 mg/l, respectively. The results presented in Table 2 show that there was a slight increase in Mg+2 and Ca+2 concentrations in the river due to city discharge. These ions are beneficial when the water is used for irrigation purposes.

4.1.8 Inorganic nitrogen
The results show that the concentration of NO -, NO -, and NH in raw water was found to be 0.00918,

4.1.9 Chloride and sulfate
The average concentrations of Cl and SO4 in raw water were 68.25 and 197.12 mg/l, respectively. Both Cl and SO4 exhibit variation, which is likely due to variation in the river discharge. In drinking water, the averages were 69.1 and 198.62 mg/l. It is evident from these findings that the concentrations of Cl and SO4 in drinking water are higher than in raw water. This increase is caused by the addition of alum and chlorine to the water. Despite this increase, the concentrations of SO4 and Cl remain within the WHO limits.

4.1.10 Aluminium
The aluminium concentration in raw water ranged from 0.01 to 0.016 mg/l, and in treated water from 0.046 to 0.113 mg/l. The increase in aluminium concentration in drinking water depends on its concentration in raw water and on the use of alum as a coagulant (WHO, 1989).
Table 2. Descriptive statistics of the physicochemical parameters of raw water data for each WTP.

Multivariate analyses of raw and treated water
The dendrograms in figure 2 illustrate the categorization of the water treatment plants based on the similarity of the levels of the physicochemical parameters of raw water and treated water. Ward's method was used, with the squared Euclidean distance as the measure of dissimilarity. For raw water, Al-Karkh differed from the other WTPs and remained isolated, while the other plants formed a larger group. Sharq-Dijlah, Al-Wathba, and Al-Qadisiya paired up as a sub-cluster. The Euclidean distance between these three plants is the smallest, indicating that they share the same characteristic features and the same sources of contamination. These three WTPs joined Al-Dora and Al-Karama to form cluster 1. The raw water of these plants is moderately affected by urban wastes and anthropogenic activities. Al-Rasheed joined Al-Wehda WTP to form cluster 2. The raw water quality at Al-Wehda WTP is affected by the wastes discharged into the river from the vegetable oil plant, and the raw water at Al-Rasheed plant is influenced by the south Baghdad electrical power station and Al-Rasheed electrical power station. Cluster 1 and cluster 2 combined with Al-Karkh to form cluster 3.
The dendrogram in figure 3.b grouped the drinking water data into three clusters. Al-Karkh again remained isolated, while the other plants grouped together because they showed slightly lower performance. Al-Qadisiya and Al-Dora formed a sub-cluster and joined Al-Wathba, Al-Karama, and Sharq-Dijlah to form cluster 1, which includes relatively less polluted water. The second cluster contains Al-Rasheed and Al-Wehda. Both of these plants are old and have seen no attempt at expansion since 2008. The third cluster was formed by joining cluster 1, cluster 2, and Al-Karkh WTP.
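The clustering procedure described above (z-score standardization followed by Ward's method) can be sketched with SciPy. This is an illustration under assumptions rather than the study's SPSS workflow: `cluster_plants` is a hypothetical helper, the input is assumed to be one row of mean parameter values per plant, and the demo values are synthetic. Note that SciPy's Ward linkage operates on plain Euclidean distances, so its merge heights are on a different scale than a dendrogram built from squared distances, although the grouping should be the same.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore


def cluster_plants(plant_means, n_clusters=3):
    """Ward hierarchical clustering of plants (rows) on standardized parameters."""
    z = zscore(plant_means, axis=0, ddof=1)   # remove the effect of units
    merges = linkage(z, method="ward")        # one row per merge step
    labels = fcluster(merges, t=n_clusters, criterion="maxclust")
    return merges, labels


# Hypothetical demo: 8 plants x 3 parameters, synthetic values only.
rng = np.random.default_rng(1)
centers = np.vstack([
    np.zeros((5, 3)),        # five plants with similar water quality
    np.full((2, 3), 10.0),   # two plants with a shared, different profile
    np.full((1, 3), 30.0),   # one plant unlike all the others
])
demo = centers + rng.normal(scale=0.1, size=(8, 3))
merges, labels = cluster_plants(demo)
print(labels)
```

The `merges` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree; each row records which two clusters were joined and at what distance, which is the information a dendrogram displays.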

Principal component analysis
The data were standardized, and the KMO test and Bartlett's sphericity test were calculated. The KMO value was 0.795 and the significance of Bartlett's sphericity test was smaller than 0.001, which indicates that PCA is feasible.
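The two adequacy checks reported above can be reproduced in outline as follows. This is a sketch using the textbook formulas, not the SPSS output of the study: the function names `kmo`, `bartlett_sphericity`, and `kaiser_label` are hypothetical, and the input is assumed to be a records-by-parameters array without missing values.

```python
import numpy as np
from scipy.stats import chi2


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations from the inverse correlation matrix
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    q2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + q2)


def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) / 2
    return statistic, dof, chi2.sf(statistic, dof)


def kaiser_label(value):
    """Verbal rating of a KMO value following Kaiser's guidelines."""
    bands = [(0.9, "marvelous"), (0.8, "meritorious"), (0.7, "middling"),
             (0.6, "mediocre"), (0.5, "miserable")]
    for threshold, label in bands:
        if value >= threshold:
            return label
    return "unacceptable"


print(kaiser_label(0.795))  # prints "middling"
```

Under Kaiser's guidelines, the reported KMO of 0.795 therefore sits in the 0.7 band, comfortably above the 0.5 minimum for factor analysis.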
It can be seen from the scree plot in figure 3 that only six principal components have eigenvalues greater than unity and explain 73.389% of the total variance in the data set.

Summary and Conclusion
This study was undertaken to evaluate the performance of eight water treatment plants in Baghdad city. The outcomes showed that multivariate statistical techniques helped identify, from a large data set, the significant inputs that affect drinking water quality. The PCA in Table 6 highlighted 11 out of 19 parameters as the main contamination sources in drinking water. These parameters, namely turbidity, Cl, pH, SO4, TDS, E.C, Fl, Fe, NH3, SiO2, and PO4, are responsible for the variation in drinking water quality and are mainly introduced to the water by the weathering of minerals, agricultural waste, naturally occurring organic matter, and domestic waste from human activities. The hierarchical cluster analysis grouped the WTPs into three clusters for raw water and three clusters for treated water. The dendrogram illustrates the merging process, showing not only which data were joined at each step but also the order in which they merged as cluster distance increases. For raw water, cluster 1 contains Sharq-Dijlah, Al-Wathba, Al-Qadisiya, Al-Karama, and Al-Dora; cluster 2 contains Al-Rasheed and Al-Wehda; and cluster 3 joins Al-Karkh to cluster 1 and cluster 2. For WTP performance (treated water), cluster 1 contains Al-Qadisiya, Al-Dora, Al-Wathba, Sharq-Dijlah, and Al-Karama; cluster 2 contains Al-Rasheed and Al-Wehda; and cluster 3 contains cluster 1, cluster 2, and Al-Karkh WTP.