Cluster and Factor Analysis of Groundwater in Mafraq Area, Jordan

Cluster and factor analyses were performed on 55 representative groundwater well samples gathered from the open files of the Water Authority and sampled during 2011. The samples were analyzed for a total of 12 water quality descriptors (variables): pH, EC, TC, TDS, Ca, Mg, Na, K, Cl, HCO3, SO4 and NO3. The study suggests a two-factor model that explains approximately 79.8% of the total groundwater quality variation. Factor 1 includes EC, TDS, Na, K, Ca, Mg, Cl, HCO3, SO4 and NO3. Factor 2 includes TC and pH. Q-mode cluster analysis demonstrated that there are two main hydrochemical groups. The first group shows the similarity between NO3, K, Mg and Na, which probably represents the effects of weathering of the abundant feldspar and mica, in addition to agricultural fertilizers. The second cluster is dominated by Cl and HCO3 but also contains low concentrations of SO4.


Introduction
In arid and semi-arid regions, groundwater is typically a significant part of the total available water resources. The usefulness of water for a particular purpose is determined by the water quality (Ali and Masoud 2009). The quality of water is almost as important as its quantity in any water supply planning. Water quality is influenced by natural and anthropogenic effects, including local climate, geology and irrigation practices. The chemical character of any groundwater determines its quality and utilization. The quality is a function of the physical, chemical and biological parameters, and can be subjective, since it depends on the particular intended use (Belkhiri et al., 2010).
In recent years, with the increasing number of chemical and physical variables measured in groundwater, a wide range of statistical methods has been applied for proper analysis and interpretation of the data (Ashley & Lloyd, 1978; Usunoff & Guzman, 1989; Suk & Lee, 1999 and Sanchez-Martos et al., 2001). Multivariate statistical analysis comprises a number of statistical methods or a set of algorithms that may be applied to several fields of empirical investigation. Cluster analysis and factor analysis have been used with remarkable success as tools in groundwater quality studies. These methods also give a better understanding of the physical and chemical properties of the groundwater system in space and time (Helsel & Hirsch, 1992; Davis, 2002; Liu et al., 2003; Love et al., 2004 and Hussain et al., 2008).
The purpose of this study is to evaluate the present status of water quality in an area in northeastern Jordan using multivariate statistical analysis. This analysis will help in planning the present water resources in the area and will provide a baseline for future water quality evaluation studies.

Study Area, Water Sampling and Statistical Analysis

The Study Area
The study area lies between latitudes 32º 12´ and 32º 30´ North and longitudes 36º 12´ and 36º 56´ East in the northeastern part of Jordan (Fig. 1). It covers the eastern part of the Mafraq area and a part of what is called Harrat Al Sham. The study area is covered by basalt eruptions which extend from Syria in the north into Saudi Arabia in the south. This area lies in the semi-arid region of Jordan and is susceptible to various threats, such as growing urban areas and developing agricultural areas. Annual precipitation in the study area ranges between 100 and 150 mm, and thus the area is considered to be semi-desert. Summer temperatures can rise to 45ºC. The dry climate, the atmospheric dust and the low intensity of precipitation affect the quality of precipitation water, generally causing increased salt content (Salameh 1996).
The Mafraq area straddles two groundwater basins in Jordan: the Yarmouk basin and the Amman-Zarqa basin. In general, the shallow groundwater in the Yarmouk basin is found in the B2/A7 aquifer. The overlying geologic formations consist of marly layers and form aquicludes dipping with increasing angles towards the Yarmouk and Jordan rivers. Recharge to the aquifer takes place in the highlands of Irbid and Ajlun and further to the northeast beyond Jordan's territories (Salameh 1996). The Amman-Zarqa basin has two main aquifers, namely the deep A4 and the shallow complex, which consists of the B2/A7, or A7 alone, or B2/A7 together with wadi fills and basalts. This basin can be divided into two parts: the area northeast of Wadi Zarqa and the western part extending to the west of Wadi Zarqa. Overpumping is already taking place along the Wadi Zarqa part of the basin, such as in the Khalidya and Dhuleil subareas (Water Authority 1989).

Stratigraphy
The rocks cropping out in the study area range in age from Cretaceous (Ajlun and Belqa Groups) to Recent. The succession and a brief description of the lithostratigraphy are summarized in Table 1.

Aquifers
In general, the aquifers in the study area are divided into three main complexes: the deep sandstone aquifers, the Amman-Wadi Sir aquifer and the upper aquifers. According to Salameh (1996), the deep sandstone aquifer complex forms one unit in southern Jordan. To the north, thick limestones and marls gradually separate it into two aquifer systems which remain hydraulically interconnected. The Paleozoic Disi group aquifer is the oldest and crops out only in the southern part of Jordan and along the Wadi Araba Dead Sea Rift Valley. The Kurnub and Zerqa group (Jurassic-Lower Cretaceous) is also a sandstone aquifer, underlying the northern area of Jordan and overlying the Disi group aquifer. It crops out along the Zerqa river basin.

Amman Wadi Es-Sir Aquifer (A7)
This aquifer system consists of two formations, the Wadi Es-Sir (A7) and Amman (B2) formations, separated by the Um-Al Ghudran (B1), which is missing in some places. The Amman and Wadi Es-Sir formations together form one of the most important and extensive aquifers, cropping out in the high rainfall areas where most of the recharge occurs. This aquifer is located in the western parts of the study area, and some of the studied wells derive water from this complex. According to the Water Authority (1989), the recharge of groundwater in the B2/A7 to the Yarmouk basin comes from three sources: the Ajloun mountains, where the Ajloun dome is present; underflow from the northeastern desert basin toward the study area at the Um Essurab area; and underflow moving towards the Yarmouk river from Syrian territories.

Upper Aquifers
The upper aquifers consist mainly of two systems. The first is the Basalt aquifer, which extends from the Syrian Jabel Druz area southward towards the Azraq and Wadi Dhuliel region. This is the second main aquifer, located in the eastern parts of the study area. The second consists of sedimentary rocks and alluvial deposits of Tertiary and Quaternary ages. These rocks form local aquifers overlain partly by

Sample Collection
The hydrochemical data used in this study comprise information abstracted from the open files of the Water Authority for 55 production wells during 2011. The parameters analysed were pH, total dissolved solids (TDS), temperature (T), electrical conductivity (EC), calcium (Ca), magnesium (Mg), sodium (Na), potassium (K), chloride (Cl), bicarbonate (HCO3), sulfate (SO4) and nitrate (NO3). The water quality parameter values are in mg/L, except pH (unitless) and EC (µS/cm).

Statistical Analysis

Correlation Coefficient (r)
Correlation analysis measures the closeness of the relationship between chosen independent and dependent variables. This analysis attempts to establish the nature of the relationship between the variables. In this study, the relationships among the water quality parameters in the analyzed water data were determined by calculating the correlation coefficient (r).
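The calculation of r can be illustrated with a minimal sketch. It assumes the Pearson product-moment coefficient, which the text does not name explicitly, and it uses invented concentrations rather than the study's data. The function names `pearson_r` and `correlation_matrix` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns: dict of variable name -> list of values. Returns r for every pair."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for illustration only (mg/L, pH unitless); not the study's data.
example = {
    "Na": [40.0, 85.0, 120.0, 210.0, 300.0],
    "Cl": [60.0, 110.0, 170.0, 320.0, 450.0],
    "pH": [8.1, 7.9, 7.8, 7.5, 7.3],
}
r = correlation_matrix(example)
for name in example:
    print(name, [round(r[name][other], 3) for other in example])
```

Values of r close to +1 or -1 indicate a strong linear relationship, and values close to 0 indicate little or none.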

Cluster Analysis
Cluster analysis is the name given to an assortment of techniques designed to perform classification by assigning observations to groups so that each group is more or less homogeneous and distinct from other groups (Hussain et al., 2008). There are two types of cluster analysis: R-mode and Q-mode. R-mode was performed on the different water quality variables. Q-mode cluster analysis was performed on the water chemistry data to group the samples in terms of water quality (Davis, 2002 and Tabachnick & Fidell, 2006). The hydrochemical results of all samples were statistically analyzed using SPSS 15 software (SPSS 15, 2010).
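The difference between the two modes can be sketched as follows. This is not the SPSS procedure used in the study: the linkage rule (average linkage), the distance measure (Euclidean distance on standardized values) and the data are all assumptions made for illustration, and the function names are hypothetical. Q-mode groups the rows (samples), while R-mode groups the columns (variables) of the same table.

```python
import math

def zscore_columns(rows):
    """Standardize each column to zero mean and unit sample standard deviation."""
    out_cols = []
    for col in zip(*rows):
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(items, labels, n_clusters):
    """Average-linkage agglomerative clustering; returns a list of label groups."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(euclidean(items[p], items[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(labels[k] for k in c) for c in clusters]

# Hypothetical samples (rows) and variables (columns); not the study's data.
variable_labels = ["Na", "Cl", "pH"]
sample_labels = ["W1", "W2", "W3", "W4", "W5", "W6"]
rows = [
    [40.0, 60.0, 8.1],
    [45.0, 70.0, 8.0],
    [50.0, 65.0, 8.2],
    [280.0, 430.0, 7.3],
    [300.0, 450.0, 7.4],
    [310.0, 470.0, 7.2],
]
z_rows = zscore_columns(rows)

# Q-mode: each item is a sample described by its standardized variables.
q_groups = agglomerate(z_rows, sample_labels, 2)
# R-mode: each item is a variable described by its values across the samples.
z_vars = [list(col) for col in zip(*z_rows)]
r_groups = agglomerate(z_vars, variable_labels, 2)
print(q_groups)
print(r_groups)
```

A dendrogram records the full sequence of merges; the sketch stops at a fixed number of clusters to keep the output small.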

Factor Analysis
Factor analysis is a multivariate statistical technique that can be used to examine the underlying patterns or relationships for a large number of variables and to summarize the information in a smaller set of factors or components for prediction purposes (Davis, 2002). Principal component analysis (PCA) is the most frequently employed factor analytic approach. PCA is defined as an orthogonal linear transformation that transforms the variables to a new coordinate system such that the greatest variance by any projection of the variables comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. PCA is theoretically the optimum transform for a given data set in least squares terms (Usunoff & Guzman, 1989; Brown 1998; Ceron et al., 1999 and Tabachnick & Fidell, 2006). To determine the number of components to extract, the data obtained were used as variable inputs. Prior to the analysis, the data were standardized to produce a normal distribution of all variables (Jayakumar & Siraz, 1997 and Davis, 2002). The weights of the original variables in each factor are called loadings; each factor is associated with a particular variable. Communality is a measure of how well the variance of a variable is described by a particular set of factors (Grande et al., 2003).
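How loadings, explained variance and communality relate can be shown with a minimal sketch. It assumes PCA on the correlation matrix of standardized variables and extracts only the first component by power iteration. It omits rotation, it is not the SPSS routine used in the study, and the data and function names are hypothetical.

```python
import math

def correlation_of_columns(rows):
    """Correlation matrix of the columns of `rows` (one row per sample)."""
    n = len(rows)
    z = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])  # standardized column
    p = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]

def dominant_component(corr, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

# Hypothetical data for illustration only; columns are Na, Cl and pH.
names = ["Na", "Cl", "pH"]
rows = [
    [40.0, 60.0, 8.1],
    [85.0, 110.0, 7.9],
    [120.0, 170.0, 7.8],
    [210.0, 320.0, 7.5],
    [300.0, 450.0, 7.3],
]
corr = correlation_of_columns(rows)
eigenvalue, vector = dominant_component(corr)

# Loading = eigenvector element scaled by the square root of the eigenvalue.
loadings = [math.sqrt(eigenvalue) * x for x in vector]
# With standardized variables the total variance equals the number of variables.
explained = eigenvalue / len(names)
# Communality for a one-factor model: squared loading of each variable.
communalities = [l * l for l in loadings]

for name, l, c in zip(names, loadings, communalities):
    print(f"{name}: loading={l:.3f} communality={c:.3f}")
print(f"variance explained by component 1: {100 * explained:.1f}%")
```

With more than one retained factor, the communality of a variable is the sum of its squared loadings over all retained factors.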

Results and Discussion
The parameters analyzed consisted of pH, EC, TC, Ca, Mg, Na, K, Cl, HCO3, CO3, SO4 and NO3. The chemical analysis data were subjected to descriptive statistical tests, and the results are presented in Table 2.
The groundwater samples of the study area have pH values ranging from 7.24 to 8.18, which indicates that the groundwater is slightly alkaline. The electrical conductivity (EC) values ranged from 370 to 4890 µS/cm. The order of abundance of the major cations is Na > Ca > Mg > K, and only 7% of the samples exceed the desirable limit of Ca for drinking water (75 mg/L). For Mg, 15% of the samples exceed the limit of 50 mg/L. The order of abundance of the major anions is Cl > HCO3 > SO4 > NO3. Almost 22% of the samples exceed the desirable limit of Cl (200 mg/L), but the sulfate concentrations are all below health guidelines (WHO, 1993), whereas 9% of the samples exceed the limit for NO3 (50 mg/L).
Fig. 2: Dendrogram for 12 variables from cluster analysis in Q-mode
The correlation coefficient is commonly used to measure and establish the relationship between two variables. It is a simple statistical tool that shows the degree of dependency of one variable on another. The correlation matrix of ten variables is presented in Table 3. The EC values exhibit high positive correlation with TDS, Na, K, Ca, Mg, Cl, HCO3, SO4 and NO3. Calcium and magnesium show a strong positive correlation (0.92), indicating a common source. Cl and Na also show a very good positive correlation (0.94) with each other.
Electrical conductivity (EC) was positively correlated with chloride, sulphate, sodium, potassium, calcium and magnesium, which constitute the major anions and cations present in groundwater. TDS maintained a positive relationship with chloride, sulphate, calcium, magnesium, sodium and potassium. The major exchangeable ions Na and Ca correlate positively (0.86), and the strong correlation between sodium and magnesium (0.82) shows that cation exchange dependency is evident. Chloride shows positive correlation with most anions and cations. The good correlation between calcium and chloride (0.95) is indicative of the total hardness of the water.
Table 4 presents the eigenvalues, the percentage of variance, the cumulative eigenvalue and the cumulative percentage of variance associated with each factor. It reveals that the first two factors explain approximately 79.8% of the total variance. Table 5 shows the loadings of the varimax rotated factor matrix for the two-factor model. Evidently, the first factor is generally more correlated with the variables than the second factor. This is to be expected because these factors are extracted successively, each one accounting for as much of the remaining variance as possible.
The terms 'strong', 'moderate' and 'weak', as applied to factor loadings, refer to absolute loading values of > 0.75, 0.75-0.5 and 0.5-0.3, respectively. Factor 1, which explains 69.5% of the total variance (Table 4), has strong positive loadings on EC, TDS, Na⁺, Ca²⁺, Mg²⁺, Cl⁻, HCO3⁻, SO4²⁻ and NO3⁻, and moderate loadings on K⁺. EC and TDS both have loadings of 0.995 and control the overall mineralization. Ca has a loading of 0.960; the high loading may be attributed to its abundance in the earth's crust or to its release as a byproduct of the weathering of feldspars, amphiboles and pyroxenes. Nitrate, with a loading of 0.884, could be associated with anthropogenic activities. Sodium has a loading of 0.942; it could be derived from the weathering of plagioclase feldspar, from atmospheric dust washed out by rain water, and also through cation exchange processes. Magnesium has a loading of 0.944 and could be derived from the weathering of mafic minerals. Chloride has a loading of 0.982; it is derived from anthropogenic sources, or it could come from water trapped during magmatic activity (juvenile water).
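The classification of loadings can be written as a small helper. The thresholds are the ones stated in the text and the loadings are the values quoted in this section. The function name `classify_loading` and the label "negligible" for absolute values below 0.3 are assumptions, since the text does not name that range.

```python
def classify_loading(loading):
    """Label a factor loading by its absolute value, using the thresholds in the text."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.3:
        return "weak"
    return "negligible"  # assumed label; the text gives no name for values below 0.3

# Loadings quoted in this section (Factor 1 unless noted otherwise).
reported = {
    "EC": 0.995,
    "TDS": 0.995,
    "Cl": 0.982,
    "Ca": 0.960,
    "Mg": 0.944,
    "Na": 0.942,
    "NO3": 0.884,
    "pH (Factor 2)": 0.736,
    "TC (Factor 2)": -0.789,
}
for variable, loading in reported.items():
    print(f"{variable}: {loading:+.3f} -> {classify_loading(loading)}")
```

Because the classification uses the absolute value, the negative loading of TC on Factor 2 falls in the same class as a positive loading of equal size.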
Among the cations, calcium has the highest loading, followed by magnesium, sodium and potassium. Among the anions, chloride has the highest loading, followed by nitrate, sulfate and bicarbonate, with loadings of 0.982, 0.884, 0.805 and 0.788, respectively. The high loading of HCO3 is related to natural mineralization. Considering the order of loadings of the ions, the groundwater in the area can be classified as earth alkaline water that is characterized by a high concentration of chloride, and the ionic ratio of the water constituents is as follows: (Ca²⁺ + Mg²⁺) > Na⁺, Cl⁻ > HCO3⁻.
This factor appeared to be related to contamination from agricultural inputs following the use of chemical fertilizers. It represents a diffuse form of contamination. The positive correlation between the measured concentrations of nitrate and potassium (r = 0.580) may be an indication of a common source related to the application of NPK fertilizer. SO4 and HCO3 can result from the decomposition of organic matter; SO4 may also come from gypsum and anhydrite, which are widespread in the area. These ions may also be derived from the decomposition of waste materials. This underlying relationship is also indicated by the correlation of the measured concentrations of HCO3 and SO4 (r = 0.55). The negative loading of pH is apparently associated with SO4 and HCO3 due to organic decomposition, and the negative loadings of SO4 and HCO3 could be explained by the chemical reduction of SO4 to H2S while HCO3 increases. Factor 3 can therefore be ascribed to sulphate reduction. According to Wu and Weng (1988), SO4 in groundwater can be reduced to H2S during the degradation of organic matter. Thus, the more strongly reducing the environment, the lower the concentration of SO4 and the higher the concentration of HCO3 (Wen and Qin, 2007).
Factor 2 accounts for about 10.38% of the total variance. This factor shows loadings on pH (0.736) and TC (-0.789).
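As a quick check, the variance shares quoted for the two factors can be accumulated and compared with the cumulative figure of approximately 79.8% reported for the two-factor model. The sketch uses only the percentages given in the text; the variable names are hypothetical.

```python
# Percentage of total variance per factor, as quoted in the text.
factor_variance_pct = {"Factor 1": 69.5, "Factor 2": 10.38}

cumulative = 0.0
cumulative_pct = {}
for name, pct in factor_variance_pct.items():
    cumulative += pct
    cumulative_pct[name] = round(cumulative, 2)

# Share of the total variance not captured by the two-factor model.
unexplained = round(100.0 - cumulative, 2)

print(cumulative_pct)
print("unexplained:", unexplained)
```

The two shares sum to 79.88%, which agrees with the reported cumulative value, and about a fifth of the total variance is left unexplained by the two factors.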
The water chemistry data were investigated by cluster analysis. Figure 2 shows the Q-mode cluster analysis dendrogram of the 12 descriptors. The variables cluster into two major groups. The first group shows the similarity between NO3, K, Mg and Na, which probably represents the effects of weathering of the abundant feldspar and mica, in addition to agricultural fertilizers. The second cluster is dominated by Cl and HCO3 but also contains low concentrations of SO4.
The high correlation coefficient of EC with Cl content (Table 3) indicates that chloride contributes relatively more to the salinity than other ions. It was noticed that, for the same specific EC values, the Na concentration might vary. The high Na content in some samples is derived from the cation exchange process between Ca-HCO3 water and sodium-rich zeolites within the basaltic aquifer. Sodium is usually released to the water, and calcium ions are fixed by the zeolites. The high and rising NO3 content is due to the effect of water infiltrating back from irrigation. Water infiltrating through the soil will cause Ca and Mg ions to replace Na ions in low salinity water. Na is the major cation in the groundwater that is used for irrigation.

Conclusions
In this study, different multivariate statistical techniques were used to evaluate variations in the groundwater quality of the Mafraq area, Jordan. Interpretation of the analytical data showed that EC values exhibit high positive correlation with TDS, K, Ca, Mg, Na, Cl, SO4, NO3 and HCO3. In the principal component analysis, the results showed that, in a two-factor model, Factor 1 accounts for about 69.5% of the total variance and has high loadings on EC, TDS, Na⁺, Ca²⁺, Mg²⁺, Cl⁻, HCO3⁻, SO4²⁻ and NO3⁻, and moderate loadings on K⁺, whereas Factor 2 accounts for about 10.38% of the total variance and has loadings on pH and TC.
The variables cluster into two major groups. The first group shows the similarity between NO3, K, Mg and Na, which probably represents the effects of weathering of the abundant feldspar and mica, in addition to sources from agricultural fertilizers. The second cluster is dominated by Cl and HCO3 but also contains low concentrations of SO4.
Alternatively, calcium sulfate type water may originate by solution of anhydrite or gypsum. The presence of three modern playas in the study area suggests the possibility that buried evaporites may explain part of the mineralization found in the water of the upper basin of the Mafraq area. The nearly perfect correlation between Na⁺ and Cl⁻ implies that the groundwater in the basin incorporates those components either by solution of halite or by mixing with concentrated sodium chloride type water, such as occurs at depth in many closed basins. The modestly high concentrations of magnesium observed in some of the water samples may result from chemical weathering of minerals and later concentration of magnesium at the expense of calcium in places where the solubility of calcite is locally exceeded and some calcium carbonate precipitates in the aquifers. The NO3⁻ in the water originates from agricultural activities using fertilizers.

Table 2: Basic statistical parameters for the distribution of parameters
Fig. 1: Location map of the study area.