Spatial interpolation of water quality index based on Ordinary kriging and Universal kriging

Abstract Water is a very vital needed substance in order to maintain the important activities of humans. However, contaminated water can transmit diseases such as typhoid, dysentery, diarrhea, cholera and polio. Pakistan is a highly affected country due to the scarcity of safe and healthy water sources. The current study mainly focuses to determine water quality in the selected area. For this purpose, an integrated surface water quality index (SWQI) based on 18 hydrochemical parameters is employed. SWQI substantially correlates with pH, EC, SO4, HCO3 and heavy metals. Therefore, these parameters are utilized in the calculation of SWQI. The results of SWQI show that about 69.23% of samples are ‘very good quality’. Moreover, 30.77% of the total samples are of poor quality and are identified as unsuitable for drinking. Further, Ordinary kriging (OK) and Universal kriging (UK) are used to predict the SWQI at unobserved locations and map the SWQI in the study area to explore the spatial distribution. The prediction performance of the two kriging methods is assessed through cross-validation in terms of root mean square prediction error (RMSPE). It is concluded that the prediction performance of the UK is better than OK in terms of RMSPE.


Introduction
Freshwater resources have come under the spotlight due to quantity and quality reasons.However, surface water resources are extremely sensitive to changing natural processes such as rainfall, temperature, forest cover and climate change (Yang, Linyu, and Shun 2009;Usman, et al. 2014).The changes in natural environmental processes are greatly influenced by anthropogenic activities, including growth in urban areas and agricultural pursuits (Khatri and Tyagi 2015;Bouguerne et al. 2017;Verma et al. 2019).Anthropogenic activities can potentially cause the degradation of surface water resources.The pollutants from urban areas, industries, and agricultural lands make surface water resources vulnerable (Satterthwaite, McGranahan, and Tacoli 2010;Huang et al. 2018;Salerno, Gaetano, and Gianni 2018).In recent years, potentially toxic trace elements (PTEs) have affected aquatic ecosystems around the world (Nazeer, Hashmi, and Malik 2014;Paudyal, et al. 2016).Excessive concentrations of the PTEs can be hazardous to the ecosystem's health (Iqbal and Shah 2013).These elements can enter into the water bodies (e.g.rivers and lakes) through runoffs from agriculture, atmospheric deposition, bedrock, and water drainage (Paudyal, et al. 2016;Iqbal and Shah 2013) and thus risk human health and the aquatic ecosystem (Wu et al. 2009).
Therefore, it is crucial to quantify and assess the trace metals in the marine environment to investigate risks to humans and aquatic ecosystems.Lithogenesis, erosion, weathering and other geological phenomena are the natural sources of pollution (Stafilov et al. 2010).Traffic emissions, and agricultural, mining, and industrial activities are among the anthropogenic sources (Anju and Banerjee 2012;Gu et al. 2014).Cadmium (Cd) is the most-probable carcinogen causing lung cancer ( _ Zukowska and Biziuk 2008).Lead (Pb) adversely affects the child's brain and central nervous system (CNS) (Kaufmann, Staes, and Matte 2003).It causes abdominal pain, restlessness, irritability, headache, and sleeplessness.It further, causes behavioral abnormalities and learning issues for the under five years aged children.Zinc (Zn) may result in infertility, and kidney and CNS disorders.Copper (Cu) induces depression and lung cancer (Sani et al. 2017).Cr is known to cause cancer and tumor of respiratory organs (Langård, Andersen, and Gylseth 1980).Mo triggers molybdenosis in livestock (Alloway and Ayres 1997).Continuously expanding industrialization and mining exercises have resulted in the excessive enrichment of toxic heavy metals in all kinds of soils, surface water and living organisms (Iqbal, Saleem, and Shah 2016).
Water is serving humans' activities in numerous way whether directly or indirectly.It is used for agriculture, domestic, recreational, and industrial purposes.However, water is deteriorating globally due to heavy metal pollution.Geometric increase in human population coupled with rapid urbanization, industrialization, technological advancement, and agricultural development has deteriorated the pristine nature of inland water bodies, viz.lakes, wetlands, rivers, and so on (Ibrahim and Nafi'u 2017).Water contamination is a common worldwide problem in mountain recreation areas (Kuniyal, Jain, and Shannigrahi 1998;Pickering and Buckley 2003;Stephenson 1993).The pristine water is the primary attraction of Swat and Dir, however uncontrolled tourism has caused detrimental changes to the water.People reside near two river basins, Swat and Panjkora, use rivers for drinking water and household purposes, and eat fish from the rivers.
Various methods such as water quality index and multivariate statistical tools are used to provide a better understanding of water quality (Wali et al. 2022;Khan et al. 2022).The geostatistical methods are useful for analyzing spatial variability structure, interpolating between point observations, and creating a map of interpolated values with an associated error map (Zhou, et al. 2011;Arslan 2012).Kriging is one of the geostatistical interpolation approaches consisting of several methods, including Indicator kriging, Simple kriging, Ordinary kriging and Co-kriging, commonly applied in estimating the spatial distribution of characteristics of interest (Lee et al. 2007;Babiker, Mohamed, and Hiyama 2007;Dindaro glu 2014;Gyamfi et al. 2016).Ahmadi and Sedghamiz (2007) analyzed groundwater level spatial and temporal variations using Ordinary kriging and Co-kriging methods.Delbari and Pour Miremadisar (2010) estimated groundwater's salinity and depth using Ordinary kriging, Co-kriging, and the inverse fourth power of distance methods.Also, Hooshm, et al. (2011) applied kriging and Co-kriging methods to evaluate the groundwater's chloride content and sodium adsorption ratio.Hence, several methods and multivariate statistical tools have been used to provide a better understaning of water quality.To extend researchers work to the various Glacial Alpine Lakes and Glacier fed Rivers such as River Swat and River Panjkora, Khyber Pakhtunkhwa, Pakistan.The current study mainly focuses on water quality through an integrated surface water quality index (SWQI) based on 18 hydrochemical parameters.SWQI significantly correlates with pH, EC, SO 4 , HCO 3 and heavy metals.Therefore, these parameters are employed in the calculation of SWQI.Ordinary kriging (OK) and Universal kriging (UK) are utilized to predict the SWQI at unobserved locations and map the SWQI to explore the spatial distribution.

Study area description
The current research area covers a wide portion of Pakistan's Khyber-Pakhtunkhwa province and includes the three districts of Swat, Lower Dir, and Upper Dir.The northern mountain ranges have a temperate zone where Swat is located (Figure 1).Its altitude ranges from 500 to 6500 m above sea level (Qasim, et al. 2011).The Swat River originates in Kalam at the confluence of the Ushu and Utror rivers, then runs down the valley for 160 km to Chakdara.The river runs over 250 km from Kalam to a location not far from Charsada.Numerous big and small tributaries join the river during its course.River Panjkora is a 220 km long river that runs through the Dir Upper and Dir Lower districts and join River Swat at Bosaq (where the rivers Dir Lower, Malakand, and Bajaur Agency converge).From there, River Swat flows onwards.River Panjkora ultimately merges with River Kabul at District Charsadda.The current research work thoroughly for the water quality along the various Glacial Alpine Lakes, and Glacier fed Rivers such as River Swat and River Panjkora, Khyber Pakhtunkhwa, Pakistan

Glacial alpine lakes
Ageta et al. claim that there are three distinct categories of glacial lakes (Ageta, et al. 2000): (1) Unconnected glacial lakes are lakes that are not directly connected to glaciers but may have a glacier within their basin (2) supraglacial lakes (melting ponds), which form on the glacier's surface downstream; or (3) proglacial lake, which are moraine-dammed lakes that are in contact with the glacier front.Some of these lakes have substantial water storage capacities and are vulnerable to GLOFs.(Floods from glacial lake outbursts) (Salerno, et al. 2012).The geomorphological, climatic, and hydrological characteristics typical of the latitude and altitude at which their hydrographic basins are located; are strongly linked to the lakes' large chemical and biological variety (Lami, et al. 1998;Kamenik et al. 2000;Sommaruga and Psenner 2001;Laurion, Lami, and Sommaruga 2002).The ecosystems are extremely delicate and vulnerable (Karlsson, Jonsson, and Jansson 2005;Brown, Hannah, and Milner 2007), where climate change greatly impacts their functioning and structure (Stenseth et al. 2002;Wrona et al. 2006;Battarbee et al. 2005;Baudo, Tartari, and Vuillermoz 2007).The glacial lakes are a good indicator for assessing the effects of climate change at high elevations (Richardson and Reynolds 2000;Benn, Wiseman, and Hands 2001).

Hydrochemical parameters analyzed
Due to the cumbersome sampling process due to high altitude hike, sampling was conducted on different days, 23, 24, 27, 28, and 29th June 2021.A total of 26 samples were collected from lakes and rivers.All the samples were taken in triplicates, kept on ice for a short period, and then subjected to physiochemical parameters analysis within six hours of being transferred to the PCRWR laboratory in Peshawar.Samples were sent to the centralized resource laboratory University of Peshawar to analyze water for heavy metals and stored at 4 C.The geographic location of each lake and river was determined using GPS.With the aid of a geologist, the mineralogy and lithology of the neighboring mountains and lakes were determined.All the samples were collected in various bottles for different analyses, for heavy metals 250 mL pre-cleaned polythene bottles that had been being soaked and washed with 10% HNO 3 were used, and for chemical oxygen demands (COD), dissolved oxygen (DO), turbidity, hardness, 1 L of water was collected.The COD, nitrate, sulphate, and DO levels were assessed using standard Methods.All samples were filtered using a 0.47 m WatmanV R glass microfiber filter GF/F.One part of each sample was acidified with 5% HNO3 (3-5 drops) to analyze heavy metals quantitatively.While the remaining portion of the sample was not acidified, it was nonetheless analyzed for chloride, sulfate, nitrate, and other elements.Physical parameters such as pH, electrical conductivity (EC), and total dissolved solids (TDS) were measured in situ during sample collection using the portable handheld meter and were caliberated before use (Model 8361).The USAbased Perkin Elmer AAS 700 atomic absorption spectrophotometer was used to measure the quantities of heavy metals, duplicate samples, certified reference materials (CRM 700), batches of water samples, and reagent blanks were used to evaluate quality assurance and control.Mohr's method was used for calculations of Cl À concentrations and bicarbnate using titrimetric method (Rashid et al. 2019).Nitrate and Sulphate were measured using a UV visible spectrophotometer (HACH DR 5000) utilizing a standard turbid metric method, at wavelengths of 420 and 410 nm, respectively (Apha 2007).A study area map and a geological map were designed to interpret data sets via ARC.GIS 10.3.The water quality at sampled locations was assessed by using WQI.

Calculation of water quality index
Horton was the first to propose the water quality index (WQI), which is the most appropriate way to assess water quality (El Baba et al. 2020;Boudibi, Sakaa, and Zapata-Sierra 2019;Peterside, Hart, and Nwankwoala 2022).Many researchers used this index to analyze water quality (Sani et al. 2017;Wali et al. 2022;Hooshm, et al. 2011;Qasim, et al. 2011;Raza et al. 2021;Wali, Alias, and Bin Harun 2020a;Wali, Alias, and Harun 2021;Wali, Alias, and Bin Harun 2020b).The first step in this procedure is to assign weights to each physicochemical parameter.Then, the assignment of weights to different parameters is determined according to their relative importance in potable water.WQI can be calculated by using the following formula where in Equation (1) SI i is the sub-index calculated by multiplying a parameter's quality rating to its corresponding relative weights, that is, In Equation (2) x i is the relative weight of ith parameter, and q i is the quality rating of ith parameter of water (see, Equation (3)).
C i is the concentration of each parameter in the water sample; S i is the standard value of the ith parameter.

Geostatistical techniques
Ordinary Kriging and Universal kriging techniques were used to obtain the prediction maps of the water quality index (WQI).Kriging is a geostatistical approach that estimates a variable's value over a continuous spatial field using a limited number of sampled data points.In kriging, the Variogram model has important for controlling kriging weights.In sample data, the gamma-values or semivariances for each pair of points are plotted against the distance between them.Mathematically variogram is defined as in Equation ( 4): where c(h) is the semi-variance; N(h) the number of pairs separated by distance or lag h; Z(s j ) the measured sample at a point s j ; and Z(s j þ h) the measured sample at point (s j þ h).The experimental variogram is the plot of observed values used to explore the spatial structure of the data.In contrast, the model that best fits the data is known as a theoretical or model variogram.In this study, Circular and Powered Exponential variograms were fitted to the experimental variogram.

Ordinary kriging
Ordinary kriging is the most frequently used method for predicting the study variable at unsampled locations.In Ordinary kriging, it is assumed that the study variable Z at the location s j is a stationary random function, which means that the mean of Z(s j ) is constant but unknown.However, it is a straightforward and extensively used approach for predicting the study variable, as it gives both prediction values and prediction errors.
In model-based Ordinary kriging, the distribution of response variable Z is assumed to be Gaussian as described in Equation ( 5) m is the mean, and P Z is the covariance matrix which can be written as (see Equation 6) where R is a correlation matrix, and it depends on vector-valued parameter a, that is, a ¼ (s 2 , d 2 , and u).Most of the kriging methods require normally distributed data.
If the data is not normally distributed, then in order to normalize the response variable, we use the transformation technique (i.e.Box-Cox Transformation).The equation for Ordinary kriging can be written as; In Equation ( 7) where b j (j ¼ 1, 2, … , n) are the weights, and the sum of weights should be equal to one, (i.e.Equation 8) The weights are derived using the Lagrange multipliers method 2.5.2.Universal kriging Ordinary kriging assumes a constant mean for a real-valued random function Z(s i ), but the mean value varies across the entire region, and the variable is said to be nonstationary.The non-stationary regionalized variable will have two components; the mean value of a regionalized variable and the residual.Universal kriging (UK) is one of the most frequently used methods for spatial mapping data when there is a trend in the data and the dependent variable does not fulfill the criterion of weak stationary.Weak stationery can be described as the mean of the response variable is constant over the region, and the spatial dependence of any two data points in space is a function of the distance between them.In Universal kriging, the mean is a function of coordinates in linear, quadratic, or higher form.The model of Universal kriging can be written as; Z $ N lðsÞ, where in Equation ( 9) l(s) is the mean, and P Z is the covariance matrix.In Universal kriging, the trend component l(s) can be modeled as; where in Equation (10) b k is the kth coefficient, P k is th function that describes the trend, and l represents the number of functions that are used to model the trend.The weights in Universal kriging is estimated in the same way as in Ordinary kriging.P Z is the covariance matrix that can be written as (see Equation ( 11)); where R is a correlation matrix, and it depends on vector-valued parameter a, that is, and u).The values of the parameter are estimated using different methods of estimation, that is, Maximum likelihood Estimation (MLE), Restricted Maximum likelihood Estimation (REML), Ordinary Least Square (OLS) or Weighted Least Square (WLS).

Results and discussions
Water has significant importance for humans lives directly and indirectly.Water is plays a key role in agriculture, domestic, recreational, and industrial purposes.However, water gets several contaminants from agriculture and other anthropogenic activities and is a real warning to water quality.Currently water management strategies are not adequate to treat the contaminated water and water quality is still a severe issue in several countries of the world.Pakistan is a highly affected country due to the shortage of safe and healthy water sources.The current study examines the Lake and River water quality for agriculture and drinking.The current research mainly focused to the water quality, so, carried out through an integrated surface water quality index (SWQI) based on 18 hydrochemical parameters.SWQI considerably correlates with pH, EC, SO 4 , HCO 3 , and heavy metals.The descriptive statistics of physicochemical & heavy metals of water samples are provided in Table 1.Where all the parameters were measured in mg/L except for turbidity in NTU, PH in numbers and E c in lS/cm.

Water quality index
In this study, WQI was calculated using physicochemical parameters and heavy metals, and the results of the WQI of the samples are presented in are identified as unsuitable for drinking.The prediction maps revealed that in most areas, the water quality has not deteriorated to a higher level.However, in some areas, the WQI crossed the permissible limit, which indicates that in such regions, the water is not fit for drinking purposes, agricultural activities, fish farms, and so on.

Prediction of WQI at unsampled locations
For Ordinary kriging, the study variable Z at location s i is assumed to be a stationary random function, which means that the mean and variance of Z(s i ) are independent of locations.In contrast, in Universal kriging, there is a trend in the data, and the dependent variable Z(s i ) does not fulfill the criterion of weak stationary.In Universal kriging, the mean of Z(s i ) is a function of coordinates in linear, quadratic, or higher form.In both procedures, first, the empirical variograms are calculated from the observed data to check the spatial dependence in the data, and then the theoretical variograms are fitted to obtain the estimates of variogram's parameters.

Variogram envelope
Before the spatial interpolation, a variogram envelope is used to check the spatial dependence in the data.Variogram envelopes can be constructed by using the maximum and minimum values of each bin.After constructing the variogram envelope, the points of the experimental variogram are plotted on it to check the spatial dependence.For example, in Figure 2, the dashed lines are the minimum, and maximum bounds of the variogram model, whereas the dots plotted on the envelope represent the empirical variogram.Further, we can see the majority of the points of the variogram lie inside the envelope, so we conclude that spatial correlation exists among the observations.range ¼ 0.085.Figure 3 shows the circular and powered exponential variogram models fitted to the experimental variogram for OK and UK.

Prediction values maps
The two selected methods of interpolation, that is, OK and the UK, are used to interpolate WQI at unsampled locations and to generate contour maps for the study region.After predicting WQI at the unsampled location, the values are back-transformed to their original scale.The maps of predicted values of WQI and prediction error for different kriging techniques are given in Figure 4 shows the prediction map of WQI in the study area obtained by using Ordinary Kriging.Overall, the map can be observed as dominated by values indicating good water quality.The highest values of WQI are observed between the latitude of 28.5-29.0 and the longitude of 73-75, which means poor water quality.Another peak value of WQI can be seen between latitude 29.5-30.0 and longitude 70.6-70.8.Such regions contain water that should not be used directly for drinking purposes.In a few regions, the values of WQI indicate the good quality of water, which lies in latitudes 30.2-30.6 and longitude 70.5-70.8.But overall, the map suggests that the area is dominated by good water quality as the WQI lies below 100 in most of the region.Furthermore, in the prediction error map obtained from Ordinary kriging, the prediction error map shows that the highest  error was recorded in the regions that lie between latitude 28.1-29.8and longitude 69.8-70.2,whereas the minimum error was shown by the region having latitude 29.2-30 and longitude 70.6-71.The map shows that, on average, the error associated with different prediction points ranges from 25 to 35. Figure 5 presents the prediction map of WQI obtained by using Universal kriging (UK).A higher WQI has indicated poor water quality at a latitude of 29.2-29.6 and a longitude of 70.5-70.8.Overall, the map can be covered by lower WQI, then slightly higher WQI between latitude 28.2-29.6 and longitude 69.8-70.5.Overall, the map indicates a smooth trend of WQI along different longitudes and latitudes.In addition, the prediction error map obtained by the UK can show minimum errors in the latitude 29.5 and longitude 70.7; however, the prediction errors lie between 1 and 6.Moreover, the prediction performance of different techniques, that is, OK and the UK, were compared using the leave-one-out cross-validation (LOOCV) technique.Table 3 shows OK and the UK with their corresponding root mean square prediction error (RMSPE).The results show that in terms of RMSPE Universal kriging (UK) performed better than Ordinary kriging (OK) while predicting WQI at unvisited locations.Furthermore, standard deviation maps shows that UK outperforms the OK in terms of prediction accuracy.

Conclusion
The current research describes the water quality to the various Glacial Alpine Lakes, and Glacier fed Rivers such as River Swat and River Panjkora, Khyber Pakhtunkhwa, Pakistan.In order to determine the water quality in selected area the integrated surface water quality index (SWQI) is utilized that based on 18 hydrochemical parameters.SWQI considerably correlates with pH, EC, SO 4 , HCO 3 , and heavy metals.Thus, these parameters are used in the calculation of SWQI.Moreover, geostatistical interpolation methods, that is, Ordinary kriging and Universal kriging are used to interpolate the SWQI at unsampled locations, in order to understand the spatial distribution of the SWQI in the study area the prediction values maps are generated.
The prediction maps revealed that in the majority of the areas, the water quality is not deteriorated to a higher level.However, in some areas the WQI has crossed the permissible limit, which indicates that in such regions, the water is not appropriate for drinking purpose, agricultural activities and fish farms etc. Cross-validation technique are used to evaluate and compare the performance of different proposed models (i.e.Ordinary kriging and Universal kriging).The results show that the predictive performance of universal kriging is greater than Ordinary kriging in terms of root mean square prediction error (RMSPE).The outcomes of the current research may provide the valuable reference and procedures for water resource and water quality management Pakistan.Moreover, due to the depth of these lakes, access to soil and sediments is not accessed in this research.However, in future the combination of

Figure 1 .
Figure 1.The geographical locations selected for the current analysis.
3.2.2.Parameter estimation of variogram modelKriging with parameters of experimental variogram does not give us appropriate predictions at unobserved locations, so different variogram models are fitted to the empirical variogram that characterizes the variation in a random process with changing lags and ensures favorable variances in the predictions.In this study, in Ordinary kriging (OK), the Circular variogram model is fitted to the experimental variogram, as in terms of root mean square error (RMSE), its performance is found better as compared to other variogram models.The parameters of fitted Circular variograms are estimated by using the Restricted Maximum Likelihood Method (REML) method, that is,, sill¼ 210.8 and range ¼ 1.18.Whereas in Universal kriging (UK), the Powered Exponential variogram model's performance is found better compared to other models.In the UK the parameters of powered exponential variogram estimated by the REML method are used as input parameters in kriging, that is,, sill¼ 27.8 and

Figure 3 .
Figure 3. Circular variogram fitted to emperical variogram for OK and powered exponential variogram fitted to empirical variogram for UK.

Figure 4 .
Figure 4. Prediction map of WQI by using OK and prediction error map of WQI by using OK.

Figure 5 .
Figure 5. Prediction map of WQI by using UK and prediction error maps of WQI by using UK.

Table 2
water(WQI ¼ 50-100), poor water (WQI ¼ 101-150), very poor water (WQI ¼ 151-200) and water unsuitable for drinking (WQI > 200).It can be observed that most water samples indicate good quality, as about 69.23% of samples lies in the "very good quality" category.Moreover, 30.77% of the total samples are of poor quality and

Table 1 .
Descriptive statistics of Physicochemical & Heavy metals of observed samples.

Table 2 .
Categorization of water on the basis of WQI.Figure 2. Variogram envelope for WQI without Trend.

Table 3 .
Prediction techniques with their corresponding RMSPE.