Prediction of Groundwater Quality Index Using Classification Techniques in Arid Environments

Assessing water quality is crucial for improving global water resource management, particularly in arid regions. This study aims to assess and monitor the status of groundwater quality based on hydrochemical parameters and by using artificial intelligence (AI) approaches. The irrigation water quality index (IWQI) is predicted by using support vector machine (SVM) and k-nearest neighbors (KNN) classifiers in Matlab's classification learner toolbox. The classifiers are fed with the following hydrochemical input parameters: sodium adsorption ratio (SAR), electrical conductivity (EC), bicarbonate level (HCO3), chloride concentration (Cl), and sodium concentration (Na). The proposed methods were used to assess the quality of groundwater extracted from the desert region of Adrar in Algeria. The collected groundwater samples showed that 9.64% of samples were of very good quality, 12.05% were of good quality, 21.08% were satisfactory, and 57.23% were considered unsuitable for irrigation. The IWQI prediction accuracies of the classifiers with the standardized, normalized, and raw data were 100%, 100%, and 90%, respectively. The cubic SVM with the normalized data achieves the highest prediction accuracy for training and testing samples (94.2% and 100%, respectively). The findings of this work showed that the multiple regression model and machine learning could effectively assess water quality in desert zones for sustainable water management.


Introduction
Groundwater is a crucial resource for many different purposes, including drinking water, agriculture, and industrial uses [1,2]. Assessing and monitoring the quality of groundwater, however, has consistently been a major challenge that needs to be overcome to ensure the long-term sustainability of already depleted water resources. While some groundwater is a renewable resource that can be replenished through rainwater and snowmelt, it can be depleted if consumed faster than it is naturally recharged [3,4]. On the other hand, non-renewable groundwater resources that have been stored for thousands of years are finite and can be drained if overexploited. Farmers in arid areas usually [...] on key input parameters and subsequently support effective water resource management decisions for irrigation and drinking purposes in the study area. This will make it easier to identify problems early and take steps to protect this vital resource. Artificial intelligence algorithms are employed to predict the irrigation water quality index (IWQI) of the study area from five input parameters: electrical conductivity (EC), sodium concentration (Na), sodium adsorption ratio (SAR), chloride concentration (Cl), and bicarbonate level (HCO3). The output parameter is the IWQI. These parameters were computed based on the analysis of 166 samples collected from an arid desert. Using the study's findings, farmers in arid areas can boost agricultural productivity through enhanced irrigation water quality management, and policymakers and stakeholders can make reasonable choices on water resource management. The implications of the proposed methods are two-fold. Firstly, for irrigation purposes, the IWQI provides a valuable tool for stakeholders and farmers to assess the suitability of groundwater for agricultural uses. 
By comparing the predicted IWQI with the recommended standards, decision-makers can determine whether the groundwater is suitable for irrigation or if additional treatment measures are necessary. This information is essential for optimizing crop production, minimizing the negative impact of poor water quality on agricultural yields, and ensuring sustainable water resource management. Secondly, in terms of drinking water, the predicted IWQI allows for an evaluation of the groundwater's suitability for human consumption. By comparing the calculated IWQI with the World Health Organization (WHO) drinking water standards, decision-makers can also assess the potential health risks associated with the consumption of the groundwater. This information is crucial for ensuring the provision of safe drinking water to communities, as it helps identify the need for appropriate treatment measures or the implementation of alternative water sources. By providing stakeholders and decision-makers with a reliable and efficient tool to evaluate groundwater quality, our study empowers them to make informed decisions regarding water resource management and safeguarding the health and well-being of communities reliant on groundwater for irrigation and drinking purposes.

Study Area
The investigated area, located in the southwestern part of Algeria between 5°38′38″ W and 2°6′30″ E longitude and 24°53′30″ N and 31°42′27″ N latitude, covers a total area of 297,790 km², which constitutes approximately 18% of the area of Algeria. Figure 1 illustrates the location of the study area. The study area belongs to the Algerian Sahara, one of the world's driest and hottest areas [31]. The summers are long and hot, and the winters are short and warm. Adrar is characterized by scarce rainfall, with an annual average of about 15 mm, and an evaporation rate of about 4500 mm per year [32]. Temperatures in the summer are consistently high and can exceed 45 °C [33]. The study area often experiences a scorching, dusty southerly wind called the Sirocco in the summer [34]. During this time, the northern part of the country can be soaked for as long as 40 days [21]. Geographically, Adrar is bounded by Erg Chech in the west, Tademaït in the east, the occidental Erg in the north, and Tanezrouft in the south. This area comprises four natural Saharan regions: Gourara, Touat, Tidikelt, and Tanezrouft. In the study region, the hydrographic network is represented by Wadi Messaoud, which is the continuation of Wadi Saoura towards the north (the latter draining from the Saharan Atlas), and Oued Tillia and its tributaries, which drain the plateau of Tademaït towards the southeast at the level of Zaouïet Kounta from Baamer to Reggane. At the eastern end of the Touat depression, an intense hydrographic network of small distinct ravines drains the plateau of Tademaït. Adrar is mainly an agricultural region characterized by its traditional irrigation system, named "foggara" [35]. Hydrogeologically, the study area is part of the transboundary Northern Saharan Aquifer System (SASS) [36]. Many of these deposits are deeply buried, and their thickness can reach at least 2000 m [37]. 
In addition to siliciclastic sandstone, some parts of the aquifer are karstic and evaporitic [36]. Aquifers in this area tend to be highly productive. Over the centuries, foggaras (water galleries) have exploited the aquifer of Continental Intercalaire (CI) around its edges in the Sahara [38].

Data Collection
For this study, 166 water samples from the research area were provided by the national water resources agency (ANRH). The samples were taken from boreholes and the foggara system. The data for each sample consisted mainly of chemical parameters: pH, cations such as magnesium (Mg), calcium (Ca), sodium (Na), and potassium (K), and anions such as chloride (Cl), sulfate (SO4), and bicarbonate (HCO3). The data also included pollution indicators such as nitrate (NO3) and physical parameters, namely electrical conductivity (EC) and temperature (°C). The suitability of groundwater in the region of Adrar for irrigation was assessed against the international standard provided by the Food and Agriculture Organization (FAO). Therefore, this database adequately represents groundwater quality in the study area. A summary of the collected data is given in Table 1.


Suitability Indices for Irrigation
The quality of irrigation water varies greatly with the type and quantity of its salts. NaCl is the predominant salt in groundwater used for irrigation [39]. Consequently, the sodium adsorption ratio (SAR) played an important role in determining the effects of the application of irrigation water on soil structural behavior in earlier research [40]. The USDA's salinity lab defined SAR as [41]:

$$\mathrm{SAR} = \frac{\mathrm{Na}^{+}}{\sqrt{\left(\mathrm{Ca}^{2+} + \mathrm{Mg}^{2+}\right)/2}} \qquad (1)$$

where the concentrations of the cations (Na+, Ca2+, and Mg2+) are expressed in milliequivalents per liter (mEq/L).
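As a small illustration of the SAR definition, SAR = Na / sqrt((Ca + Mg) / 2) with all concentrations in mEq/L, the ratio can be computed as in the following sketch. The function name and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def sodium_adsorption_ratio(na: float, ca: float, mg: float) -> float:
    """Return SAR for cation concentrations given in mEq/L."""
    if ca + mg <= 0:
        raise ValueError("Ca + Mg must be positive")
    return na / math.sqrt((ca + mg) / 2.0)


# Hypothetical sample (illustrative values only), all in mEq/L.
print(sodium_adsorption_ratio(na=10.0, ca=4.0, mg=4.0))  # 5.0
```

Concentrations reported in mg/L would first have to be converted to mEq/L before calling the function.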

Irrigation Water Quality Index (IWQI)
Depending on the crop pattern, soil type, and climate, irrigation quality requirements may vary from one field to another [42]. Hence, a spatially distributed assessment of individual quality parameters is possible through irrigation water quality mapping. GIS can therefore be used to visualize such maps and make comparative evaluations. Groundwater has been assessed for its suitability for irrigation and drinking purposes using the Irrigation Water Quality Index (IWQI) in many regions worldwide [27,43]. For this study, the IWQI model developed by Meireles et al. [44] was used to analyze the data. First, it was necessary to identify the most relevant irrigation parameters. Second, aggregation weights (w_i) and quality measurement values (q_i) were defined. According to the irrigation water quality characteristics required by the Food and Agriculture Organization (FAO) for agricultural uses, proposed by Ayers and Westcot [45], values of q_i were calculated for every chemical parameter, as shown in Table 2. Equation (2) is used in this model to calculate the irrigation water quality parameter (q_i), which is determined by the tolerance limits of the parameters listed in Table 2:

$$q_i = q_{i\max} - \frac{\left(x_{ij} - x_{\inf}\right) q_{i\,\mathrm{amp}}}{x_{\mathrm{amp}}} \qquad (2)$$

where q_i is the quality of each parameter, q_imax is the maximum value of q_i for the class, x_ij is the observed value of the parameter, x_inf is the lower limit of the class to which the parameter belongs, q_iamp is the amplitude of the quality measurement class, and x_amp is the amplitude of the class. x_amp was evaluated based on the highest value determined in the analysis of the physicochemical properties of groundwater. The parameter weights in the IWQI were calculated as suggested by Meireles et al. [44] and are listed in Table 3. Finally, the IWQI is calculated using Equation (3):

$$\mathrm{IWQI} = \sum_{i=1}^{n} q_i w_i \qquad (3)$$

Table 2. Limiting values for parameters used in quality assessments (q_i).
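The two steps of the index, a per-parameter rating q_i = q_imax - ((x_ij - x_inf) * q_iamp) / x_amp followed by the weighted sum IWQI = sum of q_i * w_i, can be sketched as below. This is only an illustration: the class limits passed to `quality_rating` and the equal weights are placeholder assumptions, not the values of Tables 2 and 3, and the function names are hypothetical.

```python
def quality_rating(x_obs, x_inf, x_amp, q_max, q_amp):
    """Rating q_i of one parameter inside the class that contains x_obs.

    x_inf: lower limit of the class, x_amp: width of the class,
    q_max: highest rating of the class, q_amp: width of the rating range.
    """
    return q_max - ((x_obs - x_inf) * q_amp) / x_amp


def iwqi(q_values, weights):
    """Weighted sum of the ratings; both arguments are dicts keyed by parameter."""
    if set(q_values) != set(weights):
        raise ValueError("ratings and weights must cover the same parameters")
    return sum(q_values[name] * weights[name] for name in q_values)


# Assumed class for illustration: values from 750 to 1500 map to ratings 85 down to 60.
print(round(quality_rating(1000.0, 750.0, 750.0, 85.0, 25.0), 2))  # 76.67

# Placeholder ratings and equal weights (NOT the weights of Table 3).
ratings = {"EC": 70.0, "Na": 60.0, "SAR": 80.0, "Cl": 50.0, "HCO3": 40.0}
equal_weights = {name: 0.2 for name in ratings}
print(round(iwqi(ratings, equal_weights), 2))  # 60.0
```

In practice the class containing each observed value would be looked up from Table 2 before the rating is computed.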

Classification Learner
Using the classification and prediction methods of machine learning can minimize time-consuming effort by avoiding the many underlying calculations otherwise needed to predict a specific output [21]. When fed with reliable data, machine learning can predict appropriate categories (patterns), while an untrusted data source can adversely affect machine learning results. Therefore, it is vital to prepare the collected data and randomly divide them into two groups, one for training and the other for testing. The latter group is essential to determine the accuracy of the constructed machine learning model. Because the model determines the outputs as soon as the machine learning algorithm is run on the input dataset, it is necessary to choose a model that is relevant to the task and to the information presented [26,46]. Multiple models are suitable for many tasks, such as recognizing and processing images. In this study, the classification learner tool in Matlab is used to predict the irrigation water quality index (IWQI).
The input parameters to the classification learner toolbox in this study are electrical conductivity (EC), sodium concentration (Na), sodium adsorption ratio (SAR), chloride concentration (Cl), and bicarbonate level (HCO3), while the output parameter is the IWQI. These parameters were computed based on the analysis of 166 groundwater samples. The samples were divided into 156 samples for training all classification learner models, and the remaining 10 samples were used as testing data to identify the accuracy of the models. The IWQI state was defined based on specified range limits, as listed in Table 4. Table 5 shows how the training and testing data are distributed by water state.

Table 4. The state of irrigation water based on IWQI limits.

IWQI      Irrigation water state
70-100    Very good
55-70     Good
40-55     Satisfactory
0-40      Unsuitable

Table 5. Distribution of the collected data based on the irrigation water state.

Irrigation Water State   Very Good   Good   Satisfactory   Unsuitable   Total
Training                 14          18     32             92           156
Testing                  2           2      3              3            10

The classification learner toolbox was used with the training data to select the best-performing classifier. Then, the raw, standardized, and normalized data were applied to all classifiers to investigate the best way to obtain a high-accuracy classifier. Some samples of the raw data of the five inputs to all classifiers are illustrated in Table 6.
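The random division of the 166 samples into 156 training and 10 testing samples can be sketched as follows. This is a minimal illustration, assuming a plain shuffle with a fixed seed; the study does not state how the split was drawn, and the helper name is hypothetical.

```python
import random


def split_train_test(samples, n_test, seed=0):
    """Randomly hold out n_test samples; the remaining ones are used for training."""
    if not 0 < n_test < len(samples):
        raise ValueError("n_test must be between 1 and len(samples) - 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[n_test:], shuffled[:n_test]


sample_ids = list(range(1, 167))  # 166 sample identifiers
train_ids, test_ids = split_train_test(sample_ids, n_test=10)
print(len(train_ids), len(test_ids))  # 156 10
```

A stratified split, which keeps the class proportions similar in both groups, would be a reasonable alternative when some classes are small.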
Next, the data are standardized and normalized to enhance the classifiers' accuracies. To standardize the data, the column mean was subtracted from each value, and the result was divided by the standard deviation of the column. The new parameter values (X_new) can be determined as follows:

$$X_{\mathrm{new}} = \frac{X - \mu}{\sigma} \qquad (4)$$

where µ refers to the mean and σ is the standard deviation of each of the five input variables of the training data. The normalization uses the maximum value of all data for each parameter. Therefore, the new transformation ratios become Equation (5):

$$X_{\mathrm{new}} = \frac{X}{X_{\max}} \qquad (5)$$

Support Vector Machine (SVM) Classifier
SVM is a machine learning tool that splits data into two classes via a hyperplane. The main objective of SVM is to reduce errors by adjusting the hyperplane so that the tolerance limit (margin) increases. Because the optimization problem is convex, SVM offers a unique solution, unlike ANN models, whose training can have many local minima [26,47]. The hyperplane must first maximize the distance to the nearest points of each category; the classification itself follows from this. Points on opposite sides of the margin are assigned to different classes. A larger number of features makes the categorization more difficult. Good classification can occur with a large margin, as shown in Figure 2 [48,49].
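To make the two preprocessing transforms concrete, the sketch below standardizes a column as (X - mean) / standard deviation and normalizes it as X / maximum. It assumes the sample standard deviation (n - 1 denominator) and a per-column maximum; the helper names and the three-value column are illustrative and not taken from the study.

```python
import statistics


def standardize(column, mean=None, std=None):
    """Z-score a column; pass the training mean/std to transform test data."""
    mean = statistics.fmean(column) if mean is None else mean
    std = statistics.stdev(column) if std is None else std  # sample std (assumption)
    return [(x - mean) / std for x in column]


def normalize_by_max(column, max_value=None):
    """Scale a column by its maximum (or by a maximum supplied by the caller)."""
    max_value = max(column) if max_value is None else max_value
    return [x / max_value for x in column]


column = [2.0, 4.0, 6.0]  # hypothetical values of one input parameter
print(standardize(column))                                # [-1.0, 0.0, 1.0]
print([round(v, 3) for v in normalize_by_max(column)])    # [0.333, 0.667, 1.0]
```

Passing the training statistics explicitly when transforming the testing samples avoids leaking information from the test set into the preprocessing.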
The hyperplane in SVM is represented mathematically as follows:

$$W^{T} x + b = 0$$

where x and W are vectors; W refers to the weight vector, and b is the bias term. The training data can be written as:

$$\left\{ (x_n, y_n) \right\}_{n=1}^{N}, \qquad y_n \in \{-1, +1\}$$

This means that ordered pairs (x_n, y_n) can represent the data, where x_n refers to the features and y_n refers to the label of x_n. The classification function can be expressed as:

$$f(x) = \operatorname{sign}\left( W^{T} x + b \right)$$

where the function f(x) is learned from the training data and can then classify future or unseen data outside the original range of datasets. The training process is carried out to find the maximum margin (M) that can be obtained. The margin is mathematically represented as follows:

$$M = \frac{2}{\lVert W \rVert}$$

The relationship between M and W is inverse. The main equation for the SVM, on which this work is based, takes the following form:

$$\min_{W,\,b} \; \frac{1}{2} \lVert W \rVert^{2} \quad \text{subject to} \quad y_n \left( W^{T} x_n + b \right) \ge 1, \qquad n = 1, \dots, N$$
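The role of the weight vector in the SVM description can be illustrated with a few lines of code: the sign of W·x + b gives the predicted label, and the margin width 2 / ||W|| shrinks as ||W|| grows. This sketch covers only the linear two-class case with hand-picked, hypothetical values of W and b; it does not train a model and is not the multiclass cubic SVM used in the study.

```python
import math


def decision_value(w, b, x):
    """Signed score W.x + b of point x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Label +1 on or above the hyperplane, -1 below it."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Margin M = 2 / ||W||; it is inversely related to the norm of W."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


w, b = [3.0, 4.0], -5.0  # hypothetical hyperplane parameters
print(classify(w, b, [3.0, 1.0]), classify(w, b, [0.0, 0.0]))  # 1 -1
print(margin_width(w))  # 0.4
```

Doubling W halves the margin, which is why training minimizes the norm of W while keeping every training point on the correct side.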

Weighted K-Nearest Neighbors (KNN) Classifier
The KNN algorithm is a supervised learning algorithm that is most commonly used for classification and regression. Datasets can be resampled, and the algorithm can also be used to estimate missing values. The method uses the k closest neighbors (data points) to predict the class of a new data point. Unlike model-based algorithms, instance-based learning uses the whole set of training cases to predict the output of unseen data instead of learning weights from the training data. Because it relies only on the number of points closest to a new point, the k-nearest neighbors method neglects much information. The steps of this method are summarized as follows:

1. The value of k, which expresses the number of neighbors, is determined.
2. The distances between the new point and the points in the dataset are calculated.
3. The points are sorted by the distance calculated in the previous step, and the k nearest ones are selected.
4. The class of each of these neighbors is identified.
5. Finally, the class with the most neighbors is the predicted class for the new point.
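The five steps above can be sketched in a few lines. To reflect the weighted variant named in the heading, each neighbor's vote is weighted by the inverse of its squared distance; this weighting scheme, the value of k, and the toy data are assumptions made for illustration and are not settings reported in the study.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_x, train_y, query, k=3):
    """Predict the class of `query` from its k nearest training points."""
    # Steps 2-3: compute all distances, sort them, keep the k nearest.
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    # Steps 4-5: accumulate distance-weighted votes per class.
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += 1.0 / (distance * distance + 1e-12)  # avoid division by zero
    return max(votes, key=votes.get)


# Hypothetical two-feature points on a normalized scale.
train_x = [(0.1, 0.1), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
train_y = ["very good", "very good", "unsuitable", "unsuitable", "unsuitable"]
print(weighted_knn_predict(train_x, train_y, (0.15, 0.12), k=3))  # very good
```

With equal weights the function reduces to the plain majority vote of step 5.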


Chemical Composition of the Study Area
All physicochemical parameters in this study are compared with the FAO's standards for agricultural purposes proposed by Ayers and Westcot [45]. Groundwater in the study area shows significant differences in chemical composition. The values of pH ranged from 7.35 to 8.19, averaging 7.71. The electrical conductivity (EC) values varied from 620.00 µS/cm to 5920.00 µS/cm with an average of 2475.75 µS/cm, where the acceptable level of EC is 3000 µS/cm according to FAO guidelines [45]. Therefore, 80.72% of Adrar's EC values are within the acceptable limits for irrigation purposes. Groundwater in the research area contains calcium concentrations ranging from 1.07 to 15.10 mEq/L. Consequently, all samples are within the permissible range of the FAO recommendations, which set a maximum value of 20 mEq/L [45]. Magnesium values varied from 0.65 to 14.63 mEq/L with a mean value of 4.90 mEq/L, so about 50.6% of samples are within the standards stipulated by the FAO (<5 mEq/L) [45]. Sodium levels range from 1.52 mEq/L to 38.70 mEq/L, within FAO standard limits (<40 mEq/L) [45]. About 97.59% of the potassium values of Adrar are within the acceptable limits for irrigation purposes stipulated by the FAO (<2 mEq/L) [45]. Throughout this study, the sulfate value ranged from 2.08 to 20.

Irrigation Water Quality Results
Calculated SAR values range from 1.01 to 13.71, with a mean and standard deviation of 5.45 and 1.95, respectively. Generally, every sample whose SAR value ranges between 0 and 18 qualifies as excellent or good for irrigation, as is the case in our study [50-52]. Results of the calculated IWQI for the region of Adrar are presented in Table 7 and Figure 4. The IWQI values covered a wide range, from 3.64 to 93.77, with an average of 41.81. According to Meireles et al. [44], the IWQI was divided into four categories: (i) excellent or very good, when IWQI is more than 70; (ii) good, when IWQI is between 55 and 70; (iii) satisfactory, when IWQI is between 40 and 55; and (iv) inappropriate or unsuitable, when IWQI is below 40. In our study area, 16 samples fell into the very good category, representing 9.64% of the total sample set. Approximately 12.05% of samples were deemed good, and approximately 21.08% were deemed satisfactory. In addition, 95 samples were categorized as unsuitable, accounting for 57.23% of all samples examined.
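The four-way categorization of IWQI values can be written as a small lookup, shown here as a sketch. How values exactly on a boundary (40, 55, or 70) are assigned is an assumption, chosen to match the wording "more than 70" and "below 40"; the function names are hypothetical.

```python
from collections import Counter


def iwqi_state(value):
    """Map an IWQI value to its irrigation water state."""
    if value > 70:
        return "very good"
    if value >= 55:
        return "good"
    if value >= 40:
        return "satisfactory"
    return "unsuitable"


def state_percentages(values):
    """Percentage of samples in each state, rounded to two decimals."""
    counts = Counter(iwqi_state(v) for v in values)
    return {state: round(100.0 * n / len(values), 2) for state, n in counts.items()}


# Minimum, mean, and maximum IWQI reported for the study area.
print([iwqi_state(v) for v in (3.64, 41.81, 93.77)])
# ['unsuitable', 'satisfactory', 'very good']

# 16 of 166 samples in one class corresponds to the reported 9.64%.
print(round(100.0 * 16 / 166, 2))  # 9.64
```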


Artificial Intelligence
In this section, the classification learners' prediction accuracy results are reported. First, the input and output data were loaded into the MATLAB workspace, and the Classification Learner app was then opened from the command window. On a new session page, the input and output variables must be selected from the workspace. The input parameters were set as predictors, and the last column was kept as the output. A 10-fold cross-validation was selected to ensure a good training process leading to a stable classification model.
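For readers unfamiliar with 10-fold cross-validation, the sketch below shows how 156 training samples can be partitioned so that each sample is used for validation exactly once. It illustrates the general technique only; the shuffling, the seed, and the helper name are assumptions and do not describe the internals of the MATLAB toolbox.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# With 156 samples and k = 10, the folds hold 15 or 16 samples each.
sizes = sorted({len(validation) for _, validation in k_fold_indices(156)})
print(sizes)  # [15, 16]
```

The reported training accuracy is then the accuracy aggregated over the ten validation folds.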
When all of the classification learners were used with the raw, standardized, or normalized data, the SVM and KNN developed the highest prediction accuracy for the trained data, so the results of the SVM and KNN are reported here.

SVM Results for Standardized Data
The standardized data can be obtained using Equation (4). For the standardized data, the cubic SVM gave the highest prediction accuracy of 92.9%. Figure 5 shows the distribution of correctly and incorrectly predicted samples as a scatter plot of the cubic SVM. In this plot, the name of the trained file appears as input_av with 156 observations. Five parameters are used as predictors. The incorrectly predicted points are marked with colored crosses (x). The codes 1, 2, 3, and 4 on the x and y axes in Figure 5 refer to very good, good, satisfactory, and unsuitable IWQI states, respectively. Figure 6 illustrates the confusion matrix of the cubic SVM. As can be seen from Figure 6a, 14 observations exhibit very good IWQI. The cubic SVM classifier correctly predicts thirteen of these fourteen samples; one sample is incorrectly assigned to the good IWQI state. Correct predictions are highlighted in green, while incorrect predictions are highlighted in red. For the good IWQI state, the cubic SVM correctly predicts 15 of 18 samples, with three incorrect samples, one predicted as satisfactory and two as very good. Therefore, the prediction accuracy of the good IWQI state is 83%, as shown in Figure 6b. The highest prediction accuracy is for the unsuitable IWQI state, where the cubic SVM correctly predicts 91 of 92 samples, an accuracy of 99%. The total accuracy of the cubic SVM is 92.9% for all trained data samples.
A receiver operating characteristic (ROC) curve is shown in Figure 7. ROC plots show the classifier performance with the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis. In Figure 7, the FPR of 0.01 indicates that 1% of the observations were incorrectly assigned to the positive class, while the TPR of 0.93 indicates that 93% of the observations were correctly classified as positive. A ROC curve that lies along the 45° line indicates a poor classification result, whereas a curve that forms a right angle at the top left of the plot indicates a perfect one. A classifier's accuracy can be measured by its area under the curve (AUC); classifier accuracy increases with increasing AUC. It can be seen from Figure 7 that the AUC is 100%, meaning that the classifier separates the classes very well.
Finally, the results of all classifiers are presented in Table 8, which lists the accuracy of all classifiers with the trained (raw, standardized, and normalized) data. The results of applying all classifiers to the trained data indicated that the best performance for the raw data is obtained with the linear support vector machine (SVM), where the prediction accuracy was 92.9%. With the standardized data, the highest accuracy is obtained with the cubic SVM (92.9%). The cubic SVM also gives the highest accuracy (94.2%) for the normalized data. The weighted k-nearest neighbors (KNN) classifier generates the second-best prediction accuracy with raw, standardized, and normalized data of 92.3%, 92.3%, and 92.9%, respectively. Therefore, the two classifiers SVM and KNN are presented in the following sections since they achieve the highest prediction accuracy for the trained data.

SVM Results for Normalized Data
The normalized data can be obtained using Equation (5). For the normalized data, the cubic SVM gave the highest prediction accuracy of 94.2%. Figure 8 shows the confusion matrix of the cubic SVM with a prediction accuracy of 94.2%. Figure 8a shows that all 14 observations with very good IWQI are correctly predicted. For good IWQI, 15 of 18 samples were correctly predicted, two samples were incorrectly predicted as very good, and one sample was incorrectly predicted as satisfactory. The prediction accuracy for good IWQI was 83%, as presented in Figure 8b. For the satisfactory state, the prediction accuracy was 88%, where 28 of 32 samples were correctly predicted and the other four samples were incorrectly predicted as unsuitable. A total of 90 out of 92 samples were correctly predicted in the case of unsuitable IWQI, and two samples were incorrectly predicted as the satisfactory IWQI state.
An FPR of 0.01, which indicates that 1% of the observations were incorrectly assigned to the positive class, can be seen in Figure 9. The TPR is 1.0, indicating that the classifier correctly assigns 100% of the observations to the positive class. With an AUC of 100%, this figure shows a classifier that performs very well.
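The figures quoted for the cubic SVM on normalized data can be cross-checked from the confusion matrix described in the text. The sketch below rebuilds that matrix (rows are true classes, columns are predicted classes) and derives the overall accuracy, the per-class accuracies, and the one-vs-rest TPR and FPR for the very good class. The function names are hypothetical, and treating the very good class as the positive class of the ROC plot is an assumption.

```python
# Class order: very good, good, satisfactory, unsuitable.
# Rows are true classes, columns are predicted classes (counts from the text).
cm = [
    [14, 0, 0, 0],
    [2, 15, 1, 0],
    [0, 0, 28, 4],
    [0, 0, 2, 90],
]


def overall_accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def one_vs_rest_rates(matrix, k):
    """Return (TPR, FPR) when class k is treated as the positive class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)


print(round(100 * overall_accuracy(cm), 1))  # 94.2
print([round(100 * cm[i][i] / sum(cm[i]), 1) for i in range(4)])
# [100.0, 83.3, 87.5, 97.8]
tpr, fpr = one_vs_rest_rates(cm, 0)
print(tpr, round(fpr, 2))  # 1.0 0.01
```

The computed values agree with the reported 94.2% accuracy and with a TPR of 1.0 and an FPR of 0.01.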

SVM Results for Raw Data
When applying all classifiers to the raw data of the inputs (EC, Na, SAR, Cl, HCO3), the linear SVM achieved the highest prediction accuracy of 92.9%. Figure 10 shows the distribution of the correct and incorrect samples with the linear SVM. The incorrectly predicted samples appear as colored crosses. Figure 11 shows the confusion matrix of the linear SVM. The prediction accuracies were 79% (11/14), 83% (15/18), 91% (29/32), and 98% (90/92) for very good, good, satisfactory, and unsuitable IWQI, respectively. The inaccurate predictions were 21% (3/14), 17% (3/18), 9% (3/32), and 2% (2/92) for very good, good, satisfactory, and unsuitable IWQI, respectively. For example, for the satisfactory IWQI state, the linear SVM predicts 29 samples as the satisfactory IWQI state, one as the good IWQI state, and two as the unsuitable IWQI state.

KNN Results for Normalized Data
The KNN classifier achieved 94.2% prediction accuracy with the data normalized as in Equation (5). Figure 12 shows the results of the KNN classifier with 94.2% prediction accuracy. Figure 12 illustrates that the prediction accuracies were 100% (14/14), 83% (15/18), 81% (26/32), and 100% (92/92) for the very good, good, satisfactory, and unsuitable IWQI states, respectively. Figure 13 shows the positive and negative predictive values. According to the figure, class 3 (satisfactory) is predicted 29 times; 26 of these predictions are correct, giving a positive predictive value of 90% for the satisfactory IWQI state, while the remaining 10% are samples that actually belong to the good IWQI state (class 2).

The constructed SVM models with standardized, normalized, and raw data were tested with ten samples to verify their accuracy. Table 9 shows the results of applying the classifiers to the different data. The prediction accuracies of the classifiers with the standardized, normalized, and raw data were 100%, 100%, and 90%, respectively. The cubic SVM with the normalized data achieved the highest prediction accuracy for both training and testing samples (94.2% for training, 100% for testing).

Conclusions
Groundwater is an essential resource for drinking and irrigation in many parts of the world, particularly in arid regions. However, in many areas, groundwater is unsafe to drink and can negatively affect crop production due to high concentrations of contaminants such as industrial chemicals or agricultural pesticides. This study aimed to assess the water quality for irrigation purposes in the region of Adrar and to develop a classification model to predict the irrigation water quality index (IWQI) class. Additionally, the sodium adsorption ratio (SAR) was assessed to determine the effects of water on soil structural behavior. Analyzing irrigation water quality can improve agricultural productivity and prevent plant damage. The calculated SAR values for the groundwater samples in the study area place them in the "very good" and "good" classes. Based on the available data, most physicochemical parameters are within the Food and Agriculture Organization (FAO) criteria for agricultural purposes. The calculated IWQI in the study area ranged from 3.64 to 93.77, with an average of 41.81. Based on the IWQI results, 57.23% of the samples fall within the unsuitable category, mainly in the south and northeast parts of the study area. On the other hand, approximately 12.05% of samples were deemed good, and about 21.08% were considered satisfactory. The remaining samples, about 9.64%, fall within the "very good" category, which is dominant in the western parts of the study area.
Artificial intelligence was used to predict groundwater quality for irrigation in the study area. The cubic SVM with the normalized data emerged as the optimal model for predicting the IWQI, with accuracies of 94.2% and 100% for the training and testing samples, respectively. This study has thus developed an accurate SVM model for IWQI in arid areas. Combining physicochemical data, SAR, IWQI, and GIS gives a comprehensive understanding of water quality and its governing mechanisms in the study area. The methodology used in this study to summarize the monitoring data could be an efficient and valuable tool for reporting the data to decision-makers. According to our findings, artificial intelligence techniques can enhance groundwater quality management plans in Adrar. By predicting the IWQI from hydrochemical parameters with machine learning techniques, the study provides a means to evaluate the suitability of groundwater for irrigation and potentially drinking purposes. Furthermore, the model may be adopted in other desert regions where the costs of estimating several water quality variables are high and might be restrictive. Further improvements may also be achieved by including more hydrochemical parameters and by applying the approach in different climate regimes.