Mapping groundwater potential zones in the Subarnarekha basin, India, using a novel hybrid multi-criteria approach in Google Earth Engine

Assessing groundwater potential for sustainable resource management is critically important. In addressing this concern, this study aims to advance the field by developing an innovative approach for Groundwater potential zone (GWPZ) mapping using advanced techniques, such as FuzzyAHP, FuzzyDEMATEL, and Logistic regression (LR) models. GWPZ mapping was carried out by integrating various primary factors, such as hydrologic, soil permeability, morphometric, terrain distribution, and anthropogenic influences, incorporating twenty-seven individual criteria using multi-criteria decision models along with a hybrid approach for the Subarnarekha River basin, India, in Google Earth Engine (GEE). The predictive capability of the model was evaluated using a multi-collinearity test (VIF < 10.0), followed by applying a random forest model, considering the weighted impact of the five primary factors. The hybrid model for GWPZ classification showed that 21.97 % (4256.3 km²) of the area exhibited very high potential, while 11.37 % (2202.1 km²) indicated very low potential for groundwater in this area. Validation against groundwater level data from 72 observation wells, performed with the Area under receiver operating characteristic (AUROC) curve technique, yielded values ranging between 75 % and 78 % for the different models, underscoring the robust predictability of GWPZ. The hybrid and LR-FuzzyAHP models demonstrated remarkable effectiveness in GWPZ mapping, indicating that the downstream and southern regions have substantial groundwater potential attributed to alluvial soil and favorable recharge conditions. Conversely, the central part suffers from a scarcity of groundwater. The approach has the potential to assist planners and managers in formulating strategies for managing groundwater levels and alleviating the impacts of future droughts.


Introduction
As the most dynamic and sustainable renewable natural resource, groundwater both maintains the Earth's biogeochemical cycle equilibrium [1] and serves as a viable alternative water source in dry and semi-dry areas globally [2]. It is a prominent source of fresh drinking water for human consumption, with many developing countries depending heavily on it for various purposes, including agriculture, domestic usage, and urban and industrial growth [3]. Water demand is steadily rising as a consequence of population growth, urban expansion, industrialization, and agricultural irrigation, potentially harming groundwater storage and quality [4]. Unregulated groundwater development can lead to water scarcity, posing challenges in addressing environmental degradation and climate change patterns [5,6]. Therefore, groundwater management becomes vital for soil conservation efforts to ensure future food security.
In India, groundwater is pivotal in Gross domestic product (GDP) growth. It accounts for 50 % of water consumption in urban areas and more than 80 % in rural areas [7]. The Central Groundwater Board [8] reported that India's annual renewable groundwater resource is ~433 Billion cubic meters (BCM). The primary groundwater consumption is in the irrigation sector, accounting for 92 % (213 BCM), followed by industrial and domestic use, which accounts for 18 BCM. Groundwater monitoring is essential for estimating river basins' water budgets, enabling sustainable hydrologic decision support systems. It also facilitates water quality monitoring for local livelihoods [9]. India has several river basins, including the Ganga, Brahmaputra, Godavari, Narmada, Mahanadi, and Subarnarekha, among others.
The Subarnarekha River basin is of significant importance in meeting urban water demands, agriculture and irrigation needs, hydroelectric power generation, domestic use, and industrial requirements in the states of Jharkhand, Odisha, and West Bengal. During the 2000-2010 period, the annual average groundwater recharge rate in the river basin varied from 519 mm to 858 mm [10]. However, in recent decades, the ecological environment of the Subarnarekha catchment area has deteriorated due to various factors, including low-intensity rainfall, reduced water holding capacity, increased mining activity, industrial growth, soil erosion, deforestation, and water pollution [11][12][13]. According to research conducted by Gautam et al. [14], heavy metals and nitrate pollution have been the main drivers behind the reduction in groundwater quality across the entire catchment area. Monitoring groundwater quantity and quality is urgently needed for sustainable development in the whole basin, as groundwater is less susceptible to contamination compared to surface water [15]. Mandal et al. [10] also reported a significant decrease in the recharge rate, particularly for the years 2000, 2002, and 2009, which calls for special attention.
The quantification and delineation of groundwater resources employing conventional techniques such as geological, geophysical, or hydrogeological methods are often labor-intensive, costly, and time-consuming [16]. Therefore, recording and evaluating the outcomes of subsurface hydrological inquiries can provide a better alternative approach to traditional groundwater potential mapping. A cohesive method combining Remote sensing (RS) and Geographic information system (GIS) approaches can serve as a superior decision support system for intelligently assessing Groundwater Potential (GWP), groundwater quality suitability, discharge, recharge, and storage mapping [17][18][19][20][21]. Mohammed et al. [22,23] integrated RS and GIS techniques with the Analytic hierarchy process (AHP) method for effectively evaluating potential groundwater recharge zones in the Iraqi Western desert region. Tamesgen et al. [24] presented a GWP analysis with nine geo-environmental parameters through an RS/GIS-based Multi-Criteria Decision Making (MCDM) approach in Ethiopia. Kisiki et al. [25] used geospatial and RS data to define groundwater recharge zones through GWP evaluation. They then performed a sensitivity analysis to determine the impact of hydrologic and geological factors on their variations. The widely used multi-criteria-based decision support techniques for GWP mapping include AHP [22,[26][27][28], Frequency ratio (FR) [29], Logistic regression (LR) [30], Fuzzy set [31], Quick unbiased efficient statistical tree (QUEST) [32], Weighted linear combination (WLC) [33], Evidential belief function (EBF) [34], Multi-influencing factor (MIF) [35], Shannon's entropy [36], TOPSIS [37], Dempster-Shafer model [38], and Bayesian network model [39].
Causal relationships based on Fuzzy decision-making trial and evaluation laboratory (FDEMATEL) approaches have also been applied for soil erosion, flood, and landslide susceptibility mapping [40,41]. Such integrated methods have also been used for groundwater potential mapping, for instance the study by Echogdali [42] in the Akka Basin, Morocco.
Recent trends show a paradigm shift towards developing big data geospatial hybrid MCDM-based cloud environments that enable machine learning models for mapping groundwater productivity and availability. Emerging cloud computing platforms, including Google Earth Engine (GEE), Climate Engine (CE), SEPAL, IBM Cloud, OpenEO, Amazon web services (AWS), and Microsoft Azure cloud technology, have shown their capability in handling a vast amount of Analysis-ready product (ARP) conditioning parameters. These platforms effectively expedite decision-making, surpassing traditional data pre-processing methods [43][44][45][46][47]. Al-Ozeer et al. [48] used Azure Cloud with many parameters to quantify groundwater potential maps in Northern Iraq. The integration of RS, cloud computing, and MCDM with an AHP architecture has ushered in trends in achieving high efficiency in groundwater potential mapping [49].
GEE is an open-source, freely distributed cloud platform with petabyte-level storage ability, holding earth observation data from the past four decades [50][51][52]. Magnoni et al. [53] incorporated GEE cloud computing with hydrological FLDAS models to identify groundwater dynamics in the suitability area for groundwater recharge zones during 2014-2017 in Brazil. Previous research had been limited to topography and a few conditioning parameters for quantifying GWP mapping. However, the new GEE cloud provides more parameters to aid groundwater assessment, evaluation, and conservation. One study [54] carried out GWP zoning using the GEE platform, integrating GIS and RS techniques by incorporating fifteen groundwater recharge monitoring parameters for Islamabad, Pakistan. It identified 15 % of the area as suitable for extracting groundwater.
While conventional methods of mapping and delineating GWPZ often rely on ground field measurements/surveys and costly hydrogeological and geophysical tools, the emergence of the GEE cloud presents a transformative opportunity. GEE provides extensive access to many conditioning parameters, including hydrological, permeability, morphometric, and anthropogenic criteria, enabling real-time GWPZ mapping with ample storage capacity. Nevertheless, in the context of the Subarnarekha River basin, limited studies have been conducted to ascertain the potential occurrence of subterranean water. To bridge this research gap, integrating cloud-based geotools presents an affordable and expeditious means of generating and modeling geoscientific data. The current study used a synthesis of MCDM processes with GIS and RS models to outline groundwater potential areas.
The study's novelty includes incorporating a large number of influencing factors for GW potential zone mapping through various multi-criteria analysis approaches enabled by the GEE cloud platform. The study also indicates that the combination of MCDM variants and RS-GIS for groundwater prospecting could be a cost-effective technique that overcomes the limitations of traditional methods. The study also encourages the utilization of secondary information for quick assessment of GWPZ. In addition, this study provides deep insight into the ground storage change analysis in specific periods. The database can be utilized by officials and authorities to adequately plan and manage artificial recharge projects in the study area, ensuring that the region's consumption stays sustainable. The scope of this study extends to near-real-time monitoring of underground water potential within any given watershed. The insights from this research will be instrumental in formulating a sustainable groundwater management strategy. Additionally, it will serve as a valuable resource for both private and public sectors in identifying optimal locations for borehole drilling operations.
The study was carried out to map GWPZ using three MCDM approaches, namely, Fuzzy Analytic Hierarchy Process (FAHP), Fuzzy Decision-Making Trial and Evaluation Laboratory (FDEMATEL), and Logistic Regression (LR), with the assistance of twenty-seven influential parameters in the GEE cloud platform. Additionally, data from 72 well locations were incorporated to detect appropriate zones for groundwater recharge.

Study area
The Subarnarekha River basin, located in the Jharkhand region of India (Fig. 1), covers an area of 19,296 km². The study area is positioned between the latitude of 21    09′ to 87°27′ North, with an elevation of 740 m in a heterogeneous landscape. The eastern side of the study area is marked by river terraces with newer geological formations, such as Tertiary gravels, recently deposited alluvium, and Pleistocene alluvium [12]. Diverse parent materials and various soil groups, such as alluvial, red, and latosol soils, underlie the river basin area. The basin experiences a humid tropical climate characterized by hot summers and mild winters, with an average annual rainfall of 1400 mm. According to Jain et al. (2007), around 80-90 % of total annual precipitation occurs during June-October. The basin has 12 gauge discharge sites and two flood forecasting stations. Moreover, the Subarnarekha basin provides the three states (West Bengal, Odisha, Jharkhand) with a considerable water source for their industrial, irrigation, and municipal needs.

Fig. 2. Detailed workflow of the steps for GWPZ mapping.

C. Singha et al.
This basin includes a variety of mineral materials (e.g., copper, iron, gold, and uranium), which can be exposed as a result of unplanned mining as well as untreated industrial and domestic wastewater, thus polluting the river and threatening aquatic life [55]. Heavy rainfall in the Chhotonagpur plateau generally brings floods and heavy siltation in the lower Subarnarekha basin, causing loss of property, domestic animals, and sometimes human lives.
There are 38 dams (e.g., Hatia, Getalsud, Galudih, Chandil), 12 barrages, and four weirs in the river basin for providing irrigation in the surrounding region. Part of the Subarnarekha basin shows a rise of more than 4 m in groundwater level due to recharge, while most of the basin shows a fall of less than 2 m. Higher levels of chlorine (up to 85.2 mg/L) and sodium (up to 39 mg/L) are present in groundwater [11].

Groundwater inventory mapping
The groundwater inventory data were acquired from the Central Ground Water Board, India [8]. Groundwater wells with a high yield of ≥10 m³ h⁻¹ (mean of score) were taken as the threshold for groundwater potentiality in our study area. For cross-validation, 72 wells were used as samples to produce the GWPZ through a training (70 %) and testing (30 %) validation approach.

Methodology
The GWPZ mapping methodology was established before fieldwork, as illustrated in Fig. 2. This comprehensive framework covers five distinct stages. Firstly, it involves the preparation of the GWPZ conditioning database. Following this, sufficient data are employed to create the groundwater inventory map. The third stage applies the Random Forest (RF) model and multi-collinearity analysis to determine variable importance and select GWPZ conditioning factors. Subsequently, spatial predictions for GWPZ maps are generated using a combination of FuzzyAHP, FuzzyDEMATEL, Logistic Regression (LR), and hybrid models. Finally, a feature selection analysis is executed employing the Boruta technique. The resulting maps are then rigorously validated using the Area Under the Curve (AUC) [39] to determine the most effective model for GWPZ mapping.
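The AUC validation step can be illustrated with a minimal sketch. It assumes each validation well carries a binary label (1 = high-yield well, 0 = otherwise) and a predicted potential score read from a GWPZ map; the function name `auroc` and the four sample values are hypothetical and are not data from this study.

```python
def auroc(scores, labels):
    """Area under the ROC curve computed by pairwise comparison.

    Equivalent to the Mann-Whitney statistic: the share of
    (positive, negative) pairs in which the positive sample has the
    higher score, with ties counted as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted potential at four wells (not study data).
example_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)
```

The pairwise form is quadratic in the number of wells, which is acceptable for a validation set of a few dozen points.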

Source of data
The present research demonstrates the use of numerous factors for groundwater potential assessment in the Subarnarekha River basin. Five main factors were selected through a broad investigation of the literature and expert opinions on their relevance to groundwater assessment. The twenty-seven sub-factors were designated based on the relative influence of each data layer, determined through practical experience and knowledge of factors affecting GWPZ mapping. They were then adjusted based on their proximity to specific sub-categories [54]. The five major conditioning factors chosen are hydrological criteria, morphometric criteria, permeability, terrain distribution, and anthropogenic factors (see Table 1 and Supplementary Fig. 1).

Groundwater potential evaluation
Various geo-environmental factors play a significant role in determining the groundwater status of an area. In this study, conditioning factors were used for GWPZ mapping (Table 2). A total of twenty-seven primary groundwater conditioning factors (GWCFs) were classified into five major groups based on their similar properties and consistency (Supplementary Fig.) [42,45,54]. All conditioning factors were reclassified into raster layers resampled to a 30-m spatial resolution. Ten experts in the fields of hydrogeology and meteorology, along with local administrators, were interviewed through a questionnaire. Additionally, water resources experts were consulted to gather their opinions on these factors' relative importance and ranking. Furthermore, the weights of the factors were estimated using FuzzyAHP, FuzzyDEMATEL, and Logistic regression techniques. Considering the probability of influencing GWP, individual factors were classified into five levels of impact: very high, high, moderate, low, and very low. The groundwater potentiality of individual conditioning factors was evaluated, and the combined groundwater potentiality level for the main factors was estimated. The ranges of the selected factors were classified into two categories: numerical and non-numerical. The factors with numerical ranges were closely examined to decide whether they influenced the GWPZ directly or inversely. The numerical ranges of the potential classes were divided through Jenks natural breaks grouping based on the available data range in ArcGIS software [56,57].
where $R_J^2$ is the coefficient of determination of the $J$th factor. The threshold values of tolerance and VIF are > 0.1 and < 10, respectively [59]. Multicollinearity may exist when the VIF exceeds 5.0, though a significant impact is observed only when it crosses 10.
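The screening rule can be sketched in a few lines. This assumes the standard definitions, tolerance = 1 − R² and VIF = 1 / tolerance, where R² comes from regressing one conditioning factor on all the others; the function names and the sample R² values are hypothetical.

```python
def tolerance(r_squared):
    """Tolerance of a factor: the share of its variance not explained by the other factors."""
    return 1.0 - r_squared


def vif(r_squared):
    """Variance inflation factor, the reciprocal of tolerance."""
    tol = tolerance(r_squared)
    if tol <= 0.0:
        raise ValueError("R^2 must be below 1 for the VIF to be defined")
    return 1.0 / tol


def passes_screen(r_squared, min_tolerance=0.1, max_vif=10.0):
    """Apply the thresholds stated in the text: tolerance > 0.1 and VIF < 10."""
    return tolerance(r_squared) > min_tolerance and vif(r_squared) < max_vif


# Hypothetical example: a factor whose R^2 against the other factors is 0.87.
print(round(tolerance(0.87), 2), round(vif(0.87), 2), passes_screen(0.87))
```

Because the VIF is the reciprocal of tolerance, the two thresholds are equivalent; reporting both is a convention rather than two independent tests.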

Random forest (RF) modeling
RF modeling has been widely employed for nonparametric multivariate classification and regression tasks. RF works by constructing many decision trees during training and aggregating their outputs to improve predictive accuracy and control overfitting, making it a robust choice for complex datasets [60]. Our study employed the 'caret' package in R version 4.0.2 to implement the RF model. Specifically, after conducting multiple experiments, we configured the number of trees ('ntree') to 500 and set the 'mtry' parameter to 10. We adopted a 10-fold cross-validation approach during the RF modeling process to enhance the model's robustness and mitigate overfitting. The GWCFs were prioritized based on the mean decrease in accuracy using the Gini index. We ensured that the total number of trees and factors tested at each split were fixed at 500 and 5, respectively, to achieve the lowest Out-of-bag (OOB) error (Supplementary Fig. 2).

MCDM modeling
In the FAHP modeling, all parameter weights are established from the judgments of the decision-maker and their preferences among the alternatives. The analysis was conducted in the Python Google Colab cloud environment to generate the final results.

Fuzzy-AHP modeling
Fuzzy AHP represents an updated version of the AHP method introduced by Saaty. Given the inherent uncertainty and vagueness associated with the AHP method, FAHP is considered a more effective alternative. The fuzzy system incorporates fuzzification, which involves converting linguistic terms into membership functions. These membership functions consist of three parts: lower, intermediate, and upper membership functions. In our study, the triangular fuzzy number (TFN) matrix is implemented to manage uncertainty and ambiguity through the membership system (Supplementary Fig. 3). TFNs are typically denoted as (l, m, u) or (l/m, m/u), representing the lowest possible value, the most likely value, and the highest possible value, respectively. The scale of relative importance, which ranges from 1 to 9 in crisp numbers, is replaced with fuzzy numbers (FNs). Each term is assigned a value between 0 and 1, representing its degree of membership within the intersection numbers. TFNs have a linear representation of each degree of membership on their left and right sides, as described in Eq. (3) and Eq. (4) [61].
where $l(y)$ and $r(y)$ denote the left and right sides of a fuzzy number, respectively.
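The linear left and right sides of a TFN can be illustrated with a short sketch. It assumes the usual triangular membership function for a number (l, m, u), rising linearly from l to m and falling linearly from m to u; the function name `tfn_membership` is hypothetical.

```python
def tfn_membership(y, l, m, u):
    """Degree of membership of y in the triangular fuzzy number (l, m, u)."""
    if not (l <= m <= u):
        raise ValueError("a TFN requires l <= m <= u")
    if y == m:
        return 1.0                  # peak of the triangle
    if l < y < m:
        return (y - l) / (m - l)    # left side, rising from 0 to 1
    if m < y < u:
        return (u - y) / (u - m)    # right side, falling from 1 to 0
    return 0.0                      # outside the support


# Membership of a few crisp values in the TFN (2, 3, 4).
print([tfn_membership(v, 2, 3, 4) for v in (2.0, 2.5, 3.0, 3.5, 4.0)])
```

The degenerate number (1, 1, 1), used for the diagonal of a fuzzy comparison matrix, has membership 1 at y = 1 and 0 elsewhere.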
To determine the weights of the evaluation criteria using FAHP, pairwise comparison matrices were formulated for all criteria within the hierarchy, expressed as the matrix $\tilde{A}$ in Eq. (5), where $\tilde{a}_{ij}$ denotes the relationship between parameters i and j. When parameters i and j are identical (i.e., i = j), $\tilde{1}$ is defined as the triangular fuzzy number (1, 1, 1). Extending this concept, a fuzzy scale ranging from 1 to 9 is employed to assess the relative importance of parameter i to j. Conversely, the inverse of this scale, ranging from $1^{-1}$ to $9^{-1}$, is utilized to measure the relative importance of parameter j to i. This evaluation is underpinned by a fuzzy transformation rating scale, the specifics of which are given by Tahria et al. [62]. Subsequently, the fuzzified pairwise comparison matrix was processed using the Buckley fuzzy system to calculate the final fuzzy weights through the geometric mean method (Eq. (6)).
where $\tilde{a}_{in}$ is the fuzzy comparison value of criterion i to criterion n; therefore, $\tilde{r}_i$ is the geometric mean of the fuzzy comparison values of criterion i to each criterion. $\tilde{w}_i$ is the fuzzy weight of the i-th criterion, which can be expressed as a TFN, $\tilde{w}_i = (lw_i, mw_i, uw_i)$, where $lw_i$, $mw_i$, and $uw_i$ stand for the lower, intermediate, and upper values of the fuzzy weight of the i-th criterion, respectively. In this research, the principle of linguistic pairwise comparison was employed to estimate hierarchical fuzzy weights. Fuzzy weights were defined for the five selected GWPZ main criteria using fuzzy numbers, and these weights were further processed through the center of area as a defuzzification method. This method calculated the defuzzified crisp weights, which were then used to estimate the normalized final weight of each GWP influencing factor.
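A minimal sketch of Buckley's geometric-mean procedure with center-of-area defuzzification is given below. The 3 × 3 fuzzy comparison matrix is hypothetical (the study's expert judgments are not reproduced here), and the function names are illustrative only.

```python
from math import prod


def fuzzy_geometric_mean(row):
    """Component-wise geometric mean of the TFNs (l, m, u) in one matrix row."""
    n = len(row)
    return tuple(prod(t[k] for t in row) ** (1.0 / n) for k in range(3))


def buckley_weights(matrix):
    """Normalized crisp weights from a fuzzy pairwise comparison matrix."""
    r = [fuzzy_geometric_mean(row) for row in matrix]
    total = tuple(sum(ri[k] for ri in r) for k in range(3))
    # Fuzzy weight: r_i multiplied by the inverse of the fuzzy sum.
    # The inverse of (l, m, u) is (1/u, 1/m, 1/l), hence the swapped divisors.
    fuzzy = [(ri[0] / total[2], ri[1] / total[1], ri[2] / total[0]) for ri in r]
    # Center-of-area defuzzification of a TFN: (l + m + u) / 3.
    crisp = [sum(w) / 3.0 for w in fuzzy]
    s = sum(crisp)
    return [c / s for c in crisp]


# Hypothetical comparison of three criteria (not the study's judgments).
A = [
    [(1, 1, 1), (2, 3, 4), (4, 5, 6)],
    [(1 / 4, 1 / 3, 1 / 2), (1, 1, 1), (1, 2, 3)],
    [(1 / 6, 1 / 5, 1 / 4), (1 / 3, 1 / 2, 1), (1, 1, 1)],
]
weights = buckley_weights(A)
print([round(w, 3) for w in weights])
```

The final division by the sum of crisp values is needed because defuzzified weights generally do not sum to exactly one.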

Fuzzy DEMATEL modeling
DEMATEL is a group decision-making tool used to address complex inter-criterion relationship problems by interpreting causal effects [63]. Expert opinions, judgments, and respondent views are assigned numerical scores on a five-level scale (0-4) for each perception. To account for inconsistencies, vagueness, and subjectivity in expert opinions, we computed the initial direct influence matrix by converting the linguistic variables into corresponding TFNs (Triangular Fuzzy Numbers) (Eq. (7)), as shown in Supplementary Table 1.
The initial direct influence matrix is built from several pairwise comparison matrices. The initial fuzzy direct-relation matrix $Z^k$ was computed by having the respondents provide the fuzzy pairwise relations between the entities in an n × n matrix, where k is the number of respondents. The resulting direct-relation matrix Z is an n × n non-negative matrix in which $Z_{ij}$ denotes the direct influence of factor i on factor j, and the diagonal elements are zero, i.e., $Z_{ij} = 0$ when i = j (Eq. (8)).
The normalized fuzzy direct-relation matrix D was calculated using Eq. (9). The total-relation matrix T was obtained using Eq. (8), where I denotes the n × n identity matrix; lower and upper values were computed individually (Eq. (10)). The column sums ($C_j$) and row sums ($R_i$) were then acquired for each column j and row i of the T matrix (Eq. (11)).
The final Fuzzy DEMATEL weight $W_j$ of each criterion was acquired from Eq. (12). The cause-and-effect prominence/relation of the criteria is shown by the causal diagram, where the horizontal axis represents $(R_i + C_i)$ and the vertical axis represents $(R_i - C_i)$. The "Prominence" indicator on the horizontal axis represents the relative importance of each parameter. The "Relation" indicator on the vertical axis gives the extent of influence of the factor. A factor belongs to the cause group when $(R_i - C_i) > 0$; otherwise, when $(R_i - C_i) \le 0$, it belongs to the effect group. The causal graph delineates a two-dimensional space for judging the multi-decision support system environment, identifying the most influential factors and how they are interdependent with other parameters. In this study, Fuzzy DEMATEL methods were applied using a Python Jupyter notebook in the Google Colab cloud environment.
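The DEMATEL steps can be sketched on an already defuzzified (crisp) direct-relation matrix. Several choices here are assumptions rather than the study's implementation: the matrix is normalized by its largest row or column sum, the total-relation matrix T = D(I − D)⁻¹ is approximated by the series D + D² + …, and the weight is taken as the normalized length of the (prominence, relation) vector, a common form that may differ from Eq. (12). The 3 × 3 matrix is hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dematel(z, tol=1e-12, max_iter=10000):
    """Return (prominence, relation, weights) for a crisp direct-relation matrix."""
    n = len(z)
    scale = max(max(sum(row) for row in z),
                max(sum(z[i][j] for i in range(n)) for j in range(n)))
    d = [[v / scale for v in row] for row in z]      # normalized matrix D
    total = [row[:] for row in d]                    # running sum D + D^2 + ...
    term = [row[:] for row in d]
    for _ in range(max_iter):
        term = matmul(term, d)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        if max(max(row) for row in term) < tol:      # series has converged
            break
    r = [sum(row) for row in total]                              # row sums R_i
    c = [sum(total[i][j] for i in range(n)) for j in range(n)]   # column sums C_j
    prominence = [r[i] + c[i] for i in range(n)]
    relation = [r[i] - c[i] for i in range(n)]
    raw = [(p * p + q * q) ** 0.5 for p, q in zip(prominence, relation)]
    weights = [x / sum(raw) for x in raw]
    return prominence, relation, weights


# Hypothetical influence scores on the 0-4 scale (not the study's data).
Z = [[0, 3, 2],
     [1, 0, 3],
     [2, 1, 0]]
prominence, relation, weights = dematel(Z)
print([round(q, 3) for q in relation], [round(w, 3) for w in weights])
```

Criteria with a positive relation value fall in the cause group and the rest in the effect group; the relation values always sum to zero because the row and column sums of T share the same total.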

Logistic regression modeling
The logistic regression model describes the probabilistic association between a binary GWPZ response and the explanatory criteria [64]. The model output can be binary, such as 0 or 1, or high (100 % potential) or low (0 % potential), on a sigmoid-shaped curve. The logistic function p(z) used in this model is defined by Eq. (13), where z is the net input, given by the linear combination of the weights (β) and sample features (x) in Eq. (14). The probability of the best-fitting model was derived from the logit transformation (Eq. (15)).
where the coefficients β are computed using the maximum likelihood function, $\beta_0$ represents the intercept of the model, and $\beta_i$ represents the coefficient of the i-th criterion. $p(x)$ denotes the probability of groundwater potential occurrence, and $p(x)/(1 - p(x))$ is the corresponding odds.
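The standard forms of these three relations are sketched below for reference. They are the textbook definitions of the logistic model and are assumed, not copied, to match Eqs. (13)-(15); $n$ denotes the number of criteria.

```latex
\begin{align}
p(z) &= \frac{1}{1 + e^{-z}} \\
z &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \\
\operatorname{logit}(p) &= \ln\!\left(\frac{p}{1 - p}\right) = z
\end{align}
```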

Multi-collinearity application
In this section, we present the results of our analysis of multicollinearity among the twenty-seven influential parameters categorized into five major criteria for GWPZs. We examine the tolerance and VIF values to assess multicollinearity issues affecting the GWCFs. Tolerance values above 0.1 and VIF values below 10 indicate the absence of significant multicollinearity among predictor variables. The multicollinearity test result (Table 3) demonstrated no multicollinearity problems affecting the effectiveness of groundwater potential, as all the GWCFs had tolerance and VIF values within the specified thresholds. Tolerance values for the factors ranged from 0.13 to 0.85, while VIF values ranged from 1.16 to 7.69. Aspect had the highest tolerance value (0.85) among the conditioning factors, while HBSE had the highest VIF value (7.69).

RF application for the relative criterion importance
The RF algorithm used 100 trees and 27 conditioning factors for the GWPZ estimation. Table 4 shows that the RF model correctly identified 34 non-potential wells as non-potential and incorrectly classified 5 non-potential wells as potential. Conversely, the algorithm correctly identified 28 groundwater potential wells as potential and incorrectly classified 5 potential wells as non-potential. Furthermore, the OOB error rate, estimated at 13.89 %, indicates an accuracy of 86.11 % in GWPZ estimation. The reduction in OOB error over the splits and tree numbers is depicted in Supplementary Fig. 2. The relative importance of the GWCFs derived from the RF model in Fig. 3 (a) and (b) revealed that soil loss (6.78), soil type (6.08), total runoff (3.94), elevation (2.94), and soil moisture (2.71) were the most critical sub-criteria in the GWPZ analysis in the study area, while SPI (0.08) and baseline water stress (0.001) were the least important criteria (Table 5).
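The reported error rate can be checked directly from the four confusion-matrix counts given in the text. The sketch below is plain arithmetic; the variable names are illustrative, and the sensitivity and specificity values are derived here rather than reported in the source.

```python
# Counts as stated in the text for the RF confusion matrix.
tn, fp = 34, 5   # non-potential wells: correctly / incorrectly classified
tp, fn = 28, 5   # potential wells: correctly / incorrectly classified

total = tn + fp + tp + fn
accuracy = (tn + tp) / total
error_rate = 1.0 - accuracy
sensitivity = tp / (tp + fn)   # share of potential wells recovered
specificity = tn / (tn + fp)   # share of non-potential wells recovered

print(total, round(accuracy * 100, 2), round(error_rate * 100, 2))
```

The counts sum to the 72 wells used in the study, and the resulting 13.89 % error agrees with the stated OOB error rate.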

GWPZs with main criteria
We employed 27 sub-factors, categorized into five major groups, to predict groundwater potential. The suitability of each parameter was assessed within these five significant criteria. A groundwater potential map was generated by integrating sub-criteria with individually reclassified layers using the natural breaks (Jenks) method, resulting in five-level classifications. These five central criteria-based GWPZ maps were then used to generate various output maps through MCDM models. In terms of hydrologic criteria, the high and very high potentiality classes covered 23.86 % and 18.73 % of the total area, respectively (Table 6 & Fig. 4 (a)). Permeability and terrain distribution were the most influential criteria for very low potentiality, with shares of 8.73 % and 1.46 %, respectively (Fig. 4 (b) and (d)). Regarding morphometric criteria, the moderate potential class accounted for 26.71 %, while the high potentiality class covered 34.52 % (Fig. 4 (c)). In the anthropogenic criterion, areas with low to very low potentiality constituted 0.91 %, whereas 19.01 % of the area fell under the moderate potential class (Fig. 4 (e)). The largest GWP class, covering 50.11 % of the area, was the very high potential category for the anthropogenic criterion (Table 6).

MCDM model derived GWPZs map
The MCDM models generated a GWPZ map for the Subarnarekha basin, classifying it into five primary levels. Fuzzy AHP, Fuzzy DEMATEL, and the Logistic Regression model were analyzed using an open-source Python Jupyter Notebook hosted on the Google Colab cloud platform.

Fuzzy AHP estimation
Table 4. The confusion matrix of the RF model.

The Fuzzy AHP method was used to estimate the individual criterion weights, which were then employed to create the GWPZ map for the Subarnarekha basin. The fuzzy membership function was utilized to normalize the final fuzzy weighted criteria derived from fuzzy crisp layers (Table 7). Among the five criteria, the most influential factor in mapping the GWPZ was the hydrologic factor, with a weight of 0.410; thus, it was assigned the highest rank, indicating its suitability for groundwater recharge conditions. Conversely, the soil permeability, morphometric, terrain distribution, and anthropogenic parameters were found to be less significant, with weights of 0.314, 0.162, 0.070, and 0.044, respectively. The groundwater potentiality zones were categorized into five classes: very highly, highly, moderately, low, and very low, covering areas of 3235., respectively. The areas with very high and high groundwater potential were concentrated in the lower part of the Subarnarekha River (Fig. 5 (a)). In contrast, areas with moderate and low GWPZs dominated the upper and middle parts of the catchment area.
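How criterion weights turn into a potential score can be sketched as a weighted linear combination. The weights are the FAHP values stated in the text; the combination rule, the 1-5 class ranks, and the sample pixel are assumptions made for illustration, since the source does not spell out the overlay arithmetic.

```python
# FAHP criterion weights as reported in the text.
FAHP_WEIGHTS = {
    "hydrologic": 0.410,
    "soil_permeability": 0.314,
    "morphometric": 0.162,
    "terrain_distribution": 0.070,
    "anthropogenic": 0.044,
}


def weighted_score(class_ranks, weights=FAHP_WEIGHTS):
    """Weighted sum of per-criterion class ranks (assumed 1 = very low ... 5 = very high)."""
    if set(class_ranks) != set(weights):
        raise ValueError("class_ranks must cover exactly the weighted criteria")
    return sum(weights[name] * class_ranks[name] for name in weights)


# Hypothetical pixel: class rank of each main-criterion layer at one location.
pixel = {
    "hydrologic": 5,
    "soil_permeability": 4,
    "morphometric": 3,
    "terrain_distribution": 2,
    "anthropogenic": 1,
}
print(round(weighted_score(pixel), 3))
```

Because the weights sum to one, the score stays on the same 1-5 scale as the input ranks and can be reclassified into five potential classes.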

Fuzzy DEMATEL estimation
The study integrates fuzzy sets and the DEMATEL system to develop a comprehensive framework for measuring groundwater potential in the Subarnarekha catchment area. The interplay between each criterion was determined using linguistic variables and triangular fuzzy numbers (TFN) (Table 8). The study results were elucidated through causal graph analysis. In this analysis, GWP conditioning criteria such as g1 (hydrologic, 0.129), g3 (morphometric, 0.526), and g5 (anthropogenic, 0.155) were categorized into the cause criteria cluster. Meanwhile, the effect criteria cluster included g2 (soil permeability, −1.376) and g4 (terrain distribution, −0.431) in GWPZ modeling (Fig. 4 (f)). The cause cluster criteria depicted the impact of the influencing factors, while the effect cluster criteria referred to the consequences of the affected criteria. Soil permeability emerged as the most important criterion in GWP, with the largest-magnitude Di − rj value of −1.376 (Table 8). In contrast, anthropogenic factors, with a higher Di + rj value of 1.152, were identified as significant criteria for GWPZ. Di + rj, termed "prominence," was used to determine which parameters had the most substantial overall influence in the system. The results indicated that hydrologic criteria exhibited the most robust interactions with other parameters.

Table 8. Fuzzy DEMATEL analysis for the assessment of the criterion crisp, weight, and rank.

Based on FDEMATEL, normalized weights for the respective conditioning criteria were utilized to generate the GWPZs map. Hydrologic criteria carried the highest weight (0.231) and ranked first, followed by soil permeability (0.228) in second rank, morphometric (0.199) in third rank, terrain distribution (0.187) in fourth rank, and anthropogenic (0.155) in fifth rank. Approximately 18.80 % of the study area was categorized as very highly groundwater potential zones (Fig. 5 (b)). Additionally, 28.45 % and 28.94 % were designated as highly and moderately potential areas, respectively. In comparison, 7.71 % of the area was classified as having very low groundwater potential (Table 9).

Logistic regression estimation
The LR model yielded coefficients (β) for independent criteria in the GWPZ map equation (Eq. (16)):

$$Z = -13.59 + 0.464\,X_{\text{hydrologic}} + 0.114\,X_{\text{soil permeability}} + 0.019\,X_{\text{morphometric}} + 0.218\,X_{\text{terrain distribution}} - 0.073\,X_{\text{anthropogenic}} \qquad (16)$$

These conditional criteria were utilized to predict groundwater resources and estimate the GWPZ probability map. The positive coefficients (β) for hydrologic, soil permeability, morphometric, and terrain distribution indicate a positive influence on the probability of groundwater occurrence. Conversely, the negative coefficient (β) for anthropogenic (−0.073) diminishes the likelihood of groundwater occurrence, making it a less significant parameter. By examining the inter-relationships between dependent and independent criteria using the LR model, the GWPZ map was created (Fig. 5 (c)). It reveals varying groundwater potential levels, from very low (darker blue) to very high (lighter yellow) in the southeastern part. Finally, the GWPZ levels were reclassified into the following classes: very highly (13.16 %), highly (22.51 %), moderately (30.80 %), low (23.50 %), and very low (10.03 %) (Table 9).
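Applying Eq. (16) to a single location can be sketched as follows. The intercept and coefficients are those reported above; the logistic transform is the standard one, and the input values are hypothetical because the source does not state the scale of the X layers.

```python
import math

# Intercept and coefficients as reported in Eq. (16).
INTERCEPT = -13.59
COEFFS = {
    "hydrologic": 0.464,
    "soil_permeability": 0.114,
    "morphometric": 0.019,
    "terrain_distribution": 0.218,
    "anthropogenic": -0.073,
}


def linear_predictor(x):
    """Z of Eq. (16) for a dict of criterion values."""
    return INTERCEPT + sum(COEFFS[name] * x[name] for name in COEFFS)


def gwp_probability(x):
    """Standard logistic transform of Z into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x)))


# Hypothetical criterion values at one location (scale assumed, not from the study).
sample = {name: 20.0 for name in COEFFS}
print(round(linear_predictor(sample), 2), round(gwp_probability(sample), 3))
```

Raising the anthropogenic value while holding the others fixed lowers the probability, which is what the negative coefficient implies.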

Hybrid MCDM modeling
An integrated MCDM modeling approach was used to create a GWPZ overlay layer in ArcGIS to minimize uncertainties in model performance. The combination of the FAHP and FDEMATEL models resulted in a groundwater potentiality map (Fig. 5 (d)), with approximately 29.03 % of the area indicating moderate potential (see Table 9).

Integration of FAHP and LR models
Correspondingly, the combination of the FAHP and LR models revealed that 19.73 % of the total area of the Subarnarekha River basin was categorized as a very high GWP zone (Fig. 5 (e)). Additionally, 25.05 % and 20.36 % belonged to the high and moderate potential classes, respectively. The spatial distribution of GWP results via the combined LR and FDEMATEL approach indicated that 48.88 % of the area had high to very high GWPZ (Fig. 6). Furthermore, 15.6 % exhibited moderate potential, while low and very low potential areas covered 11.62 % and 23.9 % of the region, respectively (Fig. 5 (f)). In the final hybrid results of Fuzzy-AHP-DEMATEL-LR, the very high, high, moderate, low, and very low GWP classes accounted for 14.7 % (2852.45 km2), 25.75 % (4989.78 km2), 23.0 % (4456.22 km2), 23.4 % (4546 km2), and 13.06 % (2530.48 km2) of the entire basin, respectively (Fig. 6). The model's output indicated that the downstream areas of the river basin exhibited high prospect zones due to their flat topography and alluvial soil (Fig. 5 (g)). These regions have a high infiltration rate, resulting in greater aquifer storage capacity. In contrast, the recharge rate was identified as low in the northern and central parts of the basin, classifying them under the low and very low potential zone classes.

Validation of groundwater levels
The GWPZ map was evaluated against long-term groundwater level data for the pre- and post-monsoon periods. A groundwater depth map of the study area was created for both periods and categorized into five classes based on water table depth (Fig. 7 (a) and (b)). The average groundwater level depth across the pre- and post-monsoon seasons varied from 4.35 to 7.57 m below ground level (mbgl). A high fluctuation level indicated low potential, while low fluctuation denoted high potential. The central part of the basin exhibited a high water table depth in both the pre- and post-monsoon seasons, indicating low GWPZ. In contrast, the upper and lower parts of the study area were characterized as having high potential for groundwater storage, which aligns with the groundwater potential zone map developed through our model. The 'Map query' tool was used to compare the groundwater potential zone map with the groundwater table depth. Most wells with low water table depths were located in the suitable GWPZ (very high and high classes) as mapped by the MCDM models for both the pre-monsoon and post-monsoon periods. The slightly reduced agreement between the two maps for pre-monsoon and post-monsoon GW depth may be attributed to the type of groundwater use in the south-central region of our study area (Fig. 7 (c) and (d)).
The performance and efficiency of the Fuzzy-AHP, Fuzzy-DEMATEL, and LR models were evaluated using the AUC metric. From the validation results, it can be concluded that the LR-Fuzzy-AHP-DEMATEL model demonstrated superior effectiveness in identifying groundwater potential areas. This model reflects the correlation between groundwater potential criteria and the assigned criterion weights obtained through an MCDM model. The AUC-ROC value of LR-FDEMATEL-FAHP (0.782) is higher than that of LR-FAHP (0.775). The AUC results indicate that LR-FAHP achieved approximately 78 %, LR achieved 77 %, and FDEMATEL-FAHP achieved 76 %. FAHP and FDEMATEL-LR both achieved approximately 75 % (Fig. 8). Therefore, the AUC results for the study area, ranging between 75 % and 78 %, indicate medium to high predictability of GWPZ. An AUC value greater than 75 % is considered satisfactory for acceptable model performance in this context [65].
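For readers unfamiliar with the metric, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical and are not the 72 observation wells used in the study; the function name `auc_score` is likewise only illustrative.

```python
def auc_score(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical wells: label 1 = favourable groundwater level, 0 = unfavourable;
# scores are modelled potential values for the same wells.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]
auc = auc_score(labels, scores)
```

Here 14 of the 16 positive/negative pairs are ordered correctly, giving an AUC of 0.875; a value of 0.5 would correspond to random ranking.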

Validation with actual pumping well yield
To validate the GWPZ map, yield data from pumping wells in the lower basin of the Subarnarekha watershed (Baliapal and Hasimpur blocks, Balasore district, Odisha, India) were utilized. In this region, the average well yield was approximately 2334.14 m3 per day, primarily used for irrigation. Nearly all existing pumping wells in the region fell within the "good" and "very good" categories as determined by the hybrid model. Out of the ten wells, nine yields agreed with the GWPZ mapping, and one partially agreed with the GWPZ map. Two wells, currently in a defunct condition, disagreed with the GWPZ map (Table 10). Therefore, the overall accuracy level of the hybrid MCDM model was approximately 77 %. While this accuracy level may not be very high compared to previous studies involving fewer factors, it is noteworthy that some studies utilizing machine-learning techniques have achieved similar levels of efficiency. For example, the AHP produced an accuracy of 78.8 % [66] with an AUROC score of 0.705, which was considered indicative of good prediction capability for AHP. Additionally, results using the LR algorithm on testing data showed an AUROC score of 0.686.

Groundwater status in the study area
In the study area, the continuous expansion of agricultural land, mining activities, deforestation, industrialization, and urbanization has led to an increasing demand for water, resulting in the depletion of the water table over time. Additionally, groundwater recharge decreases due to lower precipitation levels and intermittent rainfall events with limited runoff [67]. In such a scenario, there is a critical need for effective water management policies and training programs to enhance understanding of the hydrologic, morphometric, soil permeability, and anthropogenic factors interconnected with aquifer storage and surface runoff conditions. Consequently, groundwater resource mapping and analysis are crucial in monitoring long-term water sustainability and development within any region [23].

Selection of individual factors
Twenty-seven relevant individual factors were measured for mapping groundwater potential zones. With the availability of cloud computing platforms, the estimation of parameters for high-resolution and large-sized images can be carried out easily, quickly, and efficiently. Increasing the number of parameters can help identify groundwater monitoring parameters more precisely and support accurate zone mapping. However, the proximity of some parameters may lead to overlapping influences. Therefore, factors with similar properties and composition were grouped under the same primary criteria. A previous study [54] also employed this approach of grouping criteria for groundwater potential mapping in Pakistan. In addition, the weight allocated to each class in the thematic maps of the different criteria, based on water potential capacity and characteristics, was determined using the AHP method [57]. The data on the region's groundwater prospects validated the method's accuracy.
In the MCDM modeling output, the hydrologic factor was the most crucial criterion, ranking highest in the GWPZ mapping analysis. The causal diagram revealed that the hydrologic (g1) criteria held the most significance, with high prominence and strong relationships compared to other criteria. By contrast, the terrain distribution (g4) criteria had less influence, with low prominence and weaker relationships than the others. Morphometric (g3) and anthropogenic (g5) criteria were essential and could be influenced by other criteria, although they had low prominence and high relationships with other criteria. Lastly, the soil permeability criteria were essential and could not be significantly influenced by other criteria, with high prominence and low relationships.

Model integration as hybrid approach
To overcome the limitations of a single model and better represent all influencing factors, we integrated the MCDM models, namely Fuzzy AHP [68] and Fuzzy DEMATEL, with machine learning techniques, such as LR and RF, to achieve better groundwater zone mapping. A previous study [69] used logistic regression and RF to map groundwater in Sohag Governorate, Egypt. Another study [70] created a GWPZ map in Upper Mesopotamia, Turkey, using ten thematic layers through the Fuzzy AHP technique. All the different conditional criteria, modeling maps, calculations, and evaluations were conducted using fast-processing GIS, GEE, and the Google Colab cloud environment. The ROC validation results indicate that the hybrid model (AUC ~78 %) is a more effective MCDM-based tool for identifying GWPZ areas. This hybrid model reveals that the very high to high potential GWP classes cover 40.45 % (7842.23 km2) of the entire basin.

Groundwater potential zoning
The results showed that high groundwater resources were situated in close vicinity to the river channel (<500 m), characterized by a gentle slope, high TWI, high soil moisture, and low drainage density (<2.2 sq km) around the riverbank area. A previous study [71] showed that slope, LULC, and soil factors were the most sensitive factors for assessing GWPZ in Bangladesh. The low and very low potentiality areas corresponded to high population density, built-up land, and a short distance to the road network. Low potentiality areas also featured low-porosity lithological units (such as slate, phyllite, mica schist, gabbro-anorthosite, gabbro, diorite, staurolite kyanite schist, and shale) with low permeability, high runoff, steep slopes, high elevation, and significant soil erosion.
Implementing targeted measures for water conservation, such as groundwater recharge plans and various structural enhancements, is crucial. These actions, aligned with local terrain and geological features, are designed to bolster groundwater storage capacity in the study area. The downstream and southern parts of the region exhibit good groundwater potential due to the presence of alluvial soil with a high infiltration rate, particularly in flat terrains. Generally, the southern regions experience higher groundwater recharge because runoff flows more slowly there than in the northern regions. Among the terrain distribution parameters, NDVI serves as a reliable indicator of groundwater potential, particularly in the region's western part, where the NDVI value exceeds 0.55. The study's findings indicated that the NDVI could be utilized to identify areas with a shallow water table and natural vegetation, as well as areas with inadequate in-situ observations [72]. In the land-atmospheric system, evapotranspiration and soil moisture strongly correlate with groundwater potential. In the lower stream of the study area, the presence of moderate to high evapotranspiration and soil moisture levels suggests the existence of an unconfined aquifer [73]. The Subarnarekha region faces significant soil loss due to extensive weathering and large-scale deforestation [12]. With a similar study, we could identify the susceptible zones and carry out necessary preventive measures in the river basin.
Moreover, baseline water stress rises due to illegal mining activities, agricultural water withdrawals, and industrial demands [11]. The RF modeling results highlight that soil loss and soil type are the most influential variables, contributing to an accuracy of 86.11 % in estimating groundwater potential in the study area. The RF model was found suitable for this type of study and could be extended to other regions for wider applicability, as demonstrated by Madani and Niyazi [74] in Western Saudi Arabia.

Boruta sensitivity analysis
A multi-collinearity-based critical analysis was carried out on the factors influencing the final outputs. The selected parameters were evaluated through Boruta feature selection analysis to determine their level of significance for GW distribution in the region (Fig. 9). The least essential parameters could be included in the final potential zoning map [75]. The Boruta sensitivity of the groundwater influencing factors is summarized in Table 11. Based on mean importance, total runoff (30.74), rainfall (20.63), soil type (18.48), and DEM (18.02) were the most important factors, followed by soil moisture (16.31), population density (15.57), soil ET (13.85), TRI (13.02), soil loss (12.50), drainage density (10.47), lithology (9.90), NDVI (6.89), geomorphology (5.98), land use modification (5.81), road distance (5.77), lineament density (5.22), and slope (3.05). However, distance to rivers, BWS, TWI, TPI, SPI, profile curvature, aspect, LULC, HBSE, and GMIS were rejected among all assigned factors (Fig. 9). This reduces the time required for analysis and improves the overall model accuracy.

GRACE analysis
As suggested by Scanlon et al. [76], Gravity Recovery and Climate Experiment (GRACE) products were used to measure equivalent water thickness (EWT). This choice is particularly noteworthy due to the substantial enhancement in the spatial localization and amplitude of the improved terrestrial Total Water Storage anomalies (TWSA). To obtain comprehensive information, data were gathered from the Center for Space Research (CSR) and the Jet Propulsion Laboratory (JPL), as depicted in Fig. 10 (a) and (b).
The monthly GRACE products were averaged over the area to derive EWT anomalies in centimeters for the period from 2002 to 2022. The EWT range of the CSR and JPL products was −29.49 to 37.30 cm. The heat maps generated from the GRACE-based EWT data illustrate the regional water balance patterns within the study area (SF.4).
In the early years, water availability in the basin region showed seasonal fluctuations, with levels below the mean from March to June and above the mean from July to November. In the CSR data, peak EWT occurred in August-October of 2003, 2007, 2011, 2020, and 2021 (30 cm), while the minimum was noted in April-June of 2009, 2010, 2016, and 2018 (−15 cm). Similarly, the JPL data revealed a maximum EWT of 30 cm in August-October of 2003, 2007, 2008, and 2020, with a minimum of −15 cm recorded in March-June of 2010, 2013, 2016, and 2017. These patterns provide insights into temporal variations in water availability.
Analysis of heat maps for the study area indicates that in the earlier years (2002-2007), water availability was higher from August to October. However, in recent years (2010-2019), there has been a decreasing trend in EWT from April to June. This decline may stem from reduced precipitation and the influence of climate change, marked by elevated daily and monthly temperatures. A parallel study by Salehie et al. [77] using GRACE solutions also highlighted a decline in water resources, ranging from 0.04 to 0.08 cm per year, within the Aral Sea's delta basin.
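A trend of this kind is commonly quantified as the slope of an ordinary least-squares line fitted to seasonal mean EWT against year. The sketch below shows that calculation under stated assumptions: the April-June EWT values are hypothetical placeholders rather than the CSR or JPL data, and the function name `linear_trend` is introduced only for this example.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Hypothetical April-June mean EWT anomalies in cm (not the study's data).
years = list(range(2010, 2020))
ewt_cm = [-5.0, -6.0, -5.5, -7.0, -8.0, -7.5, -9.0, -10.0, -9.5, -11.0]
slope_cm_per_year = linear_trend(years, ewt_cm)
```

A negative slope corresponds to a declining pre-monsoon EWT; whether such a slope is statistically significant would need a separate test, which this sketch does not attempt.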

Scope and limitation of the models
The ranking of individual factors can sometimes be criticized due to the need for a fixed pattern. However, grouping individual parameters with similar properties and proximity into primary criteria can reduce the overlap of influence from individual factors on GWPZ. The range of individual factors under a particular class may have localized effects. For instance, precipitation in this study area ranged from 1200 mm to 1800 mm, a range that will not hold in other study areas. The cumulative effect of individual factors under the primary criteria can determine the degree of suitability for the final mapping [78]. Incorporating machine learning models (e.g., RF, XGBoost, SVM), advanced cloud platforms (e.g., Amazon Web Services (AWS), Microsoft Azure, Climate Engine (CE), OpenEO), and metaheuristic optimization techniques (e.g., genetic algorithm and particle swarm optimization) can introduce new modeling approaches for GWPZ mapping [18,48,79-82]. Cloud platforms support the rapid assessment of numerous individual factors and enable the incorporation of large volumes of data. Modeling with climate change scenarios may provide a better understanding of future GWPZ in the study area. Indian meteorological data on precipitation with grid-based distribution may offer improved precipitation zoning compared to CHIRPS.

Conclusion
This study introduced a novel and effective approach for assessing groundwater resources in the Subarnarekha River basin, India, using a hybrid MCDM model. It integrated the FuzzyAHP and FuzzyDEMATEL MCDM models and the LR model with GIS and RS techniques through the GEE cloud platform. The average of all models revealed that approximately 14.74 % of the study area had a very high potential for groundwater occurrence, primarily in the southern and northern parts of the basin. Conversely, the middle part of the basin exhibited very low potential, covering approximately 10.71 % of the area. The hybrid MCDM model was validated using real-field yield data, achieving an accuracy level of 77 % and an AUC score of 78 %. This hybrid MCDM proved to be a robust method for identifying and quantifying GWPZ in the Subarnarekha River basin. The RF and Boruta algorithms also highlighted the significance of factors such as soil texture, elevation, slope, precipitation, soil moisture, and runoff in GWPZ modeling, in conjunction with other influencing variables. An alarming trend was observed during the study regarding the

Fig. 3. Variable importance analysis of (a) mean decrease accuracy and (b) mean decrease Gini for GWPZ influencing factors using the RF model.

Fig. 7. Comparison of GW depth in potential zones: (a) Groundwater depth during pre-monsoon, (b) Groundwater depth during post-monsoon, (c) Suitable GW potential zones with better GW depth table: Pre-monsoon, (d) Suitable GW potential zones with better GW depth table: Post-monsoon.

C. Singha et al.

Fig. 10.Monthly EWT variation from 2002 to 2022 derived from GRACE (a) CSR and (b) JPL products in Subarnarekha River basin.

Table 1
Data layers sources and description for groundwater potential mapping.

Table 2
Selected GWPZs criterion range with five potentiality classes.

Table 3
Results of Multi-collinearity analysis among the GWCFs.

Table 5
Assessing the relative significance of the GWCFs.

Table 6
Area coverage of GWPZ classes for five main criteria.

Table 7
Fuzzy AHP analysis for the assessment of the criterion weight and rank.

Table 9
MCDM model derived GWPZs area coverage in sq.km and percentage.

Table 10
Assessment of generated GWPZ map based on the well yield information.

Table 11
Analysis of GWPZ factor importance using the Boruta technique.