Groundwater salinity in the Horn of Africa: Spatial prediction modeling and estimated people at risk

Background: Changes in climate and anthropogenic activities have made water salinization a significant threat worldwide, affecting biodiversity, crop productivity and contributing to water insecurity. The Horn of Africa, which includes eastern Ethiopia, northeast Kenya, Eritrea, Djibouti, and Somalia, has natural characteristics that favor high groundwater salinity. Excess salinity has been linked to infrastructure and health problems, including increased infant mortality. This region has suffered successive droughts that have limited the availability of safe drinking water resources, leading to a humanitarian crisis for which little spatially explicit information about groundwater salinity is available. Methods: Machine learning (random forest) is used to make spatial predictions of salinity levels at three electrical conductivity (EC) thresholds using data from 8646 boreholes and wells along with environmental predictor variables. Attention is paid to understanding the input data, balancing classes, performing many iterations, specifying cut-off values, employing spatial cross-validation, and identifying spatial uncertainties. Results: Estimates are made for this transboundary region of the population potentially exposed to hazardous salinity levels. The findings indicate that about 11.6 million people (~7% of the total population), including 400,000 infants and half a million pregnant women, rely on groundwater for drinking and live in areas of high groundwater salinity (EC > 1500 µS/cm). Somalia is the most affected and has the largest number of people potentially exposed. Around 50% of the Somali population (5 million people) may be exposed to unsafe salinity levels in their drinking water. In only five of Somalia's 18 regions are less than 50% of infants potentially exposed to unsafe salinity levels. The main drivers of high salinity include precipitation, groundwater recharge, evaporation, ocean proximity, and fractured rocks.
The combined overall accuracy and area under the curve of multiple runs is ~ 82%. Conclusions: The modelled groundwater salinity maps for three different salinity thresholds in the Horn of Africa highlight the uneven spatial distribution of salinity in the studied countries and the large area affected, which is mainly arid flat lowlands. The results of this study provide the first detailed mapping of groundwater salinity in the region, providing essential information for water and health scientists along with decision-makers to identify and prioritize areas and populations in need of assistance.


Introduction
Due to increasing human demands on the environment and climatic changes, water salinization is a threat faced by many countries around the world (Thorslund and van Vliet, 2020; van Engelen et al., 2022). Countries in arid and semiarid regions with decreasing runoff are particularly at risk (IPCC, 2007). Eighteen of the 20 poorest countries in the world are found in arid and semiarid areas and are particularly prone to increased aridity and periods of drought (World-Bank, 2018). Natural geological conditions, seawater intrusion, changes in precipitation and evaporation, overexploitation of groundwater, and poor irrigation practices contribute to increasing water salinity worldwide (Amer and Vengosh, 2001; Barica, 1972; Damania et al., 2019; IPCC, 2022a; Russ et al., 2020). This affects the availability of fresh water for biodiversity, human consumption, and crop productivity, thereby contributing to water insecurity. High salinity in water also increases soil salinization and, consequently, desertification.
The Horn of Africa, which includes Djibouti, Eritrea, Ethiopia, Kenya and Somalia, has natural characteristics that favor high salinity in groundwater. It is estimated that 80% of the population in the region depends on groundwater (UNICEF, 2020) and that only 19% of groundwater sources have salinity levels below ~800 μS/cm of electrical conductivity (EC), which is generally considered safe for human consumption (WHO, 2004). Numerous authors have pointed to high salinity as one of the leading causes for the closure or abandonment of wells in the Horn of Africa, as it can cause groundwater to become unsuitable for human consumption (FAO-SWALIM, 2013; Gurmessa and Taye, 2021; Pavelic et al., 2012). The World Health Organization (WHO) provides a guideline value for palatability but does not have a health-based standard for salinity. Nevertheless, the WHO encourages countries where salinity concentration is high to identify the components of salinity and to consult local authorities for guidance (WHO, 2003). In Australia and India, which also face high salinity, the EC of drinking water should not exceed ~800 or ~700 μS/cm, respectively (BIS, 2012; NHMRC and NRMMC, 2011). In the Horn of Africa, standards have been adapted to take into consideration the high salinity levels. In Kenya, governmental guidelines require EC in drinking water to not exceed ~2000 μS/cm (WAS-REB, 2008); in Ethiopia, it should not exceed ~1500 μS/cm (ESA, 2013). In 1985, the Somali government proposed a drinking water guideline of EC 3500 µS/cm, known as the Water Development Agency (WDA) standard and based on a study conducted in the Bay region (MoEWR, 1985). Subsequently, the 2004 Somali National Water Policy advised the use of recognized international drinking water quality standards, such as those of the WHO (MoEWR, 2002). However, it appears that the WDA guidelines may still be relevant, as few drinking water sources meet the WHO standards.
Putting these values into context, Freeze and Cherry (1979) described water as brackish between 1500 and 15,000 μS/cm EC, and the WHO classifies the palatability of drinking water as unacceptable above 1500 µS/cm (WHO, 2011), while the FAO does not recommend the use of water for irrigation or human consumption above 3000 µS/cm (Abrol et al., 1988). For reference, seawater has an average value of ~50,000 µS/cm. Many recent studies have indicated a link between excess salinity in drinking water and hypertension, cardiovascular diseases, kidney malfunction, skin diseases, and diarrhea (Hallenbeck et al., 1981; IPCC, 2022a; Khan J et al., 2020; Naser et al., 2017, 2022; Radwanur et al., 2017; Rosinger et al., 2021; Shammi et al., 2019). Damania et al. (2019) highlight that exposure to high levels of salinity in drinking water is most dangerous during pregnancy and childhood. During pregnancy, high salinity consumption has been linked to preeclampsia, gestational hypertension, and miscarriage (Khan et al., 2011, 2014; Nahian et al., 2018; Pinchoff et al., 2019; Scheelbeek et al., 2016). It has also been identified as a cause of high neonatal and infant mortality (Dasgupta et al., 2016; Joseph et al., 2019; Naser et al., 2020).
High salinity levels can also be harmful to a region's agriculture. For example, salinity greater than 2800 μS/cm may severely restrict horticulture (Abrol et al., 1988; UNESCO, 2002). Consequently, groundwater-dependent agriculture is restricted to ~8% of the territory in the low-lying areas of the Horn of Africa (FAO, 2013), where most of the territory and population (80%) is dedicated to the farming of goats, camels, cattle, and sheep (Muthusi et al., 2007). Such livestock generally tolerates high salinity levels and is able to thrive with EC levels of 6300 μS/cm or more (depending on the species), with goats and sheep (the predominant livestock in the area) tolerating more than 10,000 μS/cm (Basnyat, 2007; NRC, 1974). However, the more frequent and prolonged droughts affecting the region every 4-5 years in recent decades have worsened access to non-saline and non-brackish drinking water sources (FAO-SWALIM, 2021; Funk, 2020). This highlights the link between climate change and water quantity and quality. During periods of scarcity, the local population will use whatever water is available, regardless of its quality, putting their health at risk (Muthusi et al., 2007). The population of the five countries together in the Horn of Africa grew by 328% between 1980 and 2019, with urbanization putting further pressure on the already stressed aquifers used by cities (Nasreldin et al., 2016). This is especially concerning in an area with limited water infrastructure where 50% of the population is extremely poor (less than US$1.90 (2011 Purchasing Power Parity) per day per capita) (World-Bank, 2020). Although the availability of fresh water is well documented in the Horn of Africa, salinity levels are not (Muthusi et al., 2007), and there is no comprehensive understanding of the geographical distribution of water sources with safe salinity levels.
In this study, we spatially predict salinity concentrations in groundwater at three thresholds to provide policy-relevant information according to different relevant water-quality standards. This is done using machine learning modeling with measurements of EC in groundwater and various predictor variables. In addition to examining the spatial distribution of groundwater salinity in the Horn of Africa, this study identifies the most vulnerable segments of the population: pregnant women and infants aged 0-12 months. In light of the increasing salinization of groundwater worldwide, the findings presented here may also be relevant elsewhere.

Study area
The Horn of Africa is located in East Africa (Lat. 17°59′N-5°50′S, Long. 51°14′E-33°06′E) and includes Djibouti, Eritrea, Ethiopia, Kenya and Somalia (Fig. 1). It borders the Gulf of Aden in the north and the Indian Ocean in the east, encompasses 3000 km of coastline, and covers an area of around 2,488,000 km².
This region is comprised mainly of arid and semiarid plateaus, plains, and highlands, with the Ogaden desert in the Somali region of Ethiopia and the Chalbi and Nyiri deserts in Kenya accounting for substantial parts of the territory. The mountainous areas of northern Somalia reach 2460 m.a.s.l., while the mountains of Ethiopia along the Great Rift Valley reach well over 4000 m.a.s.l. The climate is harsh with extreme weather conditions, including drought, high average temperatures, and strong coastal winds. Temperatures can reach over 40 °C on the plains in summer and are cooler in mountainous areas, where average annual temperatures range between 6 and 17 °C. Rainfall varies from north to south, averaging less than 20 mm per year in the north of Somalia and reaching 2000 mm per year in the southwest highlands of Ethiopia and Kenya (Muchiri, 2007). Potential evapotranspiration is high, especially in low-elevation areas in the north where it reaches 3000 mm per year and exacerbates groundwater salinity (Muchiri, 2007; WASH, 2019).

Data preparation and predictors
To predict the occurrence of salinity levels in this transboundary region, 8646 groundwater quality data points were compiled from 31 different sources (Fig. 1) (Acacia, 2020a, 2020b; Addisu Deressa Geleta, 2012; Adem, 2012; Ashun, 2014; Ayenew et al., 2009; Bairu et al., 2013; Blandenier, 2015; Bretzler et al., 2011; Brhane, 2016; Charity Water, 2020; Demlie et al., 2008; Ezekiel et al., 2017; FAO-SWALIM, 2018; Gebrehiwot et al., 2011; Gulta Abdurahman and Moltot, 2018; Kang'ethe, 2015; Kanoti, 2021; Makokha K. Jacquelyne, 2017; Muraguri and A, 2013; Nasreldin et al., 2016; Owango Wadira, 2020; Rango et al., 2010; Reimann et al., 2002; Rusiniak and Sekuła, 2021; Sottas, 2013; Tadesse, 2013, 2020; Tadesse et al., 2010; Tanui et al., 2020). A detailed list can be found in Table S1. Although most of these data were given as EC, values reported as Total Dissolved Solids (TDS) were converted using a factor of 0.7 between TDS (mg/L) and EC (μS/cm), i.e. TDS ≈ 0.7 × EC (Weert, 2009). Forty-one geospatial predictor variables of geology (Fig. S1), climate, topography, soil, and ecology were collected for their known or potential relationships with elevated groundwater salinity levels. Fig. S2 lists these variables and highlights those finally selected. Since the scale of most of the predictor variables is 250 m, this common unit was used to grid the salinity data points. When there was more than one salinity measurement in a pixel, the geometric mean was calculated. The salinity data come from different sources and generally do not report errors. The data were therefore converted into binary format for each of the three EC thresholds described below (800 μS/cm, 1500 μS/cm, and 2500 μS/cm), whereby 1 represents values above the threshold and 0 represents values below the threshold. This transformation largely eliminates any inconsistencies associated with the salinity measurements.
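The data preparation steps above can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes projected coordinates in metres, reads the conversion factor as TDS ≈ 0.7 × EC (consistent with the TDS equivalents quoted for the 800 and 2500 μS/cm thresholds), and uses hypothetical function names and sample values.

```python
import math
from collections import defaultdict

CELL_M = 250.0                  # grid resolution shared with the predictors
THRESHOLDS = (800, 1500, 2500)  # EC thresholds in uS/cm


def to_ec(value, unit):
    """Return EC in uS/cm. Assumption: TDS (mg/L) ~ 0.7 * EC (uS/cm)."""
    return value / 0.7 if unit == "TDS" else value


def grid_geometric_mean(points):
    """Aggregate (x_m, y_m, ec) points to one geometric-mean EC per cell."""
    cells = defaultdict(list)
    for x, y, ec in points:
        cells[(math.floor(x / CELL_M), math.floor(y / CELL_M))].append(ec)
    return {
        cell: math.exp(sum(math.log(v) for v in vals) / len(vals))
        for cell, vals in cells.items()
    }


def binarize(ec, threshold):
    """1 if EC is above the threshold, otherwise 0."""
    return 1 if ec > threshold else 0


# Hypothetical measurements: two in the same 250 m cell, one reported as TDS.
points = [
    (10.0, 10.0, 500.0),
    (100.0, 200.0, 2000.0),
    (300.0, 10.0, to_ec(1400.0, "TDS")),
]
gridded = grid_geometric_mean(points)
labels = {
    cell: {t: binarize(ec, t) for t in THRESHOLDS}
    for cell, ec in gridded.items()
}
```

Each cell ends up with one binary label per threshold, which is the form in which the data enter the classification models.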
The data were then balanced in relation to the class with the lowest representation, thus avoiding biases caused by unbalanced classes. This involved randomly discarding the excess data of the majority class and doing so over many iterations such that all data were ultimately used.
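As a rough illustration of this balancing step, the sketch below repeatedly down-samples the majority class at random, so that each subsample is balanced while all data are eventually used across iterations. The class sizes, the seed and the function name are hypothetical.

```python
import random


def downsample_balanced(labels, rng):
    """Return indices of a class-balanced random subsample.

    The majority class is randomly reduced to the size of the minority class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return sorted(rng.sample(pos, n) + rng.sample(neg, n))


rng = random.Random(0)            # fixed seed only for reproducibility here
labels = [1] * 30 + [0] * 70      # hypothetical unbalanced binary labels
used = set()
subsamples = []
for _ in range(100):              # 100 iterations, as in the modeling
    idx = downsample_balanced(labels, rng)
    subsamples.append(idx)
    used.update(idx)
```

With enough iterations, every majority-class record is drawn in at least one subsample, which is the sense in which no data are discarded overall.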

Modeling
Groundwater salinity was modelled at multiple thresholds to provide policy-relevant information according to the different water quality standards available: 1500 μS/cm (TDS ~ 1000 mg/L), which is the WHO standard for palatable drinking water (WHO, 2011) and used by local authorities in Djibouti and Ethiopia to regulate drinking water (ESA, 2013; MAEPE-RH, 2011); 800 μS/cm (TDS ~ 560 mg/L), which is the WHO standard for good-quality drinking water; 2500 μS/cm (TDS ~ 1750 mg/L), which acknowledges that local populations may consume high-saline water when no other options are available (Muthusi et al., 2007). However, values above 2000 μS/cm EC are considered unfit for human consumption (FAO-SWALIM, 2013).
The widely used random forest machine-learning method was applied (e.g. Akter et al., 2021; Mosavi et al., 2021). It is based on an ensemble of decision trees and is suited for complex classification problems (Breiman, 2001). Unlike some other methods, the random forest algorithm does not make prior assumptions about the relationships between the target and predictor variables. It offers good performance when complex non-linear relationships and a large number of predictor variables are involved. Tools are also available for interpreting the fitted model, such as the importance of variables or partial dependence plots.
Training/testing datasets were created using spatial-block cross validation based on the size and shape of the study area. This approach is particularly useful when data are not uniformly dispersed throughout the study area. The spatial blocks are split into k parts (folds) and data are assigned to the folds sequentially. The data of a given fold are set aside for use in validation, while the rest of the data are used for training a model. Then the whole process is iterated until all folds have been used for testing. The spatial block approach is used to avoid spatial biases produced in the data splitting and is widely used as a form of validation (see the Validation section). To avoid any bias, 100 iterations with different randomly selected training/testing datasets were performed. One hundred iterations were considered sufficient because the averaged performance metrics from 1000 iterations differed by less than 1% from those obtained with 100 iterations. In addition, to prevent the disparity in class frequencies from negatively influencing the model's performance, the dataset was balanced at each iteration by randomly down-sampling the majority class in the training set to match the least frequent class. Then 1001 trees were grown to produce each model.

Fig. 1 (caption, partially recovered): … Table S1. An electrical conductivity (EC) of 800 µS/cm is the threshold for good quality water (WHO, 2004), 1500 µS/cm is the palatability threshold (WHO, 2011), and 2500 µS/cm is the highest value for which RF modeling is still feasible with respect to the distribution of available data. The histogram in the lower right shows the asymmetric distribution of EC measurements with three marked clusters: <800 µS/cm (39% of the data), 800-2500 µS/cm (33% of the data) and >2500 µS/cm (28% of the data).
Finally, the predictor variables were selected according to their contribution to overall model performance, retaining only those variables that had a mean decrease in accuracy > 0 on average across all iterations of the random forest model. Only 16 variables were retained that systematically improved the predicted accuracy.
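The selection rule can be sketched as follows, assuming that each iteration yields a mean decrease in accuracy per variable. The variable names and importance values are hypothetical and serve only to show the averaging and filtering.

```python
def select_predictors(importance_runs):
    """Keep variables whose mean decrease in accuracy, averaged over all
    iterations, is greater than zero. Returned in descending importance."""
    names = list(importance_runs[0])
    mean_imp = {
        v: sum(run[v] for run in importance_runs) / len(importance_runs)
        for v in names
    }
    ranked = sorted(names, key=lambda v: mean_imp[v], reverse=True)
    return [v for v in ranked if mean_imp[v] > 0]


# Hypothetical importance values from two iterations.
runs = [
    {"precipitation": 0.04, "slope": -0.010, "recharge": 0.02},
    {"precipitation": 0.06, "slope": 0.005, "recharge": 0.01},
]
selected = select_predictors(runs)
```

A variable that helps in some iterations but hurts on average (here "slope") is dropped.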
To assess model performance, different metrics were used, including area under the curve (AUC), sensitivity, specificity, and false discovery rate (FDR). The AUC measures the area under the receiver operating characteristic (ROC) curve, with values usually ranging from 0.5 (a random guess) to 1 (perfect predictive accuracy). The AUC information was complemented with sensitivity and specificity, which provide further information on the discrimination capability of the model. Sensitivity is the fraction of positive cases that are correctly predicted, and specificity is the fraction of negative cases that are correctly predicted. The FDR is the fraction of positive predictions that are actually negative cases.
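For concreteness, these metrics can be computed as in the sketch below. The AUC is calculated through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case), which is equivalent to the area under the ROC curve. The labels and scores are hypothetical.

```python
def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, tn, fp, fn):
    return tp / (tp + fn)   # correctly predicted share of positives


def specificity(tp, tn, fp, fn):
    return tn / (tn + fp)   # correctly predicted share of negatives


def false_discovery_rate(tp, tn, fp, fn):
    return fp / (fp + tp)   # share of positive predictions that are wrong


def auc(y_true, scores):
    """Rank-based AUC; ties between a positive and a negative count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical observations and modeled probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
cm = confusion(y_true, y_pred)
```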

Mapping
Probability maps of groundwater salinity exceeding the considered thresholds were created using the mean of 100 model predictions. Multiple iterations avoid bias produced by the arbitrary selection of a particular combination of validation and training data points. Groundwater salinity hazard was labeled as high or low relative to the thresholds used. Since the distribution of modeled probabilities may vary based on the prevalence of cases in a given dataset, the probability cutoff was set between high/low hazard where sensitivity = specificity, which equally weights the model's ability to correctly classify high and low salinity concentrations. This information was used to classify the salinity hazard in groundwater as high or low using the average probability cut-off of the 100 iterations. Two additional maps were created that classify the salinity hazard in groundwater as high or low using the 5th and 95th percentiles of the probability cut-off points of the 100 iterations. Taken together, these maps present the most stable and reliable risk areas, as they are less sensitive to imbalanced data or any arbitrary selection of input data.
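One simple way to locate such a cut-off is to scan a grid of candidate values and keep the one where sensitivity and specificity are closest. The sketch below does this for a single iteration; the grid search, its resolution and the sample data are assumptions made for illustration, not a description of the procedure used in the study.

```python
def sens_spec(y_true, probs, cutoff):
    """Sensitivity and specificity when probs >= cutoff is labeled high."""
    tp = sum(1 for t, p in zip(y_true, probs) if t == 1 and p >= cutoff)
    fn = sum(1 for t, p in zip(y_true, probs) if t == 1 and p < cutoff)
    tn = sum(1 for t, p in zip(y_true, probs) if t == 0 and p < cutoff)
    fp = sum(1 for t, p in zip(y_true, probs) if t == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(y_true, probs, steps=1000):
    """First cut-off on a regular grid minimizing |sensitivity - specificity|."""
    best, best_gap = 0.0, float("inf")
    for i in range(steps + 1):
        c = i / steps
        se, sp = sens_spec(y_true, probs, c)
        if abs(se - sp) < best_gap:
            best, best_gap = c, abs(se - sp)
    return best


# Hypothetical test-set labels and modeled probabilities for one iteration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.35, 0.6, 0.3, 0.2, 0.1]
cutoff = balanced_cutoff(y_true, probs)
se, sp = sens_spec(y_true, probs, cutoff)
```

Repeating this for every iteration gives a distribution of cut-offs, from which the mean and the 5th and 95th percentiles can be taken.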
The spatially uncertain areas were subsequently identified using the area of applicability (AOA) with the R package Caret Application for Spatio-Temporal (CAST) models (Meyer and Pebesma, 2021). The AOA is the area where the model can be applied with an expected average performance similar to that estimated with the training data. Consequently, areas outside the AOA present more uncertain predictions since they are outside the range of the training data. The estimation of the AOA is based on a threshold applied to a dissimilarity index (DI). The DI provides a unitless measure of how much each new point differs from the training data by considering the distances of predictor variables in a multidimensional space and weighting the predictor variables by their importance as derived by the random forest model. A new data point is considered outside the AOA when its DI exceeds the 0.95 quantile of the DI values of the training data. Areas outside the AOA, where predictions are less certain, were therefore removed from the risk maps to increase the clarity of the results.
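The idea behind the DI can be sketched in a few lines. This is a simplified illustration rather than the CAST implementation: it assumes the predictors are already standardized, computes the training DI by leaving each point out in turn (instead of using cross-validation folds), and uses hypothetical weights and points.

```python
import math


def wdist(a, b, w):
    """Importance-weighted Euclidean distance in predictor space."""
    return math.sqrt(sum((wi * (ai - bi)) ** 2 for ai, bi, wi in zip(a, b, w)))


def mean_pairwise(train, w):
    """Mean distance between all pairs of training points."""
    d = [
        wdist(train[i], train[j], w)
        for i in range(len(train))
        for j in range(i + 1, len(train))
    ]
    return sum(d) / len(d)


def dissimilarity(point, train, w, dbar, skip=None):
    """Distance to the nearest training point, scaled by dbar."""
    return min(wdist(point, t, w) for k, t in enumerate(train) if k != skip) / dbar


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# Hypothetical standardized predictors (two variables) and importance weights.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
weights = (1.0, 1.0)
dbar = mean_pairwise(train, weights)
train_di = [dissimilarity(t, train, weights, dbar, skip=k) for k, t in enumerate(train)]
threshold = quantile(train_di, 0.95)

inside = dissimilarity((0.5, 0.5), train, weights, dbar) <= threshold
outside = dissimilarity((5.0, 5.0), train, weights, dbar) > threshold
```

A point lying between the training data falls inside the AOA, while a point far outside their predictor range exceeds the threshold and would be masked.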

Validation
Environmental spatial data are typically spatially autocorrelated, meaning that geographically closer data are often more similar than distant data. This violates the assumption that data are independent and identically distributed, on which traditional forms of validation such as leave-one-out and k-fold cross-validation are based. Thus, the use of conventional methods of validation in the spatial context can lead to overoptimistic estimates of predictive performance. To account for this, the model was evaluated with ten spatial blocks (each 420 × 420 km) using the R package blockCV (Valavi et al., 2019). Since data are rarely evenly dispersed over the model domain, the package offers different alternatives for defining the size and characteristics of the blocks in order to maximize the representativeness of the data and avoid blocks with little or no data. For instance, block assignments to folds can be implemented to achieve the most even distribution of data, resulting in a similar presence across classes. In turn, a semivariogram can be used to estimate a block size that minimizes spatial autocorrelation (SAC) among explanatory variables and presents sufficient representativeness across classes. Finally, the package enables the visualization of the blocks' locations and the data distribution within them.
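The principle of block-based splitting can be shown with a short sketch that is independent of blockCV. It assumes projected coordinates in kilometres, square 420 km blocks, and a simple sequential assignment of blocks to folds; the number of folds and the sample points are hypothetical.

```python
import math

BLOCK_KM = 420.0  # block edge length


def block_id(x_km, y_km, size=BLOCK_KM):
    """Index of the square block containing a point."""
    return (math.floor(x_km / size), math.floor(y_km / size))


def assign_folds(points, k):
    """Assign each point the fold of its block; blocks go to folds in turn."""
    blocks = sorted({block_id(x, y) for x, y in points})
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y)] for x, y in points]


def splits(folds, k):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for f in range(k):
        test = [i for i, g in enumerate(folds) if g == f]
        train = [i for i, g in enumerate(folds) if g != f]
        yield train, test


# Hypothetical sample locations (km); the first two share a block.
points = [(10, 10), (100, 50), (500, 10), (900, 10), (430, 800)]
folds = assign_folds(points, k=2)
all_splits = list(splits(folds, k=2))
```

Because whole blocks are held out together, nearby points never end up on both sides of a train/test split.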

Estimates of the exposed population
The modeled high-salinity areas in the Horn of Africa were combined with population density maps for 2020 (WorldPop, 2018) to assess the population at risk. In addition to the total affected population, the numbers of at-risk pregnant women and infants aged 0-12 months were also identified. These estimates account for country-wide rates of household groundwater use in urban and rural areas, as provided by the WHO/UNICEF Joint Monitoring Programme (WHO/UNICEF Joint Monitoring Program (JMP), 2019). Rural and urban areas were distinguished on the basis of global urbanization grids from the European Commission's Joint Research Centre (Pesaresi et al., 2019). The groundwater-consuming population was multiplied by the probability of salinity concentrations exceeding the thresholds considered. Estimates of the people at risk at the different salinity thresholds were then disaggregated to first-level administrative units in each country (e.g. provinces).
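The exposure calculation amounts to a per-cell product that is then summed by administrative unit. The sketch below illustrates this with hypothetical cells, groundwater-use rates and unit names; none of these values come from the study.

```python
from collections import defaultdict


def exposed_by_unit(cells, gw_use):
    """Sum population x groundwater-use rate x exceedance probability
    over grid cells, grouped by first-level administrative unit."""
    totals = defaultdict(float)
    for c in cells:
        rate = gw_use["urban" if c["urban"] else "rural"]
        totals[c["unit"]] += c["pop"] * rate * c["prob"]
    return dict(totals)


# Hypothetical country-wide household groundwater-use rates.
gw_use = {"urban": 0.2, "rural": 0.6}

# Hypothetical grid cells: population, urban flag, modeled probability of
# exceeding the salinity threshold, and administrative unit.
cells = [
    {"pop": 1000, "urban": True, "prob": 0.50, "unit": "A"},
    {"pop": 2000, "urban": False, "prob": 0.25, "unit": "A"},
    {"pop": 500, "urban": False, "prob": 0.80, "unit": "B"},
]
exposed = exposed_by_unit(cells, gw_use)
```

The same calculation applied to gridded counts of pregnancies or infants would give the corresponding estimates for those groups.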

Salinity hotspots
The hazard maps derived from the random forest modeling are shown in Fig. 2. The affected area in the Horn of Africa is substantial, as elevated levels of groundwater salinity are prevalent across a majority of the region, with only 41% of the region having salinity levels <800 μS/cm. Most of these areas are located in the western parts of Ethiopia and Kenya as well as in high-altitude areas. While 45% of the Horn of Africa has groundwater salinity levels >1500 μS/cm, the largest salinity hazard area is found in Somalia, followed by Djibouti, Kenya and Ethiopia (Fig. S3). Areas with very high groundwater salinity (>2500 μS/cm), corresponding to 21% of the Horn of Africa, include the Somali region of Ethiopia, the North Eastern, Coast and Eastern provinces of Kenya, and most of Somalia.
Somalia is the most heavily affected country in the region in terms of groundwater salinity, where, based on the three thresholds considered, about 90% (86-94%) of the country's total area is at risk, excluding only a few isolated places in the Awdal, Woqooyi Galbeed, and Middle Juba regions. Kenya has the second largest affected area, located mainly in the North Eastern, Coast and Eastern provinces. At the 800 μS/cm level, about 66% of Kenya is affected. In comparison, at the 1500 μS/cm threshold, the affected area of Kenya reduces sharply to 35%, while at the 2500 μS/cm threshold, only 32% of the country is affected. Ethiopia has the smallest affected area in the region, with about 44% of the country affected at > 800 μS/cm, decreasing to 26% at > 1500 μS/cm and 16% at > 2500 μS/cm, which affects only the Somali region of the country. About 46% of Djibouti is affected by salinity at > 1500 μS/cm and 10% at > 2500 μS/cm. Estimates of salinity in Eritrea were not made due to too few data points available for this study, resulting in the model of Eritrea being hampered by high spatial uncertainty and low confidence in the predictions (Fig. S3).

Population at risk
Our estimates show that about 7.6, 11.6 and 17.9 million people in the Horn of Africa are exposed to salinity levels above 2500, 1500 and 800 μS/cm, respectively (Table 1). These numbers correspond to 5%, 7%, and 11% of the overall population, respectively. Breaking down the exposure by country, it is evident that some regions are much more strongly affected than others. With ~5 million people, Somalia has the highest number of people exposed to salinity at the 1500 μS/cm level, which is 48% of the country's total population (Table S3). Due to generally high salinity levels in the country, the exposed population does not vary much when also considering the salinity thresholds of 2500 μS/cm (about 40% of the country's population) or 800 μS/cm (51% of the population).
The proportion of people exposed to high salinity in the other countries of the region is smaller. In Kenya, ~3 million people are exposed to salinity levels above 1500 μS/cm (i.e., 6% of the total population) (Fig. 3A), though the population is unevenly distributed with most of the exposed population concentrated in the North Eastern province (38% of the total population). In Ethiopia, 2.6% of the total population is exposed to salinity levels above 1500 μS/cm (Fig. 3B), with the affected population at all salinity levels being almost exclusively located in the Somali region (35%). In three countries, the national exposure estimates are driven by a single region: the Somali region in Ethiopia, the North Eastern province in Kenya, and the Djibouti region in Djibouti. In Djibouti, ~300,000 people (34% of the total) are at risk at 1500 μS/cm, but only 3% are at risk at the higher salinity level of 2500 μS/cm (see Table S3 for more information). Population estimates were not made for Eritrea, as there is high spatial uncertainty in the predictions here.

Table 1
Estimated population potentially affected by groundwater salinity concentrations above the 800 µS/cm, 1500 µS/cm and 2500 µS/cm thresholds. The ranges are derived from considering probability cut-off values at the 5th and 95th percentiles. Special attention is drawn to the number of pregnant women and infants affected.

Predictor variables
The probability of groundwater salinity in the Horn of Africa exceeding 800 μS/cm, 1500 μS/cm, and 2500 μS/cm was modeled by initially considering a set of 41 geospatial variables. Of these, 16 variables were retained and are plotted in Fig. 4 according to their relative importance. These variables represent climate, hydrology, topography, geology, and soil parameters. The interplay of these diverse factors in determining salinity levels in the area is reflected in the importance and partial dependence plots (PDPs) of the individual variables. Variable importance for each of the three thresholds can be found in Figs. S4, S5, and S6. PDPs, which depict the relationships between the predictor variables and salinity, are provided in Figs. S7, S8 and S9.

Model performance
No significant variation in model performance (e.g. AUC) or the spatial distribution of salinity was observed in the 100 different model runs for each threshold (Figs. S7-S9). The average AUC for all three thresholds was 0.82, with consistent ranges across all thresholds and iterations (800 μS/cm: 0.79-0.83; 1500 μS/cm: 0.81-0.83; and 2500 μS/cm: 0.80-0.84). As expected when modeling with a balanced training dataset, sensitivity, specificity, precision, and balanced accuracy all had similar values at the optimal cut-off point and were around 0.80, which is similar to the AUC.

Salinity mapping
Prediction maps of high groundwater salinity in the Horn of Africa have been created at the levels of 800 μS/cm, 1500 μS/cm, and 2500 μS/cm and represent the first detailed mapping of groundwater salinity in the region. To produce reliable and stable maps and metrics across all iterations, steps were taken to balance classes, perform many iterations, identify optimal cut-off values, and validate and identify uncertainties in the model. To avoid biases, the majority class in the training set was randomly down-sampled to match the minority class, the data were randomly divided into training and testing datasets, and each model was iterated 100 times. Block-based spatial cross validation was used to avoid spatial bias and provide a more realistic estimate of each model's predictive performance. However, block-based spatial cross-validation comes with two major challenges: using data separated into blocks may inadvertently reduce the diversity of the data, and the method does not take into account the predictive ability of the model for combinations of predictor data that are similar to but spatially distant from the training data. These challenges can be addressed through the careful implementation of the blocks and performing multiple iterations with these blocks.
The diversity of input data largely influences the success of a prediction. However, groundwater sampling points are often clustered at locations of greater interest and do not necessarily represent large areas well. This poses a significant challenge for spatial prediction, as nearby points tend to have less diversity in the range of environmental variables and, therefore, less variability to model. Although there were no areas with strong data clustering patterns, the model's applicability was estimated among clusters by comparing the values of the predictor data at the training data points with the rest of the region via the DI. Predictions in areas with considerable dissimilarity to the training data were masked. These areas are found mostly along the western boundary of the region in upland and/or humid areas where there are few data, e.g. Eritrea. The masking thus also identifies the areas where the model produces reliable predictions (see Fig. 2).
To ensure a sufficient quantity of data, we followed the strategy of collecting all data available. This approach is particularly effective for geogenic groundwater contaminants such as salinity, as binary concentration levels typically remain consistent across expansive regions, owing to the fact that geological changes are slow processes. Specifically, the presence of gypsum deposits and fractured rocks related to ancient marine deposits are responsible for the salinity levels observed. Therefore, this approach is an effective means of addressing geogenic contaminants. However, it should be noted that seasonal fluctuations can occur at certain locations, and these cannot be taken into account in the modeling due to a lack of data.

Salinity hotspots
The groundwater salinity maps for the three thresholds highlight the uneven spatial distribution of salinity in the region and the large areas potentially affected (Fig. 2). These areas are mainly the arid flat lowlands of Somalia, Kenya, Djibouti and Ethiopia. For the reference threshold of 1500 μS/cm EC, about 46% of the modeled area is potentially affected. Under all thresholds examined, Somalia is particularly affected in proportion to its total area (more than 80% of its territory is affected), followed by Kenya, with more than 30% affected (Fig. S3). Kenya's hotspots are the Chalbi and Nyiri deserts, Ethiopia's hotspots are the Ogaden Desert in the Somali region and Djibouti is affected mainly in the Djibouti region. These results confirm and extend the knowledge on areas potentially affected by high groundwater salinity in Ethiopia and Kenya (Ashun, 2014; Meaza et al., 2019; Rosinger et al., 2021), and in particular in Somalia (Muthusi et al., 2007; Said et al., 2021).
The on-going droughts that have affected the region have increased pressure on the limited groundwater productivity in the area, affecting groundwater quality and increasing stress on the water provisioning system (FAO-SWALIM, 2021; Funk, 2020). Rising temperatures and low humidity promote evaporation and, thus, the concentration of minerals in groundwater. Numerous authors have pointed to high salinity groundwater as one of the leading causes for the closure or abandonment of wells in the Horn of Africa, as the water has become unsuitable for human consumption (FAO-SWALIM, 2013; Gurmessa and Taye, 2021; Pavelic et al., 2012). Future climate projections indicate that by the end of the 21st century, during the short rainy season, the region could experience an increase in annual rainfall and a decrease in drought events (IPCC, 2022b). This contrast to the observed drying trend is known as the "East African rainfall paradox" (Bichet et al., 2020; IPCC, 2022b). However, long-term rainfall projections carry significant uncertainties. For example, the frequency, duration and intensity of droughts with global warming of 2 °C or higher are projected to increase in Somalia but decrease or remain unchanged in Kenya and the Ethiopian highlands (IPCC, 2022b). At the same time, temperatures are projected to increase in the region, enhancing evaporation and water consumption that may lead to a decrease in groundwater recharge, exacerbating salinization.
This situation exemplifies the strong link between water quantity and water quality in an area where there are notable deficiencies in the water sector related to water quality testing and monitoring. Deficiencies in water quality regulation are exacerbated by the rather limited understanding of how the water supply gets contaminated and the risk involved with using contaminated water (UNDP, 2016).
Estimates of the affected area and population in Eritrea were not calculated due to spatial uncertainty in the model (Fig. 3). Future research could specifically focus on groundwater salinity in this country, as some authors have reported problem areas (e.g. Lowenstern et al., 1999;Zerai, 1996).

Population exposed
Based on the population at risk, Somalia stands out with the greatest number of people exposed to high salinity at all three thresholds (Fig. 5). Regardless of the threshold level, the high proportion of the population affected by salinity in Somalia hardly varies, with ~50% of the people exposed to high salinity in the Horn of Africa residing in Somalia. In the cases of Kenya, Djibouti and Ethiopia, the potentially affected populations are concentrated in specific regions. For example, for the EC 1500 μS/cm threshold, 73% and 16% of the affected people in Kenya live in the North Eastern and Coast provinces, respectively, while 71% of the affected population in Djibouti lives in the Djibouti region and 81% of those affected in Ethiopia live in the Somali region (Fig. 3, Fig. 5 and Table S3).
As illustrated in Figs. 3C and 3D, pregnant women and infants are particularly vulnerable to high salinity levels in groundwater. There are ~372,000 potentially exposed infants (up to one year of age) in the Horn of Africa, of which ~210,000 live in Somalia. In only five of Somalia's 18 regions are less than 50% of infants potentially exposed to salinity levels above 1500 μS/cm; in regions of Somalia such as Nugaal, Sool, and Bakool, around 60% of infants could be at risk at all studied thresholds (Table S5). Somalia also has the highest relative and absolute number (i.e., ~237,000) of at-risk pregnancies in the Horn of Africa for the 1500 μS/cm threshold. In regions such as Hiiraan and Bakool, over 50% of pregnant women are at risk under all the thresholds (Table S4). However, when looking at first-level administrative units, the Somali region in Ethiopia has the highest number of potentially affected pregnancies of the four countries. The region's at-risk pregnancies are estimated to be 136,000 at the 1500 μS/cm level (38% of the region's total pregnancies) and decline to 63,000 for the 2500 μS/cm threshold (23% total). In Somalia, the proportion of affected pregnancies remains almost the same, regardless of the threshold level.
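The exposure estimates in this section combine three ingredients: population counts, the share of people who rely on groundwater for drinking, and whether a location falls within a predicted high-salinity area. A minimal sketch of that bookkeeping is given below. The cell records, field names and numbers are hypothetical and only show how regional totals and percentages could be aggregated; they do not reproduce the study's actual workflow or data.

```python
from collections import defaultdict


def exposure_by_region(cells):
    """Aggregate potentially exposed people per region.

    Each cell is a dict with (hypothetical) keys:
      region            -- administrative unit name
      population        -- people in the cell (e.g. infants or pregnancies)
      groundwater_share -- fraction relying on groundwater for drinking (0..1)
      high_salinity     -- True if the cell is predicted above the EC threshold
    Returns {region: (exposed, total, percent_exposed)}.
    """
    exposed = defaultdict(float)
    total = defaultdict(float)
    for cell in cells:
        total[cell["region"]] += cell["population"]
        if cell["high_salinity"]:
            exposed[cell["region"]] += cell["population"] * cell["groundwater_share"]
    return {
        region: (exposed[region], total[region], 100.0 * exposed[region] / total[region])
        for region in total
        if total[region] > 0
    }


# Hypothetical cells for two made-up regions
cells = [
    {"region": "A", "population": 1000, "groundwater_share": 0.5, "high_salinity": True},
    {"region": "A", "population": 1000, "groundwater_share": 0.5, "high_salinity": False},
    {"region": "B", "population": 400, "groundwater_share": 0.25, "high_salinity": True},
]
result = exposure_by_region(cells)
for region, (n_exposed, n_total, pct) in sorted(result.items()):
    print(f"{region}: {n_exposed:.0f} of {n_total:.0f} potentially exposed ({pct:.0f}%)")
```

Running the same aggregation on the grids for each EC threshold would show whether a region's exposed share is sensitive to the threshold, as reported for the Somali region of Ethiopia, or nearly constant, as reported for Somalia.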

Socioeconomic considerations
All studied countries have high rates of poverty and inequality, which inhibit the provisioning of safe sources of drinking water. In critical periods of drought, it has been reported that the price of water is highly volatile, increasing by up to 50% and costing more than US$1 per cubic meter (Mourad, 2021).
Drought leads to famine, and it has been reported that populations under such environmental stress will use whatever water is available, regardless of its quality, thereby putting their health at risk (Muthusi et al., 2007). Families often mix powdered milk with untreated water to feed their children. Unfortunately, highly saline water has been linked to diarrhea and other intestinal illnesses that can be life-threatening for infants (Chakraborty et al., 2019).
In addition, the lack of fresh water affects the area's scarce and fragile groundwater-dependent agriculture through crop losses and intensified soil degradation, which drives desertification (IPCC, 2019). Due to the adverse conditions for agriculture, a large part of the population in the Horn of Africa is engaged in livestock farming of more saline-resistant species, such as goats or camels (Muthusi et al., 2007). However, due to the large spatial and temporal extent of droughts, livestock keepers are forced to migrate internally in search of resources for their animals. This can lead to even more pressure on the few available water sources and increased social conflicts in the area (MoEWR, 2021;UNDP, 2016). This massive livestock migration has left children and women with limited access to food, increasing the likelihood of malnutrition and ill health (UNDP, 2016).

Influence of climate change
Salinity contamination in groundwater is the result of the interplay of several human and natural factors, with some factors, such as climatic, topographical, hydrological, and geological characteristics, playing a more dominant role. High salinity is concentrated in low and flat areas close to the sea with high temperatures and where precipitation is low and evaporation is high. Precipitation, recharge, evaporation, gypsum deposits, fractured ancient marine deposits, and proximity to the ocean were found to be the main factors related to high salinity levels in the study area (Supplementary Material, Figs. S3, S4, and S5).
Water quality in terms of salinity content has frequently been found to relate to climatic conditions in the area. It has been reported that during periods of drought and even during regular dry seasons, borehole yields have decreased and groundwater quality sometimes becomes more saline or brackish (FAO-SWALIM, 2021;WASH, 2019). The arid conditions of a region have been pointed to as the best predictor of groundwater salinity throughout Africa. Drought has also been identified as a primary cause of reduced aquifer recharge. The present study has observed that lower groundwater recharge is associated with higher groundwater salinity (see partial dependence plots in Figs. S3, S4, and S5). Anthropogenic causes, such as contamination (Brhane, 2018) or excessive groundwater extraction and rapid urbanisation, may lead to an increase in sealing of the soil surface, which, in turn, may reduce groundwater recharge rates and contribute to high salinity in confined areas. Coastal areas in the region are particularly vulnerable to saline intrusion due to over-exploitation of groundwater (Idowu and Lasisi, 2020;MNR, 2013).
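Partial dependence plots of the kind referred to above show the average model prediction when one predictor is fixed at a given value while all other predictors keep their observed values. The sketch below illustrates the computation only. The `toy_model` function is a hypothetical logistic stand-in for a fitted classifier (it is not the random forest of this study), and the predictor names, units and values are invented for illustration.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier: the probability of high
    salinity falls with recharge and rises with evaporation."""
    score = 0.002 * row["evaporation_mm"] - 0.02 * row["recharge_mm"]
    return 1.0 / (1.0 + math.exp(-score))


# Invented observations (not data from the study)
rows = [
    {"recharge_mm": 5.0, "evaporation_mm": 1800.0},
    {"recharge_mm": 40.0, "evaporation_mm": 1500.0},
    {"recharge_mm": 120.0, "evaporation_mm": 1200.0},
]
recharge_grid = [0.0, 50.0, 100.0, 150.0]
pd_curve = partial_dependence(toy_model, rows, "recharge_mm", recharge_grid)
for r, p in zip(recharge_grid, pd_curve):
    print(f"recharge={r:5.0f} mm  mean predicted probability={p:.3f}")
```

With this toy model the curve decreases as recharge increases, which is the shape one would expect when lower recharge is associated with higher predicted salinity.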
The key challenges in understanding groundwater salinity in the Horn of Africa lie in identifying and understanding the different constituents that explain high EC (e.g. sodium, potassium, calcium, magnesium, chloride, fluoride etc.). The distribution of these salinity constituents certainly varies across such a large region as the Horn of Africa, and there are drinking water health guidelines for some of these constituents, such as boron, fluoride, magnesium, and nitrate (WHO, 2003). The WHO recommends identifying the components of salinity to help protect the population from hazardous contaminants. For example, groundwater in areas of Somalia, Ethiopia, and Kenya contains high concentrations of fluoride (Kut et al., 2016). Given current drying trends (FAO-SWALIM, 2021), the challenges for estimating near-future groundwater quality in the Horn of Africa lie in better understanding how salinity drivers relate under different temperature and precipitation projections.

Adaptation to water salinity
Reducing salinity in the discussed context of the region involves complex and far-reaching challenges. Therefore, there is a need to adapt to high water salinity; e.g. Somalia's 2013 climate change adaptation plan already highlights the need to adapt to increasing groundwater salinity (MNR, 2013). Adaptation would involve tasks such as careful management and governance of freshwater sources present in the region. Much work can still be done to improve the management of the resource, monitor water quality, and educate the population to avoid overexploitation and contamination of available freshwater boreholes and wells (UNDP, 2016). There is also a need to improve and add to the available infrastructure (MoEWR, 2021) and promote approaches such as "safe sourcing", which is a hydrogeological approach to finding low-salinity groundwater in high-salinity groundwater environments (Gurmessa and Taye, 2021) or deeper exploration (Godfrey et al., 2019). Water harvesting technologies are further interventions (Gebru et al., 2021). Additionally, the fragmented coordination of water management must be addressed to reach a functioning water governance (Mourad, 2020). Before the civil war in Somalia, 34% of the irrigable land was used; today it is considerably less, down to 9% (MoEWR, 2021). A meat-based diet has the largest water footprint, followed by vegetarian and vegan diets. Hence, in the face of water scarcity, promoting a plant-based diet and agricultural practices to support it could be an alternative to nourish the population (Zucchinelli et al., 2021). To strengthen crop farming, it is recommended to promote active irrigation management, soil moisture monitoring, and the introduction of plant species resilient to salinity or reduced water availability (Abrol et al., 1988;Mariani and Ferrante, 2017).
At the same time, filtering or desalination technologies can be implemented, which are expensive today but are expected to become increasingly common in the future with the rapid advancement of desalination technologies, the reduction of emissions and costs through the use of solar energy, and market maturation (Sharon and Reddy, 2015;World-Bank, 2019). Somalia's national water resources strategy 2021-2025 highlights the exploration of desalinization alternatives to ensure the availability and sustainable management of water and sanitation (MoEWR, 2021). Djibouti's national program for rural water supply by 2030 also proposes the use of this technology (OWAS, 2013). This would imply a reduction in cost and the introduction of adequate measures for the safe management of the brines derived from desalination (World-Bank, 2019). A comprehensive review of adaptation options taking into account groundwater salinity and water scarcity in the region can be found in Gurmessa and Taye (2021).

Conclusion
In this research, the areas affected by unsafe levels of salinity in groundwater and the number of people potentially exposed to high salinity in the Horn of Africa were estimated. Salinity in groundwater in this area mainly affects arid flat lowlands with low groundwater recharge in gypsum deposits and fractured rocks related to ancient marine deposits. Sporadic and scarce rainfall and high temperatures favor evaporation and the salinization of groundwater and soil, which contributes to water insecurity. For the areas this research has highlighted as relevant for high EC, the next step would be to identify the constituents that drive high salinity (e.g. sodium, potassium, magnesium etc.). Furthermore, another challenge of estimating groundwater quality in the Horn of Africa lies in better understanding the relationship between salinity drivers and various rainfall and temperature projections. More precipitation, as predicted with climate change, could raise groundwater levels but also mobilize salt from the surface, leading to higher salinity.
An uneven spatial distribution of groundwater salinity across the Horn of Africa was found. Large potentially affected areas are found in the eastern regions, namely in the Chalbi and Nyiri deserts in Kenya, the Ogaden desert in the Somali region of Ethiopia, the Djibouti region in Djibouti, and most of Somalia. In the studied countries, the estimate of people exposed to salinity exceeding 1500 μS/cm (800-2500 μS/cm) is 11.6 (7.5-17) million people, including 372,000 (258,000-553,000) infants and 491,000 (305,000-713,000) pregnant women. These numbers equate to ~7% of the overall population. However, Somalia is undoubtedly the country most affected by high salinity in groundwater in absolute and relative terms; around 50% of the population, or 5 million people, may be exposed to unsafe salinity levels in their drinking water, with this proportion remaining nearly constant across the salinity levels examined. This is particularly hazardous for vulnerable populations in the country, such as infants and pregnant women, who comprise about half a million people at risk. This geospatial predictive modeling provides the first detailed mapping of groundwater salinity in the region, which is essential information for water and health scientists and decision-makers for identifying and prioritizing areas and populations in need of assistance.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.