
Classifying arsenic-contaminated waters in Tarkwa: a machine learning approach

  • Original Article
Sustainable Water Resources Management

Abstract

Access to clean and safe drinking water is key to improving social well-being in most developing countries. Because arsenic is hazardous and harmful to human health, elevated arsenic levels in water bodies have become a growing global health concern in recent years. In Ghana, elevated arsenic concentrations have been reported in some water sources in Tarkwa. However, the high cost of analysis hinders continuous monitoring of arsenic concentrations in these sources. To support early detection, this study aimed to develop efficient machine learning models that classify waters into high, medium and low levels of arsenic contamination using physical water parameters such as total dissolved solids, pH, electrical conductivity and turbidity. These parameters were selected because they are relatively inexpensive to measure, data on them were available, and they may influence the arsenic concentration in the water. Three machine learning models (extra trees, random forest and decision tree) were developed and assessed using evaluation metrics including accuracy, precision and sensitivity. The evaluation showed that the extra trees and random forest models outperformed the decision tree model. Nevertheless, all three models generally performed well when classifying waters with high and low levels of arsenic contamination. The variable importance analysis further revealed that pH had the strongest influence on classifying arsenic-contaminated waters, followed by electrical conductivity. These results demonstrate the potential of machine learning algorithms to assist water-quality practitioners in monitoring arsenic concentrations in water sources.
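The workflow summarised above (fit extra trees, random forest and decision tree classifiers on four physical parameters, compare them by accuracy, precision and sensitivity, then rank variable importance) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. Because the study's data are only available on request, the inputs here are synthetic, generated with `make_classification`. The feature names, the 70/30 stratified split, the 200-tree ensembles and the use of impurity-based importances are all assumed choices. The paper's concentration thresholds for the low, medium and high classes and its hyperparameter tuning are not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed column order for the four physical parameters (hypothetical layout).
FEATURES = ["TDS", "pH", "EC", "turbidity"]
CLASSES = ["low", "medium", "high"]  # arsenic-contamination levels
LABELS = list(range(len(CLASSES)))   # integer codes 0, 1, 2


def make_synthetic_data(n_samples=300, seed=0):
    """Synthetic stand-in for the Tarkwa measurements (NOT real data)."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=len(FEATURES),
        n_informative=3,
        n_redundant=0,
        n_classes=len(CLASSES),
        n_clusters_per_class=1,
        random_state=seed,
    )
    return X, y  # y holds 0/1/2, read here as low/medium/high


def evaluate_models(X, y, seed=0):
    """Fit three tree-based classifiers and report accuracy, per-class
    precision and sensitivity (recall), and impurity-based importances."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # average=None gives one score per class, in the order of LABELS
            "precision": precision_score(
                y_test, y_pred, labels=LABELS, average=None, zero_division=0
            ),
            # sensitivity is the same quantity as recall
            "sensitivity": recall_score(
                y_test, y_pred, labels=LABELS, average=None, zero_division=0
            ),
            # impurity-based importances, normalised to sum to 1
            "importance": dict(zip(FEATURES, model.feature_importances_)),
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, r in evaluate_models(X, y).items():
        print(f"{name}: accuracy={r['accuracy']:.3f}")
        for cls, p, s in zip(CLASSES, r["precision"], r["sensitivity"]):
            print(f"  {cls:>6}: precision={p:.3f} sensitivity={s:.3f}")
        ranked = sorted(r["importance"].items(), key=lambda kv: kv[1], reverse=True)
        print("  importance:", ", ".join(f"{f}={v:.3f}" for f, v in ranked))
```

Because the inputs are random, the printed scores and importance ranking say nothing about waters in Tarkwa. They only show how per-class precision and sensitivity expose differences that a single accuracy figure hides. With real measurements, `sklearn.inspection.permutation_importance` on the held-out split is a common cross-check, since impurity-based importances are computed on training data and can be biased.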


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7


Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors wish to thank the management of the University of Mines and Technology for providing the data used in the modelling.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information


Corresponding author

Correspondence to Matthew Nkoom.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 20 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ayisha, M., Nkoom, M. & Doke, D.A. Classifying arsenic-contaminated waters in Tarkwa: a machine learning approach. Sustain. Water Resour. Manag. 10, 55 (2024). https://doi.org/10.1007/s40899-024-01042-1


  • DOI: https://doi.org/10.1007/s40899-024-01042-1
