
Application of singular value decomposition (SVD) and semi-discrete decomposition (SDD) techniques in clustering of geochemical data: an environmental study in central Iran

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Common multivariate clustering techniques are ineffective at identifying subtle patterns of correlation and at clustering variables or samples within complex geochemical datasets. This study compares the combination of singular value decomposition (SVD) and semi-discrete decomposition (SDD) with hierarchical cluster analysis (HCA) for examining patterns within a multielement soil geochemical dataset from an agricultural area near Pb–Zn mining operations in central Iran. SVD was used both to identify patterns of correlation between variables and samples and to "denoise" the data, and SDD was used to cluster the samples and variables simultaneously. The results reveal distinct spatial associations among the mining-waste-associated metals (As, Ba, Pb and Zn), and among the remaining elements, whose distributions are largely controlled by the major oxides. SVD–SDD was found to be superior to HCA in its ability to detect subtle clusters in soil geochemistry indicative of mine-related contamination in the study area.
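The SVD–SDD workflow summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the log-transform and column standardization in `preprocess`, the ranks `r` and `k`, the starting-vector heuristic, and the synthetic data are all illustrative assumptions, and `preprocess`, `svd_denoise`, `_best_ternary` and `sdd` are hypothetical helper names. The SDD here uses a greedy alternating scheme of the kind commonly used for this decomposition: each step approximates the residual by `d * outer(x, y)` with `x` and `y` restricted to entries in {-1, 0, 1}. Applying SDD to the SVD-denoised matrix is one plausible reading of the abstract.

```python
import numpy as np


def preprocess(conc):
    """Log-transform and column-standardise a samples x elements matrix.

    Assumption: the abstract does not state the preprocessing used; log + z-score
    is a common choice for skewed geochemical concentrations.
    """
    logc = np.log10(conc)
    return (logc - logc.mean(axis=0)) / logc.std(axis=0)


def svd_denoise(A, r):
    """Rank-r reconstruction of A from its r leading singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :], s


def _best_ternary(s):
    """Return x in {-1, 0, 1}^n maximising (x . s)^2 / ||x||^2 for fixed s."""
    order = np.argsort(-np.abs(s))           # entries by decreasing magnitude
    cum = np.cumsum(np.abs(s)[order])        # |x . s| if the top J entries are kept
    j = int(np.argmax(cum ** 2 / np.arange(1, s.size + 1)))
    x = np.zeros_like(s)
    keep = order[: j + 1]
    x[keep] = np.sign(s[keep])               # match signs so x . s is positive
    return x


def sdd(A, k, inner_iters=20):
    """Greedy semi-discrete decomposition: A ~ X @ diag(d) @ Y.T."""
    R = np.array(A, dtype=float)             # residual matrix
    xs, ds, ys = [], [], []
    for _ in range(k):
        # Heuristic start: the column carrying the most residual energy.
        y = np.zeros(R.shape[1])
        y[np.argmax(np.linalg.norm(R, axis=0))] = 1.0
        for _ in range(inner_iters):         # alternate the two ternary subproblems
            x = _best_ternary(R @ y)
            if not x.any():
                break
            y = _best_ternary(R.T @ x)
            if not y.any():
                break
        if not (x.any() and y.any()):
            break                            # residual is (numerically) zero
        d = (x @ R @ y) / ((x @ x) * (y @ y))  # optimal positive weight
        R -= d * np.outer(x, y)
        xs.append(x)
        ds.append(d)
        ys.append(y)
    return np.array(xs).T, np.array(ds), np.array(ys).T


# Usage on synthetic lognormal "concentrations" (not the study's data).
rng = np.random.default_rng(0)
conc = rng.lognormal(mean=3.0, sigma=0.5, size=(60, 8))
Z = preprocess(conc)
Z_denoised, singular_values = svd_denoise(Z, r=3)
X, d, Y = sdd(Z_denoised, k=4)
sample_split = X[:, 0]    # +1 / 0 / -1 grouping of samples at the first level
element_split = Y[:, 0]   # elements that drive that grouping
```

One common way to read SDD output as clusters is to treat each column of `X` (and `Y`) as a three-way split of the samples (and elements) into +1, 0 and -1 groups, with successive columns refining the partition. The scree of `singular_values` is a typical guide for choosing `r`.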


Figs. 1–9 (images not available)



Acknowledgments

The authors thank the Iranian Geological Survey for its support and for the soil analyses, Islamic Azad University (Bafgh branch), and Dr. Soleimani (Isfahan University of Technology) for his assistance with the project.

Author information

Corresponding author

Correspondence to Ahmad Reza Mokhtari.


About this article


Cite this article

Zekri, H., Mokhtari, A.R. & Cohen, D.R. Application of singular value decomposition (SVD) and semi-discrete decomposition (SDD) techniques in clustering of geochemical data: an environmental study in central Iran. Stoch Environ Res Risk Assess 30, 1947–1960 (2016). https://doi.org/10.1007/s00477-016-1219-5


  • DOI: https://doi.org/10.1007/s00477-016-1219-5
