
A chemo-ecologists’ practical guide to compositional data analysis

  • Original Article
  • Chemoecology

An Erratum to this article was published on 08 December 2016

Abstract

Compositional data are commonly used in chemical ecology to describe the biological role of chemical compounds in communication, defense or other behavioral modifications. Statistical analyses of compositional data, however, are challenging because of several constraints (e.g., the constant-sum constraint). We use an ontogenetic series of defensive gland secretions from larvae, three nymphal stages and adults of the oribatid model species Archegozetes longisetosus as a typical chemo-ecological data set to prepare a practical guide for compositional data analysis in chemical ecology. We compare various common and less common statistical and ordination methods for depicting small quantitative and/or qualitative differences in compositional data sets: principal component analysis (PCA), non-metric multidimensional scaling (NMDS), multivariate statistical tests (Anderson's permutational multivariate analysis of variance = PERMANOVA; permutational analysis of multivariate dispersions = PERMDISP), linear discriminant analysis (LDA), the data mining algorithm Random Forests, bipartite network analysis and dynamic range boxes (dynRB). We summarize which methods are suitable for which research questions and how data need to be structured and pre-processed. Network analyses and dynamic range boxes are promising tools for analyzing compositional data beyond the "classical" methods and provide additional information.
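To make the constant-sum constraint concrete, the following minimal Python sketch closes raw peak areas to proportions and then applies a centred log-ratio (clr) transformation, one widely used pre-processing step from Aitchison's log-ratio approach. The abstract does not state which transformation the article recommends, so the choice of clr is an assumption made here for illustration; the function names `closure` and `clr` and the peak-area values are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (the constant-sum constraint)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centred log-ratio transform; requires strictly positive parts."""
    comp = closure(x)
    if np.any(comp <= 0):
        # Zeros must be handled (e.g., by imputation) before taking logs.
        raise ValueError("clr needs strictly positive parts; handle zeros first")
    log_comp = np.log(comp)
    # Subtracting the row mean of the logs divides each part by the
    # geometric mean of its sample.
    return log_comp - log_comp.mean(axis=1, keepdims=True)

# Hypothetical peak areas for three compounds in two secretion samples.
peak_areas = np.array([[120.0, 30.0, 50.0],
                       [60.0, 90.0, 50.0]])

proportions = closure(peak_areas)  # each row sums to 1
clr_scores = clr(peak_areas)       # each row sums to 0
```

Because every row of `proportions` sums to 1, the parts cannot vary independently, which is the difficulty the abstract refers to. The clr scores depend only on ratios between compounds, so multiplying a sample's peak areas by any positive constant leaves them unchanged.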



Acknowledgements

Adrian Brückner is supported by a PhD scholarship from the German National Academic Foundation (Studienstiftung des deutschen Volkes). We thank Klaus Birkhofer (Lund University), Robert R. Junker (University of Salzburg) and Nico Blüthgen (TU Darmstadt) for discussing PERMANOVA/PERMDISP, dynamic range boxes and the network approach with us. We further thank Lukas Kauling for experimental assistance and carrying out a preliminary experiment. This study was partly funded by the German Research Foundation (DFG, HE 4593/5-1).

Author information

Corresponding author

Correspondence to Michael Heethoff.

Additional information

Handling Editor: Marko Rohlfs.

An erratum to this article is available at http://dx.doi.org/10.1007/s00049-016-0228-7.

About this article

Cite this article

Brückner, A., Heethoff, M. A chemo-ecologists’ practical guide to compositional data analysis. Chemoecology 27, 33–46 (2017). https://doi.org/10.1007/s00049-016-0227-8

  • DOI: https://doi.org/10.1007/s00049-016-0227-8
