Review
So Many Variables: Joint Modeling in Community Ecology

https://doi.org/10.1016/j.tree.2015.09.007

Trends

Many ecological questions require the joint analysis of abundances collected simultaneously across many taxonomic groups, and, if organisms are identified using modern tools such as metabarcoding, the number of taxa can be in the thousands.

While historically such data have been analyzed using ad hoc algorithms, it is now possible to fully specify joint statistical models for abundance using multivariate extensions of generalized linear mixed models.

These modern ‘joint modeling’ approaches allow the study of correlation patterns across taxa, at the same time as studying environmental response, to tease the two apart.

Latent variable models are an especially exciting tool that has recently been used for ordination as well as for studying the factors driving co-occurrence.

Technological advances have enabled a new class of multivariate models for ecology, with the potential now to specify a statistical model for abundances jointly across many taxa, to simultaneously explore interactions across taxa and the response of abundance to environmental variables. Joint models can be used for several purposes of interest to ecologists, including estimating patterns of residual correlation across taxa, ordination, multivariate inference about environmental effects and environment-by-trait interactions, accounting for missing predictors, and improving predictions in situations where one can leverage knowledge of some species to predict others. We demonstrate this by example and discuss recent computation tools and future directions.

Section snippets

A New Phase for Community Modeling in Ecology

Many of the questions posed in ecology require the consideration of abundance (including presence/absence; see Glossary) collected simultaneously across multiple taxonomic groups, for example species. The abundances in different taxa typically form the response variables in a multivariate analysis and are analyzed with several different goals. Recent examples include: to study the impact of experimental removal of invasive crayfish on macroinvertebrate communities [1], to find taxa that can act

Joint Models for Abundance

The methods described in this paper are all extensions of the generalized linear model (GLM) [19], widely used to model abundance (e.g., [20, 21, 22]). A joint model necessarily requires the inclusion of random effects, hence some form of mixed model [23], to capture correlation in abundance across taxa. There are several ways to proceed, and a key issue to consider is the level of complexity in the model. A balance needs to be found between using a sufficiently simple model that its parameters

Modeling Residual Correlation Between Taxa

An important application of joint models is in estimating the correlation between taxa that arises for reasons not attributable to the measured predictors included in the model. Such correlation could be due to biotic interactions such as competition and facilitation, although the exact type of biotic interaction cannot be inferred from co-occurrence [4, 54]. It could also be due to joint response to unmeasured predictors, or to other forms of misspecification of the mean model [26].
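To illustrate how a joint response to an unmeasured predictor can produce correlation between taxa, the following is a minimal simulation sketch, not the authors' method. It assumes Poisson counts with a log link and a single site-level latent variable `z`; the intercepts, loadings, seed, and function names are all hypothetical choices made for illustration.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (fine for modest means)."""
    threshold = math.exp(-mean)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n_sites, intercepts, loadings, seed=1):
    """Simulate counts for several taxa that share one unmeasured latent variable."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        z = rng.gauss(0.0, 1.0)  # assumed unmeasured site-level variable
        counts.append([
            poisson_draw(rng, math.exp(b0 + lam * z))  # log link
            for b0, lam in zip(intercepts, loadings)
        ])
    return counts


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Taxa A and B respond to z in the same direction, taxon C in the opposite one.
counts = simulate_counts(2000, intercepts=[1.0, 1.0, 1.0], loadings=[0.8, 0.8, -0.8])
taxon_a, taxon_b, taxon_c = (list(col) for col in zip(*counts))
r_ab = pearson(taxon_a, taxon_b)
r_ac = pearson(taxon_a, taxon_c)
print(f"corr(A, B) = {r_ab:.2f}, corr(A, C) = {r_ac:.2f}")
```

No taxon interacts with any other in this simulation, yet the counts are correlated because all three respond to the same unmeasured variable. This is one reason the type of biotic interaction cannot be read off co-occurrence alone.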

If the number

Model-Based Ordination

By treating latent variables as ordination axes, an LVM (commonly with two latent variables) can be understood as a model-based approach to unconstrained ordination [32, 33]. A model-based approach to ordination offers several advantages over traditional ordination methods. For example, models can be used to account for important (and otherwise spurious) data properties such as the mean–variance relationship [56]. Model selection and residual analysis tools can be used to verify key aspects of a

Multivariate Inferences about Predictors

Joint models, whether GLMMs or LVMs, can be used to make multivariate inferences about the effect of the predictor variables x_i, while accounting for any residual correlation between taxa. Accounting for correlation between taxa, and doing so in a flexible way, is important to ensure that inferences made jointly across multiple taxa are statistically valid. Two examples of this are when studying how well species traits explain interspecific variation in environmental response (Box 2) and when

Accounting for Missing Predictors

While diagnostic tools can be used to check assumptions, one can never be sure that all assumptions in the mean model are correct, and some violations remain hard to detect. One or more important predictors could be missing from the study, or perhaps the form in which the measured predictors enter the model is incorrect (e.g., assuming a quadratic response when the true response is more complex). The statistical term for such failures is ‘misspecification’ of the mean model [66]. Fortunately,

Improving Predictions

When predicting abundance across a set of correlated taxa, joint models could improve predictive performance even if the model were correctly specified.
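One way to see why correlation across taxa can help prediction is a Gaussian analogy, sketched below. This is an illustrative assumption rather than the model used in the article: two taxa are represented by standard bivariate normal variables with a known correlation `rho` (a hypothetical value), and the true parameters are used in place of fitted ones. The sketch compares predicting the target taxon by its mean alone against the conditional mean given the other taxon.

```python
import math
import random


def simulate_pair(n, rho, seed=2):
    """Draw n pairs (known, target) from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        known = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        target = rho * known + math.sqrt(1.0 - rho ** 2) * noise
        pairs.append((known, target))
    return pairs


def mean_squared_error(predictions, truths):
    """Average squared difference between predictions and observed values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths)


rho = 0.7  # assumed correlation between the two taxa
pairs = simulate_pair(5000, rho)
truths = [target for _, target in pairs]

# Marginal prediction ignores the other taxon and always predicts the mean (0 here).
marginal = [0.0 for _ in pairs]
# Conditional prediction uses E[target | known] = rho * known for standard normals.
conditional = [rho * known for known, _ in pairs]

mse_marginal = mean_squared_error(marginal, truths)
mse_conditional = mean_squared_error(conditional, truths)
print(f"marginal MSE = {mse_marginal:.3f}, conditional MSE = {mse_conditional:.3f}")
```

In this idealized setting the conditional predictor has theoretical mean squared error 1 - rho^2, compared with 1 for the marginal predictor, so the gain grows with the strength of the correlation. Real abundance data are rarely Gaussian, so the size of any gain in practice would depend on the fitted joint model.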

Joint models have a particular advantage for in-sample prediction because they can make use of correlations across taxa, which contain information useful for predicting abundance of one taxon from others. For example, when using an LVM, if predictions are made on the same samples that were used to fit the joint model, then one can condition on

Concluding Remarks

Joint models are flexible tools with exciting potential for application in ecology, especially community ecology, where the number of taxa is rarely small compared to the number of samples. In such instances a latent variable approach can be used for a range of purposes, as discussed here, although this list is by no means exhaustive.

Both multivariate GLMMs and LVMs can be understood as special types of mixed effects models designed for multivariate data. Hence they can be used for much the

Acknowledgments

D.I.W. was supported by an Australian Research Council Future Fellowship (FT120100501) and an Australian Academy of Science travel grant. B.O’H. was supported by a LOEWE (Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz) initiative of the Hessian Ministry for Science and the Arts. O.O. and S.T. were supported by Academy of Finland grants 250444 and 251965, respectively. F.K.C.H. was supported by Australian Research Council discovery project grant DP140101259. We thank the

Glossary

Abundance
the extent to which a type of organism is present in a sample unit, measured either as a count, biomass, % cover, a factor with ordered levels, or presence/absence.
Continuous variable
a variable that can take any value within some interval (cf. discrete variable). Abundance is rarely continuous, complicating the modeling process.
Discrete variable
a variable that can take one of a countable number of distinct values. Abundance is often discrete, for example counts could be 0, 1, 2, 3,...

References (92)

  • T.S. Doherty

    A continental-scale analysis of feral cat diet in Australia

    J. Biogeogr.

    (2015)
  • K. Faust et al.

    Microbial interactions: from networks to models

    Nat. Rev. Microbiol.

    (2012)
  • P. Legendre et al.

    Numerical Ecology

    (2012)
  • D.I. Warton

    Penalized normal likelihood and ridge regularization of correlation and covariance matrices

    J. Am. Stat. Assoc.

    (2008)
  • D.I. Warton

    Model-based thinking for community ecology

    Plant Ecol.

    (2015)
  • H. Gauch

    Multivariate Analysis in Community Ecology

    (1982)
  • P. Legendre

    Relating behavior to habitat: solutions to the fourth-corner problem

    Ecology

    (1997)
  • M.J. Anderson

    A new method for non-parametric multivariate analysis of variance

    Aust. Ecol.

    (2001)
  • S. Ferrier et al.

    Spatial modelling of biodiversity at the community level

    J. Appl. Ecol.

    (2006)
  • J. Elith et al.

    Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines

    Divers. Distrib.

    (2007)
  • A. Guisan et al.

    SESAM: a new framework integrating macroecological and species distribution models for predicting spatio-temporal patterns of species assemblages

    J. Biogeogr.

    (2011)
  • P.J. Leitão

    Mapping beta diversity from space: Sparse generalised dissimilarity modelling (SGDM) for analysing high-dimensional data

    Methods Ecol. Evol.

    (2015)
  • N. Cressie

    Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling

    Ecol. Appl.

    (2009)
  • A.J. Dobson et al.

    An Introduction to Generalized Linear Models

    (2008)
  • R.B. O’Hara et al.

    Do not log-transform count data

    Methods Ecol. Evol.

    (2010)
  • D.I. Warton et al.

    The arcsine is asinine: the analysis of proportions in ecology

    Ecology

    (2011)
  • E. Szöcs et al.

    Ecotoxicology is not normal

    Environ. Sci. Pollut. Res.

    (2015)
  • T. Jamil

    Selecting traits that explain species–environment relationships: a generalized linear mixed model approach

    J. Vegetation Sci.

    (2013)
  • O. Ovaskainen

    Modeling species co-occurrence by multivariate logistic regression generates new hypotheses on fungal interactions

    Ecology

    (2010)
  • W.D. Kissling

    Towards novel approaches to modelling biotic interactions in multispecies assemblages at large spatial extents

    J. Biogeogr.

    (2012)
  • L.J. Pollock

    Understanding co-occurrence by modelling species simultaneously with a Joint Species Distribution Model (JSDM)

    Methods Ecol. Evol.

    (2014)
  • J.S. Clark

    More than the sum of the parts: forest climate response from Joint Species Distribution Models

    Ecol. Appl.

    (2014)
  • A. Skrondal et al.

    Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models

    (2004)
  • D.J. Bartholomew

    Latent Variable Models and Factor Analysis: A Unified Approach

    (2011)
  • J. Quackenbush

    Computational analysis of microarray data

    Nat. Rev. Genet.

    (2001)
  • S.C. Walker et al.

    Random-effects ordination: describing and predicting multivariate correlations and co-occurrences

    Ecol. Monogr.

    (2011)
  • F.K.C. Hui

    Model-based approaches to unconstrained ordination

    Methods Ecol. Evol.

    (2015)
  • D. Goodall

    Objective methods for the classification of vegetation. III. An essay in the use of factor analysis

    Aust. J. Bot.

    (1954)
  • R.H.G. Jongman

    Data Analysis in Community and Landscape Ecology

    (1995)
  • A.D. Letten

    Fine-scale hydrological niche differentiation through the lens of multi-species co-occurrence models

    J. Ecol.

    (2015)
  • J.B. Grace

    Structural Equation Modeling and Natural Systems

    (2006)
  • J. Belmaker et al.

    Relative roles of ecological and energetic constraints, diversification rates and region history on global species richness gradients

    Ecol. Lett.

    (2015)
  • A. Bhattacharya et al.

    Sparse Bayesian infinite factor models

    Biometrika

    (2011)
  • K.Y. Liang et al.

    Longitudinal data analysis using generalized linear models

    Biometrika

    (1986)
  • D.I. Warton

    Regularized sandwich estimators for analysis of high-dimensional data using generalized estimating equations

    Biometrics

    (2011)
  • J.W. Hardin

    Generalized Estimating Equations (GEE)

    (2005)