Non-parametric regression on compositional covariates using Bayesian P-splines

  • Original Paper
  • Statistical Methods & Applications

Abstract

Methods for regression on compositional covariates have recently been proposed using the isometric log-ratio (ilr) representation of the compositional parts. This approach consists of first applying standard regression to the ilr coordinates and then transforming the estimated ilr coefficients into their contrast log-ratio counterparts. This yields easy-to-interpret parameters indicating the relative effect of each compositional part. In this work we present an extension of this framework in which compositional covariate effects are allowed to be smooth in the ilr domain. This is achieved by fitting a smooth function over the multidimensional ilr space using Bayesian P-splines, with smoothness induced by random walk priors on the spline coefficients in a hierarchical Bayesian framework. The proposed methodology is applied to spatial data from an ecological survey on a gypsum outcrop located in the Emilia-Romagna Region, Italy.
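
As a rough illustration of the two-step linear approach summarised in the abstract (regression on ilr coordinates, followed by back-transformation of the coefficients), a minimal Python sketch is given below. It is not the authors' implementation: the Helmert-type orthonormal basis, the simulated Dirichlet compositions, the ordinary least-squares fit, and all names (`ilr_basis`, `ilr`, `gamma_true`) are assumptions made for the example, and the smooth P-spline extension is not covered.

```python
import numpy as np

def ilr_basis(D):
    """Return a D x (D-1) matrix V with orthonormal columns that each sum to zero.

    This Helmert-type construction is one common choice of ilr basis (an assumption here).
    """
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        # Column j-1 contrasts the first j parts against part j+1.
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr(X, V):
    """Map compositions (rows of X, strictly positive) to ilr coordinates."""
    logX = np.log(X)
    clr = logX - logX.mean(axis=1, keepdims=True)  # centred log-ratio
    return clr @ V

# Simulated data (purely illustrative): 3-part compositions and a response
# generated from a log-contrast model whose coefficients sum to zero.
rng = np.random.default_rng(0)
X = rng.dirichlet([2.0, 3.0, 4.0], size=200)
gamma_true = np.array([1.0, -0.5, -0.5])
y = 0.3 + np.log(X) @ gamma_true + rng.normal(scale=0.1, size=200)

# Step 1: standard least-squares regression on the ilr coordinates.
V = ilr_basis(X.shape[1])
Z = ilr(X, V)
A = np.column_stack([np.ones(len(y)), Z])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
beta_hat = coef[1:]

# Step 2: back-transform the ilr coefficients to one coefficient per part.
# The result sums to zero, so each entry reads as a relative effect.
gamma_hat = V @ beta_hat
print("per-part coefficients:", np.round(gamma_hat, 3))
```

Because the columns of `V` sum to zero, the back-transformed coefficients also sum to zero, so the fitted linear predictor can be read as a log-contrast in the original parts. In the sketch, `gamma_hat` should lie close to `gamma_true` because the simulated noise is small.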

Acknowledgments

We wish to thank Carlo Ferrari, Giovanna Pezzi and Andrea Velli for introducing us to the problem, providing data and performing data pre-processing. The research work underlying this paper was funded by a FIRB 2012 Grant (Project No. RBFR12URQJ; title: Statistical modeling of environmental phenomena: pollution, meteorology, health and their interactions) for research projects by the Italian Ministry of Education, Universities and Research. We thank two anonymous referees for their suggestions and comments.

Author information

Corresponding author

Correspondence to Massimo Ventrucci.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 7702 KB)

Supplementary material 2 (pdf 21911 KB)

Cite this article

Bruno, F., Greco, F. & Ventrucci, M. Non-parametric regression on compositional covariates using Bayesian P-splines. Stat Methods Appl 25, 75–88 (2016). https://doi.org/10.1007/s10260-015-0339-2

  • DOI: https://doi.org/10.1007/s10260-015-0339-2
