Statistical Modelling

Abstract

In this chapter, we present statistical modelling approaches for predictive tasks in business and science. The most prominent is the ubiquitous multiple linear regression, whose coefficients are estimated by ordinary least squares. The technique has many variants and generalizations; in the form of logistic regression, for example, it has been adapted to binary classification problems. Various statistical survival models, in turn, allow time-to-event data to be modelled. Based on real-world examples, we detail the many benefits and a few pitfalls of these techniques. A primary focus is on the added value that these statistical modelling tools provide over more black-box machine-learning algorithms. In our opinion, this added value stems predominantly from three sources: the model is often much easier to interpret, tools are available that summarize the influence of the predictor variables concisely, and variable selection and residual analysis make model development, refinement, and improvement user-friendly.
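
The two fitting procedures named above can be illustrated with a minimal sketch. It assumes synthetic data (the true coefficients 2, 3, -1 for the linear model and -0.5, 1.5, 0 for the logistic model are made up for illustration) and uses only the Python standard library. Multiple linear regression is fitted by ordinary least squares via the normal equations, and logistic regression by maximum likelihood with Newton-Raphson steps, which for the logit link coincide with iteratively reweighted least squares (IRLS). The helper names `solve`, `linear_predictor`, `ols` and `logistic_irls` are hypothetical and not taken from the chapter.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def linear_predictor(beta, row):
    """Return x^T beta for one observation."""
    return sum(b * v for b, v in zip(beta, row))


def ols(X, y):
    """Multiple linear regression by ordinary least squares.

    Solves the normal equations (X^T X) beta = X^T y; X must already
    contain a leading column of ones for the intercept.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    residuals = [yi - linear_predictor(beta, row) for row, yi in zip(X, y)]
    return beta, residuals


def logistic_irls(X, y, iterations=25):
    """Logistic regression by maximum likelihood, using Newton-Raphson (IRLS) steps."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        mu = [1.0 / (1.0 + math.exp(-linear_predictor(beta, row))) for row in X]
        w = [m * (1.0 - m) for m in mu]  # Bernoulli variance serves as IRLS weight
        # Newton step: (X^T W X) delta = X^T (y - mu)
        info = [[sum(wk * row[i] * row[j] for wk, row in zip(w, X)) for j in range(p)]
                for i in range(p)]
        score = [sum(row[i] * (yk - mk) for row, yk, mk in zip(X, y, mu)) for i in range(p)]
        delta = solve(info, score)
        beta = [b + d for b, d in zip(beta, delta)]
    return beta


# Hypothetical synthetic data: true linear coefficients (2, 3, -1),
# true logistic coefficients (-0.5, 1.5, 0).
random.seed(42)
X, y_lin, y_bin = [], [], []
for _ in range(1000):
    x1, x2 = random.uniform(-2, 2), random.uniform(-2, 2)
    X.append([1.0, x1, x2])
    y_lin.append(2.0 + 3.0 * x1 - 1.0 * x2 + random.gauss(0, 0.5))
    prob = 1.0 / (1.0 + math.exp(-(-0.5 + 1.5 * x1)))
    y_bin.append(1.0 if random.random() < prob else 0.0)

beta_ols, residuals = ols(X, y_lin)
beta_logit = logistic_irls(X, y_bin)
print("OLS coefficients:     ", [round(b, 2) for b in beta_ols])
print("Logistic coefficients:", [round(b, 2) for b in beta_logit])
print("Sum of OLS residuals: ", sum(residuals))
```

Each OLS coefficient estimates the expected change in the response per unit change in its predictor with the other predictors held fixed, and each logistic coefficient is the corresponding change in the log-odds; this direct reading is the kind of interpretability the abstract emphasizes. The residuals returned by `ols` are the starting point for residual analysis. In practice a tested library such as statsmodels (`OLS`, `Logit`) would be preferable, since it also reports standard errors and diagnostics; survival models are not covered by this sketch.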

Acknowledgments

The authors thank the editors for their constructive comments, which led to significant improvements to this article.

Author information

Correspondence to Andreas Ruckstuhl.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dettling, M., Ruckstuhl, A. (2019). Statistical Modelling. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_11

  • DOI: https://doi.org/10.1007/978-3-030-11821-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11820-4

  • Online ISBN: 978-3-030-11821-1

  • eBook Packages: Computer Science, Computer Science (R0)
