A Bayesian perspective of statistical machine learning for big data

Sambasivan, Rajiv; Das, Sourish; Sahu, Sujit K.

doi:10.1007/s00180-020-00970-8

Original paper
Published: 01 April 2020

Volume 35, pages 893–930, (2020)

Computational Statistics

1170 Accesses
15 Citations

Abstract

Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers discover important features of input data sets, which are often very large. This task of discovering features from data is essentially what the keyword ‘learning’ means in SML. Theoretical justifications for the effectiveness of SML algorithms rest on sound principles from several disciplines, notably Computer Science and Statistics. The theoretical underpinnings that are justified by statistical inference methods are collectively termed statistical learning theory. This paper reviews SML from a Bayesian decision-theoretic point of view, arguing that many SML techniques are closely connected to inference under the so-called Bayesian paradigm. We discuss many important SML techniques, such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of the very large data sets where they are often employed. We present a dictionary that maps the key concepts of SML between Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets and discuss many practical implementation issues. The review is thus targeted especially at statisticians and computer scientists who aspire to understand and apply SML to moderately large to big data sets.

Acknowledgements

Sourish Das’s research has been supported by an Infosys Foundation Grant and a TATA Trust Grant to CMI and also by a UK Government funded Commonwealth-Rutherford Scholarship (Grant No. RF 2017-123).

Author information

Authors and Affiliations

Chennai Mathematical Institute, Chennai, India
Rajiv Sambasivan, Sourish Das & Sujit K. Sahu
University of Southampton, Southampton, UK
Rajiv Sambasivan, Sourish Das & Sujit K. Sahu

Corresponding author

Correspondence to Sourish Das.

About this article

Cite this article

Sambasivan, R., Das, S. & Sahu, S.K. A Bayesian perspective of statistical machine learning for big data. Comput Stat 35, 893–930 (2020). https://doi.org/10.1007/s00180-020-00970-8

Received: 17 December 2018
Accepted: 20 February 2020
Published: 01 April 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00180-020-00970-8
