Abstract
Classification, the task of assigning objects to a given set of categories, is used in almost every field. One important sub-branch of classification comprises methods that learn classification functions from example data. This chapter will provide an overview of the most fundamental concepts and methods of this type of data-driven classification. We will first highlight the basic ideas behind classification, along with some examples related to tourism. Thereafter, we will introduce measures of classification performance, which are necessary both to direct the data-driven training of classification functions and to evaluate classification results. As an essential part of this chapter, we will provide self-contained, yet stripped-down, descriptions of the most important data-driven classification methods: nearest neighbor classifiers, logistic regression, Naïve Bayes, decision trees and ensemble variants thereof, support vector machines, and, finally, artificial neural networks. All of these concepts and methods will then be applied to a specific use case in an accompanying Jupyter notebook, demonstrating their practical implementation in Python with the machine learning framework scikit-learn.
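As a taste of what the accompanying notebook covers, the following is a minimal sketch of the data-driven classification workflow in scikit-learn. The synthetic data set and the choice of logistic regression are assumptions for illustration only; the chapter's notebook works through a concrete tourism use case instead.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Toy binary classification problem (a stand-in for real tourism data).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out a test set so that performance is measured on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Train a classifier; any method covered in the chapter (k-NN, Naive Bayes,
# decision trees, SVMs, neural networks, ...) could be substituted here.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate with standard classification performance measures.
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))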
Further Readings and Other Sources
- Logistic regression: https://medium.com/data-science-group-iitr/logistic-regression-simplified-9b4efe801389
- Naïve Bayes: https://medium.com/x8-the-ai-community/a-simple-introduction-to-naive-bayes-23538a0395a
- Decision trees: https://medium.com/swlh/a-beginners-guide-to-decision-trees-84ca34927818
- Random forests: https://medium.com/@harshdeepsingh_35448/understanding-random-forests-aa0ccecdbbbb
- Gradient tree boosting: https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d
- Support vector machines: https://medium.com/@LSchultebraucks/introduction-to-support-vector-machines-9f8161ae2fcb
- Artificial neural networks: https://medium.com/@purnasaigudikandula/a-beginner-intro-to-neural-networks-543267bda3c8
Cite this chapter
Bodenhofer, U., & Stöckl, A. (2022). Classification. In R. Egger (Ed.), Applied data science in tourism: Interdisciplinary approaches, methodologies and applications (Tourism on the Verge). Springer. https://doi.org/10.1007/978-3-030-88389-8_10