
Classification optimization for training a large dataset with Naïve Bayes


Abstract

Book classification is widely used in digital libraries, and book rating prediction is crucial for improving services to readers. Commonly used techniques include decision trees, Naïve Bayes (NB), and neural networks. Moreover, mining book data depends on feature selection, data pre-processing, and data preparation. This paper proposes knowledge representation optimization and feature selection solutions to enhance book classification, and identifies appropriate classification algorithms. Several experiments were conducted, and they show that NB can provide the best prediction results. By applying appropriate strategies for feature selection, data type selection, and data transformation, the accuracy and performance of NB can be improved so that it outperforms other classification algorithms.
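The paper's actual pipeline is not reproduced here, but as a rough illustration of the idea the abstract describes (a Naïve Bayes classifier whose accuracy is helped by feature selection and data transformation), a minimal sketch in Python with scikit-learn might look like the following. The library choice, the file name book_ratings.csv, and the column names are assumptions for illustration only, not the authors' setup.

```python
# Hypothetical sketch of the approach described in the abstract: Naive Bayes
# combined with data transformation and feature selection. This is not the
# authors' pipeline; the dataset file and column names are made up.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical dataset: numeric or already-encoded book features plus a rating class.
data = pd.read_csv("book_ratings.csv")
X = data.drop(columns=["rating_class"])
y = data["rating_class"]

pipeline = Pipeline([
    ("scale", MinMaxScaler()),            # data transformation: rescale features to [0, 1]
    ("select", SelectKBest(chi2, k=10)),  # feature selection: keep the 10 highest-scoring features
    ("nb", MultinomialNB()),              # Naive Bayes classifier
])

# Estimate accuracy with 10-fold cross-validation.
scores = cross_val_score(pipeline, X, y, cv=10, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f}")
```

Comparing the cross-validated accuracy of such a pipeline against runs that omit the scaling or selection steps is one way to gauge how much each strategy contributes.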



Acknowledgements

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant Number: 06/2018/TN.

Author information

Corresponding author

Correspondence to Thi Thanh Sang Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Nguyen, T.T.S., Do, P.M.T. Classification optimization for training a large dataset with Naïve Bayes. J Comb Optim 40, 141–169 (2020). https://doi.org/10.1007/s10878-020-00578-0

