Abstract
Book classification is widely used in digital libraries, and predicting book ratings helps libraries serve their readers better. Commonly used techniques include decision trees, Naïve Bayes (NB), and neural networks. The quality of mined book data also depends on feature selection, data pre-processing, and data preparation. This paper proposes knowledge-representation optimization and feature selection to improve book classification, and identifies suitable classification algorithms. Several experiments were conducted, and NB was found to give the best prediction results. With appropriate strategies for feature selection, data-type selection, and data transformation, the accuracy and performance of NB can be improved so that it outperforms other classification algorithms.
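As a minimal sketch of the kind of pipeline the abstract describes, the following combines a toy feature-selection step with a categorical Naïve Bayes classifier using Laplace smoothing. The feature names and book data below are invented for illustration only; they are not the paper's dataset, and the feature selection shown (dropping constant attributes) is far simpler than the strategies the paper evaluates.

```python
# Illustrative sketch: categorical Naive Bayes with Laplace smoothing
# plus a toy feature-selection step. All data here is hypothetical.
from collections import Counter, defaultdict
import math

def select_features(rows):
    """Toy feature selection: drop attributes that are constant across rows."""
    keep = {f for f in rows[0] if len({r[f] for r in rows}) > 1}
    return [{f: r[f] for f in keep} for r in rows]

def train_nb(rows, labels, alpha=1.0):
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    for row, y in zip(rows, labels):
        for f, v in row.items():
            value_counts[(y, f)][v] += 1
    return class_counts, value_counts, alpha

def predict(model, row):
    class_counts, value_counts, alpha = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y, n in class_counts.items():
        lp = math.log(n / total)          # log prior P(class)
        for f, v in row.items():
            vocab = set()
            for cls in class_counts:      # distinct observed values of f
                vocab |= set(value_counts[(cls, f)])
            c = value_counts[(y, f)][v]
            # smoothed log likelihood P(value | class)
            lp += math.log((c + alpha) / (n + alpha * len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

books = [
    {"length": "long", "topic": "magic", "era": "modern"},
    {"length": "long", "topic": "magic", "era": "modern"},
    {"length": "short", "topic": "data", "era": "modern"},
    {"length": "short", "topic": "data", "era": "modern"},
]
genres = ["fiction", "fiction", "tech", "tech"]

rows = select_features(books)  # "era" is constant, so it is dropped
model = train_nb(rows, genres)
guess = predict(model, {"length": "long", "topic": "magic"})  # -> "fiction"
```

Dropping uninformative attributes before training is the simplest form of the idea the paper develops: with fewer, better-chosen features, NB's independence assumption does less damage and its estimates are built from denser counts.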
Acknowledgements
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant Number: 06/2018/TN.
Cite this article
Nguyen, T.T.S., Do, P.M.T. Classification optimization for training a large dataset with Naïve Bayes. J Comb Optim 40, 141–169 (2020). https://doi.org/10.1007/s10878-020-00578-0