Abstract
The paper describes research on ways of datasets discretization, when test datasets are used for evaluation of a classifier. Three different approaches of processing for training and test datasets are presented: “independent”—where discretization is performed separately for both sets assuming that the same algorithm parameters are used; “glued”—where both sets are concatenated, discretized, and resulting set is separated to obtain training and test sets, and finally “test on learn”—where test dataset is discretized using ranges obtained from learning data. All methods have been investigated and tested in authorship attribution domain using Naive Bayes classifier.
References
Baron, G.: Influence of data discretization on efficiency of Bayesian Classifier for authorship attribution. Procedia Comput. Sci. 35, 1112–1121 (2014)
Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning: Proceedings of the 12th International Conference, pp. 194–202. Morgan Kaufmann (1995)
Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1022–1029 (1993)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
Kim, S.B., Han, K.S., Rim, H.C., Myaeng, S.H.: Some effective techniques for Naive Bayes text classification. IEEE Trans. Knowl. Data Eng. 18(11), 1457–1466 (2006)
Kononenko, I.: On biases in estimating multi-valued attributes. In: 14th International Joint Conference on Articial Intelligence, pp. 1034–1040 (1995)
Kotsiantis, S.B.: Supervised machine learning: a review of classification techniques. In: Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth. HCI, Information Retrieval and Pervasive Technologies, pp. 3–24. IOS Press, Amsterdam, The Netherlands (2007)
Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: a recent survey. Int. Trans. Comput. Sci. Eng. 1(32), 47–58 (2006)
McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: AAAI-98 Workshop On Learning For Text Categorization, pp. 41–48. AAAI Press (1998)
Schneider, K.M.: Techniques for improving the performance of Naive Bayes for text classification. In: Proceedings of 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), pp. 682–693 (2005)
Stańczyk, U.: Rule-based approach to computational stylistics. In: Bouvry, P., Kłopotek, M., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) Security and Intelligent Information Systems, LNCS (LNAI), vol. 7053, pp. 168–179. Springer, Berlin (2012)
Stańczyk, U.: Ranking of characteristic features in combined wrapper approaches to selection. Neural Comput. Appl. 26(2), 329–344 (2015)
Youn, E., Jeong, M.K.: Class dependent feature scaling method using Naive Bayes classifier for text datamining. Pattern Recognit. Lett. 30(5), 477–485 (2009)
Acknowledgments
The research described was performed at the Silesian University of Technology, Gliwice, Poland, in the framework of the project BK/RAu2/2016. All experiments were performed using WEKA workbench [4].
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Baron, G., Harężlak, K. (2016). On Approaches to Discretization of Datasets Used for Evaluation of Decision Systems. In: Czarnowski, I., Caballero, A.M., Howlett, R.J., Jain, L.C. (eds) Intelligent Decision Technologies 2016. Smart Innovation, Systems and Technologies, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-39627-9_14
Download citation
DOI: https://doi.org/10.1007/978-3-319-39627-9_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39626-2
Online ISBN: 978-3-319-39627-9
eBook Packages: EngineeringEngineering (R0)