On Approaches to Discretization of Datasets Used for Evaluation of Decision Systems

Baron, Grzegorz; Harężlak, Katarzyna

doi:10.1007/978-3-319-39627-9_14

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 57)

Abstract

The paper describes research on ways of discretizing datasets when test datasets are used to evaluate a classifier. Three approaches to processing the training and test datasets are presented: “independent”, where each set is discretized separately using the same algorithm parameters; “glued”, where both sets are concatenated and discretized together, and the result is then split back into training and test sets; and “test on learn”, where the test dataset is discretized using the ranges obtained from the learning data. All methods were investigated and tested in the authorship attribution domain using a Naive Bayes classifier.

References

1. Baron, G.: Influence of data discretization on efficiency of Bayesian Classifier for authorship attribution. Procedia Comput. Sci. 35, 1112–1121 (2014)
2. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning: Proceedings of the 12th International Conference, pp. 194–202. Morgan Kaufmann (1995)
3. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1022–1029 (1993)
4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
5. Kim, S.B., Han, K.S., Rim, H.C., Myaeng, S.H.: Some effective techniques for Naive Bayes text classification. IEEE Trans. Knowl. Data Eng. 18(11), 1457–1466 (2006)
6. Kononenko, I.: On biases in estimating multi-valued attributes. In: 14th International Joint Conference on Artificial Intelligence, pp. 1034–1040 (1995)
7. Kotsiantis, S.B.: Supervised machine learning: a review of classification techniques. In: Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies, pp. 3–24. IOS Press, Amsterdam, The Netherlands (2007)
8. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: a recent survey. Int. Trans. Comput. Sci. Eng. 1(32), 47–58 (2006)
9. McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: AAAI-98 Workshop on Learning for Text Categorization, pp. 41–48. AAAI Press (1998)
10. Schneider, K.M.: Techniques for improving the performance of Naive Bayes for text classification. In: Proceedings of 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), pp. 682–693 (2005)
11. Stańczyk, U.: Rule-based approach to computational stylistics. In: Bouvry, P., Kłopotek, M., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) Security and Intelligent Information Systems, LNCS (LNAI), vol. 7053, pp. 168–179. Springer, Berlin (2012)
12. Stańczyk, U.: Ranking of characteristic features in combined wrapper approaches to selection. Neural Comput. Appl. 26(2), 329–344 (2015)
13. Youn, E., Jeong, M.K.: Class dependent feature scaling method using Naive Bayes classifier for text datamining. Pattern Recognit. Lett. 30(5), 477–485 (2009)

Acknowledgments

The research described was performed at the Silesian University of Technology, Gliwice, Poland, in the framework of the project BK/RAu2/2016. All experiments were performed using the WEKA workbench [4].

Author information

Authors and Affiliations

Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Grzegorz Baron & Katarzyna Harężlak

Corresponding author

Correspondence to Katarzyna Harężlak.

Editor information

Editors and Affiliations

Gdynia Maritime University, Gdynia, Poland
Ireneusz Czarnowski
Artificial Intelligence Department, Universidad Politécnica de Madrid, Madrid, Spain
Alfonso Mateos Caballero
KES International, Shoreham-by-sea, UK
Robert J. Howlett
University of Canberra, Canberra, Australia
Lakhmi C. Jain

Cite this paper

Baron, G., Harężlak, K. (2016). On Approaches to Discretization of Datasets Used for Evaluation of Decision Systems. In: Czarnowski, I., Caballero, A.M., Howlett, R.J., Jain, L.C. (eds) Intelligent Decision Technologies 2016. Smart Innovation, Systems and Technologies, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-39627-9_14

DOI: https://doi.org/10.1007/978-3-319-39627-9_14
Published: 09 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39626-2
Online ISBN: 978-3-319-39627-9
eBook Packages: Engineering, Engineering (R0)
