On Approaches to Discretization of Datasets Used for Evaluation of Decision Systems

  • Conference paper
Intelligent Decision Technologies 2016

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 57)

Abstract

This paper examines how datasets should be discretized when a separate test dataset is used to evaluate a classifier. Three approaches to processing the training and test datasets are presented: “independent”, where each set is discretized separately using the same algorithm parameters; “glued”, where both sets are concatenated, discretized together, and then split back into training and test sets; and “test on learn”, where the test dataset is discretized using the ranges obtained from the learning data. All three approaches were investigated and tested in the authorship attribution domain using a Naive Bayes classifier.

References

  1. Baron, G.: Influence of data discretization on efficiency of Bayesian Classifier for authorship attribution. Procedia Comput. Sci. 35, 1112–1121 (2014)

  2. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning: Proceedings of the 12th International Conference, pp. 194–202. Morgan Kaufmann (1995)

  3. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1022–1029 (1993)

  4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)

  5. Kim, S.B., Han, K.S., Rim, H.C., Myaeng, S.H.: Some effective techniques for Naive Bayes text classification. IEEE Trans. Knowl. Data Eng. 18(11), 1457–1466 (2006)

  6. Kononenko, I.: On biases in estimating multi-valued attributes. In: 14th International Joint Conference on Artificial Intelligence, pp. 1034–1040 (1995)

  7. Kotsiantis, S.B.: Supervised machine learning: a review of classification techniques. In: Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies, pp. 3–24. IOS Press, Amsterdam, The Netherlands (2007)

  8. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: a recent survey. Int. Trans. Comput. Sci. Eng. 1(32), 47–58 (2006)

  9. McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: AAAI-98 Workshop On Learning For Text Categorization, pp. 41–48. AAAI Press (1998)

  10. Schneider, K.M.: Techniques for improving the performance of Naive Bayes for text classification. In: Proceedings of 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), pp. 682–693 (2005)

  11. Stańczyk, U.: Rule-based approach to computational stylistics. In: Bouvry, P., Kłopotek, M., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) Security and Intelligent Information Systems, LNCS (LNAI), vol. 7053, pp. 168–179. Springer, Berlin (2012)

  12. Stańczyk, U.: Ranking of characteristic features in combined wrapper approaches to selection. Neural Comput. Appl. 26(2), 329–344 (2015)

  13. Youn, E., Jeong, M.K.: Class dependent feature scaling method using Naive Bayes classifier for text datamining. Pattern Recognit. Lett. 30(5), 477–485 (2009)

Acknowledgments

The research described was performed at the Silesian University of Technology, Gliwice, Poland, in the framework of the project BK/RAu2/2016. All experiments were performed using the WEKA workbench [4].

Corresponding author

Correspondence to Katarzyna Harężlak.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Baron, G., Harężlak, K. (2016). On Approaches to Discretization of Datasets Used for Evaluation of Decision Systems. In: Czarnowski, I., Caballero, A.M., Howlett, R.J., Jain, L.C. (eds) Intelligent Decision Technologies 2016. Smart Innovation, Systems and Technologies, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-39627-9_14

  • DOI: https://doi.org/10.1007/978-3-319-39627-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39626-2

  • Online ISBN: 978-3-319-39627-9

  • eBook Packages: Engineering, Engineering (R0)
