Hybrid Splitting Criterion in Decision Trees for Data Stream Mining

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9693)

Abstract

This paper analyzes the splitting criteria used in decision tree induction algorithms designed for data streams. A hybrid splitting criterion is proposed that combines two criteria, each established for a different split measure function: the Gini gain and a split measure based on the misclassification error. The hybrid criterion exhibits the advantages of both of its components. The online decision tree with the hybrid criterion demonstrates higher classification accuracy than online decision trees that use either single criterion alone.
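To make the two split measure functions named in the abstract concrete, the following is a minimal sketch using the common textbook definitions of Gini impurity and misclassification error for a binary split. The function names (`gini_impurity`, `misclassification_error`, `split_gain`) and the toy labels are assumptions made for illustration. The paper's actual splitting criteria, and the way the hybrid criterion combines them, are not given in this abstract and are not reproduced here.

```python
from collections import Counter
from typing import Callable, Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Textbook Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def misclassification_error(labels: Sequence[int]) -> float:
    """Textbook misclassification error: 1 minus the majority-class fraction."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - max(Counter(labels).values()) / n


def split_gain(
    impurity: Callable[[Sequence[int]], float],
    parent: Sequence[int],
    left: Sequence[int],
    right: Sequence[int],
) -> float:
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical toy node: 6 samples of class 0 and 2 samples of class 1.
parent = [0, 0, 0, 0, 0, 0, 1, 1]
left = [0, 0, 0, 0]   # pure child
right = [0, 0, 1, 1]  # evenly mixed child

gini_gain = split_gain(gini_impurity, parent, left, right)
error_gain = split_gain(misclassification_error, parent, left, right)
print(f"Gini gain: {gini_gain:.3f}")
print(f"Misclassification-error gain: {error_gain:.3f}")
```

On this toy split the Gini gain is positive while the misclassification-error gain is zero, which shows that the two measures can rank the same candidate split differently.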

M. Pawlak carried out this research at USS during his sabbatical leave from the University of Manitoba.

Acknowledgments

This work was supported by the Polish National Science Center under Grant No. 2014/13/N/ST6/01848.

Author information

Corresponding author

Correspondence to Maciej Jaworski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jaworski, M., Rutkowski, L., Pawlak, M. (2016). Hybrid Splitting Criterion in Decision Trees for Data Stream Mining. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_6

  • DOI: https://doi.org/10.1007/978-3-319-39384-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39383-4

  • Online ISBN: 978-3-319-39384-1

  • eBook Packages: Computer Science (R0)
