Abstract
This paper analyzes splitting criteria used in decision tree induction algorithms designed for data streams. A hybrid splitting criterion is proposed that combines two criteria established for two different split measure functions: the Gini gain and the split measure based on the misclassification error. The hybrid splitting criterion retains the advantages of both of its components. The online decision tree with the hybrid criterion achieves higher classification accuracy than online decision trees using either single criterion alone.
M. Pawlak carried out this research at USS during his sabbatical leave from University of Manitoba.
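The two split measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook definitions of Gini impurity and misclassification error computed from class counts. The function names, the thresholds `eps_gini` and `eps_error`, and the "split if either criterion is met" rule are hypothetical placeholders chosen for illustration; the abstract does not state the statistical bounds or the exact way the two criteria are combined.

```python
from typing import Callable, Sequence

Counts = Sequence[int]


def gini_impurity(counts: Counts) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a class-count vector."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def misclassification_error(counts: Counts) -> float:
    """Misclassification error 1 - max_k p_k of a class-count vector."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - max(counts) / n


def split_measure(
    parent: Counts,
    children: Sequence[Counts],
    impurity: Callable[[Counts], float],
) -> float:
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


def hybrid_should_split(
    gini_gap: float,
    error_gap: float,
    eps_gini: float,
    eps_error: float,
) -> bool:
    """Assumed hybrid rule: split as soon as either single criterion is met.

    Each gap is the difference between the best and second-best attribute
    under one split measure; eps_gini and eps_error are placeholder
    thresholds, not the bounds derived in the paper.
    """
    return gini_gap > eps_gini or error_gap > eps_error
```

For a node with class counts `[40, 10]` split into children `[30, 0]` and `[10, 10]`, `split_measure` returns about 0.12 with `gini_impurity` but about 0 with `misclassification_error`, so the two measures can rank the same split differently. That difference is the motivation for combining them.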
Acknowledgments
This work was supported by the Polish National Science Center under Grant No. 2014/13/N/ST6/01848.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jaworski, M., Rutkowski, L., Pawlak, M. (2016). Hybrid Splitting Criterion in Decision Trees for Data Stream Mining. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_6
DOI: https://doi.org/10.1007/978-3-319-39384-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39383-4
Online ISBN: 978-3-319-39384-1
eBook Packages: Computer Science (R0)