Semi-Supervised Classification of Data Streams by BIRCH Ensemble and Local Structure Mapping

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Many researchers have applied clustering to semi-supervised classification of data streams with concept drift. However, in these approaches the generalization ability for each specific concept cannot be steadily improved, and drift detection methods that ignore the local structural information of the data cannot detect concept drifts accurately. This paper proposes to solve these problems with a BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy computes the local similarity around each sample and is combined with a semi-supervised Bayesian method to perform concept detection. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected and incrementally updated; otherwise, a new BIRCH ensemble classifier is constructed and added to the classifier pool. Extensive experiments on several synthetic and real datasets demonstrate the advantage of the proposed algorithm.
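
The reuse-or-create logic of the classifier pool described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: a nearest-centroid model stands in for the BIRCH ensemble, and plain accuracy on the labeled samples of a chunk stands in for the detection step based on local structure mapping and the semi-supervised Bayesian method. The names `CentroidModel`, `agreement`, `process_chunk`, and the `threshold` value are all hypothetical, and each chunk is assumed to contain at least one labeled sample.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class CentroidModel:
    """Stand-in for a BIRCH ensemble: one running-mean centroid per class."""
    sums: dict = field(default_factory=dict)    # label -> per-feature sums
    counts: dict = field(default_factory=dict)  # label -> sample count

    def update(self, samples):
        """Incrementally absorb (features, label) pairs."""
        for x, y in samples:
            if y not in self.sums:
                self.sums[y] = [0.0] * len(x)
                self.counts[y] = 0
            self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
            self.counts[y] += 1

    def predict(self, x):
        """Return the label whose centroid is closest to x."""
        centroids = {y: [s / self.counts[y] for s in self.sums[y]]
                     for y in self.sums}
        return min(centroids, key=lambda y: dist(x, centroids[y]))


def agreement(model, labeled):
    """Hypothetical concept-similarity score: accuracy on the labeled samples."""
    return sum(model.predict(x) == y for x, y in labeled) / len(labeled)


def process_chunk(pool, labeled, threshold=0.8):
    """Reuse the best-matching historical model, or add a new one to the pool.

    Returns the index of the model used for this chunk.
    """
    if pool:
        scores = [agreement(m, labeled) for m in pool]
        best = max(range(len(pool)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pool[best].update(labeled)  # recurrent concept: incremental update
            return best
    model = CentroidModel()
    model.update(labeled)
    pool.append(model)  # new concept: extend the pool
    return len(pool) - 1
```

Calling `process_chunk` on successive chunks grows the pool only when no stored model agrees with the new labeled data, so a concept that reappears later is routed back to its earlier model instead of being relearned from scratch.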

Author information

Correspondence to Yi-Min Wen.

About this article

Cite this article

Wen, YM., Liu, S. Semi-Supervised Classification of Data Streams by BIRCH Ensemble and Local Structure Mapping. J. Comput. Sci. Technol. 35, 295–304 (2020). https://doi.org/10.1007/s11390-020-9999-y

  • DOI: https://doi.org/10.1007/s11390-020-9999-y
