Abstract
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data poses unique challenges for data mining algorithms: concept drift, the need to analyse data on the fly because streams are unbounded, and the need for scalable algorithms because data throughput can be high. Fast real-time classification algorithms that adapt to concept drift exist; however, most are not naturally parallel and are therefore limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier, which is built on an adaptive statistical data summary in the form of Micro-Clusters. MC-NN is fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.
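The abstract does not spell out which statistics a Micro-Cluster stores or how it is updated, so the following is only a minimal sketch of the general idea of nearest-neighbour classification over incremental per-class summaries. It is not the MC-NN algorithm itself. The names `MicroCluster` and `MicroClusterNN`, the choice of a count plus a linear sum as the summary, and the rule of absorbing each instance into the nearest summary of its own class are all assumptions made for illustration. The drift-adaptation mechanism mentioned in the abstract is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Incremental summary of instances of one class (hypothetical layout)."""
    label: str
    n: int = 0
    linear_sum: list = field(default_factory=list)

    def absorb(self, x):
        # Update the summary in constant time; the raw instance is not stored.
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(x)
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]
        self.n += 1

    def centroid(self):
        return [s / self.n for s in self.linear_sum]


class MicroClusterNN:
    """Nearest-centroid classifier over micro-clusters (illustrative only)."""

    def __init__(self):
        self.clusters = []

    def _nearest(self, x, label=None):
        # Optionally restrict the search to summaries of a single class.
        candidates = [c for c in self.clusters
                      if label is None or c.label == label]
        if not candidates:
            return None
        return min(candidates, key=lambda c: math.dist(c.centroid(), x))

    def predict(self, x):
        nearest = self._nearest(x)
        return nearest.label if nearest else None

    def learn(self, x, label):
        # Assumed update rule: absorb into the nearest same-class summary,
        # creating one the first time a class is seen.
        target = self._nearest(x, label)
        if target is None:
            target = MicroCluster(label)
            self.clusters.append(target)
        target.absorb(x)


model = MicroClusterNN()
stream = [([0.0, 0.0], "a"), ([0.2, 0.1], "a"),
          ([5.0, 5.0], "b"), ([5.2, 4.9], "b")]
for x, y in stream:
    model.learn(x, y)
print(model.predict([0.1, 0.3]))  # prints a
print(model.predict([4.8, 5.1]))  # prints b
```

Because each summary is updated independently and prediction is a distance comparison against centroids, the work in this sketch could in principle be partitioned across workers, which is the kind of parallel property the abstract attributes to the base KNN classifier.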
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Tennant, M., Stahl, F., Gomes, J.B. (2015). Fast Adaptive Real-Time Classification for Data Streams with Concept Drift. In: Di Fatta, G., Fortino, G., Li, W., Pathan, M., Stahl, F., Guerrieri, A. (eds) Internet and Distributed Computing Systems. IDCS 2015. Lecture Notes in Computer Science(), vol 9258. Springer, Cham. https://doi.org/10.1007/978-3-319-23237-9_23
DOI: https://doi.org/10.1007/978-3-319-23237-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23236-2
Online ISBN: 978-3-319-23237-9
eBook Packages: Computer Science, Computer Science (R0)