Abstract
Classifying non-stationary and imbalanced data streams involves two important challenges: concept drift and class imbalance. Concept drift (or non-stationarity) refers to changes in the underlying function being learnt, and class imbalance is a large difference between the numbers of instances in the different classes. Class imbalance hinders the performance of most classifiers and is usually observed in two-class datasets. Previous methods for classifying non-stationary and imbalanced data streams mainly focus on batch solutions, in which the classification model is trained on a chunk of data. Here, we propose an online perceptron model. The main contribution is a new error model inspired by the error model of the recursive least squares (RLS) filter. In the proposed error model, non-stationarity is handled by the forgetting factor of the RLS error model, and two different error-weighting strategies are proposed for handling class imbalance. These strategies are verified using convergence and tracking results from adaptive filter theory. The proposed methods are evaluated on two synthetic and six real-world two-class datasets and compared with seven previous online perceptron models. The results show a statistically significant improvement over previous methods.
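The general idea can be illustrated with a minimal sketch of an online linear classifier trained by weighted RLS with exponential forgetting. This is not the method of the paper: the abstract does not specify the two error-weighting strategies, so the inverse-class-frequency weight `beta` used here is an assumption chosen purely for illustration, and the class name `WeightedRLSPerceptron` and the parameters `forgetting` and `delta` are hypothetical.

```python
import numpy as np


class WeightedRLSPerceptron:
    """Illustrative online linear classifier for labels in {-1, +1}.

    Minimises sum_i lam^(n-i) * beta_i * e_i^2 recursively, where lam is
    the RLS forgetting factor and beta_i is a per-sample error weight.
    """

    def __init__(self, n_features, forgetting=0.99, delta=1.0):
        self.lam = forgetting                      # forgetting factor, 0 < lam <= 1
        self.w = np.zeros(n_features + 1)          # weights plus bias term
        self.P = np.eye(n_features + 1) / delta    # inverse correlation matrix
        self.counts = {-1: 1.0, 1: 1.0}            # decayed class counts

    def _augment(self, x):
        # Append a constant 1 so the last weight acts as the bias.
        return np.append(np.asarray(x, dtype=float), 1.0)

    def predict(self, x):
        return 1 if self.w @ self._augment(x) >= 0.0 else -1

    def update(self, x, y):
        z = self._augment(x)

        # ASSUMPTION: weight each error by the inverse of a decayed
        # class frequency, so the rarer class receives a larger weight.
        for c in self.counts:
            self.counts[c] *= self.lam
        self.counts[y] += 1.0
        beta = sum(self.counts.values()) / (2.0 * self.counts[y])

        err = y - self.w @ z                       # a priori error
        Pz = self.P @ z
        k = Pz / (self.lam / beta + z @ Pz)        # weighted RLS gain vector
        self.w = self.w + k * err
        # P is symmetric, so outer(k, Pz) equals k z^T P.
        self.P = (self.P - np.outer(k, Pz)) / self.lam
        return err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = WeightedRLSPerceptron(n_features=2, forgetting=0.99)
    for _ in range(2000):
        x = rng.normal(size=2)
        y = 1 if x[0] + x[1] > 1.5 else -1         # minority positive class
        model.update(x, y)
    print("learned weights:", model.w)
```

A forgetting factor below 1 discounts old samples geometrically, which lets the weights track a drifting concept at the cost of noisier estimates. Setting `beta` to 1 for every sample recovers ordinary RLS with exponential forgetting.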
Cite this article
Ghazikhani, A., Monsefi, R. & Sadoghi Yazdi, H. Recursive least square perceptron model for non-stationary and imbalanced data stream classification. Evolving Systems 4, 119–131 (2013). https://doi.org/10.1007/s12530-013-9076-7
DOI: https://doi.org/10.1007/s12530-013-9076-7