Recursive least square perceptron model for non-stationary and imbalanced data stream classification

  • Original Paper
Evolving Systems

Abstract

Classifying non-stationary and imbalanced data streams involves two important challenges: concept drift and class imbalance. Concept drift (or non-stationarity) refers to changes over time in the underlying function being learnt, and class imbalance refers to a large difference between the numbers of instances in the different classes of the data. Class imbalance reduces the effectiveness of most classifiers and is usually observed in two-class datasets. Previous methods for classifying non-stationary and imbalanced data streams focus mainly on batch solutions, in which the classification model is trained on a chunk of data. Here, we propose an online perceptron model. The main contribution is a new error model inspired by the error model of the recursive least squares (RLS) filter. In the proposed error model, non-stationarity is handled by the forgetting factor of the RLS error model, and two different error-weighting strategies are proposed for handling class imbalance. These strategies are verified using convergence and tracking results from adaptive filter theory. The proposed methods are evaluated on two synthetic and six real-world two-class datasets and compared with seven previous online perceptron models. The results show a statistically significant improvement over the previous methods.

Author information

Corresponding author

Correspondence to Adel Ghazikhani.

About this article

Cite this article

Ghazikhani, A., Monsefi, R. & Sadoghi Yazdi, H. Recursive least square perceptron model for non-stationary and imbalanced data stream classification. Evolving Systems 4, 119–131 (2013). https://doi.org/10.1007/s12530-013-9076-7

  • DOI: https://doi.org/10.1007/s12530-013-9076-7
