Novelty detection in data streams

Artificial Intelligence Review

Abstract

In massive data analysis, data usually arrive in streams. In recent years, several studies have investigated novelty detection in these data streams. Different approaches have been proposed and validated in many application domains. A review of the main aspects of these studies can provide useful information to improve the performance of existing approaches, allow their adaptation to new applications and help to identify important new issues to be addressed in future studies. This article presents and analyses different aspects of novelty detection in data streams, such as the offline and online phases, the number of classes considered at each phase, the use of an ensemble versus a single classifier, supervised and unsupervised approaches for the learning task, the information used to update the decision model, forgetting mechanisms for outdated concepts, the treatment of concept drift, how to distinguish noise and outliers from novelty concepts, classification strategies for data with unknown labels, and how to deal with recurring classes. This article also describes several applications of novelty detection in data streams investigated in the literature and discusses important challenges and future research directions.
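
To make the aspects listed in the abstract concrete, the following is a minimal sketch of one possible offline/online scheme. It is not an algorithm taken from any of the surveyed works. It assumes numeric examples, one hypersphere per known class, a short-term memory for examples labeled unknown, and a simple age-based forgetting rule. The names `Cluster`, `fit_offline` and `run_online`, and the parameters `min_examples`, `max_age` and `cohesion`, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    label: str
    center: tuple
    radius: float


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def fit_offline(labeled, min_radius=0.5):
    """Offline phase: build one hypersphere per known class from labeled data."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    model = []
    for label, pts in by_class.items():
        c = centroid(pts)
        r = max(max(math.dist(c, p) for p in pts), min_radius)
        model.append(Cluster(label, c, r))
    return model


def run_online(model, stream, min_examples=3, max_age=20, cohesion=1.0):
    """Online phase: classify each example or mark it as unknown.

    Unknown examples wait in a short-term memory. A group of at least
    `min_examples` mutually close unknowns is promoted to a novelty
    pattern; isolated unknowns (possible noise or outliers) are
    forgotten after `max_age` time steps. `model` is extended in place.
    """
    buffer = []  # short-term memory of (timestamp, example)
    novelties = 0
    labels = []
    for t, x in enumerate(stream):
        # Forgetting mechanism: drop unknowns that are too old.
        buffer = [(ts, p) for ts, p in buffer if t - ts <= max_age]
        nearest = min(model, key=lambda c: math.dist(c.center, x))
        if math.dist(nearest.center, x) <= nearest.radius:
            labels.append(nearest.label)
            continue
        labels.append("unknown")
        buffer.append((t, x))
        # Cohesion check (assumed rule): buffered unknowns close to x.
        group = [p for _, p in buffer if math.dist(p, x) <= cohesion]
        if len(group) >= min_examples:
            novelties += 1
            c = centroid(group)
            r = max(math.dist(c, p) for p in group)
            model.append(Cluster(f"novelty-{novelties}", c, max(r, cohesion / 2)))
            buffer = [(ts, p) for ts, p in buffer if p not in group]
    return labels
```

In this sketch a single distant example stays in the short-term memory as a possible outlier, whereas several nearby unknown examples are grouped into a new cluster that later examples can be assigned to. Real approaches differ in how they represent the decision model, validate candidate clusters and handle concept drift.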

References

  • Aggarwal CC (2007) Data streams: models and algorithms. Springer, Berlin

  • Aggarwal CC (2013) Outlier analysis. Springer, Berlin

  • Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th conference on very large data bases, pp 81–92

  • Al-Khateeb T, Masud MM, Khan L, Aggarwal C, Han J, Thuraisingham B (2012a) Stream classification with recurring and novel class detection using class-based ensemble. In: Proceedings of the IEEE 12th international conference on data mining (ICDM ’12). IEEE Computer Society, Washington, DC, USA, pp 31–40

  • Al-Khateeb TM, Masud MM, Khan L, Thuraisingham B (2012b) Cloud guided stream classification using class-based ensemble. In: Proceedings of the 2012 IEEE 5th international conference on cloud computing (CLOUD’12). IEEE Computer Society, Washington, DC, USA, pp 694–701

  • Albertini MK, de Mello RF (2007) A self-organizing neural network for detecting novelties. In: Proceedings of the 2007 ACM symposium on applied computing (SAC ’07), pp 462–466

  • Aregui A, Denœux T (2007) Fusion of one-class classifiers in the belief function framework. In: Proceedings of the 10th international conference on information fusion, pp 1–8

  • Bicego M, Figueiredo MAT (2009) Soft clustering using weighted one-class support vector machines. Pattern Recognit 42(1):27–32

  • Box GEP, Jenkins G (1990) Time series analysis: forecasting and control. Holden-Day, Incorporated, San Francisco

  • Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):58

  • Coull S, Branch J, Szymanski B, Breimer E (2003) Intrusion detection: a bioinformatics approach. In: Proceedings of 19th international conference on computer security applications (ACSAC 2003). Nevada, USA, IEEE Computer Society, Las Vegas, pp 24–33

  • de Faria ER, Gonçalves IR, Gama J, Carvalho ACPLF (2015a) Evaluation of multiclass novelty detection algorithms for data streams. IEEE Trans Knowl Data Eng 27(11):2961–2973. doi:10.1109/TKDE.2015.2441713

  • de Faria ER, Carvalho ACPLF, Gama J (2015b) MINAS: multiclass learning algorithm for novelty detection in data streams. Data Min Knowl Discov. doi:10.1007/s10618-015-0433-y

  • Dawid AP (1984) Statistical theory: the prequential approach (with discussion). J R Stat Soc A 147:278–292

  • Denis F, Gilleron R, Letouzey F (2005) Learning from positive and unlabeled examples. Theor Comput Sci 348(1):70–83

  • Dries A, Rückert U (2009) Adaptive concept drift detection. Stat Anal Data Min 2(5–6):311–327

  • Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531

  • Faria ER, Gama J, Carvalho ACPLF (2013a) Novelty detection algorithm for data streams multi-class problems. In: Proceedings of the 28th symposium on applied computing (ACM SAC’13), pp 795–800

  • Faria ER, Gonçalves IR, Gama J, Carvalho ACPLF (2013b) Evaluation methodology for multiclass novelty detection algorithms. In: Proceedings of the 2nd Brazilian conference on intelligent systems (BRACIS’13), pp 19–25

  • Farid DM, Rahman CM (2012) Novel class detection in concept-drifting data stream mining employing decision tree. In: Proceedings of the 7th international conference on electrical computer engineering (ICECE’ 2012), pp 630–633

  • Farid DM, Zhang L, Hossain A, Rahman CM, Strachan R, Sexton G, Dahal K (2013) An adaptive ensemble classifier for mining concept drifting data streams. Expert Syst Appl 40(15):5895–5906

  • Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Gaber MM, Zaslavsky A, Krishnaswamy S (2005) Mining data streams: a review. SIGMOD Rec 34(2):18–26

  • Gama J (2010) Knowledge discovery from data streams, 1st edn. CRC Press Chapman Hall, Boca Raton

  • Gama J, Sebastião R, Rodrigues PP (2013) On evaluating stream learning algorithms. Mach Learn 90(3):317–346

  • Gaughan G, Smeaton AF (2005) Finding new news: novelty detection in broadcast news. In: Proceedings of the 2nd Asia conference on Asia information retrieval technology (AIRS’05), pp 583–588

  • Gogoi P, Bhattacharyya D, Borah B, Kalita JK (2011) A survey of outlier detection methods in network anomaly identification. Comput J 54(4):570–588

  • Han J (2005) Data mining: concepts and techniques. Morgan Kaufmann Publishers Inc., San Francisco

  • Hayat M, Basiri J, Seyedhossein L, Shakery A (2010) Content-based concept drift detection for email spam filtering. In: Proceedings of the 5th international symposium on telecommunications (IST’10), pp 531–536

  • Hayat MZ, Hashemi MR (2010) A DCT based approach for detecting novelty and concept drift in data streams. In: Proceedings of the international conference on soft computing and pattern recognition (SoCPaR), pp 373–378

  • Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126

  • Hoffmann H (2007) Kernel PCA for novelty detection. Pattern Recognit 40(3):863–874

  • Juszczak P, Duin RPW (2004) Combining one-class classifiers to classify missing data. In: Roli F, Kittler J, Windeatt T (eds) Multiple classifier systems. Springer, Berlin, pp 92–101

  • Katakis I, Tsoumakas G, Vlahavas I (2010) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22(3):371–391

  • Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790

  • Krawczyk B, Michal W (2013) Incremental learning and forgetting in one-class classifiers for data streams. In: Proceedings of the 8th international conference on computer recognition systems (CORES’ 13), advances in intelligent systems and computing, vol 226, pp 319–328

  • Lee H, Roberts S (2008) On-line novelty detection using the Kalman filter and extreme value theory. In: Proceedings of 19th international conference on pattern recognition (ICPR 2008). Tampa, Florida, USA, IEEE, pp 1–4

  • Li X (2006) Improving novelty detection for general topics using sentence level information patterns. In: Proceedings of the 15th ACM international conference on information and knowledge management (CIKM ’06), ACM, pp 238–247

  • Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM’03), pp 179–186

  • Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137

  • MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Le Cam LM, Neyman J (eds) 5th Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297

  • Markou M, Singh S (2003a) Novelty detection: a review part 1: statistical approaches. Signal Process 83(12):2481–2497

  • Markou M, Singh S (2003b) Novelty detection: a review part 2: neural network based approaches. Signal Process 83(12):2499–2521

  • Marrocco C, Simeone P, Tortorella F (2007) A framework for multiclass reject in ECOC classification systems. In: Proceedings of the 15th Scandinavian conference on image analysis (SCIA’07), pp 313–323

  • Marsland S (2003) Novelty detection in learning systems. Neural Comput Surv 3:157–195

  • Marsland S, Shapiro J, Nehmzow U (2002) A self-organising network that grows when required. Neural Netw 15:1041–1058

  • Masud M, Gao J, Khan L, Han J, Thuraisingham BM (2011a) Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Trans Knowl Data Eng 23(6):859–874

  • Masud MM, Chen Q, Khan L, Aggarwal CC, Gao J, Han J, Thuraisingham BM (2010a) Addressing concept-evolution in concept-drifting data streams. In: Proceedings of the 10th IEEE international conference on data mining (ICDM’10), pp 929–934

  • Masud MM, Gao J, Khan L, Han J, Thuraisingham B (2010b) Classification and novel class detection in data streams with active mining. In: Proceedings of the 14th Pacific-Asia conference on advances in knowledge discovery and data mining—volume Part II (PAKDD’10), pp 311–324

  • Masud MM, Al-Khateeb TM, Khan L, Aggarwal C, Gao J, Han J, Thuraisingham B (2011b) Detecting recurring and novel classes in concept-drifting data streams. In: Proceedings of the 11th IEEE international conference on data mining (ICDM ’11), pp 1176–1181

  • Masud MM, Woolam C, Gao J, Khan L, Han J, Hamlen KW, Oza NC (2011c) Facing the reality of data stream classification: coping with scarcity of labeled data. Knowl Inf Syst 33(1):213–244

  • Menahem E, Rokach L, Elovici Y (2013) Combining one-class classifiers via meta-learning. In: ACM international conference on information and knowledge management (CIKM 2013), to appear

  • Minegishi T, Niimi A (2011) Detection of fraud use of credit card by extended VFDT. In: World congress on internet security (WorldCIS’11), pp 152–159

  • Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Inc, New York

  • Nadeem MSA, Zucker JD, Hanczar B (2010) Accuracy-rejection curves (ARCs) for comparing classification methods with a reject option. In: Workshop and conference proceedings on machine learning in systems biology, vol 8, pp 65–81

  • Park CH, Shim H (2010) Detection of an emerging new class using statistical hypothesis testing and density estimation. Int J Pattern Recognit Artif Intell 24(1):1–14

  • Perdisci R, Gu G, Lee W (2006) Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems. In: Proceedings of the 6th international conference on data mining (ICDM ’06), pp 488–498

  • Perner P (2008) Concepts for novelty detection and handling based on a case-based reasoning process scheme. Eng Appl Artif Intell 22:86–91

  • Pillai I, Fumera G, Roli F (2011) A classification approach with a reject option for multi-label problems. In: Proceedings of the 16th international conference on image analysis and processing: Part I (ICIAP’11), pp 98–107

  • Pimentel MA, Clifton DA, Clifton L, Tarassenko L (2014) A review of novelty detection. Signal Process 99:215–249

  • Ramezani R, Angelov P, Zhou X (2008) A fast approach to novelty detection in video streams using recursive density estimation. In: Proceedings of the 4th international IEEE conference on intelligent systems (IS ’08), vol 2, pp 14–2–14–7

  • Rios G, Filho RH, Coelho ALC (2011) An autonomic security mechanism based on novelty detection and concept drift. In: Proceedings of the 7th international conference on autonomic and autonomous systems

  • Rusiecki A (2012) Robust neural network for novelty detection on data streams. In: Proceedings of the 11th international conference on artificial intelligence and soft computing—volume Part I (ICAISC’12), pp 178–186

  • Schölkopf B, Williamson R, Smola A, Taylor JS, Platt J (2000) Support vector method for novelty detection. Adv Neural Inf Process Syst 12:582–588

  • Schölkopf B, Platt JC, Shawe-Taylor JC, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471

  • Shyu ML, Sarinnapakorn K, Kuruppu-Appuhamilage I, Chen SC, Chang L, Goldring T (2005) Handling nominal features in anomaly intrusion detection problems. In: Proceedings of the 15th international workshop on research issues in data engineering: stream data mining and applications (RIDE ’05), pp 55–62

  • Silva JA, Faria ER, Barros RC, Hruschka ER, Carvalho ACPLF, Gama J (2014) Data stream clustering: a survey. ACM Comput Surv 46(1):31

  • Singh S, Markou M (2005) A black hole novelty detector for video analysis. Pattern Anal Appl 8(1):102–114

  • Singh S, Markou M (2004) An approach to novelty detection applied to the classification of image regions. IEEE Trans Knowl Data Eng 16(4):396–407

  • Spinosa EJ, Carvalho ACPLF (2004) SVMs for novel class detection in bioinformatics. In: Proceedings of III Brazilian workshop on bioinformatics (WOB 2004), Brasília, pp 81–88

  • Spinosa EJ, Carvalho ACPLF, Gama J (2008) Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In: Proceedings of the 2008 ACM symposium on applied computing (SAC ’08), ACM, pp 976–980

  • Spinosa EJ, Carvalho ACPLF, Gama J (2009) Novelty detection with application to data streams. Intell Data Anal 13(3):405–422

  • Srivastava A (2006) Enabling the discovery of recurring anomalies in aerospace problem reports using high-dimensional clustering techniques. In: IEEE Aerospace conference

  • Tan SC, Ting KM, Liu TF (2011) Fast anomaly detection for streaming data. In: Proceedings of the 22nd international joint conference on artificial intelligence, volume 2 (IJCAI’11), pp 1511–1516

  • Tavakkoli A, Nicolescu M, Bebis G (2006) A novelty detection approach for foreground region detection in videos with quasi-stationary backgrounds. In: Proceedings of the 2nd international symposium on visual computing

  • Tavallaee M, Bagheri E, Lu W, Ghorbani A (2009) A detailed analysis of the KDD Cup 99 data set. In: IEEE symposium on computational intelligence for security and defense applications, 2009. CISDA 2009, pp 1–6

  • Tax DMJ, Duin RPW (2001) Combining one-class classifiers. In: Proceedings of the 2nd international workshop on multiple classifier systems (MCS ’01), pp 299–308

  • Tax DMJ, Duin RPW (2008) Growing a multi-class classifier with a reject option. Pattern Recognit Lett 29(10):1565–1570

  • Ting KM, Tan SC, Liu FT (2009) Mass: a new ranking measure for anomaly detection. In: Technical report fa2386-09-1-4014, Gippsland School of Information Technology, Monash University

  • Tsymbal A (2004) The problem of concept drift: definitions and related work. In: Technical report TCD-CS-2004-15, Computer Science Department, Trinity College, Dublin

  • Vapnik VN (1998) Statistical learning theory, 1st edn. Wiley, New York

  • Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), pp 226–235

  • Wang W, Guan X, Zhang X (2008) Processing of massive audit data streams for real-time anomaly intrusion detection. Comput Commun 31(1):58–72

  • Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101

  • Yang Y, Zhang J, Carbonell J, Jin C (2002) Topic-conditioned novelty detection. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’02), pp 688–693

  • Yeung D, Chow C (2002) Parzen-window network intrusion detectors. In: Proceedings of the 16th international conference on pattern recognition, pp 385–388

  • Yeung D, Ding Y (2003) Host-based intrusion detection using dynamic and static behavioral models. Pattern Recognit 36:229–243

  • Zhang J, Yan Q, Zhang Y, Huang Z (2006) Novel fault class detection based on novelty detection methods. In: Intelligent computing in signal processing and pattern recognition. Lecture notes in control and information sciences, vol 345. Springer, Berlin, pp 982–987

Acknowledgments

Thanks to the European Commission through project MAESTRA (ICT-2013-612944), ERDF through the COMPETE Programme, National Funds through FCT within the project FCOMP-01-0124-FEDER-022701, and CAPES, CNPq and FAPESP, Brazilian funding agencies.

Author information

Corresponding author

Correspondence to Elaine R. Faria.

Appendix

Table 4 Public data sets used in ND in DS

Table 4 lists the principal public online data sets used in the works cited in this survey. They may be downloaded from the following repositories and sites:

  • UCI Machine Learning Repository - a large collection of data sets that may be used in different kinds of machine learning tasks, such as clustering, classification and pattern recognition, covering a wide variety of application areas. Available at http://archive.ics.uci.edu/ml.

  • KDD Cup Center - the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining. Available at http://www.kdd.org/kddcup/index.php.

  • The NSL-KDD Data Set - a selected collection of records from the KDD Cup 99 data set, whose purpose is to overcome some of the problems of KDD Cup 99 (Tavallaee et al. 2009). Separate sets for training and testing are also available on this site. Available at http://nsl.cs.unb.ca/NSL-KDD/.

  • Data Mining Tools Repository - tools developed at the UTD data mining lab, headed by Dr. Latifur Khan. Each tool is part of a project, from which it is possible to download the related published papers, the data sets used in the publications and the source code of some of the authors' algorithms. Available at http://dml.utdallas.edu/Mehedy/.

  • MOA - an open source framework for DS mining. It includes synthetic data generators for classification and clustering tasks.
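
As a rough illustration of what a synthetic generator for novelty detection experiments can look like, the sketch below yields a two-dimensional labeled stream in which an extra class starts to appear only after a given time step. It is a plain Python stand-in written for this illustration and does not use or reproduce the MOA API. The function name `synthetic_stream`, its parameters and the label "novel" are assumptions.

```python
import random


def synthetic_stream(n, centers, novel_center, novel_start, spread=0.5, seed=0):
    """Yield n pairs (point, label) drawn from Gaussian blobs.

    `centers` maps each known class label to its 2-D center. From time
    step `novel_start` onwards an additional class labeled "novel",
    centered at `novel_center`, can also be drawn.
    """
    rng = random.Random(seed)  # fixed seed keeps the stream reproducible
    for t in range(n):
        classes = dict(centers)
        if t >= novel_start:
            classes["novel"] = novel_center
        label = rng.choice(sorted(classes))
        cx, cy = classes[label]
        yield (rng.gauss(cx, spread), rng.gauss(cy, spread)), label
```

The labels are emitted only so that a detector can be evaluated afterwards. A novelty detection algorithm under test would receive the points without them.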

About this article

Cite this article

Faria, E.R., Gonçalves, I.J.C.R., de Carvalho, A.C.P.L.F. et al. Novelty detection in data streams. Artif Intell Rev 45, 235–269 (2016). https://doi.org/10.1007/s10462-015-9444-8

  • DOI: https://doi.org/10.1007/s10462-015-9444-8