Minimal weighted infrequent itemset mining-based outlier detection approach on uncertain data stream

  • Multi-Source Data Understanding (MSDU)
Neural Computing and Applications

Abstract

Outliers degrade the accuracy of data-based prediction and other data-driven processing, so they must be detected promptly and effectively to improve the credibility of the data. In recent years, numerous outlier detection approaches have been proposed for static and precise data; however, this prior work did not consider the uncertainty and weight information of each item. Moreover, traditional outlier detection approaches use only the deviation degree of each data element as the criterion for determining outliers; as a result, the detected outliers do not fit the definition of an outlier (i.e., data that appear rarely and differ from most of the other data). To address these problems, this paper proposes a minimal weighted infrequent itemset mining-based outlier detection approach for uncertain data streams, called MWIFIM–OD–UDS. It detects implicit outliers by taking into account the low occurrence frequency, uncertainty and weight of itemsets, together with the characteristics of the data stream. Specifically, a matrix structure-based approach called MWIFIM–UDS is proposed to mine the minimal weighted infrequent itemsets (MWiFIs) from an uncertain data stream; the MWIFIM–OD–UDS method then detects outliers based on the mined MWiFIs and the designed deviation indexes. Experimental results show that the proposed MWIFIM–OD–UDS method outperforms the frequent itemset mining-based outlier detection methods FindFPOF and LFP in terms of runtime and detection accuracy.
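
To make the notion of a minimal weighted infrequent itemset concrete, the following is a minimal brute-force sketch in Python. It is not the matrix structure-based MWIFIM–UDS method, whose details are not given in the abstract. It assumes one common definition from the weighted uncertain itemset mining literature: the expected weighted support of an itemset is the mean weight of its items multiplied by the sum, over transactions, of the product of the items' existential probabilities. An itemset is treated as infrequent when this value falls below a threshold, and as minimal when every non-empty proper subset is frequent. The function names, the sample window, the weights and the threshold are all hypothetical.

```python
from itertools import combinations
from math import prod


def expected_weighted_support(itemset, transactions, weights):
    """Assumed definition (not taken from the paper): mean item weight times
    the summed probability that all items co-occur in each transaction."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    expected = sum(
        prod(t[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )
    return mean_weight * expected


def minimal_weighted_infrequent_itemsets(transactions, weights, min_support):
    """Enumerate all itemsets and keep the infrequent ones whose non-empty
    proper subsets are all frequent. Exponential in the number of items."""
    items = sorted({i for t in transactions for i in t})

    def infrequent(itemset):
        return expected_weighted_support(itemset, transactions, weights) < min_support

    result = []
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            if not infrequent(candidate):
                continue
            # Weighted support is not anti-monotone in general, so every
            # proper subset is checked explicitly rather than pruned.
            if all(
                not infrequent(sub)
                for r in range(1, k)
                for sub in combinations(candidate, r)
            ):
                result.append(candidate)
    return result


# Hypothetical window of an uncertain stream: item -> existential probability.
window = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.8, "b": 0.9, "c": 0.5},
    {"a": 1.0, "c": 0.4},
]
weights = {"a": 1.0, "b": 0.5, "c": 0.8}

print(minimal_weighted_infrequent_itemsets(window, weights, min_support=0.7))
# prints [('b', 'c')]
```

In this toy window every single item is frequent, while the pair (b, c) co-occurs only once with low probability, so it is the only minimal weighted infrequent itemset; (a, b, c) is also infrequent but is not minimal because it contains (b, c). A transaction containing such an itemset, here the second one, would be a candidate for further scoring by deviation indexes such as those the paper designs.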

Acknowledgements

This work was supported in part by the Chinese Universities Scientific Fund under grant number 2017XD001 and the Fundamental Research Funds for the Central Universities under grant number 2018XD004.

Author information

Corresponding author

Correspondence to Ruizhi Sun.

About this article

Cite this article

Cai, S., Sun, R., Hao, S. et al. Minimal weighted infrequent itemset mining-based outlier detection approach on uncertain data stream. Neural Comput & Applic 32, 6619–6639 (2020). https://doi.org/10.1007/s00521-018-3876-4

  • DOI: https://doi.org/10.1007/s00521-018-3876-4
