Abstract
Outliers are a critical factor affecting the accuracy of data-based predictions and other data-driven processing; thus, they must be detected effectively and as early as possible to improve the credibility of the data. In recent years, a large number of outlier detection approaches have been proposed for static and precise data; however, the uncertainty and weight information of each item were not considered in this prior work. Moreover, traditional outlier detection approaches take only the deviation degree of each data element as the standard for determining outliers; therefore, the detected outliers do not fit the definition of an outlier (i.e., rarely appearing and different from most of the other data). To address these problems, this paper proposes a minimal weighted infrequent itemset mining-based outlier detection approach for uncertain data streams, called MWIFIM–OD–UDS. The approach is designed to effectively detect implicit outliers by taking into account the low occurrence frequency, the uncertainty, and the weight of each itemset, while also considering the characteristics of the data stream. In particular, a matrix structure-based approach called MWIFIM–UDS is proposed to mine the minimal weighted infrequent itemsets (MWiFIs) from an uncertain data stream; the MWIFIM–OD–UDS method is then built on the mined MWiFIs and a set of designed deviation indexes. Experimental results show that the proposed MWIFIM–OD–UDS method outperforms the frequent itemset mining-based outlier detection methods FindFPOF and LFP in terms of runtime and detection accuracy.
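To make the notion of a minimal weighted infrequent itemset more concrete, the following is a minimal brute-force sketch and not the matrix-based MWIFIM–UDS algorithm of the paper. Everything in it is an assumption made for illustration: the abstract does not give the definitions, so the sketch assumes that each transaction maps items to existential probabilities, that the weighted expected support of an itemset is its mean item weight multiplied by the sum, over the window, of the product of its item probabilities, and that an itemset is "minimal" when it is infrequent and none of its proper subsets is. The names `weighted_expected_support`, `minimal_weighted_infrequent_itemsets`, `min_sup`, and `max_len` are hypothetical.

```python
from itertools import combinations
from math import prod


def weighted_expected_support(itemset, window, weights):
    """Assumed measure: mean item weight times expected support in the window."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    # An item absent from a transaction has probability 0, so the product vanishes.
    expected = sum(prod(t.get(i, 0.0) for i in itemset) for t in window)
    return mean_weight * expected


def minimal_weighted_infrequent_itemsets(window, weights, min_sup, max_len=3):
    """Enumerate itemsets by size; keep infrequent ones with no infrequent proper subset."""
    items = sorted({i for t in window for i in t})
    infrequent = set()  # every infrequent itemset seen so far, as frozensets
    minimal = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            if weighted_expected_support(combo, window, weights) >= min_sup:
                continue  # frequent under the assumed measure
            infrequent.add(frozenset(combo))
            # Smaller itemsets were handled in earlier rounds, so a lookup suffices.
            has_infrequent_subset = any(
                frozenset(sub) in infrequent
                for r in range(1, k)
                for sub in combinations(combo, r)
            )
            if not has_infrequent_subset:
                minimal.append(combo)
    return minimal


# Toy window of an uncertain stream: item -> existential probability.
window = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.7, "c": 0.4},
    {"a": 0.8, "b": 0.6, "c": 0.2},
]
weights = {"a": 0.5, "b": 0.8, "c": 1.0}

result = minimal_weighted_infrequent_itemsets(window, weights, min_sup=0.8)
print(result)  # [('c',), ('a', 'b')]
```

In this toy window, `c` is infrequent on its own, while `a` and `b` are each frequent but their combination is not, so both `('c',)` and `('a', 'b')` are minimal. All proper subsets are checked explicitly because a weighted measure of this kind is not guaranteed to be anti-monotone. The exhaustive enumeration is exponential in the number of items and is only meant to illustrate the concept on small inputs.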
Acknowledgements
This work was supported in part by the Chinese Universities Scientific Fund under grant number 2017XD001 and the Fundamental Research Funds for the Central Universities under grant number 2018XD004.
Cite this article
Cai, S., Sun, R., Hao, S. et al. Minimal weighted infrequent itemset mining-based outlier detection approach on uncertain data stream. Neural Comput & Applic 32, 6619–6639 (2020). https://doi.org/10.1007/s00521-018-3876-4
DOI: https://doi.org/10.1007/s00521-018-3876-4