High average-utility sequential pattern mining based on uncertain databases

Lin, Jerry Chun-Wei; Li, Ting; Pirouz, Matin; Zhang, Ji; Fournier-Viger, Philippe

doi:10.1007/s10115-019-01385-8

Regular Paper
Published: 22 July 2019

Volume 62, pages 1199–1228, (2020)

Knowledge and Information Systems

Jerry Chun-Wei Lin ORCID: orcid.org/0000-0001-8768-9709¹,
Ting Li²,
Matin Pirouz³,
Ji Zhang⁴ &
…
Philippe Fournier-Viger⁵

Abstract

The emergence and proliferation of Internet of Things (IoT) devices have resulted in the generation of large volumes of uncertain data, owing to the varied accuracy and decay of sensors and their different sensitivity ranges. Because uncertainty is inherent in IoT data, mining useful information from uncertain datasets has become an important issue in recent decades. Past works focus on mining high-utility sequential patterns from uncertain databases. However, the utility of a derived sequence increases with the size of the sequence, which makes raw utility an unfair measure: any sequence that extends a high-utility sequence is also high-utility, even when the items it adds contribute little utility. In this paper, we address this limitation of previous potential high-utility sequential pattern mining and present a potential high average-utility sequential pattern mining (PHAUSPM) framework for discovering the set of potential high average-utility sequential patterns (PHAUSPs) from an uncertain dataset. By taking the size of a sequence into account, the framework provides a fairer measure of patterns than previous works. First, a baseline algorithm (PHAUB) and three pruning strategies are introduced to mine the complete set of PHAUSPs. To reduce the computational cost and accelerate the mining process, a projection-based algorithm called PHAUP is then designed, which reduces the number of candidate patterns. Experiments on both real-life and artificial datasets evaluate runtime, number of candidates, memory overhead, number of discovered patterns, and scalability; the results show that the proposed algorithms achieve promising performance, especially the PHAUP approach.
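
The motivation above can be illustrated with a small sketch. It is not the PHAUB or PHAUP algorithm, and the definitions it uses are assumptions made for illustration only: the utility of a pattern is summed over the sequences that contain it, the average utility divides that sum by the number of items in the pattern, and the pattern's probability is the sum of the existence probabilities of the sequences that contain it. The toy database `DB`, the function names, and the use of absolute thresholds `delta` and `mu` are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical toy uncertain database: each entry is
# (ordered list of (item, utility) pairs, existence probability of the sequence).
# Assumption: an item occurs at most once per sequence, so matching is unambiguous.
UncertainSequence = Tuple[List[Tuple[str, float]], float]

DB: List[UncertainSequence] = [
    ([("a", 10.0), ("b", 1.0), ("c", 1.0)], 0.9),
    ([("a", 8.0), ("c", 2.0)], 0.6),
    ([("b", 1.0), ("c", 3.0)], 0.8),
]


def match_utility(pattern: List[str], items: List[Tuple[str, float]]) -> Optional[float]:
    """Utility of `pattern` in one sequence, or None if it is not an ordered subsequence."""
    total = 0.0
    pos = 0
    for wanted in pattern:
        # Scan forward for the next occurrence of the wanted item.
        while pos < len(items) and items[pos][0] != wanted:
            pos += 1
        if pos == len(items):
            return None
        total += items[pos][1]
        pos += 1
    return total


def measures(pattern: List[str], db: List[UncertainSequence]) -> Tuple[float, float, float]:
    """Return (utility, average utility, probability) of `pattern` under the assumed definitions."""
    utility = 0.0
    probability = 0.0
    for items, prob in db:
        u = match_utility(pattern, items)
        if u is not None:
            utility += u
            probability += prob
    return utility, utility / len(pattern), probability


def is_potential_high_average(pattern: List[str], db: List[UncertainSequence],
                              delta: float, mu: float) -> bool:
    """Assumed test: average utility reaches `delta` and probability reaches `mu`."""
    _, average, probability = measures(pattern, db)
    return average >= delta and probability >= mu


if __name__ == "__main__":
    for pattern in (["a"], ["a", "c"]):
        print(pattern, measures(pattern, DB))
```

In this toy data, extending the pattern `a` to `a, c` raises the total utility while lowering the average utility, which is the effect the average-utility measure is meant to expose.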

Abbreviations

ARM: Association rule mining
auub: Average-utility upper-bound value
AU list: Average-utility list
FIM: Frequent itemset mining
HAUIM: High average-utility itemset mining
HAUIs: High average-utility itemsets
HTWUIs: High transaction-weighted utilization itemsets
HUIM: High-utility itemset mining
HUIs: High-utility itemsets
HUSPs: High-utility sequential patterns
HUSPM: High-utility sequential pattern mining
PHAUSPM: Potential high average-utility sequential pattern mining
PHAUSPs: Potential high average-utility sequential patterns
PHAUB: The designed baseline algorithm
PHAUP: The designed projection-based algorithm
PHAUUBDC: Potential high average-utility upper-bound downward closure
PHAUUBSPs: Potential high average-utility upper-bound sequential patterns
PHUSPs: Potential high-utility sequential patterns
SPM: Sequential pattern mining
SWDC: Sequential weighted downward closure
suub: Sequence utility upper-bound value
TWU: Transaction-weighted utility
TWDC: Transaction-weighted downward closure
UFIM: Frequent itemset mining on uncertain databases
UFIs: Frequent itemsets in uncertain databases
\( \mu \): Minimum expected support threshold
\( \delta \): Minimum high average-utility threshold

Author information

Authors and Affiliations

Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen, Norway
Jerry Chun-Wei Lin
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Ting Li
Department of Computer Science, California State University, Fresno, USA
Matin Pirouz
School of Agricultural, Computational and Environmental Sciences, University of Southern Queensland, Toowoomba, QLD, Australia
Ji Zhang
School of Natural Sciences and Humanities, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Philippe Fournier-Viger

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, J.CW., Li, T., Pirouz, M. et al. High average-utility sequential pattern mining based on uncertain databases. Knowl Inf Syst 62, 1199–1228 (2020). https://doi.org/10.1007/s10115-019-01385-8

Received: 26 March 2018
Accepted: 12 July 2019
Published: 22 July 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10115-019-01385-8
