High average-utility sequential pattern mining based on uncertain databases

  • Regular Paper
Knowledge and Information Systems

Abstract

The emergence and proliferation of Internet of Things (IoT) devices have generated large volumes of uncertain data, owing to the varied accuracy and decay of sensors and their different sensitivity ranges. Since uncertainty plays an important role in IoT data, mining useful information from uncertain datasets has become an important issue in recent decades. Past works focus on mining high-utility sequential patterns from uncertain databases. However, the utility of a derived sequence increases with the size of the sequence, which makes utility an unfair measure for evaluating a sequence: any combination that extends a high-utility sequence is also a high-utility sequence, even when the utility it adds is low. In this paper, we address this limitation of previous potential high-utility sequential pattern mining and present a potentially high average-utility sequential pattern mining framework for discovering the set of potentially high average-utility sequential patterns (PHAUSPs) from uncertain datasets. By taking the size of a sequence into account, the framework provides a fairer measure of patterns than previous works. First, a baseline potentially high average-utility sequential pattern mining algorithm and three pruning strategies are introduced to mine the complete set of the desired PHAUSPs. To reduce the computational cost and accelerate the mining process, a projection-based algorithm called PHAUP is then designed, which reduces the number of candidates for the desired patterns. Experiments on both real-life and artificial datasets evaluate runtime, number of candidates, memory overhead, number of discovered patterns, and scalability; the results show that the proposed algorithms achieve promising performance, especially the PHAUP approach.
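The motivation for an average-utility measure can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the item utilities, the thresholds, and the simplified definitions (utility as a plain sum, average utility as that sum divided by pattern length). It is not the paper's formal definition, and it ignores the probabilistic (expected support) side of the framework.

```python
from typing import Dict, List

# Hypothetical per-item utilities (for example, profit contributed by each item).
# These numbers are invented for illustration and do not come from the paper.
ITEM_UTILITY: Dict[str, float] = {"a": 10.0, "b": 8.0, "c": 1.0, "d": 1.0}


def utility(pattern: List[str]) -> float:
    """Simplified utility: the sum of the utilities of the items in the pattern."""
    return sum(ITEM_UTILITY[item] for item in pattern)


def average_utility(pattern: List[str]) -> float:
    """Simplified average utility: total utility divided by pattern length."""
    return utility(pattern) / len(pattern)


def is_high_utility(pattern: List[str], min_utility: float) -> bool:
    """A pattern passes when its total utility reaches the threshold."""
    return utility(pattern) >= min_utility


def is_high_average_utility(pattern: List[str], min_avg_utility: float) -> bool:
    """A pattern passes when its average utility reaches the threshold."""
    return average_utility(pattern) >= min_avg_utility


short_pattern = ["a", "b"]             # two valuable items
long_pattern = ["a", "b", "c", "d"]    # the same items plus two low-value ones

for pattern in (short_pattern, long_pattern):
    print(
        pattern,
        "utility:", utility(pattern),
        "average utility:", average_utility(pattern),
        "high-utility:", is_high_utility(pattern, 18.0),
        "high average-utility:", is_high_average_utility(pattern, 6.0),
    )
```

Under these assumed numbers, both patterns pass the total-utility threshold because adding items can only raise the sum, while only the shorter pattern passes the average-utility threshold. This is the length bias that the average-utility measure is meant to remove.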

Abbreviations

  • ARM: Association rule mining
  • auub: Average-utility upper-bound value
  • AU list: Average-utility list
  • FIM: Frequent itemset mining
  • HAUIM: High average-utility itemset mining
  • HAUIs: High average-utility itemsets
  • HTWUIs: High transaction-weighted utilization itemsets
  • HUIM: High-utility itemset mining
  • HUIs: High-utility itemsets
  • HUSPs: High-utility sequential patterns
  • HUSPM: High-utility sequential pattern mining
  • PHAUSPM: Potential high average-utility sequential pattern mining
  • PHAUSPs: Potential high average-utility sequential patterns
  • PHAUB: The designed baseline algorithm
  • PHAUP: The designed projection-based algorithm
  • PHAUUBDC: Potential high average-utility upper-bound downward closure
  • PHAUUBSPs: Potential high average-utility upper-bound sequential patterns
  • PHUSPs: Potential high-utility sequential patterns
  • SPM: Sequential pattern mining
  • SWDC: Sequential weighted downward closure
  • suub: Sequence utility upper-bound value
  • TWU: Transaction-weighted utility
  • TWDC: Transaction-weighted downward closure
  • UFIM: Frequent itemset mining on uncertain databases
  • UFIs: Frequent itemsets in uncertain databases
  • \( \mu \): Minimum expected support threshold
  • \( \delta \): Minimum high average-utility threshold

Author information

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

About this article

Cite this article

Lin, J.CW., Li, T., Pirouz, M. et al. High average-utility sequential pattern mining based on uncertain databases. Knowl Inf Syst 62, 1199–1228 (2020). https://doi.org/10.1007/s10115-019-01385-8

  • DOI: https://doi.org/10.1007/s10115-019-01385-8
