Mining Top-K constrained cross-level high-utility itemsets over data streams

Han, Meng; Liu, Shujuan; Gao, Zhihui; Mu, Dongliang; Li, Ang

doi:10.1007/s10115-023-02045-8

Regular Paper
Published: 21 January 2024

Volume 66, pages 2885–2924, (2024)
Knowledge and Information Systems

Abstract

Cross-Level High-Utility Itemsets Mining (CLHUIM) aims to discover interesting relationships between hierarchy levels by introducing a taxonomy of items. Because current CLHUIM algorithms struggle with large search spaces, researchers have proposed mining the Top-K cross-level high-utility itemsets (CLHUIs). However, the results of these methods often contain redundant itemsets whose items differ widely in hierarchy level, along with a large proportion of itemsets at higher abstraction levels, so they neglect detailed information and cannot report itemsets within a specified hierarchy range. They also cannot handle dynamic transactional data. To address these problems, this paper proposes the Top-K Constrained Cross-Level High-Utility Itemsets Mining (TKCCLHM) algorithm, which efficiently mines Top-K itemsets across different hierarchy levels over data streams. First, a new hierarchical level concept is introduced to control the abstraction level of the introduced items, and Top-K itemsets are mined within a specific hierarchy range based on this concept. Second, a sliding window-based data structure called the Sliding Window-based Utility Projection List (SUPL) is designed and combined with transaction projection techniques to mine CLHUIs efficiently. Finally, a Batch and Utility Hash Table (BUHT) structure capable of storing batch and (generalized) item utility information is proposed, along with a new threshold-raising strategy. Extensive experiments on six datasets with taxonomy information show that the proposed algorithm significantly improves runtime and scalability compared with state-of-the-art algorithms.
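
To make the problem setting concrete, the following is a minimal brute-force sketch of Top-K cross-level high-utility itemset mining over a sliding window with a level constraint. It is not the TKCCLHM algorithm and does not implement SUPL or BUHT. The taxonomy, the batches, the level convention (top-level categories are level 1), the max_size cap, and all function names are assumptions made for illustration only; the paper's own hierarchical level concept may be defined differently.

```python
from collections import deque
from heapq import heappush, heappushpop
from itertools import combinations

# Hypothetical taxonomy: child -> parent (None marks a top-level category).
PARENT = {
    "cola": "drinks", "juice": "drinks",
    "bread": "food", "cake": "food",
    "drinks": None, "food": None,
}

def ancestors(item):
    """Return the strict ancestors of an item, nearest first."""
    out = []
    p = PARENT[item]
    while p is not None:
        out.append(p)
        p = PARENT[p]
    return out

def level(item):
    """Assumed convention: top-level categories are level 1, children level 2."""
    return len(ancestors(item)) + 1

def generalized_utilities(transaction):
    """Utility of every item and generalized item appearing in a transaction.

    A generalized item's utility is the sum of the utilities of its
    descendant items that occur in the transaction.
    """
    util = {}
    for item, u in transaction.items():
        for node in [item] + ancestors(item):
            util[node] = util.get(node, 0) + u
    return util

def top_k_in_window(window, k, min_level, max_level, max_size=2):
    """Naive Top-K search over all batches currently in the window."""
    totals = {}
    for batch in window:
        for transaction in batch:
            util = generalized_utilities(transaction)
            # Level constraint: keep only nodes inside the requested range.
            allowed = sorted(n for n in util if min_level <= level(n) <= max_level)
            for size in range(1, max_size + 1):
                for itemset in combinations(allowed, size):
                    # An itemset may not contain an item together with its ancestor.
                    if any(a in itemset for n in itemset for a in ancestors(n)):
                        continue
                    gain = sum(util[n] for n in itemset)
                    totals[itemset] = totals.get(itemset, 0) + gain
    # Min-heap of the k best itemsets; the threshold rises as the heap fills.
    heap, threshold = [], 0
    for itemset, u in totals.items():
        if len(heap) < k:
            heappush(heap, (u, itemset))
        elif u > threshold:
            heappushpop(heap, (u, itemset))
        if len(heap) == k:
            threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold

# Hypothetical stream of batches; each transaction maps item -> utility.
batch1 = [{"cola": 5, "bread": 2}, {"juice": 3, "cake": 4}]
batch2 = [{"cola": 1, "cake": 6}]
batch3 = [{"juice": 2, "bread": 1}]

window = deque(maxlen=2)  # sliding window holding the two most recent batches
for batch in (batch1, batch2, batch3):
    window.append(batch)  # the oldest batch is evicted automatically
    print(top_k_in_window(window, k=2, min_level=1, max_level=1))
```

Restricting min_level and max_level to the same value returns only itemsets at a single abstraction level, which is the kind of range control the abstract describes. The sketch recomputes every utility from scratch on each slide and has no pruning, which is exactly the cost that projection-based structures and stronger threshold-raising strategies are meant to avoid.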

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62062004) and the Ningxia Natural Science Foundation Project (2023AAC03315).

Author information

Authors and Affiliations

School of Computer Science and Engineering, North Minzu University, Yinchuan, 750021, China
Meng Han, Shujuan Liu, Zhihui Gao, Dongliang Mu & Ang Li

Contributions

MH contributed to writing—review & editing, supervision, funding acquisition, and resources. SL contributed to conceptualization, methodology, software, and writing—original draft. ZG contributed to data curation, investigation, and visualization. DM contributed to formal analysis and project administration. AL contributed to validation.

Corresponding author

Correspondence to Meng Han.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, M., Liu, S., Gao, Z. et al. Mining Top-K constrained cross-level high-utility itemsets over data streams. Knowl Inf Syst 66, 2885–2924 (2024). https://doi.org/10.1007/s10115-023-02045-8

Received: 19 July 2023
Revised: 28 September 2023
Accepted: 07 December 2023
Published: 21 January 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s10115-023-02045-8
