Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers

Huang, Chun-Hong; Leu, Yungho

doi:10.1007/s10586-017-1609-6

Published: 03 January 2018

Volume 22, pages 2851–2863, (2019)

Cluster Computing

Abstract

Frequent itemset mining is time-consuming for large datasets, and many parallel frequent itemset mining algorithms have been proposed to speed up the mining process. This paper presents a parallel frequent itemset mining algorithm for a cluster of personal computers. To facilitate parallel mining, we use a prefix-path-based method to decompose a transactional dataset into sub-datasets, one for each frequent 1-itemset. We call the parallel algorithm based on this frequent 1-itemset sub-dataset decomposition the single-level parallel frequent itemset mining (SLPFIM) algorithm, and we run it on our PC cluster platform. To mitigate the bottleneck caused by time-consuming 1-itemset sub-datasets, we propose a multi-level parallel frequent itemset mining (MLPFIM) algorithm that further decomposes the time-consuming 1-itemset sub-datasets into their corresponding sub-sub-datasets. The finer granularity of the sub-sub-datasets improves load balancing in parallel frequent itemset mining. The experimental results show that SLPFIM achieved a maximum speedup of 11.9x over the non-parallel execution of the FP-Growth algorithm, while MLPFIM achieved a maximum speedup of 23.1x over the same baseline. The results also show that MLPFIM achieved a maximum speedup of 2.14x over SLPFIM.
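The two decomposition levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `decompose` and `multi_level_tasks`, the FP-Growth-style ordering of items by descending support, and the use of total prefix length with a `size_threshold` as a stand-in for "time-consuming" are all assumptions made for illustration, since the abstract does not state how costly sub-datasets are identified.

```python
from collections import Counter, defaultdict


def decompose(transactions, min_support):
    """Split a dataset into one prefix-path sub-dataset per frequent item.

    Assumption: items are ordered by descending support (ties broken by
    name), as in FP-Growth. The abstract does not specify the ordering.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in counts.items() if c >= min_support]
    rank = {item: r for r, item in
            enumerate(sorted(frequent, key=lambda i: (-counts[i], i)))}

    subs = defaultdict(list)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        for pos, item in enumerate(ordered):
            # The prefix path of `item` is everything ranked before it.
            subs[item].append(ordered[:pos])
    return dict(subs)


def multi_level_tasks(transactions, min_support, size_threshold):
    """Build independent mining tasks, splitting large sub-datasets again.

    Assumption: total prefix length above `size_threshold` is a hypothetical
    proxy for a "time-consuming" sub-dataset.
    """
    tasks = []
    for item, sub in decompose(transactions, min_support).items():
        cost = sum(len(path) for path in sub)
        if cost > size_threshold:
            # Second level: one sub-sub-dataset per frequent 2-itemset suffix.
            # The 1-itemset {item} itself is frequent by construction and
            # would be reported separately.
            for item2, subsub in decompose(sub, min_support).items():
                tasks.append(((item2, item), subsub))
        else:
            tasks.append(((item,), sub))
    return tasks


# Toy dataset; "d" is infrequent at min_support=2 and is dropped.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"],
                ["b", "c"], ["a", "b", "c", "d"]]
subs = decompose(transactions, min_support=2)
tasks = multi_level_tasks(transactions, min_support=2, size_threshold=5)
```

Each entry in `tasks` is a suffix itemset paired with its prefix paths, so the entries can be mined independently, for example by handing them to separate worker processes or cluster nodes. Splitting the largest sub-dataset yields more, smaller tasks, which is the load-balancing effect the abstract attributes to the finer-grained sub-sub-datasets.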

Author information

Authors and Affiliations

Department of Information Management, National Taiwan University of Science and Technology, 43, Keelung Road, Section 4, Taipei, Taiwan
Chun-Hong Huang & Yungho Leu

Corresponding author

Correspondence to Yungho Leu.

About this article

Cite this article

Huang, CH., Leu, Y. Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers. Cluster Comput 22 (Suppl 2), 2851–2863 (2019). https://doi.org/10.1007/s10586-017-1609-6

Received: 26 September 2017
Revised: 10 December 2017
Accepted: 22 December 2017
Published: 03 January 2018
Issue Date: March 2019
DOI: https://doi.org/10.1007/s10586-017-1609-6
