
Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers

Cluster Computing

Abstract

Frequent itemset mining is time-consuming for large datasets, and many parallel frequent itemset mining algorithms have been proposed to speed up the mining process. This paper presents a parallel frequent itemset mining algorithm for a cluster of personal computers. To facilitate parallel mining, we use a prefix-path-based method to decompose a transactional dataset into sub-datasets, one for each frequent 1-itemset. We call the parallel algorithm built on this decomposition, as implemented on our PC cluster platform, the single-level parallel frequent itemset mining (SLPFIM) algorithm. To mitigate the bottleneck caused by time-consuming 1-itemset sub-datasets, we propose a multi-level parallel frequent itemset mining (MLPFIM) algorithm that further decomposes those sub-datasets into their corresponding sub-sub-datasets. The finer granularity of the sub-sub-datasets improves load balancing in parallel frequent itemset mining. In our experiments, SLPFIM achieved a maximum speedup of 11.9x and MLPFIM a maximum speedup of 23.1x over the non-parallel execution of the FP-Growth algorithm. MLPFIM also achieved a maximum speedup of 2.14x over SLPFIM.
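The abstract does not spell out the decomposition, so the following is only a minimal sketch of the general idea, assuming it resembles the conditional pattern bases of FP-Growth: the sub-dataset of a frequent item collects, for each transaction containing it, the frequent items that precede it in descending-support order. The function names, the `max_cost` threshold, and the use of total path length as a stand-in for "time-consuming" are hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter, defaultdict


def decompose(transactions, min_support):
    """Split transactions into one sub-dataset per frequent item.

    The sub-dataset of item x holds, for every transaction containing x,
    the frequent items ranked before x (its prefix path, possibly empty).
    """
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in support.items() if c >= min_support}
    # Rank items by descending support; ties are broken by item name.
    ordered_items = sorted(frequent, key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(ordered_items)}

    sub_datasets = defaultdict(list)
    for t in transactions:
        ordered = sorted(set(t) & frequent, key=rank.__getitem__)
        for pos, item in enumerate(ordered):
            sub_datasets[item].append(ordered[:pos])
    return dict(sub_datasets)


def decompose_multi_level(transactions, min_support, max_cost):
    """Decompose once, then split any "heavy" sub-dataset a second time.

    Assumption: a sub-dataset is heavy when its prefix paths contain more
    than max_cost items in total. Keys are suffix tuples, e.g. ("c",) for
    a first-level task and ("c", "a") for a second-level task.
    """
    tasks = {}
    for item, paths in decompose(transactions, min_support).items():
        cost = sum(len(p) for p in paths)
        if cost <= max_cost:
            tasks[(item,)] = paths
        else:
            for sub_item, sub_paths in decompose(paths, min_support).items():
                tasks[(item, sub_item)] = sub_paths
    return tasks
```

Each entry of the returned dictionary is an independent unit of work that could be handed to a different machine. Splitting the heavy entries produces more, smaller units, which is the load-balancing effect the abstract attributes to the sub-sub-datasets.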



Author information


Corresponding author

Correspondence to Yungho Leu.


About this article


Cite this article

Huang, CH., Leu, Y. Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers. Cluster Comput 22 (Suppl 2), 2851–2863 (2019). https://doi.org/10.1007/s10586-017-1609-6


  • DOI: https://doi.org/10.1007/s10586-017-1609-6
