ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining

Zhang, Rui; Chen, Wenguang; Hsu, Tse-Chuan; Yang, Hongji; Chung, Yeh-Ching

doi:10.1007/s11227-017-2049-z

Published: 18 April 2017

Volume 75, pages 646–661, (2019)
The Journal of Supercomputing

Rui Zhang¹,
Wenguang Chen²,
Tse-Chuan Hsu³,
Hongji Yang³ &
…
Yeh-Ching Chung⁴

Abstract

The Apriori algorithm is one of the best-known and most widely accepted methods for association rule mining. Apriori uses a prefix tree to represent k-itemsets, generates k-itemset candidates from the frequent (\(k-1\))-itemsets, and determines the frequent k-itemsets by iteratively traversing the prefix tree against the transaction records. When k is small, Apriori is very efficient. When k becomes large, however, it can be very slow, because determining the frequent k-itemsets requires deeper recursion. From the perspective of graph computing, the transaction records can be converted to a graph \(G (V,\, E)\), where V is the set of vertices of G, representing the transaction records, and E is the set of edges of G, representing the relations among transaction records. Each k-itemset in the transaction records has a corresponding connected component in G, and the number of vertices in that component is the support of the k-itemset. Since the time to find the corresponding connected component of a k-itemset in G is constant for any k, the graph computing method is very efficient when the number of k-itemsets is relatively small. Based on Apriori and graph computing techniques, a hybrid method called Apriori and Graph Computing (ANG) is proposed to compute the frequent itemsets. ANG initially uses Apriori to compute the frequent k-itemsets and then switches to the graph computing method when k becomes large, at which point the number of k-itemset candidates is relatively small. The experimental results show that ANG outperforms both Apriori and the graph computing method in all test cases.
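The hybrid idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the graph G is built, so the second phase below substitutes a simple stand-in that counts the transaction records containing an itemset by intersecting per-item sets of transaction IDs. The function names and the `switch_k` parameter are hypothetical, and the first phase scans the transactions directly rather than using a prefix tree.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemset candidates, then prune."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def support_by_scan(itemset, transactions):
    """Phase 1 (small k): count support by scanning every transaction."""
    return sum(1 for t in transactions if itemset <= t)


def support_by_tidsets(itemset, tidsets):
    """Phase 2 (large k): ASSUMED stand-in for the graph method.

    Support is the number of transaction records that contain every item,
    found by intersecting the per-item sets of transaction IDs.
    """
    ids = None
    for item in itemset:
        ids = tidsets[item] if ids is None else ids & tidsets[item]
    return len(ids)


def mine(transactions, min_support, switch_k=3):
    """Return {itemset: support}; switch_k is a hypothetical switch point."""
    transactions = [frozenset(t) for t in transactions]

    # Map each item to the IDs of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {
        frozenset([item]): len(ids)
        for item, ids in tidsets.items()
        if len(ids) >= min_support
    }
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = generate_candidates(frequent.keys(), k)
        if k < switch_k:
            counts = {c: support_by_scan(c, transactions) for c in candidates}
        else:
            counts = {c: support_by_tidsets(c, tidsets) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Both phases compute the same supports; only the counting strategy changes at `switch_k`. How ANG actually chooses the switch point is not stated in the abstract.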

Acknowledgements

This work is partially supported by the Shenzhen City Branch Committee under contract 2016-09.01.

Author information

Authors and Affiliations

Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518057, China
Rui Zhang
Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Wenguang Chen
Centre for Creative Computing, BathSpa University, Bath, England
Tse-Chuan Hsu & Hongji Yang
Research Institute of Tsinghua University in Shenzhen, Shenzhen, 518057, China
Yeh-Ching Chung

Corresponding author

Correspondence to Yeh-Ching Chung.

About this article

Cite this article

Zhang, R., Chen, W., Hsu, TC. et al. ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining. J Supercomput 75, 646–661 (2019). https://doi.org/10.1007/s11227-017-2049-z

Published: 18 April 2017
Issue Date: 06 February 2019
DOI: https://doi.org/10.1007/s11227-017-2049-z
