Abstract
The Apriori algorithm is one of the best-known and most widely accepted methods for association rule mining. Apriori uses a prefix tree to represent k-itemsets, generates k-itemset candidates from the frequent (\(k-1\))-itemsets, and determines the frequent k-itemsets by iteratively traversing the prefix tree against the transaction records. When k is small, Apriori executes very efficiently. When k becomes large, however, it can be very slow, because determining the frequent k-itemsets requires a greater recursion depth. From the perspective of graph computing, the transaction records can be converted to a graph \(G (V,\, E)\), where V is the set of vertices of G, representing the transaction records, and E is the set of edges of G, representing the relations among transaction records. Each k-itemset in the transaction records has a corresponding connected component in G, and the number of vertices in that component is the support of the k-itemset. Since the time to find the corresponding connected component of a k-itemset in G is constant for any k, the graph computing method is very efficient when the number of k-itemsets is relatively small. Based on Apriori and graph computing techniques, a hybrid method, called Apriori and Graph Computing (ANG), is proposed to compute the frequent itemsets. ANG first uses Apriori to compute the frequent k-itemsets and then switches to the graph computing method when k becomes large (that is, when the number of k-itemset candidates is relatively small). The experimental results show that ANG outperforms both Apriori and the graph computing method in all test cases.
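The switch described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: plain sets replace the prefix tree; the graph computing phase, whose construction is not detailed in the abstract, is stood in for by vertical tid-set intersection (the support of an itemset is taken as the number of transaction ids shared by all of its items); and the names `hybrid_frequent_itemsets` and `switch_threshold` are hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemset candidates and prune
    any candidate that has an infrequent (k-1)-subset."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def count_by_scan(candidates, transactions):
    """Apriori-style counting: test every candidate against every transaction."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def count_by_tidsets(candidates, tidsets):
    """Assumed stand-in for the graph phase: support is the number of
    transaction ids common to all items of the candidate."""
    return {
        c: len(set.intersection(*(tidsets[item] for item in c)))
        for c in candidates
    }


def hybrid_frequent_itemsets(transactions, min_support, switch_threshold=3):
    """Return {itemset: support} for all itemsets with support >= min_support.

    switch_threshold is a hypothetical tuning knob: once a level has at most
    this many candidates, all later levels use tid-set counting.
    """
    transactions = [frozenset(t) for t in transactions]

    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    # Level 1: frequent single items.
    frequent = {
        frozenset([item]): len(tids)
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    result = dict(frequent)

    use_tidsets = False
    k = 2
    while frequent:
        candidates = generate_candidates(frequent.keys(), k)
        # Switch once the candidate set is small, and stay switched.
        use_tidsets = use_tidsets or len(candidates) <= switch_threshold
        if use_tidsets:
            counts = count_by_tidsets(candidates, tidsets)
        else:
            counts = count_by_scan(candidates, transactions)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Both counting routines return the same supports, so the threshold affects only the running time, not the result. A real implementation would choose the switch point empirically.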
Acknowledgements
This work was partially supported by the Shenzhen City Brach Committee under contract 2016-09.01.
Cite this article
Zhang, R., Chen, W., Hsu, TC. et al. ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining. J Supercomput 75, 646–661 (2019). https://doi.org/10.1007/s11227-017-2049-z
DOI: https://doi.org/10.1007/s11227-017-2049-z