ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining

The Journal of Supercomputing

Abstract

The Apriori algorithm is one of the best-known and most widely accepted methods for association rule mining. Apriori uses a prefix tree to represent k-itemsets, generates k-itemset candidates from the frequent (\(k-1\))-itemsets, and determines the frequent k-itemsets by iteratively traversing the prefix tree against the transaction records. When k is small, Apriori executes very efficiently. However, it can become very slow when k is large, because determining the frequent k-itemsets requires a deeper recursion. From the perspective of graph computing, the transaction records can be converted to a graph \(G(V, E)\), where V is the set of vertices of G, representing the transaction records, and E is the set of edges of G, representing the relations among transaction records. Each k-itemset in the transaction records has a corresponding connected component in G, and the number of vertices in that component is the support of the k-itemset. Since the time to find the corresponding connected component of a k-itemset in G is constant for any k, the graph computing method is very efficient when the number of k-itemsets is relatively small. Based on Apriori and graph computing techniques, a hybrid method, called Apriori and Graph Computing (ANG), is proposed to compute the frequent itemsets. ANG initially uses Apriori to compute the frequent k-itemsets and then switches to the graph computing method when k becomes large (where the number of k-itemset candidates is relatively small). The experimental results show that ANG outperforms both Apriori and the graph computing method in all test cases.
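
The switching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the graph G is built, so the sketch swaps in a simple stand-in. For each item it keeps the set of transaction ids containing that item, and it treats the group of transactions that share an itemset (the intersection of those sets) as a proxy for the itemset's connected component, whose size is the support. The names `hybrid_frequent_itemsets`, `generate_candidates`, and `switch_k` are hypothetical, and the fixed switch level is an assumption made for illustration.

```python
from itertools import combinations


def generate_candidates(prev_frequent, k):
    """Apriori join + prune: build k-itemset candidates from frequent (k-1)-itemsets."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(s) in prev_frequent for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def hybrid_frequent_itemsets(transactions, min_support, switch_k=3):
    """Return {itemset: support}; scan-based counting below switch_k, set-based from switch_k on.

    ASSUMPTION: the set-based phase is only a stand-in for the graph computing
    method named in the abstract, and switch_k is a hypothetical parameter.
    """
    transactions = [frozenset(t) for t in transactions]

    # Item -> ids of the transactions containing it (the stand-in "graph" structure).
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {frozenset([i]): len(s) for i, s in tidsets.items() if len(s) >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = generate_candidates(frequent, k)
        if k < switch_k:
            # Apriori-style phase: count each candidate by scanning all transactions.
            counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        else:
            # Stand-in phase: the transactions sharing all items of c form one group;
            # the size of that group is the support of c.
            counts = {c: len(set.intersection(*(tidsets[i] for i in c))) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Calling `hybrid_frequent_itemsets([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], min_support=2)` returns a dictionary that maps each frequent itemset (as a `frozenset`) to its support. Both counting phases compute the same supports, so `switch_k` changes only how the work is done, not the result.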

Acknowledgements

This work is partially supported by the Shenzhen City Brach Committee under contract 2016-09.01.

Author information

Corresponding author

Correspondence to Yeh-Ching Chung.

Cite this article

Zhang, R., Chen, W., Hsu, TC. et al. ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining. J Supercomput 75, 646–661 (2019). https://doi.org/10.1007/s11227-017-2049-z

  • DOI: https://doi.org/10.1007/s11227-017-2049-z
