High-performance parallel frequent subgraph discovery

The Journal of Supercomputing

Abstract

Discovering the frequent subgraphs of an input network is one of the most important tools for mining and analyzing complex networks. The most accurate solution to frequent subgraph discovery is to enumerate all subgraphs of size k and then count the frequency of each isomorphic class. However, this process is very time consuming because the number of subgraphs grows exponentially as the input network grows or as the subgraph size increases. In addition, there is no known polynomial-time algorithm for subgraph isomorphism detection, which makes the problem harder. Hence, the available solutions can only mine small input networks and small subgraph sizes. We propose Subdigger, a parallel and load-balanced solution that is faster and more efficient than the available solutions. Subdigger runs efficiently on current multicore and multiprocessor machines, and it combines a fast heuristic with a high-performance concurrent data structure, which significantly accelerates the detection and counting of isomorphic subgraphs. Subdigger can also handle large networks and large subgraph sizes by using external memory and external sorting. We performed several experiments using real-world input networks. Compared to the available solutions, Subdigger extracts frequent subgraphs much faster, and its performance scales almost linearly with additional processor cores. The experimental results show that Subdigger can be more than 100 times faster than other solutions on a 4-core Intel i7 machine. Besides performance, Subdigger can process larger subgraphs using external memory, while other tools crash due to memory limitations.
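The exact approach described in the abstract (enumerate every subgraph of size k, then count each isomorphic class) can be illustrated with a naive serial baseline. The sketch below is not Subdigger's algorithm: it has no parallelism, no heuristic, and no external memory. It assumes an undirected graph with sortable vertex labels, counts connected induced subgraphs only, and uses a brute-force canonical form. The function names `is_connected`, `canonical_form`, and `count_subgraph_classes` are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations, permutations


def is_connected(vertices, adj):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        # Only follow edges that stay inside the chosen vertex set.
        for v in adj[u] & vertices:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def canonical_form(vertices, adj):
    """Brute-force canonical label (assumption: fine only for small k).

    Tries all k! relabelings to 0..k-1 and keeps the smallest sorted
    edge tuple, so isomorphic subgraphs map to the same key.
    """
    vertices = list(vertices)
    best = None
    for perm in permutations(range(len(vertices))):
        label = dict(zip(vertices, perm))
        edges = tuple(sorted(
            (min(label[u], label[v]), max(label[u], label[v]))
            for u, v in combinations(vertices, 2)
            if v in adj[u]
        ))
        if best is None or edges < best:
            best = edges
    return best


def count_subgraph_classes(edges, k):
    """Count connected induced k-vertex subgraphs per isomorphism class."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = Counter()
    # Naive enumeration: every k-subset of vertices is tested.
    for subset in combinations(sorted(adj), k):
        if is_connected(subset, adj):
            counts[canonical_form(subset, adj)] += 1
    return counts


# A triangle (0, 1, 2) with one pendant vertex 3 attached to vertex 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
example_counts = count_subgraph_classes(example_edges, 3)
```

Both steps are expensive here: the enumeration tests every k-subset of vertices, and the canonical form tries k! relabelings per subgraph. This is the cost growth the abstract refers to, and it is why practical tools rely on faster enumeration and isomorphism-detection techniques.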

Notes

  1. https://github.com/shahrivari/subdigger.

Author information

Corresponding author

Correspondence to Saeed Jalili.

About this article

Cite this article

Shahrivari, S., Jalili, S. High-performance parallel frequent subgraph discovery. J Supercomput 71, 2412–2432 (2015). https://doi.org/10.1007/s11227-015-1391-2

  • DOI: https://doi.org/10.1007/s11227-015-1391-2
