High-performance parallel frequent subgraph discovery

Shahrivari, Saeed; Jalili, Saeed

doi:10.1007/s11227-015-1391-2

Published: 28 February 2015

Volume 71, pages 2412–2432 (2015)
The Journal of Supercomputing

Abstract

Discovering the frequent subgraphs of an input network is an essential tool for mining and analyzing complex networks. The most accurate approach to frequent subgraph discovery is to enumerate all subgraphs of size k and then count the frequency of each isomorphism class. However, this process is very time-consuming, because the number of subgraphs grows rapidly with the size of the input network and exponentially with the subgraph size. Moreover, no polynomial-time algorithm is known for deciding whether two graphs are isomorphic, which makes classifying the enumerated subgraphs even harder. As a result, existing solutions can mine only small input networks and small subgraph sizes. We propose Subdigger, a parallel, load-balanced solution that is faster and more efficient than existing solutions. Subdigger runs efficiently on current multicore and multiprocessor machines and combines a fast heuristic with a high-performance concurrent data structure, which significantly accelerates the detection and counting of isomorphic subgraphs. Subdigger can also handle large networks and large subgraph sizes by using external memory and external sorting. We performed several experiments on real-world input networks. Compared to existing solutions, Subdigger extracts frequent subgraphs much faster, and its performance scales almost linearly with the number of processor cores. The experimental results show that Subdigger can be more than 100 times faster than other solutions on a 4-core Intel i7 machine. Beyond performance, Subdigger can process larger subgraphs by using external memory, whereas other tools crash due to memory limitations.
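The exact approach described above (enumerate every connected k-vertex subgraph, then count each isomorphism class) can be illustrated with a minimal sketch, assuming an undirected graph stored as a dict that maps integer vertices to sets of neighbours. It uses the well-known ESU enumeration scheme and a brute-force canonical labeling over all vertex orderings; these are swapped-in techniques, not Subdigger's heuristic or its concurrent data structure, and the function names are hypothetical rather than Subdigger's API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def canonical_form(graph, nodes):
    """Return a brute-force canonical label for the subgraph induced by nodes.

    The label is the lexicographically smallest upper-triangle adjacency bit
    tuple over all vertex orderings, so two induced subgraphs get the same
    label exactly when they are isomorphic. Cost is O(k!), so only small k.
    """
    best = None
    for order in permutations(nodes):
        bits = tuple(
            1 if order[j] in graph[order[i]] else 0
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        if best is None or bits < best:
            best = bits
    return best


def esu_from_root(graph, root, k):
    """Count classes of connected induced k-subgraphs whose smallest vertex is root.

    Follows the ESU scheme: a subgraph is only extended with vertices larger
    than root that are exclusive neighbours of the newly added vertex, so each
    connected induced subgraph is produced exactly once.
    """
    counts = Counter()

    def extend(sub, ext):
        if len(sub) == k:
            counts[canonical_form(graph, sub)] += 1
            return
        # Vertices already in sub or adjacent to it (its closed neighbourhood).
        covered = set(sub)
        for u in sub:
            covered |= graph[u]
        ext = set(ext)
        while ext:
            w = ext.pop()
            exclusive = {u for u in graph[w] if u > root and u not in covered}
            extend(sub + [w], ext | exclusive)

    extend([root], {u for u in graph[root] if u > root})
    return counts


def count_subgraph_classes(graph, k, workers=4):
    """Count isomorphism classes of all connected induced k-subgraphs.

    Work is split by root vertex; each task fills a private Counter and the
    partial results are merged afterwards (a simple stand-in for a shared
    concurrent counter).
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda r: esu_from_root(graph, r, k), sorted(graph)):
            total.update(partial)
    return total


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3 attached to 2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(count_subgraph_classes(g, 3))
```

On the example graph with k = 3, the result maps the triangle label (1, 1, 1) to 1 and the path label (0, 1, 1) to 2. Splitting the work by root vertex gives independent tasks, which is one simple way to parallelize enumeration; note that CPython threads share the global interpreter lock, so real CPU speed-up would require processes or a language without that restriction.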

Notes

https://github.com/shahrivari/subdigger.

Author information

Authors and Affiliations

Computer Engineering Department, Tarbiat Modares University (TMU), Tehran, Iran
Saeed Shahrivari & Saeed Jalili

Corresponding author

Correspondence to Saeed Jalili.

About this article

Cite this article

Shahrivari, S., Jalili, S. High-performance parallel frequent subgraph discovery. J Supercomput 71, 2412–2432 (2015). https://doi.org/10.1007/s11227-015-1391-2

Issue Date: July 2015