Skip to main content
Log in

Compressed representations for web and social graphs

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Compressed representations have become effective to store and access large Web and social graphs, in order to support various graph querying and mining tasks. The existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding “dense subgraph” patterns and combining them with techniques such as node orderings and compact data structures. On those representations, we support out-neighbor and out/in-neighbor queries, as well as mining queries based on the dense subgraphs. First, we propose a compression scheme for Web graphs that reduces edges by representing dense subgraphs with “virtual nodes”; over this scheme, we apply node orderings and other compression techniques. With this approach, we match the best current compression ratios that support out-neighbor queries (i.e., nodes pointed from a given node), using 1.0–1.8 bits per edge (bpe) on large Web graphs, and retrieving each neighbor of a node in 0.6–1.0 microseconds (\(\upmu \)s). When supporting both out- and in-neighbor queries, instead, our technique generally offers the best time when using little space. If the reduced graph, instead, is represented with a compact data structure that supports bidirectional navigation, we obtain the most compact Web graph representations (0.9–1.5 bpe) that support out/in-neighbor navigation; yet, the time per neighbor extracted raises to around 5–20 \(\upmu \)s. We also propose a compact data structure that represents dense subgraphs without using virtual nodes. It allows us to recover out/in-neighbors and answer other more complex queries on the dense subgraphs identified. This structure is not competitive on Web graphs, but on social networks, it achieves 4–13 bpe and 8–12 \(\upmu \)s per out/in-neighbor retrieved, which improves upon all existing representations.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Notes

  1. www.worldwidewebsize.com, on August 6, 2012.

  2. http://newsroom.fb.com/content/default.aspx?NewsAreaId=22, considering June 2012.

  3. http://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html.

  4. The term “dense subgraph” appears in the literature with different meanings [41], but in this paper, we use it to mean the described generalization of cliques and bicliques.

  5. Available at www.cse.psu.edu/~madduri/software/GTgraph.

  6. Available at http://micans.org/mcl/.

  7. http://micans.org/mcl/man/mclfaq.html#howfast.

  8. http://micans.org/mcl/man/mclfaq.html#goodinput.

  9. Available at law.dsi.unimi.it.

  10. Available at http://www.dia.uniroma3.it/~drovandi/software.php.

  11. Available at http://law.dsi.unimi.it/webdata/uk-2006-05.

  12. Available at snap.stanford.edu/data.

  13. Available at http://libcds.recoded.cl.

References

  1. Adler M, Mitzenmacher M (2001) Towards compressing web graphs. In: Proceedings of the data compression conference (DCC). Snowbird, UT, pp 203–212

  2. Aggarwal C, Wang H (2010) Managing and mining graph data. Springer, Berlin

    Book  MATH  Google Scholar 

  3. Anh V, Moffat A (2010) Local modeling for webgraph compression. In: Proceedings of the data compression conference (DCC). Snowbird UT, p 519

  4. Apostolico A, Drovandi G (2009) Graph compression by BFS. Algorithms 2(3):1031–1044

    Article  MathSciNet  Google Scholar 

  5. Bader D, Madduri K (2005) Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. In: Proceedings of the 12th international high performance computing (HiPC). Goa, India, pp 465–476

  6. Becchetti L, Castillo C, Donato D, Baeza-Yates R, Leonardi S (2008) Link analysis for web spam detection. ACM Trans Web 2(1):2

    Article  Google Scholar 

  7. Boldi P, Vigna S (2004) The Webgraph framework I: compression techniques. In: Proceedings of the 13th international conference on the world wide web (WWW), New York, NY, pp 595–602

  8. Boldi P, Santini M, Vigna S (2009) Permuting web graph. In: The 6th workshop on algorithms and models for the web graph (WAW), Barcelona, Spain, pp 116–126

  9. Boldi P, Rosa M, Santini M, Vigna S (2011) Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th international conference on world wide web (WWW), Hyderabad, India, pp 587–596

  10. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 1(7):107–117

    Google Scholar 

  11. Brisaboa N, Ladra S, Navarro G (2009) K2-trees for compact web graph representation. In: Proceedings of the 16th international symposium on string processing and information retrieval (SPIRE), Saariselkä, Finland, pp 18–30

  12. Brisaboa N, Ladra S, Navarro G (2012) Personal communication including code

  13. Broder A (2000) Min-wise independent permutations: theory and practice. In: Proceedings of the 27th international colloquium on automata, languages and programming (ICALP), Geneva, Italy, p 808

  14. Brohée S, Van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7:488

    Article  Google Scholar 

  15. Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph (Algorithm 457). Commun ACM 16(9):575–576

    Article  MATH  Google Scholar 

  16. Buehrer G, Chellapilla K (2008) A scalable pattern mining approach to web graph compression with communities. In: Proceedings of the international conference on web search and web data mining (WSDM), Palo Alto, CA, pp 95–106

  17. Cha M, Mislove A, Gummadi P (2009) A measurement-driven analysis of information propagation in the Flickr social networking. In: Proceedings of the 20th international conference on world wide web (WWW), Madrid, Spain, pp 721–730

  18. Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the 4th SIAM international conference on data mining (SDM), Lake Buena Vista, FL

  19. Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Paris, France, pp 219–228

  20. Claude F, Navarro F (2010) Extended compact web graph representations. In: Algorithms and applications. Lecture notes in computer science 6060. Springer, Berlin, pp 77–91

  21. Claude F, Navarro G (2010) Fast and compact web graph representations. ACM Trans Web 4(4):16

    Google Scholar 

  22. Claude F, Navarro G (2008) Practical rank/select queries over arbitrary sequences. In: Proceedings of the 15th international symposium on string processing and information retrieval (SPIRE), Melbourne, Australia, pp 176–187

  23. Claude F, Ladra S (2011) Practical representations for web and social graphs. In: Proceedings of the 20th ACM conference on information and knowledge management (CIKM), Glasgow, UK, pp 1185–1190

  24. Clark D (1996) Compact Pat trees. Ph.D. Thesis, University of Waterloo, Canada

  25. Demetrescu C, Finocchi I, Ribichini A (2006) Trading off space for passes in graph streaming problems. In: Proceedings of the 17th ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 714–723

  26. Donato D, Millozzi S, Leonardi S, Tsaparas P (2005) Mining the inner structure of the web graph. In: Proceedings of the 8th workshop on the web and databases (WebDB), Baltimore, MD, pp 145–150

  27. Dourisboure Y, Geraci F, Pellegrini M (2007) Extraction and classification of dense communities in the web. In: Proceedings of the 16th international conference on world wide web (WWW) Banff, Alberta, Canada, pp 461–470

  28. Gibson D, Kumar R, Tomkins A (2005) Discovering large dense subgraphs in massive graphs. In: Proceedings of the 31st international conference on very large data bases (VLDB), Trondheim, Norway, pp 721–732

  29. González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: Poster Proceedings of the volume of 4th workshop on efficient and experimental algorithms (WEA), Santorini Island, Greece, pp 27–38

  30. Golynski A, Munro J, Rao S (2006) Rank/select operations on large alphabets: a tool for text indexing. In: Proceedings of the seventeenth annual ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 368–373

  31. Grabowski S, Bieniecki W (2010) Tight and simple web graph compression. CoRR abs/006.0809

  32. Grabowski S, Bieniecki W (2011) Merging adjacency lists for efficient web graph compression. Adv Intell Soft Comput 103(1):385–392

    Google Scholar 

  33. Grossi R, Gupta A, Vitter J (2003) High-order entropy-compressed text indexes. In: Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms (SODA), Baltimore, MD, pp 841–850

  34. Hasan M, Salem S, Zaki M (2011) SimClus: an effective algorithm for clustering with a lower bound on similarity. Knowl Inf Syst 28(3):665–685

    Google Scholar 

  35. Hernández C, Navarro G (2011) Compression of web and social graphs supporting neighbor and community queries. In: Proceedings of the 6th ACM workshop on social network mining and analysis (SNAKDD), San Diego, CA

  36. Hernández C, Navarro G (2012) Compressed representation of web and social networks via dense subgraphs. In: Proceedings of the 19th international symposium on string processing and information retrieval (SPIRE), Cartagena de Indias, Colombia, pp 264–276

  37. Katarzyna M, Przemyslaw K, Piotr B (2009) User position measures in social networks. In: Proceedings of the 4th ACM workshop on social network mining and analysis (SNAKDD), Paris, France, pp 1–9

  38. Kleinberg J (1999) Authoritative sources in a hyperlinked environment. JACM 46(5):604–632

    Article  MATH  MathSciNet  Google Scholar 

  39. Kumar R, Raghavan P, Rajagopalan S, Tomkins A (1999) Trawling the web for emerging cyber-communities. Comput Netw 31(11):1481–1493

    Article  Google Scholar 

  40. Larsson N, Moffat A (1999) Offline dictionary-based compression. In: Proceedings of the data compression conference (DCC), Snowbird, Utah, pp 296–305

  41. Lee V, Ruan N, Jin R, Aggarwal C (2010) A survey of algorithms for dense subgraph discovery. Manag Min Graph Data 2010:303–336

    Article  Google Scholar 

  42. Macropol K, Singh A (2010) Scalable discovery of best clusters on large graphs. PVLDB J 3(1):693–702

    Google Scholar 

  43. Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 533–542

  44. Mcpherson J, Ma K, Ogawa M (2005) Discovering parametric clusters in social small-world graphs. In: Proceedings of the ACM symposium on applied computing, Santa Fe, New Mexico, USA

  45. Mislove A, Marcon M, Gummadi P, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the internet measurement conference (IMC), San Diego, CA, pp 29–42

  46. Mishra R, Shukla S, Arora D, Kumar M (2011) An effective comparison of graph clustering algorithms via random graphs. Int J Comput Appl 22(1):22–27

    Google Scholar 

  47. Morik K, Kaspari A, Wurst M (2012) Multi-objective frequent termset clustering. Knowl Inf Syst 30(3):715–738

    Article  Google Scholar 

  48. Randall K, Stata R, Wiener J, Wickremesinghe R (2002) The link database: fast access to graphs of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 122–131

  49. Raman R, Raman V, Rao S (2002) Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proceedings of the 13th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco, CA, pp 233–242

  50. Saito H, Toyoda M, Kitsuregawa M, Aihara K (2007) A large-scale study of link spam detection by graph algorithms. In: Proceedings of adversarial information retrieval on the web (AIRWeb), Banff, Alberta, Canada

  51. Saito K, Kimura M, Ohara K, Motoda H (2012) Efficient discovery of influential nodes for SIS models in social networks. Knowl Inf Syst 30(3):613–635

    Article  Google Scholar 

  52. Suel T, Yuan J (2001) Compressing the graph structure of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 213–222

  53. Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on the world wide web (WWW), Hyderabad, India, pp 607–614

  54. Van Dongen, S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht, The Netherlands

  55. Van Dongen S (2008) Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl 30(1):121–141

    Article  MATH  MathSciNet  Google Scholar 

  56. Vitter J (2001) External memory algorithms and data structures: dealing with massive data. ACM Comput Surv 33(2):209–271

    Article  Google Scholar 

  57. Zhuge H (2009) Communities and emerging semantics in semantic link network: discovery and learning. IEEE Trans Knowl Data Eng 21(6):785–799

    Google Scholar 

Download references

Acknowledgments

Partially funded by Millennium Nucleus Information and Coordination in Networks ICM/FIC P10-024F, and Fondecyt Grant 1-110066, Chile.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Cecilia Hernández.

Additional information

Partial versions of this article appeared in Proc. SNA-KDD 2011 and Proc. SPIRE 2012.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hernández, C., Navarro, G. Compressed representations for web and social graphs. Knowl Inf Syst 40, 279–313 (2014). https://doi.org/10.1007/s10115-013-0648-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-013-0648-4

Keywords

Navigation