Elsevier

Information Sciences

Volume 544, 12 January 2021, Pages 485-499

Compact structure for sparse undirected graphs based on a clique graph partition

https://doi.org/10.1016/j.ins.2020.09.010

Abstract

Compressing real-world graphs has many benefits such as improving or enabling the visualization in small memory devices, graph query processing, community search, and mining algorithms. This work proposes a novel compact representation for real sparse and clustered undirected graphs. The approach lists all the maximal cliques by using a fast algorithm and defines a clique graph based on its maximal cliques. Further, the method defines a fast and effective heuristic for finding a clique graph partition that avoids the construction of the clique graph. Finally, this partition is used to define a compact representation of the input graph. The experimental evaluation shows that this approach is competitive with the state-of-the-art methods in terms of compression efficiency and access times for neighbor queries, and that it recovers all the maximal cliques faster than using the original graph. Moreover, the approach makes it possible to query maximal cliques, which is useful for community detection.

Introduction

A wide variety of real systems are modeled by graphs, including communication, transit, web, social, and biological networks [1], [2]. The process of discovering relevant information from graphs is called graph mining [3]. This is usually a time-consuming task, especially given the current growth in data size [4]. The main challenges stem from the data volume itself, the data complexity (i.e., many relationships among the data), and application needs [4]. Several schemes have been proposed for analyzing graphs that aim at understanding the properties and patterns found in them to serve different application purposes. Some known applications include disease analysis [5], community discovery [1], [2], [6], recommender systems [7], graph compression [8], [9], [10], measuring relevance of network actors [11], [12], and network visualization [13], [14]. Recent works on graph mining postulate that dense patterns are prominent and describe different dense substructures. Some examples include maximal cliques [15], [16], communities [17], and others [3], [9], [18], [19]. These substructures have been used for improving network analysis, graph compression [9], [20], and visualization [13].

Given the space required to store and analyze large graphs, the research community has proposed graph compression formats that support basic navigation queries directly over the compressed structure without requiring decompression. This approach enables the simulation of any graph algorithm in main memory, requiring less space than plain representations. Even though these compressed structures are usually slower than uncompressed representations, they are still attractive in devices with limited memory, such as tablets or cell phones. Moreover, these in-memory representations can provide faster access than plain representations that incur I/O costs [21], [22].

Although there are different types of real-world graphs of interest, this work aims at processing highly clustered and sparse graphs. Clustered graphs contain vertices grouped in highly connected subgraphs. These graphs have a high clustering coefficient and high transitivity [23]. In practice, many real-world graphs are sparse, for example graphs with low degeneracy [16].
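The two properties named above can be made concrete with a small sketch. It assumes the graph is given as a dictionary mapping each vertex to the set of its neighbors; the function names and the toy graph are illustrative and do not come from the paper. Transitivity is computed as the fraction of length-two paths whose endpoints are adjacent, and degeneracy by repeatedly removing a minimum-degree vertex (a simple quadratic version, not an optimized one).

```python
from itertools import combinations


def transitivity(adj):
    """Fraction of length-two paths (a - v - b) whose endpoints a, b are adjacent."""
    closed = 0
    triples = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def degeneracy(adj):
    """Largest minimum degree seen while peeling off minimum-degree vertices."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    d = 0
    while remaining:
        v = min(remaining, key=lambda x: len(remaining[x]))
        d = max(d, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return d


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant vertex 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
toy_transitivity = transitivity(toy)
toy_degeneracy = degeneracy(toy)
```

On this toy graph, three of the five length-two paths are closed, and the peeling order never meets a minimum degree above two.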

This work proposes a compact data structure for clustered sparse undirected graphs that exploits the cliques to represent the edges implicitly. Further, it makes use of the vertex redundancy of the cliques by partitioning them into components that share many vertices. This structure enables neighbor queries, as well as queries for recovering all or subsets of the maximal cliques. Finding maximal cliques is an important step in the clique percolation method (CPM). CPM has been successfully used for community searches in biological networks [1], social group evolution [24], human disease pattern discovery [5], and computing and visualizing topological features using persistent homology in network analysis [14].
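To illustrate what representing edges implicitly through cliques means, the following sketch answers a neighbor query directly from a list of cliques. It is a plain linear scan over Python sets, not the compact structure proposed in the paper; the function name and the example cliques are assumptions made for illustration.

```python
def neighbors_from_cliques(cliques, v):
    """Neighbors of v: every other vertex that shares at least one clique with v."""
    result = set()
    for clique in cliques:
        if v in clique:
            result.update(clique)
    result.discard(v)
    return result


# Hypothetical clique list: no edge is stored explicitly, yet all adjacencies
# of the underlying graph can be recovered from clique membership.
example_cliques = [{0, 1, 2}, {2, 3}]
nbrs_of_2 = neighbors_from_cliques(example_cliques, 2)
nbrs_of_3 = neighbors_from_cliques(example_cliques, 3)
```

Because the maximal cliques of a graph cover every edge, the union of the cliques containing a vertex yields exactly its neighborhood.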

The structure is built on a partition of the clique graph, where each node is a maximal clique in the original graph. The proposed method uses a fast algorithm for listing all maximal cliques and defines an effective heuristic for finding a clique graph partition avoiding the construction of the clique graph. From this, a compact representation of the partitioned clique graph is proposed.
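The clique-listing step can be sketched with the classic Bron–Kerbosch algorithm with pivoting. This is a generic textbook version chosen for brevity; the paper only states that it uses a fast listing algorithm, so this code should not be read as the authors' implementation. The adjacency-dictionary input format and the function name are assumptions.

```python
def list_maximal_cliques(adj):
    """Return all maximal cliques of an undirected graph as frozensets.

    adj maps each vertex to the set of its neighbors (no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant vertex 3 attached to 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
toy_cliques = set(list_maximal_cliques(toy_graph))
```

Note that an isolated vertex would be reported as a clique of size one, so a caller interested only in cliques of size at least two would need to filter the result.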

The experimental evaluation shows that the compressed graph representation is competitive with the state-of-the-art methods in terms of compression efficiency for large real graphs, obtaining the smallest representation for clustered graphs. This high compression is achieved, in some cases, at the expense of slower access times when answering neighbor queries. As discussed, in a context of limited memory or steep memory hierarchies, using less space can be of special interest. It may allow the representation to fit into faster memory levels and, in the case of larger datasets, avoid handling it on slower ones, such as disks [21], [22]. Moreover, to the best of our knowledge, the structure presented in this study is the first that supports maximal clique queries besides neighbor queries. This is an important operation for applications that use clique communities. Furthermore, retrieving maximal cliques from the compressed representation is much faster than listing them from the original graph.

The implementation of the proposed method is available at http://www.inf.udec.cl/~chernand/sources/cliquecomp/cliquecomp.tgz.

Section snippets

Related work

In 2004, Boldi and Vigna [25] proposed one of the best-known techniques for web graph compression, which offered the best space/time trade-off for many years. They presented the WebGraph framework, which obtains very compact representations of web graphs by exploiting their regularities and statistical properties. More concretely, they exploited the locality of reference, since web pages generally include links to other web pages of the same domain. They also exploited the similarity of the

Proposed method

This section describes a new method for compressing real sparse undirected graphs using a compact data structure that takes advantage of the vertex redundancy of the graph represented by its maximal cliques. In this method, vertex redundancy refers to vertices that belong to multiple maximal cliques. Such vertices can be stored only once to reduce space.
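Vertex redundancy, as defined above, can be quantified with a short sketch. The helper names and the example cliques are hypothetical; the count simply compares the total number of vertex occurrences across all cliques with the number of distinct vertices.

```python
from collections import Counter


def clique_membership(cliques):
    """Map each vertex to the number of cliques that contain it."""
    counts = Counter()
    for clique in cliques:
        counts.update(clique)
    return counts


def redundant_occurrences(cliques):
    """Vertex occurrences that could be saved if each vertex were stored once."""
    counts = clique_membership(cliques)
    return sum(counts.values()) - len(counts)


# Hypothetical maximal cliques sharing vertices 1, 2 and 3.
sample_cliques = [{0, 1, 2}, {1, 2, 3}, {3, 4}]
membership = clique_membership(sample_cliques)
saved = redundant_occurrences(sample_cliques)
```

Here the three cliques contain eight vertex occurrences over five distinct vertices, which is the kind of repetition that grouping overlapping cliques is meant to exploit.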

The proposed compression method includes three steps. The first step (clique listing) lists all the maximal cliques of size at least two in the

Query algorithms

This section describes how the main queries are solved using the compact data structure. Algorithm 2 displays a sequential algorithm that retrieves the input graph G in a single pass. The time complexity of the sequential algorithm is $O\left(\sum_{p=1}^{M} |X_p|^2 \cdot (1 + b_p u_p)\right)$.

The algorithm goes through each partition p of the compact representation, retrieves all of the edges in that partition and adds all those edges to build E. If a partition Xp contains only one clique, then all the possible edges are

Experimental evaluation

This section describes several experiments to tune and compare our method with the state-of-the-art algorithms for compressing graphs, including version 3.6.1 of WebGraph (WG) [33], the graph compression by BFS from Apostolico and Drovandi (AD) [32], and the k²-tree [20]. The compression efficiency results reported by Rossi and Zhou for GraphZIP [36] are also included, although GraphZIP does not support query operations. All of the experiments ran on a machine with an Intel i7-7500U CPU @

Conclusions

This work introduces a new compact representation of real sparse and clustered undirected graphs based on clique graph partitioning. The method first lists all the maximal cliques of the input graph. Then, it defines a clique graph, whose vertices are the cliques in the original graph. Next, it finds a partition of the clique graph, which is finally encoded in a compressed form using compact data structures.
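The clique graph mentioned above can be illustrated with a small sketch. It assumes the common definition in which two cliques are adjacent when they share at least one vertex, and it additionally records the number of shared vertices as an edge weight; both the weighting and the function name are assumptions for illustration. The paper's heuristic is described as avoiding this explicit construction, so the code only clarifies the definition.

```python
from collections import defaultdict
from itertools import combinations


def build_clique_graph(cliques):
    """Return {(i, j): shared_vertex_count} for clique indices i < j that overlap."""
    # Inverted index: vertex -> indices of the cliques containing it.
    by_vertex = defaultdict(list)
    for i, clique in enumerate(cliques):
        for v in clique:
            by_vertex[v].append(i)

    weights = defaultdict(int)
    for ids in by_vertex.values():
        # ids is in increasing order, so every pair satisfies i < j.
        for i, j in combinations(ids, 2):
            weights[(i, j)] += 1
    return dict(weights)


# Hypothetical maximal cliques of some input graph.
cliques_example = [{0, 1, 2}, {1, 2, 3}, {3, 4}]
clique_graph_edges = build_clique_graph(cliques_example)
```

The inverted index avoids comparing every pair of cliques, but the number of clique-graph edges can still grow quickly when many cliques share a vertex, which hints at why skipping the explicit construction is attractive.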

Our method includes an effective heuristic to find a partition in the clique graph, by

CRediT authorship contribution statement

Felipe Glaria: Conceptualization, Writing - original draft, Software, Visualization. Cecilia Hernández: Conceptualization, Formal analysis, Writing - original draft, Software, Visualization, Writing - review & editing. Susana Ladra: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing. Gonzalo Navarro: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing. Lilian Salinas: Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie [grant agreement No 690941]; from the Ministerio de Economía y Competitividad (PGE and ERDF) [Grant Nos. TIN2016-77158-C4-3-R]; from Xunta de Galicia (co-funded with ERDF) [Grant Nos. ED431C 2017/58; ED431G 2019/01]; from the Center for Biotechnology and Bioengineering (CeBiB), Chile; and from the Millennium Institute for Foundational Research on Data

References (46)

  • Chuntao Jiang et al. A survey of frequent subgraph mining algorithms. Knowl. Eng. Rev. (2013)
  • Bin Shao et al. Managing and mining large graphs: systems and implementations
  • Lidia Fotia. Recommending items in social networks using cliques-based trust. In WOA, pages 51–56,...
  • G. Buehrer et al. A scalable pattern mining approach to Web graph compression with communities
  • Cecilia Hernández et al. Compressed representations for web and social graphs. Knowl. Inf. Syst. (2014)
  • Natalie Stanley et al. Compressing networks with super nodes. Sci. Rep. (2018)
  • Øivind Wang, Nicolai Bodd, Chen Xing, Bård Kvalheim, and Torbjørn Helvik. Enterprise graph search based on object and...
  • Zhipeng Huang et al. Meta structure: Computing relevance in large heterogeneous information networks
  • Ryan A. Rossi et al. The network data repository with interactive graph analytics and visualization
  • Bastian Rieck et al. Clique community persistence: A topological visual analysis approach for complex networks. IEEE Trans. Visualization Computer Graphics (2017)
  • Kazuhisa Makino et al. New algorithms for enumerating all maximal cliques
  • David Eppstein, Maarten Löffler, and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. ACM...
  • Charalampos Tsourakakis. The k-clique densest subgraph problem
