Graph reconstruction in the congested clique

https://doi.org/10.1016/j.jcss.2020.04.004

Abstract

In this paper we study the reconstruction problem in the congested clique model. Given a class of graphs $\mathcal{G}$, the problem is defined as follows: if $G \notin \mathcal{G}$, then every node must reject; if $G \in \mathcal{G}$, then every node must end up knowing all the edges of $G$. The cost of an algorithm is the total number of bits received by any node through one link. It is not difficult to see that the cost of any algorithm that solves this problem is $\Omega(\log|\mathcal{G}_n|/n)$, where $\mathcal{G}_n$ is the subclass of all $n$-node labeled graphs in $\mathcal{G}$. We prove that the lower bound is tight and that it is possible to achieve it with only 2 rounds.

Introduction

The congested clique model, a message-passing model of distributed computing introduced by Lotker, Patt-Shamir, Pavlov, and Peleg [26], is receiving increasing attention [5], [12], [13], [14], [16], [20]. In the congested clique, the underlying communication network is the complete graph. Therefore, in this network of diameter 1, the issue of locality is taken out of the picture. The focus is set on congestion, which is, together with locality, the main issue in distributed computing. The point is the following: if the communication network is a complete graph and the cost of local computation is ignored, then the only obstacle to performing any task is congestion.

Despite the theoretical motivation of the congested clique model, examples of distributed and parallel systems, where the efficiency depends heavily on the bandwidth and therefore might benefit from our results, are becoming increasingly common. For instance, in [17], the authors show that fast algorithms in the congested clique model can be translated into fast algorithms in the MapReduce model. MapReduce is a well-known, popular parallel-programming framework for processing large scale data [6]. Other similar systems are Pregel [27], Spark [35], Hadoop [34], Dryad [19], etc.

Many theoretical models aiming to bridge the gap between theory and the software systems mentioned above have emerged: the Massively Parallel Communication model [2], the MapReduce Computation model [22], and the k-Machine model [23]. There is also a precursor: the Bulk-Synchronous Parallel model of Valiant [33]. All these models are very similar, but not identical, to the congested clique model.

The congested clique model is defined as follows. There are $n$ nodes, which are given distinct identities (IDs) that we assume, for simplicity, to be numbers between 1 and $n$. In this paper we consider the situation where the joint input to the nodes is a graph $G$. More precisely, each node $v$ receives as input an $n$-bit boolean vector $x_v \in \{0,1\}^n$, which is the indicator function of its neighborhood in $G$. Note that the input graph $G$ is an arbitrary $n$-node graph, a subgraph of the communication network $K_n$. Nodes execute an algorithm, communicating with each other in synchronous rounds, and their goal is to compute some function $f$ that depends on $G$. In every round, each of the $n$ nodes may send up to $n-1$ different $b$-bit messages, one through each of its $n-1$ communication links. When an algorithm stops, every node must know $f(G)$. We call $f(G)$ the output of the distributed algorithm. The parameter $b$ is known as the bandwidth of the algorithm. We denote by $R$ the number of rounds. The product $Rb$ corresponds to the total number of bits received by a node through one link, and we call it the cost of the algorithm.
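To make the input convention concrete, the following minimal sketch builds the neighborhood vectors $x_v$ from an edge list and computes the cost $Rb$. The function names `neighborhood_vectors` and `cost` are hypothetical and chosen only for this illustration.

```python
def neighborhood_vectors(n, edges):
    """Return the input vectors x_1..x_n for a graph on nodes 1..n.

    x[v][u - 1] == 1 exactly when {u, v} is an edge of the input graph.
    """
    x = {v: [0] * n for v in range(1, n + 1)}
    for u, v in edges:
        x[u][v - 1] = 1
        x[v][u - 1] = 1
    return x


def cost(rounds, bandwidth):
    """Cost of an algorithm: bits received by a node through one link (R * b)."""
    return rounds * bandwidth


# Path 1 - 2 - 3 - 4: node 2 is adjacent to nodes 1 and 3.
x = neighborhood_vectors(4, [(1, 2), (2, 3), (3, 4)])
print(x[2])        # input vector of node 2
print(cost(2, 3))  # two rounds with 3-bit messages
```

Each node starts with only its own row of the adjacency matrix; the reconstruction problem asks every node to learn all the other rows.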

An algorithm may be deterministic or randomized. We distinguish two sub-cases of randomized algorithms: the private-coin setting, where each node flips its own coin; and the public-coin setting, where the coin is shared between all nodes. An $\varepsilon$-error algorithm $A$ that computes a function $f$ is a randomized algorithm such that, for every input graph $G$, $\Pr(A \text{ outputs } f(G)) \geq 1-\varepsilon$. In the case where $\varepsilon \to 0$ as $n \to \infty$, we say that $A$ computes $f$ with high probability (w.h.p.). Recall that in this model all nodes compute the same output; therefore, when we run the algorithm, either all nodes compute the right answer or none of them does.

The function f defines the problem to be solved; a 0/1-valued function corresponds to a decision problem (such as connectivity [16]). For other, more general types of problems, f may be more appropriately defined as a relation. This happens, for instance, when we want to construct a minimum spanning tree [14], [20], a maximal independent set [13], a 3-ruling set [18], all-pairs shortest-paths [5], etc. We remark that in these references the model does not consider the restriction that all vertices have to know f(G) when the algorithm stops. For instance, in the minimum spanning tree problem each vertex has to output only the subset of its incident edges that belong to the tree.

The most difficult problem one could attempt to solve is the reconstruction problem, where nodes are asked to reconstruct the input graph G. In fact, if at the end of the algorithm every node has full knowledge of G, then, since nodes have unbounded computational power, they could answer any question concerning G.

In centralized, classical graph algorithms, a widely used approach to cope with NP-hardness is to restrict the class of graphs $\mathcal{G}$ to which the input $G$ belongs. We use an analogous approach here, in the congested clique model. But, as we explain later, surprisingly, the complexity of the reconstruction problem will only depend on the cardinality of the subclass of $n$-node graphs in $\mathcal{G}$.

We introduce two problems, each defined for a fixed set of graphs $\mathcal{G}$. The first one, the strong reconstruction problem $\mathcal{G}$-Strong-Rec, is the following: given an input graph $G$, output all the edges of $G$ if $G \in \mathcal{G}$, and reject otherwise. Recall that the output is computed by every node of the network. In other words, every node of an algorithm that solves $\mathcal{G}$-Strong-Rec must end up knowing whether $G$ belongs to $\mathcal{G}$; and, in the positive cases, every node also finishes knowing all the edges of $G$.

We also define a weak reconstruction problem $\mathcal{G}$-Weak-Rec. This is a promise problem, where the input graph $G$ is promised to belong to $\mathcal{G}$. In other words, for graphs that do not belong to $\mathcal{G}$, the behavior of an algorithm that solves $\mathcal{G}$-Weak-Rec does not matter.

What is already known about the (strong) reconstruction problem? There exist mainly three different types of algorithms for tackling this problem.

First, we have Lenzen's algorithm, which performs a load balancing procedure [24]. It simply distributes all the edges among the nodes, and then broadcasts everything. Therefore, if the class of input graphs $\mathcal{G}$ consists of sparse graphs, then such an algorithm solves the reconstruction problem very quickly. For instance, if the input graph $G$ contains $O(n)$ edges, then Lenzen's algorithm reconstructs $G$ in a constant number of rounds with bandwidth $O(\log n)$.

The second approach has been devised for graphs with bounded degeneracy. Recall that a graph $G$ is $d$-degenerate if one can remove from $G$ a vertex $r$ of degree at most $d$, and then proceed recursively on the resulting graph $G' = G - r$, until obtaining the empty graph. Bounded genus graphs such as planar graphs, bounded tree-width graphs, and graphs without a fixed graph $H$ as a minor are all examples of classes of graphs with bounded degeneracy. In [3], [29] the authors exhibit a one-round deterministic algorithm that solves $\mathcal{G}$-Strong-Rec using bandwidth $O(d \log n) = O(\log n)$, where $\mathcal{G}$ is the class of $d$-degenerate graphs.
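The recursive definition of degeneracy translates directly into a greedy procedure: repeatedly remove a vertex of minimum degree and record the largest degree seen at removal time. The sketch below illustrates the definition only; it is a centralized computation, not the one-round reconstruction algorithm of [3], [29], and the function names are hypothetical.

```python
def degeneracy(adj):
    """Smallest d such that the graph is d-degenerate.

    adj maps each vertex to the set of its neighbors.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    d = 0
    while remaining:
        # Remove a vertex r of minimum degree in the current graph.
        r = min(remaining, key=lambda v: len(remaining[v]))
        d = max(d, len(remaining[r]))
        for u in remaining[r]:
            remaining[u].discard(r)
        del remaining[r]
    return d


def is_d_degenerate(adj, d):
    """True when the elimination process of the definition succeeds with bound d."""
    return degeneracy(adj) <= d


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
clique = {i: {j for j in range(4) if j != i} for i in range(4)}
print(degeneracy(path), degeneracy(cycle), degeneracy(clique))
```

Forests come out as 1-degenerate and cycles as 2-degenerate, while the complete graph on four nodes needs $d = 3$.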

Note that the class of $d$-degenerate graphs is simultaneously sparse and hereditary. A class $\mathcal{G}$ is hereditary if, for every graph $G \in \mathcal{G}$, every induced subgraph of $G$ also belongs to $\mathcal{G}$. Many graph classes are hereditary: forests, planar graphs, bipartite graphs, $k$-colorable graphs, bounded tree-width graphs, etc. [4]. Moreover, any intersection class of graphs, such as interval graphs, chordal graphs, unit disc graphs, etc., is also hereditary [4].
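The hereditary property can be checked by brute force on small graphs: enumerate every induced subgraph and test membership. The sketch below does this for forests, and uses the class of graphs with at least one edge as a simple non-hereditary contrast. All function names are hypothetical, and the enumeration is exponential, so it is meant only as an illustration of the definition.

```python
from itertools import combinations


def induced_subgraph(vertices, edges, keep):
    """Subgraph induced by the vertex subset `keep`."""
    keep = set(keep)
    return sorted(keep), [(u, v) for u, v in edges if u in keep and v in keep]


def is_forest(vertices, edges):
    """Acyclicity test with a small union-find structure."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # u and v already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True


def all_induced_subgraphs_in_class(vertices, edges, in_class):
    """Check that every induced subgraph satisfies the predicate in_class."""
    return all(
        in_class(*induced_subgraph(vertices, edges, keep))
        for k in range(len(vertices) + 1)
        for keep in combinations(vertices, k)
    )


tree_vertices = [1, 2, 3, 4]
tree_edges = [(1, 2), (2, 3), (2, 4)]
print(all_induced_subgraphs_in_class(tree_vertices, tree_edges, is_forest))
print(all_induced_subgraphs_in_class(
    tree_vertices, tree_edges, lambda vs, es: len(es) >= 1))
```

The second check fails because the induced subgraph on nodes 1 and 3 has no edge, so "graphs with at least one edge" is not closed under taking induced subgraphs.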

The third type of result is related to the reconstruction of particular hereditary graph classes. Authors have studied hereditary classes defined by one fixed forbidden graph $H$. A graph $G$ is called $H$-free if $H$ is not an induced subgraph of $G$. In [21] the authors studied the (one-round) reconstruction problem of $H$-free classes, for all possible graphs $H$. In particular, they consider the case when $H$ is the path on 4 nodes, $P_4$. They exhibited a one-round, public-coin algorithm for solving $\mathcal{G}$-Strong-Rec using bandwidth $O(\log n)$, where $\mathcal{G}$ is the class of $P_4$-free graphs, known as cographs.

Finally, one can also define a class of graphs by forbidding $H$ as a subgraph (instead of forbidding it as an induced subgraph). Any class defined like this is also hereditary. In [7], the authors prove that the degeneracy of the class of graphs defined by forbidding the subgraph $H$ is upper bounded by $4\,\mathrm{ex}(n,H)/n$, where $\mathrm{ex}(n,H)$ is the Turán number of $H$, defined as the maximum number of edges in an $n$-node graph not containing an isomorphic copy of $H$ as a subgraph. Therefore, the class of graphs defined by a forbidden subgraph $H$ can be reconstructed deterministically in one round with bandwidth $O((\mathrm{ex}(n,H) \log n)/n)$.

In this short subsection we explain a key aspect of this paper. For any positive integer $n$, let us define the set $\mathcal{G}_n$ as the set of all $n$-node graphs in $\mathcal{G}$.

There is an obvious lower bound for $Rb$, even for the weak reconstruction problem $\mathcal{G}$-Weak-Rec and even in the public-coin setting. In fact, $Rb = \Omega(\log|\mathcal{G}_n|/n)$.

This can be seen by noting that, in the randomized case, there must be at least one outcome of the coin tosses for which the correct algorithm reconstructs the input graph in at least a $(1-\varepsilon)$ fraction of the cases. In fact, if this were not true, then we would be contradicting the correctness of the algorithm, which states that for every input graph, in at least a $(1-\varepsilon)$ fraction of the coin tosses, the answer is correct. A useful way to see this is to visualize a matrix where the rows correspond to coin tosses, the columns to input graphs, and each entry is the output given by the algorithm to the corresponding graph and the corresponding sequence of coin tosses.

Now we count the number of bits received by any node. By the previous argument, it must be $\Omega(\log((1-\varepsilon)|\mathcal{G}_n|)) = \Omega(\log|\mathcal{G}_n|)$. This holds because in at least one outcome of the coin tosses, the nodes must reconstruct that many different graphs.

On the other hand, the total number of bits received by any node $v$ of the network is $(n-1)Rb + n$. In fact, $(n-1)Rb$ bits are received from the other nodes and $n$ bits are known by $v$ at the beginning of the algorithm. Therefore, putting it all together, we have $n + (n-1)Rb = \Omega(\log|\mathcal{G}_n|)$. This implies that $Rb = \Omega(\log|\mathcal{G}_n|/n)$.
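The counting argument can be evaluated numerically. The sketch below assumes the explicit form $n + (n-1)Rb \geq \log_2((1-\varepsilon)|\mathcal{G}_n|)$ of the bound, with base-2 logarithms and no hidden constants; this is one reading of the asymptotic statement, and the function names are hypothetical. The class sizes used are standard facts: there are $2^{n(n-1)/2}$ labeled graphs and, by Cayley's formula, $n^{n-2}$ labeled trees on $n$ nodes.

```python
import math


def cost_lower_bound(log2_class_size, n, eps=0.0):
    """Smallest R*b compatible with n + (n-1)*R*b >= log2((1-eps) * |G_n|).

    log2_class_size is log2 |G_n|; eps is the error probability (eps < 1).
    """
    needed_bits = log2_class_size + math.log2(1.0 - eps)
    return max(0.0, (needed_bits - n) / (n - 1))


def log2_all_graphs(n):
    """log2 of the number of labeled n-node graphs."""
    return n * (n - 1) / 2


def log2_labeled_trees(n):
    """log2 of the number of labeled n-node trees (Cayley's formula)."""
    return (n - 2) * math.log2(n)


for n in (8, 64, 1024):
    print(n,
          round(cost_lower_bound(log2_all_graphs(n), n), 2),
          round(cost_lower_bound(log2_labeled_trees(n), n), 2))
```

For the class of all graphs the bound grows linearly in $n$, whereas for trees it stays around $\log_2 n$, matching the $O(\log n)$ bandwidth regime discussed for sparse classes.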

In this paper we prove that this bound is essentially tight, even with $R=1$ (if $\mathcal{G}$ is a hereditary class of graphs) and $R=2$ (in the general case).

In Section 3 we give, for every hereditary class of graphs $\mathcal{G}$, a one-round private-coin randomized algorithm that solves $\mathcal{G}$-Strong-Rec with bandwidth $O(\max_{k \in [n]} \log|\mathcal{G}_k|/k + \log n)$.

We emphasize that our algorithm runs in one round, and therefore it runs in the broadcast congested clique, a restricted version of the congested clique model where, in every round, the $n-1$ messages sent by a node must be the same. This equivalence, which is a consequence of the requirement that all nodes must compute the output after one round, is proved in Section 2. We also remark that for many hereditary graph classes our algorithm is tight. More precisely, our result implies that $\mathcal{G}$-Strong-Rec can be solved in one round with bandwidth $O(\log n)$ when $\mathcal{G}$ is the class of forests, planar graphs, $d$-degenerate graphs, interval graphs, unit-circle graphs, or any other hereditary graph class $\mathcal{G}$ such that $|\mathcal{G}_n| = 2^{O(n \log n)}$.

In Section 4 we give a very general result, showing that two rounds are sufficient to solve $\mathcal{G}$-Strong-Rec in the congested clique model, for any set of graphs $\mathcal{G}$. More precisely, we provide a two-round deterministic algorithm that solves $\mathcal{G}$-Weak-Rec and a two-round private-coin randomized algorithm that solves $\mathcal{G}$-Strong-Rec w.h.p. We also give a three-round deterministic algorithm solving $\mathcal{G}$-Strong-Rec. All algorithms run using bandwidth $O(\log|\mathcal{G}_n|/n + \log n)$, so they are asymptotically optimal when $|\mathcal{G}_n| = 2^{\Omega(n \log n)}$.

Our result implies, in particular, that $\mathcal{G}$-Strong-Rec can be solved in two rounds with bandwidth $O(\log n)$, when $\mathcal{G}$ is any set of graphs of size $2^{O(n \log n)}$. The only property of the set of graphs $\mathcal{G}$ used by our algorithm is the cardinality of $\mathcal{G}_n$. Our algorithm does not require $\mathcal{G}$ to be closed under isomorphisms.

In Section 5 we revisit the one-round case. Our general algorithm can be adapted to run in one round (i.e., in the broadcast congested clique model) by allowing a larger bandwidth. We show that, for every set of graphs $\mathcal{G}$, there is a one-round deterministic algorithm that solves $\mathcal{G}$-Weak-Rec, and a one-round private-coin algorithm that solves $\mathcal{G}$-Strong-Rec w.h.p., both of them using bandwidth $O(\sqrt{\log|\mathcal{G}_n| \log n} + \log n)$. We finish Section 5 by pointing out that our algorithms are tight with respect to the bandwidth as well as the requirement of randomness.

Even though in the congested clique model, by definition, the only complexity measure taken into account is communication, it is important to point out that the general algorithms we present in this paper might run in exponential local time.

Note, however, that unless P=NP (or even if stronger conjectures in computational complexity are false), this difficulty cannot be overcome. In fact, for many graph classes $\mathcal{G}$, solving $\mathcal{G}$-Strong-Rec in polynomial local time is impossible.

Let us illustrate this with an example. Consider the hereditary, sparse class of 3-colorable planar graphs, which we denote 3-col-plan. It is NP-complete to decide whether an arbitrary graph belongs to 3-col-plan [11]. Any algorithm in the congested clique model that runs in polynomial local time can be simulated by a sequential algorithm that also runs in polynomial time: simply run the computation of each node one by one at each round. Therefore, unless P=NP, there is no algorithm running in polynomial local time solving 3-col-plan-Strong-Rec. Nevertheless, when the class of graphs is decidable in (centralized) polynomial time, there is no reason, a priori, preventing us from finding one-round, polynomial local time reconstruction algorithms in the congested clique model.
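The simulation argument can be sketched as follows: a single machine keeps the local state of every node and, in each round, runs the nodes one after another, delivering the messages afterwards. The callback interface (`send`, `update`, `output`) is a hypothetical one introduced only for this sketch. The example algorithm is the trivial one-round reconstruction in which every node broadcasts its whole neighborhood vector, which uses bandwidth $n$ and is not one of the algorithms of this paper.

```python
def simulate_sequentially(n, inputs, send, update, output, rounds):
    """Run an n-node synchronous algorithm on one machine, node by node.

    send(v, state, r)          -> dict {u: message} for the other nodes u
    update(v, state, inbox, r) -> new local state of node v
    output(v, state)           -> final answer of node v
    """
    states = {v: inputs[v] for v in range(1, n + 1)}
    for r in range(rounds):
        inboxes = {v: {} for v in states}
        for v in states:  # nodes are simulated one by one
            for u, msg in send(v, states[v], r).items():
                inboxes[u][v] = msg
        states = {v: update(v, states[v], inboxes[v], r) for v in states}
    return {v: output(v, states[v]) for v in states}


def send_row(v, state, r):
    """Broadcast the full neighborhood vector of v to every other node."""
    n = len(state)
    return {u: tuple(state) for u in range(1, n + 1) if u != v}


def collect_rows(v, state, inbox, r):
    """New state: all rows of the adjacency matrix, including v's own."""
    rows = dict(inbox)
    rows[v] = tuple(state)
    return rows


def edges_from_rows(v, rows):
    """Read the edge list off the collected adjacency rows."""
    return sorted((u, w) for u in rows for w in rows
                  if u < w and rows[u][w - 1] == 1)


inputs = {1: [0, 1, 0], 2: [1, 0, 1], 3: [0, 1, 0]}
result = simulate_sequentially(3, inputs, send_row, collect_rows,
                               edges_from_rows, rounds=1)
print(result[1])
```

If each callback runs in polynomial time, the whole simulation does too, since it performs $n$ local computations per round; this is the step that transfers NP-hardness of membership to the local computation of the nodes.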

The remark above motivates the study of the following question: for which graph classes $\mathcal{G}$ can the reconstruction problem $\mathcal{G}$-Strong-Rec be solved in one round, with bandwidth $O(\log|\mathcal{G}_n|/n)$ and in polynomial local time? In Section 6 we address this question by devising one-round, public-coin algorithms for reconstructing two hereditary classes of graphs: distance-hereditary graphs and graphs of bounded modular width. Both algorithms use bandwidth $O(\log n)$, run in polynomial local time, and give the correct answer w.h.p.

We point out that, with respect to the local computation time, there exists an example of a conditional lower bound. In fact, Drucker et al. [7] show that, assuming the Exponential Time Hypothesis, any randomized algorithm solving triangle-freeness in the broadcast congested clique model with sub-exponential local computation time has communication cost $\Omega(n/e^{\sqrt{\log n}})$. The existence of similar lower bounds for the reconstruction of graph classes is an interesting and challenging open question.

One-round algorithms

The broadcast congested clique model is a restricted version of the congested clique model where each node is forced, in each round, to send the same message through its $n-1$ communication links. But, if we consider one-round algorithms, the two models are the same. In fact, suppose that there is a one-round algorithm $A$ (deterministic or randomized) in the congested clique with bandwidth $b$. We can transform it into an algorithm $B$ in the broadcast version with bandwidth $b+1$ as follows. We fix a

Reconstructing hereditary graph classes

Suppose that $T = (T_1, \dots, T_n) \in (\mathbb{F}_p)^n$ is such that every $T_i$ is picked uniformly at random. In that case, the fingerprints of two $n$-node graphs $G$ and $H$, which correspond to $F(G,T)$ and $F(H,T)$, are random vectors. These fingerprints are different with a probability that grows exponentially with respect to the number of nodes having different neighborhoods in $G$ and $H$. Therefore, roughly speaking, if $\mathcal{G}$ is a set of graphs where all graphs are very different, then each graph in $\mathcal{G}$ will have a different

Reconstructing arbitrary graph classes

In this section we show that there exists a two-round private-coin algorithm in the congested clique model that solves $\mathcal{G}$-Strong-Rec w.h.p. with bandwidth $O(\log|\mathcal{G}_n|/n + \log n)$. Our algorithm is based, roughly, on the same ideas used to reconstruct hereditary classes of graphs. But the problem we encounter is the following: while in the case of hereditary classes of graphs, we had for every graph $G$ and $k>0$, a bound on the number of graphs contained in $D(G,k) \cap \mathcal{G}_n$, this is not the case in an arbitrary

Revisiting the one-round case

In this section we revisit the one-round case (and therefore the broadcast congested clique model). But, instead of studying hereditary graph classes, we study arbitrary graph classes, and we show that for this general case we need a larger bandwidth. Our results are tight, not only in terms of the bandwidth, but also in the necessity of using randomness.

Theorem 3

Let $\mathcal{G}$ be a set of graphs. The following holds:

  1) There exists a one-round deterministic algorithm in the congested clique model that solves $\mathcal{G}$-

Local time complexity

Even though in the congested clique model, by definition, the only complexity measure taken into account is communication, it is important to point out that the general algorithms we presented in this paper might run in exponential local time. The reason behind this is that nodes, at some point, need to look, in $\mathcal{G}_n$, for a graph having the same fingerprint as the fingerprint of $G$. The search space is obviously super-polynomial when $|\mathcal{G}_n|$ is super-polynomial (see, for instance, line 6 of Algorithm

Discussion

A natural question is to try to understand the relation between the recognition problem and the reconstruction problem. (The recognition problem is the classical decision problem, where we simply want to decide whether the input graph belongs to some class $\mathcal{G}$.) It is clear that finding a formal proof showing some type of equivalence between the reconstruction and the recognition problems would yield a non-trivial lower bound on the recognition problem. However, in [7], the authors show that any

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work has been partially supported by CONICYT via PIA/Apoyo a Centros Científicos y Tecnológicos de Excelencia AFB 170001 (P.M. and I.R.), FONDECYT 1170021 (I.R.), FONDECYT 11190482 (P.M.) and PAI + Convocatoria Nacional Subvención a la Incorporación en la Academia Año 2017 + PAI77170068 (P.M.).

References (35)

  • Andrew Drucker et al. On the power of the congested clique model.
  • Fedor V. Fomin et al. Algorithms parameterized by vertex cover and modular width, through potential maximal cliques. Algorithmica (2018).
  • Jakub Gajarský et al. Parameterized algorithms for modular-width.
  • Michael R. Garey et al. Computers and Intractability, vol. 29 (2002).
  • Mohsen Ghaffari. An improved distributed algorithm for maximal independent set.
  • Mohsen Ghaffari. Distributed MIS via all-to-all communication.
  • Mohsen Ghaffari et al. MST in log-star rounds of congested clique.

A preliminary version of part of this work appeared in the proceedings of the 25th International Colloquium on Structural Information and Communication Complexity SIROCCO 2018, held in Ma'ale HaHamisha, Israel.