Triangle Sparsifiers

In this work, we introduce the notion of triangle sparsifiers, i.e., sparse graphs which are approximately the same as the original graph with respect to the triangle count. This results in a practical triangle counting method with strong theoretical guarantees. For instance, for unweighted graphs we show a randomized algorithm for approximately counting the number of triangles in a graph G, which proceeds as follows: keep each edge independently with probability p, enumerate the triangles in the sparsified graph G′ and return the number of triangles found in G′ multiplied by $p^{-3}$. We prove that under mild assumptions on G and p our algorithm returns a good approximation for the number of triangles with high probability. Specifically, we show that if $p \ge \max\left(\frac{\mathrm{polylog}(n)\,\Delta}{t}, \frac{\mathrm{polylog}(n)}{t^{1/3}}\right)$, where n, t, ∆, and T denote the number of vertices in G, the number of triangles in G, the maximum number of triangles an edge of G is contained in and our triangle count estimate respectively, then T is strongly concentrated around t: $\Pr[|T - t| \ge \varepsilon t] \le n^{-K}$.


Introduction
In recent years, a considerable amount of research has focused on the study of graph structures arising from technological, biological and sociological systems. Graphs are the tool of choice in modeling such systems since the latter are typically described as a set of pairwise interactions. Important examples of such datasets are the Internet graph (vertices are routers, edges correspond to physical links), the Web graph (vertices are web pages, edges correspond to hyperlinks), social networks (vertices are humans, edges correspond to friendships), information networks like Facebook and LinkedIn (vertices are accounts, edges correspond to online friendships), biological networks (vertices are proteins, edges correspond to protein interactions), the math collaboration network (vertices are mathematicians, edges correspond to collaborations) and many more.
The number of triangles is a computationally expensive, crucial graph statistic in complex network analysis, in random graph models and in various important applications. In Section 2.2 we provide an extensive list of applications involving triangles. It is worth mentioning that triangles play an important role in theoretical computer science. Specifically, the problem of detecting whether a graph is triangle-free [3] arises in numerous combinatorial problems such as the minimum cycle detection problem [26], the approximate minimum vertex cover in planar graphs [5] and the recognition of median graphs [25]. Recently, it was shown that detecting a triangle is equivalent to Boolean matrix multiplication under the notion of subcubic reductions [60], contrary to common belief [48].
In this paper, we contribute to the problem of counting triangles in large graphs. Specifically, we present a new randomized algorithm for approximately counting the number of triangles in a graph G. The algorithm proceeds as follows: keep each edge independently with probability p, enumerate the triangles in the sparsified graph G′ and return the number of triangles found in G′ multiplied by $p^{-3}$. We prove that under mild assumptions on G and p our algorithm returns a good approximation for the number of triangles with high probability. Specifically, we show that if $p \ge \max\left(\frac{\mathrm{polylog}(n)\,\Delta}{t}, \frac{\mathrm{polylog}(n)}{t^{1/3}}\right)$, where n, t, ∆, and T denote the number of vertices in G, the number of triangles in G, the maximum number of triangles an edge of G is contained in and our triangle count estimate respectively, then T is strongly concentrated around t: $\Pr[|T - t| \ge \varepsilon t] \le n^{-K}$. We illustrate the efficiency of our algorithm on various large real-world datasets where we achieve significant speedups. Furthermore, we investigate the performance of existing sparsification procedures, namely the Spielman-Srivastava spectral sparsifier [46] and the Benczúr-Karger cut sparsifier [8,9], and show that they are not optimal/suitable with respect to triangle counting.
Our paper is organized as follows: Section 2 presents work related to triangle counting and provides the necessary theoretical preliminaries for the analysis of our algorithm. Section 3 presents our algorithm and our main theoretical result. Section 4 shows the efficiency of our method on several large networks. Section 5 discusses properties of existing sparsification methods (cut sparsifiers [9] and spectral sparsifiers [46]) with respect to triangle counting. Finally, Section 6 concludes and provides future research directions.

Background
In Section 2.1 we introduce the notation that we use in this work. In Section 2.2 we present an extensive list of applications which involve triangles. In Section 2.3 we present existing work on the triangle counting problem. Finally, in Section 2.4 we briefly present the theoretical preliminaries for our work, namely the Kim-Vu concentration theorem, the Benczúr-Karger and the Spielman-Srivastava sparsifiers and Foster's theorem.

Notation
For the rest of the paper we use the following symbols: G([n], E) stands for an undirected simple graph with n vertices labeled 1, 2, ..., n and edge set E. Let m = |E| be the number of edges in G. Furthermore, let t be the number of triangles, deg(u) be the degree of vertex u, ∆(e) be the number of triangles edge e is contained in and ∆(v) be the number of triangles vertex v participates in. From the context, it shall always be clear whether we refer to a vertex or an edge. We define ∆ to be the maximum number of triangles that any edge e is contained in, i.e., $\Delta = \max_{e \in E(G)} \Delta(e)$. Finally, p is the sparsification parameter.
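The quantities t, ∆(e), ∆(v) and ∆ can be made concrete on a tiny graph. The following is a minimal sketch, not code from the paper; the function name `triangle_statistics` and the example graph are illustrative assumptions.

```python
from itertools import combinations

def triangle_statistics(n, edges):
    """Return (t, Delta(e) per edge, Delta(v) per vertex, Delta) for a simple graph on 1..n."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Delta(e): number of common neighbours of the two endpoints of e.
    per_edge = {tuple(sorted(e)): len(adj[e[0]] & adj[e[1]]) for e in edges}
    # Delta(v): number of edges among the neighbours of v.
    per_vertex = {
        v: sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
        for v in adj
    }
    t = sum(per_edge.values()) // 3  # every triangle is counted once per edge
    max_delta = max(per_edge.values(), default=0)
    return t, per_edge, per_vertex, max_delta

# K4 minus the edge (3, 4): the triangles are {1, 2, 3} and {1, 2, 4}.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
t, per_edge, per_vertex, max_delta = triangle_statistics(4, edges)
```

Here the edge (1, 2) is shared by both triangles, so it attains the maximum ∆.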

Applications
There are two main processes that generate triangles in a social network: homophily and transitivity. According to the former, people tend to choose friends with similar characteristics to themselves (e.g., race, education) [61,55] and according to the latter friends of friends tend to become friends themselves [55]. These facts have several implications which we present in the following together with other applications of triangle counting. For example, recently Bonato, Hadi, Horn, Prałat and Wang [13] proposed the iterated local transitivity model which has several properties matching empirical properties of "real-world" networks such as skewed degree distribution, communities etc. In the following we provide an extensive list of applications which involve triangles and range from social networks to Computer Aided Design applications.

Clustering Coefficients and Transitivity of a Graph
Watts and Strogatz [57] in their influential paper proposed a simple model which explains several properties of social networks such as the abundance of triangles and the short paths between any pair of nodes. Their model combines the idea of homophily, which leads to the wealth of triangles in the network, and the idea of weak ties, which create short paths. In order to quantify the homophily, they introduce the definitions of the clustering coefficient of a vertex and of the graph, see Definition 1. The definition of the transitivity T(G) of a graph G, introduced by Newman et al. [39], is closely related to the clustering coefficient and measures the probability that two neighbors of any vertex are connected. It is worth pointing out that the authors of [39] erroneously claim that C(G) is the same as T(G), see also [45].
Definition 1 (Clustering Coefficient) A vertex v ∈ V(G) with degree deg(v) ≥ 2 which participates in ∆(v) triangles has clustering coefficient $C(v) = \Delta(v) / \binom{\deg(v)}{2}$. (1) The clustering coefficient C(G) of the graph G is the average of C(v) over all vertices v ∈ V(G).
Definition 2 (Transitivity) The transitivity T(G) of a graph G is defined as $T(G) = 3t / \sum_{v \in V(G)} \binom{\deg(v)}{2}$.
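The difference between C(G) and T(G) shows up already on a four-vertex graph. The sketch below uses the standard Watts-Strogatz and Newman definitions (local coefficient ∆(v) divided by the number of neighbor pairs, and 3t divided by the number of paths of length two); assigning coefficient 0 to vertices of degree below 2 is a convention assumed here, and the function name is illustrative.

```python
from itertools import combinations

def clustering_and_transitivity(adj):
    """adj maps each vertex to the set of its neighbours (undirected, simple graph)."""
    local = {}
    closed = 0  # sum over v of Delta(v), which equals 3t
    wedges = 0  # sum over v of C(deg(v), 2)
    for v, nbrs in adj.items():
        d = len(nbrs)
        tri_v = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        pairs = d * (d - 1) // 2
        local[v] = tri_v / pairs if pairs else 0.0  # assumed convention for deg < 2
        closed += tri_v
        wedges += pairs
    c_graph = sum(local.values()) / len(adj)
    transitivity = closed / wedges if wedges else 0.0
    return local, c_graph, transitivity

# Triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
local, c_graph, transitivity = clustering_and_transitivity(adj)
```

On this graph C(G) = 7/12 while T(G) = 3/5, so the two quantities are indeed different.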

Uncovering Hidden Thematic Structures
Eckmann and Moses [20] propose the use of the clustering coefficient for detecting subsets of web pages with a common topic. The key idea is that reciprocal links between pages indicate a mutual recognition/respect and then triangles, due to their transitivity properties, can be used to extend "seeds" to larger subsets of vertices with similar thematic structure in the Web graph. In other words, regions of the World Wide Web with high triangle density indicate common thematic structure, allowing the authors to extract useful meta-information. This idea has found applications in other scientific domains as well, e.g., in bioinformatics [43].

Exponential Random Graph Model
Frank and Strauss [21] proved, under the assumption that two edges are dependent only if they share a common vertex, that the sufficient statistics for Markov graphs are the counts of triangles and stars. Wasserman and Pattison [56] proposed the exponential random graph (ERG) model which generalized the Markov graphs [42]. In this model, the probability space is the set of all possible graphs on n vertices and the probability of a given graph g is given by $\Pr[G = g] = \exp(\theta \cdot s(g)) / Z(\theta)$, where s(g) is a vector of graph statistics, typically counts of certain subgraphs (e.g., s(g) = (m, t)), θ is a vector of parameters and Z(θ) is the normalizing constant. Triangles are frequently used as one of the graph statistics of the ERG model and counting them is necessary for parameter estimation, e.g., using MCMC procedures [11].

Spam Detection
Becchetti et al. [7] show that the distribution of triangles among spam hosts and non-spam hosts can be used as a feature for classifying a given host as spam or non-spam. The same result holds also for web pages, i.e., the spam and non-spam triangle distributions differ from each other at a level detectable by standard statistical tests.

Content Quality and Role Behavior Identification
Nowadays, there exist many online forums where acknowledged scientists participate and discuss problems of their fields, e.g., MathOverflow and CStheory Stackexchange. This yields significant information for researchers. Several interesting questions arise, such as which participants comment on each other. This question, along with several others, was studied in [58]. The number of triangles that a user participates in was shown to play a critical role in answering these questions. For further applications in assessing the role behavior of users see [7].

Structural Balance and Status Theory
Balance theory appeared first in Heider's seminal work [24] and is based on the concepts "the friend of my friend is my friend", "the enemy of my friend is my enemy", etc. [55]. To quantify this concept edges become signed, i.e., there is a function c : E(G) → {+, −}. If all triangles are positive, i.e., the product of the signs of their edges is +, then the graph is balanced. Status theory is based on interpreting a positive edge (u, v) as u having lower status than v, while the negative edge (u, v) means that u regards v as having a lower status than himself/herself. Recently, Leskovec et al. [35] performed experiments to quantify which of the two theories applies better to online social networks and to predict signs of incoming links. Their algorithms require counts of signed triangles in the graph.
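The balance condition on signed triangles can be illustrated with a brute-force check. This is a minimal sketch for small graphs only, assuming signs are encoded as +1 and −1; the representation (a dictionary keyed by `frozenset` edges) and the function names are illustrative choices, not taken from [24] or [35].

```python
from itertools import combinations

def triangle_signs(sign):
    """sign maps frozenset({u, v}) to +1 or -1 for every edge of a signed graph."""
    vertices = sorted({v for e in sign for v in e})
    result = {}
    for a, b, c in combinations(vertices, 3):
        e1, e2, e3 = frozenset((a, b)), frozenset((b, c)), frozenset((a, c))
        if e1 in sign and e2 in sign and e3 in sign:
            # The sign of a triangle is the product of its three edge signs.
            result[(a, b, c)] = sign[e1] * sign[e2] * sign[e3]
    return result

def is_balanced(sign):
    """True if every triangle of the signed graph is positive."""
    return all(s > 0 for s in triangle_signs(sign).values())

sign = {
    frozenset((1, 2)): +1,
    frozenset((2, 3)): -1,
    frozenset((1, 3)): -1,
    frozenset((3, 4)): +1,
    frozenset((1, 4)): +1,
}
signs = triangle_signs(sign)
```

The triangle (1, 2, 3) has two negative edges and is positive, while (1, 3, 4) has a single negative edge and is negative, so this graph is not balanced.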

Microscopic Evolution of networks
Leskovec et al. [36] present an extensive experimental study of network evolution using detailed temporal information. One of their findings is that as edges arrive in the network, they tend to close triangles, i.e., connect people with common friends.

Community Detection
Counting triangles is also used in community detection algorithms. Specifically, Berry et al. use triangle counting to deduce the edge support measure in their community detection algorithm [10].

Link Recommendation
Online social networks (e.g., Facebook, LinkedIn) typically have link recommendation applications. Proposing edges which create as many triangles as possible is one possible link recommendation mechanism [53].

Motif Detection
Triangles are abundant not only in social networks but also in biological networks [37,62]. This fact can be used, e.g., to correlate the topological and functional properties of protein interaction networks [62].

CAD applications
Fudos and Hoffman [23] introduced a graph-constructive approach to solving systems of geometric constraints, a problem which arises frequently in Computer Aided Design (CAD) applications. One of the steps of their algorithm computes the number of triangles in an appropriately defined graph.

Existing work
There exist two categories of triangle counting algorithms, exact and approximate ones. It is worth noting that for most of the applications described in Section 2.2 the exact number of triangles is not crucial. Hence, approximate counting algorithms which are fast and output a high quality estimate are desirable for the practical applications in which we are interested in this work.

Exact Counting
Naive triangle counting by checking all triples of vertices takes $O(n^3)$ units of time. The state of the art algorithm is due to Alon, Yuster and Zwick [2] and runs in $O(m^{2\omega/(\omega+1)})$ time, where currently the fast matrix multiplication exponent ω is 2.371 [18]. Thus, the Alon, Yuster, Zwick (AYZ) algorithm currently runs in $O(m^{1.41})$ time. The AYZ algorithm partitions the vertices into low-degree and high-degree ones and handles the two classes separately; we use this partitioning idea in Section 3 to obtain state of the art results on approximate triangle counting. It is worth mentioning that from a practical point of view algorithms based on matrix multiplication are not used due to their high memory requirements. Even for medium sized networks, matrix-multiplication based algorithms are not applicable. Itai and Rodeh in 1978 showed an algorithm which finds a triangle in any graph in $O(m^{3/2})$ time [26]. This algorithm can be extended to list the triangles in the graph with the same time complexity. Chiba and Nishizeki showed that triangles can be found in time O(mα(G)) where α(G) is the arboricity of the graph. Since α(G) is at most $O(\sqrt{m})$ their algorithm runs in $O(m^{3/2})$ time in the worst case [16]. For special types of graphs more efficient triangle counting algorithms exist. For instance in planar graphs, triangles can be found in O(n) time [16,26,41].
Even if listing algorithms solve a more general problem than the counting one, they are preferred in practice for large graphs, due to their smaller memory requirements compared to the matrix multiplication based algorithms. Simple representative algorithms are the node-iterator and the edge-iterator algorithms. The former counts for each node the number of triangles it is involved in, which is equivalent to the number of edges among its neighbors, whereas the latter counts for each edge (i, j) the common neighbors of nodes i, j. Both of these algorithms have the same asymptotic complexity O(mn), which in dense graphs results in $O(n^3)$ time, the complexity of the naive counting algorithm. Practical improvements over this family of algorithms have been achieved using various techniques, such as hashing and sorting by the degree [34,44].
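The two listing strategies described above can be sketched in a few lines. This is an illustrative implementation without the hashing or degree-ordering improvements of [34,44]; the helper `build_adjacency` and the assumption that vertex labels are comparable are choices made for this sketch.

```python
from itertools import combinations

def build_adjacency(edges):
    """Adjacency sets of an undirected simple graph given as a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def node_iterator(adj):
    # For each node count the edges among its neighbours; each triangle is seen 3 times.
    total = sum(
        1
        for v in adj
        for a, b in combinations(adj[v], 2)
        if b in adj[a]
    )
    return total // 3

def edge_iterator(adj):
    # For each edge (i, j) with i < j count the common neighbours; each triangle is seen 3 times.
    total = sum(len(adj[i] & adj[j]) for i in adj for j in adj[i] if i < j)
    return total // 3

k4 = build_adjacency(list(combinations(range(4), 2)))
path = build_adjacency([(0, 1), (1, 2), (2, 3)])
```

Both functions return the same count; on the complete graph on four vertices they report 4 triangles.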

Approximate Counting
On the approximate counting side, most of the triangle counting algorithms have been developed in the streaming setting. In this scenario, the graph is represented as a stream. Two main representations of a graph as a stream are the edge stream and the incidence stream. In the former, edges arrive one at a time. In the latter, all edges incident to the same vertex appear successively in the stream. The ordering of the vertices is assumed to be arbitrary. A streaming algorithm produces a (1 + ε)-approximation of the number of triangles with high probability by making only a constant number of passes over the stream. However, sampling algorithms developed in the streaming literature can also be applied in the setting where the graph fits in memory. Monte Carlo sampling techniques have been proposed to give a fast estimate of the number of triangles. According to such an approach, a.k.a. naive sampling [45], we repeatedly choose three nodes at random and check whether they form a triangle. Let $T_i$ be the number of triples with i edges. If one makes sufficiently many independent trials, where the required number grows with $(T_0 + T_1 + T_2 + T_3)/T_3$, and outputs as the estimate of triangles the random variable $T'_3$ equal to the fraction of sampled triples that form triangles times the total number of triples $\binom{n}{3}$, then $(1 - \varepsilon)T_3 < T'_3 < (1 + \varepsilon)T_3$ with probability at least 1 − δ. This is not suitable when $T_3$ is small compared to the total number of triples.
In [6] the authors reduce the problem of triangle counting efficiently to estimating moments for a stream of node triples. Then, they use the Alon-Matias-Szegedy algorithms [1] (a.k.a. AMS algorithms) to proceed. The key is that the triangle computation reduces to estimating the zero-th, first and second frequency moments, which can be done efficiently. Furthermore, as the authors suggest, their algorithm is efficient only on graphs with $\Omega(n^2 / \log\log n)$ triangles, i.e., triangle dense graphs as in the naive sampling. The AMS algorithms are also used by [28], where simple sampling techniques are used, such as choosing an edge from the stream at random and checking how many common neighbors its two endpoints share considering the subsequent edges in the stream.
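Naive sampling, as described above, is easy to write down. The following is a minimal sketch assuming the graph fits in memory and vertices are labeled 0, ..., n−1; the function name and the explicit random generator argument are illustrative.

```python
import random
from math import comb

def naive_triple_sampling(n, adj, trials, rng):
    """Estimate the triangle count from `trials` uniformly random vertex triples."""
    hits = 0
    for _ in range(trials):
        a, b, c = rng.sample(range(n), 3)  # three distinct vertices
        if b in adj[a] and c in adj[a] and c in adj[b]:
            hits += 1
    # Fraction of sampled triples that are triangles, times the number of triples.
    return hits / trials * comb(n, 3)

# In the complete graph every triple is a triangle, so the estimate is exact.
n = 6
complete = {v: set(range(n)) - {v} for v in range(n)}
empty = {v: set() for v in range(n)}
estimate_complete = naive_triple_sampling(n, complete, 50, random.Random(0))
estimate_empty = naive_triple_sampling(n, empty, 50, random.Random(0))
```

On graphs where triangles are a tiny fraction of all triples, almost every trial misses, which is why the number of trials has to grow with the ratio of all triples to triangles.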
Along the same lines, Buriol et al. [15] proposed two space-bounded sampling algorithms to estimate the number of triangles. Again, the underlying sampling procedures are simple. For instance, in the case of the edge stream representation, they sample randomly an edge and a node in the stream and check if they form a triangle. Their algorithms are the state-of-the-art algorithms to the best of our knowledge. The three-pass algorithm presented therein counts in the first pass the number of edges, in the second pass it samples uniformly at random an edge (i, j) and a node k ∈ V − {i, j} and in the third pass it tests whether the edges (i, k), (k, j) are present in the stream. The number of samples r needed to obtain an ε-approximation with probability 1 − δ is $r = O\left(\frac{1}{\varepsilon^2} \log\frac{1}{\delta} \cdot \frac{T_1 + T_2 + T_3}{T_3}\right)$. Even if the term $T_0$ in the numerator is missing compared to the naive sampling, the graph still has to be fairly dense with respect to the number of triangles in order to get an approximation with high probability. Buriol et al. [15] show how to turn the three-pass algorithm into a single pass algorithm for the edge stream representation and similarly they provide a three-pass and a one-pass algorithm for the incidence stream representation. In [52] an algorithm is proposed which tosses a coin independently for each edge, keeping the edge with probability p and throwing it away with probability q = 1 − p, which however sets p to a constant. The sampling scheme of [52] can be combined with existing sampling schemes [15] via a degree based partitioning to yield improved theoretical and practical performance, as was shown in [31]. Recently, a more efficient sampling scheme was proposed by Pagh and Tsourakakis [40].
Another line of work is based on linear algebraic arguments. Specifically, in the case of "power-law" networks it was shown in [50] that the spectral counting of triangles can be efficient due to their special spectral properties [17]. This idea was further extended in [51] using the randomized SVD approximation algorithm by [19]. In [7] the semi-streaming model for counting triangles is introduced, which allows log n passes over the edges. The key observation is that since counting triangles reduces to computing the intersection of two sets, namely the induced neighborhoods of two adjacent nodes, ideas from locality-sensitive hashing [14] are applicable to the problem. More recently, Avron proposed a new approximate triangle counting method based on a randomized algorithm for trace estimation [4].
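The linear algebraic methods rest on the standard identity that the number of triangles equals trace(A³)/6, equivalently one sixth of the sum of the cubed eigenvalues of the adjacency matrix A. The sketch below evaluates the trace exactly with dense pure-Python matrices, purely for illustration; the cited methods instead approximate it, e.g., from a few eigenvalues or by randomized trace estimation. The function name is an assumption.

```python
def triangles_via_trace(n, edges):
    """Count triangles as trace(A^3) / 6 for a simple graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    # A2[i][j] is the number of walks of length two from i to j.
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    # trace(A^3) = sum_{i,j} A2[i][j] * A[j][i] counts closed walks of length three.
    trace_cubed = sum(A2[i][j] * A[j][i] for i in range(n) for j in range(n))
    # Every triangle yields 6 closed walks (3 starting points, 2 directions).
    return trace_cubed // 6

k4_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
path_edges = [(0, 1), (1, 2), (2, 3)]
```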

Kim-Vu Concentration of Measure
For the purposes of this work, let $Y = Y(t_1, \ldots, t_m)$ be a positive polynomial of m Boolean variables $[t_i]_{i=1..m}$ which are independent. A common task in combinatorics is to show that Y is concentrated around its expected value, e.g., [30]. In the following we state the necessary definitions and the main concentration result which we will use in our method. Y is totally positive if all of its coefficients are non-negative. Y is homogeneous if all of its monomials have the same degree and we call this value the degree of the polynomial. Given any multi-index $\alpha = (\alpha_1, \ldots, \alpha_m) \in \mathbb{Z}_+^m$, define the partial derivative $\partial^{\alpha} Y = \left(\frac{\partial}{\partial t_1}\right)^{\alpha_1} \cdots \left(\frac{\partial}{\partial t_m}\right)^{\alpha_m} Y(t_1, \ldots, t_m)$ and the order of α as $|\alpha| = \alpha_1 + \cdots + \alpha_m$. For any order j ≥ 0, let $\mathbb{E}_j(Y) = \max_{\alpha : |\alpha| = j} \mathbb{E}[\partial^{\alpha} Y]$ and $\mathbb{E}_{\ge j}(Y) = \max_{j' \ge j} \mathbb{E}_{j'}(Y)$. Now, we refer to the main theorem of Kim and Vu of [29, §1.2] as phrased in Theorem 1.1 of [54] or as Theorem 1.36 of [49].
Theorem 1 There is a constant $c_k$ depending on k such that the following holds. Let $Y(t_1, \ldots, t_m)$ be a totally positive polynomial of degree k, where $t_i$ can have arbitrary distribution on the interval [0, 1]. Assume that $\mathbb{E}[Y] \ge \mathbb{E}_{\ge 1}(Y)$. Then for any λ ≥ 1: $\Pr\left[|Y - \mathbb{E}[Y]| \ge c_k \lambda^k \left(\mathbb{E}[Y]\,\mathbb{E}_{\ge 1}(Y)\right)^{1/2}\right] \le e^{-\lambda + (k-1)\log m}$.
The Kim-Vu theorem is an important concentration result since it allows us to obtain strong concentration when the polynomial of interest is not smooth. Typically, when a polynomial Y is smooth, it is strongly concentrated. By smoothness one usually means a small Lipschitz coefficient or, in other words, that when one changes the value of one variable $t_j$, the value of Y changes by no more than a constant. However, as stated in [54], this is restrictive in many cases. Thus one can demand "average smoothness" as defined in [54], which is quantified via the expectation of partial derivatives of any order.

Graph Sparsifiers
A sparsifier of a graph G(V, E, w) is a sparse graph H that is similar to G in some useful notion. In Section 2.4.2 we describe the Benczúr-Karger cut sparsifier [8,9] and the Spielman-Srivastava spectral sparsifier [46].

Benczúr-Karger Sparsifier
Benczúr and Karger introduced in [8] the notion of cut sparsification to accelerate cut algorithms whose running time depends on the number of edges. Using a non-uniform sampling scheme they show that given a graph G(V, E, w) with |V| = n, |E| = m and a parameter ε there exists a graph H(V, E′, w′) with $O(n \log n / \varepsilon^2)$ edges such that the weight of every cut in H is within a factor of (1 ± ε) of its weight in G. Furthermore, they provide a nearly-linear time algorithm which constructs such a sparsifier. The key quantity used in the sampling scheme of Benczúr and Karger is the strong connectivity $c_{(u,v)}$ of an edge (u, v) ∈ E [8,9]. The latter quantity is defined to be the maximum value k such that there is an induced subgraph $G_0$ of G containing both u and v, and every cut in $G_0$ has weight at least k.

Spielman-Srivastava Sparsifier
In [47] Spielman and Teng introduced the notion of a spectral sparsifier in order to strengthen the notion of a cut sparsifier. A quantity that plays a key role in spectral sparsifiers is the effective resistance. The term effective resistance comes from electrical network analysis, see Chapter IX in [12]. In a nutshell, let G(V, E, w) be a weighted graph with vertex set V, edge set E and weight function w. We call the weight w(e) the resistance of the edge e. We define the conductance r(e) of e to be the inverse of the resistance w(e). Let $\mathcal{G}$ be the resistor network constructed from G(V, E, w) by replacing each edge e with an electrical resistor whose electrical resistance is w(e). Typically, in $\mathcal{G}$ vertices are called terminals, a convention that emphasizes the electrical network perspective of a graph G. The effective resistance R(i, j) between two vertices i, j is the electrical resistance measured across vertices i and j in $\mathcal{G}$. Equivalently, the effective resistance is the potential difference that appears across terminals i and j when we apply a unit current source between them. Finally, the effective conductance C(i, j) between two vertices i, j is defined by $C(i, j) = R(i, j)^{-1}$. Spielman and Srivastava in their seminal work [46] proposed to include each edge of G in the sparsifier H with probability proportional to its effective resistance. They provide a nearly-linear time algorithm that produces spectral sparsifiers with O(n log n) edges.

Foster's theorem
In Section 5 we use the following theorem, proved by Foster [22].
Theorem 2 Let G be a connected graph of order n. Then $\sum_{e = (i,j) \in E(G)} r(e)\, R(i, j) = n - 1$.

Proposed Algorithm
Our proposed algorithm Triangle Sparsifier is shown in Algorithm 1 (see also [52] for a preliminary version). The algorithm takes as input an unweighted, simple graph G(V, E), where without loss of generality we assume that the nodes are numbered from 1, ..., n, i.e., V = [n], and a sparsification parameter p ∈ (0, 1). The algorithm first chooses a random subset E′ of the set E of edges. The random subset is such that the events {e ∈ E′}, for all e ∈ E, are independent and the probability of each is equal to p. Then, any triangle counting algorithm can be used to count triangles on the sparsified graph with edge set E′. Clearly, the expected size of E′ is pm where m = |E|. The output of our algorithm is the number of triangles in the sparsified graph multiplied by $1/p^3$, or equivalently we are counting the number of weighted triangles in G′ where each edge has weight 1/p. It follows immediately that the expected value of our estimate is the number of triangles in G, i.e., t. Our main theoretical result is the following theorem:
JGAA, 15(6) 703-726 (2011)
Theorem 3 Suppose G is an undirected graph with n vertices, m edges and t triangles. Let also ∆ denote the size of the largest collection of triangles with a common edge. Let G′ be the random graph that arises from G if we keep every edge with probability p and write T for the number of triangles of G′. Suppose that γ > 0 is a constant and
$\frac{pt}{\Delta} \ge \log^{6+\gamma} n$, if $p^2\Delta \ge 1$, (5)
and
$p^3 t \ge \log^{6+\gamma} n$, if $p^2\Delta < 1$, (6)
for $n \ge n_0$ sufficiently large. Then $\Pr\left[|T - \mathbb{E}[T]| \ge \varepsilon\,\mathbb{E}[T]\right] \le n^{-K}$ for any constants K, ε > 0 and all large enough n (depending on K, ε and $n_0$).
Proof: Write $X_e = 1$ or 0 depending on whether the edge e of graph G survives in G′. Then $T = \sum_{\Delta(e,f,g) = 1} X_e X_f X_g$, where ∆(e, f, g) = 1(edges e, f, g form a triangle).
Refer to Theorem 1. We use T in place of Y, k = 3.
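To apply the Kim-Vu theorem to T one needs the expectations of its partial derivatives. The following is a sketch of that calculation, derived here only from the definition of T as a sum over triangles and from the independence of the indicators $X_e$ with $\mathbb{E}[X_e] = p$; it is a reconstruction under the usual convention that the quantity of order j is the largest expected partial derivative of that order.

```latex
\begin{align*}
\mathbb{E}[T] &= p^{3}\, t, \\
\mathbb{E}\!\left[\frac{\partial T}{\partial X_e}\right]
  &= \sum_{\{f,g\}:\ \Delta(e,f,g)=1} \mathbb{E}[X_f X_g]
   = p^{2}\,\Delta(e) \;\le\; p^{2}\Delta, \\
\mathbb{E}\!\left[\frac{\partial^{2} T}{\partial X_e\,\partial X_f}\right]
  &= p \cdot \mathbf{1}(e \text{ and } f \text{ lie in a common triangle}) \;\le\; p, \\
\mathbb{E}\!\left[\frac{\partial^{3} T}{\partial X_e\,\partial X_f\,\partial X_g}\right]
  &= \Delta(e,f,g) \;\le\; 1 .
\end{align*}
```

Derivatives that differentiate twice with respect to the same variable vanish because T is multilinear. Under these bounds the largest expected partial derivative of order at least one is at most $\max(p^{2}\Delta, 1)$, which is why the argument distinguishes the cases $p^{2}\Delta < 1$ and $p^{2}\Delta \ge 1$.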
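• Case 1 ($p^2\Delta < 1$): We get $\mathbb{E}_{\ge 1}(T) \le 1$ and, from (6), $\mathbb{E}[T] = p^3 t \ge \log^{6+\gamma} n$.
• Case 2 ($p^2\Delta \ge 1$): We get $\mathbb{E}_{\ge 1}(T) \le p^2\Delta$ and, from (5), $\mathbb{E}[T] / (p^2\Delta) = pt/\Delta \ge \log^{6+\gamma} n$.
We get, for some constant $c_3 > 0$, from Theorem 1, a tail bound (9) on $|T - \mathbb{E}[T]|$. Notice that in both cases we have $\mathbb{E}[T] \ge \mathbb{E}_{\ge 1}(T)$, as Theorem 1 requires. We now select λ so that the lower bound inside the probability on the left-hand side of (9) becomes $\varepsilon\,\mathbb{E}[T]$. In Case 1 we pick $\lambda = (\varepsilon / c_3)^{1/3}\, \mathbb{E}[T]^{1/6}$, while in Case 2 we pick $\lambda = (\varepsilon / c_3)^{1/3} \left(\mathbb{E}[T] / (p^2\Delta)\right)^{1/6}$. Since λ ≥ (K + 2) log n follows from our assumptions (5) and (6) if n is sufficiently large, we get $\Pr\left[|T - \mathbb{E}[T]| \ge \varepsilon\,\mathbb{E}[T]\right] \le n^{-K}$ in both cases.
The running time of Triangle Sparsifier depends on the triangle counting subroutine that we use. If we use [2], the expected running time is $O\left(pm + (pm)^{2\omega/(\omega+1)}\right)$, where ω currently is 2.371 [18]. If we use the node-iterator (or any other standard listing triangle algorithm) the expected running time is $O\left(pm + p^2 \sum_{v} \deg(v)^2\right)$.
Claim 1 (Sparsification in sublinear expected time) The edge sampling can run in O(pm) expected time.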
Proof: We do not "toss a p-coin" m times in order to construct E′. This would be very wasteful if p is small. Instead we construct the random set E′ with the following procedure which produces the right distribution. Observe that the number X of coin tosses up to and including the next successful one, i.e., the gap between consecutive selected edges, follows a geometric distribution. Specifically, $\Pr[X = x] = (1 - p)^{x-1} p$. To sample from this distribution it suffices to generate a uniformly distributed variable U in [0, 1] and set $X \leftarrow \left\lceil \frac{\ln U}{\ln(1 - p)} \right\rceil$. Clearly the probability that X = x is equal to $\Pr\left[(1-p)^{x} \le U < (1-p)^{x-1}\right] = (1 - p)^{x-1} p$, as required. This provides a practical and efficient way to pick the subset E′ of edges in sublinear expected time O(pm). For more details see [33].
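The geometric-skip procedure in the proof of Claim 1 can be sketched as follows. This is an illustrative implementation, not the authors' code: the function name, the handling of the boundary cases p ≤ 0 and p ≥ 1, and the guard against U = 1 are assumptions added for robustness.

```python
import math
import random

def sample_edges(edges, p, rng):
    """Keep each edge independently with probability p by jumping between kept edges."""
    if p >= 1.0:
        return list(edges)
    kept = []
    if p <= 0.0:
        return kept
    log_q = math.log1p(-p)  # ln(1 - p), a negative number
    m = len(edges)
    i = -1  # index of the last kept edge
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
        # Geometric gap X = ceil(ln U / ln(1 - p)); the max() guards the case U = 1.
        gap = max(1, math.ceil(math.log(u) / log_q))
        i += gap
        if i >= m:
            return kept
        kept.append(edges[i])

edges = [(i, i + 1) for i in range(100000)]
kept = sample_edges(edges, 0.1, random.Random(7))
```

The loop body runs once per kept edge plus one final time, so the expected work is proportional to pm rather than m.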
Expected Speedup: The expected speedup with respect to the triangle counting task depends on the triangle counting subroutine that we use. If we use [2] as our subroutine, which is the fastest known algorithm, the expected speedup is $p^{-2\omega/(\omega+1)}$, i.e., currently $p^{-1.41}$, where ω currently is 2.371 [18]. As already outlined, in practice listing algorithms such as the node-iterator are used instead, for which the expected speedup is $p^{-2}$.
Discussion: This theorem states the important result that the estimator of the number of triangles is concentrated around its expected value, which is equal to the actual number of triangles t in the graph, under mild conditions on the triangle density of the graph. The mildness comes from condition (5): picking p = 1, given that our graph is not triangle-free, i.e., ∆ ≥ 1, gives that the number of triangles t in the graph has to satisfy $t \ge \Delta \log^{6+\gamma} n$. This is a mild condition on t since ∆ ≤ n and thus it suffices that $t \ge n \log^{6+\gamma} n$ (after all, we can always add two dummy connected nodes that connect to every other node, as in Figure 1(a), even if empirically ∆ is smaller than n). The critical quantity, besides the number of triangles t, is ∆. Intuitively, if the sparsification procedure throws away the common edge of many triangles, the triangle count in the resulting graph may differ significantly from the original. A significant problem is the choice of p for the sparsification. Conditions (5) and (6) tell us how small we can afford to choose p, but the quantities involved, namely t and ∆, are unknown. We discuss a practical algorithm using a doubling procedure in Section 4.4. Furthermore, our method justifies significant speedups. For a graph G with $t \ge n^{3/2+\varepsilon}$ and ∆ ∼ n, we get $p = n^{-1/2}$, implying a linear expected speedup if we use a practical exact counting method such as the node-iterator. Finally, it is worth pointing out that Triangle Sparsifier essentially outputs a sparse graph H(V, E′, w) with w = 1/p for all edges e ∈ E′ which approximates G(V, E) with respect to the count of triangles (a triangle formed by the edges $(e_1, e_2, e_3)$ in a weighted graph counts for $w(e_1) w(e_2) w(e_3)$ unweighted triangles). As we shall see in Section 6, Triangle Sparsifier is not recommended for weighted graphs.
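The estimator discussed above (sample, count exactly, rescale by $p^{-3}$) can be sketched end to end. This is a minimal illustration rather than the paper's Algorithm 1: it tosses one coin per edge for clarity instead of using the sublinear sampling of Claim 1, it uses an edge-iterator as the exact counting subroutine, and all names are assumptions.

```python
import random

def count_triangles(edges):
    """Exact triangle count; each undirected edge must be listed once."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Common neighbours per edge; every triangle is counted once per edge.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def triangle_sparsifier_estimate(edges, p, rng):
    """Keep each edge with probability p, count triangles, rescale by p**-3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3

# Complete graph on 20 vertices: t = C(20, 3) = 1140 triangles.
k20 = [(i, j) for i in range(20) for j in range(i + 1, 20)]
rng = random.Random(1)
estimates = [triangle_sparsifier_estimate(k20, 0.5, rng) for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)
```

A single estimate on such a small graph is noisy; the average over repeated runs settles near t, in line with the estimator being unbiased.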
Table 2: Datasets used in our experiments. Abbreviations are included. Symbols mark Autonomous Systems graphs, online social networks and Web graphs. Notice that the networks with the highest triangle counts are online social networks (Flickr, Livejournal, Orkut), verifying the folklore that online social networks are abundant in triangles.

Experimental Setup
The experiments were performed on a single machine, with an Intel Xeon CPU at 2.83 GHz, 6144KB cache size and 50GB of main memory. The algorithm was implemented in C++, and compiled using gcc version 4.1.2 and the -O3 optimization flag. Time was measured by taking the user time given by the Linux time command. IO times are included in that time since the amount of memory operations performed in setting up the graph is non-trivial. However, we use a modified IO routine that is much faster than the standard C/C++ scanf. Furthermore, as we mentioned in Section 3, picking a random subset of expected size p|S| from a set S can be done in expected sublinear time [33]. A simple way to do this in practice is to generate the differences between indices of entries retained. This allows us to sample in a sequential way and also results in better cache performance. As a competitor we use our own implementation of the 1-pass algorithm of [15, §2.2].

Experimental Results
Table 2 shows the count of triangles for each graph used in our experiments. Notice that the Orkut, Flickr and Livejournal graphs have ∼622M, 550M and 311M triangles respectively. This confirms the folklore that online social networks are abundant in triangles. Table 3 shows the results we obtain for p = 0.1 over 5 trials. All running times are reported in seconds. The first column shows the running time for the exact counting algorithm over 5 runs. Standard deviations are negligible for the exact algorithm and therefore are not reported. The second and third columns show the error and running time of our method averaged over 5 runs for each dataset (two decimal digits of accuracy). Standard deviations are also included (three decimal digits of accuracy). The last column shows the running time averaged over 5 runs for the 1-pass algorithm as stated in [15, §2.2] and the standard deviations. For each dataset the number of samples needed by the 1-pass algorithm was set to a value that achieves accuracy at most as good as that achieved by our counting method. Specifically, for any dataset, if α, β (%) are the errors obtained by our algorithm and the Buriol et al. algorithm, we "tune" the number of samples in the latter algorithm in such a way that α ≤ β ≤ α + 1%. Even by favoring in this way the 1-pass algorithm of Buriol et al. [15], one can see that the running times achieved by our method are consistently better. However, it is important to outline once again that our method and other triangle counting methods can be combined. For example, in [31] it was shown that Triangle Sparsifiers and other sampling methods can be combined to obtain a superior performance both in practice and theoretically by improving the sampling scheme of Buriol et al. [15]. This was achieved by partitioning the vertices into two subsets according to their degree and using two sampling schemes, one for each subset [31]. We also tried other competitors, but our running times outperform them significantly. For example, even the exact counting method outperforms other approximate counting methods. As we show in Section 4.4, smaller values of p work as well and these can be found by a simple doubling-like procedure.

The "Doubling" Algorithm
As we saw in Section 3, setting the parameter p optimally requires knowledge of the quantity we want to estimate, i.e., the number of triangles. To overcome this problem we observe that when we have concentration, the squared coefficient of variation $\mathrm{Var}[T]/\mathbb{E}[T]^2$ is "small". Furthermore, by the Bienaymé-Chebyshev inequality and the median boosting trick [27] it suffices to sample $\{T_1, \ldots, T_s\}$, where $s = O\left(\frac{\mathrm{Var}[T]}{\mathbb{E}[T]^2} \cdot \frac{1}{\epsilon^2} \log\frac{1}{\delta}\right)$, in order to obtain a $(1 \pm \epsilon)$ approximation of $\mathbb{E}[T] = t$ with probability at least $1 - \delta$. Hence, one can set a desired value for the number of samples s and for the failure probability δ and calculate the expected error ε implied by this bound. If this value is significantly larger than the desired error threshold, then one increases p and repeats the same procedure until the stopping criterion is satisfied. One way to change p is to use the multiplicative rule p ← cp, where c > 1 is a constant. For example, if c = 2 then we have a doubling procedure. Notice that we have placed the word doubling in the title of this section in quotes in order to emphasize that one may use any c > 1 to change p from one round to the next.
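The median boosting trick is commonly realized as a median-of-means estimator: split the independent estimates into groups, average within each group, and report the median of the group averages. The Python sketch below illustrates that standard construction; the function name `median_of_means` and the use of equal-sized consecutive groups are assumptions made for illustration and are not taken from [27].

```python
import statistics
from typing import Sequence


def median_of_means(estimates: Sequence[float], groups: int) -> float:
    """Combine independent estimates T_1..T_s by a median of group means.

    Averaging inside a group reduces the variance (Chebyshev step);
    taking the median over groups boosts the success probability.
    """
    if groups < 1 or groups > len(estimates):
        raise ValueError("groups must be between 1 and len(estimates)")
    size = len(estimates) // groups  # leftover estimates, if any, are ignored
    means = [
        statistics.fmean(estimates[i * size:(i + 1) * size])
        for i in range(groups)
    ]
    return statistics.median(means)
```

A single wild estimate affects only the mean of its own group, so the median over groups remains close to the typical value.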
For how many rounds can this procedure run? Let us consider the realistic scenario where one wishes to be optimistic and picks as an initial guess for p a value $p_0 = n^{-\alpha}$, where α is a positive constant, e.g., α = 1/2. Let $p^*$ be the minimum value over all possible p for which we obtain a concentrated estimate of the true number of triangles. Clearly, $p^* \le 1$ and hence the number of rounds performed by our procedure is at most r + 1, where r is the smallest integer with $p_0 c^r \ge 1$, i.e., $r = \lceil \alpha \log n / \log c \rceil$. Hence, for any constant c > 1 the number of rounds performed by our algorithm is O(log n). Furthermore, note that the running time of the doubling procedure is dominated by the last iteration. To see why, consider for simplicity the scenario where r + 1 rounds are needed to deduce concentration, $c = \sqrt{2}$, and the node-iterator algorithm is used to count triangles in the triangle sparsifier. Since the running time of the node-iterator on the sparsified graph scales as $p^2$, round i costs $2^i$ times as much as round 0, and the total running time is proportional to $\sum_{i=0}^{r} 2^i = 2^{r+1} - 1$, i.e., less than twice the cost of the last round. In practice, this procedure works even for small values of s. An instance of this procedure with s = 2, δ = 1/100 and error threshold equal to 3% is shown in Table 4.
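The doubling procedure of this section can be sketched in Python as follows. The callable `estimate(p)` is a hypothetical stand-in for one run of the sparsify-and-count estimator at sampling probability p. The error proxy sqrt(cv² ln(1/δ)/s), where cv² is the empirical squared coefficient of variation, is obtained by solving the median-of-means sample bound s = O(cv² ε⁻² log(1/δ)) for ε and dropping the hidden constant; it is an assumption, not necessarily the exact formula used in the experiments.

```python
import math
import statistics
from typing import Callable, List, Tuple


def expected_error(samples: List[float], delta: float) -> float:
    """Assumed error proxy: sqrt(cv^2 * ln(1/delta) / s)."""
    mean = statistics.fmean(samples)
    if mean == 0:
        return math.inf  # no triangles survived; p is far too small
    cv2 = statistics.variance(samples) / mean ** 2
    return math.sqrt(cv2 * math.log(1.0 / delta) / len(samples))


def doubling(
    estimate: Callable[[float], float],
    p0: float,
    c: float = 2.0,
    s: int = 2,
    delta: float = 0.01,
    threshold: float = 0.03,
) -> Tuple[float, float, float]:
    """Increase p by the rule p <- c*p until the error proxy is small.

    Returns (final p, mean of the s estimates, error proxy).
    """
    if s < 2 or c <= 1 or not 0 < p0 <= 1:
        raise ValueError("need s >= 2, c > 1 and 0 < p0 <= 1")
    p = p0
    while True:
        samples = [estimate(p) for _ in range(s)]
        err = expected_error(samples, delta)
        if err <= threshold or p >= 1.0:
            return p, statistics.fmean(samples), err
        p = min(1.0, c * p)  # p = 1 means exact counting, so stop there
```

The defaults s = 2, δ = 1/100 and a 3% threshold mirror the instance reported in Table 4; the loop always terminates because p is capped at 1.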

Theoretical Ramifications
In Section 5.1 we investigate the performance of the Benczúr-Karger cut sparsifier and the Spielman-Srivastava spectral sparsifier with respect to triangle counting.

Cut and Spectral Sparsifiers
Consider the graph G shown in Figure 1. The strong edge connectivity of any edge in the graph is 2 and therefore the Benczúr-Karger algorithm does not distinguish the importance of the edge e = (1, 2) with respect to triangle counting.
Claim 2: The Spielman-Srivastava sparsifier throws away the critical edge e = (1, 2) with probability 1 − o(1) as the number of vertices n tends to infinity.
To prove this claim, we need to use Foster's theorem, see Section 2.4.3.

Proof:
Using the in-series and in-parallel network simplification rules [12], the effective conductance of the edge (1, 2) is $1 + \frac{n-2}{2} = \frac{n}{2}$, so its effective resistance is 2/n, which tends to 0 as n grows. Clearly, the Spielman-Srivastava sparsifier fails to capture the importance of the edge (1, 2) with respect to triangle counting. Finding an easy-to-compute quantity which allows a sparsification that preserves triangles more efficiently is an interesting problem. It is worth pointing out that our analysis does not rule out effective resistances, which can be computed very efficiently [32]; it only rules out using them in the way that is typical in the context of spectral sparsifiers.
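As a sanity check on the role of effective resistances, the following Python sketch computes them exactly with rational arithmetic. It assumes that the graph of Figure 1 is the "book" graph in which vertices 1 and 2 are adjacent and every other vertex is adjacent to exactly these two (they appear as indices 0 and 1 in the code). This shape and the helper names `book_graph` and `effective_resistance` are assumptions made for illustration.

```python
from fractions import Fraction
from typing import List, Tuple

Edge = Tuple[int, int]


def book_graph(n: int) -> List[Edge]:
    """Assumed Figure 1: edge (0, 1) plus n-2 vertices adjacent to both."""
    edges = [(0, 1)]
    for k in range(2, n):
        edges += [(0, k), (1, k)]
    return edges


def effective_resistance(n: int, edges: List[Edge], s: int, t: int) -> Fraction:
    """Exact effective resistance between s != t (unit weights, connected graph)."""
    # Build the graph Laplacian with exact rational entries.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Ground t (drop its row and column) and inject one unit of current at s.
    keep = [i for i in range(n) if i != t]
    aug = [[lap[i][j] for j in keep] + [Fraction(int(i == s))] for i in keep]
    m = len(keep)
    # Gauss-Jordan elimination on the augmented system.
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The potential at s equals the effective resistance between s and t.
    return aug[keep.index(s)][m]
```

Under this assumption the shared edge has effective resistance 2/n, and the effective resistances of all edges sum to n − 1, in agreement with Foster's theorem.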

Conclusions
In this paper, we introduce the notion of a triangle sparsifier, i.e., a sparse graph which approximately preserves the total number of triangles of the input graph G. This notion naturally leads to a new randomized algorithm for counting triangles in a graph G, a problem with various real-world applications as outlined in Section 1.
Figure 2: Weighted graph.
Using a theorem on the concentration of multivariate polynomials due to Kim and Vu [29], we proved that under mild conditions on G and p our estimate T is concentrated around t with high probability, thus making our algorithm a reliable choice for counting triangles in large social networks with millions or billions of edges. By choosing a straightforward triangle counting algorithm such as the node- or the edge-iterator, we obtain quadratic speedups of $1/p^2$. We can justify significant speedups: for example, if we know that $t \ge n^{3/2+\epsilon}$ and assume ∆ ∼ n, we get $p = n^{-1/2}$, resulting in an O(n) speedup. As was shown in [31], Triangle Sparsifiers can also be used in combination with other approximate triangle counting methods to yield better practical and theoretical performance.
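To see where the O(n) speedup comes from, one can check the condition on p stated in the abstract, p ≥ max(polylog(n)∆/t, polylog(n)/t^{1/3}), for this choice of parameters. The short derivation below is a sketch that treats ε > 0 as a fixed constant, uses ∆ ≤ n, and takes n to be sufficiently large so that the polylogarithmic factor is at most $n^{\epsilon/3}$.

```latex
\begin{align*}
\frac{\Delta}{t} &\le \frac{n}{n^{3/2+\epsilon}} = n^{-1/2-\epsilon}, \\
\frac{1}{t^{1/3}} &\le \frac{1}{n^{1/2+\epsilon/3}} = n^{-1/2-\epsilon/3}, \\
\max\!\left(\frac{\operatorname{polylog}(n)\,\Delta}{t},\;
            \frac{\operatorname{polylog}(n)}{t^{1/3}}\right)
  &\le \operatorname{polylog}(n)\, n^{-1/2-\epsilon/3}
   \le n^{-1/2} = p, \\
\frac{1}{p^{2}} &= n .
\end{align*}
```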
We believe that a tighter theoretical analysis is possible. Specifically, we conjecture that if t ≥ n log(n) we can obtain concentration using our method. Also, our sampling scheme can be adapted to weighted graphs: multiply the weight of each sampled edge by 1/p and count weighted triangles. However, this can be problematic, as the graph shown in Figure 2 indicates. Specifically, for a sufficiently large w, throwing out any weighted edge results in an arbitrarily bad estimate of the count of triangles. Finding a better sampling scheme for weighted graphs is left as a problem for future work. Finally, finding an easy-to-compute quantity which gives us an optimal way to sparsify the graph with respect to triangles is an interesting research problem.

Figure 1: Graph with linear number O(n) of triangles.
Pick a random subset E′ of the edges E such that the events {e ∈ E′}, for all e ∈ E, are independent and the probability of each is equal to p.
t′ ← count triangles on the graph G′([n], E′)
Return T ← t′/p³

Complexity Analysis
The expected running time of edge sampling is sublinear, i.e., O(pm), see Claim 1. The complexity of the counting step depends on which algorithm we use to count triangles. For instance, if we use [2] as our triangle counting algorithm, the expected running time of Triangle Sparsifier is O(pm + (pm)
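The sampling and rescaling steps above can be sketched in Python as follows. The function names are hypothetical and the counting step uses a simple node-iterator. The sampling loop inspects every edge, so it takes O(m) time rather than the O(pm) of Claim 1. The sketch is meant as an illustration, not as the implementation used in the experiments.

```python
import random
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def count_triangles(edges: Iterable[Edge]) -> int:
    """Exact triangle count with a node-iterator over adjacency sets."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for nbrs in adj.values():
        # Count pairs of neighbours that are themselves adjacent.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                total += 1
    return total // 3  # each triangle is seen once from each of its vertices


def triangle_sparsifier_estimate(
    edges: List[Edge], p: float, rng: Optional[random.Random] = None
) -> float:
    """Keep each edge independently with probability p; return t' / p^3."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    rng = rng or random.Random()
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3
```

With p = 1 every edge is kept and the estimate coincides with the exact count; for smaller p the estimate is unbiased because each triangle survives with probability p³.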

Table 3: Results of experiments averaged over 5 trials using p = 0.1. All running times are reported in seconds. The first column shows the running time for the exact counting algorithm averaged over 5 runs. The second and third columns show the error and running time averaged over 5 runs for each dataset (two decimal digits of accuracy). Standard deviations are also included (three decimal digits of accuracy). The last column shows the running time averaged over 5 runs for the 1-pass algorithm as stated in [15, §2.2] and the corresponding standard deviations. The number of samples for each dataset was set to a value that achieves at most as good accuracy as the one achieved by our counting method. See Section 4.3 for all the details.

Table 4: Doubling procedure for the Wikipedia 2005 graph with 44,667,095 triangles.