Scale-free and power law distributions via fixed points and convergence of (thinning and conditioning) transformations

In discrete contexts such as the degree distribution for a graph, \emph{scale-free} has traditionally been \emph{defined} to be \emph{power-law}. We propose a reasonable interpretation of \emph{scale-free}, namely, invariance under the transformation of $p$-thinning, followed by conditioning on being positive. For each $\beta \in (1,2)$, we show that there is a unique distribution which is a fixed point of this transformation; the distribution is power-law-$\beta$, and different from the usual Yule--Simon power-law-$\beta$ that arises in preferential attachment models. In addition to characterizing these fixed points, we prove convergence results for iterates of the transformation.

the same distribution as X, so one might instead ask that cX have the same distribution as X after conditioning on cX ≥ 1. This means that P(X ≥ x) = P(cX ≥ x | cX ≥ 1).
It is not hard to check that the only such distributions are the Pareto distributions; see [12, 14] for similar observations.
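The Pareto computation behind this observation is one line: if H(x) = P(X ≥ x) = x^{−α} for x ≥ 1, then for 0 < c < 1,

```latex
\[
P(cX \ge x \mid cX \ge 1)
  = \frac{P(X \ge x/c)}{P(X \ge 1/c)}
  = \frac{(x/c)^{-\alpha}}{(1/c)^{-\alpha}}
  = x^{-\alpha}, \qquad x \ge 1 ,
\]
```

so the scaled and conditioned variable has the same tail as X.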
In the discrete context, say D taking values in {0, 1, 2, . . .}, multiplying by c ∈ (0, 1) would change the support. The natural analog of multiplying by a positive c < 1, while maintaining integer support, is p-thinning, which will be defined in Section 2.2. Thus we ask: what distributions on the nonnegative integers have the property that D has the same distribution as the p-thinning of D, conditional on being strictly positive? The preceding concerned the characterization of fixed points. For convergence to these fixed points, the continuous analogue of the discrete problem that we treat in this paper is the following. Take X ≥ 1 and let H(x) = P(X ≥ x) be its tail probabilities. Under what conditions on H does

lim_{c→0} P(cX ≥ x | cX ≥ 1) = x^{−α}, x ≥ 1?
It is not hard to check that a necessary and sufficient condition is that H(x) = x^{−α} L(x), where L is a slowly varying function.
Recall, for example from [6], that a positive function L on (0, ∞) is said to be slowly varying (at infinity) if

(1) lim_{t→∞} L(tx)/L(t) = 1 for every x > 0.

Common examples are L(x) = c(log(1 + x))^σ, for a constant c ≠ 0 and any σ ∈ (−∞, ∞). Slowly varying functions are useful in discussing distributions that have power law properties without being strict power laws. There are other results in the literature about fixed points of transformations that are the composition of two operations that change a distribution in opposite directions. Examples are [2, 5].
2. Introduction in the context of graphs

2.1. "Scale-free" and "power-law". The notion that a degree distribution follows a power-law is concrete. It is the statement that the probability of the degree being exactly k decays as a negative power of k; see (5) for the precise version, and (6)-(8) for the common extensions. The notion of a graph being scale-free is harder to pin down. Some authors, e.g., [1, 4], simply define scale-free graphs to be those where the degree distribution follows a power-law. We prefer a much broader interpretation: G is scale-free if randomly chosen subgraphs H are qualitatively similar to G.
The ambiguous phrase qualitatively similar is meant to apply to the degree distribution as well as other graph features, but in this paper we study only the degree distribution.

2.2. Thinning by tossing a p-coin. To p-thin a population means to toss a coin, independently for each individual in the population; the coin has P(heads) = p, and only those individuals getting heads are retained. This standard but not well-known terminology comes from the study of random point processes and percolation, as in [7, 9, 10]. Thinning can be applied to sets and other structures, but the simplest context for thinning, corresponding to thinning a set and keeping track of its cardinality, is the following.
The "p-thinning" of a nonnegative integer valued random variable D is the number S_D of heads in D tosses of a p-coin. The notation S_D comes from random walks. Take a sequence of independent Bernoulli random variables X_1, X_2, . . ., with p = P(X_i = 1), 1 − p = P(X_i = 0), and for n = 0, 1, 2, . . ., let S_n = X_1 + · · · + X_n, so that P(S_n = k) = \binom{n}{k} p^k (1 − p)^{n−k}. Finally, require that X_1, X_2, . . . be independent of D. With this setup, S_D is the p-thinning of D, and we use the notation S_{D,p} when we want to emphasize the role of p:

(2) S_D ≡ S_{D,p} = X_1 + · · · + X_D = the number of heads in D tosses.
In terms of the probability generating function, with G_D(s) := E s^D, we have G_{S_D}(s) = G_D(1 − p + ps). This makes it tempting to use "doubly complemented" generating functions, with N_D(s) := 1 − G_D(1 − s), as the result of p-thinning is captured by the relation N_{S_D}(s) = N_D(ps). See (17) for a use of the doubly complemented generating function.
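For a distribution with finite support, both relations can be checked directly; the following sketch (the pmf is an arbitrary illustrative choice, not from the paper) computes the p-thinned distribution and verifies G_{S_D}(s) = G_D(1 − p + ps) and N_{S_D}(s) = N_D(ps):

```python
from math import comb

def thin_pmf(pmf, p):
    """p-thinning: P(S_D = k) = sum_n P(D = n) C(n, k) p^k (1-p)^(n-k)."""
    n_max = len(pmf) - 1
    return [sum(pmf[n] * comb(n, k) * p**k * (1 - p)**(n - k)
                for n in range(k, n_max + 1))
            for k in range(n_max + 1)]

def G(pmf, s):
    """Probability generating function E[s^D]."""
    return sum(q * s**k for k, q in enumerate(pmf))

def N(pmf, s):
    """Doubly complemented generating function 1 - G(1 - s)."""
    return 1 - G(pmf, 1 - s)

pmf = [0.1, 0.2, 0.3, 0.4]   # arbitrary illustrative distribution on {0, 1, 2, 3}
p, s = 0.3, 0.7
thinned = thin_pmf(pmf, p)

assert abs(sum(thinned) - 1) < 1e-12               # still a probability distribution
assert abs(G(thinned, s) - G(pmf, 1 - p + p * s)) < 1e-12
assert abs(N(thinned, s) - N(pmf, p * s)) < 1e-12
```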
2.3. Thinning a graph, by nodes or edges. Start with a simple graph G = (V, E); neither multiple edges nor loops are allowed. This G may be thought of as a deterministic or a random graph; if the graph is random, then we require all our p-coins to be independent of G. To thin "by nodes" we take a random subset S ⊂ V , chosen by tossing a p-coin for each node. The thinned graph H is the induced graph on S, i.e., S is the set of nodes of H, and the edges of H are all those edges of G which have both endpoints in S.
To thin "by edges" we toss a p-coin for each edge. The thinned graph H has the original V as its set of nodes, and the edges of H are those e ∈ E whose coin comes up heads.
It is easiest to see the difference between these two ways of thinning by considering the example G = K_n, the complete graph with n nodes and m = \binom{n}{2} edges. If we thin by nodes we get a complete graph K_k for some random k, with 0 ≤ k ≤ n. If we thin by edges, we get a graph on n nodes, and all 2^m such graphs are possible; in fact the random graph H is exactly distributed according to the beloved Erdős-Rényi model G(n, p).

[The choice of node is to be uniform among the available nodes, and otherwise independent of G and H; in the case of node-thinning, in order to have D* well-defined, we modify the procedure by conditioning on the event, of probability 1 − (1 − p)^n, that H has at least one node.]

Proof. Write D_x(G) for the degree of node x in G, and D_x(H) for its degree in H. Note that D_x(G) may be deterministic or random, according to whether G is deterministic or random. Write N_x for the set of neighbors of x in G, so that D_x(G) = |N_x|. Write X for the random node selected to represent H, so that D* = D_X(H).
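The two thinning operations are easy to simulate; this sketch (function names are our own) thins K_6 both ways and checks the structural facts just described: node-thinning K_n yields a complete graph on the surviving nodes, while edge-thinning keeps all n nodes and a random subset of edges, as in G(n, p):

```python
import random
from itertools import combinations

def node_thin(nodes, edges, p, rng):
    """Keep each node with probability p; H is the induced subgraph on survivors."""
    S = {v for v in nodes if rng.random() < p}
    return S, {e for e in edges if e[0] in S and e[1] in S}

def edge_thin(nodes, edges, p, rng):
    """Keep all nodes; keep each edge independently with probability p."""
    return set(nodes), {e for e in edges if rng.random() < p}

rng = random.Random(0)
n = 6
nodes = set(range(n))
edges = set(combinations(range(n), 2))    # K_6

S, E_node = node_thin(nodes, edges, 0.5, rng)
# node-thinning a complete graph gives the complete graph on the surviving nodes
assert E_node == set(combinations(sorted(S), 2))

V, E_edge = edge_thin(nodes, edges, 0.5, rng)
# edge-thinning keeps the node set and retains a subset of the edges
assert V == nodes and E_edge <= edges
```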
Consider first the case of thinning by edge. Since G and H have the same set of nodes, X also serves as a randomly chosen node for G, so that D = D_X(G) and X are independent of the p-coins used for thinning. Now H is formed by tossing a p-coin for each edge of G; in particular, for every x, we have tossed a p-coin for each of the D_x(G) edges from x to a node in N_x, and D_x(H) is the number of heads among these. Using the independence of G, X, and the coins,

(4) P(D* = k) = Σ_{x∈V} P(X = x) P(D_x(H) = k) = (1/n) Σ_{x∈V} E[\binom{D_x(G)}{k} p^k (1 − p)^{D_x(G)−k}] = E[\binom{D}{k} p^k (1 − p)^{D−k}] = P(S_{D,p} = k).

Consider next the case of thinning by node. Write S for the random subset of V formed by p-thinning, so that D_x(H) = |S ∩ N_x|. Our goal is to show that for each x ∈ V , P(X = x) = 1/n; having this, the entire argument in the four lines of the display containing (4) applies, and we are done with the case of thinning by node! Fix a node x ∈ V and a nonempty set T ⊂ V , and consider the event A_{x,T} that X = x and S = T . Write k = |T |. Since we are p-thinning the set V and then conditioning on not getting the empty set,

P(S = T ) = p^k (1 − p)^{n−k} / (1 − (1 − p)^n).

Since we are choosing X uniformly from S, we have P(A_{x,T}) = (1/k) P(S = T ) if x ∈ T , and P(A_{x,T}) = 0 if x ∉ T . There are \binom{n−1}{k−1} ways to pick a set T of size k containing x. Finally, we have

P(X = x) = Σ_{k=1}^{n} \binom{n−1}{k−1} (1/k) · p^k (1 − p)^{n−k} / (1 − (1 − p)^n) = Σ_{k=1}^{n} \binom{n}{k} (1/n) · p^k (1 − p)^{n−k} / (1 − (1 − p)^n) = 1/n,

using \binom{n}{k}/n = \binom{n−1}{k−1}/k to get the middle equality, and the binomial theorem for the last.
Remark 1. The difference between the argument for edge-thinning and the argument for node-thinning is subtle. For edge-thinning, we have a particular coupling between D = D_X(G) and D* = D_X(H), based on a single random choice X of node, with X independent of the coins; there is no such construction when thinning by node. In both cases, P(X = x) = 1/n for each node x, which is the key in the step (4). For node-thinning, X cannot be independent of the coins, because tails for the coin at node x implies X ≠ x.

2.4. Definition of power-law-β. We say that a positive integer valued random variable D satisfies a power-law-β distribution if it satisfies the following asymptotic condition on the point probabilities: as k → ∞,

(5) P(D = k) ∼ c k^{−β}, for some constant c ∈ (0, ∞).

The notation a_k ∼ b_k denotes the asymptotic relation lim_{k→∞} a_k/b_k = 1.
One way to broaden the definition would be to say that a positive integer valued random variable D satisfies a power-law-β distribution if it satisfies the following asymptotic condition on the tail probabilities: as k → ∞,

(6) P(D ≥ k) ∼ c k^{−(β−1)}, for some constant c ∈ (0, ∞).

Another way to broaden the definition is to allow a slowly varying function in place of the constant in (5), and say that a positive integer valued random variable D satisfies a power-law-β distribution if

(7) P(D = k) ∼ k^{−β} L(k),

and L is slowly varying, as in (1). The broadest natural definition of power-law-β combines the upper-tail feature of (6) with the slowly varying feature of (7); one might say that a positive integer valued random variable D satisfies a power-law-β distribution if

(8) P(D ≥ k) ∼ k^{−(β−1)} L(k), with L slowly varying.
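For β > 1, the point-probability version (5) implies the tail version (6), with the constant adjusted; comparing the tail sum with an integral,

```latex
\[
P(D \ge k) = \sum_{j \ge k} P(D = j)
  \sim c \sum_{j \ge k} j^{-\beta}
  \sim \frac{c}{\beta - 1}\, k^{-(\beta - 1)}, \qquad k \to \infty .
\]
```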

3. The (thinning and conditioning) transformations, and their fixed points
If D is a nonnegative integer valued random variable and 0 < p < 1, the p-thinning S_D of D is the random variable given by S_D = X_1 + · · · + X_D, where X_1, X_2, . . . are i.i.d. random variables (also independent of D) with P(X_i = 1) = p, P(X_i = 0) = 1 − p. The distribution of S_D is then given by

(9) P(S_D = k) = (p^k / k!) Σ_{l≥k} (l)_k (1 − p)^{l−k} P(D = l).

This uses the notation (z)_k = z(z − 1) · · · (z − k + 1) for the falling product.
Fix an integer m = 1, 2, . . .. For p ∈ (0, 1), the transformations T ≡ T_p ≡ T_{p,m} for which we consider fixed points and convergence of iterates are given by

(10) T_{p,m} D := (S_{D,p} | S_{D,p} ≥ m),

that is, T_{p,m}D has the distribution of the p-thinning of D, conditioned on being at least m. In Section 4, we will prove that the fixed points of the transformation are precisely those described by (11)-(15) below, and in Sections 5 and 6 we will prove results where these fixed points arise as limits of iterates of the transformation.
Remark 2. We are referring here to distributions that are fixed points for all p, not just for some p. It would be interesting to know whether these are the only fixed points for a given p. For m = 1, all nontrivial fixed points have the form: for some α ∈ (0, 1),

(11) P(D = k) = c_k(α), k = 1, 2, . . . .

The right hand side of (11) defines c_k(α) to be the coefficient of s^k in 1 − (1 − s)^α, so for any α ∈ R, for k ≥ 1,

(12) c_k(α) = (−1)^{k−1} \binom{α}{k} = (α/k!) (1 − α)(2 − α) · · · (k − 1 − α),

and then specifically for m = 1, with the restriction α ∈ (0, 1), the fixed point has generating function G_D(s) = Σ_{k≥1} c_k(α) s^k = 1 − (1 − s)^α. In general, for m = 1, 2, . . . and α ∈ (0, m) there is a nontrivial fixed point for T_{p,m}, which is power-law-β for β = 1 + α, with

(13) P(D = k) = P(D = m) ∏_{j=m+1}^{k} (j − β)/j for k ≥ m,

and this gives all nontrivial fixed points of T_{p,m}. Thus, the special case m = 1 of (13) was given by (11), under the restriction 0 < α < 1; the special cases m = 2 and m = 3 of (13) are obtained by starting the product at j = 3 and j = 4, respectively. A unified description of the fixed points (for all p) of T_{p,m}, including the trivial fixed point, obtained by taking α = m, is: 1 + α = β ∈ (1, m + 1], P(D ∈ {m, m + 1, m + 2, . . .}) = 1, P(D = m) > 0, and

(14) P(D = k + 1) = ((k + 1 − β)/(k + 1)) P(D = k) for k ≥ m,

or equivalently, shifting the dummy variable k by 1,

(15) P(D = k) = ((k − β)/k) P(D = k − 1) for k ≥ m + 1.

The Yule-Simon distribution for power-law-β has point probabilities given by P(D = k) = (β − 1) Γ(k)Γ(β)/Γ(k + β), and hence ratios

(16) P(D = k + 1)/P(D = k) = k/(k + β).

In comparison with (15), both formulas have denominator minus numerator = β, for every k, but for non-integer β, (15) has the integer in the denominator, while the Yule-Simon ratio (16) has the integer in the numerator.
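As a numerical sanity check of the m = 1 fixed point, one can truncate the distribution with generating function 1 − (1 − s)^α at a large K, apply p-thinning followed by conditioning on being positive, and compare with the original point probabilities; the parameters below are illustrative, and the truncation makes this approximate rather than exact:

```python
from math import comb

def fixed_point_pmf(alpha, K):
    """c_k(alpha) = coefficient of s^k in 1 - (1 - s)^alpha, truncated at K."""
    c = [0.0, alpha]
    for k in range(1, K):
        c.append(c[-1] * (k - alpha) / (k + 1))   # c_{k+1} = c_k (k - alpha)/(k + 1)
    return c

def thin_pmf(pmf, p, k_max):
    """P(S_D = k) = sum_n P(D = n) C(n, k) p^k (1-p)^(n-k), for k <= k_max."""
    return [sum(pmf[n] * comb(n, k) * p**k * (1 - p)**(n - k)
                for n in range(k, len(pmf)))
            for k in range(k_max + 1)]

alpha, p, K = 0.5, 0.5, 2000
pmf = fixed_point_pmf(alpha, K)
thinned = thin_pmf(pmf, p, 8)
# condition on the thinned variable being >= 1
cond = [t / (1 - thinned[0]) for t in thinned]

# thinning followed by conditioning reproduces c_k(alpha), up to truncation error
for k in range(1, 9):
    assert abs(cond[k] - pmf[k]) < 1e-9
```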
Remark 3 (Iterates of the combined transformation). It may or may not be intuitively obvious that the four steps: p-thinning to get S_D, then conditioning on S_D ≥ m to get T_p D, then q-thinning, to get say Y , and conditioning again, on Y ≥ m, to get T_q T_p D, have the same effect as the two steps: pq-thin, then condition, to get T_{pq} D. The intuition is reasonable, since the event Y ≥ m is a subset of the event S_D ≥ m. For the case m = 1, an easy way to see that T_p followed by T_q equals T_{pq} is to use the "doubly complemented" generating functions from the end of Section 2.2, for which the distribution of T_p D ≡ T_{p,1} D is determined by

(17) N_{T_pD}(s) = N_D(ps)/N_D(p).

This allows a proof via the calculation

N_{T_qT_pD}(s) = N_{T_pD}(qs)/N_{T_pD}(q) = (N_D(pqs)/N_D(p)) / (N_D(pq)/N_D(p)) = N_D(pqs)/N_D(pq) = N_{T_{pq}D}(s).

That T_q ∘ T_p = T_{pq} for all m is true, and can be proved using a well-known coupling of p-coins, q-coins, and pq-coins; we omit the details of this coupling proof. From T_s ∘ T_t = T_{st} it follows that the k-fold iterate (T_q)^k of T_q is T_p with p = q^k. Theorem 7 allows p → 0 with only the restriction p > 0, and the special case where p goes to zero along a geometric sequence q^k yields convergence for iterates of the transformation T_q, for one fixed q.
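The semigroup relation T_q ∘ T_p = T_{pq} can also be checked numerically for a finite-support distribution (the pmf below is an arbitrary illustrative choice); for m = 1:

```python
from math import comb

def thin(pmf, p):
    """p-thin a pmf with finite support."""
    n_max = len(pmf) - 1
    return [sum(pmf[n] * comb(n, k) * p**k * (1 - p)**(n - k)
                for n in range(k, n_max + 1))
            for k in range(n_max + 1)]

def T(pmf, p, m=1):
    """p-thin, then condition on the result being >= m."""
    t = thin(pmf, p)
    tail = sum(t[m:])
    return [0.0] * m + [x / tail for x in t[m:]]

pmf = [0.0, 0.4, 0.3, 0.2, 0.1]   # arbitrary distribution on {1, ..., 4}
p, q = 0.6, 0.7
lhs = T(T(pmf, p), q)             # T_q applied after T_p
rhs = T(pmf, p * q)               # T_{pq} directly

assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
assert abs(sum(lhs) - 1) < 1e-12
```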

4. Uniqueness
The goal is to show that, for m = 1, 2, . . ., any distribution D on the nonnegative integers which is unchanged by p-thinning followed by conditioning on being at least m, for all p ∈ (0, 1), is either the constant D ≡ m or else, as specified by (15), the law with 1 < β < m + 1 and ratios P(D = k)/P(D = k − 1) = (k − β)/k for k ≥ m + 1.

Then (a) if and only if (c), and (b) if and only if (d).
Proof. Let a_k := P(A = k) and b_k := P(B = k), so that G_A(s) = Σ_{k≥0} a_k s^k and likewise for G_B. These are power series with radius of convergence ≥ 1, hence differentiable term-by-term; with B the p-thinning of A, G_B(s) = G_A(1 − p(1 − s)).
Write f for the mth derivative of G_A, so that G_B^{(m)}(s) = p^m f(1 − p(1 − s)), for all s ∈ [0, 1).

5. Convergence to nontrivial fixed points
We will prove:

Theorem 5. Suppose the distribution of D is power-law-β, as specified by (7) or more generally (8), allowing a slowly varying function. Then for every integer k ≥ β,

(22) lim_{p↓0} P(S_D = k)/P(S_D = k − 1) = (k − β)/k.

Before proving Theorem 5, we state part of a Tauberian theorem that can be found on page 447 of [6]; many other Tauberian theorems can be found in [3]. If q_l ≥ 0 and q_l ∼ l^{ρ−1} L(l) as l → ∞, with ρ > 0 and L slowly varying, then as s ↑ 1, Σ_l q_l s^l ∼ Γ(ρ) (1 − s)^{−ρ} L(1/(1 − s)).

Remark. In his more general statement, Feller assumes that q_l is monotonic, but as he points out in the middle of page 446, this monotonicity is not needed for the special case quoted above.
Proof of Theorem 5. First, we assume (7). Apply the Tauberian theorem to q_l = (l)_n P(D = l) ∼ l^{n−β} L(l) for l ≥ n, q_l = 0 for l < n, with ρ = n − β + 1 and s = 1 − p. Using (9), the result is that if n > β − 1, then

P(S_D = n) ∼ (Γ(n − β + 1)/n!) p^{β−1} L(1/p)

as p ↓ 0. If k > β, this can be applied to both n = k and n = k − 1 to obtain (22). If k = β, this can be applied to n = k, but not to n = k − 1, since then the corresponding ρ is zero. When k = β and n = k − 1, q_l ∼ L(l)/l. Theorem 1(b) on page 281 of [6] implies in this case that L*(l) = q_1 + · · · + q_l is slowly varying, and satisfies lim_{l→∞} L(l)/L*(l) = 0.
Using Theorem 5 on page 447 of [6], we see that

P(S_D = k − 1) ∼ (1/(k − 1)!) p^{β−1} L*(1/p).

Combining the above results implies that the limit in (22) is zero, as required. Now instead of (7) we assume (8). Write H(k) = P(D ≥ k), so that (8) gives

H(k) ∼ k^{−(β−1)} L(k),

where L is slowly varying. Sum by parts, make a change of variables in the second of the resulting sums, and apply the Tauberian theorem to each of the resulting sums.
Convergence of the ratios of probabilities in (22) does not immediately imply tightness of the distributions of (S D | S D ≥ m) as p ↓ 0. This tightness is needed to conclude that the iterates of the transformation converge to the appropriate fixed point. We therefore now turn our attention to that issue.
Theorem 7. Take m ≥ β − 1, and suppose the distribution of D is such that (22) holds for k ≥ β. Then the distributions of (S D | S D ≥ m) are tight as p ↓ 0. It follows that these distributions have a limit as p ↓ 0, which is the fixed point described in (15) in case β < m + 1, or P(D = m) = 1 in case β = m + 1.
Proof. Tightness of these conditional distributions means that

lim_{K→∞} sup_{0<p<1} P(S_D > K | S_D ≥ m) = 0.

Thus we need to deduce the asymptotics of ratios of tail probabilities from the asymptotics of ratios of point probabilities.
A key identity that allows for this transition is

(24) (d/dp) P(S_D ≥ k) = (k/p) P(S_D = k).

Students of the theory of percolation will recognize this as a very simple form of Russo's formula; see page 35 of [7], for example. The proof of (24) is also simple: use (9) to write

P(S_D ≥ k) = Σ_n P(D = n) P(S_n ≥ k).

Differentiating gives

(d/dp) P(S_D ≥ k) = Σ_n P(D = n) (d/dp) P(S_n ≥ k).

To prove (24) one needs to check

(26) (d/dp) P(S_n ≥ k) = k \binom{n}{k} p^{k−1} (1 − p)^{n−k}.

The easiest way to check this is to note that the two sides of (26) agree for k = 0, and differences of the two sides of (26) for successive values of k also agree. By L'Hospital's Rule, whenever (22) holds, it follows from (24) that

(27) lim_{p↓0} P(S_D ≥ k)/P(S_D ≥ k − 1) = lim_{p↓0} (k P(S_D = k)) / ((k − 1) P(S_D = k − 1)) = (k − β)/(k − 1).

Using (27) for k = m + 1, m + 2, . . . gives the asymptotics of the ratios of tail probabilities, from which the tightness follows.
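The identity (d/dp) P(S_D ≥ k) = (k/p) P(S_D = k) can be checked numerically by a central finite difference; the distribution below is an arbitrary illustrative choice:

```python
from math import comb

def thin(pmf, p):
    """p-thin a pmf with finite support."""
    n_max = len(pmf) - 1
    return [sum(pmf[n] * comb(n, k) * p**k * (1 - p)**(n - k)
                for n in range(k, n_max + 1))
            for k in range(n_max + 1)]

def tail(pmf, p, k):
    """P(S_D >= k)."""
    return sum(thin(pmf, p)[k:])

pmf = [0.1, 0.2, 0.3, 0.25, 0.15]   # arbitrary distribution on {0, ..., 4}
p, k, h = 0.4, 2, 1e-6

# central difference approximation to (d/dp) P(S_D >= k)
derivative = (tail(pmf, p + h, k) - tail(pmf, p - h, k)) / (2 * h)
identity_rhs = (k / p) * thin(pmf, p)[k]

assert abs(derivative - identity_rhs) < 1e-6
```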

6. Convergence to trivial fixed points
Next we consider what happens in the less interesting regime m < β − 1.
Theorem 8. Suppose ED^{k−1} < ∞. Then

lim_{p↓0} P(S_D = k)/P(S_D = k − 1) = 0.

Remark. If (5) holds with m = β − 1, then Theorems 5 and 7 provide the above conclusion even though ED^m may be infinite.
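The conclusion of Theorem 8 can be illustrated numerically: for a light-tailed D (here a truncated geometric, an illustrative choice), the ratio P(S_D = k)/P(S_D = k − 1) decays roughly linearly in p as p ↓ 0:

```python
from math import comb

def thin(pmf, p):
    """p-thin a pmf with finite support."""
    n_max = len(pmf) - 1
    return [sum(pmf[n] * comb(n, k) * p**k * (1 - p)**(n - k)
                for n in range(k, n_max + 1))
            for k in range(n_max + 1)]

# a light-tailed distribution (truncated geometric): all moments finite
pmf = [0.5 ** (n + 1) for n in range(30)]
pmf[-1] += 1 - sum(pmf)            # absorb the truncated tail mass
k = 2

ratios = []
for p in (0.1, 0.01, 0.001):
    t = thin(pmf, p)
    ratios.append(t[k] / t[k - 1])

# the point-probability ratio tends to 0 as p -> 0
assert ratios[0] > ratios[1] > ratios[2]
assert ratios[2] < 0.01
```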