Galton-Watson Probability Contraction

We are concerned with the probabilities of first-order sentences for Galton-Watson trees with $Poisson(c)$ offspring distribution. Fixing a positive integer $k$, we exploit the $k$-move Ehrenfeucht game on rooted trees for this purpose. Let $\Sigma$, indexed by $1 \leq j \leq m$, denote the finite set of equivalence classes arising from this game, and $D$ the set of all probability distributions over $\Sigma$. Let $x_{j}(c)$ denote the true probability of the class $j \in \Sigma$ under the $Poisson(c)$ regime, and $\vec{x}(c)$ the true probability vector over all the equivalence classes. We then define a natural recursion function $\Gamma$, and a map $\Psi = \Psi_{c}: D \rightarrow D$ such that $\vec{x}(c)$ is a fixed point of $\Psi_{c}$; since $\Psi_c$ is a contraction, iterating $\Psi_c$ from any distribution $\vec{x} \in D$ converges to this fixed point. We show this both for $c \leq 1$ and $c > 1$, though the techniques for the two ranges are quite different.

Remark 2.3. Let $y = g(c)$ be the probability that $T = T_c$ is infinite. It is well known that $g(c) = 0$ for $c \leq 1$, while for $c > 1$, $y = g(c)$ is the unique positive real satisfying $e^{-cy} = 1 - y$. The value $c = 1$ is often referred to as a critical, or percolation, point for GW-trees. The function $g(c)$ is not differentiable at $c = 1$: the right-sided derivative $\lim_{c \to 1^+} (g(c) - g(1))/(c-1)$ is $2$ while the left-sided derivative is zero. An interpretation of Theorem 2.2 that we favor is that the critical point $c = 1$ cannot be seen through a first-order lens. Theorem 2.2 thus yields that the property of $T$ being infinite is not expressible in the first-order language, though this can be shown with a much weaker hammer! The plot clearly shows how the function fails to be differentiable at $c = 1$, and how the solution of $e^{-cy} = 1 - y$ is non-unique for $c > 1$.
But the plot corresponding to the property in Example 2.1 shows that the probability is a smooth function of $c$, in keeping with Theorem 2.2.
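The survival probability $g(c)$ of Remark 2.3 is easy to compute numerically. The following Python sketch (illustrative only; the function name is ours) solves $e^{-cy} = 1 - y$ for $c > 1$ by iterating $y \mapsto 1 - e^{-cy}$ from $y = 1$, where the iteration decreases monotonically to the unique positive root.

```python
import math

def survival_probability(c, tol=1e-12, max_iter=10_000):
    """Survival probability g(c) of a Poisson(c) Galton-Watson tree.

    Solves e^{-c*y} = 1 - y.  For c <= 1 the only root in [0, 1] is 0;
    for c > 1, iterating y -> 1 - exp(-c*y) from y = 1 converges to the
    unique positive root.
    """
    if c <= 1:
        return 0.0
    y = 1.0
    for _ in range(max_iter):
        y_next = 1.0 - math.exp(-c * y)
        if abs(y_next - y) < tol:
            break
        y = y_next
    return y
```

For example, `survival_probability(2.0)` is roughly $0.797$, and evaluating near $c = 1$ exhibits the right-sided slope $2$ mentioned in Remark 2.3.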

Definition 2.4.
With $v \in T$, $T(v)$ denotes the rooted tree consisting of $v$ and all of its descendants, with $v$ regarded as the root. For $s$ a nonnegative integer, $T|_s$ denotes the rooted tree consisting of the vertices of $T$ at generation at most $s$. We call $T|_s$ the $s$-cutoff of $T$. (This is defined even if no vertices are at generation $s$.) $T(v)|_s$ denotes the $s$-cutoff of $T(v)$.

The Ehrenfeucht Game
3.1. Equivalence Classes. Let $k$ denote an arbitrary positive integer. We may then define an equivalence relation $\equiv_k$ (we often omit the subscript) on all trees $T$, as follows.
Definition 3.1. $T \equiv_k T'$ if they have the same truth value for all first-order sentences $A$ of quantifier depth at most $k$. Equivalently, $T \equiv_k T'$ if Duplicator wins the $k$-move Ehrenfeucht game $EHR[T, T', k]$. $\Sigma = \Sigma_k$ denotes the set of all equivalence classes.
Critically, $\Sigma_k$ is a finite set. As a function of $k$, $|\Sigma_k|$ grows like a tower function. We give [2] as a general reference for these basic results. The proof of Theorem 3.3 is a goal of this paper, accomplished only in Section 9 after many preliminaries. Any first-order sentence $A$ of quantifier depth $k$ is determined, tautologically, by the set $S(A)$ of those $j \in \Sigma$ such that all $T$ with $EV[T] = j$ have property $A$. (For any $j \in \Sigma$ either all $T$ with $EV[T] = j$ or no $T$ with $EV[T] = j$ have property $A$.) We may therefore decompose the $f_A(c)$ of (2.2) into
$$f_A(c) = \sum_{j \in S(A)} x_j(c).$$
Theorem 2.2 will therefore follow from Theorem 3.3.

Recursive States.
In the $k$-move Ehrenfeucht game, counts $\geq k$ are roughly all "the same." This will be made precise in the subsequent Theorem 3.4. We define the truncated count set $C = \{0, 1, \dots, k-1, \omega\}$; the phrase "there are $\omega$ copies" is to be interpreted as "there are $\geq k$ copies." We call $v \in T$ a rootchild if its parent is the root $R$. For $w \neq R$ we say $v$ is the rootancestor of $w$ if $v$ is that unique rootchild with $w \in T(v)$. Of course, a rootchild is its own rootancestor. Theorem 3.4 roughly states that the Ehrenfeucht value of a tree $T$ is determined by the Ehrenfeucht values $EV[T(v)]$ for all the rootchildren $v$. To clarify: $\omega$ rootchildren means at least $k$ rootchildren, while $n$ rootchildren, $n \in C$, $n \neq \omega$, means precisely $n$ rootchildren.  Proof of Theorem 3.4. Let $T, T'$ have the same $\vec{n}$. We give a strategy for Duplicator in the Ehrenfeucht game $EHR[T, T', k]$. Duplicator will create a partial matching between the rootchildren $v \in T$ and the rootchildren $v' \in T'$. At the end of any round of the game, call a rootchild $v \in T$ (similarly $v' \in T'$) free if no $w \in T(v)$ has yet been selected. Suppose Spoiler plays $w \in T$ (similarly $w' \in T'$) with rootancestor $v$. Suppose $v$ is free. Duplicator finds a free $v' \in T'$ with $EV[T(v)] = EV[T'(v')]$. When $EV[T(v)] = j \in \Sigma$ and $n_j \neq \omega$, this can be done since the number of rootchildren of $T$ with Ehrenfeucht value $j$ is exactly the same as that of $T'$. In the special case $n_j = \omega$ the vertex $v'$ may be found since there have been at most $k-1$ moves prior to this move, so at most $k-1$ of the at least $k$ rootchildren $v'$ with $EV[T'(v')] = j$ are not free. Duplicator then matches $v, v'$.
As the matched $v, v'$ satisfy $EV[T(v)] = EV[T'(v')]$, Duplicator has a winning strategy in $EHR[T(v), T'(v'), k]$. He employs that strategy to find a response $w' \in T'(v')$ corresponding to $w \in T(v)$. Once $v, v'$ have been matched, any move $z \in T(v)$ (similarly $z' \in T'(v')$) is responded to with a move $z' \in T'(v')$ using the strategy for $EHR[T(v), T'(v'), k]$.
Remark 3.6. Tree automata consist of a finite state space $\Sigma$, an integer $k \geq 1$, a map $\Gamma$ as in (3.5), and a notion of accepted states. While first-order sentences yield tree automata, the notion of tree automata is broader. Tree automata roughly correspond to monadic second-order sentences, a topic we hope to explore in future work.

Solution as Fixed Point.
We come now to the central idea. We define, for $c > 0$, a map $\Psi_c: D \to D$. Let $\vec{x} = (x_1, \dots, x_m) \in D$, a probability distribution over $\Sigma$. Imagine the root $R$ has a Poisson mean $c$ number of children.
To each child we assign, independently, a $j \in \Sigma$ with distribution $\vec{x}$. Let $n_j \in C$ be the number of children assigned $j$. Let $\vec{n} = (n_1, \dots, n_m)$. Apply the recursion function $\Gamma$ of (3.5) to get $\sigma = \Gamma(\vec{n})$, the Ehrenfeucht value of the root $R$. Then $\Psi_c(\vec{x}) \in D$ is the distribution thus induced on the Ehrenfeucht value of $R$.
The special nature of the Poisson distribution allows a concise expression. When the initial distribution is $\vec{x}$, the number of children assigned $j$ will have a Poisson distribution with mean $cx_j$, and these numbers are mutually independent over $j \in \Sigma$. Thus
$$(\Psi_c(\vec{x}))_j = \sum_{\vec{a}} \prod_{i=1}^m \Pr[n_i = a_i],$$
where $n_i$ has the Poisson distribution with mean $cx_i$ (with $a_i = \omega$ meaning $n_i \geq k$) and the summation is over all $\vec{a}$ with $\Gamma(\vec{a}) = j$. We place all $\Psi_c$ into a single map $\Delta$, with $\Delta(c, \vec{x}) = \Psi_c(\vec{x})$.  Proof. To show that $\vec{x}(c)$ is a fixed point of $\Psi_c$, we start with the initial probability vector $\vec{x} = \vec{x}(c)$. Once we perform the iteration, using the definition of the recursion function $\Gamma$ from (3.5), we know, for any $j \in \Sigma$, that $(\Psi_c(\vec{x}(c)))_j = P_c[EV[T] = j] = x_j(c)$, where recall that $P_c$ is the probability induced under the $Poisson(c)$ offspring distribution.
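The Poisson thinning property used above can be checked empirically. The Python sketch below (illustrative only; the function name is ours) draws a Poisson$(c)$ number of children, assigns each a category independently from $\vec{x}$, and returns the per-category counts; by thinning, these counts are independent Poisson$(cx_j)$ variables, so their empirical means should be close to $cx_j$.

```python
import math
import random

def child_category_counts(c, x, rng):
    """Draw Poisson(c) children and assign each category j with prob x[j].

    Returns the vector (n_1, ..., n_m) of counts per category.  By Poisson
    thinning, these counts are independent Poisson(c * x[j]) variables.
    """
    # Sample Poisson(c) by Knuth's product-of-uniforms method.
    n, t = 0, rng.random()
    threshold = math.exp(-c)
    while t > threshold:
        n += 1
        t *= rng.random()
    # Assign each of the n children a category from the distribution x.
    counts = [0] * len(x)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for j, xj in enumerate(x):
            acc += xj
            if u < acc:
                counts[j] += 1
                break
    return counts
```

Averaging `counts[j]` over many trials with, say, $c = 2$ and $\vec{x} = (0.3, 0.7)$ gives means near $0.6$ and $1.4$ respectively.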
Example 3.8. For many particular $A$ the size of $\Sigma$, which may be thought of as the state space, may be reduced considerably. Let $A$ be the property given in (2.1), that no node has precisely one child. We define state 1, that $A$ is true, and state 2, that $A$ is false. We set $C = \{0, 1, \omega\}$ with $\omega$ meaning "at least two." Let $n_1, n_2 \in C$ be the number of rootchildren $v$ with $T(v)$ having state 1, 2 respectively. Then $T$ is in state 1 if and only if $\vec{n} = (n_1, n_2)$ has one of the values $(0, 0), (\omega, 0)$. Let $D$ be the set of distributions $(x, y)$ on the two states, so that
$$\Psi_c(x, y) = \left(e^{-cy}(1 - cxe^{-cx}),\ 1 - e^{-cy}(1 - cxe^{-cx})\right). \tag{3.11}$$
The fixed point $(x, y)$ then has $x = \Pr[A]$ satisfying the equation
$$x = e^{-c(1-x)}\left(1 - cxe^{-cx}\right). \tag{3.12}$$
Example 3.9. Let $A$ be that there is a vertex $v$ with precisely one child who has precisely one child. Let state 1 be that $A$ is true. Let state 2 be that $A$ is false but that the root has precisely one child. Let state 3 be all else.
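The two-state map of Example 3.8 can be iterated numerically. In the sketch below (function names ours), the update rule is derived directly from the state description: state 1 holds at the root iff the Poisson$(c(1-x))$ count of state-2 children is zero and the Poisson$(cx)$ count of state-1 children is $0$ or $\geq 2$.

```python
import math

def psi(c, x):
    """One step of the two-state map for Example 3.8 (state-1 probability).

    State 1 ("no node has exactly one child") holds at the root iff the
    Poisson(c*(1-x)) count of state-2 children is 0 and the Poisson(c*x)
    count of state-1 children is 0 or >= 2.
    """
    y = 1.0 - x
    p_no_state2 = math.exp(-c * y)                 # P[n2 = 0]
    p_ok_state1 = 1.0 - c * x * math.exp(-c * x)   # P[n1 = 0] + P[n1 >= 2]
    return p_no_state2 * p_ok_state1

def fixed_point(c, x0=0.5, n=500):
    """Iterate psi from x0; by the contraction property this converges."""
    x = x0
    for _ in range(n):
        x = psi(c, x)
    return x
```

For $c = 1$ the iteration settles near $0.40$, a probability that, in keeping with Theorem 2.2, varies smoothly through the critical point.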

The Contraction Formulation
4.1. The Total Variation Metric. On $D$ we let $\rho(\vec{x}, \vec{y})$ denote the usual Euclidean metric, and $\|\vec{x} - \vec{y}\|_1$ the $L^1$ distance. We let $TV(\vec{x}, \vec{y})$ denote the total variation distance. With $\vec{x} = (x_1, \dots, x_m)$ and $\vec{y} = (y_1, \dots, y_m)$ this standard metric is given by
$$TV(\vec{x}, \vec{y}) = \frac{1}{2} \sum_{j=1}^m |x_j - y_j|.$$
Total variation distance between any two probability distributions $\mu, \nu$ on the same probability space has a natural interpretation in terms of coupling $\mu$ and $\nu$. Let $p = \sum_x \mu(x) \wedge \nu(x)$. Flip a coin that has probability of heads $p$.
• If it lands heads, then choose a value $Z$ according to the probability distribution $(\mu \wedge \nu)/p$ and set $X = Y = Z$.
• If the coin lands tails, choose $X$ according to the probability distribution $(\mu - \nu)^+/(1-p)$. Independently choose $Y$ according to the probability distribution $(\nu - \mu)^+/(1-p)$. Then $X \sim \mu$, $Y \sim \nu$, and $(X, Y)$ are coupled in such a way that $\Pr[X \neq Y] = 1 - p = TV(\mu, \nu)$. We call such a coupling optimal.
As such, we have the general well-known result: Theorem 4.1. For any two probability distributions $\mu, \nu$ on a common probability space, $TV(\mu, \nu) = \min \Pr[X \neq Y]$, the minimum over all couplings $(X, Y)$ of $\mu, \nu$. We refer the reader to [3] for further reading.
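The coin-flip construction above can be written out directly. The Python sketch below (function names ours) implements the optimal coupling for finite distributions: with probability $p$ both coordinates are drawn from the normalized overlap, and otherwise $X$ and $Y$ are drawn from the normalized excesses, so that $\Pr[X \neq Y]$ equals the total variation distance.

```python
import random

def tv_distance(mu, nu):
    """Total variation distance between two finite distributions (lists)."""
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

def draw(rng, weights):
    """Sample an index from a finite weight vector summing to 1."""
    u, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1

def optimal_coupling(mu, nu, rng):
    """Sample (X, Y) with X ~ mu, Y ~ nu and P[X != Y] = TV(mu, nu)."""
    overlap = [min(m, n) for m, n in zip(mu, nu)]
    p = sum(overlap)
    if rng.random() < p:  # heads: draw once from the normalized overlap
        z = draw(rng, [w / p for w in overlap])
        return z, z
    # tails: X from the excess of mu, Y independently from the excess of nu
    q = 1.0 - p
    x = draw(rng, [max(m - n, 0.0) / q for m, n in zip(mu, nu)])
    y = draw(rng, [max(n - m, 0.0) / q for m, n in zip(mu, nu)])
    return x, y
```

Simulating many pairs and recording the frequency of $X \neq Y$ recovers $TV(\mu, \nu)$ empirically.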

The Contraction Theorem.
Theorem 4.2. For all $c > 0$ there exists a positive integer $s$ and an $\alpha < 1$ such that for all $\vec{x}, \vec{y} \in D$
$$\rho(\Psi_c^s(\vec{x}), \Psi_c^s(\vec{y})) \leq \alpha\, \rho(\vec{x}, \vec{y}). \tag{4.5}$$
The iterate $\Psi_c^s(\vec{x})$ has a natural description. Generate a random GW tree $T = T_c$ but stop at generation $s$. (The root is at generation 0.) To each node (there may not be any) $v$ at generation $s$ assign independently $j \in \Sigma$ from distribution $\vec{x}$. Now we work up the tree towards the root. Suppose, formally by induction, that all $w$ at generation $i$ have been assigned some $j \in \Sigma$. A $v$ at generation $i-1$ will then have $n_j$ children assigned $j$ (allowing $n_j = \omega$). The value at $v$, which is now determined by Theorem 3.4, is given by the recursion function $\Gamma(\vec{n})$ of Definition 3.5. $\Psi_c^s(\vec{x})$ will then be the distribution of the Ehrenfeucht value assigned to the root.
Remark 4.3. The non-first-order property $A$ that $T$ is infinite may be similarly examined. Set $C = \{0, \omega\}$ ($\omega$ denoting $\geq 1$) and let state 1 be that $T$ is infinite, state 2 that it is not. $T$ is in state 1 if and only if $\vec{n} = (\omega, 0)$ or $(\omega, \omega)$, so that
$$\Psi_c(x, 1-x) = (1 - e^{-cx}, e^{-cx}). \tag{4.6}$$
However, when $c > 1$, $\Psi_c$ has two fixed points: $(0, 1)$ and the "correct" $(x(c), 1 - x(c))$. The contraction property (4.5) will not hold: with $\epsilon$ small, $1 - e^{-c\epsilon} \sim c\epsilon$, and so $\vec{x} = (0, 1)$ and $\vec{y} = (\epsilon, 1 - \epsilon)$ become further apart on application of $\Psi_c$ when $c > 1$.   Theorem 4.4. For any $c > 0$ and any $\vec{x}, \vec{y} \in D$, $TV(\Psi_c(\vec{x}), \Psi_c(\vec{y})) \leq c \cdot TV(\vec{x}, \vec{y})$.  Proof. The main idea is to use a suitable coupling of $\vec{x}$ and $\vec{y}$. First we fix $s \in \mathbb{N}$ and condition on the root having $s$ children. We then create two pictures.
In both pictures, let the root $v$ have $s$ children $v_1, \dots, v_s$. In picture 1, we assign, mutually independently, labels $X_1, \dots, X_s \in \Sigma$ to $v_1, \dots, v_s$ respectively, with $X_i \sim \vec{x}$, $1 \leq i \leq s$. In picture 2, we assign, again mutually independently, labels $Z_1, \dots, Z_s \in \Sigma$ to $v_1, \dots, v_s$, with $Z_i \sim \vec{y}$, $1 \leq i \leq s$. The pairs $(X_i, Z_i)$, $1 \leq i \leq s$, are mutually independent, but for every $i$, $(X_i, Z_i)$ is optimally coupled so that $\Pr[X_i \neq Z_i] = TV(\vec{x}, \vec{y})$. Suppose $X_v$ is the label of the root $v$ in picture 1 that we get from the recursion function $\Gamma$ (from (3.5)), and $Z_v$ that in picture 2. Then $\Pr[X_v \neq Z_v] \leq s \cdot TV(\vec{x}, \vec{y})$, and averaging over the Poisson mean $c$ number of children of the root yields the claimed bound.  Theorem 4.5. Theorem 4.2 holds when $c < 1$.
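The failure of contraction described in Remark 4.3 is easy to observe numerically. In the sketch below (names ours), the state-1 update for the "infinite" automaton is $x \mapsto 1 - e^{-cx}$; its derivative at the fixed point $0$ is $c > 1$, so nearby points are pushed apart, while iterating from any $x_0 > 0$ escapes $0$ and lands on the positive fixed point.

```python
import math

def step(c, x):
    """State-1 ("T infinite") update from Remark 4.3: x -> 1 - exp(-c*x)."""
    return 1.0 - math.exp(-c * x)

c = 2.0
eps = 1e-3
# Near the fixed point 0, one application multiplies eps by roughly c > 1:
expansion = step(c, eps) / eps
# Iterating from any x0 > 0 escapes 0 and converges to the positive
# fixed point, which is the survival probability g(c).
x = eps
for _ in range(1000):
    x = step(c, x)
```

So `expansion` is just below $c = 2$, and the iteration ends at the "correct" fixed point satisfying $e^{-cx} = 1 - x$.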

Universality
We define a function $Rad(i)$ on the nonnegative integers by the recursion $Rad(0) = 0$ and $Rad(i+1) = 3Rad(i) + 1$ for $i \geq 0$. (5.1) Definition 5.1. In $T$ we define a distance $\rho(v, w)$ to be the minimal $r$ for which there is a sequence $v = z_0, z_1, \dots, z_r = w$ where each $z_{i+1}$ is either the parent or a child of $z_i$. We set $\rho(v, v) = 0$.
As an example, cousins would be at distance four.
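The recursion (5.1) has the closed form $Rad(i) = (3^i - 1)/2$, and it is built exactly so that $2Rad(i-1) + 1 + Rad(i-1) = Rad(i)$: a $Rad(i-1)$-ball around a point within distance $2Rad(i-1)+1$ of a chosen vertex fits inside that vertex's $Rad(i)$-ball, which is the "inside" case in the proof of Theorem 5.7 below. A minimal sketch:

```python
def rad(i):
    """Rad(0) = 0, Rad(i+1) = 3*Rad(i) + 1; closed form (3**i - 1) // 2."""
    r = 0
    for _ in range(i):
        r = 3 * r + 1
    return r
```

The first few values are $0, 1, 4, 13, 40, \dots$, and the identity $(2Rad(i-1)+1) + Rad(i-1) = Rad(i)$ holds with equality for every $i \geq 1$.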
Definition 5.2. For r a nonnegative integer, v ∈ T , the ball of radius r around v, denoted B(v, r) is the set of w ∈ T with ρ(v, w) ≤ r. We consider v a designated vertex of B(v, r).
We define an equivalence relation, depending on $k$, on such balls.
Definition 5.3. $B(v, r) \equiv_k B(v', r)$ if the two sets satisfy the same first-order sentences of quantifier depth at most $k-1$ with $v, v'$ as designated vertices, allowing $\pi$ (the parent relation), $=$, and $\rho$.
Remark 5.4. Note that the $(k-1)$-round Ehrenfeucht game with $v, v'$ designated is identical to the $k$-round Ehrenfeucht game in which the first round is mandated to select $v, v'$.
Equivalently, $B(v, r) \equiv_k B(v', r)$ if Duplicator wins the $k$-move Ehrenfeucht game on these sets in which the first round is mandated to be $v, v'$ and Duplicator must preserve $\pi$, $=$ and $\rho$. The distance function $\rho$ could be replaced by the binary predicates $\rho_i(w_1, w_2): \rho(w_1, w_2) = i$, $0 \leq i \leq 2r$. As this is a finite number of predicates, the number of equivalence classes is finite. Let $\Sigma^{BALL}_k$ denote the set of equivalence classes.
Definition 5.6. We say $T$ is $k$-full if for any $v_1, \dots, v_{k-1} \in T$ and any $\sigma \in \Sigma^{BALL}_k$ there exists a vertex $v$ such that (i) $B(v, Rad(k-1))$ is in equivalence class $\sigma$, and (ii) $\rho(v, v_i) > 2Rad(k-1) + 1$ for all $1 \leq i \leq k-1$.
When T is k-full our next result shows that the truth value of first order sentences of quantifier depth at most k is determined by examining T "near" the root. This "inside-outside" strategy is well known, see, for example, [2].
Theorem 5.7. Let $T, T'$ with roots $R, R'$ both be $k$-full. Suppose, as per Definition 5.3, $B(R, Rad(k)) \equiv_{k+1} B(R', Rad(k))$. Then $T, T'$ have the same $k$-Ehrenfeucht value as given by Definition 3.2.
Proof. Let $T, T'$ satisfy the condition of Theorem 5.7. We give a strategy for Duplicator to win the $k$-move Ehrenfeucht game. For convenience we add a move zero in which the roots $R, R'$ are selected. Suppose $i$ moves remain. Consider the union of the balls of radius $Rad(i)$ about the chosen vertices (including the root) in each tree. These split into components. Duplicator shall ensure that the corresponding chosen vertices are in the same components and that the components are equivalent. At the start, with $i = k$, this is true by assumption, since $B(R, Rad(k)) \equiv_{k+1} B(R', Rad(k))$. Suppose this holds with $i$ moves remaining and Spoiler selects (the other case being symmetric) $v \in T$. There are two cases. Inside: $v$ is at distance at most $2Rad(i-1) + 1$ from a previously selected $v_s$. Then its ball of radius $Rad(i-1)$ lies entirely inside (from the recursion (5.1)) the ball of radius $Rad(i)$ around $v_s$. Duplicator then considers the equivalent component in $T'$ and selects the corresponding $v'$. Outside: Now the ball of radius $Rad(i-1)$ about $v$ lies in a separate component from the balls of radius $Rad(i-1)$ about the previously chosen vertices. As $T'$ is $k$-full, Duplicator selects $v' \in T'$ satisfying the conditions of Definition 5.6.
In either case Duplicator maintains the property. At the end of the game there are zero moves left. The unions of the balls of radius zero, that is, the selected vertices themselves, are equivalent in $T, T'$, and Duplicator has won.
Definition 5.8. Christmas Tree: We replace the complex notion of $k$-full by a simpler sufficient condition. For each $\sigma \in \Sigma^{BALL}_k$ create $k$ copies of a ball in that class. Take a root vertex $v$ and on it place $k \cdot |\Sigma^{BALL}_k|$ disjoint paths (parent to child) of length $Rad(k) + Rad(k-1) + 1$. Make each endpoint the top of one of these copies.
Definition 5.9. The $k$-universal tree, denoted $UNIV_k$, is the Christmas Tree defined above.
Definition 5.10. $T$ is called $s$-universal (given a fixed positive integer $k$) if all $T'$ with $T|_s \cong T'|_s$ have the same $k$-Ehrenfeucht value. Thus $EV[T]$ is determined by $T|_s$ completely.
Theorem 5.11. If for some $v$, $T(v) \cong UNIV_k$, then $T$ is $k$-full. Thus, by Theorem 5.7, the $k$-Ehrenfeucht value of $T$ is determined by $T|_{Rad(k)}$, or in other words, $T$ is $Rad(k)$-universal.
Remark 5.12. Many other trees could be used in place of $UNIV_k$; we use this particular one only for specificity.
The subtree $T(v)$, where $v$ is not the root, cannot determine the Ehrenfeucht value of $T$ as, for example, it cannot tell us if the root has, say, precisely two children. Containing this universal tree $UNIV_k$ tells us everything about the Ehrenfeucht value of $T$ except properties relating to the local neighborhood of the root.

Rapidly Determined Properties
We consider the underlying probability space for the GW tree $T = T_c$ to be an infinite sequence $X_1, X_2, \dots$ of independent variables, each Poisson with mean $c$. These naturally create a tree. Let the root have $X_1$ children. Now we go through the nodes in a breadth-first manner. Let the $i$-th node (the root is the first node) have $X_i$ children. This creates a unique rooted tree. Note, however, that when the tree is finite with, say, $n$ nodes, then the values $X_j$ with $j > n$ are irrelevant. In that case we say that the process aborts at time $n$.
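The breadth-first construction above is mechanical enough to spell out. The Python sketch below (names ours) consumes a prefix $x_1, x_2, \dots$ of child counts and either detects that the process aborts at time $n$ (the tree is finite with $n$ nodes, so later values are never read) or reports that the supplied prefix does not yet determine the tree.

```python
from collections import deque

def build_tree(xs):
    """Build the rooted tree from child counts x_1, x_2, ... breadth first.

    Returns (children, aborted_at): children[i] lists node i's children
    (node 0 is the root).  aborted_at is n if the tree is finite with n
    nodes (the values x_j with j > n were never read); it is None if the
    supplied prefix ran out while nodes were still awaiting child counts.
    """
    children = [[]]
    queue = deque([0])   # nodes awaiting their child count, in BFS order
    i = 0                # index into xs: node order = read order
    while queue:
        if i >= len(xs):
            return children, None   # prefix exhausted, tree still growing
        v = queue.popleft()
        for _ in range(xs[i]):
            children.append([])
            children[v].append(len(children) - 1)
            queue.append(len(children) - 1)
        i += 1
    return children, i   # queue empty: process aborted at time i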
We employ a useful notation of Donald Knuth.
Definition 6.1. We say an event occurs quite surely if the probability that it does not occur drops exponentially in the given parameter.
Definition 6.2. Let $A$ be any property or function of rooted trees. We say that $A$ is rapidly determined if quite surely (in $s$, with $T = T_c$ and $c$ given) $X_1, \dots, X_s$ tautologically determine $A$.
Remark 6.3. Consider the property that $T$ is infinite and suppose $c > 1$. Given $X_1, \dots, X_s$, if the process has aborted then we know the tree is finite. Suppose however (as holds with positive limiting probability) that after $X_1, \dots, X_s$ the tree is continuing. If at that stage there are many nodes we can be reasonably certain that $T$ will be infinite, but we cannot be tautologically sure. This property is not rapidly determined.
Remark 6.4. In this work we restrict the language in which A is expressed. It has been suggested that another approach would be to restrict A to rapidly determined properties.
Theorem 6.5. Let $T_0$ be an arbitrary finite tree. Let $A$ be the (non-first-order) property that either the process has aborted by time $s$ or there exists $v \in T$ with $T(v) \cong T_0$. Then $A$ is rapidly determined in the parameter $s$.
The proof is given in [1]. Let $T_0$ have depth $d$. Roughly speaking, when we examine $X_1, \dots, X_s$, either the process has aborted or it has not. If not, quite surely some $i \leq \epsilon s$ has $T(i) \cong T_0$. Here $\epsilon$ is chosen small enough (dependent on $c, d$) so that quite surely $X_1, \dots, X_s$ determine the descendants of all $i \leq \epsilon s$ down $d$ generations.
Theorem 6.6. Every first order property A is rapidly determined.
Proof. Let $A$ have quantifier depth $k$. Let $T_0$ be the universal tree $UNIV_k$ as given by Theorem 5.11. From Theorem 6.5, if $T$ has not aborted by time $s$ then quite surely some $T(i) \cong T_0$. But then $T$ is already $k$-full and already has depth at least $Rad(k)$. By Theorem 5.7 the $k$-Ehrenfeucht value of $T$, hence the truth value of $A$, is determined solely by $T|_{Rad(k)}$, and hence tautologically by $X_1, \dots, X_s$.
Theorem 6.7. Fix a positive integer $k$. Let $T \sim T_c$. Then quite surely (in $s$), $T$ is $s$-universal.
Proof. Theorem 6.6 gives that the $k$-Ehrenfeucht value of $T$ is quite surely determined by $X_1, \dots, X_s$. When this is so it is tautologically determined by $T|_s$, which carries more information.

Unique Fixed Point
Theorem 7.1. The map $\Psi_c: D \to D$ has a unique fixed point.
Proof. Let $f(s)$ be the probability that $T_c$ is not $s$-universal. For any $\vec{y}, \vec{z} \in D$ we couple $\Psi_c^s(\vec{y}), \Psi_c^s(\vec{z})$. Create $T_c$ down to generation $s$ and then give each node at generation $s$ a $\sigma \in \Sigma$ independently with distribution $\vec{y}$, respectively $\vec{z}$. Then $\Psi_c^s(\vec{y}), \Psi_c^s(\vec{z})$ will be the induced state of the root. But when $T_c$ is $s$-universal this state is the same for any $\vec{y}, \vec{z}$. Hence $TV[\Psi_c^s(\vec{y}), \Psi_c^s(\vec{z})] \leq f(s)$. When $\vec{y}, \vec{z}$ are fixed points of $\Psi_c$, $TV[\vec{y}, \vec{z}] \leq f(s)$. As $f(s) \to 0$, $\vec{y} = \vec{z}$.
Remark 7.2. Theorem 7.1 will also follow from the more powerful Theorem 4.2.
Remark 7.3. It is a challenging exercise to show directly that the solution $x$ to (3.12), or the solution $x, y$ to the system (3.13)-(3.14), is unique.

A Two Stage Process.
Here we prove Theorem 4.2 for arbitrary $c$. Let $D_0$ be the depth of $UNIV_k$, as given by Definition 5.9. We shall set $s = s_0 + D_0$ and think of $T|_s$ as being generated in two stages. In Stage 1 we generate $T|_{s_0}$. From Theorem 6.7, by taking $s_0$ large, this will be $s_0$-universal with probability near one. In Stage 2 we begin with an arbitrary but fixed $T_0$ of depth at most $s_0$. (We say "at most" because it includes the possibility that $T_0$ has no vertices at depth $s_0$.) From each node at depth $s_0$, mutually independently, we generate a GW-tree down to depth $D_0$. We denote by $Ext(T_0)$ this random tree, now of depth (at most) $s$. Writing $\Psi_c^s(T_0, \vec{x})$ for the distribution of the root value when the nodes of $Ext(T_0)$ at depth $s$ are given independent labels from $\vec{x}$, we have
Theorem 8.4.
$$\Psi_c^s(\vec{x}) = \sum_{T_0} \Pr[T|_{s_0} = T_0]\, \Psi_c^s(T_0, \vec{x}),$$
where the sum is over all $T_0$ of depth (at most) $s_0$.
Proof. We split the distribution of $T|_s$ into the distributions of $Ext(T_0)$, weighted by $\Pr[T|_{s_0} = T_0]$, over all $T_0$ of depth (at most) $s_0$.
8.3. Some Technical Lemmas. Let $X = X(c, s)$ be the number of children at generation $s$ of the GW tree $T = T_c$. Let $Y$ be the sum of $t$ independent copies of $X$. The next result (not the best possible) is that the tail of $Y$ decays exponentially in $t$.
Remark 8.7. In the worst case the event $BAD$ would coincide with the top probability $Ke^{-t\gamma}$ in the distribution of $Y$.
Proof of Lemma 8.6. We split $Y$ into the cases $Y < y_1 t$ and $Y \geq y_1 t$, where $y_1$ needs to be chosen suitably. We fix some $\lambda > 1$ and choose $y_1$ accordingly, where $\gamma$ is as in the bound of $P[BAD]$. From (8.4) and (8.9), the desired bound then follows easily, by choosing $\kappa = \min\{y_1 \beta, \gamma\}$ and $k = Ky_1 + y_1 + 1$.
Theorem 8.8. There exists $K_0$ (dependent only on $s_0, k$) such that for any $T_0$ and any $\vec{x}, \vec{y} \in D$
$$TV(\Psi_c^s(T_0, \vec{x}), \Psi_c^s(T_0, \vec{y})) \leq K_0 \cdot TV(\vec{x}, \vec{y}). \tag{8.10}$$
Remark 8.9. As $K_0$ may be large, Theorem 8.8, by itself, does not give a contracting mapping. It does limit how expanding $\Psi_c^s(T_0, \cdot)$ can be. Remark 8.10. Let $t$ be the number of nodes of $T_0$ at depth $s_0$, and let $\epsilon = TV(\vec{x}, \vec{y})$. The expected number of nodes in $Ext(T_0)$ at level $s = s_0 + D_0$ is then $tK_1$ with $K_1 = c^{D_0}$. The methods of Theorem 4.4 would then give Theorem 8.8 with $K_0 = K_1 t$. However, when $c > 1$ this bound would be unbounded in $t$. Our concern is then with large $t$, though, technically, the proof below works for all $t$.
Proof. Let $t$ be the number of nodes of $T_0$ at depth $s_0$, and let $\epsilon = TV(\vec{x}, \vec{y})$. We again couple $\vec{x}, \vec{y}$. Let $Y$ be the number of nodes in $Ext(T_0)$ at level $s$. Given $Y = y$, let us name these vertices $u_1, \dots, u_y$. Again we create two pictures. In picture 1, we assign, mutually independently, labels $X_i \in \Sigma$ to $u_i$, with $X_i \sim \vec{x}$, and in picture 2, labels $Z_i \sim \vec{y}$. The pairs $(X_i, Z_i)$, $1 \leq i \leq y$, are mutually independent, but each $X_i, Z_i$ are optimally coupled so that
$$\Pr[X_i \neq Z_i] = \epsilon. \tag{8.11}$$
The probability of the event that $X_i \neq Z_i$ for at least one $i$ is then bounded above by $y \cdot \epsilon$. Suppose $X \in \Sigma$ is the label of the root of $Ext(T_0)$ in picture 1, and $Z$ that in picture 2, determined by applying the recursion function $\Gamma$ repeatedly upwards starting at level $s$. Then $X \sim \Psi_c^s(T_0, \vec{x})$, $Z \sim \Psi_c^s(T_0, \vec{y})$, and $TV(\Psi_c^s(T_0, \vec{x}), \Psi_c^s(T_0, \vec{y})) \leq A \epsilon$. Here $A = A(t)$ remains bounded as $t \to \infty$, and so there exists $K_0$ such that $A \leq K_0$ for any choice of $t$.
8.5. Proving Contraction. We first show Theorem 4.2 in terms of the $TV$ metric. Pick $s_0$ sufficiently large so that, say, the probability that $T|_{s_0}$ is not $s_0$-universal is at most $(2K_0)^{-1}$, with $K_0$ given by Theorem 8.8. This can be done because of Theorem 6.7. Let $\vec{x}, \vec{y} \in D$ with $\epsilon = TV(\vec{x}, \vec{y})$. We bound $TV(\Psi_c^s(\vec{x}), \Psi_c^s(\vec{y}))$ by Theorem 8.4. Consider $TV(\Psi_c^s(T_0, \vec{x}), \Psi_c^s(T_0, \vec{y}))$. When $T_0$ is $s_0$-universal this has value zero. Otherwise its value is bounded by $K_0 \epsilon$ by Theorem 8.8. Theorem 8.4 then gives $TV(\Psi_c^s(\vec{x}), \Psi_c^s(\vec{y})) \leq \epsilon/2$. Finally, we switch to the $L^2$ metric $\rho$. For $B$ a sufficiently large constant the inequalities (4.9) yield, say, $\rho(\Psi_c^{sB}(\vec{x}), \Psi_c^{sB}(\vec{y})) \leq \frac{1}{2}\rho(\vec{x}, \vec{y})$. The series $\sum_u A^u$ is convergent, and so $A - I$ is invertible. As by Property (i) the function $\Delta$ is smooth, the Implicit Function Theorem gives that the fixed point function $\vec{x}(c)$ of $F$ is $C^\infty$.