The metric dimension of critical Galton-Watson trees and linear preferential attachment trees

The metric dimension of a graph $G$ is the minimal size of a subset $R$ of vertices of $G$ that, upon reporting their graph distance from a distinguished (source) vertex $v^\star$, enables unique identification of the source vertex $v^\star$ among all possible vertices of $G$. In this paper we show a Law of Large Numbers (LLN) for the metric dimension of some classes of trees: critical Galton-Watson trees conditioned to have size $n$, and growing general linear preferential attachment trees. The former class includes uniform random trees, the latter class includes Yule-trees (also called random recursive trees), $m$-ary increasing trees, binary search trees, and positive linear preferential attachment trees. In all these cases, we are able to identify the limiting constant in the LLN explicitly. Our result relies on the insight that the metric dimension can be related to subtree properties, and hence we can make use of the powerful fringe-tree literature developed by Aldous, Janson, and others.


Introduction
The metric dimension is a notion originating from combinatorics, first defined by Slater [41] and independently by Harary and Melter [19]. Heuristically, the problem can be described in terms of source detection: given a graph G = (V, E) with an unknown special vertex v⋆, we would like to identify v⋆ based on limited observations. We think of v⋆ as the source of a spreading process (say, a color red, which can be thought of as an infection, or any type of information) on the graph. The spreading starts at time t = t_0 = 0, when v⋆ becomes red. The color then spreads at unit speed across edges: each direct neighbor of v⋆ is colored red at time t = 1, each second neighbor at time t = 2, and so on. Vertices keep their color forever. We are allowed to place, in advance, sensor vertices on the graph, forming a sensor set R ⊂ V. Sensor vertices report their coloring/infection time. Based on the vector of these infection times, we would like to uniquely identify the source vertex v⋆. The minimal number of sensors needed for perfect detection, no matter where v⋆ is located, is called the metric dimension (MD) of the graph, which we denote by β(G). Any set of sensors that can uniquely identify the source vertex v⋆ (no matter what its location is) is called a resolving set.
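The definitions above translate directly into a brute-force computation. The sketch below (with illustrative function names of our own choosing; it is exponential in the graph size, so only usable on small examples) checks every candidate sensor set: a set resolves the graph exactly when the distance vectors to the sensors are distinct for all vertices.

```python
from itertools import combinations

def bfs_distances(graph, source):
    # Breadth-first-search distances from `source`; `graph` is an
    # adjacency-list dict {vertex: [neighbours]}.
    dist = {source: 0}
    frontier = [source]
    while frontier:
        nxt = []
        for v in frontier:
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    nxt.append(w)
        frontier = nxt
    return dist

def is_resolving(graph, sensors):
    # R is resolving iff the vector of distances to the sensors is
    # distinct for every vertex, i.e. for every possible source location.
    dist = {r: bfs_distances(graph, r) for r in sensors}
    signatures = {tuple(dist[r][v] for r in sensors) for v in graph}
    return len(signatures) == len(graph)

def metric_dimension(graph):
    # Brute force over all sensor sets of increasing size (exponential
    # time, small graphs only).
    vertices = sorted(graph)
    for k in range(1, len(vertices) + 1):
        for sensors in combinations(vertices, k):
            if is_resolving(graph, sensors):
                return k
    return 0
```

For instance, one endpoint resolves a path, while the star on four vertices needs two sensors, matching the leaf-counting heuristics discussed later for trees.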
Algorithmic aspects. Computing resolving sets, or even the metric dimension, of general graphs has been shown to be NP-hard [29], and the MD is approximable only up to a factor of log(N) [6,20]. The MD of specific deterministic graph families has been extensively studied; we refer to [38] for a list of references. For instance, for trees the MD can be written as the difference between the number of leaves and the number of so-called exterior major vertices of the tree (vertices of degree at least 3 that have a line-graph leading to a leaf), both of which can be computed in linear time [29]. We mention that the MD has deep connections to the automorphism group of the graph G [4,11,18], and hence to the graph isomorphism problem [3].
Asymptotic results. From the probabilistic point of view, little is known about the asymptotic behaviour of the MD of random graph families as their sizes tend to infinity. A pioneering work [8] determines the asymptotics of the MD of Erdős-Rényi random graphs. In this Law of Large Numbers (LLN) type result, the authors showed a surprising non-monotone zig-zag phenomenon of the metric dimension as the average degree increases from bounded to linear in the graph size. A central limit theorem (CLT) type result for uniform random trees was proved in [33], and also for subcritical Erdős-Rényi random graphs.
Our contribution. In this paper, we provide LLN type results for two general distributions on trees: families of growing trees that grow according to general linear preferential attachment schemes, and conditioned critical Galton-Watson trees that include uniform random trees.
We describe these families briefly. In a general linear preferential attachment tree, there are two parameters, ρ > 0 and χ ∈ R. We start with a single root vertex. When there are i vertices, we attach the (i + 1)-st vertex to one of the existing vertices v ≤ i with probability proportional to ρ + χ deg_i(v), where deg_i(v) is the degree of vertex v after i vertices have been added. Clearly, due to the normalization, only the quotient ρ/χ matters, so for the rest of the paper we may, without loss of generality (wlog), consider only χ ∈ {−1, 0, 1}. When χ = −1, we require ρ to be an integer.
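The growth rule above is easy to simulate. The following sketch (the function name is ours) uses one common convention: the attachment weight of a vertex is taken to be ρ + χ·c(v), where c(v) is its current number of children. Degree conventions differ across the literature by how the root and the parent edge are counted, but with child counts the choice ρ = m, χ = −1 reproduces the m-ary increasing tree's "free slot" counts exactly.

```python
import random

def lpa_tree(n, rho, chi, rng=None):
    # Grow a tree on vertices 0..n-1.  Vertex i attaches to an existing
    # vertex v with probability proportional to rho + chi * c(v), where
    # c(v) is the number of children v already has.  (Conventions for
    # "degree" vary; child counts match the m-ary increasing tree for
    # rho = m, chi = -1, and give the uniform rule for chi = 0.)
    rng = rng or random.Random()
    parent = [None]                # vertex 0 is the root
    children = [0]                 # children[v] = current child count of v
    for i in range(1, n):
        weights = [rho + chi * children[v] for v in range(i)]
        v = rng.choices(range(i), weights=weights)[0]
        parent.append(v)
        children[v] += 1
        children.append(0)
    return parent
```

With ρ = 2, χ = −1 the weights are 2 − c(v) ≥ 0, so no vertex ever receives a third child, as expected for the binary case.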
We now explain why this class of trees contains m-ary increasing trees, binary search trees, and uniform recursive trees, as well as rich-get-richer trees, which are the 'usual' linear preferential attachment trees. When we take ρ = m and χ = −1, we obtain the m-ary increasing tree. In the original definition of an m-ary increasing tree, each vertex has the potential to have m labeled offspring. The tree starts with a single vertex (the root) at step 1, and at each step a new vertex arrives. When the tree has i vertices, a new vertex can attach to mi − (i − 1) = (m − 1)i + 1 possible places, since out of the mi possible places, i − 1 are already taken (only the root does not have a parent). An m-ary increasing tree with n vertices is constructed by starting with a single root vertex and placing the (i + 1)-st vertex uniformly at random among the (m − 1)i + 1 possible places [24]. The probability that the (i + 1)-st vertex connects to vertex v ≤ i is thus proportional to m − deg_i(v). Hence, we recognise the formula with ρ = m and χ = −1.
For ρ = 2 and χ = −1, the binary increasing tree corresponds to another well-known tree: the random binary search tree, an object that gained attention in computer science. In (the original definition of) a binary search tree, each vertex can store a single key and can have at most two children. The keys can be thought of as i.i.d. uniform random variables on [0, 1] (this is a representation used by Devroye in [15]). Initially, the first key K_1 arrives and is placed at the root. This makes the root a full vertex. Upon filling, every vertex creates two potential vertices, one on the left and one on the right, that can receive a key each. These potential vertices do not count as part of the tree yet, only once they contain a key and become full vertices. After the tree has i keys, the (i + 1)-st key K_{i+1} arrives and is compared to the key in the root. If K_{i+1} < K_1, it is pushed to the left (otherwise to the right). Then it is compared to the key occupying the vertex that is the left (resp. right) child of the root, and again pushed left (resp. right) if it is smaller (resp. larger) than the key in that vertex. The procedure continues until the key finds a potential vertex and occupies it. Since only the permutation of the keys matters, it can be shown that when the tree has i full vertices, and hence i + 1 potential vertices, the (i + 1)-st vertex is equally likely to be placed at any of these potential vertices. Hence, the probability that a full vertex v ≤ i with degree deg_i(v) gets a new child in step i + 1 is (2 − deg_i(v))/(i + 1), and we get back ρ = 2, χ = −1.
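The key-insertion procedure just described can be sketched directly (a minimal illustration with our own naming, representing each vertex as a dict with `key`, `left`, `right`; the two `None` slots play the role of the potential vertices):

```python
def binary_search_tree(keys):
    # Insert the keys one by one: each key walks down from the root,
    # going left if it is smaller than the key at the current vertex and
    # right otherwise, until it occupies a free (potential) vertex.
    it = iter(keys)
    root = {'key': next(it), 'left': None, 'right': None}
    for k in it:
        node = root
        while True:
            side = 'left' if k < node['key'] else 'right'
            if node[side] is None:
                node[side] = {'key': k, 'left': None, 'right': None}
                break
            node = node[side]
    return root
```

Feeding in i.i.d. uniform keys (or, equivalently, a uniformly random permutation) produces the random binary search tree.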
A similar construction exists for m > 2, called the m-ary search tree, in which each vertex can store up to m − 1 keys. This tree, however, is not equivalent to the m-ary increasing tree [24], and we do not study it further in this paper. Binary search trees are also the tree representation of the Quicksort algorithm [31]. Many of their properties are well studied, including Laws of Large Numbers and Central Limit Theorems for quantities such as the proportion of k-protected nodes or subtree sizes, see e.g. [15,17].
The random recursive tree is constructed analogously to the previous construction, except there is no dependence on the degree: starting with a single root vertex, the (i + 1)-st vertex attaches by an edge to one of the i already present vertices chosen uniformly at random. This case corresponds to ρ = 1, χ = 0. Random recursive trees have a natural correspondence to binary search trees, and so the two are often treated together [30]. They are also called Yule-trees, due to the fact that they can be naturally embedded in a Yule process, and hence they have connections to phylogenetic trees [7].
The 'usual' linear preferential attachment tree, also called the rich-get-richer tree, is constructed by taking ρ > 0, χ = 1. In this case the (i + 1)-st vertex attaches to v ≤ i with probability proportional to ρ + deg_i(v). The ρ = χ = 1 case corresponds to the positive linear preferential attachment tree, which was informally introduced by Barabási and Albert [5], although they allowed general graphs, not only trees. This is the model that produces power-law degree distributions [9]; see also van der Hofstad [21] and the survey [24]. Positive linear preferential attachment trees have already been studied in the context of source location [28], with the difference that the authors of [28] consider snapshot-based source location, while the MD is connected to sensor-based source location [43].
The survey [24] gives an excellent overview of the literature on various properties of all these growing trees, hence we refer the reader there for further literature.
We mention that our method provides an almost sure LLN for a much larger class of random growing trees: the class of trees that can be embedded in a Crump-Mode-Jagers branching process with finite Malthusian parameter, e.g. sub-linear preferential attachment trees, m-ary search trees, fragmentation trees, etc. We refer the reader to the survey of Janson and Holmgren [24] for various classes of such trees.
Our second result is motivated by reproducing the LLN of the metric dimension of uniform random trees [33]. A uniform random tree on n vertices is a tree chosen uniformly at random (u.a.r.) from the $n^{n-2}$ possible labeled trees on n vertices. As mentioned before, the LLN and even a CLT for the MD of uniform random trees were proved in [33] using analytic combinatorics. We are able to reproduce the LLN result with a very short proof, and in greater generality.
Namely, a uniform random tree has the same distribution as a Galton-Watson branching process with Poisson offspring distribution with mean 1, conditioned to have total progeny n, see e.g. [21, Proof of Theorem 3.17]. Hence it is equivalent to determine the MD of conditioned Galton-Watson trees.
A Galton-Watson tree is a random tree defined by the offspring distribution ξ taking values in N = {0, 1, . . . }. Initially a single individual (vertex) is born, which becomes the root of the tree, and the root gives rise to ξ children. Thereafter, each newly born individual samples its own independent copy of ξ and gives rise to that many new children, and the process continues recursively. We consider Galton-Watson trees conditioned to have n vertices, so we must assume that P(ξ = 0) > 0, since otherwise the process never ends. We will assume that the Galton-Watson trees are critical, i.e., E[ξ] = 1, which is also fairly natural for conditioned Galton-Watson trees (see Remark 3.1 of [27]), since in this case a non-trivial limiting measure on trees exists (called the incipient infinite tree).
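A conditioned Galton-Watson tree can be sampled naively by rejection: grow unconditioned trees and keep the first one whose total progeny is exactly n. The sketch below (our own illustrative functions; in the critical, finite-variance case the acceptance probability decays like n^{-3/2}, so this is only practical for moderate n) uses the Poisson(1) offspring relevant for uniform random trees.

```python
import math
import random

def poisson1(rng):
    # Knuth's multiplicative method for sampling Poisson(1).
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def conditioned_gw_tree(n, offspring, rng, max_tries=10**6):
    # Rejection sampling: grow unconditioned Galton-Watson trees
    # breadth-first, abort as soon as one exceeds n vertices, and keep
    # the first tree with total progeny exactly n.
    for _ in range(max_tries):
        parent = [None]          # vertex 0 is the root
        queue = [0]              # vertices whose children are not yet sampled
        too_big = False
        while queue:
            v = queue.pop()
            for _ in range(offspring(rng)):
                parent.append(v)
                queue.append(len(parent) - 1)
            if len(parent) > n:
                too_big = True
                break
        if not too_big and len(parent) == n:
            return parent
    raise RuntimeError("no tree of the requested size found")
```

Each unconditioned realization is accepted if and only if its total progeny is exactly n, so the accepted tree has the conditioned distribution.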
where $G_\xi(x) = \sum_{n=0}^{\infty} p_n x^n$ is the probability generating function of ξ evaluated at x. As a corollary of this theorem, by substituting ξ = Poi(1), we recover the result of [33] on uniform random trees.
Corollary 1.1. The metric dimension of a uniform random tree U_n on n vertices satisfies the following Law of Large Numbers:

Methodology. The metric dimension of a given fixed tree can be computed explicitly using the number of leaves of the tree and the number of exterior major vertices, i.e., vertices of degree at least 3 that have a line-graph leading to a leaf, see Theorem 2.1 below.
The novel insight in our proofs is that both the asymptotic proportion of leaves and that of exterior major vertices of random trees T can be computed using results from the fringe tree literature initiated by Aldous in [1]. A fringe tree of a rooted tree, in plain words, is the random subtree obtained by choosing a vertex u.a.r. in the tree and taking its subtree pointing away from the root. The distribution of fringe trees has been shown to converge for a large class of trees. Thus, fringe trees of a rooted tree T help us compute the asymptotic proportion of vertices v in T that have a certain property P, with the limitation that P must be a subtree property. A subtree property is any property that depends only on the subtree of T rooted at v pointing away from the root. It is easy to see that being a leaf is a subtree property. While strictly speaking being an exterior major vertex is not a subtree property, we find a subtree property that serves as a good proxy.
The fringe-tree methodology allows us to use probabilistic arguments that are often much shorter than the analytic-combinatorial arguments used in [33]: the proportion of fringe trees satisfying a given subtree property converges. Moreover, since the fringe distributions of several general random tree families are known [24,27], our proofs are quite general. Our results hold for critical Galton-Watson trees with an offspring distribution of finite variance (which includes, among others, uniform random trees, Motzkin trees, random binary trees) and all linear preferential attachment trees (which includes, among others, binary search trees, random recursive trees, positive linear preferential attachment trees) [16].
The fringe tree literature also contains CLT type results, which suggests that many of our results in this paper can be extended to a CLT. In particular, a CLT of the metric dimension for binary search trees and uniform recursive trees should be a consequence of the CLT proved in [23]. For the other cases, this is not a trivial extension, and we leave it for future work.
Other contexts. Resolving sets have a wide range of applications, including robot navigation [29,40], computational chemistry [13], network discovery [6] and source detection. In particular, source detection has a large body of literature. From the statistical point of view, motivated by the problem of determining the authors of online viruses, malicious information, and fake news, the seminal work [39] investigated the question: can we locate the source if we only observe the epidemic much later, when it has already infected a large fraction of the population? Various statistical estimators of the source have been developed since, using e.g. belief propagation, subtree ranking, infection eccentricity, rumour centrality, and the minimum description principle [2,10,37,39,45]. These methods use only binary information about the vertices (infected vs not infected at some time t > t_0) as observational input. In an applied setting, a possibly noisy observation of the infection times at a few predetermined sensor vertices might be readily available, and with this extra information we might be able to detect the source by observing only a small subset of the nodes [36,44]. With the exception of the recent work [32], not much is known about the number of required sensors in source detection if the spreading of the epidemic is very noisy. On the other hand, if we assume no noise in the spreading of the epidemic and in the observations, the minimum number of sensors required to perfectly locate the source is equivalent to the MD problem [43].

Figure 1. The average of the normalized MD of 1000 independently simulated random trees with 1000 nodes. Unless they are too small to be visible on the plot, we also show the 95% confidence intervals for the simulation results on top of the bar plots.
If, in addition, the start time of the epidemic (t_0) is unknown, the minimum number of required sensors becomes equivalent to the double metric dimension problem [14]. The algorithmic aspects of the double MD in the source location context were investigated in [12,14], and the double MD of Erdős-Rényi random graphs was computed in [42]. Recently, [35] studied a version of the MD in Erdős-Rényi graphs where the sensors can be placed sequentially, based on the observation times of previously placed sensors.
Organisation of the rest of the paper. In Section 2 we define the notions we use precisely, and we give the formula for the constant c^{(ρ,χ)} for general linear preferential attachment trees. In Section 3 we explain the general methodological background: embedding discrete trees in (continuous time) Crump-Mode-Jagers trees, fringe trees, and subtree properties. In Section 4 we prove our results.

Definitions and numerical values for c^{(ρ,χ)}
We start by giving a formal definition of the metric dimension.
Definition 2.1 (MD). Let G = (V, E) be a simple connected graph, and let us denote by d(v, w) ∈ N the length of the shortest path (that is, the number of edges) between nodes v and w. A set R ⊆ V is a resolving set of G if for every pair of distinct nodes v, w ∈ V there is a node r ∈ R with d(r, v) ≠ d(r, w). The metric dimension β(G) is the minimal size of a resolving set of G.

The next definition helps us express the MD of fixed trees explicitly.

Definition 2.2 (Leaves and exterior major nodes). Let us denote by deg(v) the degree of a node v ∈ V. We say that a node v ∈ V is a leaf if deg(v) = 1, and a major node if deg(v) ≥ 3. If a major node v ∈ V has a path to a leaf that contains only degree-two vertices besides the beginning and the end of the path (i.e., a line-graph), we say that v is an exterior major node. Let us denote the set of leaves of G by L(G) and the set of exterior major nodes of G by K(G).
The following theorem characterises the metric dimension of a fixed tree.
Theorem 2.1 (Metric dimension of trees [41]). Consider a fixed tree T. If T is a path graph, then β(T) = 1. Otherwise, β(T) = |L(T)| − |K(T)|.

We refer the reader to [41] for a proper proof, but we explain the formula heuristically. It is not hard to see that if two or more leaves are attached to a major node by line-graphs, then the vertices at equal distance from the major node on these lines are indistinguishable by sensors that do not fall into these lines. Hence, all but one of the terminal leaves of such lines have to be sensors.
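The leaves-minus-exterior-major-vertices formula for trees is straightforward to implement (an illustrative sketch with our own function name; each exterior major vertex is found by walking from a leaf through degree-2 vertices):

```python
def tree_metric_dimension(adj):
    # For a tree given as an adjacency-list dict:
    #   beta(T) = 1 if T is a path,
    #   beta(T) = (#leaves) - (#exterior major vertices) otherwise.
    if all(len(nbrs) <= 2 for nbrs in adj.values()):
        return 1                                   # path graph
    leaves = [v for v in adj if len(adj[v]) == 1]
    exterior_major = set()
    for leaf in leaves:
        # Walk from the leaf through degree-2 vertices; since T is not a
        # path, the walk ends at a vertex of degree >= 3, which is an
        # exterior major vertex.
        prev, cur = leaf, adj[leaf][0]
        while len(adj[cur]) == 2:
            prev, cur = cur, next(w for w in adj[cur] if w != prev)
        exterior_major.add(cur)
    return len(leaves) - len(exterior_major)
```

On the star with three leaves this gives 3 − 1 = 2, and the same value on a caterpillar whose three leaf-paths all lead to one major vertex.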
Now we state our more detailed results about families of trees growing according to general linear preferential attachment schemes; that is, we refine Theorem 1.1 and express the limiting constant c^{(ρ,χ)} of the MD explicitly. Some of the numerical values obtained from Theorems 2.2, 2.3 and 2.4 below are shown in Figure 1, along with numerical approximations given by computer simulations.
Random binary search tree and m-ary increasing trees. Recall that an m-ary increasing tree is equivalent to a general linear preferential attachment tree with ρ = m and χ = −1, and that for m = 2, an m-ary increasing tree is equivalent to a random binary search tree.
We write $\gamma(a, x) = \int_0^x t^{a-1} e^{-t}\,\mathrm{d}t$ for the lower incomplete gamma function, and $\binom{x}{k} = x(x-1)\cdots(x-k+1)/k!$ for the generalized binomial coefficient.
Theorem 2.2 (MD of m-ary increasing trees). Let $(T^{(m,-1)}_n)_{n\ge 1}$ be a growing sequence of random m-ary increasing trees with n vertices. Then, where for all $(i, j) \in \mathbb{N}^2$ with $i + j \le m$ and $(i, j) \ne (1, m - 1)$, In particular, for the binary search tree (m = 2), this expression evaluates to

We provide two proofs of this theorem for m = 2 below in Section 4: a combinatorial proof and a probabilistic proof. The probabilistic proof is more robust, and we are able to generalise it to m > 2 and other types of attachment rules.
Random recursive tree. As mentioned in the introduction, a random recursive tree is constructed by attaching each new node uniformly at random to one of the existing nodes. It is also a special case of a general linear preferential attachment tree, with parameters ρ = 1, χ = 0.

Rich-get-richer trees. Theorems 2.2 and 2.3 covered general linear preferential attachment trees with χ ∈ {−1, 0}. In the next theorem it suffices to state the result for χ = 1. These trees are often called rich-get-richer trees, as new nodes are more likely to attach to nodes with higher degrees.
Theorem 2.4 (MD of rich-get-richer trees). Let $(T^{(\rho,1)}_n)_{n\ge 1}$ be a sequence of linear preferential attachment trees with n nodes and χ = 1, ρ > 0. Then,

The ρ = χ = 1 case corresponds to the positive linear preferential attachment tree, introduced by [5]. For positive linear preferential attachment trees we can use Theorem 2.4 and numerical integration software [25] to obtain the following result.

Method and discussion
In this section we introduce fringe trees and general results on their convergence, we explain the embedding of trees growing in discrete time into Crump-Mode-Jagers branching processes, and we relate the metric dimension to subtree properties.
3.1. Fringe trees. For the rest of the paper, all trees T are considered to be rooted, which simply means that they have a special vertex denoted by root(T). In rooted trees, every vertex v ∈ T \ {root(T)} has a parent, which is the first vertex on the path from v to root(T). For any vertex v ∈ T, let T_v be the subtree of T rooted at v, that is, the connected subtree of T that contains v after removing the parent of v (as a special case, T_{root(T)} = T). If we sample v uniformly at random from T, we say that the random tree T_v is a random fringe tree of T. When T is a deterministic tree, this definition is quite straightforward. However, we are interested in the case when T itself is random, and in this case defining random fringe trees requires more care. Let n_S(T) denote the number of vertices v ∈ T whose subtree T_v equals a given rooted tree S. When T is deterministic, n_S(T)/|T| defines the random fringe tree distribution. When T is random, we can think of the sampling of T and v as a combined random event, which again gives rise to a distribution over trees; this is called the annealed fringe tree distribution. In this paper, we are interested in the quenched fringe tree distribution: in the quenched version, we think of n_S(T)/|T| as a distribution that is itself random. Since we are interested in the convergence of fringe tree distributions as the size of the trees tends to infinity, we focus on the convergence of the random variables n_S(T_n)/|T_n| (almost surely (a.s.) or in probability (p)).
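For a tree stored as a parent array with increasing labels (as produced by any of the growth processes above), the fringe subtree sizes, and hence the empirical fringe distribution aggregated by subtree size, can be computed in one backward pass. A small illustrative sketch (function names are ours):

```python
from collections import Counter

def fringe_subtree_sizes(parent):
    # size[v] = |T_v|, the size of the fringe subtree hanging below v.
    # Assumes parent[0] is None and parent[v] < v for v >= 1, so a single
    # backward pass accumulates child sizes into parents.
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):
        size[parent[v]] += size[v]
    return size

def empirical_fringe_distribution(parent):
    # The quenched fringe-tree distribution aggregated by subtree size:
    # the law of |T_v| for a uniformly random vertex v of this tree.
    size = fringe_subtree_sizes(parent)
    n = len(size)
    return {k: c / n for k, c in Counter(size).items()}
```

For example, on a path rooted at one end every vertex has a distinct fringe size, while on a star three quarters of the vertices are leaves (fringe size 1).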
We also define the seemingly more general quantity n_P(T), the number of vertices v ∈ T whose subtree T_v has the subtree property P. In our applications, whenever we can say something about the convergence of n_S(T_n)/|T_n|, we have a similar result for n_P(T_n)/|T_n|. In fact, since working with subtree properties will be very convenient for computing the MD (see Lemma 3.2), we only state the results from the fringe tree literature for n_P(T).

Theorem 3.1 ([27]). Let GW_n be a sequence of Galton-Watson trees conditioned to have n vertices, with offspring distribution ξ, where E[ξ] = 1 and E[ξ²] < ∞. Let F be the unconditioned Galton-Watson tree with the same offspring distribution. Then, for every subtree property P,

$$\frac{n_{\mathcal{P}}(GW_n)}{n} \xrightarrow{\;p\;} \mathbb{P}(F \in \mathcal{P}).$$
The previous theorem applies to any Galton-Watson tree with E[ξ] = 1 and E[ξ²] < ∞. The next theorem only applies to a single family of growing trees, the binary search tree. We will use it to give a combinatorial proof of the LLN of the MD of binary search trees (the second part of Theorem 2.2).
Theorem 3.2. Let $(T^{(2,-1)}_n)_{n\ge 1}$ be a growing sequence of binary search trees of size n. Then, for every subtree property P,

$$\frac{n_{\mathcal{P}}(T^{(2,-1)}_n)}{n} \xrightarrow{\;p\;} \sum_{k=1}^{\infty} \frac{2}{(k+1)(k+2)}\, \mathbb{P}\big(T^{(2,-1)}_k \in \mathcal{P}\big).$$

In words, this theorem says that the fringe-tree distribution of a random binary search tree is again a random binary search tree with a random size: the probability that the size of the fringe tree is k is 2/((k + 1)(k + 2)). A similar statement can be made for random recursive trees; however, we do not include it, as it will not be used in our proofs. Instead we introduce a more powerful theorem which will help us strengthen the convergence to almost sure and treat m-ary increasing trees for general m ≥ 2, random recursive trees, and linear preferential attachment trees.

3.2.
Crump-Mode-Jagers trees and fringe trees. A Crump-Mode-Jagers (CMJ) branching process generalizes, among many other random tree models, m-ary increasing trees and random recursive trees. Heuristically speaking, CMJ branching processes provide a method of embedding trees growing in discrete steps into a corresponding continuous time process. The CMJ process is defined by a point process Ξ = (ξ_1, ξ_2, . . . ), called the reproduction process. At time zero, a single vertex is born, which becomes the root of the tree, and the children of the root are born at times ξ_1, ξ_2, . . . . Similarly, each vertex v born at time t_v has an independent copy of Ξ denoted by Ξ_v = (ξ_{v,1}, ξ_{v,2}, . . . ), and the offspring of v are born at times t_v + ξ_{v,1}, t_v + ξ_{v,2}, . . . . So far we have defined a branching process that grows over time. We obtain a random tree from this branching process by stopping the process at a time τ and taking only the vertices (individuals) that have already been born. The stopping time τ can depend on the tree (very often τ is the time the n-th individual is born), or it can be an independent random variable.
Lemma 3.1. A CMJ tree with a linear preferential attachment reproduction process Σ ρ,χ stopped when it reaches n vertices has the same distribution as a linear preferential attachment tree with n vertices and parameters ρ and χ.
This lemma is due to the memoryless property of exponential random variables; the proof can be found in [24, Sections 6.3, 6.4].
The interesting property of CMJ trees is that the fringe tree distribution of a random CMJ tree stopped at n vertices is again a random CMJ tree, with the same reproduction process, stopped at a random time that is independent of the number of vertices. This independence of the stopping time will be heavily exploited in our proofs. In this paper, we only use the results on the fringe trees of linear preferential attachment trees; we refer to [24] for the general statement on CMJ trees.

Theorem 3.3 ([26,34], Theorem 5.14 of [24]). Let $(T^{(\rho,\chi)}_n)_{n\ge 1}$ be a growing sequence of linear preferential attachment trees with n vertices and parameters ρ > 0 and χ ∈ {−1, 0, 1}. Let F be the corresponding CMJ tree stopped at an independent Exp(ρ + χ)-distributed random time. Then, for every subtree property P,

$$\frac{n_{\mathcal{P}}(T^{(\rho,\chi)}_n)}{n} \xrightarrow{\;a.s.\;} \mathbb{P}(F \in \mathcal{P}).$$

3.3. Expressing the metric dimension with subtree properties. In this section we reduce the metric dimension of trees to counting subtrees with certain properties. Recall Theorem 2.1, which expresses the MD of a tree as the difference between the number of leaves and that of exterior major vertices.
Definition 3.3. Let P_L be the subtree property that the subtree is a single vertex, that is, a leaf. Let P_K be the subtree property that the root has degree at least two and at least one of its subtrees is a line-graph to a leaf (a single vertex is considered to be a line).

Figure 3 shows the smallest trees where n_{P_K}(T) − |K(T)| = ±1, respectively. The inequality n_{P_K}(T) − |K(T)| > 0 holds only for trees in which the root has degree 2 and the root has a line-graph to a leaf. In this case the root has property P_K, but it does not count towards K(T) since it has degree 2. The inequality n_{P_K}(T) − |K(T)| < 0 holds only for trees in which the root has degree 1, and the first descendant of the root with degree 3 (node v) has no other line-graph to a leaf. In this case v counts towards K(T), but it does not have property P_K.
Proof. We are going to show the equivalent statement that for any deterministic rooted tree T, we must have n_{P_L}(T) = |L(T)| and |n_{P_K}(T) − |K(T)|| ≤ 1. The equality n_{P_L}(T) = |L(T)| follows from the definition. Next we show that |n_{P_K}(T) − |K(T)|| ≤ 1 (see also Figure 3).
If v ∈ V is not the root of T, then T_v ∈ P_K implies v ∈ K(T). This is because v must have at least two children by the property P_K, and a parent vertex since v is not the root, which means that v has degree at least three. By the definition of P_K, T_v contains a line-graph to a leaf. Hence v ∈ K(T), and n_{P_K}(T) ≤ |K(T)| + 1. For the other direction, we argue that v ∈ K(T) implies T_v ∈ P_K, except for at most one vertex v ∈ V. This is because v has degree at least three by the exterior major vertex property, two of which must be the children of v in T_v. Moreover, the path of degree-two vertices to a leaf ensured by the exterior major vertex property must be a subtree that is a path in T_v, unless the path of degree-two vertices to a leaf goes through the parent of v. This can only happen if all ancestors of v have degree two, the root(T) has degree one or two, and, if root(T) has another subtree that does not contain v, this subtree is a line-graph. In other words, root(T) can have only one subtree with a major vertex, and v must be the first major vertex on this subtree, if such a v exists. Hence |K(T)| − 1 ≤ n_{P_K}(T).
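The proxy count n_{P_K}(T) is easy to compute from a parent array, which also lets one verify the two extremal examples discussed above numerically (a sketch with our own naming):

```python
def count_with_property_PK(parent):
    # n_{P_K}(T): number of vertices whose fringe subtree has >= 2
    # children at its root and at least one child subtree that is a
    # line-graph (a single vertex counts as a line).
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)

    def is_line(v):
        # T_v is a line-graph iff no vertex on the way down branches.
        while children[v]:
            if len(children[v]) > 1:
                return False
            v = children[v][0]
        return True

    return sum(
        1 for v in range(n)
        if len(children[v]) >= 2 and any(is_line(c) for c in children[v])
    )
```

The first test tree has a degree-2 root with a line to a leaf, so the root is counted by P_K although K(T) is empty; in the second, the first degree-3 descendant of the degree-1 root reaches a leaf only through its parent, so it lies in K(T) but fails P_K.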
In all of our proofs we will combine Lemma 3.2 with either Theorem 3.1, 3.2 or 3.3. Since P(F ∈ P_L) is an easy computation in all cases, most of the difficulty will come from computing P(F ∈ P_K), where F is a random tree having the limiting fringe tree distribution (see formulas (12), (13) and (15)). To compute P(F ∈ P_K), it will often be useful to condition on the degree of the root of F and on another event E, which will be the ringing time of the doomsday clock Exp(ρ + χ) in Theorem 3.3. Recall that for any non-negative discrete random variable Y we denote by $G_Y(x) = \mathbb{E}[x^Y]$ the probability generating function of Y evaluated at x.

Lemma 3.3. Let κ be the degree of root(F). If v is an offspring of root(F), let B_v be the event that F_v is a line-graph. Suppose that for some event E the indicators of B_v, conditioned on κ and E, are independent and identically distributed Bernoulli random variables with parameter q. Then,

$$\mathbb{P}(F \in P_K \mid E) = \mathbb{P}(\kappa \ge 2 \mid E) - \Big( G_{\kappa|E}(1-q) - \mathbb{P}(\kappa = 0 \mid E) - (1-q)\,\mathbb{P}(\kappa = 1 \mid E) \Big). \tag{17}$$

Proof. Let A be the event that the root has at least two offspring, and B_i the event that the root of the i-th subtree is born and the subtree is a line-graph. Let us denote the event B := ∪_{i≥1} B_i. By definition, the event {F ∈ P_K} = A ∩ B. Then we can write

$$\mathbb{P}(F \in P_K \mid E) = \sum_{k \ge 2} \mathbb{P}(\kappa = k \mid E)\,\big(1 - (1-q)^k\big),$$

where the last line follows since we assumed that the B_i are independent Ber(q) conditioned on κ and E. Noticing that the last sum involves the generating function of (κ | E) evaluated at 1 − q, except that the index starts from two instead of zero, we obtain (17).

Remark 3.1. If we were interested in simply exterior vertices, using the same ideas, the expression in equation (17) would simplify to 1 − G_{κ|E}(1 − q).

Proofs
In this section we prove Theorems 1.2 and 2.2–2.4.

Metric dimension of conditioned Galton-Watson trees.
Proof of Theorem 1.2. Combining Lemma 3.2 and Theorem 3.1, we have that

$$\frac{\beta(GW_n)}{n} \xrightarrow{\;p\;} \mathbb{P}(F \in P_L) - \mathbb{P}(F \in P_K),$$

where F is a Galton-Watson tree with offspring distribution ξ. Clearly, P(F ∈ P_L) = p_0. It remains to compute P(F ∈ P_K). Since the subtrees of each offspring in a Galton-Watson tree are independent, the conditions of Lemma 3.3 are satisfied without conditioning.
We still need to find the value of q = P(B_v), which is the probability that F_v is a line-graph, since the subtree F_v is independent of the degree of the root of F. Vertex v can have (i) zero offspring, in which case F_v is a (trivial) line-graph; (ii) one offspring, in which case F_v is a line with probability q; or (iii) more than one offspring, in which case F_v is not a line. Hence, we have the equation q = p_0 + p_1 q, which gives q = p_0/(1 − p_1). Substituting equation (17) into equation (20) with q = p_0/(1 − p_1), we obtain the desired result.
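The fixed-point value q = p_0/(1 − p_1) is easy to sanity-check by simulation: to decide whether an unconditioned Galton-Watson tree is a line-graph, one only needs to follow its spine until a vertex has zero children (a line) or at least two (not a line). A quick Monte Carlo sketch (our own illustrative functions), using the Poisson(1) offspring where p_0 = p_1 = e^{-1}:

```python
import math
import random

def line_prob_fixed_point(p0, p1):
    # Solve q = p0 + p1 * q for the probability that an unconditioned
    # Galton-Watson tree is a line-graph.
    return p0 / (1.0 - p1)

def is_line_gw(offspring, rng, cap=10**4):
    # Follow the spine: zero children ends the line, >= 2 children
    # breaks the line-graph property immediately.
    for _ in range(cap):
        k = offspring(rng)
        if k == 0:
            return True
        if k >= 2:
            return False
    return True   # spine longer than `cap`: probability p1^cap, negligible

def poisson1(rng):
    # Knuth's multiplicative method for sampling Poisson(1).
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1
```

For Poisson(1) the fixed point is q = e^{-1}/(1 − e^{-1}) ≈ 0.582, and the empirical frequency matches it closely.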

Metric dimension of binary search trees (combinatorial proof ).
Proof of Theorem 2.2, m = 2. Combining Lemma 3.2 and Theorem 3.2, we obtain that

$$\frac{\beta(T^{(2,-1)}_n)}{n} \xrightarrow{\;p\;} \sum_{k=1}^{\infty} \frac{2}{(k+1)(k+2)} \Big( \mathbb{P}\big(T^{(2,-1)}_k \in P_L\big) - \mathbb{P}\big(T^{(2,-1)}_k \in P_K\big) \Big). \tag{22}$$

Clearly $\mathbb{P}(T^{(2,-1)}_k \in P_L)$ equals 1 for k = 1 and 0 for k > 1, which implies that the first term in equation (22) is 1/3. It remains to compute the second term in equation (22). Recall full and potential vertices from the description of binary search trees on page 2. Let k′ = k − 1 and let S_k ∈ {0, . . . , k′} be the number of (full) vertices in the left subtree when the tree has k (full) vertices. Notice that the numbers of potential vertices in the left and right subtrees follow a Pólya urn process with two urns, initially containing a single white and a single black ball, and that the number of full vertices is always one less than the number of potential vertices in each subtree. An elementary calculation using induction shows that S_k is then uniform over the set {0, . . . , k′}, or in other words, P(S_k = ℓ) = 1/(k′ + 1), see e.g. [22, Theorems 5.2, 5.3].
Since S_k ∈ {0, k′} implies that the root has degree less than two, P(T_k ∈ P_K | S_k ∈ {0, k′}) = 0. By the law of total probability, we obtain equation (23). Now we focus on the second condition of P_K, the existence of a subtree that is a line. If a subtree has ℓ vertices, we argue that the probability that it is a line is 2^{ℓ−1}/ℓ!. Indeed, if the subtree has just one or two vertices, it must be a line. Thereafter, conditioned on the subtree being a line after i − 1 vertices have been placed, when we place the i-th vertex into the subtree, we have to sample from i possible places, only two of which keep the subtree a line, namely the two children of the last vertex on the line. Here we use that the placement of vertices in the binary search tree is uniform over the possible locations, and conditioned on the vertex falling into the left (resp. right) subtree, its placement is uniform over the available locations within this subtree.
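The growth rule described above gives the recursion p(ℓ) = (2/ℓ) · p(ℓ − 1) for ℓ ≥ 3 with p(1) = p(2) = 1, which solves to p(ℓ) = 2^{ℓ−1}/ℓ!. As a sketch (our own helper names; ℓ = 4 chosen for illustration), we can check this against a direct simulation of binary search trees built from uniform random permutations:

```python
import random
from math import factorial

def bst_is_path(perm):
    """Insert the keys of perm in order into a binary search tree;
    return True iff every node has at most one child (a line graph)."""
    root = perm[0]
    children = {root: [None, None]}         # key -> [left child, right child]
    for key in perm[1:]:
        node = root
        while True:
            side = 0 if key < node else 1
            nxt = children[node][side]
            if nxt is None:
                children[node][side] = key
                children[key] = [None, None]
                break
            node = nxt
    return all(l is None or r is None for l, r in children.values())

rng = random.Random(42)
ell, n = 4, 100_000
hits = sum(bst_is_path(rng.sample(range(ell), ell)) for _ in range(n))
p_hat = hits / n
p_theory = 2 ** (ell - 1) / factorial(ell)  # 1/3 for ell = 4
```

This uses the standard fact that a BST built from a uniform permutation places each new key in a uniformly random available position, exactly the growth rule in the argument above.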
To compute the probability that at least one of the subtrees is a line we apply an elementary inclusion-exclusion argument. For 1 ≤ ℓ ≤ k′ − 1, we have where in the last term we used that, conditioned on their sizes, the left and right subtrees evolve independently. Substituting the right-hand side back into equation (23), using basic identities of binomial coefficients, and recalling that k′ = k − 1, we obtain Substituting (25) into (23) and then into (22), we obtain (with k′ = k − 1) Getting a closed-form expression requires one more step. Since the sum ∑_{k=ℓ+2}^∞ 2/(k(k+1)(k+2)) is also bounded, we can swap the order of the sums to get the easier expression.
The sum ∑_{k=ℓ+2}^∞ 2/(k(k+1)(k+2)) can be evaluated by elementary arithmetic operations and a telescoping sum. Indeed, Substituting back into equation (27), elementary arithmetic operations give The last equality follows once we notice that the sum we are subtracting is the same as the sum we are subtracting from, except shifted by one index; hence the result of the subtraction is simply the first term of the sum. A similar computation yields the following equalities, Finally, substituting into equation (26) we obtain which is the desired result.
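The telescoping rests on the partial-fraction identity 2/(k(k+1)(k+2)) = 1/(k(k+1)) − 1/((k+1)(k+2)), so the tail sum from k = ℓ + 2 collapses to its first term 1/((ℓ+2)(ℓ+3)). This can be verified exactly with rational arithmetic; a minimal sketch with our own helper names:

```python
from fractions import Fraction

def partial_sum(ell, N):
    """Exact partial sum of 2/(k(k+1)(k+2)) over k = ell+2, ..., N."""
    return sum(Fraction(2, k * (k + 1) * (k + 2))
               for k in range(ell + 2, N + 1))

# telescoping: 2/(k(k+1)(k+2)) = 1/(k(k+1)) - 1/((k+1)(k+2)),
# so the partial sum equals 1/((ell+2)(ell+3)) - 1/((N+1)(N+2)) exactly.
for ell in range(1, 6):
    N = 500
    closed = (Fraction(1, (ell + 2) * (ell + 3))
              - Fraction(1, (N + 1) * (N + 2)))
    assert partial_sum(ell, N) == closed
```

Letting N → ∞ the correction term vanishes, recovering the value 1/((ℓ+2)(ℓ+3)) of the infinite tail.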

4.3. Metric dimension of general linear preferential attachment trees (proof using fringe trees). In this section we prove Theorems 2.2, 2.3 and 2.4. First, we state a few preliminary lemmas. We handle all values of (ρ, χ) together until the last step, when we obtain the numerical values. Recall that Lemma 3.1 gives an embedding of (T^{(ρ,χ)}_n)_{n≥1} into a Crump-Mode-Jagers process with reproduction function Σ_{ρ,χ} given in Definition 3.2. Combining Lemma 3.2 and Theorem 3.3, we have that where F is a CMJ tree with offspring point process Σ_{ρ,χ} stopped at a random time τ ∼ Exp(ρ + χ). By Definition 3.2, the time of the first offspring of the root of F is an Exp(ρ) random variable. To find P(F ∈ P_L) we need to compute the probability that the doomsday clock Exp(ρ + χ) rings before the first-offspring clock Exp(ρ). Hence, Next, we check that the conditions of Lemma 3.3 are satisfied, which will help us find P(F ∈ P_K). Let Σ_{ρ,χ} = (ξ_1, ξ_2, …) be a linear preferential attachment reproduction process as described in Definition 3.2. We will apply the law of total probability with respect to the ringing time of the doomsday clock τ. So, for an infinitesimal dx, let E_x := {τ ∈ (x, x + dx)} be the event that the doomsday clock τ rings in the interval (x, x + dx). Recall that we write κ for the number of children of the root in the limiting fringe tree F.
Lemma 4.1. Conditioned on the event E_x and on κ = k, the birth times ξ_1, …, ξ_k of the children of the root are i.i.d. random variables with density g_x(y) = e^{χy}/Z_g(x) supported on the interval [0, x], with Z_g(x) = ∫_0^x e^{χy} dy. This statement is commonly known for χ = 0, when Σ_{ρ,0} is a Poisson point process (PPP) on ℝ_+ with intensity ρ. In this case, the lemma states that, conditioned on the event that Σ_{ρ,0} has k points in the interval [0, x], the locations of these points have the same distribution as that of k i.i.d. uniform random variables on [0, x].
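The χ = 0 special case is easy to probe numerically: generate a PPP from exponential inter-arrival times, condition on the number of points in [0, x], and check that the conditional locations behave like i.i.d. uniforms. A minimal sketch, with our own helper names and parameters chosen only for illustration:

```python
import random

def ppp_points(rate, x, rng):
    """Points of a Poisson point process of the given rate on [0, x],
    generated from i.i.d. exponential inter-arrival times."""
    pts, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > x:
            return pts
        pts.append(t)

rng = random.Random(1)
rate, x, k = 2.0, 1.0, 2
total, count = 0.0, 0
for _ in range(400_000):
    pts = ppp_points(rate, x, rng)
    if len(pts) == k:          # condition on exactly k points in [0, x]
        total += sum(pts)
        count += 1
mean_sum = total / count
# conditioned on k points, each is uniform on [0, x]: E[sum] = k * x / 2
```

Here only the first moment is checked; the same conditioning experiment can be used to compare the full empirical distribution against the uniform one.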
Proof of Lemma 4.1. Recall the distribution of the consecutive birth times from Definition 3.2. Conditioned on E_x, the density that the fringe-root has k children, born precisely at the ordered times r := (r_1, r_2, …, r_k), while the (k + 1)-st child satisfies r_{k+1} > x, is: Observing that the coefficient of r_j in the exponent is χ for every j, we see that where Z_{f_o}(x) = e^{−(ρ+χk)x}/∏_{i=0}^{k−1}(ρ + iχ) is the normalizing factor independent of r (as long as r is really an ordered sequence; otherwise f_o(k, r_1, …, r_k | x) = 0). However, we are not interested in the density of the ordered set of times. The unordered set of times {ξ_1, …, ξ_κ} has density by the symmetry over the possible permutations of r_1, …, r_k. Conditioning on k, by Bayes' rule we know that where Z_{f_u}(x) is the appropriate normalizing factor independent of {r_1, …, r_k}, namely Z_{f_u}(x) = Z_g(x)^k. Since the density f_u(r_1, …, r_k | E_x, κ = k) is a product of the densities g_x(r_i), the random variables {ξ_1, …, ξ_κ} must be i.i.d. with density g_x(y).
The implication of this lemma is that, conditioned on E_x and κ = k, the k subtrees of the fringe root are born independently at times following the density g_x(y), and evolve independently. Consequently, we can apply Lemma 3.3, and we proceed to computing the terms that appear in (17). Some of these terms can be deduced directly from a result of [24]; we refer the reader to [24] for a proof. The last unknown quantity that we need in order to apply Lemma 3.3 is q = P(B_v | κ = k, E_x), the probability that the subtree F_v of a child v of root(F) is a line graph. We condition on the doomsday clock ringing at time x (this is the event E_x). Since we assumed that v = v_0 is an offspring of the root, and v is alive before time x, by Lemma 4.1 the random variable τ_{v_0} has density g_x(y) defined in equation (34). By definition, the event B_v holds if none of the v_j have two offspring before time x; hence, we must find q = P(τ_2 > x). To describe τ_2, the following definition will be useful.
Definition 4.2. Consider a Poisson point process Π := {0 = π_0, π_1, π_2, …} on ℝ_+ with intensity λ ∈ ℝ_+, and let (Y_j)_{j≥1} be an independent collection of exponential variables, independent of Π, with Y_j having parameter jν ∈ ℝ_+. Let ζ := min{j : Y_j ≤ π_{j+1} − π_j}. Then, the exponential random variable with Poisson increasing rate is H_{λ,ν} := π_ζ + Y_ζ. Due to the memoryless property of exponential variables, we can think of H_{λ,ν} as a single exponential clock that starts with initial rate 0 at time 0, and every time the governing Poisson point process Π has a new point, the rate of the clock increases by ν. The next lemma relates H_{λ,ν} to τ_2:

Lemma 4.3. Recall that τ_{v_0} has density g_x(y) defined in equation (34), and let H_{ρ,ρ+χ} be an exponential random variable with Poisson increasing rate as defined in Definition 4.2, independent of τ_{v_0}. Then, P(τ_2 > x) = P(H_{ρ,ρ+χ} + τ_{v_0} > x).

Figure 4. Illustration of the proof of Lemma 4.3. Part (a) shows the tree at time t = τ_{v_0}, when only v_0 is born, and part (b) shows the tree at time t = τ_{v_2}, assuming τ_2 > τ_{v_2}. If v_j is the last-born vertex at some time t < τ_2, we have a (grey) exponential clock with intensity ρ governing the Poisson point process τ_{v_1}, τ_{v_2}, …, and j (black) exponential clocks with intensity ρ + χ that govern τ_2. If the grey clock rings, a new (black) exponential clock with intensity ρ + χ appears, and τ_2 is the time when the first black clock rings.
Proof. We are going to show that τ_2 − τ_{v_0} and H_{ρ,ρ+χ} have the same distribution, and that both are independent of τ_{v_0}. First, the independence follows from the fact that the differences between the births of consecutive children in the Crump-Mode-Jagers tree are governed by independent exponential clocks, see Definition 3.2.
Next we show that τ_2 − τ_{v_0} has the same distribution as H_{ρ,ρ+χ}. First we identify the underlying PPP. In the CMJ tree, by Definition 3.2, the first offspring of every vertex is governed by an exponential clock with rate ρ, hence (τ_{v_j} − τ_{v_0})_{j≥0} has the same distribution as (π_j)_{j≥1}, a Poisson point process Π with intensity λ = ρ in Definition 4.2. These first offspring form the line graph emanating from v = v_0, see Figure 4.
The random variable τ_2 is defined as the first time any of the vertices {v_j | j ≥ 0} has degree at least three. The inequality τ_2 > τ_{v_1} holds deterministically, because τ_{v_1} is the first time any vertex (in this case, v_0) can have a second child within the subtree F_{v_0}. This means that until τ_{v_1} = π_1, τ_2 cannot happen. Indeed, ζ = 0 cannot happen, since the rate of the exponential clock Y_0 is 0, hence Y_0 ≤ π_1 − 0 happens with probability 0.
By Definition 3.2 again, the rate of arrival of the second child of any individual is ρ + χ. For j ≥ 1, consider the scenario in which v_0, v_1, …, v_{j−1}, v_j are born and form a line, i.e., they are born, none of them has a second child yet, and v_{j+1} has not been born yet. That is, we look at a time t ∈ (τ_{v_j}, τ_{v_{j+1}}). In this scenario, each of the vertices v_0, v_1, …, v_{j−1} is waiting for its second offspring to be born, hence the arrival of a second offspring is governed by an exponential clock with total rate j(ρ + χ).
In other words, if Y_j ≤ π_{j+1} − π_j, then j is the index of the last vertex v_j born before τ_2, and τ_2 = τ_{v_j} + Y_j. Otherwise, if Y_j > τ_{v_{j+1}} − τ_{v_j}, the value of Y_j is irrelevant: v_{j+1} is born before any of v_0, v_1, …, v_{j−1} has a second child, and the rate of getting a second child somewhere on the present line goes up by ρ + χ, since now v_j is also waiting for its second offspring to be born. By the memoryless property, we can restart the clocks and use new exponential variables for the comparison; hence, we move on to the next index j + 1. The random variable ζ describes the first index j for which Y_j ≤ τ_{v_{j+1}} − τ_{v_j}, which is the first (and only) "relevant" index. Then, τ_2 − τ_{v_0} = π_ζ + Y_ζ = H_{ρ,ρ+χ}, which is precisely what we needed.
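The construction in Definition 4.2 can be simulated directly: walk along the Poisson points, at the j-th point compare Y_j ∼ Exp(jν) with the next inter-arrival gap, and stop the first time Y_j wins, so the clock rings at π_ζ + Y_ζ. As a sanity check (our own helper names; parameters illustrative), for ν ≫ λ the first point essentially always wins the race, so H_{λ,ν} ≈ π_1 ∼ Exp(λ):

```python
import random

def sample_H(lam, nu, rng):
    """Sample H_{lam,nu}: Poisson points pi_1, pi_2, ... with rate lam;
    at step j compare Y_j ~ Exp(j * nu) with the next inter-arrival gap;
    stop at the first j with Y_j <= pi_{j+1} - pi_j."""
    pi_j = rng.expovariate(lam)        # pi_1 (Y_0 has rate 0, so it never wins)
    j = 1
    while True:
        gap = rng.expovariate(lam)     # pi_{j+1} - pi_j
        Yj = rng.expovariate(j * nu)
        if Yj <= gap:
            return pi_j + Yj           # the clock rings at pi_zeta + Y_zeta
        pi_j += gap
        j += 1

rng = random.Random(3)
n = 100_000
mean_H = sum(sample_H(1.0, 1000.0, rng) for _ in range(n)) / n
# for nu >> lam, zeta = 1 almost surely, so the mean is close to 1/lam = 1.0
```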
Although for the proofs we will only need H ρ,ρ+χ , we find the tail distribution of H λ,ν in a general form.
Lemma 4.4. The tail distribution of H_{λ,ν} is given by Proof of Lemma 4.4. Let us condition on the number of points of the Poisson point process π_1, π_2, … before time t, which is a Poisson random variable with mean λt. We have where the expectation is over the random points π_1, …, π_k. By standard properties of the Poisson point process (in the spirit of Lemma 4.1 with ρ = λ and χ = 0), we can sample the points π_1, …, π_k by sampling k points uniformly from the interval [0, t] and then indexing them such that π_1 < ⋯ < π_k. Then, by a telescoping cancellation we obtain Since each π_j appears exactly once in the sum, we can forget about their ordering. Then the π_j become independent uniform random variables on [0, t], and we can simplify to Now, cancelling the appropriate terms and factoring out the term not depending on k, we reach the final result We proceed to compute P(F ∈ P_K) in (32). In order to do this, we make use of Lemma 3.3, which requires the conditional generating function of κ given E_x, identified in Lemma 4.2 when we take E_x = {τ ∈ (x, x + dx)}. It remains to calculate 1 − q = P(τ_2 < x), which is needed as the argument of the generating function. So, we combine Lemmas 4.3 and 4.4 to find q = P(τ_2 > x). By Lemma 4.3, we must compute the convolution of H_{ρ,ρ+χ} and the random variable with density g_x(y) defined in equation (34), which gives We make the substitution u = (ρ/(ρ+χ)) e^{−(ρ+χ)t}, which gives t = −log((ρ+χ)u/ρ)/(ρ + χ) and dt = −du/(u(ρ+χ)), to get where Z_g(x) = ∫_0^x e^{χy} dy is from Lemma 4.1. Finally, we are ready to apply Lemma 3.3. Let us assume χ ≠ 0; the χ = 0 case will be handled below.
Proof of Theorem 2.4. Substituting equations (36) and (37) into (17), we obtain Now, using the value of q from (45), and Z_g(x) = (e^{χx} − 1)/χ, we have Substituting (47) into the second and third terms of equation (46), the formula becomes: We apply the law of total probability with respect to the density of the doomsday clock τ, with rate ρ + χ, to compute Let us denote the three integrals on the right-hand side by I_1, I_2, I_3, respectively. The third integral can be computed explicitly as and we observe that this term equals P(F ∈ P_L) in (33), hence it cancels when substituted back into equation (32). So, for (32), we obtain the result for linear preferential attachment trees. This is the general formula for (ρ, χ) when χ ≠ 0. This also finishes the proof of Theorem 2.4, since the formula in (11) is recovered when χ = 1.
Now we evaluate this further for the special case χ = −1, and obtain the metric dimension of m-ary increasing trees (ρ = m ∈ ℕ, χ = −1).
In the first row, the last bracket is of the form (1 − µ + ν)^m, which we expand using the trinomial formula: We apply this formula with µ = e^{−x + (m/(m−1))(1−e^{−(m−1)x})}/m and ν = e^{−x}/m. After collecting terms, and taking into account the integral in the last row of equation (51), we obtain equation (52). For i = 0, the coefficient im/(m − 1) of the doubly-exponential term in the exponent in equation (52) is 0, hence these terms simplify. We sum the i = 0 terms over j, and perform the integration to obtain Observe that the j = 0 term is 1, and hence cancels the −1 in the first term of the right-hand side of equation (51).
where ā_{i,j} = a_{i,j} e^{im/(m−1)}. Combining equations (52)-(59) gives the formula which agrees with equation (6) in Theorem 2.2 with A_{i,j} = −a_{i,j}. For the binary search tree, that is, m = 2, we evaluate the coefficients in equations (56) and (58) numerically. Starting with equation (56), then proceeding to the coefficients a_{1,1}, a_{1,0}, a_{2,0}, we get Next, we proceed with the random recursive tree (χ = 0 and ρ = 1). The proof is analogous to the proof of Theorem 2.4. We proceed from formula (45).
In this case, τ is exponential with rate ρ = 1. We apply the law of total probability to compute P(F ∈ P_K) = We make the substitution u = e^{−x}, which gives x = −log(u) and dx = −du/u, to get P(F ∈ P_K) = 1 + where γ was defined in equation (4). Furthermore, we substitute v = e^{1−u} in the remaining integral, which gives u = 1 − log(v) and du = −dv/v, to get This finishes the proof.