Competing evolutionary paths in growing populations with applications to multidrug resistance

Investigating the emergence of a particular cell type is a recurring theme in models of growing cellular populations. The evolution of resistance to therapy is a classic example. Common questions are: when does the cell type first occur, and via which sequence of steps is it most likely to emerge? For growing populations, these questions can be formulated in a general framework of branching processes spreading through a graph from a root to a target vertex. Cells have a particular fitness value on each vertex and can transition along edges at specific rates. Vertices represent cell states, say genotypes or physical locations, while possible transitions are acquiring a mutation or cell migration. We focus on the setting where cells at the root vertex have the highest fitness and transition rates are small. Simple formulas are derived for the time to reach the target vertex and for the probability that it is reached along a given path in the graph. We demonstrate our results on several scenarios relevant to the emergence of drug resistance, including: the orderings of resistance-conferring mutations in bacteria and the impact of imperfect drug penetration in cancer.


S1 Model recap
We recap our framework. Our model is a specific form of multitype branching process [1] in which each population will evolve according to a Markovian linear birth-death process with transitions. We will assume each population is comprised of cells. As a conceptual framework we associate the multitype branching process with a simple, finite, rooted, directed, graph G = (V, E) containing N vertices (N = |V |). Labels of the vertices take values in {1, . . . , N } and E is a subset of the set of ordered pairs {(i, j) : i, j ∈ V, i = j}. We shall often refer to vertex 1 as the root and vertex N as the target. Each vertex is reachable from the root and the target is reachable from any other vertex. We further assume the root is a source vertex and the target vertex is a sink. Letting the set of incoming neighbours for vertex i be N − (i) = {k ∈ V : (k, i) ∈ E} and the set of outgoing neighbours N + (i) = {k ∈ V : (i, k) ∈ E}, the previous assumption is N − (1) = N + (N ) = ∅. Each type in the branching process is uniquely mapped to a vertex, and so the number of types is N . Hence any cell may be described by its type or the vertex associated to that type.
Take any cell at vertex x. We assume this cell divides (replaced by two identical copies) at rate α(x), dies (removed from system) at rate β(x) and transitions to a cell at vertex y (replaced by an identical copy and a cell at vertex y) at rate ν(x, y) if edge (x, y) is contained in E. The growth rate of cells at vertex x will be denoted λ(x) = α(x) − β(x). The parameters associated with the vertex 1 population feature prominently and so for convenience we let α = α(1), β = β(1) and λ = λ(1). All cells are independent. We will focus on the setting where the vertex 1 population is the most fit and has positive growth rate. Therefore, we henceforth assume that λ > 0 and for 2 ≤ x ≤ N − 1, λ(x) < λ. We do not specify the relative fitness of the vertex N population. The cell level dynamics may summarised as rate β(x) (x), (y) rate ν(x, y) if (x, y) ∈ E (S1) where (x) represents a cell at vertex x and ∅ symbolises a dead cell. At a population level, the number of cells at vertex x at time t will be denoted Z x (t). We shall always assume Z x (0) = zδ x,1 , where δ x,y is the Kronecker delta function. The population growth of the initial and intermediate vertices is crucial and so the notation Z(t) = (Z x (t)) 1≤x≤N −1 will be used.
We also recap some relevant definitions from the main text that will be used repeatedly. First, the target hitting time is defined to be T = inf{t ≥ 0 : Z N (t) > 0}. (S2) Now let the set of paths between the root and any vertex x ∈ V be be denoted P 1,x . Then we define the weight of the path p ∈ P 1,x to be Throughout the empty product is set to 1. Further, we let the total weight of the the vertex x be

S2 Population growth
In this section we prove Theorem 3 from the main text, but G is not assumed to be acyclic. Specialising to the acyclic case is in Section S4. As the root vertex is a source, and hence no type transitions into vertex 1 cells, Z 1 (t) follows a linear birth-death process. Its asymptotic behaviour is described by the following classic result [1,3].
Here W is distributed as the sum of K independent exponential random variables with parameter λ/α, where K is binomial with z trials and success probability λ/α. W , as given in the above theorem, is positive if and only if the vertex 1 population survives (almost surely). It will be convenient to define this event and so we let From the distribution of W , in particular using that P(S 1 ) = P(W > 0) = P(K > 0) with K as in Lemma S1, we immediately have To obtain a similar result to Lemma S1 for Z(t), we recall the N − 1 × N − 1 matrix A from the Extension section of the main text, with entries Note that the off diagonal elements of the transpose of A are positive whenever the corresponding element of the adjacency matrix of G is one. We will be using a result from [5] which holds when the largest real eigenvalue of A, λ * , is simple (i.e. has algebraic multiplicity 1) and equal to λ. Below we give two sufficient conditions for this to be true. The second is motivated by applications when typically all transition rates are small. Lemma S2. Two sufficient conditions such that λ * = λ and is simple are: 1) G is acyclic. 2) For all Proof. 1) As G is directed and acyclic, a relabelling of the vertices exists (often called the topological sorting, see [2] chapter 22.4) such that the underlying adjacency matrix of the relabelled graph is upper triangular. We use this relabelling with vertex 1 as first in the labelling, π. Thus π is a permutation of {1, . . . , N } with π(1) = 1. The permutation matrix associated with π and its transpose are Under π we have the new matrix Due to the relation between the adjaceny matrix of G and A, we see A is lower triangular and has the λ(i) as diagonal elements. Thus λ is the largest eigenvalue of A . As A and A are similar matrices, and hence share eigenvalues, we can conclude the statement.
2) By considering the left eigenvector e T 1 , λ is indeed an eigenvalue. We now demonstrate all other eigenvalues have real part smaller than λ, which implies λ is simple. Observe that the assumed condition implies the bound on the row sums (S11) As A is reducible (due to the root being a source vertex) there exists a permutation matrix P 2 such that with each submatrix R i,i square and irreducible and 2 ≤ M ≤ N − 1 (see [7]  j }. Further, the bound (S11) is preserved under permutations and so each r (i) j < λ. As this holds for each R i,i the claim is shown.
Henceforth we assume either of the conditions given in Lemma S2 hold and thus λ * = λ and is simple. Due to this assumption, results of [5] show that the vertex 1 cells drives the entire population growth. To state this result, we introduce the vector ψ with (S13) Now letũ,ṽ be the left and right eigenvectors of A corresponding to λ, scaled such that ψ ·ṽ = 1,ũ ·ṽ = 1. (S14) Note, from the structure of A,ũ i > 0 if i = 1 and 0 otherwise. Further let Φ =ũ 1ṽ (S15) Then the following result, which is the more general form of Theorem 3 in the main text, holds.
Theorem S1. With W as in Lemma S1, a.s. (S16) Proof. We demonstrate how to apply the result in [5] to our setting. From Theorem 3.1 in [5], with probability one lim t→∞ e −λt Z(t) =Ŵṽ (S17) whereŴ is a non-negative random variable, with as yet unknown distribution. However from Lemma S1 almost surely e −λt Z 1 (t) → W , with the distribution of W given in the lemma. This implieŝ W = W/ṽ 1 =ũ 1 W with the last equality following from (S14) and that only the first entry ofũ is positive.
The proof of Theorem S1 indicates our primary reason for taking the root vertex to be a source (no incoming edges). If transitions were permitted into the vertex 1 population, but still λ * = λ and is simple, then (S17) still applies but we do not know the distribution ofŴ . Additionally, regardless of whether the root is a source, if we consider the the more general offspring distribution discussed in the Extensions section of the main text, we similarly would have no relevant knowledge of the distribution ofŴ (the Laplace transform ofŴ is known to satisfy an implicit integral equation [1], but we cannot see how to use this information here).

S3 Time until target vertex is populated: general case
We now turn our attention to T , the time at which the target population arises, as defined in (S2). So that we can control all target seeding transition rates we let i.e. the largest of the transition rates into the target. We further give the definition of µ in this more general setting as Then the following holds.
Theorem S2. Let W be as in Theorem S1, then with probability one It follows that Then, as for given (Z(t)) t≥0 immigration into vertex N occurs as a non-homogeneous Poisson process, Now observe that for any i ∈ N − (N ), by multiplying by 1 = e λ(u+λ −1 log(σ −1 )) e λ(u+λ −1 log(σ −1 )) , Consider the limit of (S25) as ν * → 0, which results in log(σ −1 ) → ∞. As the (Z i (t)) i∈N − (N ) are independent of (ν(i, N )) i∈N − (N ) , Theorem S1 informs us that, almost surely, Hence taking limits across (S25) yields, with probability one, By the definition of σ this last sum is one and hence We seek to apply the dominated convergence theorem to the integral in (S24). The limit demonstrated in (S28) implies that for any realisation, we may find x such that for is cadlag, it is bounded over finite intervals. Hence for all (ν * , u) the integrand is dominated by an integrable function. Therefore, taking the limit of (S24) and using (S28) leads to, with probability one, so that we centre appropriately, leads to the first stated result. For the second statement, we must derive an expression for Using a one-step conditioning argument on χ 1 and considering the Laplace transform for a unit rate exponential variable leads to the stated result.
Before proceeding, we comment on a limitation of our proof approach. A key point in the proof of Theorem S2 is that the stochastic process (Z i (t)) 1≤i≤N −1,t≥0 is independent of the target seeding transition rates (ν(i, N )) i∈N − (N ) . This permits the limit displayed in (S26) to hold. If we considered a limit where transition rates associated with non-target adjacent edges also tended to 0, the limit (S26) would not hold. This is as Theorem S1 is true for fixed (α(x)) N −1 x=1 , (β(x)) N −1 x=1 and transition rates associated with non-target adjacent edges (the parameters controlling the growth of the populations at vertices 1, . . . , N − 1). Our inability to alter these parameters is also the reason, that in a precise sense, our results do not cover the formulation of transitions where type x divide at rate α (x), and then with probability ν (x, y) a transition occurs to vertex y. In that formulation, taking the limit when ν (x, N ) tends to 0, for x ∈ N − (N ), alters the division rate of cells at vertices in N − (N ), again forbidding the use of Theorem S1. We hope to improve upon these limitations in future work.
The first statement in Theorem (S2) will be key to understanding the path distribution, discussed in the Path distribution section of the main text, which justifies its prominence. As it implies that, with probability one, lim the first statement in Theorem (S2) can be interpreted as indicating that if we consider the conditional distribution of T around µ, then rescale time, an exponential distribution appears.
As discussed in the Time until target vertex is populated section of the main text, it is natural to consider the distribution of T conditioned that it is finite (we cannot discuss which path populated the target vertex if T = ∞). However it is technically more convenient to condition on S 1 , defined in (S6). The following proposition states that in many relevant cases, namely large initial population, low death rate or small transition rates leaving vertex 1. these events are similar. Below, the superscript c denotes the complement of the set.
Proof. As P({T < ∞} c ∩ S 1 ) = P(T = ∞|S 1 )P(S 1 ) and P(S 1 ) > 0 we initially demonstrate Firstly recall that with W as in Lemma S1, up to null events, with probability 1 on S 1 . Observe that (S39) is true as, by Theorem S1, for any realisation we may find As each Φ x > 0, due to each entry ofṽ also being positive [5], the right hand side of the above inequality is infinite a.s. on S 1 which gives (S38). It remains to bound P({T < ∞} ∩ S c 1 ). Let us introduce Z * (t) = i∈N + (1) Z i (t) and Note that we cannot hope to populate the target vertex without Z * (t) becoming positive. Therefore, Hence our interest turns to the right hand side of the preceding inequality. The distribution of τ is invariant to the growth at vertices i ∈ N + (1), and so we let α(i) = β(i) = 0 for such i. The process (Z 1 (t), Z * (t)) is a two-type branching process where cells of the second type (that contribute to Z * (t)) simply accumulate. As and from (S7) we know P(S c 1 ) = β/α, we now solve for the joint distribution of (Z 1 (t), Z * (t)). For now assume z = 1. We introduce the generating functions By solving this equation (separation of variables with a partial fraction decomposition), and using the initial condition we find With where q 1 = β/α and We now have expressions for the final terms in (S43) for z = 1. For z ≥ 1, we can use independence, and this leads to By examining the leading order with respect to the relevant variables we can conclude.
We remark that if we instead wish to look at the case when the vertex N population arises and does not go extinct then P(All populations go extinct) = P(T = ∞) whereT is the first time the target is founded by a cell whose progeny does not go extinct. Recall that our results for T hold identically forT so long as the mapping ν(x, N ) → ν(x, N )λ(x)/α(x) for all x ∈ N − (N ) is applied . The probability of population extinction is known to be given by an implicit equation involving the offspring distribution's generating function [1]. This was the approach taken by [4]. The approximate solution for small transition rates was derived and agrees with the result presented above (see Eq. A.4a in Appendix A of [4]).
The above proposition yields P(T < ∞) ≈ P(S 1 ). Using this with (S7) gives Eq. 6 in the main text. We now give consider the distribution of T given S 1 (which we use as proxy for conditioning on a finite hitting time).
Corollary S1. With the notation of Theorem S2, Proof. Starting from the first statement in Theorem S2, we only need apply E[·|S 1 ]. The distribution of W λ/α can be rewritten as where ξ i are rate one exponential variables, and K ∼ Binomial(z, λ/α), independent of all ξ i . Conditioning on S 1 ensures that K > 0. Thus we must calculate The binomial theorem leads to the stated expression.
We now prove a corollary regarding µ (defined in (S19)), which provides justification to Eq. 7 in the main text.
Corollary S2. Let t 1/2 be the median time for the vertex N population to arise, conditioned on S 1 . That is t 1/2 satisfies Then lim Proof. We firstly note, as is apparent from the cumulative distribution function of T (see (S23) for instance), that t 1/2 is monotone increasing as ν * decreases. By (S56) Taking the ν * → 0 limit we find λ(t 1/2 − µ) → x, where the convergence is guaranteed as both t 1/2 , µ are monotone increasing. Further, from Corollary S1, the limit x satisfies Solving for x leads to x = −λh(z) as stated. For the asymptotic result we firstly consider Expansions on the numerator and denominator show that we have shown the claimed asymptotic behaviour.

S4 Path graph and acyclic graphs
In this section we specialise the results of Sections S2 and S3 firstly to the case of a path graph and then to acyclic graphs. We first consider a path graph with N vertices and letp 1:x = (1, 2, . . . x). In this setting the matrix A from Section S2 has entries and the vector ψ has entries ψ i = α(i) + β(i) + ν(i, i + 1) for i = 1, . . . , N − 1. We want a useful representation of the sequence (Φ x ) N −1 x=1 , which we recall is given in terms of the entries of the suitably normalised left and right eigenvectors of A (see (S14)). Because they are easier to work with (i.e. check they indeed are eigenvectors) we first introduce the unnormalised left and right vectors u , v with entries where the empty product is again set equal to 1. Note for 2 ≤ here w(·) is again the path weight defined in (S3) . It may be directly verified that these are left and right eigenvectors for A with eigenvalue λ (for v it is helpful to note that Av = λv implies . Now we normalise these, so that they agree with (S14) viã Our aim is to obtain Φ =ũ 1ṽ , and as both u 1 = v 1 = 1, we see that Φ = v . Using this, the following corollary demonstrates how the results of the previous section on population growth and time to reach the target simplifies in the path graph setting.
Corollary S3. The results of Sections S2 and S3 hold with the explicit representations and The appearance of φ N in the expression for µ in Corollary (S3) is due to φ N = w(p 1:N ) = ν(N − 1, N )Φ N −1 (to be compared with the general definition of µ (S19)). For referencing, note the definition for the weight of a path p, w(p), and total weight of N , φ N , are in (S3) and (S4). We now move to the case where G is acyclic.
Recall the concept of vertex lineage from the Path distribution section in the main text (which for a chosen cell, is a sequence recording the vertices of cells in the ancestral lineage of our chosen cell). We denote the number of cells at time t with vertex lineage p as X(p, t). The number of cells at a vertex is related to the cells with vertex lineages ending at that vertex via For any path p, as above we let the first j terms in p be denoted p 1:j . The process (X(p 1:j , t)) l+1 j=1 is a multitype branching process, and X(p 1:j , t) represents the number of cells that have progressed to step j along path p at time t. Births occur to individuals of type i at at rate α(p i ), deaths at rate β(p i ) and individuals of type i transition to type i + 1 at rate ν(p i , p i+1 ). Thus we may use our results for path graphs with vertices labelled 1, . . . , l + 1. Hence via Theorem S1 and Corollary S3 we have the following. where w(p) is the path weight given in (S3) and W is as in Lemma S1.
Analogously to Corollary S3 we have the following statement regarding the growth of the population at the initial and intermediate vertices.
Corollary S5. The results of Sections S2 and S3 hold with the explicit representations where w(p) is the path weight given in (S3).
Note that as the empty product is defined to be 1, φ 1 = 1.
Proof of Corollary S5. Due to Theorem S1 we must demonstrate Observe that for any events (A i ) n i=1 such that for all i, P(A i ) = 1 then P(∩ n i A i ) = 1. Use this with events A p = {lim t→∞ e −λt X(p, t) = W } to conclude.
One particular consequence, by coupling Corollaries S1 and S5 is that for ν * small which is the conditional version of Eq. 5 in the main text. Differentiating yields an approximation for the conditional density

S5 Distribution of the path to the target
We continue to take G acyclic. Note that |P 1,N | is finite, and so we let n = |P 1,N |. The elements of P 1,N can be enumerated and we use the following notation to do so P 1,N = {p (1) , . . . , p (n) }. For each p (i) ∈ P 1,N we let: the path lengths be l (i) = |p (i) | − 1, u (i) = ν(p (i) l (i) , N ) be the final transition rates, and η (i) = w(p (i) )/u (i) . Consider p (i) ∈ P 1,N and recall that X(p (i) 1:j , t) represents the number of cells that have progressed to step j along path i at time t. If we apply the first statement in Theorem S2 to the branching process (X(p (i) 1:j , t))) l (i) +1 j=1 we can deduce that under a time rescaling, the time to traverse path i is approximately exponentially distributed, conditional on the population growth along that path. Therefore the question of which path the target population arises from becomes equivalent to the minimum of a set of exponential random variables. To simplify notation we let, Also, from the Path distribution section of the main text, recall the notation We can now prove the precise version of Theorem 2 in the main text.
Theorem S3. Assume the limits lim ν * →0 u (i) u (j) exist, but may be infinite, and further that there exists i * ∈ {1, . . . , n} such that lim ν * →0 u (i) u (i * ) < ∞ for all i = 1, . . . n. Then, for p ∈ P 1,N and t ∈ R, Proof. Consider initially the case when n = 2, that is there are only two paths from vertex 1 to vertex N . These two paths must diverge before N . If the population growth along these paths until the target vertex (but excluding the population growth at the target vertex itself) is given, then T (1) and T (2) are the first arrival times in two independent non-homogeneous Poisson processes. Returning to arbitrary n, to use this fact we define the population numbers along all the paths up until the target vertex as So that we can use this conditional independence, and prior results, we note .

(S84)
We now focus on the argument of the above expectation. Let the path under consideration be path 1 (p = p (1) ) and We know, by Theorem S2, that each T (i) appropriately centred converges. If we centre each T (i) by the same quantity, then the ordering is preserved. That is, clearly for any µ (i) . As T (i) is the target time along the path graph p (i) , Corollary S3 gives the appropriate centring. In particular with we have, by the first statement of Theorem S2 and Corollary S3, For order preservation (of the T (i) s) we take µ (i * ) as a common centring of all T (i) , and use Note the above expression may be −∞. Using this, and exponentiating (which again preserves order), leads to At this point we note that if κ (i) = 0, then for fixed (X (s)) s≥0 , the random variables exp[λ(T (i) − µ (i * ) )] converges to +∞ almost surely. Recall the T (i) are conditionally independent, and hence convergence of the marginal distributions implies joint convergence. Therefore for (x j ) n j=1 with each x j ≥ 0, almost surely we have Including the shift t, we have for fixed (X (s)) s≥0 the random vector (exp[λ(T (1) + t − µ (i * ) )], exp[λ(T (j) − µ (i * ) )]) n j=2 jointly tends in distribution to the random vector (U i ) n i=1 . Note that only T (1) is shifted, due to the path under consideration being p (1) . If W = 0, all the U i = +∞ a.s., instead if W > 0 each U i is independent and for 2 ≤ i ≤ n has distribution where E i is an exponential random variable with parameter W κ (i) η (i) λ η (i * ) α . For i = 1, instead we have U 1 d = + ∞ if κ (1) = 0, else U 1 is distributed as an exponential with parameter e −λt W κ (1) η (1) λ η (i * ) α . As, up to null events, {W > 0} = S 1 , and using standard results for the ordering of exponential variables (looking at the minimum of the U i ), we have lim ν * →0 I S1 P(T (p) + t < T (¬p)|(X (s)) s≥0 ) = I S1 The right hand side of (S92) is an element of [0, 1]. 0 is obtained when κ (1) = 0 (the final transition rate along the path is small, in the sense of the limit (S85), relative to other paths) and 1 is obtained if κ (i) = δ 1,i (the final transition rate along the path is the largest relative to other paths). Coupling the above with (S84) gives the result.

S6 Cyclic graphs
Suppose G is as before, but without the acyclic assumption. For cyclic graphs in place of P 1,N we must consider W 1,N , which we define to be the set of walks between the root and vertex N . The definition of vertex lineages holds identically for walks. Therefore for ω ∈ W 1,N we let where X(ω, t) is as given in the Path distribution section of the main text. Here we show that so long as the number of transitions between vertices is finite, which corresponds to finitely many back transitions, we may map a graph containing cycles to an acyclic one and then use the results given above. The set of walks between the root and the target vertex of length at most i − 1 will be denoted Further for a graph G = (V, E), let the vertex parameters be the birth and death rates associated to the vertices of G and the edge parameters the transition rates associated with the edges.
Proposition S2. For a given graph G = (V, E) with associated vertex and edge parameters, and i ≥ 1 which is the upper-length of walks that we consider on G, we can construct an acyclic graph G = (V , E ) (dependent on i), which coupled with appropriate vertex and edge parameters, permits a bijection g : W (i) 1,N → P 1,|V | . This bijection possesses the property that for ω ∈ W where T (g(ω)) is defined as in (S93) but with the process on G with its associated parameters.
Before giving the proof we briefly demonstrate this on the graphs shown in Fig. A. This elucidates the general idea.
Here W  1, 2, 6)) so long as the transition rates associated with edges (1, 2) and (2, 4) on G are equal to the rates with (1, 2) and (2, 6) on G and that the birth and death rates at vertices 1 and 2 are equal in both cases.
Then for each further ω (j) , 2 ≤ j ≤ M, from the previously considered walks select the walk that is identical to ω (j) for the greatest number of vertices, say ω (k) , 1 ≤ k < j. If ω (j) and ω (k) agree up until their mth element, and l (j) − m > 0 then add l (j) − m elements to V j−1 (new vertices starting with |V j−1 |). This creates V j . Denote the new vertices as (v . As ω (k) has previously been considered it will have an associated path p (k) , whose final element we will require to construct E j : to E j−1 add the edges (p and (v Identifying g(ω (j) ) = p (j) , we see that if ω (j) , ω (k) agree until their mth element then so also will g(ω (j) ), g(ω (k) ), but after this the paths diverge. We further require that the birth and death rates for the population type represented by the kth element of ω should be the same as the kth element of g(ω). Similarly for the transition rates between the kth and k + 1th element of ω and g(ω).
Coupling the above proposition with the results of Section S4 allows one to characterise in terms of the weights of the walks. Further for a set of walks W (i) 1,N , Theorem S3 holds but now with the walk weights, defined analogously to the definition given in (S3). Note that the Proposition S2 holds regardless of whether the root is a source vertex, or whether the vertex 1 population is the most fit. However, if transitions are permitted into the root vertex, then for any given walk, there is no guarantee that the cells associated with the first element of that walk are the most fit. This would leave us unable to apply the results derived in earlier sections.
While the above proposition allows us to consider back transitions, the following result demonstrates that when the transition rates between vertices are small, the first initiation of the target is dominated by the paths, as opposed to walks containing cycles. This offers a secondary justification for mainly focusing on acyclic graphs. To state the result let us define q : W 1,N → P 1,N as the operation reducing walks to their respective paths. This may be accomplished recursively by searching for the first cycle in ω and replacing this with the first element in the cycle. Then repeating the same procedure on the reduced walk until it is a path. This operation is the same as that which maps trajectories to their corresponding trajectory class in [6]. Further, for p ∈ P 1,N let that is the walks of length at most i − 1 which when reduced to paths are p, but excluding p itself. Then with ν max = max{ν(i, j) : (i, j) ∈ E} (S100) we have the following.

S7 Simulation for alternative transition formulation
Increasing the relative size of the single drug compartment speeds up resistance when d dr t 1/2 (0) < 0 or equivalently when ds(d 2 + 3ds + s 2 ) 1 − f > f m(d + s) 2 . (S108) Expansions in f and s/d (recall f 1, s/d 1), keeping only leading order terms and using the definition of f leads to Eq. 16 in the main text.
Turning to Eq. 17 of the main text, we wish to find r such that d dr t 1/2 (r) = 0. The numerator of d dr t 1/2 (r) is quadratic in r. After solving for both roots and expanding in powers of s/d, we see the leading term for one of the roots is -1. Thus the root of interest is the other root, which to leading order is the expression in Eq. 17. For Eq. 18 of the main text, we use our main result of Theorem 2 in the main text. With φ N as given above in (S104), this yields (S110) Upon expanding in s/d, to leading order we have Eq. 18.