An Approximation Scheme for Quasistationary Distributions of Killed Diffusions

In this paper we study the asymptotic behavior of the normalized weighted empirical occupation measures of a diffusion process on a compact manifold which is also killed at a given rate and regenerated at a random location, distributed according to the weighted empirical occupation measure. We show that the weighted occupation measures almost surely comprise an asymptotic pseudo-trajectory for a certain deterministic measure-valued semiflow, after suitably rescaling the time, and that with probability one they converge to the quasistationary distribution of the killed diffusion. These results provide theoretical justification for a scalable quasistationary Monte Carlo method for sampling from Bayesian posterior distributions in large data settings.


Introduction
With recent developments in theoretical and computational statistics, a promising new paradigm has emerged for the performance of exact Bayesian inference on large datasets: quasistationary Monte Carlo (QSMC) methods. Recall that for a Markov process Y = (Y t ) t≥0 killed at random time τ ∂ , a distribution π is called quasistationary if for each t ≥ 0, where P π denotes the law of Y conditional on Y 0 ∼ π. The idea behind quasistationary Monte Carlo is to draw samples from such a quasistationary distribution.
In [20] sufficient conditions were formulated under which the quasistationary distribution of a killed diffusion coincides with a density function of interest.
There are (at least) two distinct approaches to realizing this method in practice. The first is to use particle approximation methods. This was the approach of the first QSMC method, the Scalable Langevin Exact (ScaLE) algorithm, introduced in [16]. The quasistationary framework enables the principled use of subsampling techniques, and the resulting ScaLE algorithm is provably efficient for performing exact Bayesian inference when the underlying dataset is large. This approach is supported by the full weight of the sequential Monte Carlo literature, and thus has theoretical assurances of convergence.
While the ScaLE algorithm has many desirable theoretical and computational properties, it also suffers some drawbacks. One particularly prominent drawback is the high complexity of implementing the algorithm. A cursory glance through [16] can convince the reader that the algorithm is (necessarily) involved. This motivates the search for alternative QSMC methods which inherit the desirable properties of exactness and scalability, while avoiding the algorithmically complex particle approximation framework.
One alternative approach to QSMC takes advantage of the fact that quasistationary distributions can be written as solutions to fixed point equations in measure spaces (represented here by equation (14) in Proposition 2.9). This yields the QSMC method dubbed Regenerating ScaLE (ReScaLE), which seeks to approximate the quasistationary distribution π by means of stochastic approximation. The idea behind ReScaLE is that we simulate a single killed diffusion path, whose quasistationary distribution coincides with the Bayesian posterior of interest. When this single particle is killed, the particle is 'regenerated' or 'reborn' in a new random location drawn independently from its normalized weighted empirical measure. It then continues to evolve from this new starting position until it is killed again, at which point it is again regenerated, and so on.
ReScaLE inherits the key properties that motivate the ScaLE algorithm, and algorithmically it is significantly more transparent and straightforward, but this is worthwhile only if the method provably leads to correct results. Theoretical analysis of stochastic approximation approaches to numerically solving fixed point equations are well-studied in finite-dimensional contexts, cf. [13], but there is currently no theory appropriate for the measure-valued, continuous-time context of ReScaLE. While the implementation of the original ScaLE algorithm is underpinned by the vast body of work on sequential Monte Carlo, there has been no theoretical assurance that the ReScaLE algorithm even converges. The purpose of the present work is to demonstrate this fundamental convergence property, leaving practical and computational properties of the algorithm to be considered in detail in forthcoming work.
In this work we assume that we are working on a compact boundaryless d-dimensional Riemannian manifold M , as in [5]. Relaxing the assumption of compactness is discussed in Section 5. We have a particle X = (X t ) t≥0 evolving on M in continuous time, according to the solution to the stochastic differential equation between regeneration events, where A : M → R is a smooth function and W is a standard Brownian motion on M . Regeneration events occur at a statedependent rate κ(X t− ), where κ : M → [0, ∞) is a given smooth, positive function, which we will refer to as the killing rate. At a regeneration event, say at time T , the particle is instantaneously 'killed' and 'reborn': its location is drawn (independently) from its normalized weighted empirical occupation measure µ T , where for all t ≥ 0, µ t is given by where r > 0, η · : R + → R + and a probability measure µ 0 on M are fixed. The resulting process X is clearly non-Markovian. The addition of the µ 0 term has the benefit of regularising the µ t around t = 0, as well as providing practical flexibility for the resulting Monte Carlo algorithm. The weight function η · similarly provides additional practical flexibility; a straightforward choice would be constant η s ≡ 1, but see Remark 2.2 for a non-constant alternative. The goal of this paper is to characterize the asymptotic behavior of the measures (µ t ) t≥0 , and to show that it converges to the quasistationary distribution for the original killed process. We proceed in two steps: First, we show that a certain deterministic semiflow Φ converges to the appropriate limit. Second, we show that after making an explicit deterministic time change ζ t := µ h(t) , the stochastic evolution of the measures (ζ t ) t≥0 shadows Φ in a particular sense, to be defined below.
where d w is a metric that metrises weak-* convergence of probability measures given in (4) and Φ is the semiflow defined in Section 3.
Remark 1.2. In [2] a map t → ζ t that satisfies (3) for each T > 0 is called an asymptotic pseudo-trajectory for the measure-valued semiflow Φ.
In Section 3 we define the semiflow Φ, and proceed to show that it has a global attractor π, which is the unique quasistationary distribution of the diffusion (2) killed at rate κ.
In particular, Theorem 1.1 leads to the following corollary: Corollary 1.3. Almost surely, lim t→∞ µ t = π in the sense of weak-* convergence.
That is, we have lim t→∞ µ t (f ) = π(f ) for any continuous f : M → R. While our results above hold for any appropriate given killing rate, in the ReScaLE algorithm the killing rate κ is chosen so that the quasistationary distribution π is the Bayesian posterior distribution of interest; see the expression (6). Corollary 1.3 tells us that in this setting we can draw approximate samples from π by running the regenerating process X and outputting µ t for a large t as a proxy for π. Figure 1 shows the output of two independent simulations on the unit circle M = R/(2πZ) parameterised by θ ∈ [0, 2π), which is amenable to straightforward simulation and visualisation. The underlying diffusion is a Brownian motion (A ≡ 0 in (2)), and the quasistationary distribution is trimodal with density function π(θ) = (0.3 + sin 2 (1.5θ))/(1.6π) with respect to the Riemannian measure. The corresponding killing rate, calculated using (6) is κ(θ) = 2.25(2 cos 2 (1.5θ) − 1)/(0.3 + sin 2 (1.5θ)) + K, with K = 1.75. Figure 1: Example on the circle. The circle is parameterised by θ ∈ [0, 2π), with due east corresponding to θ = 0. The underlying diffusion is a Brownian motion, and we have chosen r = 1000, µ 0 to be the uniform distribution on M and η s ≡ 1. The quasistationary distribution π(θ) = (0.3 + sin 2 (1.5θ))/(1.6π) is the dashed line. We have plotted t 0 δ Xs− ds/t discretised into 50 evenly-spaced bins, for t = 25, 100, 1000, 10 6 . The two columns are two independent runs.
The simulations were done using a simple Euler discretisation scheme where time was discretised into intervals of length 0.05. The plots are oriented so that θ = 0 is due east, with θ = π/2 being due north, and so on. We have chosen r = 1000, µ 0 to be the uniform distribution over M and η s ≡ 1. The plots shows t 0 δ Xs− ds/t for t = 25, 100, 1000, 10 6 , split into 50 evenly-spaced bins. The quasistationary distribution π is the dashed line.
We see that there is a significant amount of variability for the small time values, as the initial rebirths are drawn mostly from the uniform distribution µ 0 . These discrepancies are largely smoothed out by t = 1000 and certainly by t = 10 6 , t 0 δ Xs− ds/t closely resembles the quasistationary distribution π. The present work may be seen as a synthesis and extension of of [5] and [4]. [5] shows an analogous characterization for the normalized occupation measures of a self-interacting diffusion on a compact space, where the past occupation measure influences the present behavior of the diffusion through its drift term. In our work, the influence of the occupation measures is felt through jumps, which occur at a state-dependent rate. In [4], the authors prove the discretetime analogue of our above result; their underlying Markov process is a discretetime Markov chain evolving on a compact space, rather than a diffusion. Our work can be seen as an affirmative response to the suggested continuous-time extension described in [4,Section 8.3].
We utilise the broad approaches used in [5], [4] and [12], often referred to as the "ODE method", cf. [13]. To understand the particular continuous-time dynamics we employ spectral techniques similar to those of [20] and [11]. For example, to handle the killing mechanism we make use of the sub-transition density of the killed diffusion and the corresponding resolvent operator; see Lemmas 2.3 and 2.6. The continuous-time setting also provides a natural interpretation of the weights η s in terms of the distribution of the rebirth 'times'; see Remark 2.2. Another contribution of this work is the analysis of the deterministic semiflow in Section 3, where we transform to an auxiliary Markov process and apply an appropriate drift condition.
A similar analysis has previously been carried out in the finite-state-space discrete-time setting in [3], [6] and much earlier in [1]. In [1] convergence in the finite state-space setting was first proven, and in [3] and [6] rates of convergence and central limit theorems were additionally established. These analyses rely upon special techniques, such as those in [13], applicable only to finitedimensional probability distributions.
The structure of the paper is as follows. In Section 2 we lay out the mathematical setting and describe the assumptions that we are making. We also define the fixed rebirth process and prove some of its key properties which are crucial for later defining the semiflow. In Section 3 we define and analyse the deterministic semiflow Φ and prove that π is a global attractor. In Section 4 we prove that our normalized weighted occupation measures almost surely comprise an asymptotic pseudo-trajectory for Φ, concluding the convergence proof. Finally, in Section 5, we discuss briefly the possibilities for extending these results to noncompact manifolds M .

General background
We will be working on a boundaryless d-dimensional compact, connected smooth Riemannian manifold M . We will denote the corresponding Riemannian measure by dx or dy, and the Riemannian inner product of points x and y by x · y. Let C(M ) denote the Banach space of (bounded) real-valued continuous functions on M equipped with the sup norm, · ∞ . Let P(M ) denote the space of Borel probability measures on M , equipped with the topology of weak-* convergence, namely convergence along all bounded continuous test functions. (This is conventionally called simply weak convergence in probability theory, but we follow [4] to avoid confusion when working with the Banach space C(M ) and its dual.) Thus P(M ) is a compact metrisable space, by the Prokhorov theorem, cf. [10, Theorem 11.5.4].
As M is compact C(M ) is separable, so we may choose a sequence of smooth We also define the total-variation norm for a signed measure µ on M by Let Ω be the space of càdlàg paths ω : R + → M , and let F be the cylinder σ-algebra. Let X = (X t ) t≥0 be the coordinate process, X t (ω) = ω(t), and let (F t ) t≥0 be the natural filtration of X. Assumption 1. The SDE (2) on M has a unique strong solution for any initial position.
Recall that for a Markov process Y killed at a random stopping time τ ∂ , a distribution π ∈ P(M ) is quasistationary if it satisfies (1) for all t ≥ 0.
Given a killing rate κ : M → [0, ∞), the corresponding killing time τ ∂ is defined as where ξ is an independent exponential random variable with rate 1.

Assumption 2.
We have a smooth killing rate κ : M → [0, ∞) which is uniformly bounded away from zero: there exists some constant κ > 0 such that As a continuous function on a compact space κ is necessarily bounded above, say Given a killing rate κ which is not strictly bounded away from 0, we can always add a positive constant everywhere; this will not affect the quasistationary behavior of the process and will ensure that (5) holds. The upper bound on the killing rate will certainly guarantee that the resulting process will be almost surely non-explosive.
We will later see that given κ, the diffusion (2) killed at rate κ has a unique quasistationary distribution (Proposition 2.9).
The question of existence and uniqueness of quasistationary distribution of killed diffusions has been studied in depth, for instance in [11]. This context of quasistationary Monte Carlo methods -where we are starting from a density π and wish to construct a killed diffusion whose quasistationary distribution coincides with π -is the topic of [20], and the interested reader is encouraged to look therein for more details.
From [20], given a smooth positive density π, the appropriate choice of κ is given for each y ∈ M by Here ∇ and ∆ are the gradient and Laplacian operators on M , and K is a constant chosen so that κ satisfies (5). This choice of κ is discussed at length in [20]. We note that the assumptions of [20] are trivially satisfied in the present setting. Fix µ 0 ∈ P(M ) and r > 0. We fix a weight function η : R + → R + satisfying the following assumptions: Define functions g : R + → R + and α : , t ≥ 0.
Assumption 3. η is continuously differentiable with η t > 0 for all t > 0, and where h is defined below.
Since g is strictly increasing (as η t > 0), continuously differentiable and increases to ∞, it is a diffeomorphism of R + → R + . Thus it has a well-defined continuously differentiable inverse g −1 . The function h : This function h will be the time change that we shall employ in Section 4.
Remark 2.1. It follows from Assumption 3 that t 0 α s ds = log(1 + g(t)/r) → ∞ as t → ∞. Thus these conditions on α are analogous to the typical discrete-time assumptions on the step sizes in traditional stochastic approximation; cf. [13, s ds is monotone decreasing, a sufficient condition for (7) to hold is that Define the normalized empirical occupation measures (µ t (ω)) t≥0 by In general we will omit the dependence on ω.
Remark 2.2. For simplicity one may take η t ≡ 1, then g(t) = t and α t = 1/(r+t) for all t ≥ 0. Sampling from Z ∼ t 0 η s δ Xs− ds/ t 0 η s ds is then equivalent to simulating V ∼ Unif([0, 1]) and setting Z = X (V t)− . More generally, for k ≥ 0 one can take η t = t k for each t ≥ 0. It is not difficult to check that for this choice of η Assumption 3 is satisfied, and simulating Z ∼ t 0 η s δ Xs− ds/ t 0 η s ds is equivalent to the case of constant η except with V ∼ Beta(k+1, 1). Heuristically, choosing a larger value of k privileges the more recent times. How to choose the parameter k to optimise convergence is itself an interesting question for future exploration.
For any µ ∈ P(M ), define the operator −L µ on smooth functions by Here we choose to use minus the operator in order to agree with the convention of [11] and [20], which made the corresponding self-adjoint operators positive. Formally, we are defining a generator on a core of smooth functions, which can be extended to a formal domain and so on. However since we are working on a compact boundaryless manifold these technical details are unimportant and hence omitted. We can define probability measures (P x : x ∈ M ) with the following properties: • P x (X 0 = x) = 1.
• For all smooth f ∈ C(M ) is a P x -martingale with respect to (F t ).
We can construct P x by explicitly defining the killing events, as follows. Let ξ 1 , ξ 2 , . . . be an i.i.d. sequence of Exp(1) random variables. Set T 0 = 0, X 0 = x and given T n and X Tn inductively define where for each n = 0, 1, 2, . . . , (Y (n+1) s ) s≥0 is an independent realisation of the solution to the SDE (2) started from Y (n+1) 0 = X Tn . For a careful treatment of defining diffusions on manifolds, the reader is referred to [19].

Fixed Rebirth processes
We now define the fixed rebirth processes and derive some useful properties. These will be crucial later for defining the deterministic semiflow. It will be convenient to work on M with the measure Γ(dy) = γ(y) dy, where γ(y) = exp(2A(y)).
As in [20] and [11], the solution to the SDE (2) killed at rate κ has a continuous sub-transition density p κ (t, x, y) with respect to Γ. Moreover, since we are working with respect to Γ, this density is symmetric, namely p κ (t, x, y) = p κ (t, y, x).
Let us write L κ for minus the generator of the diffusion Y from (2) killed at rate κ. For smooth f , we have The sub-transition density p κ (t, x, y) is related to the semigroup through the identity Here (and throughout) τ ∂ denotes a general killing time, defined analogously to (9). . Following the standard probabilistic notation we will denote its dual action on a measure µ by simply µR. That is, µR is the measure defined by Proof. R is clearly linear and maps non-negative functions to non-negative functions. It maps continuous functions to continuous functions since p κ (t, x, y) and γ are continuous. Thus R is a positive linear operator mapping C(M ) → C(M ). For the constant function 1 : x → 1 we have Since we are assuming that the killing rate is everywhere bounded below by κ, it follows that we have the uniform bound over x ∈ M , Thus since R is positive, it follows that R is bounded.
We note for future reference that Remark 2.5. Heuristically, the resolvent describes the cumulative behavior of the killed diffusion over entire lifetimes.
Fix a probability measure µ ∈ P(M ). We now define the fixed rebirth process with rebirth distribution µ, abbreviated to FR(µ). The FR(µ) process X µ = (X µ t ) t≥0 has generator −L µ , as defined in (8). Intuitively, it is a Markov process with càdlàg paths which evolves according to the SDE (2) between regeneration events, which occur at rate κ(X µ t ). At such an event the location is drawn independently from distribution µ. Let (P µ t ) t≥0 denote the semigroup of this process. Lemma 2.6. Given µ ∈ P(M ), an invariant measure for the FR(µ) process is given by Proof. Let f ∈ D(L µ ), the domain of the generator. We wish to show that µRL µ f = 0. Note by (10) that we can write Then where we used the backward equation Note that by Tonelli's theorem we can exchange the order of integration to find where ξ ∼ Exp(1) by the definition of our killing construction.
Thus putting the terms together we have that Thus it follows that Π(µ), which is the normalized version of the measure µR, is an invariant measure for the FR(µ) process.
Proof. This will follow straightforwardly from the coupling inequality, cf. [18, Section 4.1], which states that L(X) − L(Y ) 1 ≤ P(X = Y ) for a coupling P of random variables X, Y with laws L(X), L(Y ) respectively. Since we are assuming the killing rate κ is bounded below by κ and the rebirth distribution is fixed, we can couple two processes started from different initial distributions at the first arrival time of a homogeneous Poisson process with rate κ. Proof. We know that R is a bounded linear operator on C(M ). Thus as noted in Remark 2.4 it acts dually on the space of finite signed Borel measures, and is continuous on the dual space. So it must also be weak-* continuous. Continuity of µ → Π(µ) follows. Proposition 2.9. µ ∈ P(M ) satisfies the fixed point equation if and only if µ is quasistationary for the diffusion Y killed at rate κ. There exists a unique quasistationary distribution π for the diffusion Y killed at rate κ. Furthermore, π has continuous density with respect to the Riemannian measure, which will also be denoted by π.
Proof. Suppose µ is invariant for L µ . Then we have for all smooth f , which is equivalent to Since µ(κ) > 0 this tells us that µ is a quasistationary distribution for X. Conversely, suppose µ is quasistationary for Y . Then for all smooth f and some λ 0 > 0. Then choosing f ≡ 1, we find that µ(κ) = λ 0 , from which it follows that µL µ f = 0. Hence µ is stationary for L µ . Multiplicity of quasistationary distributions is related to processes which spend too long at infinity, outside of compact sets cf. [8,Section 7.7]. Since we are working on a connected compact Riemannian manifold therefore there can be at most one quasistationary distribution.
π must also be absolutely continuous with respect to the Riemannian measure since it is the quasistationary distribution of the diffusion Y ; the laws of such diffusions are absolutely continuous with respect to the Riemannian measure. Since the killing rate is also smooth this gives a continuous density. Remark 2.10. As noted in Section 2, in practical settings the quasistationary distribution π will be chosen to coincide with a distribution of interest by choosing κ according to (6).

Basic properties
We are now in a position to define the deterministic measure-valued flow that will characterise the asymptotic behavior of the normalized occupation measures (µ t ) t≥0 .
Recall, as in [2], that on a metric space E a semiflow Φ is a jointly continuous map Φ : such that Φ 0 is the identity on E and Φ t+s = Φ t • Φ s for all s, t ∈ R + . We would like to define a semiflow Φ on the space E = P(M ) of probability measures with the topology of weak-* convergence, which is a metric space. In particular we want t → Φ t (µ) to solve the measure-valued ordinary differential equation (ODE)ν in the weak sense, meaning that for any test function f ∈ C(M ) We define such a semiflow by adapting the approach of [4, Section 5] to our present setting. As noted in Lemma 2.3, the operator R is a bounded linear operator mapping from the Banach space C(M ) to itself. This allows us to define, for any t ≥ 0, the bounded linear operator e tR : C(M ) → C(M ), whose dual acts on the space of finite signed Borel measures, equipped with the total-variation norm.
This allows us to define, for each t ≥ 0, the probability measures The map t →Φ t (µ) satisfies the weak measure-valued ODĖ To get a solution to our actual ODE (15) we employ a suitable time change, imitating [6] and [4]. Similarly to [4], for t ≥ 0 set Proof. Immediate since the function x → E x [τ ∂ ] is uniformly bounded below by 1/κ since the killing rate is bounded above byκ.
Recall that we have equipped the space P(M ) with the weak-* topology, which is a compact metric space. Proposition 3.2. Φ is an injective semiflow on P(M ), and for each µ ∈ P(M ) t → Φ t (µ) is the unique weak solution to (15).
Proof. This is identical to the proof of [4, Proposition 5.1].

Stability of π
Recall that we are interested in working with respect to the measure Γ(dy) = γ(y) dy where γ = exp(2A). From Proposition 2.9, we see that there is a (unique) quasistationary distribution π which is a fixed point of Π, which also has a density with respect to the Riemannian measure, which we also denote by π. Set ϕ := π/γ which is a smooth positive function.
We now list some basic facts about π and ϕ which can all be verified through routine manipulations. There exists λ κ 0 > 0, which describes the asymptotic rate of killing, such that L κ ϕ = λ κ 0 ϕ. Writing where this final identity holds both in terms of π as a measure, and pointwise as density functions.
We wish now to analyse the asymptotic behavior of the semiflow Φ defined in Section 3. To do this we will derive a drift condition for a reweighted version of the Markov process.
Consider the bounded linear operator L : C(M ) → C(M ) given by By the Hille-Yosida theorem [14, Chapter 1] L generates a Markov semigroup (K t ) t≥0 . For each t ≥ 0, We can also define the kernels (K t ) t≥0 , For the process defined by (K t ) t≥0 it can easily be seen from (16) that the measure πϕ given by (πϕ)(f ) = f (x)ϕ(x)π(x) dx is an invariant measure. Since ϕ is bounded, we see that πϕ is in fact a finite measure. In what proceeds, without loss of generality we rescale ϕ so that πϕ(1) = 1.
We would like to show that this process is V -uniformly ergodic, with V = 1/ϕ. We do this using a drift condition from Theorem 5.2 of [9]: for a continuoustime irreducible aperiodic Markov process with extended generator L, assume it satisfies for constants b, c > 0 and petite set C then the process is V -uniformly ergodic. Heuristically, a petite set is a set from which the Markov process leaves with a common minorizing measure. The precise definition is carefully presented in [9, Section 3]. For our present purposes, since we have a Feller process, it is enough to note that compact sets are petite. Proof. Recall we have set V = 1/ϕ. Then for β = 1/λ κ 0 , for all x, which exists since V is bounded above and κ is bounded away from 0. Note our entire space is compact, hence petite. Thus the drift condition holds.
By [9,Theorem 5.2] this implies V -uniform ergodicity: There exist constants D and 0 ≤ ρ < 1 such that for all x ∈ M : Multiplying through by ϕ(x) and relabeling ϕg as f , we see that the condition |g| ≤ V = 1/ϕ is equivalent to |f | ≤ 1, hence proving uniform ergodicity: For Thus we will have, for any initial distribution µ, Proposition 3.4. We have convergenceΦ t (µ) → π as t → ∞ in total variation distance, uniformly in µ .
Proof. Let ϕ * := min x∈M ϕ(x), which is positive, since ϕ is a continuous positive function on a compact set. We find that for any t ≥ (log 2D − log ϕ * )/ log ρ −1 we have by (18) we have then for any probability measure µ and continuous f with |f | ≤ 1 Finally, we see that this convergence carries over to the semiflow Φ.
Proof. This follows from Proposition 3.4 and the fact thatṡ µ (t) is bounded above by 1/κ uniformly in µ and t, as in Lemma 2.3. The boundedness of this derivative ensures that its inverse τ µ satisfies for all µ ∈ P(M ) and t ≥ 0 Remark 3.6. In the language of [2], this shows that π is a global attractor of the semiflow Φ.

Basic properties and definitions
We now wish to relate the stochastic behavior of the normalised weighted empirical measures Recall that by Assumption 3, since g is continuously differentiable, strictly increasing and g(t) → ∞ as t → ∞, it is a diffeomorphism of R + → R + . Thus we can define its inverse function g −1 : R + → R + , which is also continuously differentiable and satisfies g −1 (t) → ∞ as t → ∞. Let for all t ≥ 0, where we defined h(t) := g −1 (re t − r). We will show that almost surely t → ζ t is an asymptotic pseudo-trajectory of the semiflow Φ defined in Section 3. Since applying the chain rule and product rule for derivatives yields Looking at the first bracket, we recognize the flow from Section 3. Thus to formally show that ζ t approximates the flow, we need to control the second bracket. We also note here for future reference that We formalize this intuition by the approach of [5,Proposition 3.5] and [12,Lemma 5.4]. Clearly an asymptotic pseudo-trajectory must be uniformly continuous. Theorem 3.2 of [2] tells us that, a uniformly continuous path ζ is an asymptotic pseudo-trajectory if and only if every limit point of the time shifts Θ t ζ in the topology of uniform convergence on compacta is itself a trajectory of the flow. We define M(M ) to be the space of Borel signed measures on M , equipped with the weak-* topology, which can be metrized analogously to (4). Let C(R + , P(M )) and C(R + , M(M )) be the spaces of continuous paths mapping R + into P(M ) and M(M ) respectively, each equipped with the topology of uniform convergence on compact subsets of R + . As usual, for each t ≥ 0 we define Θ t : C(R + , P(M )) → C(R + , P(M )) to be the shift map given by with F (ν) := −ν + Π(ν), this is equivalent to showing that the limit points of Θ t ζ are fixed points of L F .
Define the collection of signed measures for all t, s ≥ 0. We note that for each t ≥ 0, t (·) ∈ C(R + , M(M )). Proof. Given continuous f with f ∞ ≤ 1, we have Hence we have uniform continuity. Suppose the condition (22) holds for any T > 0 and smooth f ∈ C(M ). This says that t (·) converges to 0 in C(R + , M(M )). Consider the family {Θ t ζ} t≥0 . Supposeζ is a limit point of this family in C(R + , P(M )). Analogously to [5], we can use (20) and the definition of t to write Since we are assuming precisely (22), and since L F is continuous, taking t → ∞ in (23) along the appropriate subsequence we obtainζ = L F (ζ). This shows that the limit pathζ is a fixed point of L F , completing the proof. Conversely, suppose ζ is an asymptotic pseudo-trajectory for Φ. By definition, this means that Θ t ζ − L F (Θ t ζ) converges to 0 in C(R + , M(M )) as t → 0. Thus by (23) we conclude that t → 0 in C(R + , M(M )), that is, (22) must hold for any T > 0 and smooth f ∈ C(M ) as required.
The goal now is to establish the requirements of Theorem 4.1 almost surely. As remarked, tightness is trivial since we are working on a compact space. It remains to establish (22).

Poisson Equation
To show that (22) holds we will measure the discrepancy between the cumulative occupancy and the quasistationary distribution by a solution to the Poisson equation Fix any µ ∈ P(M ). Define for f ∈ C(M ), Proof. From (13), we have that uniformly over x ∈ M and µ ∈ P(M ).
Recall that −L µ is the generator of the FR(µ) process. Then Proof. Without loss of generality, assume Π(µ)f = 0; just replace f with f − Π(µ)f . Then taking t → ∞ in (25) and using Proposition 2.7 we see Remark 4.6. Note that we can think of the operator as a projection operator since it is linear and idempotent (K 2 µ = K µ ).

Bounding the discrepancy
Using our solution to the Poisson equation, we now rewrite the discrepancy term, and decompose it using Itô's formula. Making the change of variables u ← h(u) we write Consider now a time-dependent function f : R × M → R, written as f s (x). For simplicity we will take the notation ∇ and ∇ 2 to refer to the gradient with respect to the coordinates of M , and f s to be the time derivative ∂f t /∂t t=s . We now apply Ito's formula for general semimartingales (as formulated, for instance, as Theorem 14.2.4 of [7]), taking advantage of the fact that all quadratic covariation terms are 0 for the Brownian motion that is driving our process X t : Using the formula (8) for L µ we find where W u is a Wiener process on M . We apply this formula now to the function f s (x) = Q µs f (x)α s , and rearrange the terms to obtain where (1) ∂ ∂u Q µu f (X u− ) α u du, (4) and N and J are the local martingales so by Lemma 4.3 and the bounds (12) we see that there is a constant C (de- The Burkholder-Davis-Gundy inequality then implies the existence of a constant C 4 such that for any δ > 0 α 2 s ds. Using (7) from Assumption 3, it follows from the Borel-Cantelli lemma that almost surely, for any δ > 0 and T > 0,

Proof of Corollary 1.3
This follows from Theorem 1.1 and Remark 3.6, since limit sets of asymptotic pseudo-trajectories are attractor free sets, and so will be contained in the global attractor of Φ, which is {π}. These relationships are spelled out in [2, Section 5].

Some comments on non-compact manifolds
In this work we have restricted our analysis to compact state spaces. Practically speaking, though, QSMC methods such as ScaLE and ReScaLE are applicable on non-compact state spaces, such as R d .
The key difficulties in extending this present work to the non-compact state space setting are: 1. establishing almost sure tightness of the occupation measures, and 2. arguing that E[ t 0 κ(X s ) ds] = O(t) as t → ∞. We have also implicitly assumed there is a lower bound on the Ricci curvature, for the local bounds on the growth of the gradient of the Brownian transition kernel used in section 4.3.3.
To establish tightness it might be helpful to consider the discrete skeleton at the regeneration times, (µ Tn ) n∈N , as a measure-valued Pólya urn process, as introduced in the recent work [15].
The second point -bounding E[ t 0 κ(X s ) ds] -is necessary in controlling the variance of the jump martingale in (32). [12] works in a non-compact setting and considers a related problem, namely studying self-interacting diffusions where the occupation measures (µ t ) come into play through the drift of the diffusion. However the techniques utilisedsuch as the construction of explicit Lyapunov functions -are not immediately applicable. Our choice of the metric (4) also needs to be modified, since -contrary to what is stated in [12, Section 2.1.1] -the space of bounded continuous functions is in general not separable on a non-compact state space.