Localization of directed polymers in continuous space

The first main goal of this article is to give a new metrization of the Mukherjee--Varadhan topology, recently introduced as a translation-invariant compactification of the space of probability measures on Euclidean spaces. This new metrization allows us to achieve our second goal: to extend the recent program of Bates and Chatterjee on localization for the endpoint distribution of discrete directed polymers to polymers based on general random walks in Euclidean spaces. Following their strategy, we study the asymptotic behavior of the endpoint distribution update map and the set of its distributional fixed points satisfying a variational principle. We show that the distribution concentrated on the zero measure is the unique element of this set if and only if the system is in the high temperature regime. This enables us to prove that asymptotic clustering (a natural continuous analogue of the asymptotic pure atomicity property) holds in the low temperature regime and that the endpoint distribution is geometrically localized with positive density if and only if the system is in the low temperature regime.


Introduction
The directed polymer model was introduced in the physics literature [HH85], [HHF85], [Kar85], [KN85], [KZ87] and mathematically formulated by Imbrie and Spencer [IS88]. Since then, many models of directed polymers in random environment have been studied over the last several decades; see, e.g., the books [Szn98], [Gia07], [dH09], [Com17] and multiple references therein. The common feature of these models is that they are based on Gibbs distributions on paths, with the reference measure usually describing a process with independent increments (a random walk, if time is discrete), and with the energy of the interaction between the path and the environment given by a space-time random potential (with some decorrelation properties) accumulated along the path.
One of the intriguing phenomena that these models exhibit is the transition of the dynamics of directed polymers between the high and low temperature regimes. In the high temperature regime, directed polymers have diffusive behavior similar to that of classical random walks, and the endpoint distributions of polymer paths of length n are typically spread over domains of size of order n^{1/2} (see [Bol89], [SZ96], [AZ96]). On the other hand, in the low temperature regime, they are super-diffusive, i.e., the typical transverse displacement of polymer paths is of order n^ξ with ξ > 1/2. In particular, it has been conjectured that ξ = 2/3 for d = 1, based on the following two observations: (i) when β = +∞, directed polymer models coincide with last passage percolation (LPP) models; (ii) integrable LPP models exhibit spatial fluctuations of order n^{2/3} and passage-time fluctuations of order n^{1/3}, placing LPP in the KPZ universality class [Cor12]. This has been proved for some integrable models, see [Sep12], [BCF14]. Besides the super-diffusive behavior, it is known that polymer measures are mostly concentrated within a relatively small region in the low temperature regime, see [CV06], [Var07], [Lac10], [BK10]. This localization phenomenon of directed polymers is closely related to the intermittency of the solution of the stochastic heat equation, see [CM94], [BC95], [Kho14]. It is believed that the size of the small region is O(1), but this has been proved only for integrable models, see [CN16]. It is also conjectured that a similar picture holds for generalized directed polymers, see [BK18].
While many integrable models of (1+1)-dimensional directed polymers have been studied extensively (see [MO07], [ACQ11], [MFQR13], [AKQ14]), results in higher dimensions are rather limited. In [BC20] and its improved version [Bat18], a novel machinery was introduced to study localization of directed polymers that are discrete in space and time. This approach is based on another recent achievement, a compactification of the space of probability measures on R^d with respect to weak convergence [MV16] (we will refer to the resulting topology as the MV topology in this paper). In [BC20], the authors introduced a simple metrization of the MV topology induced on the space of measures concentrated on Z^d, and they were able to obtain localization results for discrete directed polymers using this metric.
The first goal of this paper is to develop a new metrization of the MV topology that is useful for space-continuous polymer models. Our new metrization is inspired by the one used in the discrete setting in [BC20] and is based on couplings from optimal transport. Its relation to the metric given in [MV16] resembles the equivalence between the definitions of the Kantorovich--Wasserstein distance via optimal coupling (2.10) and via Lipschitz test functions (2.12), known as the Kantorovich duality.
The second goal of this paper is to introduce a broad family of time-discrete and space-continuous polymer models, where polymers are understood as discrete sequences of points in R^d, and to generalize the entire program of [BC20] to these models with the help of our new metrization of the MV topology.
As this paper was being prepared, we learned that similar results were obtained in [BM19] for a specific model where the reference measure is Brownian and the random potential is space-time white noise mollified in the space variable. We stress that the only assumption we need on the reference measure for polymers is that it defines a random walk, with no restriction on the distribution of its i.i.d. steps, in contrast to the concrete model of [BM19].
Due to the absence of assumptions on the random walk steps, we can say that our results generalize those of [BC20] and [Bat18], which are restricted to lattice random walks (except that a moment assumption on the potential is slightly weaker in [Bat18]), since one can embed any i.i.d. random potential indexed by Z^d into a stationary potential on R^d with a small dependence range.
In addition, we give a new result that goes beyond the asymptotic pure atomicity results of [BC20] and [BM19]. Under the assumption that the reference measure is absolutely continuous with respect to the Lebesgue measure, several forms of the asymptotic clustering property hold for the random density of the polymer endpoint distribution in the low temperature regime. An important feature of our work is that our results are based on the new metrization of the MV topology, which is of independent interest. However, a similar program was executed in [BM19] using the original metrization.
The article is organized as follows. In the remaining part of Section 1, we introduce our general model of directed polymers, review the results in the discrete setting, and state our results on localization/delocalization of directed polymers. In Section 2, we review the MV topology and introduce a new metric that is equivalent to the original MV metric and useful for our analysis of polymer measures. In Sections 3, 4, and 5, we develop a program parallel to [BC20]: we prove the continuity of the update map that maps the law of the endpoint distribution to that of the next-step endpoint distribution, and we prove that the empirical measure of the endpoint distribution of directed polymers converges to the set of free energy minimizers, which is a subset of the set of fixed points of the update map. We will also see how the set of free energy minimizers characterizes the high/low temperature regimes. In Section 6, we introduce an asymptotic clustering property that is an analogue of the asymptotic pure atomicity studied in [Var07], [BC20] for discrete directed polymers, and prove that it holds for the endpoint distribution in the low temperature regime. In Section 7, we show that the endpoint distribution of the directed polymer is asymptotically geometrically localized with positive density.
Acknowledgements. We are grateful to Erik Bates, Chiranjib Mukherjee, and Raghu Varadhan for stimulating discussions. YB thanks NSF for partial support via grant DMS-1811444.
1.1. The model of directed polymers in stationary environment. We begin with a Markov chain (ω_n)_{n∈N}, {P_x}_{x∈R^d}, on R^d, defined on a measurable space (Ω_p, F), where
• Ω_p = (R^d)^N = {ω = (ω_n)_{n≥0} : ω_n ∈ R^d},
• F is the cylindrical σ-algebra on Ω_p,
• for each x ∈ R^d, P_x is the unique probability measure such that (ω_{n+1} − ω_n)_{n≥0} are i.i.d. and
P_x(ω_0 = x) = 1,  P_x(ω_{n+1} − ω_n ∈ dy) = λ(dy),  (1.1)
where λ is a fixed nondegenerate Borel probability measure on R^d.
We stress that, unlike the existing papers on directed polymers, we do not require λ to be a lattice distribution. In fact, for most of the paper we do not impose any restrictions on λ at all. Thus λ may be an arbitrary mixture of Lebesgue absolutely continuous, singular, and atomic distributions, and, if atomic, it does not have to be concentrated on any lattice (we only exclude the trivial case where λ is a Dirac mass). We denote expectation with respect to P_x by E_x. We also write P and E for P_0 and E_0.
The random environment that we will consider is a real-valued, non-constant random field (X(n, x))_{n∈N, x∈R^d}, defined on a probability space (Ω_e, G, P), such that
• (X(n, ·))_{n∈N} are independent and identically distributed,
• (X(1, x))_{x∈R^d} is stationary and M-dependent for some finite number M, i.e., for any subsets A, B ⊂ R^d with dist(A, B) := inf{|x − y| : x ∈ A, y ∈ B} > M, the fields (X(1, x))_{x∈A} and (X(1, x))_{x∈B} are independent of each other,
• X(1, ·) has continuous trajectories, i.e., the mapping x ↦ X(1, x) is P-a.s. continuous.
The continuity condition can actually be weakened, see Remark A.2. We will write E for expectation with respect to P. X(1, x) will sometimes be shortened to X(x) for convenience. We denote by β ≥ 0 the inverse temperature parameter and will assume

c(κ) := log E exp(κX(0)) < ∞ for κ ∈ [−2β, 2β].  (1.2)

For given n ∈ N, x ∈ R^d, we define the quenched polymer measure of length n, starting from x, by

M_n^x(dω) := (1/Z_n^x) exp( β Σ_{k=1}^n X(k, ω_k) ) P_x(dω),

where

Z_n^x := E_x exp( β Σ_{k=1}^n X(k, ω_k) )

is called the point-to-line partition function. Let M_n and Z_n denote the polymer measure and the partition function corresponding to P, of length n. Notice that (M_n)_{n≥0}, (Z_n)_{n≥0} are random processes adapted to the filtration (G_n)_{n≥0} given by G_n = σ(X(k, x) : 1 ≤ k ≤ n, x ∈ R^d).
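To make the Gibbs-measure definition concrete, here is a small Monte Carlo sketch of the quenched endpoint distribution M_n(ω_n ∈ ·): paths are sampled from the reference walk and reweighted by the exponential of the potential accumulated along them. The Gaussian steps and the smooth bump field below are our own illustrative choices, not assumptions of the model (which allows an arbitrary nondegenerate step distribution λ and a general stationary M-dependent field).

```python
import math
import random

def endpoint_sample(n_paths=2000, n_steps=20, beta=1.0, seed=1):
    # Sketch of the quenched endpoint distribution for d = 1:
    # sample paths from the reference walk and weight each path omega by
    # exp(beta * sum_{k=1..n} X(k, omega_k)) for one fixed environment.
    rng = random.Random(seed)
    # Toy environment: one smooth bump per time slice (illustrative choice).
    centers = [rng.uniform(-3.0, 3.0) for _ in range(n_steps)]
    def X(k, x):
        return math.exp(-(x - centers[k - 1]) ** 2)
    endpoints, weights = [], []
    for _ in range(n_paths):
        x, energy = 0.0, 0.0
        for k in range(1, n_steps + 1):
            x += rng.gauss(0.0, 1.0)   # lambda = N(0, 1) steps (our choice)
            energy += X(k, x)
        endpoints.append(x)
        weights.append(math.exp(beta * energy))
    Z = sum(weights)                   # Monte Carlo partition function
    return endpoints, [w / Z for w in weights]
```

The returned weights form a probability vector over sampled endpoints; for large β the weight tends to concentrate on few paths, a crude numerical signature of the localization discussed below.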
1.2. An outline of existing results in the discrete setting. Directed polymer models have been studied mostly on the lattice Z^d. In this section, we recall well-known results in the discrete setting which will be extended to the continuous model in this paper. To stress the similarity with our model, we will use the same notation here as for our continuous setting. That is, in this section, we let M_n be the quenched polymer measure on paths of length n defined on (Z^d)^N, where
• P is the distribution of the d-dimensional simple random walk starting at 0,
• the random environment (X(k, x))_{k∈N, x∈Z^d} is given by a collection of non-constant i.i.d. random variables defined on some probability space (Ω_e, G, P).
Most of the mathematical results on directed polymers were obtained by analyzing the asymptotic behavior of the partition function Z_n. One of the interesting quantities, the quenched free energy, is given by

F_n(β) := (1/n) log Z_n.

It turned out that the phase transition in the directed polymer model is characterized by the discrepancy between the quenched free energy and the annealed free energy (1/n) log E Z_n = c(β). Applying a superadditivity argument developed in [CH02], we see that the limit

p(β) := lim_{n→∞} E F_n(β)  (1.3)

is well-defined. The following exponential concentration inequality enables us to make (1.3) stronger:

Theorem A (Theorem 1.4 in [LW09], for Q = 1). Let β > 0 be fixed such that E e^{β|X(1,0)|} < ∞. Then, there is a constant a > 0, depending only on β and the law of X, such that F_n(β) concentrates exponentially around its mean. In particular,

lim_{n→∞} F_n(β) = p(β)  a.s. and in L^p for all p ∈ [1, ∞).  (1.4)

We remark that Theorem A was proved in the discrete setting, but the proof can easily be adapted to our space-continuous setting. Therefore, we will use (1.4) later without further proof.
The Lyapunov exponent of the system is defined as

Λ(β) := c(β) − p(β) ≥ 0,

where the inequality follows from Jensen's inequality. Before describing the phase transition of directed polymers, we give a statement on the existence of a critical temperature.
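The inequality p(β) ≤ c(β) behind the Lyapunov exponent is a one-line consequence of Jensen's inequality and the independence of the environment over time slices; the standard computation reads:

```latex
\mathbb{E} F_n(\beta)
  = \frac{1}{n}\,\mathbb{E}\log Z_n
  \le \frac{1}{n}\log \mathbb{E} Z_n
  = \frac{1}{n}\log E_0\,\mathbb{E}\exp\Bigl(\beta\sum_{k=1}^{n} X(k,\omega_k)\Bigr)
  = \frac{1}{n}\log e^{n\,c(\beta)}
  = c(\beta).
```

Letting n → ∞ gives p(β) ≤ c(β), i.e., Λ(β) ≥ 0. The second-to-last equality uses Fubini and the fact that, for a fixed path ω, the variables X(k, ω_k) are independent over k and, by stationarity, each has log-moment generating function c.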
Theorem B. There exists a critical inverse temperature β_c = β_c(d, λ) ∈ [0, ∞] such that Λ(β) = 0 for β ∈ [0, β_c] and Λ(β) > 0 for β > β_c.

Theorem B was first proved in [CY06] when the reference measure is the simple random walk and c(κ) is finite for all κ ∈ R. [Bat18] enhanced this by extending it to reference measures given by arbitrary random walks on Z^d and by weakening the moment condition on the random environment. Extending this result to general random walks on R^d is straightforward.
We now collect three statements which describe how the Lyapunov exponent identifies the phase transition of directed polymers. We denote by ρ_i(·) = M_i(ω_i ∈ ·) the endpoint distribution of the directed polymer of length i.
Theorem C tells us that the endpoint distribution can localize a positive fraction of its mass in the low temperature regime. Vargas proposed in [Var07] the notion of "asymptotic pure atomicity", which describes localization of the entire mass of the endpoint distribution. For any i ≥ 0 and ǫ > 0, let

A_i^ǫ := {x ∈ Z^d : ρ_i({x}) > ǫ}.

Then (ρ_i)_{i≥0} is called asymptotically purely atomic if for every sequence (ǫ_i)_{i≥0} tending to 0, we have ρ_i(A_i^{ǫ_i}) → 1 as i → ∞. Convergence in probability was used in [Var07], and the author proved that if c(β) = ∞, then (ρ_i)_{i≥0} is asymptotically purely atomic. Bates and Chatterjee replaced it with almost sure convergence and proved the following:

Theorem D (Theorem 6.3 in [BC20], Theorem 5.3 in [Bat18]). Λ(β) > 0 ⇔ (ρ_i)_{i≥0} is asymptotically purely atomic.
Theorem E illustrates how the favorable sites, which localize mass in the endpoint distribution of directed polymers, cluster together. For δ > 0 and K > 0, let G_{δ,K} be the collection of probability measures on Z^d that assign mass greater than 1 − δ to some subset of Z^d having diameter at most K. (We use the ℓ^1 distance here.) We say that (ρ_i)_{i≥0} is geometrically localized with positive density if for every δ > 0, there exist K > 0 and θ > 0 such that

lim inf_{n→∞} (1/n) Σ_{i=1}^{n} 1_{{ρ_i ∈ G_{δ,K}}} ≥ θ  a.s.

Theorem E (Theorem 7.3 (a), (c) in [BC20], Theorem 5.4 in [Bat18]). Λ(β) > 0 ⇔ (ρ_i)_{i≥0} is geometrically localized with positive density.
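As a concrete reading of the definition of G_{δ,K}, the following brute-force check (our illustration, for d = 1, where every set of diameter at most K is contained in a window of length K) decides whether an atomic probability measure carries mass greater than 1 − δ on some set of diameter at most K:

```python
def geometrically_localized(atoms, delta, K):
    # atoms: list of (site, mass) pairs of a probability measure on Z (d = 1).
    # Return True iff some window [x, x + K] carries mass > 1 - delta.
    # It suffices to anchor windows at atom sites: an optimal window can be
    # slid left until its left edge hits an atom.
    best = 0.0
    for x, _ in atoms:
        window_mass = sum(m for y, m in atoms if x <= y <= x + K)
        best = max(best, window_mass)
    return best > 1.0 - delta
```

For example, the measure with atoms (0, 0.5), (1, 0.45), (100, 0.05) lies in G_{0.1, 2}, since the window [0, 2] carries mass 0.95, but not in G_{0.01, 2}.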
1.3. Main results of this paper. The first main result of this paper is the development of a new metrization of the translation-invariant compactification of the space of probability measures. The structure of the metric and relevant background are provided in Section 2. As an application of the theory developed in Section 2, we prove analogues of Theorems D and E for our model of directed polymers in the continuous space. Before stating our results, we denote the quenched endpoint distribution for the polymer of length n by ρ n (dx) = M n (ω n ∈ dx).
We extend the notion of asymptotic pure atomicity, applicable in the discrete case, to the continuous case in three ways. We introduce three related notions of clustering: asymptotic clustering at level r > 0 in Definition 6.1 (this notion is also considered in [BM19]), asymptotic local clustering in Definition 6.2, and asymptotic clustering of densities in Definition 6.3. For a sequence of absolutely continuous measures, asymptotic local clustering is equivalent to asymptotic clustering of densities, see Remark 6.4.
The following results concerning the notions of asymptotic clustering at positive levels and asymptotic local clustering (analogues of Theorem D on asymptotic pure atomicity) are proved in Section 6, see Theorems 6.7 and 6.8:

Theorem 1.1. For r > 0, ǫ > 0, and i ≥ 0, let us define where V_d is the volume of the unit ball in R^d.

(a) If β > β_c, then for every r > 0 and every sequence (ǫ_i)_{i≥0} tending to 0,

The following localization result (an analogue of Theorem E) is proved in Section 7, see Theorem 7.3:

Theorem 1.2. For δ > 0 and K > 0, let us define a set where M_1 is the collection of probability measures on R^d.

Compactification of a space of probability measures
In [BC20], the authors pointed out that the usual topologies of weak/vague convergence of probability measures are inadequate to capture the localization phenomenon of directed polymers. To tackle the issue, they used an analogue of the compact metric space (X̃, D) constructed in the work of Mukherjee and Varadhan [MV16].
The idea behind the MV topology is that two measures are considered close to each other if one can find several well-separated regions of high concentration for each of them such that the restrictions of the measures to these regions are close to being spatial translations of each other. To encode this, it is natural to work with an extension of the space of measures on R^d to the space of measures on N × R^d, where multiple layers (copies of R^d) correspond to multiple domains of concentration. In this approach, it is natural not to distinguish between two measures in one layer if they are obtained by a translation of each other, and the order of the layers is not important either. Now all measures on R^d that can be approximated as a sum of translations of the measures in the layers without much overlap can be viewed as being close to each other.
We will discuss two formalizations of these ideas in this section. While the Mukherjee--Varadhan (MV) topology was originally defined through test functions, Bates and Chatterjee introduced a different form of metric on the space of sub-probability distributions on N × Z^d in [BC20] and showed that their metric space is equivalent to the discrete version of the MV topology. We recall the original MV definition first and then construct a metrization of the MV topology that is similar to the metric introduced in [BC20].
Before we begin, we give a brief guide to the notation that we use throughout the paper.
• x, y are used for elements of R^d and u, v for elements of N × R^d.
• ξ, ζ denote elements of P(X̃), the space of probability measures on X̃ introduced in Section 3.
• Functionals on X̃ are usually denoted by capital letters, such as T, R, and I_r, while those on P(X̃) are denoted in calligraphic fonts, e.g., T and R.
2.1. Mukherjee--Varadhan topology. For any a ≥ 0, we denote by M_a = M_a(R^d) (respectively, M_{≤a}) the space of measures on R^d with mass a (respectively, mass less than or equal to a), and by M̃_a = M_a/∼ the quotient space of M_a under spatial shifts on R^d. For any α ∈ M_a, its orbit is defined by

α̃ := {α ∗ δ_x : x ∈ R^d},

where α_1 ∗ α_2 denotes the convolution of α_1 and α_2 in M_{≤a}, i.e., for any measurable set A in R^d,

α_1 ∗ α_2(A) = ∫∫ 1_A(x + y) α_1(dx) α_2(dy).

In particular, if α_2(dx) = f(x)dx, then α_1 ∗ α_2(dx) = (∫ f(x − y) α_1(dy)) dx. We denote the zero measure on R^d or N × R^d by 0. Let us recall the notions of the weak topology and the vague topology on M_a and M_{≤a} which will be used in this paper. We say that a sequence (α_n)_{n∈N} in M_a (or M_{≤a}) converges to α in the weak topology and write α_n ⇒ α if

∫ f dα_n → ∫ f dα  (2.1)

for all bounded continuous functions f on R^d. We say that a sequence (α_n)_{n∈N} in M_{≤a} converges to α in the vague topology and write α_n ↪ α if (2.1) holds for all continuous functions with compact support. Note that weak convergence preserves the total mass of measures, while vague convergence may fail to do so. Another distinction between the two topologies is that M_{≤a} is compact in the vague topology, but not in the weak topology.
Throughout the paper, we will work with multisets (sets with multiplicities) [α_i]_{i∈I} consisting of elements of M_{≤1}. We define

X̃ := { [α̃_i]_{i∈I} : α_i ∈ M_{≤1} \ {0}, Σ_{i∈I} α_i(R^d) ≤ 1 }

to be the space of all empty, finite, or countable collections of orbits of subprobability measures on R^d. For convenience, we slightly depart from the original definition in [MV16] and do not allow α_i to be a zero measure.
Let us introduce an interpretation of X̃ as a quotient space of X := M_{≤1}(N × R^d). For µ ∈ X, we can write µ(dk, dx) = Σ_{i∈N} α_i(dx) δ_i(dk) and identify µ with the (ordered) sequence µ = (α_i)_{i∈N} of subprobability measures on R^d, with Σ_{i∈N} α_i(R^d) ≤ 1. Some of the α_i may be equal to 0. We define

S^µ := {i ∈ N : α_i ≠ 0}.  (2.2)

The following definition explains when two measures from X are representatives of the same element of X̃. Thus µ = [α̃_i]_{i∈I} ∈ X̃ can be represented, or viewed, as an element of X (a measure on N × R^d), i.e., a sequence (α_i)_{i∈N} of measures on R^d obtained by taking α_i = 0 for all i ∉ I. We will often not make a distinction between µ ∈ X̃ and its representative. We will write 0 (instead of ∅) for the empty multiset of X̃ since its sole representative is 0.
In order to define the metric and convergence in X̃, we need to specify test functions. For an integer k ≥ 2, let F_k be the space of continuous functions f : (R^d)^k → R which are translation-invariant and vanishing at infinity, i.e.,

f(x_1 + y, …, x_k + y) = f(x_1, …, x_k) for all x_1, …, x_k, y ∈ R^d, and f(x_1, …, x_k) → 0 as max_{1≤i<j≤k} |x_i − x_j| → ∞.  (2.3)

Note that F_k, equipped with the uniform norm, is separable. Therefore, if we denote F = ∪_{k≥2} F_k, we can choose a countable dense subset {f_r(x_1, …, x_{k_r})}_{r∈N} of F. We also check that for any f ∈ F_k and µ = [α̃_i] ∈ X̃, the functional

I(f, µ) := Σ_i ∫_{(R^d)^k} f(x_1, …, x_k) α_i(dx_1) ⋯ α_i(dx_k)  (2.4)

is well-defined due to (2.3). For any µ, ν ∈ X̃, we now define

D(µ, ν) := Σ_{r=1}^{∞} 2^{−r} (1 + ‖f_r‖)^{−1} |I(f_r, µ) − I(f_r, ν)|.  (2.5)

Here ‖f‖ denotes the uniform norm. We state a theorem proved in [MV16].
2.2. Reinterpretation of the MV topology. Due to the analogy with [BC20], the compact metric space (X̃, D) is expected to be suitable for studying localization for directed polymers on N × R^d. However, one might have difficulties in extracting information on two elements µ = [α̃_i], ν = [γ̃_i] ∈ X̃ that are close to each other. More precisely, one would expect that if D(µ, ν) is very small, then one can match large parts of the measures α_i and γ_j by applying appropriate translations to subsets of R^d. Motivated by the approach taken in [BC20], in the present paper we express this idea more explicitly in the definition of an appropriate metric. Similarly to having two definitions of the Wasserstein distance, in terms of Lipschitz test functions and in terms of couplings, it is natural and helpful to introduce an equivalent metric on X̃ that is based on coupling. Adopting the ideas from [BC20], we construct such an equivalent metric, which allows us to obtain explicit estimates needed to show continuity of some functionals defined on X̃. Before constructing the metric rigorously, we need to introduce some notation. We define a distance between two elements u = (i, x) and v = (j, y) of N × R^d by

d(u, v) := |x − y| if i = j, and d(u, v) := ∞ if i ≠ j.  (2.6)

This definition is natural in the sense that we would like to record two concentrated regions getting away from each other when they lie on different copies of R^d. For r > 0, we denote by B_r(u) the open ball centered at u with radius r in N × R^d and, similarly, by B_r(x) the ball in R^d. Notice that B_r(u) = {i} × B_r(x) by (2.6).
The right-hand side in (2.4) can be expressed in terms of functions defined on N × R^d instead of R^d. More precisely, for an integer k ≥ 2, let G_k be the space of continuous functions g : (N × R^d)^k → R that are translation-invariant and vanishing at infinity, i.e.,

g(u_1 + v, …, u_k + v) = g(u_1, …, u_k) for all u_1, …, u_k, u_1 + v, …, u_k + v ∈ N × R^d, and g(u_1, …, u_k) → 0 as max_{1≤i<j≤k} d(u_i, u_j) → ∞.  (2.7)

For any g ∈ G_k, g(u_1, …, u_k) can be nonzero only if all of u_1, u_2, …, u_k belong to the same copy of R^d, due to (2.7). Therefore, there is a unique f ∈ F_k such that g agrees with f on each copy of R^d. In other words, there is a natural bijection

σ_k : F_k → G_k.  (2.8)

Then, considering µ as an element of X, we have

I(f, µ) = ∫_{(N×R^d)^k} (σ_k f)(u_1, …, u_k) µ(du_1) ⋯ µ(du_k).

Another remark is that any continuous function f : (R^d)^{k−1} → R vanishing at infinity can be identified with an element of F_k by mapping it to

(x_1, …, x_k) ↦ f(x_1 − x_k, …, x_{k−1} − x_k).  (2.9)

For any α ∈ M_{≤1} (or X) and non-negative function f which is integrable with respect to α, we write ᾱ = fα if ᾱ is defined by ᾱ(A) = ∫_A f dα for each measurable set A. Moreover, we say that ᾱ is a submeasure of α (denoted by ᾱ ≤ α) if 0 ≤ f ≤ 1. For any signed measure µ on R^d or N × R^d, we denote by ‖µ‖ the total variation of µ.
2.3. The Wasserstein distance. In this section, we recall the basics of the Wasserstein distance. Similar notions were first introduced to solve the Monge--Kantorovich transportation problem, and it turned out that such distances can be used in a wide variety of fields (see, e.g., [Vil09]).
To any metric d_Euc on R^d generating the Euclidean topology, we can associate a transport distance on measures as follows. For α, γ ∈ M_a (a > 0), let Π(α, γ) be the collection of Borel probability measures π on (R^d)^2 such that the marginal distribution of the first argument is α/a and of the second argument is γ/a. Then, the Wasserstein distance between α and γ is defined by

W(α, γ) := inf_{π ∈ Π(α,γ)} ∫ d_Euc(x, y) π(dx, dy).  (2.10)

It is known that the infimum over Π is achieved. In this paper, we choose to work with the bounded metric d_Euc(x, y) = |x − y| ∧ 1, so that W metrizes the topology of weak convergence on M_a. For α̃, γ̃ ∈ M̃_a, we define

W̃(α̃, γ̃) := inf{ W(α, γ ∗ δ_x) : x ∈ R^d }.

Since the choice of representatives does not affect the value of W̃, it is well-defined. One can check that W̃ is a metric on M̃_a and metrizes the weak topology of M̃_a. A result of [PR14] allows us to apply the Wasserstein distance to α, γ ∈ M_{≤a} with different masses. More precisely, the generalized Wasserstein distance Ŵ is defined in [PR14] by adding to the transport cost a total variation cost for the mass discrepancy, and it is proved in [PR14] that the corresponding infimum is achieved. The result known as the Kantorovich duality states that for any α, γ ∈ M_{≤1} with the same mass,

W(α, γ) = ‖α‖^{−1} sup_f ( ∫ f dα − ∫ f dγ ),  (2.12)

where the supremum is taken over all functions f : R^d → R that are 1-Lipschitz with respect to d_Euc. It follows from (2.12) that for any measures µ = µ_1 + µ_2 and ν = ν_1 + ν_2 with ‖µ‖ = ‖ν‖, ‖µ_1‖ = ‖ν_1‖, and ‖µ_2‖ = ‖ν_2‖, one has W(µ, ν) ≤ W(µ_1, ν_1) + W(µ_2, ν_2).
(2.13)

2.4. Construction of a metric on X̃. We are now ready to define a metric on X̃. From now on, for any µ = [α̃_i]_{i∈I} ∈ X̃, we will abuse notation and use µ both for the element of X̃ and for representatives chosen from X. When µ is used in integration, we mean that an explicit representative, such as (α_i)_{i∈N} with α_i = 0 for all i ∉ I, is chosen. Let µ = [α̃_i], ν = [γ̃_i] ∈ X̃ be given. We first introduce a family of functionals estimating the mass of the heaviest region of a measure in X̃. For r ≥ 0, we define a function I_r on X by

I_r(µ) := sup_{u ∈ N×R^d} ∫ g_r(u, v) µ(dv),  (2.14)

where

f_r(x) := 1 for |x| ≤ r,  f_r(x) := r + 1 − |x| for |x| ∈ (r, r + 1],  f_r(x) := 0 for |x| > r + 1,

and g_r = σ_2 f_r (see (2.8) and (2.9)). Note that f_r is 1-Lipschitz continuous with respect to d_Euc.
We collect some useful properties of I_r:
• I_r(µ) is comparable with the mass of the heaviest ball of radius r under µ, i.e.,

sup_{u ∈ N×R^d} µ(B_r(u)) ≤ I_r(µ) ≤ sup_{u ∈ N×R^d} µ(B̄_{r+1}(u)).  (2.15)

• I_r is sub-additive, i.e., I_r(µ + ν) ≤ I_r(µ) + I_r(ν).
• Since M_{≤1} is naturally embedded in X, we can define I_r(α) for α ∈ M_{≤1} in the same way. For any α, γ ∈ M_{≤1} with the same mass, (2.12) implies

|I_r(α) − I_r(γ)| ≤ ‖α‖ W(α, γ).

One can check that the choice of the representative of an element in X̃ does not change the value of I_r(µ), so I_r is also well-defined on X̃.
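For intuition about the coupling definition (2.10) with the truncated ground metric d_Euc(x, y) = |x − y| ∧ 1, here is a tiny brute-force computation (our illustration, for d = 1): for two uniform empirical measures with the same number of atoms, an optimal coupling can be taken to be an assignment of atoms, so the infimum reduces to a minimum over permutations.

```python
import itertools

def wasserstein_truncated(xs, ys):
    # W(alpha, gamma) from (2.10) for alpha, gamma uniform on the atom lists
    # xs, ys (equal length), with ground cost d_Euc(x, y) = min(|x - y|, 1).
    # For equal-weight atoms, an optimal coupling is a permutation matrix
    # (a consequence of Birkhoff's theorem), so brute force suffices.
    n = len(xs)
    assert len(ys) == n
    cost = lambda x, y: min(abs(x - y), 1.0)
    return min(
        sum(cost(x, ys[p[i]]) for i, x in enumerate(xs)) / n
        for p in itertools.permutations(range(n))
    )
```

Shifting all atoms by 0.2 gives distance 0.2, while a pair of atoms farther than 1 apart contributes the truncated cost 1 no matter how far apart they are; this truncation is what keeps W insensitive to exactly where far-away mass escapes.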
(2) For each k, |S^{µ_k}| = |S^{ν_k}| = 1, i.e., each µ_k and ν_k has exactly one layer of R^d with positive mass (see (2.2) for the definition of S^µ).

An element of P_{µ,ν} is called a (µ, ν)-matching (or simply a matching when there is no confusion). The empty matching ∅ is included in any P_{µ,ν}. For φ = {(µ_k, ν_k)}_{k=1}^n ∈ P_{µ,ν}, we define

sep(φ) := min_{k_1 ≠ k_2} ( dist(supp(µ_{k_1}), supp(µ_{k_2})) ∧ dist(supp(ν_{k_1}), supp(ν_{k_2})) ).

Remark 2.4. From condition (2) in Definition 2.3, we can identify µ_k and ν_k with subprobability measures on R^d if needed. The quantity sep(φ) is the degree of separation among the supports of the submeasures in the matching. We see that dist(supp(µ_{k_1}), supp(µ_{k_2})) < ∞ only if µ_{k_1} and µ_{k_2} belong to the same layer of µ. If the supports of distinct µ_k belong to different layers of µ (i.e., µ_{k_1} ≤ α_{j_1} and µ_{k_2} ≤ α_{j_2} imply j_1 ≠ j_2 for any k_1 ≠ k_2) and the same holds for ν, then we have sep(φ) = ∞ due to (2.6).

(2.17)

Remark 2.6. We see that, for the empty matching, d_{r,∅,x̄}(µ, ν) = I_r(µ) + I_r(ν) + 2^{−r} does not depend on x̄. For any non-empty matching, µ_k and ν_k are interpreted in two different ways on the right-hand side of (2.17). While they are treated as elements of M_{≤1} in the Wasserstein metric term, they are viewed as submeasures of µ and ν, respectively, in X = M_{≤1}(N × R^d) in the I_r terms.
Let us see how this works in a specific example. For simplicity, we assume d = 1. Let µ = (α_1, α_2, α_3, 0, 0, …) and ν = (γ_1, γ_2, γ_3, γ_4, 0, 0, …), where U_{(a,b)} denotes the uniform probability measure on the interval (a, b) and N_{(a,b)} denotes the Gaussian measure with mean a and variance b. We can choose a (µ, ν)-matching φ as follows. Since dist(supp(µ_1), supp(µ_2)) = 8, dist(supp(ν_1), supp(ν_3)) = 11, and all other pairwise distances are infinite, we obtain sep(φ) = 8. So we can let r be any number less than 4, say r = 3, so that the conditions of a triple are met. With the choice of x_1 = −10, the Wasserstein terms can be computed explicitly. When it comes to the I_r terms, we view the measures µ_k and ν_k as elements of X = M_{≤1}(N × R^d). Therefore, we obtain the value of d_{r,φ,x̄}(µ, ν) for this triple. We can now define

d(µ, ν) := inf d_{r,φ,x̄}(µ, ν),

where the infimum is taken over all (µ, ν)-triples (r, φ, x̄). One can check that the choice of representatives of µ, ν ∈ X̃ does not affect the value of d(µ, ν), so it is well-defined on X̃. One can readily check that d(µ, ν) ≤ ‖µ‖ + ‖ν‖ by choosing the empty matching and letting r → ∞ in a (µ, ν)-triple. Let φ^{−1} := {(ν_k, µ_k)}_{k=1}^n ∈ P_{ν,µ}. Then we see that sep(φ) = sep(φ^{−1}), and hence d(ν, µ) ≤ d(µ, ν); by symmetry of this argument, d is symmetric. With the two propositions below, we prove that d is a metric on X̃.
Proof. Since the "if" part is obvious, it suffices to prove the "only if" part. Let d(µ, ν) = 0 and let (α_i)_{i∈N}, (γ_i)_{i∈N} be representatives of µ, ν, respectively. We may assume ‖α_i‖ ≥ ‖α_{i+1}‖ and ‖γ_i‖ ≥ ‖γ_{i+1}‖ for all i by rearranging the order if needed.
Note that r_m → ∞. Suppose ‖α_1‖ = 0 (i.e., µ = 0). If ‖γ_1‖ > δ > 0, then, since the empty matching is the only option, we have a_m > I_{r_m}(ν) ≥ I_{r_m}(γ_1) > δ for all sufficiently large m, which is a contradiction. Hence, γ_1 = 0 and µ = ν = 0. Now suppose ‖α_1‖ > 0. By the same argument as above, we have ‖γ_1‖ > 0. We may assume and let p ∈ N be an integer such that ‖γ_1‖ = ⋯ = ‖γ_p‖ > ‖γ_{p+1}‖. Since a_m converges to 0, there is at least one integer l = l(m) such that µ_{m,l} ≤ α_1 for all sufficiently large m; in fact, if this does not hold, we arrive at a contradiction. By rearranging the order of pairs in φ_m, we may assume that µ_{m,1} ≤ α_1 and that it has the largest mass among (µ_{m,k} : µ_{m,k} ≤ α_1)_k. Then, for all m satisfying r_m > R, there is at most one submeasure µ_{m,j(m)} ≤ α_1 whose support has an overlap, since otherwise a_m would not converge to 0. Therefore, for all sufficiently large m, and for such m, j(m) = 1 by the definition of µ_{m,1}. We claim that there is q ∈ N such that ν_{m,1} ≤ γ_q for infinitely many m. To see this, let q_m be an integer such that ν_{m,1} ≤ γ_{q_m}. If there is no such integer q as claimed above, we have q_m → ∞ as m → ∞. It follows that ‖γ_{q_m}‖ → 0. On the other hand, for all sufficiently large m, ‖γ_{q_m}‖ ≥ ‖ν_{m,1}‖ = ‖µ_{m,1}‖ > ‖α_1‖ − 2ǫ by (2.21), which is a contradiction. Hence, the claim is proved and, moreover, we obtain ‖γ_q‖ > ‖α_1‖ − 2ǫ.
Let small ǫ > 0 be given, and choose R as above. We can obtain ‖ν_{m,1}‖ > ‖γ_1‖ − 2ǫ for all sufficiently large m by applying the same argument used for α_1. Then the desired estimate holds for all sufficiently large m, where we used (2.13) in the first inequality. Letting m → ∞ first and then ǫ ↓ 0, we obtain W̃(α̃_1, γ̃_1) = 0, i.e., α̃_1 = γ̃_1. Peeling off α_1 and γ_1 from µ and ν and repeating the same process to obtain α̃_i = γ̃_i for all i, we complete the proof.
For each 1 ≤ k ≤ n_1, applying (2.22) inductively to all l ∈ j_1^{−1}(k) = {m : j_1(m) = k}, we obtain an estimate ensuring that the optimal coupling between µ_k and η_{1,k} ∗ δ_{x_{1,k}} can be split into the part overlapping with η_{1,k} and the remaining part. Repeating the same process for ν in place of µ, we can define ν̄_k. Now let us define r = min(r_1, r_2). From the subadditivity property of I_r we obtain the following inequality (2.24). We claim that the corresponding bound holds. To see this, let f_k, g_k be the Radon--Nikodym derivatives of η_{1,k}, η_{2,k} with respect to η. Since the supports supp(η_{1,k}) and supp(η_{2,k}) are respectively disjoint, we have f_k ≤ 1 and g_k ≤ 1 pointwise, which proves the claim. Note that (2.25) can be rewritten using the fact that, since sep(φ_1) > 2r_1 ≥ 2r, the support of g_r(u, ·) cannot intersect supp(µ_k) and supp(µ_m) at the same time for any k ≠ m. A similar identity holds with ν in place of µ. Collecting (2.24), (2.28), and (2.29), and letting ǫ ↓ 0, completes the proof.
Having proved that d is a metric on X , we can now study properties of the metric space ( X , d).
2.5. Compactness and equivalence to the MV topology. In this section, we prove that ( X , d) is compact and that its topology is equivalent to that of the original MV space ( X , D). We recall that M 1 is naturally embedded in X since we can identify any α ∈ M 1 with the element [ α] ∈ X having representative (α, 0, 0, · · · ).
Proof. This proof is similar to that of Theorem 3.2 in [MV16].
(a) Let µ ∈ X be given and (α i ) i∈N ∈ X be a representative of µ. We may assume α i ≥ α i+1 for all i ≥ 1. For each m ∈ N, there are n = n(m) and R = R(m) with the required approximation property; we may assume 2 −R < 1/m. We denote by λ N the product measure on R d with centered Gaussian marginals of variance N . We can choose N = N (m) accordingly. Let us write B and B j for B R (0) and B R (x j ). One can consider a natural (µ m , µ)-matching and decompose µ − Σ n j=1 µ 2,j into two parts, each of which is easy to estimate. Collecting all estimates gives d(µ m , µ) < 5/m, which implies that µ m → µ as m → ∞.
(b) We now show that for any ( µ n ) n∈N in M 1 , there is a subsequence that converges to some µ ∈ X . Since I r is bounded by 1, by passing to a subsequence, we may assume that for every r > 0 the limit q 0 (r) := lim n→∞ I r (µ n ) exists. Let µ n,0 = µ n . For each m ∈ N, we can choose inductively a subsequence (µ n,m ) n≥1 of (µ n,m−1 ) n≥1 such that the corresponding limits exist. For simplicity of notation, we write µ n for µ n,n from now on.
If q 0 = 0 (i.e. q 0 (r) = 0 for all r > 0), then for any r > 0, by choosing the empty matching ∅, we can bound d(µ n , 0) from above. Letting r → ∞, we obtain that µ n converges to 0 in ( X , d).
If q 0 > 0, by choosing a suitable sequence (a n,1 ) n∈N in R d , we have, for some r > 1, µ n * δ a n,1 (B r (0)) ≥ I r−1 (µ n ) ≥ q 0 /2 for all sufficiently large n. Due to the compactness of M 1 in the vague topology, by taking a subsequence if needed, we may assume that λ n := µ n * δ a n,1 converges vaguely to α 1 .
In particular, it is proved in Theorem 3.2 of [MV16] that if q 0 = 1, then β n,1 can be taken to be 0, so we have µ n → α 1 in ( X , d).
If there is k ∈ N such that q k = 0, then we have the following decomposition for µ n . To see (2.31), let us assume that it is not true. By taking a subsequence, we may assume that the limit b := lim n→∞ (a n,i − a n,j ) exists for some i > j. We observe that β n,i * δ a n,i ≥ α n,j * δ a n,i = (α n,j * δ a n,j ) * δ a n,i −a n,j . (2.32) Since β n,i * δ a n,i converges vaguely to 0, lim n→∞ β n,i * δ a n,i (K) = 0 (2.33) for any compact set K in R d . On the other hand, it follows from α n,j * δ a n,j ⇒ α j and α j ≥ q j−1 /2 that there is a compact set K j such that α n,j * δ a n,j (K j ) ≥ q j−1 /3 for all sufficiently large n. Let K ′ j be a compact neighborhood of K j + b. Then, one can readily check that α n,j * δ a n,i (K ′ j ) ≥ q j−1 /3 for all sufficiently large n. Combining this with (2.33) gives a contradiction to (2.32).
We claim that µ n → µ = [ α 1 , · · · , α k ] in ( X , d). The argument is the same as in the proof of (2.34) below, and we omit it here.
If q k > 0 for every k ∈ N, then there are (α n,j ), (β n,j ) in M ≤1 and (a n,j ) in R d such that for all n, k ∈ N, (2.30), (2.31) hold and µ n = k j=1 α n,j + β n,k .
Since α j ≥ q j−1 /2 and Σ j∈N α j ≤ 1, we have q j → 0. We claim (2.34). Let ǫ > 0 be given. We first choose k = k(ǫ) ∈ N such that q k < ǫ. There is r = r(ǫ) such that the corresponding estimate holds for all n ≥ N 1 . We may assume 2 −r < ǫ. We can also find N 2 such that inf i≠j, i,j≤k |a n,i − a n,j | > 2r for all n ≥ N 2 . Recalling the definition of f r in (2.14), we choose a (µ, µ n )-matching φ = f r α j , f r (· + a n,j )α n,j k j=1 and x n = (a n,1 , · · · , a n,k ). For all n ≥ max (N 1 , N 2 ), W f r α j , f r (α n,j * δ a n,j ) 1 − f r (· + a n,j ) α n,j + β n,k + 2 −r .
In our proof of the equivalence between the MV topology and the topology defined by our metric d, we will use the following theorem: Theorem 2.11 (Theorem 26.6 in [Mun00]). Let X, Y be two topological spaces and let f : X → Y be a bijective continuous function. If X is compact and Y is Hausdorff, then f is a homeomorphism.
Before proving the equivalence statement, we recall the original MV metrization D defined by (2.5) and the functional Λ defined by (2.4).
Proof. We fix k ≥ 2. Since ( X , d) is compact by Proposition 2.10 and ( X , D) is Hausdorff being a metric space, it suffices, due to Theorem 2.11, to show the continuity of the identity map e : ( X , d) → ( X , D).
Let us assume that d(µ, ν) < δ. Then there is a (µ, ν)-triple r, φ = {(µ j , ν j )} n j=1 , y = (y 1 , · · · , y n ) (2.39) such that d r,φ, y (µ, ν) < δ. Notice that since r > M , the corresponding supremum bound holds. Let α c j = α j − α s j = Σ l:µ l ≤α j µ l for each j. We divide each term of Λ(f, µ) into a core part and a sparse part. From the binomial theorem and the mean value theorem, we obtain (2.42) and (2.43) for some p ∈ [ α c j , α j ]. Combining (2.42) and (2.43) with (2.41) gives (2.44). For the core part, we obtain (2.45), where we used in the inequality the fact that sep(φ) ≥ 2r > M , so |f | < ǫ/4 on the support of the off-diagonal products of µ l 's. Substituting (2.44) and (2.45) into (2.40) and summing over all j, we obtain (2.46). On the other hand, it follows from the non-negativity of f that (2.47) holds. Estimates similar to (2.46) and (2.47) also hold for Λ(f, ν). We now give an upper bound for W (α ⊗k , γ ⊗k ) in terms of W (α, γ). Let π be the optimal coupling of (α, γ). Then, π ⊗k is a coupling of (α ⊗k , γ ⊗k ) and (2.48) follows. Combining (2.46), (2.47) and (2.48), we conclude the desired estimate, where the inequality in the second line follows from the translational invariance of Λ and (2.12) (we recall that y = (y 1 , · · · , y n ) was introduced in (2.39) as an element of the (µ, ν)-triple).

The update map
In this section, following [BC20], we define an "update map" T which maps the law of the polymer endpoint distribution of length n to that of length n + 1, and prove that T is continuous. As in Section 1.3, the endpoint distribution for the polymer of length n is denoted by ρ n . Notice that ρ n is a random measure on the probability space (Ω e , G , P) of the random environment. We denote by P( X ) the space of Borel probability measures on X and endow it with the Wasserstein metric W, where the infimum in the definition of W is taken over all couplings π of (ξ 1 , ξ 2 ).
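The displayed formula for the metric W did not survive extraction. Consistent with the surrounding text (an infimum over all couplings π of (ξ 1 , ξ 2 )), the standard form would be:

```latex
\mathbb{W}(\xi_1,\xi_2)
  \;=\; \inf_{\pi}\,\int_{\mathcal{X}\times\mathcal{X}} d(\mu,\nu)\,\pi(d\mu,d\nu),
```

where d is the metric on X constructed in Section 2 and π ranges over couplings of (ξ 1 , ξ 2 ).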
3.1. The conditional update map. In this section, we define a "conditional" update map T : X → P( X ) that maps ρ n to the law of ρ n+1 given G n . We recall that P , P x and λ were defined in (1.1) and below it. For each n ≥ 1, let us define P x n (or P n ) = the law of (ω 1 , · · · , ω n ) under P x (or P ). (3.2) We observe that ρ n+1 (dx) is proportional to ∫ exp(β Σ n i=1 X(i, y i ) + βX(n + 1, x)) P n+1 (dy 1 , · · · , dy n , dx). Integrating over x on both sides gives the normalization. Since X(n + 1, · ) is independent of G n , the law of ρ n+1 given G n is equal to the law of the expression in (3.3), where Y (·) d = X(·) and Y is independent of G n . In general, for µ = (α i ) ∈ X , we can consider an X -valued random variable μ̂ = (α̂ i ) given by (3.4). Notice that (3.4) is a generalization of (3.3) because the total mass of ρ n is 1 for all n ∈ N. The additional term in the denominator allows us to define μ̂ when µ = 0, and we will see later that this term makes the (conditional) update map continuous.
For any µ = (α i ) i∈N ∈ X and γ ∈ M ≤1 , we will write µ * γ := (α i * γ) i∈N . (3.5) One can check that the convolution is also well-defined for µ ∈ X . The measure μ̂ can now be expressed in terms of (3.5) and integration on N × R d as in (3.6). We would now like (3.6) to be well-defined on X . For a fixed environment, though, μ̂ does depend on the choice of the representative of µ. However, the next proposition claims that the law of μ̂ is independent of this choice. We recall the equivalence relation ∼ on X introduced in Definition 2.1.
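Display (3.6) is missing from this excerpt. A plausible reconstruction, assembled from the denominator term described after (3.4) and the fixed-point computation written out in Section 7, is:

```latex
\hat{\mu}(du)
  \;=\;
  \frac{e^{\beta Y(u)}\,(\mu*\lambda)(du)}
       {\displaystyle\int_{\mathbb{N}\times\mathbb{R}^d} e^{\beta Y(w)}\,(\mu*\lambda)(dw)
        \;+\;\bigl(1-\|\mu\|\bigr)e^{c(\beta)}},
```

where ‖µ‖ denotes the total mass of µ; when ‖µ‖ = 1 the extra term vanishes and (3.3) is recovered.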
Proof. It suffices to find a coupling of (Y 1 , Y 2 ) such that • Y 1 , Y 2 are random fields with the same law as X, • Y i is used to define μ̂ i in (3.6), • μ̂ 1 = μ̂ 2 in X .
Let µ 1 = (α i ) and µ 2 = (γ i ). From Definition 2.1, there are a sequence {x i } in R d and a bijection σ : S µ 2 → S µ 1 such that γ i = α σ(i) * δ x i for all i ∈ S µ 2 . Let Y 1 , W be independent random fields with the same law as X and define Y 2 in terms of them. Then, for any i ∈ S µ 2 , we obtain the required identity, which implies that α̂ σ(i) and γ̂ i belong to the same orbit.
Proposition 3.1 allows us to define the update map T : X → P( X ) by T : µ → law of μ̂.
Since M 1 is naturally embedded in X , we can identify the endpoint distribution ρ n with a random element of X . As we discussed before, we have T µ(dν) = P(ρ n+1 ∈ dν | ρ n = µ), or equivalently, P(ρ n+1 ∈ dν | G n ) = T ρ n (dν) a.s.
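As an illustration (not part of the paper), the conditional update (3.3) can be sketched numerically for a discretized one-dimensional polymer: the endpoint distribution is spread by the random-walk step kernel λ and then reweighted by the fresh layer of the potential. All names, the lattice size, and the step kernel below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0
L = 201                               # lattice sites approximating an interval of R
step = np.array([0.25, 0.5, 0.25])    # discretized random-walk step kernel (lambda)

rho = np.zeros(L)
rho[L // 2] = 1.0                     # endpoint distribution of the length-0 polymer

def update(rho, potential, beta):
    """One application of the endpoint update map:
    rho_{n+1}(x) is proportional to exp(beta * X(n+1, x)) * (rho_n * lambda)(x)."""
    spread = np.convolve(rho, step, mode="same")   # convolution with the step kernel
    weights = np.exp(beta * potential) * spread    # reweight by the new potential layer
    return weights / weights.sum()                 # normalize to a probability vector

for n in range(50):
    # each layer of the environment is an i.i.d. field, here standard Gaussian
    rho = update(rho, rng.standard_normal(L), beta)
```

After any number of updates, `rho` remains a probability vector; at low temperature (large `beta`) its mass visibly concentrates on a few sites, which is the localization phenomenon studied below.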
To see the reverse inequality, let us take any ǫ > 0 and choose a g-triple r, ϕ = {(µ k , ν k )} n k=1 , x such that d r,ϕ, x (µ, ν) < d g (µ, ν) + ǫ. For each pair (µ k , ν k ), there exist submeasures μ̃ k ≤ µ k and ν̃ k ≤ ν k with the same mass satisfying the required estimate, and by the subadditivity of I r , we obtain (3.9). Plugging (3.9) into (3.8), we obtain the desired bound, which completes the proof.

Continuity of the conditional update map.
In this section, we prove that T : ( X , d) → (P( X ), W) is continuous. In our proof of continuity, we follow the general strategy used in [BC20]. Some elements of our proof in the general continuous setting have appeared in [BM19] for a Gaussian setting. We begin with a useful lemma. Proof. We consider two cases. If µ ≤ 1/2, then the claim follows by Jensen's inequality. If µ > 1/2, then the claim again follows by Jensen's inequality. Proposition 3.3. T : ( X , d) → (P( X ), W) is continuous. That is, for any ǫ > 0, there is δ = δ(ǫ) > 0 such that for µ, ν ∈ X , d(µ, ν) < δ ⇒ W(T µ, T ν) < ǫ.
(Part 1) We first assume that for each k, there is at most one j = j(k) such that S µ j = {k}, and that the same condition holds for ν. By rearranging the order of (α j ), (γ j ) and translating each γ i if needed, we may assume S µ k = S ν k = {k} for 1 ≤ k ≤ n and x = 0. Let us use the same environment Y to define α̂ j , γ̂ j as in (3.4). In addition to A defined in (3.10), we define B as in (3.13). Then, we can write α̂ i = α * i (dx)/A, γ̂ i = γ * i (dx)/B due to (3.4). Similarly, let us define the corresponding quantities for ν. We first estimate the generalized Wasserstein distance terms in (3.16). Summing over k gives (3.17). We need to estimate all the terms on the right-hand side of (3.17). We start with E Ŵ (µ * k , ν * k ).
(Part 1.1.1) Upper bound for E[ Ŵ (µ * k , ν * k ) 2 ] 1/2 . We observe that the stationary process (e 2βY (x) ) x∈R d is uniformly integrable and, by path continuity, lim |x|→0 e βY (x) = e βY (0) almost surely. It follows that (e βY (x) ) x∈R d is L 2 -continuous. Let π k be the optimal coupling of (µ k , ν k ) and π ′ k be the optimal coupling of (µ ′ k , ν ′ k ). One can check the required bound by considering a suitable coupling of (µ ′ k , ν ′ k ). Since (e βY (x) ∧ e βY (y) )π ′ k is an unnormalized sub-coupling of (µ * k , ν * k ), we can use it to estimate Ŵ (µ * k , ν * k ). Taking the square root and summing over k, we conclude that there is a constant with the desired property. One can write the second moment explicitly, and therefore (3.20) holds. By the independence of (V (i, ·)) i∈N , the expectation factorizes. Denote c β (x) = E (e βX(x) − e c(β) )(e βX(0) − e c(β) ) . Notice that c β (x) ≤ e c(2β) for all x ∈ R d . Let us now give an upper bound for the second term in (3.20). One can see that the corresponding estimate holds for any α ∈ M ≤1 and r ≥ 0. We recall that M is the radius of dependence of the potential and r > M . Therefore, the bound holds with some constant C 1 = C 1 (β) > 0 depending only on β.
(Part 1.2) Upper bound for E I r (μ̂ − μ̂ k ). Let us estimate the second term on the right-hand side of (3.15). Since r > M , we have the stated chain of estimates, where we used (3.23) in the sixth line. The same upper bound holds for E I r (ν̂ − ν̂ k ).
(Part 2) We now relax the assumption that α k , γ k are minorized by at most one µ j , ν j for each k, respectively. Our goal is to reduce the problem to that studied in (Part 1). To that end, we will find µ ′ and ν ′ in X such that • T µ ′ and T ν ′ are close to T µ and T ν, respectively, • µ ′ and ν ′ satisfy the conditions of (Part 1), i.e., (µ k ) and (ν k ) are submeasures (as elements in M ≤1 ) of mutually exclusive orbits in µ ′ and ν ′ , respectively.
First, we choose R > 0 such that λ(B R (0) c ) < ǫ and decompose λ into central and exterior parts. Recalling that n and (µ i ) n i=1 were defined just before (3.12), let us consider µ ′ = (α ′ i ) ∈ X defined as follows, where we view µ i as a subprobability measure on R d instead of N × R d . In other words, we set the first n layers (α ′ i ) n i=1 in µ ′ to coincide with (µ i ) n i=1 , while the remaining layers (α ′ i ) i≥n+1 are obtained from µ − Σ n k=1 µ k via a shift by n layers. Regarding the interpretation of µ k , we refer readers to Remark 2.6. We also define a function J : {1, · · · , n} → N accordingly. We denote by μ̂ ′ an X -valued random variable whose law is T µ ′ . To estimate W(T µ, T µ ′ ), we need to introduce a coupling of (μ̂, μ̂ ′ ). To this end, let Y, V be independent random fields with the same law as X and let us use Y to define μ̂. We fix j in the range of J and consider k ∈ J −1 (j). If we impose δ < 2 −max(M,R)−R (i.e., r > max(M, R) + R), then the required separation holds by the definition of a (µ, ν)-triple (we refer to Lemma A.1 in the Appendix for a precise statement). We now set the environment Y ′ and use it to define μ̂ ′ . Combining this with (3.29), we obtain (3.30). Similarly to (3.14) and (3.10), let us define µ * ′ k , μ̂ ′ k , µ s′ and A ′ with the environment Y ′ and introduce the corresponding quantities for j ∈ {1, 2}. We estimate the first term on the right-hand side in (3.32). Similarly to (3.16) and (3.17), we obtain the analogous decomposition. It follows from supp(µ k * λ 1 ) ⊂ U k and (3.30) that the overlap estimate holds. Similarly to (3.20), we have Ŵ (μ̂ k,1 , μ̂ ′ k,1 ) ≤ 2e (c(−2β)+c(2β))/2 · √ 6(ǫ + √ δ). (3.37) As for the I r 0 terms on the right-hand side of (3.32), we repeat the computation of (3.28) to obtain (3.38). Combining (3.32), (3.37), (3.38) and recalling that 2 −r 0 = 2 −r/2 < √ δ due to (3.31) and (3.12), we conclude that there is a constant C > 0 such that if δ < min(δ 0 ǫ 2 , 2 −max(M,R)−R ), then W(T µ, T µ ′ ) < Cǫ. The same result holds for ν, so that there is ν ′ such that W(T ν, T ν ′ ) < Cǫ. One can check that µ ′ and ν ′ satisfy the assumptions of (Part 1), which can now be applied to estimate W(T µ ′ , T ν ′ ).
Finally, the triangle inequality completes the proof in the general case.
3.4. Lifting the update map. We discussed in (3.7) that T ν(dµ) = Γ(ν, dµ) can be understood as a transition kernel for the Markov chain (ρ i ) i≥0 of the endpoint distributions of random polymers. Integrating Γ(ν, dµ) over the initial condition ν, we can extend T to an operator T on P( X ), see (3.39). Therefore, T maps the law of ρ i to the law of ρ i+1 .
Remark 3.5. Based on the definition of T , one can readily check that T ν = T δ ν . Therefore, (3.39) can be rewritten accordingly and, by iteration, one obtains (3.40).
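Displays (3.39) and (3.40) are missing from this excerpt. Consistent with the surrounding text ("integrating Γ(ν, dµ) over the initial conditions" and the identity T ν = T δ ν of Remark 3.5), they presumably read:

```latex
(\mathcal{T}\xi)(d\mu) \;=\; \int_{\mathcal{X}} \Gamma(\nu,d\mu)\,\xi(d\nu)
  \;=\; \int_{\mathcal{X}} (T\nu)(d\mu)\,\xi(d\nu),
\qquad
\mathcal{T}^{\,n}\xi \;=\; \int_{\mathcal{X}} \mathcal{T}^{\,n}\delta_{\nu}\,\xi(d\nu).
```

This is a hedged reconstruction, not a quotation of the original displays.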

Convergence of empirical measure
This section is an adaptation of Section 4 of [BC20] to our general setting.
4.1. The convergence to fixed points of the update map. Let us denote the empirical probability measure of the endpoint distributions on X by ψ n := (1/n) Σ n−1 i=0 δ ρ i . The goal of this section is to study the asymptotic behavior of ψ n . We will prove that ψ n converges to the set K of fixed points of T and introduce an "energy functional" R which maps ψ n to a value close to the quenched free energy F n . The functional R allows us to improve the former result by replacing K with a subset K 0 of K consisting of the minimal energy states.

Proof.
We use martingale analysis similar to that in the proof of Proposition 4.1 in [BC20]. Using a natural coupling of (ψ n , ψ ′ n ) and applying (2.19), we conclude that W(ψ n , ψ ′ n ) ≤ 2/n. Therefore, it suffices to prove (4.1). It follows from (2.12) that M n (h) is a martingale with respect to the filtration (G n ) n≥1 . Since |h| ≤ 2, we can apply the Burkholder-Davis-Gundy inequality to see that there is a constant C > 0 such that E[(M n (h)/n) 4 ] ≤ 16Cn −2 , and hence, by the Borel-Cantelli lemma, M n (h)/n → 0 almost surely. On the other hand, we observe a uniform modulus bound, which tells us that (M n (·)/n) n≥1 is an equicontinuous sequence of functions on L. By the compactness of L and the Arzelà-Ascoli theorem, the limit in (4.1) is uniform in h ∈ L, which completes the proof.
Proposition 4.1 suggests that (ψ n ) n≥1 will be close to the set of fixed points of T as n becomes large. We denote the set of fixed points of T by K. Notice that K is nonempty since T δ 0 = δ 0 . By applying the same argument as in Corollary 4.3 and Proposition 4.4 in [BC20], we can prove the following: Proposition 4.2. As n → ∞, W(ψ n , K) := inf{W(ψ n , ξ) : ξ ∈ K} → 0 P-a.s.
Proof. Suppose that W(ψ n , K) does not converge to 0. Then, there are ǫ > 0 and a subsequence (ψ n k ) k≥1 such that W(ψ n k , K) > ǫ for all k ≥ 1. Since P( X ) is compact, by choosing a further subsequence if needed, we may assume that lim k→∞ ψ n k = ψ for some ψ ∈ P( X ). On the other hand, W(ψ, T ψ) can be bounded by three terms, and as k → ∞, each term on the right-hand side converges to 0 due to the convergence of ψ n k , Proposition 4.1, and the continuity of T , respectively. It follows that T ψ = ψ, i.e., ψ ∈ K, which contradicts the assumption that ψ n k is ǫ-away from K. Proposition 4.3. If ξ ∈ K, then ξ({µ ∈ X : 0 < µ < 1}) = 0.
Proof. Let ξ ∈ K and let µ be an X -valued random variable whose law is ξ. Let us suppose, to derive a contradiction, that ξ({µ ∈ X : 0 < µ < 1}) > 0. Recalling that T µ is the law of the X -valued random variable μ̂ defined in (3.6), we have, by Jensen's inequality applied to the concave function x → x/(x + (1 − µ )e c(β) ), the corresponding estimate, where identity holds if and only if ∫ N×R d e βY (u) µ * λ(du) is constant P-a.s. However, since Y is non-degenerate, we have a strict inequality. Combining this fact with our assumption, we arrive at a contradiction with T ξ = ξ.

4.2.
Variational formula for the free energy. We observe that F n can be written as a sum of increments. Conditioning the i-th term on G i , we obtain its conditional expectation. It is useful to extend this to a functional R on X , defined in (4.3), where Y has the same law as X.
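Display (4.3) did not survive extraction. A form consistent with the update map (3.6) and with Remark 4.5 (R has a unique maximum at 0, by Jensen's inequality and E e^{βY(u)} = e^{c(β)}) would be:

```latex
R(\mu) \;=\; \mathbb{E}\,\log\!\Big(
    \int_{\mathbb{N}\times\mathbb{R}^d} e^{\beta Y(u)}\,(\mu*\lambda)(du)
    \;+\;\bigl(1-\|\mu\|\bigr)e^{c(\beta)}\Big),
\qquad\text{so that}\quad R(\mu)\;\le\; c(\beta)\;=\;R(0).
```

This is a hedged reconstruction: the regularizing term matches the denominator of (3.6), and for a probability measure µ it reduces to the conditional expectation of the free energy increment.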
Proof. It is easy to check that the right-hand side of (4.3) does not depend on the choice of the representative of µ. We need to prove that R(µ) is finite. For any positive random variable K, one has |E log K| ≤ max{log EK, log EK −1 } = log max{EK, EK −1 }. Combining (4.4), (4.5), and (4.6) gives |R(µ)| < ∞. We now prove the uniform continuity of R: given any ǫ > 0, there is δ = δ(ǫ) > 0 such that the required implication holds, where N > 0 is a constant depending only on β and the law of X(1, 0). Let ǫ > 0 be given. Let us define A, B by (3.10) and (3.13) and choose δ as in the proof of Proposition 3.3 (see the last line of Part 1). Notice that an elementary inequality holds for any x > 0. Combining this with Lemma 3.2 and (3.26), we complete the proof.
Remark 4.5. We can apply Jensen's inequality to (4.3) to obtain an upper bound for R(µ). Since Y is non-degenerate, the identity holds only if µ = 0. Hence, the functional R attains its unique maximum at 0.
We can rewrite (4.2) in terms of the values R(ρ i ). In fact, not only the expectations but the random variables themselves are close: Proposition 4.7. As n → ∞, F n − R(ψ n ) → 0 P-a.s.

Proof.
Let U i denote the i-th increment of the quenched free energy. We have E[U i |G i ] = R(ρ i ), and therefore the sum M n of the centered increments U i − E[U i |G i ] is a martingale.
We claim that E (U i − E[U i |G i ]) 4 is bounded. It suffices to show that E[U 4 i ] is bounded. To this end, we observe that E[V 2 i ] ≤ e c(2β) and, similarly, E[V −2 i ] ≤ e c(−2β) . Using the inequality (log x) 4 < x 2 + 1/x 2 , we obtain the required moment bound. Using the Burkholder-Davis-Gundy inequality, we obtain a fourth-moment estimate for M n , which implies E[(M n /n) 4 ] ≤ Cn −2 and hence, by the Borel-Cantelli lemma, M n /n → 0 almost surely, completing the proof.

4.3.
A representation of the convergence of F n via R and T . In this section, we explore how the energy functional R and the update map T are related to the quenched free energy F n . First, we show that the limit of the free energy can be understood as the minimal energy among the set K of fixed points of T , see Theorem 4.8. This result allows us to describe more precisely the asymptotic behavior of the empirical measures ψ n previously studied in Proposition 4.2.
We already discussed in (1.4) that |F n − EF n | → 0 almost surely and in L p for all p ≥ 1. Hence, it is sufficient to show lim n→∞ EF n = inf ξ∈K R(ξ).
To that end, we need the following two propositions. Proof. We claim that (4.7) holds for any µ (0) ∈ X and n ∈ N. First, let us use this claim to derive the proposition. For any ξ ∈ P( X ), the corresponding identity follows from (3.40). Moreover, if ξ ∈ K, then (4.7) implies the desired inequality. Taking the infimum over ξ ∈ K and then lim sup n→∞ , we complete the proof.
Let us prove the claim (4.7). For i ≥ 1, let µ (i) be defined inductively, where (Y (i) ) are i.i.d. random fields whose law is the same as that of X. By induction, we see that the law of µ (i) is T i δ µ (0) . Hence, we obtain (4.8). Here, we can consider P u k as a probability measure on (N × R d ) k by extending P x k defined in (3.2) to any u 0 = (m 0 , x 0 ) ∈ N × R d . Iterating (4.8) for i = n − 1, · · · , 1, we obtain an expression involving e β(Y (n) (u n )+Y (n−1) (u n−1 )) P u n−2 2 (du n−1 , du n )µ (n−2) (du n−2 ), where u = (u 1 , · · · , u n ). Integrating over u in (4.8) and iterating this relation for i = n − 1, · · · , 1, we obtain (4.11). Combining (4.9) for i = n, (4.10), and (4.11), we obtain the key identity and, by Jensen's inequality, (4.12). Since (Y (i) (j, x)) j≥1 is stationary in x, log Z n,u d = log Z n for all u ∈ N × R d . In particular, E log Z n,u = E log Z n . Taking expectation on both sides of (4.12), we obtain the claim, completing the proof of (4.7) and the entire proposition.
Let us denote by K 0 the set of minimizers of R over K. Since R is continuous on the compact space P( X ), the infimum is attained and K 0 is also compact. Proposition 4.2, Proposition 4.7, and Theorem 4.8 suggest that one can strengthen Proposition 4.2 by replacing K with its subset K 0 .
We omit the proof. It is identical to that of Theorem 4.11 in [BC20] and is based on compactness of K, continuity of R, Proposition 4.7, and Theorem 4.8.
5. Characterization of high/low temperature regimes 5.1. Existence of phase transitions. We recall that the critical inverse temperature β c was introduced in Theorem B.
The proof below follows the proof of Theorem 5.2 in [BC20] closely.
Proof. It is sufficient to prove the converses of (a) and (b) because their hypotheses are complementary. We recall from Theorem B that 0 ≤ β ≤ β c is equivalent to Λ(β) = c(β) − p(β) = 0, where Λ, c and p were defined in (1.5), (1.3) and (1.2), respectively. If K has no elements other than δ 0 , i.e., K = K 0 = {δ 0 }, then Theorem 4.8 identifies the limit of the free energy. Let us assume that there is an element ζ ∈ K which is different from δ 0 . From Remark 4.5, we have R(µ) < R(0) for all µ ≠ 0. Combining this with the fact that ζ({0}) < 1, we obtain a strict energy inequality. To see that ξ(U ) = 1 for all ξ ∈ K 0 in (b), fix ζ ∈ K \ {δ 0 } and consider the conditional probability measure ζ U = ζ( · ∩ U )/ζ(U ) on X . We claim that ζ U ∈ K. To prove this, we first notice that, due to the presence of the (1 − µ )e c(β) term in the denominator of (3.6), the update preserves the set U . Therefore, for any Borel A ⊂ X , the required identity holds, which proves the claim. If ζ(U ) < 1, then we obtain a strict energy inequality, which implies that ζ ∈ K 0 only if ζ(U ) = 1.
6. Asymptotic clustering 6.1. Definitions and sufficient conditions. Definition 6.1. The sequence (ρ i ) i≥0 of the endpoint distributions is said to be "asymptotically clustered at level r > 0" if for every sequence (ǫ i ) i≥0 tending to 0, the corresponding limit relation holds. Definition 6.2. We say that (ρ i ) i≥0 is "asymptotically locally clustered" if for every sequence (ǫ i ) i≥0 tending to 0, the analogous relation holds. Definition 6.3. We say that asymptotic clustering of densities holds for (ρ i ) i≥0 if every ρ i is absolutely continuous with respect to the Lebesgue measure and for every sequence (ǫ i ) i≥0 tending to 0, the corresponding density version holds. Remark 6.4. If every ρ i is absolutely continuous with respect to the Lebesgue measure, then, due to the Lebesgue differentiation theorem, clustering of densities is equivalent to asymptotic local clustering.
The above definitions are Euclidean-space extensions of the notion of asymptotic pure atomicity that was first introduced by Vargas in [Var07] and modified by Bates and Chatterjee in [BC20] in their study of endpoint distributions for discrete polymers. Moreover, asymptotic clustering at positive levels was studied in [BM19], still under the name of asymptotic pure atomicity. We introduce the new term asymptotic clustering instead of asymptotic pure atomicity to avoid the misleading image of convergence of the measures in question to a purely atomic measure. Roughly speaking, (ρ i ) i≥0 is asymptotically clustered at level r if the mass of ρ i concentrates on a few balls of radius r for large i.
We state a sufficient condition for asymptotic clustering that is simpler to verify because it is stated in terms of fixed ǫ > 0 instead of sequences (ǫ i ) i≥0 .
The proof of this lemma repeats the proof of Lemma 6.2 of [BC20] word for word. The discreteness of Z d plays no role in this argument.
6.2. Auxiliary functionals. For any ǫ > 0, let us define f ǫ : [0, ∞) → [0, 1]. One can see that f ǫ is 1/ǫ-Lipschitz continuous and can be interpreted as an approximation of the step function g ǫ (t) = 1 (2ǫ,+∞) (t) for small ǫ.
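The defining formula for f ǫ is missing from this excerpt. One natural choice with the stated properties (values in [0, 1], 1/ǫ-Lipschitz, vanishing on [0, ǫ] and equal to 1 on [2ǫ, ∞)) is the piecewise-linear ramp:

```latex
f_\epsilon(t) \;=\; \Bigl(\tfrac{t-\epsilon}{\epsilon}\wedge 1\Bigr)\vee 0
  \;=\;
  \begin{cases}
    0, & 0\le t\le \epsilon,\\[2pt]
    (t-\epsilon)/\epsilon, & \epsilon < t < 2\epsilon,\\[2pt]
    1, & t\ge 2\epsilon.
  \end{cases}
```

Any function with these three properties serves the same purpose in the arguments below.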
For any µ = (α i ) i∈N ∈ X and r > 0, let us define a functional D r on X × (N × R d ), where u = (i, x) and a + = max(a, 0). Comparing this with the definition of I r , one obtains a pointwise bound. We also record a continuity property. Using the embedding of M ≤1 into X , we can naturally define D r on M ≤1 × R d . Combining (2.12) with the fact that y → (1 − |x − y|/r) + is 1/r-Lipschitz for every x ∈ R d , we obtain the corresponding Lipschitz estimate. Let us define a functional J r,ǫ : X → [0, 1], where we denote D r (µ, ·) by D r,µ (·). J r,ǫ is well-defined on X due to the following observation. Proposition 6.6. For any r, ǫ > 0, J r,ǫ : X → [0, 1] is continuous.
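The displayed definitions of D r and J r,ǫ are missing here. Given the ingredients that do survive (the tent function (1 − |x − y|/r) + , the notation u = (i, x), and D r,µ ), a plausible reconstruction is:

```latex
D_r(\mu,u) \;=\; \int_{\mathbb{R}^d} \Bigl(1-\frac{|x-y|}{r}\Bigr)_{+}\alpha_i(dy),
\quad u=(i,x),
\qquad
J_{r,\epsilon}(\mu) \;=\; \int_{\mathbb{N}\times\mathbb{R}^d}
    f_\epsilon\bigl(D_{r,\mu}(u)\bigr)\,\mu(du).
```

Since the tent function is supported in B r (x), this gives D r (µ, u) ≤ α i (B r (x)) ≤ I r (µ), which is the comparison with I r mentioned in the text; the reconstruction is hedged, not a quotation.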
Proof. Let µ ∈ X and δ 2 > 0 be given. We claim that d(µ, ν) < δ 1 implies the desired bound, where δ 1 is a suitable minimum of quantities depending on δ 2 , r and ǫ. Fix ν ∈ X satisfying d(µ, ν) < δ 1 . Then, we can find a triple (r ′ , φ = {(µ k , ν k )} n k=1 , x) such that r ′ > 2r and d r ′ ,φ, x (µ, ν) < δ 1 . With suitable notation, we have the decomposition (6.3). In order to derive the upper bound for the first term on the right-hand side of (6.3), we observe that, in the second equality, we used the fact that dist(supp(µ k ), supp(µ l )) > 2r ′ > 4r for all l ≠ k, which implies D r,µ−µ k = D r,µ s on supp(µ k ). Combining this with the triangle inequality, we obtain (6.4). On the other hand, we can use (6.2), (2.12), and the 1/ǫ-Lipschitz continuity of f ǫ to derive a bound; summing over k on both sides gives (6.5). Let us estimate the second and the third terms on the right-hand side of (6.3), see (6.6). Then, we can choose a finite number of disjoint balls such that every ball B r (u), u ∈ C, has non-empty intersection with one of them. Using this and the bound µ s (B r ′ (u)) ≤ I r ′ (µ s ) < δ 1 for all u ∈ N × R d , (6.6) can be continued as (6.7). Combining (6.3), (6.4), (6.5), and (6.7) completes the proof.
6.3. Asymptotic clustering of polymer endpoint distributions. In this section, we prove the following theorem which is a reformulation of relations (1.6) and (1.8) in Theorem 1.1.
Since each J r,ǫ is continuous on the compact set K 0 and (J r,ǫ ) ǫ>0 is monotone increasing as ǫ ↓ 0, the convergence above is uniform in ξ by Dini's theorem. Let now c > 0 be given. By the uniform convergence of (J r,ǫ ) ǫ>0 , we can choose ǫ = ǫ(r, c) > 0 such that J r,ǫ (ξ) > 1 − c for all ξ ∈ K 0 and, for such ǫ, we can also find δ > 0 such that (6.10) holds. Combining Theorem 4.11, (6.9), and (6.10), we complete the proof of (a). (b) Suppose β ≤ β c and let r > 0, ǫ > 0 be given. We claim (6.11). To see this, we observe that for any µ ∈ X and any u ∈ A ǫ µ (r) = {v ∈ N × R d : µ(B r (v)) > ǫV d r d }, the corresponding lower bound holds with ǫ ′ = ǫ/2 d+2 . Therefore, we have the desired comparison. By Theorem 5.1 (a) and the continuity of J 2r,ǫ ′ , we conclude (6.11). Fix r > 0 and let us now construct a sequence (ǫ i ) i≥0 tending to 0 and satisfying (6.8). By (6.11), we see that for each k ∈ N, there is N k such that the corresponding bound holds. We may assume N k+1 > N k for all k. Set ǫ i = 1 for i < N 1 and ǫ i = 1/k for N k ≤ i < N k+1 . Then, we see that for each n ∈ N, there is k = k(n) such that N k ≤ n < N k+1 , and hence the required estimate holds. Since lim n→∞ k(n) = ∞, letting n → ∞ on both sides above completes the proof of (b).
6.4. Asymptotic local clustering of the endpoint distribution. In this section, we prove the following reformulation of relation (1.7) in Theorem 1.1: Theorem 6.8. If β > β c , then (ρ i ) i≥0 is asymptotically locally clustered. In particular, for these values of β, clustering of densities (see Remark 6.4) holds if the reference random walk step distribution λ(dx) is absolutely continuous.
Before we prove this, we recall the Besicovitch covering theorem and its related lemma which will be used later.
Theorem 6.9 (Besicovitch covering theorem). There is a constant N d , depending only on the dimension d, with the following property: Let F = {B rσ (x σ ) : σ ∈ I} be any collection of open balls in R d with sup{r σ : σ ∈ I} < ∞, and let A = {x σ : σ ∈ I}. Then, there is a countable subcollection G of F such that G is a cover of A and every x ∈ ∪ B∈G B belongs to at most N d different balls from the subcollection G.
We now state a lemma which is based on the Besicovitch covering theorem.
Let ǫ > 0 be given. Then, the stated estimate holds for any Borel set. Similarly to A ǫ i (r) and A ǫ i , let us denote the analogous sets A ǫ α (r) for any α ∈ M 1 , ǫ > 0 and r > 0. By substituting γ = m (the Lebesgue measure on R d ) in the lemma above, we obtain (6.12). Proposition 6.11. Let ǫ, c, r > 0 be given and let us assume that α ∈ M 1 satisfies (6.13). Then, there is ǫ 1 = ǫ 1 (ǫ, c, r, d) > 0, independent of α, such that the stated conclusion holds. Proof. Let ǫ > 0 be given. For any t ∈ (0, 1), let us set s = (1 − t)/(V d r d ) and define ǫ 1 by (6.14). We will determine the value of t later. Let x ∈ A ǫ α (r) and suppose α(A ǫ 1 α ∩ B r (x)) ≤ tα(B r (x)). Then, we obtain an estimate whose first inequality uses (6.12). By (6.14), this yields a bound which contradicts x ∈ A ǫ α (r). Therefore, we obtain (6.15). Let us now apply Theorem 6.9 with F = {B r (x) : x ∈ A ǫ α (r)} and A = A ǫ α (r). Then, we can find a countable subset A ⊂ A ǫ α (r) such that G = {B r (x) : x ∈ A} is a cover of A ǫ α (r) and every x ∈ ∪ y∈A B r (y) is covered by at most N d balls from G. Therefore, due to (6.15), we have (6.16). Therefore, due to (6.16) and (6.13), the conclusion follows. Proof of Theorem 6.8. Suppose β > β c . Let (ǫ i ) i≥0 tending to 0 be given. For any c > 0, let us denote s = c/(2V d N d ) and define the events F c and F ′ c accordingly. By Theorem 6.7 (a), (ρ i ) i≥0 is asymptotically clustered at level 1. This can be rewritten in terms of F c . Since Proposition 6.11 implies F c ⊂ F ′ c , the same relation holds for F ′ c , and hence the liminf bound follows. Letting c ↓ 0 completes the proof.

Geometric localization
Adapting the terminology from [BC20], we say that the sequence (ρ n ) n≥0 is geometrically localized with positive density if for any δ > 0, there exist K < ∞ and θ > 0 such that the corresponding liminf bound holds. If θ can be taken equal to 1, then the sequence is said to be geometrically localized with full density; full density localization remains an open question. In this section, we prove that (ρ i ) i≥0 is geometrically localized with positive density if and only if β > β c .
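The displayed condition is missing from this excerpt. In analogy with the discrete definition of Bates and Chatterjee [BC20], it presumably requires, for every δ > 0, some K < ∞ and θ > 0 with

```latex
\liminf_{n\to\infty}\;\frac{1}{n}\,
  \#\Bigl\{\,0\le i < n \;:\; \sup_{x\in\mathbb{R}^d}\rho_i\bigl(B_K(x)\bigr) > 1-\delta \Bigr\}
  \;\ge\; \theta
  \qquad \mathbb{P}\text{-a.s.},
```

i.e., a positive fraction of times i at which all but a δ-fraction of the endpoint mass sits in a single ball of radius K. This is a hedged reconstruction, not a quotation of the original display.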
(a) W δ is upper semi-continuous.
(b) G is lower semi-continuous.
(c) Q is lower semi-continuous.
This implies ξ({µ ∈ X : |S µ | = 1}) = 0. Let η = [ α i ] be an X -valued random variable whose law is ξ and let Y be a random field with the same law as X and independent of η. Recalling that ξ({µ ∈ X : µ = 1}) = 1 by Theorem 5.1 (b), we have η = 1 almost surely. Since ξ is a fixed point of T , the law of η̂(du) = e βY (u) η * λ(du) / ∫ N×R d e βY (w) η * λ(dw) is also ξ. We observe the key inequality, where we used the independence between Y (i, ·) and (Y (j, ·)) j≠i in the second line and Jensen's inequality in the third line. Integrating both sides with respect to η leads to a contradiction, which completes the proof.
The lower semi-continuity of G implies that U δ is an open set, so the map ξ → ξ(U δ ) is also lower semi-continuous. Together with the compactness of K 0 , this yields (7.6). For each ξ ∈ K 0 , we can use (7.6) and the monotonicity of V δ,K in K to choose K = K ξ < ∞ such that ξ(V δ,K ξ ) > (1 − ǫ)θ. The upper semi-continuity of W δ implies that the map ξ → ξ(V δ,K ) is lower semi-continuous. Hence, there is r ξ > 0 such that inf ζ∈B(ξ,r ξ ) ζ(V δ,K ξ ) > (1 − ǫ)θ.

Appendix A. An auxiliary coupling lemma
Here we formally state and prove a coupling lemma that was used in Section 3.3. Let us recall that the LU topology (the topology of locally uniform convergence) on the space C[R d , R] of all continuous real-valued functions on R d is defined by the following metric: ρ(ω 1 , ω 2 ) = Σ ∞ n=1 2 −n ( sup |x|≤n |ω 1 (x) − ω 2 (x)| ∧ 1 ).
Let H = B(C[R d , R]) be the Borel sigma-algebra on C[R d , R] equipped with the LU topology. Let P be the distribution on (C[R d , R], H ) of the random field X(·) defined on (Ω, F ). Under P, the canonical process Y x (ω) = ω(x) is a distributional copy of X(x).
Remark A.2. Our proof of this lemma uses regular conditional probabilities. Their existence is guaranteed by our choice of C[R d , R] as the space of realizations, but in principle we could impose weaker requirements on the potential than continuity in Section 1.1.