$\Gamma$-convergence of Onsager-Machlup functionals. Part I: With applications to maximum a posteriori estimation in Bayesian inverse problems

The Bayesian solution to a statistical inverse problem can be summarised by a mode of the posterior distribution, i.e. a MAP estimator. The MAP estimator essentially coincides with the (regularised) variational solution to the inverse problem, seen as minimisation of the Onsager-Machlup functional of the posterior measure. An open problem in the stability analysis of inverse problems is to establish a relationship between the convergence properties of solutions obtained by the variational approach and by the Bayesian approach. To address this problem, we propose a general convergence theory for modes that is based on the $\Gamma$-convergence of Onsager-Machlup functionals, and apply this theory to Bayesian inverse problems with Gaussian and edge-preserving Besov priors. Part II of this paper considers more general prior distributions.


Introduction
In diverse applications such as Bayesian inference and the transition path analysis of diffusion processes, it is important to be able to summarise a probability measure µ on a possibly infinite-dimensional space X by a single distinguished point of X: a point of maximum probability under µ in some sense, i.e. a mode of µ. If µ is an absolutely continuous measure on a finite-dimensional Euclidean space X, then the modes of µ are the maximisers of its Lebesgue density. In the Bayesian statistical context, if µ is the posterior measure, then the modes of µ are precisely the maximum a posteriori (MAP) estimators. If X is an infinite-dimensional Banach space, then a Lebesgue density is not available. In this case it has become common to define modes using the posterior probabilities of norm balls in the small-radius limit. Under suitable conditions, such modes admit a variational characterisation as the minimisers of an appropriate Onsager-Machlup (OM) functional. Heuristically, such an OM functional plays the role of the negative logarithm of the "Lebesgue density" of µ, but the rigorous formulation of this relationship requires some care. In a statistical context, this variational characterisation of modes suggests a connection between the fully Bayesian approach and the regularised variational approach to inverse problems: the negative logarithm of the prior acts as a regulariser for the misfit (i.e. for the negative log-likelihood).
A significant challenge to exploiting this connection is the lack of a suitable convergence theory. This is because the stability properties of MAP estimators are poorly understood. In particular, it is not known under what circumstances mild perturbations of the setup of a Bayesian inverse problem (BIP) lead to mild perturbations of the posterior distribution and to mild perturbations of its MAP estimators. Typical examples of perturbations include those arising from finite-dimensional truncation of an infinite-dimensional prior; numerical approximation of an ideal forward operator within the likelihood, e.g. the solution operator of a differential equation; perturbation of observed data; or limiting procedures such as small-noise limits.
In the last decade, beginning with the seminal work of Stuart (2010), many articles have studied the well-posedness and stability of BIPs in function spaces. However, in general, the stability of the posterior and of its MAP estimators are "orthogonal" questions: two posterior probability measures can be arbitrarily close in a strong sense such as Kullback-Leibler (relative entropy) divergence and still have MAP estimators that are at constant distance from one another; conversely, even equality of MAP estimators says nothing about the similarity of the full posteriors. The situation vis-a-vis convergence is even less satisfying, as the following examples show:
Example 1.1. (a) The normal distributions $\mu^{(1)} := N(0, 1)$ and $\mu^{(\sigma)} := N(0, \sigma^2)$ on $\mathbb{R}$ have the same unique mode at 0, but can be very far apart in the Kullback-Leibler sense: $\mathrm{KL}(\mu^{(1)} \,\|\, \mu^{(\sigma)}) = \tfrac{1}{2}\bigl(\sigma^{-2} - 1 + \log \sigma^2\bigr) \to +\infty$ as $\sigma \to 0$ or $\sigma \to +\infty$.
$\exp\bigl(-\tfrac{1}{2}(x - 1)^2\bigr)$, which has a unique mode at $x = 1$. Convergence in the Kullback-Leibler sense also holds, with $\mathrm{KL}(\mu^{(\infty)} \,\|\, \mu^{(n)}) \approx \tfrac{1}{n}$. However, each $\mu^{(n)}$ has a unique mode at approximately $x \approx \tfrac{1}{n}$, and the unique cluster point of this sequence of modes is $x = 0 \neq 1$. Thus, even in finite-dimensional settings, pointwise convergence of densities (and hence of OM functionals) does not imply convergence of modes.
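The failure of pointwise convergence to control minimisers can be reproduced with a toy family of functionals. The sketch below is not the density sequence of Example 1.1; it uses a hypothetical functional `perturbed_misfit` (a quadratic well at x = 1 plus a narrow dip at x = 1/n) whose pointwise limit is `limit_misfit`, and locates minimisers by brute-force grid search (`grid_argmin`, also a hypothetical helper).

```python
def perturbed_misfit(n: int, x: float) -> float:
    """Hypothetical F_n: quadratic well at x = 1 plus a triangular dip of
    depth 2 centred at x = 1/n and supported on [0, 2/n]."""
    dip = max(0.0, 1.0 - abs(n * x - 1.0))
    return 0.5 * (x - 1.0) ** 2 - 2.0 * dip


def limit_misfit(x: float) -> float:
    """Pointwise limit of F_n as n -> infinity (the dip escapes every fixed x)."""
    return 0.5 * (x - 1.0) ** 2


def grid_argmin(f, lo: float = -1.0, hi: float = 3.0, steps: int = 40_000) -> float:
    """Brute-force minimiser of f over a uniform grid on [lo, hi]."""
    h = (hi - lo) / steps
    grid = (lo + i * h for i in range(steps + 1))
    return min(grid, key=f)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        x_n = grid_argmin(lambda x: perturbed_misfit(n, x))
        # Minimisers sit near 1/n, although F_n(0.5) already equals the limit value.
        print(n, x_n, perturbed_misfit(n, 0.5))
    print("minimiser of the pointwise limit:", grid_argmin(limit_misfit))
```

In this toy family the minimisers approach 0 while the pointwise limit is minimised at 1, mirroring the behaviour described in part (c) of the example.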
Given the variational characterisation of modes as minimisers of an OM functional, it seems natural to assess the convergence (and hence stability) of modes using a notion of convergence that is appropriate for variational problems, i.e. one for which the convergence of functionals implies the convergence of minimisers. The notion of Γ-convergence, as introduced by De Giorgi and collaborators from the 1970s onwards (De Giorgi, 2006), fulfils exactly this rôle and, in particular, overcomes the shortcomings of pointwise convergence as illustrated in Example 1.1(c). Indeed, if the densities in Example 1.1(c) were uniformly convergent, then they would be continuously convergent and hence Γ-convergent as well, and the pathological non-convergence of modes would have been avoided. Therefore, the strategy followed by this article consists in the following:
1. Formulate the problem of finding modes of probability measures on potentially infinite-dimensional spaces as a variational problem for the associated OM functionals; and
2. Study the Γ-convergence properties of such problems, in order to obtain criteria for the convergence and stability of such modes.
The remainder of this article is structured as follows. Section 2 gives an overview of related work in the theory of modes for measures on infinite-dimensional spaces and Section 3 sets out some notation and basic results for the rest of the article. Section 4 explores the correspondence between modes and minimisers of OM functionals, and hence demonstrates that Γ-convergence of OM functionals is the correct notion of convergence to ensure convergence of modes. Section 5 develops this idea in two prototypical settings, namely Gaussian and Besov-1 measures, which are frequently used as Bayesian prior distributions; these results can then be transferred to measures that are absolutely continuous with respect to these paradigmatic examples and can be interpreted as the corresponding posterior measures. More general prior measures, which include Cauchy measures and Besov-p measures for $1 \le p \le 2$, are treated in Part II of this paper (Ayanbayev et al., 2021). In Section 6, these ideas are then applied to the convergence and stability of MAP estimators for BIPs, for which Gaussian and Besov-1 measures are prototypical prior distributions. Some conclusions and suggestions for further work are given in Section 7. Standard definitions and results relating to Γ-convergence are collected in Appendix A and technical supporting results are given in Appendix B.

Overview of related work
In stochastic analysis and mathematical physics, the interpretation of the minimisers of Onsager-Machlup functionals over path spaces as most probable paths appears to be due to Dürr and Bach (1978). The OM functionals of diffusion processes, and hence the determination of maximum a posteriori paths, have been considered by e.g. Zeitouni (1989) and Dembo and Zeitouni (1991). It is important to note that simply determining the OM functional on some n-dimensional approximation space and then taking a limit as n → ∞ can fail to yield the correct OM functional as defined in terms of ratios of small ball probabilities by (3.3). This is because the space on which the OM functional is finite can be "smoother" than the full space on which small ball probabilities are defined (Dashti et al., 2013).
Recent years have seen a growing interest in the well-posedness and stability of BIPs in function spaces, a perspective proposed by a seminal article of Stuart (2010) that has stimulated many follow-on works and generalisations (e.g. Dashti et al., 2012; Hosseini, 2017; Latz, 2020; Sprungk, 2020; Sullivan, 2017). There is also a complementary theory of discretisation invariance, sometimes referred to as the "Finnish school", which in some sense treats the finite-dimensional discrete versions of BIPs as the primary objects of interest but pays careful attention to their limiting properties as the discretisation dimension tends to infinity (e.g. Lasanen, 2012a,b; Lassas and Siltanen, 2004; Lassas et al., 2009; Lehtinen et al., 1989). However, the robustness studies in these works have focussed on the robustness of the posterior measure in a distributional sense such as the Hellinger, Kullback-Leibler, or Wasserstein sense and, as Example 1.1 shows, these are insufficient to ensure robustness of modes or MAP estimators. Our results can be seen as contributions to this field in the sense that they establish stability/convergence of MAP estimators in a setting that is not limited to finite-dimensional or even linear spaces.
In the BIP context, the definition of a MAP estimator as the centre of a norm ball that has maximum posterior probability in a small-radius limit appears to be due to Dashti et al. (2013). As Dashti et al. (2013) note, a similar definition of a maximal point was given earlier by Hegland (2007), but that analysis was implicitly limited to the finite-dimensional setting, since it assumed finiteness of the Cameron-Martin norm. The context of Dashti et al. (2013) was limited to a separable Hilbert space X equipped with a Bayesian posterior measure µ that was absolutely continuous with respect to a centred non-degenerate Gaussian reference measure $\mu_0$. In this setting, Dashti et al. (2013) established the existence of MAP estimators and characterised them as the minimisers of the OM functional, which they further identified as the sum of the negative log-likelihood (the misfit) and the OM functional of $\mu_0$. When read in the context of a general probability measure on a metric space, rather than the original setting of a Bayesian posterior on a Hilbert space, the definition of Dashti et al. (2013) is essentially the definition of a strong mode (Definition 3.6).
The work of Dashti et al. (2013) has been extended in multiple ways. Dunlop and Stuart (2016) proved the connection between MAP estimators and OM functionals in the setting of piecewise continuous inversion, where the prior is defined in terms of a combination of Gaussian random fields. Recently, Kretschmann (2019) has corrected some technical deficiencies of Dashti et al. (2013). The definition of a (strong) MAP estimator for µ given by Dashti et al. (2013) was relaxed to that of a weak MAP estimator by Helin and Burger (2015), in which comparisons between the masses of balls are only performed for balls whose centres differ by an element of a topologically dense subspace E of X. Helin and Burger (2015) showed that this weak MAP estimator has a close relationship with the zeroes of the logarithmic derivative $\beta^\mu_h$. The initial analysis of the weak MAP estimator relied upon the existence of a continuous representative for $\beta^\mu_h$, which could not be guaranteed for several important applications, notably the edge-preserving Besov prior with p = 1. By focussing on the Radon-Nikodym derivative $r^\mu_h := \frac{\mathrm{d}\mu(\,\cdot\, - h)}{\mathrm{d}\mu}$ instead of $\beta^\mu_h$, the analysis of Agapiou et al. (2018) remedied this shortcoming, posed the definitions and results in more general terms of modes of probability measures rather than MAP estimators of Bayesian posteriors, and also considered local (rather than global) strong and weak modes. The equivalence of strong and weak modes when E is dense in X and under a uniformity condition on µ was established by Lie and Sullivan (2018), who worked in the more general context of measures on metrisable topological vector spaces.
Finally, we mention that the definition of a strong mode is unsuitable for probability measures with bounded support, and especially for such measures with essential discontinuities in the density. Examples of such measures include uniform measures on bounded subsets of X. The definition is unsuitable because it excludes "obvious" modes on the boundary of the support. The recent generalised mode of Clason et al. (2019) addresses this deficiency.

Preliminaries and notation
General notation and assumptions
Throughout this article, X will denote either a topological space, a metric space, a separable Banach space, or a Hilbert space. When thought of as a measurable space, X will be equipped with its Borel σ-algebra B(X), which is generated by the collection of all open sets. When X is a metric space, we write $B_r(x)$ for the open ball in X of radius r centred on x. The set of all probability measures on (X, B(X)) will be denoted P(X); we denote typical probability measures by $\mu$, $\mu_0$, $\mu^{(n)}$ for $n \in \mathbb{N} \cup \{\infty\}$ etc. The topological support of µ ∈ P(X) defined on a metric space X is $\operatorname{supp}(\mu) := \{x \in X \mid \mu(B_r(x)) > 0 \text{ for each } r > 0\}$, which is always a closed subset of X. We write $\overline{\mathbb{R}}$ for the extended real line $\mathbb{R} \cup \{\pm\infty\}$, i.e. the two-point compactification of $\mathbb{R}$. For $0 < p \le \infty$, we write $\ell^p := \ell^p(\mathbb{N})$ for the real sequence space of p-th-power summable sequences, and bounded sequences in the case $p = \infty$. Furthermore, given $\gamma = (\gamma_k)_{k \in \mathbb{N}} \in \mathbb{R}^{\mathbb{N}}_{>0}$, we write $\ell^p_\gamma$ for the corresponding weighted $\ell^p$ space, with (quasi)norm $\|\cdot\|_{\ell^p_\gamma}$. It is well known that $\|\cdot\|_{\ell^p}$ (and hence $\|\cdot\|_{\ell^p_\gamma}$) is a complete quasinorm when $p > 0$, a Banach norm when $p \ge 1$, and a Hilbert norm when $p = 2$.
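The distinction between a quasinorm and a norm can be checked on two-term sequences. The following minimal sketch uses a hypothetical helper `lp_norm` for finitely supported sequences and finite exponents; it shows that the triangle inequality fails for p = 1/2 but holds for p = 1 and p = 2.

```python
def lp_norm(x, p: float) -> float:
    """l^p (quasi)norm of a finitely supported sequence x, for 0 < p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


x, y = (1.0, 0.0), (0.0, 1.0)
x_plus_y = tuple(a + b for a, b in zip(x, y))

for p in (0.5, 1.0, 2.0):
    lhs = lp_norm(x_plus_y, p)            # norm of the sum
    rhs = lp_norm(x, p) + lp_norm(y, p)   # sum of the norms
    print(f"p = {p}: |x + y| = {lhs:.4f}, |x| + |y| = {rhs:.4f}")
```

For p = 1/2 the left-hand side exceeds the right-hand side, so only a quasi-triangle inequality (with a constant larger than one) can hold.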

Onsager-Machlup functionals
We recall here the definition of an Onsager-Machlup functional for a measure. The minimisers of OM functionals will turn out to be the modes of the measure (see Section 4). We also stress a property that is already implicitly used without a name in the modes literature, one that essentially ensures that the modes must lie in the domain of the OM functional.
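Definition 3.1. Let X be a metric space and let µ ∈ P(X). We say that $I = I_\mu = I_{\mu,E} \colon E \to \mathbb{R}$, defined on a nonempty subset $E \subseteq \operatorname{supp}(\mu)$, is an Onsager-Machlup functional (OM functional) for µ if $\lim_{r \searrow 0} \frac{\mu(B_r(x_1))}{\mu(B_r(x_2))} = \exp\bigl(I(x_2) - I(x_1)\bigr)$ for all $x_1, x_2 \in E$. (3.3)
Definition 3.2. We say that property M(µ, E) is satisfied if, for some $x^\star \in E$, $x \in X \setminus E \implies \lim_{r \searrow 0} \frac{\mu(B_r(x))}{\mu(B_r(x^\star))} = 0$, (3.4) and in this situation we extend I to a function $I \colon X \to \overline{\mathbb{R}}$ with $I(x) := +\infty$ for $x \in X \setminus E$.
Every measure µ admits an OM functional if E is taken to be small enough, e.g. a singleton subset of supp(µ). Therefore, there is a natural desire to have E be "maximal" in some sense. Property M(µ, E) means that the set E is the "maximal" set on which the OM functional assumes finite values.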
It is tempting but incorrect to read property M(µ, E) as saying that µ somehow concentrates upon E. A straightforward counterexample is given by any non-degenerate Gaussian measure µ with infinite-dimensional Cameron-Martin space H(µ), such as the law µ of standard Brownian motion on $X = C([0, 1]; \mathbb{R})$ with $H(\mu) = H^1([0, 1]; \mathbb{R})$. In this situation, property M(µ, H(µ)) holds (Dashti et al., 2013, Lemma 3.7) and yet µ(H(µ)) = 0 (Bogachev, 1998, Theorem 2.4.7). Rather, the purpose of property M(µ, E) is to ensure that the global weak modes (see Definition 3.7) of µ lie in E and are precisely the minimisers of its extended OM functional (see Proposition 4.1). Furthermore, it is essentially the lim sup part of the limit in (3.4) that ensures this; Example B.2 shows that if we weaken (3.4) by considering the limit inferior instead of the limit, then (even for very simple choices of E) the desired correspondence may break down.
Remark 3.3 (Topological considerations). Note that in defining the OM functional here and various notions of mode / MAP estimator later on, we use open balls (following e.g. Dashti et al. (2013) and Agapiou et al. (2018)) rather than closed balls $\bar{B}_\varepsilon(x)$ (following e.g. Bogachev (1998)). However, Proposition B.3 shows that these two notions yield the same definition of OM functionals and global weak modes.
Remark 3.4 (Uniqueness of OM functionals). Note that OM functionals are at best unique up to the addition of real constants. Whenever we talk about Γ-convergence and equicoercivity of sequences of OM functionals, which are at the core of this work, we always mean the existence of representatives that fulfil these properties. Further, whenever we apply results that require both Γ-convergence and equicoercivity (such as Theorem A.3), we need to make sure that the same representatives can be chosen for both properties.
Remark 3.5 (OM functionals and changes of metric). Unfortunately, the choice of metric on a space X can affect the OM functional of a measure µ on X, even beyond the non-uniqueness alluded to in Remark 3.4, and even if the two metrics are Lipschitz equivalent. An explicit example of this is furnished by the finite measure µ of Lie and Sullivan (2018, Example 5.6); see Example B.4 for details.
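As a one-dimensional sanity check of the small-ball ratios behind the OM functional, consider the standard normal measure on the real line, for which the candidate OM functional is I(x) = x²/2. The sketch below (helper names `ball_mass` and `om_standard_normal` are hypothetical) compares the ratio of ball masses with exp(I(x2) − I(x1)) as the radius shrinks.

```python
import math


def ball_mass(x: float, r: float) -> float:
    """mu(B_r(x)) for mu = N(0, 1) on R, computed with the error function."""
    s = math.sqrt(2.0)
    return 0.5 * (math.erf((x + r) / s) - math.erf((x - r) / s))


def om_standard_normal(x: float) -> float:
    """Candidate OM functional of N(0, 1): half the squared norm."""
    return 0.5 * x * x


x1, x2 = 1.0, 0.0
target = math.exp(om_standard_normal(x2) - om_standard_normal(x1))
for r in (1.0, 0.1, 0.01, 0.001):
    ratio = ball_mass(x1, r) / ball_mass(x2, r)
    print(f"r = {r}: ratio = {ratio:.8f}, exp(I(x2) - I(x1)) = {target:.8f}")
```

The ratio stays below one for every radius, consistent with 0 being the mode, and approaches the exponential of the OM difference as r tends to zero.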

Modes and MAP estimators
In finite-dimensional spaces, for probability measures that are either purely discrete or possess a continuous Lebesgue density, modes (as points of maximum probability) are easily defined as being global maximisers of the probability mass function or probability density function as appropriate. For probability measures on infinite-dimensional spaces, however, the situation is more delicate as there is no infinite-dimensional analogue of Lebesgue measure to serve as a uniform reference. Therefore, it has become common to define modes by examining the masses of norm balls in the small-radius limit. The following definition of a strong mode is a slight generalisation of the definition of a MAP estimator for a Bayesian posterior measure on a normed space as given by Dashti et al. (2013).
Definition 3.6. Let X be a metric space and let µ ∈ P(X). A strong mode of µ is any u ∈ X satisfying $\lim_{r \searrow 0} \frac{\mu(B_r(u))}{M_r} = 1$, (3.5) where $M_r := \sup_{w \in X} \mu(B_r(w)) \in (0, 1]$. Since $\mu(B_r(u)) \le M_r$, the ratio inside the limit in (3.5) is at most one, and so it is equivalent to define a strong mode as being any u ∈ X for which $\liminf_{r \searrow 0} \frac{\mu(B_r(u))}{M_r} \ge 1$.
A related notion of mode for a measure is the weak mode or weak MAP estimator (Helin and Burger, 2015, Definition 4): a point that dominates all other points within an affine subspace, in terms of small ball probabilities. Since we are only interested in global weak modes, we simplify the definition slightly and at the same time generalise this concept to metric spaces. The original definition relies on subtraction and thus only applies in the case of linear spaces.
Definition 3.7. For a metric space X, a global weak mode of µ ∈ P(X) is any u ∈ supp(µ) satisfying, for any point $u' \in X$, $\limsup_{r \searrow 0} \frac{\mu(B_r(u'))}{\mu(B_r(u))} \le 1$. (3.6)
Remark 3.8. The definition (3.6) differs further from that of Helin and Burger (2015, Definition 4) in that we use a limit superior instead of a limit. We suspect that this is the way it was intended to be defined: if a point dominates every other point in terms of small ball probabilities (in the sense that the ratio in (3.6) becomes $\le 1$ for sufficiently small r > 0), it should be called a weak mode. This suspicion is based on Helin and Burger (2015, Lemma 3), where the authors prove that every strong mode is a weak mode, which is clearly a desirable property given the terminology "strong" and "weak", but their proof is incorrect given their original definition. The reason is that the ratio in (3.6) can drop below 1 as $r \searrow 0$ for certain points $u, u' \in X$ without the limit existing (e.g. it might oscillate between 0 and $\tfrac{1}{2}$), such that u cannot be a weak mode in the original definition, while it can still be a strong mode. Example B.2 shows that such oscillations can occur and a slight modification provides a concrete counterexample to Helin and Burger (2015, Lemma 3).
Lemma 3.9. Let µ ∈ P(X) and let u ∈ supp(µ) be a strong mode of µ. Then u is a global weak mode of µ.
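Proof. Since u ∈ supp(µ) is a strong mode of µ, we obtain, for any point $u' \in X$, $\limsup_{r \searrow 0} \frac{\mu(B_r(u'))}{\mu(B_r(u))} \le \limsup_{r \searrow 0} \frac{M_r}{\mu(B_r(u))} = 1$, where the inequality uses $\mu(B_r(u')) \le M_r$.
Sufficient conditions for the converse implication are given by Lie and Sullivan (2018). The relationship between modes and OM functionals will be examined in Section 4.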

Generalised inverses
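We adopt the following definition of the Moore-Penrose pseudoinverse of an operator (Engl et al., 1996, Definition 2.2):
Definition 3.10. For a bounded linear operator $A \colon X \to Y$ between Hilbert spaces X and Y, the Moore-Penrose pseudoinverse $A^\dagger$ of A is the unique linear extension of $(A|_{(\ker A)^\perp})^{-1} \colon \operatorname{ran} A \to (\ker A)^\perp$ to $\operatorname{dom} A^\dagger := \operatorname{ran} A \oplus (\operatorname{ran} A)^\perp$ with $\ker A^\dagger = (\operatorname{ran} A)^\perp$. In particular, for $y \in \operatorname{ran} A$, $A^\dagger y$ is the minimum-norm solution of Ax = y (Engl et al., 1996, Theorem 2.5).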
Remark 3.12. For a self-adjoint and positive semi-definite (SPSD) and compact operator $C = \sum_{n \in \mathbb{N}} \sigma_n^2 \, e_n \otimes e_n \colon X \to X$ on a Hilbert space X, $(e_n)_{n \in \mathbb{N}}$ being an orthonormal system in X and $\sigma_n \ge 0$ for each $n \in \mathbb{N}$, we denote the SPSD operator square root of C by $C^{1/2}$ and furthermore set $C^{-1/2} := (C^{1/2})^\dagger$. Note that $(C^\dagger)^{1/2}$ can differ from $(C^{1/2})^\dagger$ since it may have a smaller domain.

Modes, Onsager-Machlup functionals, and their convergence
The purpose of this section is to firmly establish the intuitively plausible relationship between the modes of a probability measure µ and its OM functional I, namely that the global weak modes of µ are exactly the global minimisers of I. Once this is done, it is a relatively simple matter to give sufficient conditions for the global weak modes of a sequence of measures to converge to the global weak modes of a limiting measure: Γ-convergence and equicoercivity of the associated OM functionals.
Proposition 4.1 (Global weak modes and OM functionals). Let X be a metric space and let I : E → R be an OM functional for µ ∈ P(X), defined on a nonempty subset E ⊆ X with property M (µ, E). Then u ∈ E is a global weak mode of µ if and only if u is a minimiser of the extended OM functional I : X → R.
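Proof. By property M(µ, E) and Lemma B.1(c), any global weak mode of µ must lie in E, and in addition any minimiser of $I \colon X \to \overline{\mathbb{R}}$ must also lie in E, where E is the set on which I takes real values. Let u ∈ E be arbitrary. Then $\lim_{r \searrow 0} \frac{\mu(B_r(u'))}{\mu(B_r(u))}$ exists for any $u' \in E$ by definition of I being the OM functional, and the same limit exists and equals 0 for $u' \in X \setminus E$, because property M(µ, E) holds. In both cases the limit equals $\exp(I(u) - I(u'))$, with the convention $\exp(-\infty) := 0$. Thus, u is a global weak mode of µ ⟺ for all $u' \in X$, $\limsup_{r \searrow 0} \frac{\mu(B_r(u'))}{\mu(B_r(u))} \le 1$ ⟺ for all $u' \in X$, $\exp(I(u) - I(u')) \le 1$ ⟺ for all $u' \in X$, $I(u) \le I(u')$, as claimed.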
Property M(µ, E) was essential in the above argument in order for points outside E to be treated in a consistent way. Recall that, in Definition 3.1, for a given µ ∈ P(X), we initially defined the OM functional of µ to be a function $I \colon E \to \mathbb{R}$. Only under property M(µ, E) can we sensibly extend I to an $\overline{\mathbb{R}}$-valued function on X by setting $I(x) := +\infty$ for $x \in X \setminus E$. The motivation for this extension is that, by Lemma B.1(c), no point of $X \setminus E$ can be a global weak mode for µ, and hence (by Lemma 3.9) no such point can be a strong mode for µ.
Unfortunately, without additional assumptions, an analogous result to Proposition 4.1 cannot hold for strong modes, as demonstrated in Example B.5. The main idea behind the measure µ constructed therein is that u = 1 "dominates" any other (fixed) point $u' \in X = \mathbb{R}$ in the limit $r \searrow 0$, i.e. $\lim_{r \searrow 0} \frac{\mu(B_r(u'))}{\mu(B_r(u))} \le 1$, hence u = 1 is a global weak mode; but for certain arbitrarily small radii $r_n$, $n \in \mathbb{N}$, there exist points $u_n \in X$ that "dominate" u by a margin, in fact $\liminf_{r \searrow 0} \frac{\mu(B_r(u))}{M_r} \le \frac{1}{\sqrt{2}}$, hence u = 1 cannot be a strong mode. Moreover, for $E = \mathbb{N} \subseteq X$, property M(µ, E) holds and an OM functional $I_{\mu,E} \colon E \to \mathbb{R}$ exists and has u = 1 as its minimiser. The construction is based on suitably chosen singularities of the Lebesgue density ρ of µ.
The following result, which is an almost immediate consequence of the preceding discussion, provides clear criteria for the convergence of global weak modes along sequences of probability measures. (Definitions and basic properties of Γ-convergence, equicoercivity, etc. are collected in Appendix A.)
Theorem 4.2 (Γ-convergence and equicoercivity imply convergence of modes). Let X be a metric space and let, for $n \in \mathbb{N} \cup \{\infty\}$, $\mu^{(n)} \in P(X)$ have OM functionals $I^{(n)} \colon E^{(n)} \to \mathbb{R}$ for which property $M(\mu^{(n)}, E^{(n)})$ holds. Suppose that the sequence $(I^{(n)})_{n \in \mathbb{N}}$ is equicoercive and Γ-converges to $I^{(\infty)}$. Then, if $u^{(n)}$ is a global weak mode of $\mu^{(n)}$, $n \in \mathbb{N}$, every convergent subsequence of $(u^{(n)})_{n \in \mathbb{N}}$ has as its limit a global weak mode of $\mu^{(\infty)}$.
Proof. By Proposition 4.1, the global weak modes of µ (n) are precisely the minimisers of the extended version of I (n) , n ∈ N ∪ {∞}. The rest follows immediately from the fundamental theorem of Γ-convergence (Theorem A.3).
It is instructive to reconsider the earlier Example 1.1(c) in light of Theorem 4.2. The problem in Example 1.1(c) -in which the unique modes of the measures µ (n) fail to cluster at the unique mode of the limiting measure µ (∞) -can now be recognised as being due to the fact that although pointwise convergence of Lebesgue densities and OM functionals holds, Γ-convergence does not. Therefore, Theorem 4.2 does not apply to that example and there is no reason for modes to converge in this case.
Theorem 4.2 is, of course, a highly general result. For it to be useful in specific situations, one must prove property M (µ (n) , E (n) ) and identify the form of the OM functional I (n) for every n ∈ N. In addition, one must verify both the Γ-convergence and equicoercivity properties of the sequence (I (n) ) n∈N . In the next section, we do this for Gaussian measures and Besov-1 probability measures, which are commonly used as priors in the context of BIPs.

Γ-convergence of Onsager-Machlup functionals for Gaussian and Besov-1 priors
This section illustrates the preceding general theory of convergence of modes via Γ-convergence of OM functionals by means of two key examples, namely Gaussian and Besov $B^s_1$ measures, both of which commonly arise as prior distributions in BIPs. Besov $B^s_p$-priors with $1 \le p \le 2$, Cauchy priors, and more general product measures are treated in a unified way in Part II of this paper (Ayanbayev et al., 2021). The convergence of modes (MAP estimators) for posterior distributions will be discussed in Section 6.

Gaussian measures
As a natural first case, we consider the Γ-convergence of the OM functionals of Gaussian measures, and we call attention to the fact that we consider Gaussian measures with possibly indefinite covariance operators. It is almost folklore that the OM functional of a Gaussian measure is half the square of the associated Cameron-Martin norm; a precise formulation of this result is the following.
Theorem 5.1 (OM functional of a Gaussian on a separable Banach space). Let µ be a centred Gaussian measure on a separable Banach space X. Let H(µ) be the Cameron-Martin space of µ, with Cameron-Martin norm $\|\cdot\|_{H(\mu)}$. Then, for all $h, k \in H(\mu)$, $\lim_{r \searrow 0} \frac{\mu(B_r(h))}{\mu(B_r(k))} = \exp\bigl(\tfrac{1}{2}\|k\|_{H(\mu)}^2 - \tfrac{1}{2}\|h\|_{H(\mu)}^2\bigr)$. In particular, the OM functional for µ on the Cameron-Martin space H(µ) is half the square of the Cameron-Martin norm.
Proof. This is a special case of (Bogachev, 1998, Corollary 4.7.8), in which the cylindrical σ-algebra E(X) and the Borel σ-algebra B(X) coincide by the separability of X, the measurable seminorm q is the ambient norm · X , the q-ball V r is the ball B r (0) ∈ B(X), and the projection π q is the identity due to the definiteness of q( · ) = · X . Note that Bogachev (1998) works with closed balls, this difference being inconsequential in view of Proposition B.3.
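Corollary 5.2. Let µ = N(0, C) be a centred Gaussian measure on a separable Hilbert space X, where the covariance C is interpreted as an SPSD operator on X. Then the (extended) OM functional $I_\mu \colon X \to \overline{\mathbb{R}}$ of µ is given by $I_\mu(u) = \tfrac{1}{2}\|(C^{1/2})^\dagger u\|_X^2$ for $u \in \operatorname{ran} C^{1/2}$ and $I_\mu(u) = +\infty$ otherwise.
Proof. By (Bogachev, 1998, Section 2.3, p. 49), the reproducing kernel Hilbert space $X^*_\mu := \overline{X^*}^{L^2(\mu)}$ of µ can be identified with the weighted Hilbert space of sequences $\ell^2_C := \bigl\{x \in \mathbb{R}^{\mathbb{N}} \bigm| \|x\|_{\ell^2_C}^2 := \sum_{n \in \mathbb{N}} \sigma_n^2 x_n^2 < \infty\bigr\}$ (5.2), in the notation of Remark 3.12. Further, after extending C naturally to $X^*_\mu$, the Cameron-Martin space coincides with the image of $X^*_\mu$ under C, i.e. $H(\mu) = C(X^*_\mu)$. Now let $C = \sum_{n \in \mathbb{N}} \sigma_n^2 \, e_n \otimes e_n$, $\sigma_n \ge 0$, be the eigenvalue decomposition of the covariance operator C with complete orthonormal system $(e_n)_{n \in \mathbb{N}}$ and let $u = \sum_{n \in \mathbb{N}} u_n e_n \in H(\mu)$. Since $H(\mu) = C(X^*_\mu)$, there exists $x = (x_n)_{n \in \mathbb{N}} \in \ell^2_C$ such that $u_n = \sigma_n^2 x_n$ for all $n \in \mathbb{N}$, and, by (Bogachev, 1998, Lemma 2.4.1) and Remark 3.12, $\|u\|_{H(\mu)}^2 = \|x\|_{\ell^2_C}^2 = \sum_{n \in \mathbb{N}} \sigma_n^2 x_n^2 = \sum_{n : \sigma_n > 0} \sigma_n^{-2} u_n^2 = \|(C^{1/2})^\dagger u\|_X^2$. The claim now follows from Theorem 5.1 and from (Dashti et al., 2013, Lemma 3
Remark 5.3. Note that the notation in (Bogachev, 1998, Section 2.3, p. 49) is slightly imprecise, since the space $\ell^2_C$ in (5.2) is, in general, only a pre-Hilbert space (and $\|\cdot\|_{\ell^2_C}$ is just a seminorm). To be rigorous, one would need to consider the quotient space of $\ell^2_C$ after factoring out the subspace $\{x \mid \|x\|_{\ell^2_C} = 0\}$. This detail has no influence on the proof of Corollary 5.2.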
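Corollary 5.4. Let $\mu_0 = N(0, C)$ be a centred Gaussian measure on a separable Hilbert space X, where the covariance C is interpreted as an SPSD operator on X, and µ = N(m, C). Then the OM functional $I_\mu \colon X \to \overline{\mathbb{R}}$ of µ is given by $I_\mu(u) = \tfrac{1}{2}\|(C^{1/2})^\dagger (u - m)\|_X^2$ for $u - m \in \operatorname{ran} C^{1/2}$ and $I_\mu(u) = +\infty$ otherwise.
Proof. This follows directly from Theorem 5.1 and Corollary 5.2.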
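In a finite-dimensional truncation with the covariance diagonal in its eigenbasis, the extended Gaussian OM functional can be evaluated coordinate by coordinate. The sketch below assumes the form "half the squared Cameron-Martin norm of u − m, and +∞ outside the range of the square root of the covariance"; the function name `gaussian_om` and the diagonal representation are illustrative assumptions.

```python
import math


def gaussian_om(u, m, sigma) -> float:
    """Extended OM functional of N(m, C) with C = diag(sigma_k ** 2), sigma_k >= 0.

    Returns 0.5 * |pinv(C^{1/2}) (u - m)|^2 when u - m lies in the range of
    C^{1/2}, and +inf otherwise (a degenerate direction carries no mass off m).
    """
    total = 0.0
    for u_k, m_k, s_k in zip(u, m, sigma):
        d = u_k - m_k
        if s_k > 0.0:
            total += (d / s_k) ** 2
        elif d != 0.0:
            return math.inf  # u - m has a component outside ran C^{1/2}
    return 0.5 * total


m = (0.0, 0.0, 0.0)
sigma = (1.0, 2.0, 0.0)  # third direction is degenerate
print(gaussian_om((1.0, 2.0, 0.0), m, sigma))  # finite value
print(gaussian_om((1.0, 2.0, 0.1), m, sigma))  # infinite: leaves ran C^{1/2}
print(gaussian_om(m, m, sigma))                # the mean minimises the functional
```

The functional vanishes exactly at the mean, which is therefore its unique minimiser in this finite-dimensional setting.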
We now give the main result of this section, that the strong (norm) convergence of means and covariance operators of Gaussian measures is sufficient to ensure that their associated OM functionals are Γ-convergent and equicoercive.
Theorem 5.5 (Γ-convergence and equicoercivity of OM functionals for Gaussian measures). Let X be a separable Hilbert space and µ (n) = N (m (n) , C (n) ) and µ = N (m, C) be Gaussian measures on X such that m (n) → m in X and C (n) → C with respect to the operator norm. Then I µ = Γ-lim n→∞ I µ (n) . Furthermore, the sequence (I µ (n) ) n∈N is equicoercive.
Remark 5.6. Since all the Gaussian OM functionals $I_{\mu^{(n)}}$ are quadratic forms, and homogeneity is preserved by Γ-limits (Braides, 2006, Proposition 2.13), it is not surprising that $\Gamma\text{-}\lim_{n \to \infty} I_{\mu^{(n)}}$ is quadratic; the point here is to check that the quadratic forms $\Gamma\text{-}\lim_{n \to \infty} I_{\mu^{(n)}}$ and $I_\mu$ agree, and to do so with careful attention to the possibility of indefinite covariances.
Proof of Theorem 5.5.
Let $A := C^{1/2}$ and $A_n := (C^{(n)})^{1/2}$. Further, let $(e_k)_{k \in \mathbb{N}}$ be an orthonormal eigenbasis of A, $A = \sum_{k \in \mathbb{N}} \sigma_k \, e_k \otimes e_k$ with $\sigma_k \ge 0$, and, for any vector w ∈ X, let $w_k := \langle w, e_k \rangle_X$ denote its k-th component in that basis.
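Let $(u^{(n)})_{n \in \mathbb{N}}$ be a sequence in X that converges to u ∈ X. If $\liminf_{n \to \infty} I_{\mu^{(n)}}(u^{(n)}) = \infty$, then there is nothing to prove. Otherwise, define $I := \liminf_{n \to \infty} I_{\mu^{(n)}}(u^{(n)}) \in \mathbb{R}$. There exists a subsequence of $(u^{(n)})_{n \in \mathbb{N}}$, which for simplicity we also denote by $(u^{(n)})_{n \in \mathbb{N}}$, such that $u^{(n)} - m^{(n)} \in \operatorname{ran} A_n$ for each $n \in \mathbb{N}$ and $I_{\mu^{(n)}}(u^{(n)}) \to I$ as $n \to \infty$ (note that $I_{\mu^{(n)}}(u^{(n)}) = \infty$ unless $u^{(n)} - m^{(n)} \in \operatorname{ran} A_n$). Now let ε > 0 and $v^{(n)} := A_n^\dagger (u^{(n)} - m^{(n)})$, $n \in \mathbb{N}$. Without loss of generality (possibly, by a further thinning of the subsequence) and using Corollary 5.4, we may assume $\tfrac{1}{2}\|v^{(n)}\|_X^2 = I_{\mu^{(n)}}(u^{(n)}) \le I + \varepsilon$ for each $n \in \mathbb{N}$. Define $K := \{k \in \mathbb{N} \mid \sigma_k > 0\}$ and the sequence $\tilde{v} = (\tilde{v}_k)_{k \in \mathbb{N}}$ by $\tilde{v}_k := (u_k - m_k)/\sigma_k$ for $k \in K$ and $\tilde{v}_k := 0$ otherwise.
To prove the Γ-lim inf inequality, and with Corollary 5.4 in mind, we must show that (i) $\tilde{v} \in \ell^2$ and therefore $v := \sum_{k \in \mathbb{N}} \tilde{v}_k e_k \in X$; (ii) $Av = u - m$ and therefore $u - m \in \operatorname{ran} A$; (iii) $\tfrac{1}{2}\|v\|_X^2 \le I + \varepsilon$ and therefore $I_\mu(u) = \tfrac{1}{2}\|A^\dagger(u - m)\|_X^2 \le \tfrac{1}{2}\|v\|_X^2 \le I + \varepsilon$, from which the claim follows by using the fact that ε > 0 is arbitrary and by using Remark 3.11.
Since $\|A_n - A\| \to 0$, $\|u^{(n)} - u\|_X \to 0$ and $\|v^{(n)}\|_X \le M_\varepsilon := \sqrt{2I + 2\varepsilon}$ for all $n \in \mathbb{N}$, and since $A_n v^{(n)} = u^{(n)} - m^{(n)}$ and $m^{(n)} \to m$, we obtain $\|A v^{(n)} - (u - m)\|_X \le \|A - A_n\| \, M_\varepsilon + \|u^{(n)} - u\|_X + \|m^{(n)} - m\|_X \to 0$. The above convergence implies componentwise convergence: $\sigma_k v^{(n)}_k \to u_k - m_k$ for each $k \in \mathbb{N}$, so that $u_k - m_k = 0$ for $k \notin K$ and $v^{(n)}_k \to \tilde{v}_k$ for $k \in K$. Fatou's lemma then yields $\sum_{k \in \mathbb{N}} \tilde{v}_k^2 \le \liminf_{n \to \infty} \|v^{(n)}\|_X^2 \le 2(I + \varepsilon)$, proving (i) and (iii), while $Av = \sum_{k \in K} (u_k - m_k) e_k = u - m$, proving (ii) and finalising the proof of the Γ-lim inf inequality.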
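For the Γ-lim sup inequality, first note that, if $u - m \notin \operatorname{ran} A$, then $I_\mu(u) = \infty$, and there is nothing to prove since we may choose $u^{(n)} := u$ for all $n \in \mathbb{N}$. If $u - m \in \operatorname{ran} A$, let $v := A^\dagger(u - m)$ and $u^{(n)} := m^{(n)} + A_n v$, so that $u^{(n)} \to m + Av = u$. Since v solves $A_n x = u^{(n)} - m^{(n)}$ and $A_n^\dagger(u^{(n)} - m^{(n)})$ is its minimum norm solution (cf. Remark 3.11), Corollary 5.4 implies $I_{\mu^{(n)}}(u^{(n)}) = \tfrac{1}{2}\|A_n^\dagger(u^{(n)} - m^{(n)})\|_X^2 \le \tfrac{1}{2}\|v\|_X^2 = I_\mu(u)$ for each $n \in \mathbb{N}$, finalising the proof of the Γ-lim sup inequality.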
In order to prove equicoercivity of the sequence (I µ (n) ) n∈N , let t ∈ R and where we used Corollary 5.4. We will now show that K t is (sequentially) precompact. To this end, let (u (ν) ) ν∈N be a sequence in K t . If u (ν) ∈ K (n) t infinitely often for some n ∈ N, there is nothing to prove, since A n is a compact operator for each n ∈ N. Otherwise, there exist subsequences (u (ν j ) ) j∈N and (µ (n j ) ) j∈N such that u (ν j ) ∈ K (n j ) t for each j ∈ N. Hence, u (ν j ) − m (n j ) ∈ ran A n j for each j ∈ N and the points v (j) : Since A is a compact operator, the sequence (w (j) ) j∈N given by w (j) := Av (j) has a subsequence that converges to some element w ∈ X. For simplicity, we denote this subsequence by (w (j) ) j∈N . It follows that and so (u (ν) ) ν∈N has a convergent subsequence. Hence, K t is compact with I −1 µ (n) ([−∞, t]) ⊆ K t for each n ∈ N, finalising the proof of equicoercivity.
The following corollary is a direct consequence of Theorems 4.2 and 5.5:
Corollary 5.7. Let $X$, $\mu$, $(\mu^{(n)})_{n\in\mathbb{N}}$ be as in Theorem 5.5. If, for each $n \in \mathbb{N}$, $u^{(n)}$ is a global weak mode of $\mu^{(n)}$, then every convergent subsequence of $(u^{(n)})_{n\in\mathbb{N}}$ has as its limit a global weak mode of $\mu$.

$B^s_1$-Besov measures
We now establish results analogous to those of the previous section for the class of Besov-1 measures. Besov-1 measures and Gaussian measures on infinite-dimensional spaces are analogous to Laplace distributions and normal distributions on $\mathbb{R}$, respectively. Besov-1 measures have been used as sparsity-promoting or edge-preserving priors⁴ in inverse problems (Agapiou et al., 2018; Dashti et al., 2012; Lassas et al., 2009). Throughout this subsection, we use the following notation:⁵
Assumption 5.8. Let $s \in \mathbb{R}$, $d \in \mathbb{N}$, $\eta > 0$, $t := s - d(1+\eta)$ and assume that $\tau := (s/d + 1/2)^{-1} > 0$. The parameter $s$ is thought of as a "smoothness parameter" and $d$ as a "spatial dimension". Define $\gamma_0 := 1$ and $\gamma, \delta \in \mathbb{R}^{\mathbb{N}}$ by and let $\mu_k \in \mathcal{P}(\mathbb{R})$ for $k \in \mathbb{N} \cup \{0\}$ have the Lebesgue density .
⁴ Strictly speaking, regularisation using the Besov-1 norm promotes edge-preservation for the MAP estimator but not for samples from the full posterior distribution.
⁵ Typically, Besov measures are introduced on the space $L^2(\mathbb{T}^d)$; the same construction that we use for the components of a random sequence in $\mathbb{R}^{\mathbb{N}}$ is used for the components of a random Fourier or wavelet expansion in $L^2(\mathbb{T}^d)$. In our definition, the dimension $d$ becomes superfluous and one could work with $\tilde{s} := s/d$, but we continue to use the classical notation in order to reduce confusion.
We define the Besov measure B s 1 as follows, using notation that is an adaptation of that of Dashti et al. (2012) and Agapiou et al. (2018).
Definition 5.9 (Sequence space Besov measures and Besov spaces). Using Assumption 5.8, we call $\mu := \bigotimes_{k\in\mathbb{N}} \mu_k$ a (sequence space) Besov measure on $\mathbb{R}^{\mathbb{N}}$ and write $B^s_1 := \mu$. The corresponding Besov space is the weighted sequence space $(X^s_1, \|\cdot\|_{X^s_1}) := (\ell^1_\gamma, \|\cdot\|_{\ell^1_\gamma})$. Since it is the parameter "$p = 1$" that most strongly affects the qualitative properties of the measure, we often refer simply to a "Besov-1 measure" for any measure in the above class, regardless of the values of $s$, $d$, etc.
Proof. This is a restatement of Lassas et al. (2009, Lemma 2) for the particular case $p = 1$.
From now on we will consider the Besov measure $\mu = B^s_1$ on the normed space $X = X^t_1 = \ell^1_\delta$. This is possible since, by (Ayanbayev et al., 2021, Lemma B where we consider the product topology on $\mathbb{R}^{\mathbb{N}}$.
Proposition 5.11. Let $\mu = B^s_1$ be a $B^s_1$-Besov measure on the space $X = X^t_1 = \ell^1_\delta$. Then, for $E = X^s_1 = \ell^1_\gamma$, property $M(\mu, E)$ is satisfied and the OM functional $I_\mu : X \to \overline{\mathbb{R}}$ of $\mu$ is given by $I_\mu(u) = \|u\|_{X^s_1}$ for $u \in E$ and $I_\mu(u) = +\infty$ otherwise.
Proof. The OM functional formula on $E$ follows from (Agapiou et al., 2018, Theorem 3.9), while property $M(\mu, E)$ follows from (Ayanbayev et al., 2021, Theorem 4.9). The assumptions of this theorem are fulfilled, given Definition 5.9 and Lemma 5.10.
Remark 5.12. Proposition 5.11 uses and extends (Agapiou et al., 2018, Theorem 3.9). The authors write that "the space $B^s_1(\mathbb{T}^d)$ here, is the largest space on which the Onsager-Machlup functional is defined". This claim is intuitively true, since $\|h\|_{X^s_1} = +\infty$ if $h \notin X^s_1 = E$, and in our notation $X^s_1$ corresponds to $B^s_1(\mathbb{T}^d)$. However, one must not a priori exclude the possibility that $I_\mu$ can have a different formula outside of $E$. Property $M(\mu, E)$ in the above proof is one way to guarantee that the claim is true.
We now give a Γ-convergence and equicoercivity result for sequences of Besov-1 measures with converging smoothness parameters.
Without loss of generality, we assume n 0 = 1 from now on in order to simplify notation. Since and, for any θ 0 and n ∈ N, We will now show that K θ ⊆ X is precompact. For this purpose, we define the operators All T m are finite-rank operators that converge to T in the operator norm: Therefore, T is a compact operator and K θ = θ T B ∞ 1 (0) is precompact, finalising the proof of equicoercivity. Note that for θ < 0 there is nothing to prove, since In order to prove the Γ-convergence statement, we will first show that γ For the Γ-lim inf inequality, it follows from u (n) − u (∞) for all k ∈ N. Thus, by Fatou's lemma, For the Γ-lim sup inequality, note that, if I µ (∞) (u (∞) ) = ∞, then there is nothing to prove. Therefore, let us assume that I µ (∞) (u (∞) ) < ∞, and define u (n) by u and lim sup n→∞ I µ (n) (u (n) ) I µ (∞) (u (∞) ). Additionally, finalising the proof of the Γ-lim sup inequality.
The following corollary is now a direct consequence of Theorems 4.2 and 5.13:
Corollary 5.14. Let $X$, $(\mu^{(n)})_{n\in\mathbb{N}\cup\{\infty\}}$ be as in Theorem 5.13. If, for each $n \in \mathbb{N}$, $u^{(n)}$ is a global weak mode of $\mu^{(n)}$, then every convergent subsequence of $(u^{(n)})_{n\in\mathbb{N}}$ has as its limit a global weak mode of $\mu^{(\infty)}$.

Consequences for maximum a posteriori estimation in Bayesian inverse problems
The Γ-convergence theory of OM functionals described in Section 5 has important consequences for the stability of MAP estimators of Bayesian inverse problems (BIPs), in particular those BIPs that use the probability measures considered above as prior distributions. An inverse problem consists of the recovery of an unknown $u$ from related observational data $y$. In the Bayesian approach to inverse problems (Kaipio and Somersalo, 2005; Stuart, 2010), these two objects are treated as coupled random variables $\boldsymbol{u}$ and $\boldsymbol{y}$ that take values in spaces $X$ and $Y$ respectively. A priori knowledge about $\boldsymbol{u}$ is represented by a prior probability measure $\mu_0 \in \mathcal{P}(X)$ and one is given access to a realisation $y$ of $\boldsymbol{y}$. The solution of the BIP is, by definition, the posterior probability measure $\mu^y \in \mathcal{P}(X)$, i.e. the conditional distribution of $\boldsymbol{u}$ given that $\boldsymbol{y} = y$. For the sake of space, we omit here all technical discussion of the existence and regularity of this conditional distribution and focus exclusively on the case that $\mu^y$ has a Radon-Nikodym derivative with respect to $\mu_0$ of the form $\mu^y(\mathrm{d}u) \propto \exp(-\Phi(u; y))\, \mu_0(\mathrm{d}u)$ for some $\Phi : X \times Y \to \mathbb{R}$. The function $\Phi$, often called the potential, encodes both the idealised relationship between the unknown and the data and statistical assumptions about any observational noise. The textbook example is that $X$ is a separable Hilbert or Banach space of functions, $Y = \mathbb{R}^J$ for some $J \in \mathbb{N}$, and that $\boldsymbol{y} = O(\boldsymbol{u}) + \boldsymbol{\eta}$ for some deterministic observation map $O : X \to Y$ and additive non-degenerate Gaussian noise $\boldsymbol{\eta} \sim N(0, C_\eta)$ that is a priori independent of $\boldsymbol{u}$, in which case $\Phi$ is the familiar quadratic misfit $\Phi(u; y) = \tfrac{1}{2} \| C_\eta^{-1/2} (y - O(u)) \|^2$.
One convenient point summary of $\mu^y$ is a MAP estimator, i.e. a point of maximum probability under $\mu^y$ in the sense of a maximiser of a small ball probability. Under the conditions laid out in Section 4, these points (in the sense of global weak modes) are the minimisers of the OM functional of $\mu^y$.
However, we note that there are many problems of interest for which a more generalised notion of MAP estimator and a correspondingly generalised OM functional are needed, particularly problems in which the prior may have bounded support or the potential may take the value +∞ (Clason et al., 2019).
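As a concrete and purely illustrative instance of the variational characterisation above, the following minimal Python sketch assumes a one-dimensional toy problem that is not taken from this paper: $X = Y = \mathbb{R}$, a linear observation map $O(u) = a u$, Gaussian noise with variance `noise_var`, and a centred Gaussian prior with variance `prior_var`, whose OM functional is $u^2 / (2 \cdot \texttt{prior\_var})$. All function names and parameter values are hypothetical. Under these assumptions the posterior OM functional is the sum of the quadratic misfit and the prior OM functional, and its minimiser has a closed form that a simple grid search should reproduce.

```python
def misfit(u, y, a=2.0, noise_var=0.25):
    """Quadratic misfit Phi(u; y) = |y - a*u|^2 / (2 * noise_var) (toy assumption)."""
    return (y - a * u) ** 2 / (2.0 * noise_var)


def prior_om(u, prior_var=1.0):
    """OM functional of the assumed centred Gaussian prior N(0, prior_var)."""
    return u ** 2 / (2.0 * prior_var)


def posterior_om(u, y, a=2.0, noise_var=0.25, prior_var=1.0):
    """Posterior OM functional: misfit plus prior OM functional."""
    return misfit(u, y, a, noise_var) + prior_om(u, prior_var)


def map_closed_form(y, a=2.0, noise_var=0.25, prior_var=1.0):
    """Minimiser of posterior_om, obtained by setting its derivative to zero."""
    return a * y * prior_var / (a * a * prior_var + noise_var)


def map_grid(y, lo=-5.0, hi=5.0, steps=100001, **kw):
    """Brute-force minimiser of posterior_om over a uniform grid on [lo, hi]."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return min(grid, key=lambda u: posterior_om(u, y, **kw))


y_obs = 1.3
u_exact = map_closed_form(y_obs)
u_grid = map_grid(y_obs)
# MAP estimator for slightly perturbed data, to illustrate stability in y.
u_perturbed = map_closed_form(y_obs + 1e-3)
print(u_exact, u_grid, u_perturbed)
```

In this toy setting the MAP estimator depends linearly on the data, so a small perturbation of `y_obs` moves it only slightly; this is the simplest instance of the kind of stability studied in this section.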
Our interest lies in assessing the stability of $\mu^y$ (more precisely, the stability of the MAP estimators of $\mu^y$) in response to the following:
• perturbations of the observed data $y$, to be reassured that the posterior is not unduly sensitive to observational errors;
• perturbations of the potential⁶ $\Phi$, for example to be reassured that the posterior is not unduly sensitive to numerical approximation of $O$ by some $O^{(n)}$ (e.g. using a finite element solver to solve a partial differential equation), or to examine the small-noise limit $C_\eta \to 0$;
• perturbations of the prior $\mu_0$, to be reassured that the posterior is not unduly sensitive to prior assumptions, e.g. relating to the regularity of $u$.
We propose to address this question using the Γ-convergence results of the previous section. The classes of measures for whose OM functionals explicit Γ-limits were computed in Section 5 will serve here as Bayesian prior measures.
Our main result concerns the transfer of convergence properties of sequences of prior OM functionals and sequences of potentials to the convergence of posterior OM functionals.
Theorem 6.1 (Transfer of property $M$, Γ-convergence, equicoercivity, and MAP estimators). Let $X$ be a metric space. For each $n \in \mathbb{N} \cup \{\infty\}$, let $\mu_0^{(n)} \in \mathcal{P}(X)$ and let $\Phi^{(n)} : X \to \mathbb{R}$ be locally uniformly continuous. Suppose that, for each $n \in \mathbb{N} \cup \{\infty\}$, Suppose that each µ
(d) Suppose that the sequence $(I_0^{(n)})_{n\in\mathbb{N}}$ is equicoercive and the functions $\Phi^{(n)} \ge M$ are uniformly bounded from below by some constant $M \in \mathbb{R}$. Then the sequence $(I^{(n)})_{n\in\mathbb{N}}$ is also equicoercive.
⁶ Of course, a perturbation of the data $y$ induces a perturbation of $\Phi(\,\cdot\,; y)$. Sometimes it is easier to consider data perturbations and potential perturbations separately, and sometimes, as we do in Theorem 6.1, it is simpler to consider them both as perturbations of the potential.
⁷ See Definition A.2 for the definition of continuous convergence. Note that Proposition A.4 is agnostic as to which of the two summands converges continuously, and so Theorem 6.1(c) also holds if I
Proof. Parts (a) and (b) follow from Lemma B.8, and part (c) follows from Proposition A.4 (i.e. Dal Maso, 1993, Proposition 6.20). For part (d), let $(I_0^{(n)})_{n\in\mathbb{N}}$ be equicoercive and $\Phi^{(n)} \ge M$ be uniformly bounded from below. Then, for any $t \in \mathbb{R}$, there exists a compact $K_t \subseteq X$ such that, for all $n \in \mathbb{N}$, $(I_0^{(n)})^{-1}([-\infty, t]) \subseteq K_t$. Since $I^{(n)} = I_0^{(n)} + \Phi^{(n)} \ge I_0^{(n)} + M$, it follows that $(I^{(n)})^{-1}([-\infty, t]) \subseteq (I_0^{(n)})^{-1}([-\infty, t - M]) \subseteq K_{t-M}$ for all $n \in \mathbb{N}$. Finally, part (e) is just a restatement of Theorem 4.2.
Remark 6.2. Loosely speaking, the hypothesis in Theorem 6.1(d) that the potentials are uniformly bounded below corresponds to a likelihood model in which the observed data are (uniformly) finite dimensional. BIPs with infinite-dimensional data are known to involve potentials that are unbounded below. Such potentials cannot be interpreted as (non-negative) misfit functionals, as discussed by e.g. Stuart (2010, Remark 3.8) and Kasanický and Mandel (2017). Note also that a standing assumption of Dashti et al. (2013) is that Φ is locally Lipschitz continuous, which is stronger than the local uniform continuity assumed in Theorem 6.1, and that boundedness of Φ from below is also assumed by Dashti et al. (2013, Theorem 3.5), just as in the hypothesis of Theorem 6.1(d).
Corollary 6.3. Consider a BIP with prior µ 0 = µ (∞) 0 , potential Φ = Φ (∞) , and observed data y = y (∞) , each of which may now be approximated. In addition to the assumptions of Theorem 6.1, assume for simplicity that the OM functional of µ 0 is lower semicontinuous, so that it equals its own Γ-limit (Theorem A.5).
Example 6.4 (Small-noise limits). Regrettably, the analysis of MAP estimators of small-noise (infinite-precision) limits is not entirely trivial even under the Γ-convergence theory that we have outlined. Consider a BIP on $X$ with prior $\mu_0$ and potential $\Phi$. Assume that $\mu_0$ has OM functional $I_0 : E \to \mathbb{R}$ that satisfies $M(\mu_0, E)$, leading to a lower semicontinuous and coercive extended OM functional $I_0 : X \to \overline{\mathbb{R}}$. Assume also that $\Phi$ is locally uniformly continuous, is bounded below, and attains its lower bound; without loss of generality, take this minimal value to be $0$. Now consider the posterior $\mu^{(n)}(\mathrm{d}x) := \frac{1}{Z^{(n)}} e^{-n\Phi(x)}\, \mu_0(\mathrm{d}x)$ in the small-noise limit $n \to \infty$. By Theorem 6.1, $\mu^{(n)}$ has OM functional $I^{(n)} = n\Phi + I_0$. It is easy to see that, pointwise, $\lim_{n\to\infty} I^{(n)}(x) = I^{(\infty)}(x)$, where $I^{(\infty)}(x) := I_0(x)$ if $\Phi(x) = 0$ and $I^{(\infty)}(x) := +\infty$ otherwise. It is natural to hope that $\Gamma\text{-}\lim_{n\to\infty} I^{(n)} = I^{(\infty)}$ as well, and hence that the MAP estimators of $\mu^{(n)}$ converge, in the small-noise limit $n \to \infty$, to the constrained minimisers of the prior OM functional $I_0$ among the global minima of $\Phi$. However, this Γ-convergence is not straightforward to establish.
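• For the Γ-lim sup inequality, choose any $x \in X$. Consider first the case that $\Phi(x) > 0$: for the recovery sequence $x_n \equiv x$, $\limsup_{n\to\infty} I^{(n)}(x_n) = \limsup_{n\to\infty} \bigl( n\Phi(x) + I_0(x) \bigr) = +\infty = I^{(\infty)}(x)$. Similarly, in the case $\Phi(x) = 0$, we may use the same recovery sequence to obtain $\limsup_{n\to\infty} I^{(n)}(x_n) = I_0(x) = I^{(\infty)}(x)$.
• For the Γ-lim inf inequality, choose any $x \in X$ and any sequence $x_n \to x$. Taking $\omega_{\Phi,x}$ to be a local modulus of continuity for $\Phi$ near $x$, we have $I^{(n)}(x_n) = n\Phi(x_n) + I_0(x_n) \ge n\Phi(x) - n\omega_{\Phi,x}(\|x_n - x\|) + I_0(x_n)$ and hence $\liminf_{n\to\infty} I^{(n)}(x_n) \ge \liminf_{n\to\infty} \bigl( n\Phi(x) - n\omega_{\Phi,x}(\|x_n - x\|) \bigr) + \liminf_{n\to\infty} I_0(x_n) \ge \liminf_{n\to\infty} \bigl( n\Phi(x) - n\omega_{\Phi,x}(\|x_n - x\|) \bigr) + I_0(x)$, where the last inequality uses the lower semicontinuity of $I_0$. At this point we encounter a problem. For $x$ such that $\Phi(x) > 0$, the right-hand side of the above display is indeed $+\infty$, as required. However, for $x$ such that $\Phi(x) = 0$, the Γ-lim inf inequality only holds if $\liminf_{n\to\infty} n\omega_{\Phi,x}(\|x_n - x\|) = 0$, and this holds only if $x_n$ converges sufficiently rapidly to $x$, which is not at all guaranteed.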
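In finite dimensions the hoped-for behaviour of the minimisers can at least be observed numerically. The following Python sketch assumes hypothetical choices that do not come from this paper: $X = \mathbb{R}$, a Gaussian prior $N(3, 1)$ with OM functional $I_0(x) = \tfrac{1}{2}(x - 3)^2$, and the potential $\Phi(x) = \max(|x| - 1, 0)^2$, whose global minima form the interval $[-1, 1]$. Under these assumptions the constrained minimiser of $I_0$ over the minima of $\Phi$ is $x = 1$, and setting the derivative of $n\Phi + I_0$ to zero gives the minimiser $(2n + 3)/(2n + 1)$. The sketch illustrates the conjectured limit in this one toy case only and says nothing about the general Γ-lim inf difficulty described above.

```python
def phi(x):
    """Toy potential with minimal value 0 attained exactly on [-1, 1] (assumption)."""
    return max(abs(x) - 1.0, 0.0) ** 2


def prior_om(x):
    """OM functional of the assumed Gaussian prior N(3, 1)."""
    return 0.5 * (x - 3.0) ** 2


def om_n(x, n):
    """Posterior OM functional I^(n) = n * Phi + I_0 at noise level 1/n."""
    return n * phi(x) + prior_om(x)


def argmin_convex(f, lo=-5.0, hi=5.0, iters=200):
    """Ternary search for the minimiser of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


levels = (1, 10, 100, 1000)
minimisers = {n: argmin_convex(lambda x, n=n: om_n(x, n)) for n in levels}
closed_form = {n: (2 * n + 3) / (2 * n + 1) for n in levels}
# Distance of each minimiser to the constrained minimiser x = 1 of I_0 on [-1, 1].
gaps = [abs(minimisers[n] - 1.0) for n in levels]
print(minimisers, gaps)
```

The gaps shrink like $2/(2n + 1)$, so in this example the MAP estimators approach the constrained minimiser as $n \to \infty$.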
We close this section by repeating the observation made at the end of Section 4, that the necessity of the continuous convergence / Γ-convergence assumptions, as opposed to simple pointwise convergence of densities or OM functionals, is shown by Example 1.1(c) from the introduction, which can easily be interpreted as a pointwise but not continuously convergent sequence of likelihoods/potentials and a Gaussian prior.

Closing remarks
The purpose of this paper was to establish a convergence theory for modes of probability measures (in the BIP setting, MAP estimators of Bayesian posterior measures) in the sense of maximisers of small ball probabilities, by first characterising them as minimisers of OM functionals and then using the well-established notion of Γ-convergence from the calculus of variations. The correspondence between modes and OM minimisers was established rigorously for global weak modes under the abstract $M$-property, and counterexamples were given to show that an extension to strong modes and relaxation of the $M$-property would be non-trivial if not impossible. The general programme of studying Γ-limits of OM functionals of measures was illustrated via two explicit example classes that are frequently used in the inverse problems literature, namely Gaussian measures and Besov $B^s_p$ measures with integrability parameter $p = 1$.
The Gaussian and Besov-1 measures treated in this paper are merely simple examples of a general class of measures, namely countable products of scaled copies of a measure on $\mathbb{R}$ (the normal and Laplace distributions respectively). General Besov-$p$ measures and infinite-product Cauchy measures fall into this class. Part II of this paper (Ayanbayev et al., 2021) treats this class in a high degree of generality, following the same programme of determining the OM functional, verifying the $M$-property, and showing Γ-convergence and equicoercivity. The advantage of having considered the Gaussian and Besov-1 measures separately in this paper is that the requisite calculations could be done in more-or-less closed form and with much less notational overhead than the general case.
This work has made extensive use of the hypothesis that some measure µ of interest actually possesses an Onsager-Machlup functional (and moreover one that satisfies property M (µ, E) for a "good enough" E), and that µ possesses a mode. However, there are examples, even in finite dimension, of µ that have no strong or global weak modes, only generalised modes in the sense of Clason et al. (2019), which are associated with generalised OM functionals. A natural further generalisation of this article would be to study the Γ-convergence properties of such generalised OM functionals, and hence the convergence of generalised strong modes.
It would be of great value in applications not only to know that some sequence of approximations to an ideal limiting MAP problem Γ-converges, but also to quantify how quickly those approximate MAP estimators converge. Unfortunately, this is not trivial, since the basic framework of Γ-convergence does not easily deliver convergence rates for minimisers, especially when the objective functions are non-smooth, as is the case for most of the OM functionals in our setting. Therefore, the interesting question of convergence rates for modes / MAP estimators must be deferred to future work.

A. Γ-convergence
We collect here the basic definitions and results related to Γ-convergence as used in the main text. Standard references on Γ-convergence include the books of Braides (2002, 2006) and Dal Maso (1993).
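Definition A.1. Let $X$ be a metric space and suppose that $F_n, F : X \to \overline{\mathbb{R}}$. We say that $F_n$ Γ-converges to $F$, written $\Gamma\text{-}\lim_{n\to\infty} F_n = F$ or $F_n \xrightarrow{\Gamma} F$, if, for every $x \in X$,
(a) (Γ-lim inf inequality) for every sequence $(x_n)_{n\in\mathbb{N}}$ converging to $x$, $F(x) \le \liminf_{n\to\infty} F_n(x_n)$;
(b) (Γ-lim sup inequality) there exists a "recovery sequence" $(x_n)_{n\in\mathbb{N}}$ converging to $x$ such that $F(x) \ge \limsup_{n\to\infty} F_n(x_n)$.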
We say that $(F_n)_{n\in\mathbb{N}}$ is equicoercive if, for all $t \in \mathbb{R}$, there exists a compact $K_t \subseteq X$ such that, for all $n \in \mathbb{N}$, $F_n^{-1}([-\infty, t]) \subseteq K_t$.
In general, Γ-convergence and pointwise convergence are independent of one another, although the following inequality always holds: if $F_n \xrightarrow{\Gamma} F$, then $F(x) \le \liminf_{n\to\infty} F_n(x)$ for every $x \in X$. However, one can compare Γ-convergence with continuous convergence:
Definition A.2. Let $X$ be a metric space and suppose that $F_n, F : X \to \overline{\mathbb{R}}$. We say that $F_n$ converges continuously to $F$ if, for every $x \in X$ and every neighbourhood $V$ of $F(x)$ in $\overline{\mathbb{R}}$, there exist $N \in \mathbb{N}$ and a neighbourhood $U$ of $x$ such that $(n \ge N \text{ and } x' \in U) \implies F_n(x') \in V$.
Continuous convergence implies both pointwise convergence and Γ-convergence and, in the case that F is continuous, is implied by uniform convergence of F n to F (Dal Maso, 1993, Chapters 4 and 5).
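The gap between pointwise convergence and Γ-convergence can be seen in a classical one-dimensional example, which is not taken from this paper: $F_n(x) := n x \exp(-2 n^2 x^2)$ on $X = \mathbb{R}$. These functions converge pointwise to $0$, but each $F_n$ attains its minimum value $-\tfrac{1}{2} e^{-1/2}$ at $x_n = -1/(2n) \to 0$, so the Γ-limit takes the value $-\tfrac{1}{2} e^{-1/2}$ at $x = 0$ (and $0$ elsewhere). The Python sketch below, with hypothetical names, evaluates $F_n$ at the origin, at a fixed point away from the origin, and along the recovery sequence $x_n$.

```python
import math


def f_n(x, n):
    """F_n(x) = n * x * exp(-2 * n^2 * x^2), a standard textbook example."""
    return n * x * math.exp(-2.0 * n * n * x * x)


levels = (1, 10, 100, 1000)

# Value of the Gamma-limit at x = 0 (the common minimum value of every F_n).
gamma_limit_at_zero = -0.5 * math.exp(-0.5)

# Pointwise behaviour: F_n(0) = 0 for all n, and F_n(x) -> 0 for fixed x != 0.
pointwise_at_zero = [f_n(0.0, n) for n in levels]
pointwise_at_fixed_x = [f_n(-0.1, n) for n in levels]

# Behaviour along the recovery sequence x_n = -1 / (2n), which converges to 0.
along_recovery = [f_n(-1.0 / (2 * n), n) for n in levels]

print(pointwise_at_zero, pointwise_at_fixed_x, along_recovery)
```

The pointwise limit at $0$ is strictly larger than the Γ-limit there, which is consistent with the general inequality between the two limits, and the convergence of $F_n$ to $0$ is not continuous at the origin.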
Theorem A.3 (Fundamental theorem of Γ-convergence; Braides, 2006, Theorem 2.10). Let $X$ be a metric space and suppose that $F_n, F : X \to \overline{\mathbb{R}}$ are such that $\Gamma\text{-}\lim_{n\to\infty} F_n = F$ and $(F_n)_{n\in\mathbb{N}}$ is equicoercive. Then $F$ has a minimum value and $\min_X F = \lim_{n\to\infty} \inf_X F_n$. Moreover, if $(x_n)_{n\in\mathbb{N}}$ is a precompact sequence such that $\lim_{n\to\infty} F_n(x_n) = \min_X F$, then every limit of a convergent subsequence of $(x_n)_{n\in\mathbb{N}}$ is a minimiser of $F$. Thus, if each $F_n$ has a minimiser $x_n$, then every convergent subsequence of $(x_n)_{n\in\mathbb{N}}$ has as its limit a minimiser of $F$.
Proposition A.4 (Dal Maso, 1993, Proposition 6.20). Let $X$ be a metric space and suppose that $F_n, F : X \to \overline{\mathbb{R}}$ and $G_n, G : X \to \mathbb{R}$ are such that $F_n \xrightarrow{\Gamma} F$ on $X$ and $G_n \to G$ continuously on $X$ as $n \to \infty$. Then $F_n + G_n \xrightarrow{\Gamma} F + G$ on $X$.
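Theorem A.5 (Braides, 2006, Proposition 2.5). The Γ-limit of a constant sequence $(F)_{n\in\mathbb{N}}$ is the lower semicontinuous envelope $F^{\mathrm{lsc}}$ of $F$, i.e. the greatest lower semicontinuous function bounded above by $F$: $F^{\mathrm{lsc}}(x) = \sup \{ G(x) \mid G : X \to \overline{\mathbb{R}} \text{ is lower semicontinuous and } G \le F \}$. In particular, $F = \Gamma\text{-}\lim_{n\to\infty} F$ if and only if $F$ is lower semicontinuous.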

B. Technical supporting results
B.1. Supporting results for Section 3 Lemma B.1 (The M -property). Let X be a metric space and let µ 0 ∈ P(X). Suppose that µ 0 has an OM functional I : E → R on a nonempty subset E ⊆ supp(µ). (c) If property M (µ 0 , E) holds, then no point of X \ E can be a global weak mode for µ 0 , and hence cannot be a strong mode for µ 0 .
Since µ 0 has an OM functional I : E → R, where we used (3.3) and (3.4) in the penultimate and last equation. This proves (a). For Observe that, for any x ∈ X, The exponential on the right-hand side is finite, by the assumption that Φ is bounded on bounded subsets of X. If x ∈ X \ E, then taking the limit as r 0 yields property M (µ, E), as claimed.
Suppose that x ∈ X \E is a global weak mode in the sense of Definition 3.7. Then x ∈ supp(µ) and 1 lim sup Above, we used that x ∈ E ⊆ supp(µ) to ensure that for every r > 0, µ 0 (Br(x )) µ 0 (Br(x)) > 0. The inequality above implies that lim inf and hence x does not satisfy (3.4). Finally, if x is not a global weak mode, then by Lemma 3.9, it cannot be a strong mode. This proves (c).
Then lim inf ε 0 µ(Bε(x)) µ(Bε(1)) = 0 for any point. Hence, for E = {1}, the lim inf part of property M (µ, E) is satisfied. Further, u = 1 is a minimiser of any OM functional on E since E = {1}, but u = 1 is not a global weak mode due to (B.1). Note that the above example can be modified to be even more extreme: if one sets a n := 2 −n(n−1) and b n := an 2 n , then one obtains that a n /b n = 2 n and a n+1 /b n = 2 −n , and hence lim inf Proposition B.3 (Open v. closed balls). Let X be a metric space, µ ∈ P(X) a probability measure on (X, B(X)) and x 1 , x 2 ∈ X with x 2 ∈ supp(µ). For ε > 0 define the ratios R ε := µ(Bε(x 1 )) µ(Bε(x 2 )) andR ε := µ(Bε(x 1 )) µ(Bε(x 2 )) , whereB r (x) denotes the closed ball in X of radius r centred on x. Then Hence, lim ε 0Rε exists if and only if lim ε 0 R ε exists, in which case these two values agree.
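Proof. First assume that $\limsup_{\varepsilon \searrow 0} \bar{R}_\varepsilon > \limsup_{\varepsilon \searrow 0} R_\varepsilon =: s$. Then there exist $\zeta > 0$ and a positive null sequence $(\varepsilon_n)_{n\in\mathbb{N}}$ such that $\bar{R}_{\varepsilon_n} \ge s + \zeta$. For each $n \in \mathbb{N}$ perform the following construction: since $\bigcap_{\delta > 0} B_{\varepsilon_n + \delta}(x) = \bar{B}_{\varepsilon_n}(x)$ for any $x \in X$ and using continuity of probability measures, we obtain $\lim_{\delta \searrow 0} R_{\varepsilon_n + \delta} = \bar{R}_{\varepsilon_n}$, and there exists $0 < \delta_n < n^{-1}$ such that $R_{\varepsilon_n + \delta_n} \ge s + \zeta/2$. Hence, we have constructed a null sequence $(\tilde{\varepsilon}_n)_{n\in\mathbb{N}} := (\varepsilon_n + \delta_n)_{n\in\mathbb{N}}$ with $s = \limsup_{\varepsilon \searrow 0} R_\varepsilon \ge \limsup_{n\to\infty} R_{\tilde{\varepsilon}_n} \ge s + \zeta/2$, which is a contradiction. Therefore, our assumption was false and $\limsup_{\varepsilon \searrow 0} \bar{R}_\varepsilon \le \limsup_{\varepsilon \searrow 0} R_\varepsilon$. The other inequality can be proven similarly using $\bigcup_{\delta > 0} \bar{B}_{\varepsilon_n - \delta}(x) = B_{\varepsilon_n}(x)$, and a similar argument works for the corresponding lim inf statement.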
Example B.4 (OM functionals and changes of metric). Following Lie and Sullivan (2018, Example 5.6), let µ be the finite Borel measure on (R 2 , B(R 2 )) that is one-dimensional Hausdorff measure (i.e. uniform length measure) on the disjoint union E of two right-angled crosses in the plane, with one cross, E + , aligned with the coordinate axes and centred at e 1 := (1, 0) and the other, E − , aligned at π/4 to the axes and centred at −e 1 , as illustrated in Figure B.2. (Note that there is a slight error in (Lie and Sullivan, 2018, Example 5.6) concerning the side lengths of the cross E − and hence the total mass of µ, but this error does not affect the final conclusion of that example or this one, since it is only the mass near ±e 1 that is important.) With respect to the 1-norm, µ(B 1 r (−e 1 )) = 2 √ 2r, µ(B 1 r (e 1 )) = 4r, whereas, with respect to the ∞-norm, which in this setting is Lipschitz equivalent to the 1-norm, µ(B ∞ r (−e 1 )) = 4 √ 2r, µ(B ∞ r (e 1 )) = 4r, and, after considering the other points of R 2 , it follows that e 1 (resp. −e 1 ) is the unique strong and global weak mode of µ with respect to the 1-norm (resp. ∞-norm). These same calculations, using the Taylor expansion of y → √ y. Thus, µ(B r (x)) decreases to zero linearly in r, whereas µ(B r (m)) decreases to zero like r 1/2 . Recall that property M (µ, E) holds if there exists some x ∈ E such that if x ∈ X \ E then (3.4) holds, i.e. lim r 0 µ(Br(x)) µ(Br(x )) = 0. Using that x = m + δ and x = m shows that property M (µ, E) holds.
Thus, u = 1 cannot be a strong mode of µ, even though it minimises I µ,E .
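B.3. Supporting results for Section 5
Lemma B.6. Let $a^{(n)} = (a^{(n)}_k)_{k\in\mathbb{N}} \in \ell^2$, $n \in \mathbb{N}$, be a bounded sequence in $\ell^2$, i.e. there exists a constant $M > 0$ such that $\|a^{(n)}\|_{\ell^2} \le M$ for each $n \in \mathbb{N}$. Further, let $a^{(n)}_k \to a_k \in \mathbb{R}$ as $n \to \infty$ for each $k \in \mathbb{N}$. Then $a := (a_k)_{k\in\mathbb{N}} \in \ell^2$ and $\|a\|_{\ell^2} \le M$.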
Remark B.7. Lemma B.6 does not state that $\|a^{(n)} - a\|_{\ell^2} \to 0$ and, in fact, this is not true in general. A counterexample is provided by $a = 0$ and $a^{(n)} = (\delta_{nk})_{k\in\mathbb{N}}$, where $\delta_{nk}$ denotes the Kronecker delta.
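The counterexample of Remark B.7 can be checked directly on truncated sequences. The Python sketch below is only an illustration: the truncation length `LENGTH` and the helper names are assumptions introduced so that the sequences can be stored as finite lists. Each component of $a^{(n)}$ is eventually zero, yet the $\ell^2$ distance to the componentwise limit $a = 0$ stays equal to $1$.

```python
import math


def unit_sequence(n, length):
    """First `length` entries of a^(n) = (delta_{nk})_k, with k = 1, ..., length."""
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]


def l2_norm(a):
    """Euclidean norm of a finite list, standing in for the l^2 norm."""
    return math.sqrt(sum(x * x for x in a))


LENGTH = 50  # truncation level (assumption made only for this illustration)
ns = range(1, LENGTH + 1)

# Since the componentwise limit is a = 0, ||a^(n) - a|| equals ||a^(n)||.
distances_to_limit = [l2_norm(unit_sequence(n, LENGTH)) for n in ns]

# Third component of a^(n) as n varies: equal to 1 only for n = 3, then 0 forever.
component_k3 = [unit_sequence(n, LENGTH)[2] for n in ns]

print(distances_to_limit[:5], component_k3[:5])
```

The bound $\|a\|_{\ell^2} \le M$ of Lemma B.6 holds here with $M = 1$, while norm convergence fails.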

B.4. Supporting results for Section 6
Recall that a function $f : X \to Y$ between metric spaces $(X, d_X)$ and $(Y, d_Y)$ is locally uniformly continuous if, for every $x \in X$, there exists a function $\omega_{f,x} : [0, \infty) \to [0, \infty]$, a local modulus of continuity for $f$ near $x$, such that $d_Y(f(x'), f(x)) \le \omega_{f,x}(d_X(x', x))$ for all $x' \in X$ and $\omega_{f,x}(r) \to 0$ as $r \to 0$. (B.6) In particular, (B.6) implies that, for each $x \in X$, there exists $r_x > 0$ such that $\omega_{f,x}(r)$ is finite for all $0 \le r \le r_x$. It is no loss of generality to assume that $\omega_{f,x}$ is an increasing function. Local uniform continuity is slightly but strictly stronger than $f$ being continuous: according to Izzo (1994, Theorem 1), on every infinite-dimensional separable normed space there exist bounded, continuous real-valued functions that are nowhere locally uniformly continuous; however, Izzo (1994, Theorem 4) also shows that every continuous real-valued function on a metric space can be approximated uniformly by locally uniformly continuous functions.
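A simple worked example may help to fix ideas. Assuming that the defining inequality of a local modulus of continuity reads $|f(x') - f(x)| \le \omega_{f,x}(|x' - x|)$, the function $f(x) = x^2$ on $\mathbb{R}$ is not uniformly continuous, but it is locally uniformly continuous with the (hypothetical, hand-derived) modulus $\omega_{f,x}(r) = (2|x| + r)\, r$, since $|x'^2 - x^2| = |x' - x|\,|x' + x| \le |x' - x|\,(2|x| + |x' - x|)$. The Python sketch below checks this inequality on a grid and shows that the modulus tends to $0$ with $r$; all names are illustrative.

```python
def f(x):
    """Example function: continuous, but not uniformly continuous on the real line."""
    return x * x


def modulus(x, r):
    """Candidate local modulus of continuity for f near x: (2|x| + r) * r."""
    return (2.0 * abs(x) + r) * r


def modulus_holds(x, x_prime, tol=1e-12):
    """Check |f(x') - f(x)| <= modulus(x, |x' - x|), up to a rounding tolerance."""
    return abs(f(x_prime) - f(x)) <= modulus(x, abs(x_prime - x)) + tol


# Grid of base points and comparison points in [-10, 10].
points = [i / 4.0 for i in range(-40, 41)]
all_ok = all(modulus_holds(x, xp) for x in points for xp in points)

# The modulus at the base point x = 3 tends to 0 as r -> 0.
small_r = [modulus(3.0, 10.0 ** (-k)) for k in range(1, 6)]

print(all_ok, small_r)
```

Note that the modulus depends on the base point $x$ and grows with $|x|$, which is exactly what distinguishes local from global uniform continuity.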
which proves the first claim. The second claim is an immediate consequence of the first part and Lemma B.1(b).
Lemma B.9 (Continuous convergence of potentials via projection). Let $X$ be a separable Banach space and let $(X_n)_{n\in\mathbb{N}}$ be a sequence of (not necessarily nested) finite-dimensional subspaces with surjective, uniformly bounded linear projection operators $P_n : X \to X_n$ such that, for all $x \in X$, $\lim_{n\to\infty} \|P_n x - x\| = 0$. (B.7) Let $\Phi : X \to \mathbb{R}$ be locally uniformly continuous.
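Proof. Let $M \ge 1$ be a uniform upper bound for the operator norms $\|P_n\|$, $n \in \mathbb{N}$. (Note that, in the special case that $X$ is a separable Hilbert space with complete orthonormal system $\{\psi_n\}_{n\in\mathbb{N}}$ and $P_n$ is the orthogonal projection onto $\operatorname{span}\{\psi_1, \dots, \psi_n\}$, we may take $M = 1$.) To show (a), fix $n \in \mathbb{N}$ and $x \in X$ and let $\omega_{\Phi, P_n x} : [0, \infty) \to [0, \infty]$ be an increasing local modulus of continuity for $\Phi$ near $P_n x$. Then, for all $x' \in X$, $|\Phi(P_n x') - \Phi(P_n x)| \le \omega_{\Phi, P_n x}(\|P_n x' - P_n x\|) \le \omega_{\Phi, P_n x}(M \|x' - x\|)$. Thus, $\omega_{\Phi \circ P_n, x}(r) := \omega_{\Phi, P_n x}(M r)$ is a local modulus of continuity for $\Phi \circ P_n$ near $x$.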
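To establish (b), fix $x \in X$ and let $\omega_{\Phi,x} : [0, \infty) \to [0, \infty]$ be an increasing local modulus of continuity for $\Phi$ near $x$. Let $\varepsilon > 0$ be arbitrary and let $r_\varepsilon > 0$ be such that $\omega_{\Phi,x}(r_\varepsilon) < \varepsilon$. By (B.7), there exists $N \in \mathbb{N}$ such that, for all $n \ge N$, $\|P_n x - x\| < r_\varepsilon / 2$. Then, for $n \ge N$ and $x' \in X$ with $\|x' - x\| < r_\varepsilon / (2M)$, we have $\|P_n x' - x\| \le \|P_n (x' - x)\| + \|P_n x - x\| < M \cdot \frac{r_\varepsilon}{2M} + \frac{r_\varepsilon}{2} = r_\varepsilon$, and therefore $|\Phi(P_n x') - \Phi(x)| \le \omega_{\Phi,x}(\|P_n x' - x\|) \le \omega_{\Phi,x}(r_\varepsilon) < \varepsilon$, as required.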