MAP Estimators for Piecewise Continuous Inversion

We study the inverse problem of estimating a field $u$ from data comprising a finite set of nonlinear functionals of $u$, subject to additive noise; we denote this observed data by $y$. Our interest is in the reconstruction of piecewise continuous fields in which the discontinuity set is described by a finite number of geometric parameters. Natural applications include groundwater flow and electrical impedance tomography. We take a Bayesian approach, placing a prior distribution on $u$ and determining the conditional distribution on $u$ given the data $y$. It is then natural to study maximum a posterior (MAP) estimators. Recently (Dashti et al 2013) it has been shown that MAP estimators can be characterised as minimisers of a generalised Onsager-Machlup functional, in the case where the prior measure is a Gaussian random field. We extend this theory to a more general class of prior distributions which allows for piecewise continuous fields. Specifically, the prior field is assumed to be piecewise Gaussian with random interfaces between the different Gaussians defined by a finite number of parameters. We also make connections with recent work on MAP estimators for linear problems and possibly non-Gaussian priors (Helin, Burger 2015) which employs the notion of Fomin derivative. In showing applicability of our theory we focus on the groundwater flow and EIT models, though the theory holds more generally. Numerical experiments are implemented for the groundwater flow model, demonstrating the feasibility of determining MAP estimators for these piecewise continuous models, but also that the geometric formulation can lead to multiple nearby (local) MAP estimators. We relate these MAP estimators to the behaviour of output from MCMC samples of the posterior, obtained using a state-of-the-art function space Metropolis-Hastings method.

(Some figures may appear in colour only in the online journal) 1. Introduction

Context and literature review
A common inverse problem is that of estimating an unknown function from noisy measurements of a (possibly nonlinear) map applied to the function. Statistical and deterministic approaches to this problem have been considered extensively. In this paper we focus on the the study of MAP estimators within the Bayesian approach; these estimators provide a natural link between deterministic and statistical methods. In the Bayesian formulation, we describe the solution probabilistically and the distribution of the unknown, given the measurements and a prior model, is termed the posterior distribution. MAP estimators attempt to work with a notion of solutions of maximal probability under this posterior distribution and are typically characterised variationally, linking to deterministic methods.
There are two main approaches taken to the study of the posterior. The first is to discretise the space, and then apply finite dimensional Bayesian methodology [18]. An advantage to this approach is the availability of a Lebesgue density and a large amount of previous work which can then be built upon; but issues may arise (for example computationally) when the dimension of the discretisation space is increased. An alternative approach is to apply infinite dimensional methodology directly on the original space, to derive algorithms, and then discretise to implement. This approach has been studied for linear problems in [12,25,27], and more recently for nonlinear problems [10,21,22,33]. It is the latter approach that we focus on in this paper.
In some situations it may be that point estimates are more desirable, or more computationally feasible, than the entire posterior distribution. A detailed study of point estimates can be found in for example [24]. Three different estimates are commonly considered: the posterior mean which minimises L 2 loss, the posterior median which minimises L 1 loss, and posterior modes which minimise zero-one loss. The former two estimates are unique [28], but a distribution may possess more than one mode. A consequence of this is that the posterior mean and median may be misleading in the case of a multi-modal posterior. Posterior modes are often termed maximum a posteriori (MAP) estimators in the literature.
In this paper we focus on MAP estimation. If the posterior has Lebesgue density ρ, MAP estimators are given by the global maxima of ρ. The problem of MAP estimation in this case is hence a deterministic variational problem, and has been well-studied [18]. In the infinitedimensional setting there is no Lebesgue density, but there has been recent research aimed at characterising the mode variationally and linking to the classical regularisation techniques described in, for example, [9] in the case when Gaussian priors are adopted. Non-Gaussian priors have also been considered in the infinite dimensional setting-in [14] weak MAP (wMAP) estimators are defined as generalisations of MAP estimators, and a variational characterisation of them is provided in the case that the forward map is linear, using the notion of Fomin derivative.
In this paper we make a significant extension of the work in [9] to include priors which are defined by a combination of Gaussian random fields and a finite number of geometric parameters which define the different domains in which the different random fields apply. We thereby study the reconstruction of piecewise continuous fields with interfaces defined by a finite number of parameters. Our motivation for doing so comes from the work in [5], and its predecessors. In that paper a Bayesian inverse problem for piecewise constant fields, modelling the permeability appearing in a two-phase subsurface flow model, was studied. Such piecewise continuous fields were also previously studied in a groundwater flow context in [16], where existence and well-posedness of the posterior distribution were shown. The idea of single point estimates being misleading is discussed and the existence of multiple local MAP estimators is shown. We also link our work to that in [14], by characterising the MAP estimator via the Fomin derivative.
Throughout this paper we focus on two model problems: groundwater flow and electrical impedance tomography (EIT). Both of these problems are important examples of large scale inverse problems, with applications of great economic and societal value. MAP estimation in such problems has been studied previously [2,4,17,31]. However our formulation is quite general; for brevity we simply illustrate the theory for groundwater flow and EIT, and the numerics only in the case of groundwater flow.

Mathematical setting
Let X be a separable Banach space and let  L Í k . X should be thought of as a function space and Λ a space of geometric parameters. Given Î´L u a X , ( ) , we construct another function Î u Z a , say. Considering the ingredients u and a in the construction of this function u a separately will be useful in what follows. An example of such a construction is shown in figure 1.
Suppose we have a (typically nonlinear) forward operator ´L  X Y : , where  = Y J . If (u, a) denotes the true input to our forward problem, we observe data Î y Y given by where h~G N 0, ( ),  G ÎJ J positive definite, is some centred Gaussian noise on Y.
Modelling everything probabilistically, we build up the joint distribution of u a y , , ( ) by specifying a prior distribution m ń 0 0 on (u, a) and an independent noise model on η. We are then interested in the posterior μ on (u, a) given y. Denote |·| the Euclidean norm on  J , and Figure 1. An example of construction of a piecewise continuous field, using two continuous fields and two scalar parameters. Here the scalar parameters determine the points where the interface meets each side of the domain. We work on the space of continuous fields and parameters, but it is pushforward of these by the construction map that represents the piecewise continuous field we aim to recover.
for any positive definite |·| ≔ | · | the weighted norm on  J . Under certain conditions, using a form of Bayes' theorem, we may write μ in the form The modes of the posterior distribution, termed MAP estimators, can be considered 'best guesses' for the state (u, a) given the data y. We now state rigorously what we mean by a MAP estimator for μ, as in [9]. Given Î´L u a X , ) the ball of radius δ centred at (u, a).  is called a MAP estimator for the measure μ.
If this definition is applied to probability measures defined via a Lebesgue density, MAP estimators coincide with maxima of this density. Here we extend the notion to the study of piecewise continuous fields.

Our contribution
The primary contributions of the paper are fourfold: (i) We develop the MAP estimator theory for infinite dimensional geometric inverse problems involving discontinuous fields, building on theory in both of the recent papers [9,14], and opening up new avenues for the study of MAP estimators in infinite dimensional inverse problems. (ii) We explicitly link MAP estimation for these geometric inverse problems to a variational Onsager-Machlup minimisation problem. (iii) We show that the theory applies to the groundwater flow model as in [16] and we show that the theory applies to the EIT problem as in [11]. (iv) We implement numerical experiments for the groundwater flow model and demonstrate the feasibility of computing (local) MAP estimators within the geometric formulation, but also show that they can lead to multiple nearby solutions. We relate these multiple MAP estimators to the behaviour of output from MCMC to probe the posterior.

Structure of the paper
• In section 2 we describe the forward maps associated with the groundwater flow and EIT problems, and show that they have the appropriate regularity needed in sections 4-5. • In section 3 we describe the choice of, and assumptions upon, the prior distribution whose samples comprise piecewise Gaussian random fields with random interfaces. • In section 4 we show existence and uniqueness of the posterior distribution.
• In section 5 we define MAP estimators and prove their equivalence to minimisers of an appropriate Onsager-Machlup functional. • In section 6 we present numerics for the groundwater flow problem. We consider three different prior models and investigate maximisers of the posterior distribution. • In section 7 we conclude and outline possible future work in the area.

The forward problem
We consider two model problems. Our first problem (groundwater flow) is that of determining the piecewise continuous permeability of a medium, given noisy measurements of water pressure (or hydraulic head) within it. The second problem (EIT) is determination of the piecewise continuous conductivity within a body from boundary voltage measurements.
In what follows, the finite dimensional space Λ will be a space of geometric parameters defining the interfaces between different media, and X will be a product of function spaces defining the values of the permeabilities/conductivities between the interfaces.
We begin in section 2.1 by defining the construction map  u a u , a ( ) for the piecewise continuous fields. In sections 2.2 and 2.3 we describe the models for groundwater flow and EIT respectively, and prove regularity properties of the resulting forward maps; these properties are required for our subsequent theory.

Let
 Í D d be the domain of interest and let  L Í k be the space of geometric parameters. Take a collection of set-valued maps We assume that each map A i is continuous in the sense that where Δ denotes the symmetric difference: ( ) is the construction map. We give four examples of the functions A i and the sets/interfaces they define.
] and N=2. We specify points a and b on either side of the square D and join them with a straight line. We then let A a b , 1 ( ) be the region of D below this line and ) be the region of D beneath the graph of the curve H a b , This setup includes the previous example: ) defines the appropriate straight lines. The continuity of A 1 and A 2 can be seen by noting that . For example, one may take H to be given by so that the parameter c determines the (signed) magnitude of the fault. Defining the sets A a b c , , 1 ( ) and A a b c , , 2 ( ) as the regions of D beneath and above the curve H a b c , , ( ) respectively, the continuity can be seen in a similar manner to the previous example. Example sets A i (a, b, c) for various parameters a, b, c are shown in figure 3.
] , but with a much larger parameter space, one could also select points at specific x-coordinates and linearly interpolate between them.
1, to be the linear interpolation of the points Example sets A i (a) for various parameters a are shown in figure 5.
In order to see the continuity of these maps, we first partition the domain into strips D j , It follows from properties of the symmetric difference that It hence suffices to show that the maps are continuous for all i j , . This follows from the same argument as in example 2.2, for sufficiently smalla b | |.

The Darcy model for groundwater flow
We consider the Darcy model for groundwater flow on a domain ( ) denote the permeability tensor of the medium, p the pressure of the water, and assume the viscosity of the water is constant. Darcy's law [8] tells us that the velocity is proportional to the gradient of the pressure: Additionally, a local form of mass conservation tells us that  = v f. · Combining these two equations, and imposing Dirichlet boundary conditions for simplicity, results in the PDE

· ( )
This is the PDE we will consider in the forward model, and it gives rise to a solution map k  p.
For simplicity we will work in the case where κ is an isotropic (scalar) permeability, bounded above and below by positive constants, and so it can be represented as the image of some bounded function under a positive continuously differentiable map   s  + : . Let = V H D 1 ( ), the Sobolev space of once weakly differentiable functions on D [13].
to be the solution of the weak form of the PDE We are first interested in the regularity of the map ´L  X V : · ( ( ) ) · ( ( ) ) ( ) The following lemma tells us that the map  is well defined and has certain regularity properties. Its proof is given in the appendix.
is well-defined and satisfies: is locally Lipschitz continuous, i.e. for every > r 0 there exists is continuous.
We now choose a continuous linear observation operator , .
is continuous. Proof.
(i) The map ℓ is defined to be a continuous linear functional, and so in particular is Lipschitz. Since we have   = ℓ • the result follows from lemma 2.5(ii).
(ii) This follows from the continuity of ℓ and lemma 2.5(iii). ,

The complete electrode model (CEM) for EIT
EIT is an imaging technique that aims to make inference about the internal conductivity of a body from surface voltage measurements. Electrodes are attached to the surface of the body, current is injected, and the resulting voltages on the electrodes are measured. Applications include both medical imaging, where the aim is to non-invasively detect internal abnormalities within a human patient, and subsurface imaging, where material properties of the subsurface are differentiated via their conductivities. Early references include [15] in the context of medical imaging and [20] in the context of subsurface imaging. The CEM is proposed for the forward model in [32], and shown to agree with experimental data up to measurement precision. In its strong form, the PDE reads ⎧ ( ) . Figure 6 shows an example domain and attached electrodes. A current I l is injected into each electrode e l , and a voltage measurement V l made. Here κ represents the conductivity of the body, and v the potential within it. Note that the solution comprises both a function Î ( ) is known to be Fréchet differentiable when we equip the conductivity space with the supremum norm [17].
We can apply different current stimulation patterns to the electrodes to yield additional information. Assume that we have M different (linearly independent) current stimulation patterns is positive and continuously differentiable. Our forward operator   L  X : J , is then given by the composition We show in the appendix that the map defined in this way has the same regularity as the map corresponding to the Darcy model.
is continuous.

Onsager-Machlup functionals and prior modelling
In this section we recall the definition of an Onsager-Machlup functional for a measure which is equivalent 2 to a Gaussian measure. We then introduce the prior measures that we will consider, first on the function space X, then the geometric parameter space Λ, and finally the product space´L X . We conclude the section by extending the definition of Onsager-Machlup functional so that it is appropriate for the measures we consider here, supported on fields and geometric parameters which are combined to make piecewise continuous functions.

Onsager-Machlup functionals
The Onsager-Machlup functional of a measure is the negative logarithm of its Lebesgue density when such a density exists, and otherwise can be thought of analogously. We start by defining it precisely for measures defined via density with respect to a Gaussian, allowing for infinite dimensional spaces on which Lebesgue measure is not defined. Suppose that μ is a measure equivalent to a Gaussian measure m 0 . Then the Onsager-Machlup functional for μ is defined as follows. (i) The Onsager-Machlup functional is only defined up to addition of a constant.
(ii) If Z is finite dimensional and μ admits a positive Lebesgue density ρ, then In light of the previous remark, this is true even if ρ is not normalised.
Then by the previous remark, the Onsager-Machlup functional for μ is given by

Inverse Problems 32 (2016) 105003 M M Dunlop and A M Stuart
(iv) The preceding example (iii) may be extended to an infinite dimensional setting. Let Z be a separable Banach space, and let Then theorem 3.2 in [9] tells us that the Onsager-Machlup functional for μ is given by In this paper, the posterior distribution will be a measure on the product space =´L Z X . The prior distribution will be an independent product of a Gaussian on X and a compactly supported measure on Λ. Due to the assumption of compact support, the prior will not be equivalent to a Gaussian measure on Z and so the above definition does not apply; we provide a suitable extension to the definition in section 3.4.
As we are taking a Bayesian approach to the inverse problem, we incorporate our prior beliefs about the permeability/conductivity into the model via probability measures on X and Λ. We will combine these into a prior measure on the product space´L X . We equip this space with any (complete) norm   ,

Priors for the fields
We wish to put priors on the fields ) . If E i denotes the Cameron-Martin space [10] of m i 0 , then that of m 0 is given by with inner product given by the sum of those of its component spaces.
The Onsager-Machlup functional of m 0 is known to be given by This can be seen, for example, as a consequence of proposition 18.3 in [26]. correlations between the fields and the geometric parameters under the prior is a more technical issue however, and so we will assume that these are independent.
Example 3.4. Define the negative Laplacian with Neumann boundary conditions as follows: . Then each  i is traceclass and positive definite, and samples from each m i 0 will be almost surely continuous and so m 0 can be considered as a Gaussian measure on X. Moreover, regularity of the samples will increase as a i increases, see [10] for details.

Priors for the geometric parameters
We also want to put a prior measure on the geometric parameters, i.e. we want to choose a probability measure on Λ. Since  L Í k the analysis is more straightforward than the infinite dimensional case. Let ν be a probability measure on Λ with compact support Í L S . We assume ν is absolutely continuous with respect to the Lebesgue measure and that its density ρ is continuous on S. Despite being defined on a finite dimensional space, the measure ν is not necessarily equivalent to the Lebesgue measure on the whole of  k and so the previous definition of Onsager-Machlup functional does not apply. We hence must formulate a new definition for this case.
Since r > 0 on S int( ), we can use the continuity of ρ to calculate the limits of ratios of small ball probabilities for ν on S int( ). Let If either a 1 or a 2 lie outside of S the limit can be seen to be 0 or ¥ respectively. It hence makes sense to define the Onsager-Machlup functional for ν on L ¶S ⧹ as which is well defined due to the continuity of ρ on S int( ). K is then continuous on the whole of S.

Inverse Problems 32 (2016) 105003 M M Dunlop and A M Stuart
Remark 3.5. If we were to define K on ¶S in the same way that we defined it on L ¶S ⧹ , K would have a positive jump at the boundary related to the geometry of S. This would mean that K was not lower semi-continuous on S which would cause problems when seeking minimisers. The definition we have chosen is appropriate: if any minimising sequence  Í a S int n n 1 ( ) ( ) of K has an accumulation point on ¶S, then ν has a mode at that point.
If we have no prior knowledge about the interfaces and Λ is compact, we could place a uniform prior on the whole of Λ. Otherwise we could either choose a prior with smaller support, or one that weights certain areas more than others.

Priors on X Â Λ
We assume that the priors on the fields and the geometric parameters are independent, so that we may take the product measure m ń 0 0 as our prior on´L X . Note that if ). This is much more cumbersome to deal with however, since for example It is for this reason we incorporate the mapping F into the forward map  . Assuming now that the prior m ń 0 0 is as described above, we can define the Onsager-Machlup functional for measures μ on´L X which are equivalent to m ń 0 0 .

Likelihood and posterior distribution
We return to the abstract setting mentioned in the introduction. Let X be a separable Banach space, . If (u, a) denotes the true input to our forward problem, we observe data Î y Y given by 3 Given a measurable map . If a random variable u on X has law μ, . We can use this to formally find the distribution of u a y , is given by Hence under suitable regularity conditions, Bayes' theorem tells us that the distribution μ of u a y , We now make this statement rigorous. To keep the situation general, we do not insist that Φ takes the form (4.1), and instead assert only that Φ satisfies the following assumptions.
monotonic non-decreasing separately in each argument, such that for each > r 0, Î ¢ u X and Î L¢ a , and Î y y Y , 1 2 with < y y r , 1 2 r u  a y  y  , ; , ; , , ;  r a y u  u  , ; , ; , , .
These assumptions are used in the proof of existence and well-posedness of the posterior distribution, which is given in the appendix: is positive and finite, and so the probability measure m y , ; .
In this paper we are focused on the case when the field prior m 0 is taken to be Gaussian. However, the above existence and well-posedness result still holds if, for example, m 0 is taken to be Besov rather than Gaussian, since a Fernique-type theorem holds for such priors [10,23].
We show that for both choices of test models, the potential (4.1) satisfies assumptions 4.1: denote the forward map corresponding to either the groundwater flow or EIT problem, as detailed in section 2. Let Î y Y and let Proof.
With a choice of prior as described in section 3, we can therefore apply theorem 4.2 in the cases where the forward map is one of the two described in section 2 and the observational noise is Gaussian. In this case, the constant   M r u a , , , where S is the (compact) support of n 0 .

MAP estimators
In section 5.1 we characterise the MAP estimators for the posterior μ in terms of the Onsager-Machlup functional for μ. In section 5.2 we relate this Onsager-Machlup functional to the Fomin derivative of μ, with reference to the work [14].

MAP estimators and the Onsager-Machlup functional
Throughout this section we assume that μ is given by (4.2). Furthermore we assume that m 0 has mean zero for simplicity. Additionally, when we assume that assumptions 4.1 hold, we will assume that ¢´L¢ =´L X X . Suppressing the dependence of Φ on the data y since it is not relevant in the sequel, we define the functional  L  I X : where J K , are as defined in sections 3.2 and 3.3 respectively. In this section we attain the following three results concerning I and μ, which are proved in the appendix.

The Fomin derivative approach
In recent work of Helin and Burger [14], the concept of MAP estimators was generalised to wMAP estimators using the notion of Fomin differentiability of measures. The definition of wMAP estimators is such that if û is a MAP estimator then it is a wMAP estimator, but not necessarily vice versa. Under certain assumptions, they show that wMAP estimators are equivalent to minimisers of a particular functional. The assumptions do not hold in our case, since our forward map is nonlinear and our prior m ń 0 0 is not necessarily convex, however the functional agrees with our objective functional I. Thus in what follows we provide a link between the Fomin derivative of the posterior μ and our objective functional I.
The Fomin derivative of a measure on a Banach space X equipped with its Borel σalgebra  X ( ) is defined as follows.
The Radon-Nikodym density of l d z with respect to λ is denoted b l z , and is called the logarithmic derivative of λ along z. (i) Let n 0 be a measure on  k with Lebesgue density ρ, supported and continuously differentiable on  Í S k . Then for any Î a S int( ) and This follows from the Cameron-Martin and dominated convergence theorems. (iii) Again using the Cameron-Martin and dominated convergence theorems, we see that with n 0 and m 0 as above, for any Îú a X S , i n t ( ) ( )and We can use the above example to characterise the Fomin derivative of our posterior distribution μ, given by (4.2).
is bounded measurable with uniformly bounded derivative, and assume that r is continuously differentiable on S. Then for each ) .
Proof. We use result 2.1.13 ( )from [3], which tells us that if λ is a measure differentiable along z and f is a bounded measurable function with uniformly bounded partial derivative ¶ f z , then the measure l f · is differentiable along h as well and We apply this result with l m ). Note that f satisfies the assumptions of (2.1.13) due to the assumptions on Φ. The result then follows using example 5.6 (iii) above. ,

Numerical experiments
In this section we perform some numerical experiments related to the theory above for a variety of geometric models, in the case of the groundwater flow forward map introduced in section 2.2. We both compute minimisers of the relevant Onsager-Machlup functional (i.e. MAP estimators), and we sample the posterior distribution using a state-of-the-art function space Metropolis-Hastings MCMC method. We then relate the samples to the MAP estimators. From these numerical experiments we observe the following behaviour of the posterior distribution.
(i) The posterior distribution can be highly multi-modal, especially when the parameterised geometry is non-trivial. This is evident from the sensitivity of the minimisation of the objective functional on its initial state, and the behaviour of MCMC chains initiallised at these calculated minimisers. (ii) When the geometry is incorrect the fields attempt to compensate, which presumably contributes to the existence of multiple local minimisers of the objective functional; this occurs in both the MAP estimation and the MCMC samples. A consequence is that many of the local minimisers lack the desired sharp interfaces. These minimisers could however be used to suggest more appropriate geometric parameters for the initialisation. (iii) The mixing rates of MCMC chains have a strong dependence upon which local minimiser they are initialised at: acceptance rates can vary wildly when the initial state is changed but all other parameters are kept fixed. This provides some insight into the shape of the posterior distribution. (iv) Though often there are many local minimisers, they can be separated into classes of minimisers sharing similar characteristics, such as close geometry. MCMC chains typically tend to stay within these classes, which can be observed by monitoring the closest local minimiser to an MCMC chain's state at each step. This suggests that the posterior can possess several clusters of nearby modes.
One conclusion we can draw from the above points is that there are often many different geometries that are consistent with the data. This is not necessarily an effect of noise on the measurements, and the effect may persist as the noise level goes to zero, since it is unknown if these geometric parameters are uniquely identifiable in general.

Test models
We consider three different geometric models: a two parameter, two layer model; a five parameter, three layer model with fault; and a five parameter channelised model.
In what follows, as in example 3.4, we define the negative Laplacian with Neumann boundary conditions: 6.1.3. Model 3 (channel). This model is described in [16], where it is labelled test model 2.
The geometric parameters = a a a a a a , , , ( )are defined as in figure 9. Here a a a a a , , , For each model, we fix a true permeability u a , ( ) † † as a draw from the corresponding prior distribution, generated on a mesh of 256 2 points. For the forward model, we take the coefficient map s = exp (·) (·). We observe the pressure on a grid

MAP estimation
Based on the theory in section 5, we can calculate MAP estimators by minimising the Onsager-Machlup functional for the posterior distribution. We compute local minimisers of the Onsager-Machlup functional using the following iterative alternating method.
(2) Update the geometric parameters simultaneously using the Nelder-Mead algorithm. The Nelder-Mead and Gauss-Newton algorithms are discussed in [30], in sections 9.5 and 10.3 respectively. Since we do not update the fields and geometric parameters simultaneously, it is possible that this algorithm will get caught in a saddle point: consider for example the function , at the point 0, 0 ( ), being minimised alternately in the coordinate directions. Hence once the algorithm stalls, we propose a large number of random simultaneous updates in an attempt to find a lower functional value. If this is successful, we return to step (2) of the algorithm. We terminate the algorithm once the difference between successive values of Φ is below TOL = 10 −5 . Calculations are performed on a mesh of 64 2 points in order to avoid an inverse crime.
To ensure that we explore the support of the posterior distribution, we choose a variety of initial states Î´L u a X , Generally the geometric parameters are closely recovered regardless of the initialisation state, though there is more variation in the fields. In the simulations where the geometry is inaccurate, for example simulations 7, 17 and 46, the fields can be seen to be compensating by forming a 'soft' interface where the true interface is.
The minimisers associated with model 2 admit much more variation, though it is possible to partition them into smaller subsets of minimisers which share similar characteristics to one another, as mentioned in point (iv) at the beginning of the section. The clustering of the different minimisers is performed by eye, classifying them according to similar geometric parameters. Additionally we have an other class, containing the minimisers which do not appear similar to one another nor appear to fit into any other class. We see later with MCMC simulations that these states do still act as local maximisers of the posterior probability.
The minimisers of the Onsager-Machlup functional for model 3 show even more variation than those for model 2, with the geometry in half of the minimisers not even being close to the true geometry. In the cases where the geometry is drastically wrong the fields have again attempted to compensate. This behaviour is particularly evident in the other class, and is echoed in the MCMC simulations later. The other class here is much larger than for model 2, though as with model 2 these states do appear to act as distinct local maximisers of the posterior probability. This multi-modality of the posterior distribution is not unexpected. The paper [5] considers the history matching problem in reservoir simulation, in which inference is done jointly on both geometric and permeability parameters in the IC fault model. Though the forward In [5] it is observed that the global minimum often does not correspond to the truth, especially in the presence of measurement noise, and so all local minimisers of the Onsager-Machlup functional should be sought before drawing conclusions about the permeabilities-this appears to be the case in our model as well. We note that MCMC can be

MCMC and local minimisers
We now observe the behaviour of MCMC chains initialised at these local minimisers of the Onsager-Machlup functional. We use a Metropolis-within-Gibbs algorithm for the sampling, alternating between preconditioned Crank-Nicolson updates for the fields, see [6] for details, and random walk metropolis updates for the geometric parameters. Again, simulations are performed on a mesh of 64 2 points in order to avoid an inverse crime. 10 5 samples are taken for each chain, with the initial2 10 4 discarded as burn-in. The conditional means calculated from the samples are shown in figures 13-15.
We monitor the value that Φ takes along the chain u a , ) . Note that it makes no sense to monitor the value that the objective functional I takes along the chain as the fields almost surely do not lie in the corresponding Cameron-Martin spaces, and so I is almost surely infinite along the chain in the continuum setting.
In addition, we monitor which minimiser the chain is nearest at each step, in the permeability space. Specifically, we look at ( ) is the construction map (2.1) from the state space to the permeability space. We make the choice of the L 2 norm over the ¥ L norm for the permeability space to avoid over-penalising incorrect geometry. A selection of traces of m n are shown in figures 19-21. These illustrate that even though some of the local minimisers are very far from from the true log-permeability, they do indeed act as local maximisers of the posterior probability. Moreover, they show the interaction between the different classes of minimisers in the cases of models 2 and 3. Specifically, they show that the MCMC chains can easily move within these classes, but moving between classes is more difficult.
We now discuss the above monitored quantities, and their relation to MAP estimators, on a model-by-model basis. Despite the slight variation in the fields of the minimisers from model 1, the conditional means arising from the MCMC are nearly all identical. Simulation 23 stands out from the rest due to its slightly incorrect geometry-this effect can be seen in the trace plot of Φ, figure 16, where the value of Φ remains larger than the simulations started elsewhere. The traces of Φ for all other initialisations behave similarly to one another, taking similar misfit values after2 10 4 samples. From figure 19, it can be seen that the MCMC chains considered all spend a lot of time close to MAP estimator 38, despite this not being the estimator with the lowest functional value.
For model 2, typically the conditional means within the different classes are very similar to one another. Classes A and C resemble each other, and class B has compensated for incorrect geometry with the centre field. Faults have developed in class D, though there is still some compensation in the field. The centre field and a small fault has appeared in class E, but again the fields are compensating. The geometric parameters for the permeabilities in the other class remain relatively unchanged, but the fields have more freedom to attain a lower misfit than in the Onsager-Machlup functional minimisation due to the lack of regularisation term. Figure 17 shows evidence for a number of local minima with a large data misfit value Φ, with some chains appearing to remain stuck in their vicinity. The four chains visible in In the channelised model, model 3, there is yet more variation between local minimisers. Here the compensation effect by the fields is even more apparent in the conditional means, especially in the other class. From figure 18 it appears that the local minima are much sharper and more sparsely distributed than the previous two models. Again the chains with the largest Φ values were initialised at minimisers in the other class, suggesting the existence of many posterior modes with incorrect geometry. The mixing of the MCMC chains varies heavily based on the initialisation points of the chains: with the same jump parameters for the field and geometric parameter proposals, acceptance rates vary largely based on which minimiser the chain was started from. This indicates that some of the minima are much sharper than others. This is also evident from the

Conclusions and future work
We have made a new contribution to the recently developed theory of MAP estimation in infinite dimensions [9,14]. We link MAP estimation to a variational Onsager-Machlup functional. The work is focused on priors for piecewise Gaussian random fields, with random interfaces parameterised finite-dimensionally. Such fields arise naturally in applications such as groundwater flow and EIT, and these are used to illustrate the theory and numerics. The The majority of the simultions find a small value of Φ almost immediately, but numerous fail to reach there, settling in local minima. The shape of these minima can be seen in figure 14, and generally correspond to those in the same class as the initial state. The majority of the simultions find a small value of Φ almost immediately, but numerous fail to reach there, settling in local minima. The shape of these minima can be seen in figure 15, and generally correspond to those in the same class as the initial state.    work opens up several new avenues for investigation. A major theoretical direction is to fully reconcile the approaches in [9,14]; the work in this paper suggests that this may be possible. On the applications side an important new direction would be to consider problems in which the geometric parameters are no longer independent from the fields a priori. A possible extension could be to treat the geometric parameters as hyperparameters for the fields under the prior. This would allow, for example, the fields to have specific boundary conditions at the interfaces, which may be more physically appropriate in some contexts. A related hierarchical model was considered in [29], in which prior samples were piecewise white; this could be extended to allow for spatial correlations in the continuum setting. Computationally an exciting direction is to build upon definitions of MAP estimators to develop hybrid algorithms which fully exploit local minimiser structure of the Onsager-Machlup functional within MCMC.
In this appendix we provide proofs of the results given in the paper.

A.1. Results from section 2
Before we prove lemma 2.5 we require the following lemma.
Now suppose that   f n L 1 does not tend to zero. Then there exists d > 0 and a subsequence for all  k 1. This subsequence still converges to zero in measure, and so admits a further subsequence that converges to zero almost surely. We can bound this subsequence above uniformly by f, and so an application of the dominated convergence theorem leads to a contradiction. The result follows. , and so we have the estimate , , Using that the A i are disjoint gives that Finally we deal with the * k -j terms: . Now as before we can bound k -1 † : , essinf e essinf e min min min .
Putting the above bounds together, we have The right hand goes to zero as each and so the continuity of  u, ( ·) follows from the assumed continuity of the maps A i . , Proof of proposition 2.7.
(i) Theorem 2.3 in [17] tells us that the mapping from the conductivity to the weak solution of (2.5) is Fréchet differentiable with respect to the supremum norm, and hence locally Lipschitz. Note that the mapping from the solution to the boundary voltage measurements, , is smooth, and the assumptions on σ imply that it is Lipschitz. It hence suffices to show that the mapping where the supremum is finite due the continuity of M 3 in its second component. Since Φ is also continuous in its second component, we see that the right-hand side tends to zero as  u a u a , , is continuous and m ń . This is a Caratheodory function, and it is known that these are jointly measurable, see for example [1].
Using the continuity of Φ and M 3 in a, we can maximise the right-hand side over < a r | | to deduce that  F u a y K r y , ; , . | ( )| ( ) Thus F y , ; (· · ) is bounded on bounded sets. Now we can proceed as in [10]. Using that m ń The first half of the proof is almost identical to that in [26], though some care must be taken since we cannot (a priori) separate the integrals over balls in´L X into products of those over balls in X and Λ. Using the Cameron-Martin theorem we see that where we have used the linearity of z c to extract δ from the supremum. As in [26], using a result from [34], a special case of the Gaussian correlation conjecture, it follows that for any  Î C and any convex set We now deal with the geometric parameters. Let * Î a S int( ) so that ρ is positive in a neighbourhood of a * (we may take * = a a 1 or a 2 since we assume they lie in S int( )). Then We conclude that m n m n The result now follows by the previous lemma. , Using assumptions 4.1(iv), we have that for any  This will be useful later for separating integrals.
Proof of theorem 5.2. We follow the idea of the proof of theorem 5.4 in [33], which is based on [7,19], and first show = F + + I J K is weakly lower semicontinuous on E×S. Let  u a u a , , n n ( ) (¯¯) in E×S. Since  Í S k , weak convergence of the second component is equivalent to strong convergence. Since m = X 1 0 ( ) , E is compactly embedded in X and so  u u n¯s trongly in X. In the proof of existence of the posterior distribution we showed that Φ is continuous on´L X , and so we deduce that F F u a u a , , n n ( ) ( ). Hence Φ is weakly continuous on E×S. The functional J is weakly lower semicontinuous on E and K is continuous on S, and so I is weakly lower semicontinuous on E×S.
Now we show I is coercive on E×S. Since E is compactly embedded in X there exists a > C 0 such that      u C u . (¯¯)S ince δ is arbitrary the first result follows. Now consider the subsequence  u a u a , , n n ( ) (¯¯). The convergence of  a a n¯i s strong, so all that needs to be checked is that  u u n¯s trongly in X. This follows from exactly the same argument as in the proof of theorem 5.4 in [33] (taking ā as the second parameter in I and Φ) and so the second result follows. , Before proving theorem 5.3 we first collect some results on centred Gaussian measures from [9], specifically lemmas 3.6, 3.7, and 3.9. For Î u X, let (i) We first show d d u a , ( ) is bounded in´L X . The boundedness of the second component is clear since S is bounded, so it suffices to show that d u ( ) is bounded in X. This is proved in the same way as in theorem 3.5 in [9]. In the proof of existence of the posterior measure, theorem 4.   ) and the bounds on Φ we have for δ small enough and some 5 α close to 1,