Surrogate Accelerated Bayesian Inversion for the Determination of the Thermal Diffusivity of a Material

Determination of the thermal properties of a material is an important task in many scientific and engineering applications. How a material behaves when subjected to high or fluctuating temperatures can be critical to the safety and longevity of a system's essential components. The laser flash experiment is a well-established technique for indirectly measuring the thermal diffusivity, and hence the thermal conductivity, of a material. In previous works, optimization schemes have been used to find estimates of the thermal conductivity and other quantities of interest which best fit a given model to experimental data. Adopting a Bayesian approach allows for prior beliefs about uncertain model inputs to be conditioned on experimental data to determine a posterior distribution, but probing this distribution using sampling techniques such as Markov chain Monte Carlo methods can be incredibly computationally intensive. This difficulty is especially true for forward models consisting of time-dependent partial differential equations. We pose the problem of determining the thermal conductivity of a material via the laser flash experiment as a Bayesian inverse problem in which the laser intensity is also treated as uncertain. We introduce a parametric surrogate model that takes the form of a stochastic Galerkin finite element approximation, also known as a generalized polynomial chaos expansion, and show how it can be used to sample efficiently from the approximate posterior distribution. This approach gives access not only to the sought-after estimate of the thermal conductivity but also important information about its relationship to the laser intensity, and information for uncertainty quantification. We also investigate the effects of the spatial profile of the laser on the estimated posterior distribution for the thermal conductivity.


Introduction
Many measurements are made indirectly: quantities of interest (QoIs) are estimated by measuring one or more other quantities and calculating the required values from a model that links the QoIs to the measured quantities. Examples include scatterometry, where surface profile is inferred from intensity measurements, determination of Young's modulus, where the measured quantities are force and displacement, and determination of thermal diffusivity from measurements of temperature and time.
Thermal diffusivity is defined as where λ is thermal conductivity, is density, and c p is specific heat capacity. Density and specific heat capacity can be measured independently, so measurement of thermal diffusivity is often used to evaluate thermal conductivity. Thermal conductivity is a key property in understanding heat transport in solids. Accurate characterisation of thermal conductivity supports design of thermal behaviour in products ranging from protective coatings for turbine blades in high temperature gas streams to insulation for low-temperature carriers for transport of vaccines. Reliable uncertainty evaluation of thermal conductivity enables designers to have confidence that their products will meet the required specification under all circumstances. In some cases the models used in indirect measurements are simple and can be inverted to express QoIs explicitly in terms of the measured quantities, but in many cases the models are not invertible (e.g. due to observational noise), so the problem of determining the QoIs is more challenging and we have to solve an inverse problem.
Inverse problems present difficulties in applying the approach to uncertainty evaluation outlined in the GUM [5] and its supplements. The difficulties become more severe when the model used to link the measured quantities and the QoIs is computationally expensive, making sampling methods too time-consuming to use. One remedy is to replace the expensive model with a cheaper one that is sufficiently accurate for the task at hand, called a surrogate model [30,Chapter 5] (also known as an emulator or metamodel). This paper reports on the application of an approach combining a parametric surrogate model with a Bayesian approach to uncertainty evaluation for the determination of thermal conductivity from the laser flash experiment. This approach has made a computationally expensive problem tractable.
In the rest of this section we describe the laser flash experiment (see Figure  1), present the mathematical model for the temperature of the material being tested and briefly discuss previous work by other authors. We also outline our approach, linking it to existing literature and methods from other areas. In Section 2, we pose the problem of determining the thermal conductivity and the laser intensity as a Bayesian inverse problem. We then describe how a Markov chain Monte Carlo (MCMC) algorithm that incorporates a surrogate forward model, here based on a stochastic Galerkin finite element method (SGFEM), can be used to efficiently probe the resulting approximate target distribution. In Section 3, we apply this methodology to a real life example where the sample material is a copper cylinder. We discuss the construction of the SGFEM surrogate, timings, the estimated joint and marginal distributions for the thermal conductivity and laser intensity, and examine how these results can be interpreted with regards to the experimental data. Finally, in Section 4 we draw conclusions and briefly describe how this work may be extended to more complex problems.
We note that the laser flash experiment was also considered in a Bayesian setting by Allard et al in [2]. Hence, since the set up of the physical experiment is similar, our discussion in Sections 1.1 and 1.2 below follows [2, Section 1].

The laser flash experiment
The thermal conductivity of a material is not a directly measurable quantity; its value is indirectly observed through the change in temperature of the material being characterised. The laser flash experiment [28] is a commonly used method for making such observations. In the experiment, the sample is placed in a furnace in a low-pressure inert gas atmosphere, supported by pins to minimise conductive heat losses, and heated to ambient temperature T a . The lower face is then subjected to a short laser burst of duration t f seconds, heating the sample on that face. The average temperature of the opposite face is measured by an infra-red sensor through a small window in the furnace. The experimental set up is shown in Figure 1.

Mathematical model
We assume that the sample is a cylinder C of radius R and vertical height H and work in cylindrical coordinates r = (r, θ, z). Furthermore, we assume the material is isotropic and homogeneous, with unknown scalar thermal conductivity λ and known scalar density and specific heat capacity c p . Let z f denote the depth of penetration of the laser into the sample, which lasts for t f seconds. Then the temperature of the material during the laser flash experiment is modelled by the solution u to the transient heat equation where the source term represents the laser flash with unknown intensity I. Here, the function χ : [0, R] → R is either constant, to represent an ideal laser of uniform profile, or of the form χ(r) = exp(−r 2 /2r 2 f ) to give a Gaussian profile in the radial direction as is the typical profile for lasers.
As stated in Section 1.1, the material is heated to the ambient temperature, yielding the initial condition We assume that the height H is sufficiently small that heat losses across the curved surface are negligible. Furthermore, we assume that heat is lost at the top and bottom faces at a rate proportional to the difference in temperature between the sample and the furnace. This assumption is valid for convection and radiation at small temperature differences, as are generated experimentally. Hence, we have the boundary conditions We also impose the condition These conditions, together with the assumptions on the source term and material properties of the sample, ensure that the problem is axisymmetric. That is, u is symmetric about the z-axis and does not depend on θ.

Previous approaches
Many previous approaches to determining the unknowns λ and I have used optimization (e.g., see [35,36]). In such works, values which are by some measure optimal are found by matching model output to experimental data. There are a number of drawbacks to such approaches. Firstly, the optimal values that are returned do not have associated uncertainties. The standard GUM [5] approach to calculating the uncertainty is unsuitable because the problem is an inverse one, and the problem is too computationally expensive for standard Monte Carlo simulation. Secondly, well-posedness of the problem in terms of uniqueness of the solution can be an issue, particularly in cases where there are many unknowns and there is little data to infer from. A Bayesian formulation of the problem [11,34] allows both of these issues to be resolved. Some previous work [2] has been done to treat the laser flash experiment in a Bayesian manner. However, the posterior distribution for λ and I that is provided (up to a constant of proportionality) by the Bayesian approach must be evaluated for one to compute quantities of interest such as probabilities, expectations and variances. Here, each evaluation of the posterior requires an evaluation of the solution u of (2)- (7). Since an exact solution is unavailable, u must be approximated using a finite element method combined with a timestepping scheme, a finite difference scheme or similar approach. We may exploit axisymmetry to reduce the spatial domain to two dimensions. However, depending on the choice of spatio-temporal discretization, and its level of fidelity, a single solve may still take tens of seconds, or minutes to perform on a standard desktop computer. If computational resources are limited, then it is infeasible to perform a large number of forward solves.
Markov chain Monte Carlo (MCMC) methods [7] are popular algorithms for sampling from a distribution that is known only up to a constant of proportionality. However, it is well documented that they suffer from a slow rate of convergence [8], with the sampling error behaving as O(M −1/2 ), where M is the number of samples used. The large number of samples required to estimate quantities of interest accurately, coupled with the expense of each posterior evaluation means that MCMC methods are often infeasible. This is especially true when the forward problem consists of a time-dependent partial differential equation (PDE).
We will use a pre-computed parametric surrogate for the forward solution u that can be cheaply evaluated for any choice of λ and I. This takes the form of a polynomial expansion and allows for fast sampling from the resulting approximate posterior distribution within an MCMC routine. As well as reducing the time required to solve the inverse problem by several orders of magnitude, we find that with the proper choice of polynomial degree, there is no significant loss of accuracy compared to the standard approach using a fixed spatio-temporal discretization.
The idea of combining surrogate models with MCMC methods to accelerate the solution of Bayesian inverse problems is becoming popular in groundwater flow modelling [12], electrical impedance tomography [15,18,23] and other applications [10,25,26].

Method
In this section we discuss a parametric form of the forward problem (2)-(7) and describe how its solution may be approximated using a stochastic Galerkin finite element method (SGFEM). We also formally introduce the Bayesian inverse problem of determining λ and I from experimental data. Finally, we explain how to combine the SGFEM surrogate with an MCMC method to approximate posterior distributions.

Parametric forward problem and SGFEM
We consider the laser intensity I as well as the thermal conductivity λ to be unknown. This is because the exact strength of the laser is not known and the effects of the laser depend on the heat absorption characteristics of the sample. Note that we could treat other model inputs (such as density or ambient temperature T a ) as unknown and propagate all measurement uncertainties through the model but choose not to here for clarity.
To construct the surrogate, we start by modelling λ and I as real-valued random variables. Specifically, we assume that λ and I may be expressed in terms of independent uniform random variables with mean zero and unit variance. That is, λ(ξ 1 ) = µ λ + ν λ ξ 1 , for some µ λ , µ I , ν λ , ν I ∈ R + (to be chosen) with For heterogeneous or layered materials, λ may be more accurately represented as a spatial random field [1]. If the material consists of layers, each with a distinct unknown value of λ, then a random field could be constructed using a linear combination of spatial indicator functions with uncertain coefficients (as used for example in [18]). Alternatively, if the mean and covariance function of the random field are known, a truncated Karhunen-Loève expansion [24] could be used. Although we do not consider λ and I to be spatially varying here, the approach outlined below can be used whenever λ and I are linear functions of a finite set of independent random variables with appropriate distributions.
To set up the parametric forward problem, we make a change of variable. Let y 1 and y 2 be the images of ξ 1 and ξ 2 , respectively. Then, y := (y 1 , is the parameter domain. Assuming that λ, I are written as in (8), we may rewrite (2)-(7) in the equivalent parametric form where We have chosen uniform random variables to ensure that weak formulations of the parametric model are well-posed. We must also choose the values of µ λ and ν λ in (8) such that realisations of λ are positive. Note also that it is physically meaningless to model either λ or I as a random variable with non-zero probability of taking a negative value (e.g. Gaussian).
The solution u to (10)- (14) is a function of the parameters y 1 and y 2 , as well as r and t. We will approximate it using a spectral stochastic finite element method [3,13] based on Galerkin approximation. Specifically, we combine finite element approximation [6] on the physical domain with global polynomial approximation on the parameter domain. For parabolic PDEs, the suitability of this approach has been analyzed in [27].
If we exploit axisymmetry, then the SGFEM approximation takes the form where φ i , i = 1, 2, . . . , n h , are standard piecewise linear finite element functions associated with a mesh of triangular elements of width h on the spatial domain The functions Ψ j , j = 1, 2, . . . , n k , are chosen to form a basis for the set of polynomials in y 1 and y 2 on Γ of total degree less than or equal to k. Hence, In particular, we use Legendre basis polynomials which are orthonormal with respect to the inner product where ρ(y) = (2 √ 3) −2 is the joint probability density function of ξ 1 and ξ 2 . We may then also write (15) as which is known as a polynomial chaos expansion [37]. To find the functions u ij (t) in (15) we solve the associated Galerkin equations [4] which leads to a coupled system of ODEs. For this, we use an implicit Euler scheme with uniform step size τ := T /n t . This yields a sequence of discrete approximations at the time steps τ n = nτ, n = 0, 1, . . . , n t , and requires the solution of n t linear systems of dimension n h n k . We will use the notation u hkτ to denote a continuous function of time which interpolates u n hkτ at each τ n . Once the coefficients u n ij in (20) have been computed, u n hkτ may be evaluated at any y ∈ Γ of interest by evaluating the polynomials Ψ j .

Bayesian inverse problem formulation
Suppose we wish to recover the unknown thermal conductivity λ given the (thermogram) results from a laser flash experiment with uncertain laser intensity. We also need to recover the laser intensity I since it directly affects the observed thermogram. Specifically, we study a cylindrical sample of pure copper. Let t 1 , t 2 , . . . , t n d denote the measurement times and D L denote a disc of radius L centrally positioned on the top face of the sample over which an average temperature is measured at each t n . Our data d ∈ R n d is the vector of observed averaged temperature values. The data used here was measured at NPL in 2004 on a Netzsch LFA427 using a sample of copper at ambient temperature T a = 385 K.
The Bayesian approach to this inverse problem is to condition our prior knowledge regarding λ and I on the measurements d. We may obtain this knowledge from historical information, expert elicitation or even previous Bayesian analysis using another data set. We assume that our measurements are subject to independent, identically distributed (iid), mean zero Gaussian noise, the variance σ 2 of which is also unknown. In this section, we choose a prior distribution, formulate the Bayesian inverse problem and construct the posterior distribution for the unknowns.

Prior distribution.
First, we assign a prior distribution to λ, which describes our belief about its value before incorporating the data d. Two previous studies of samples involving copper deduced the distributions N (379, 3.79 2 ) and N (278, 13 2 ). In the first case, the value of 379 W m −1 K −1 was obtained by directly fitting a model to data measured on a uniform sample of copper, and the estimated standard deviation of 3.79 W m −1 K −1 was based on the typical uncertainty for the measurement. In the second case, the value of 278 W m −1 K −1 was obtained from analysis of a layered sample that was half braze and half copper, and a Monte Carlo analysis was used to propagate several other measurement uncertainties  through a model to give the standard deviation of 13 W m −1 K −1 . The second estimate is considered to be less reliable because it is affected by assumptions made about the properties of the braze.
As high probability confidence intervals from the two stated Gaussian distributions do not overlap (see Figure 2a), we have conflicting beliefs about the value of the thermal conductivity. This leads us to be less certain about the value of the thermal conductivity than either of the previous studies suggests in isolation and makes construction of a prior both challenging and subjective. To combine the two results into a prior distribution, in a way which reflects this uncertainty, we could consider a Gaussian distribution with (averaged) mean µ λ := (379 + 278)/2 = 328.5 W m −1 K −1 and large standard deviation σ λ := 50 W m −1 K −1 . However, to ensure positivity, we will use as our prior and perform inference on where m λ , s λ are chosen so that E[λ] = µ λ and sd[λ] = σ λ under the prior. That is, our prior density on θ 1 is The effects of the choice of prior are discussed later in this section.
As we have no information about the value of the laser intensity I other than that it is positive, we choose the improper prior distribution U(0, ∞) with "density" π I 0 (I) ∝ 1 [0,∞) (I). For ease of sampling, we work with θ 2 := ln(I) and our prior "density" on θ 2 is therefore given by This is an example of an uninformative prior. Another common choice is the Jeffreys' prior [19], which is based on the Fischer information matrix [2,22]. We choose an inverse gamma prior distribution for σ 2 to ensure positivity and so that we can exploit it's conjugacy with the Gaussian distribution. That is, our prior density on σ 2 is given by where, here, Γ(·) is the gamma function. The hyper parameters α σ = 3 and β σ = 0.0079 are chosen so that all reasonable values of σ 2 have relatively high probability with respect to the prior distribution. Assuming (a priori) independence of λ, I and σ 2 (and therefore of θ 1 , θ 2 and σ 2 ), letting θ := (θ 1 , θ 2 ) we define our prior density on (θ, σ 2 ) to be the product of the three densities (23), (24) and (25). That is, π 0 (θ, σ 2 ) = π λ 0 (θ 1 )π I 0 (θ 2 )π σ 0 (σ 2 ).
We note that choosing a prior may be viewed as a form of regularization [20], allowing us to assign greater weight to values we deem more likely and less weight to those we believe less likely. Numerical investigations showed that for our problem the data d is sufficiently informative about the unknowns that the choice of prior is not important. That is, changing the prior distribution (including the hyper parameters) had little effect on the estimated posterior. These studies are not presented in this paper. However, experiments involving different choices of prior for similar inverse problems are given in [2,16].

Data, likelihood, posterior and target densities.
We assume that our thermogram data d are indirect observations of θ, subject to additive noise. That is, where G : R 2 → R n d is the observation operator, mapping values of θ into data and η is the noise vector. A diagonal covariance matrix Σ := σ 2 I n d is chosen as we assume individual measurements are subject to iid N (0, σ 2 ) noise. This yields a likelihood function L : R n d × R 2 × R + → R (describing the probability of observing d for a specific choice of θ and σ 2 ) given by where is the negative log-likelihood or so-called potential. The observation operator G is given explicitly by is the average temperature over the disc D L on the top face of the sample at time t and u(r, t; θ) denotes the solution to the deterministic forward problem (2-7) for a fixed choice of θ where (λ, I) = exp(θ) := (exp(θ 1 ), exp(θ 2 )). Note that (28) is simply the probability density function of a multivariate Gaussian distribution. Now, from Bayes' Theorem [21], the posterior density π d : R 2 × R + × R n d → R + (describing the probability of obtaining values for (θ, σ 2 ) given the data d) satisfies If we integrate out the dependence on σ 2 , we are left with our target density for θ given d. Exploiting the conjugacy of the inverse gamma and Gaussian distributions, we find that is the density function of a multivariate t-distribution with 2α σ degrees of freedom, mean vector G(θ) and shape matrix (β σ /α σ )I n d .
Determining the normalization constant for π gives the target density explicitly. Determination of any QoI involving λ and I (expectations, variances, probabilities etc.), requires the computation of some integral involving the density π. This task is non-trivial. One approach, which is particularly amenable to situations with a small number of unknowns and where the target density is a smooth function of the unknowns is Smolyak sparse grid quadrature [33]. An alternative approach, which we take here, is Markov chain Monte Carlo (MCMC) sampling [7]. MCMC methods allow us to sample from a distribution whose density is known only up to a constant of proportionality. Our specific approach is described in Section 2.2.4.

Approximating the target density.
Since we are unable to solve (2)-(7) exactly, we are unable to evaluate u in (31) for a given value of θ. When using an MCMC method, the standard approach is to approximate the forward operator on a 'case-by-case' basis. That is, for each proposed θ, a new spatio-temporal approximation is computed. We will refer to this as plain MCMC, as in [17]. For time-dependent and nonlinear problems, each forward solve usually requires the solution of a sequence of linear systems. This exacerbates the problem, and the cost of repeated forward solves can rapidly exceed a user's computational budget.
We propose an alternative approach where a surrogate is used to replace the repeated forward solves for different values of θ. In particular, we use a pre-computed SGFEM approximation u hkτ to the parametric forward problem (10)- (14). As explained in Section 2.1, once computed, this can be evaluated for any choice of the parameters y 1 and y 2 (corresponding to proposed values of λ = exp(θ 1 ) and I = exp(θ 2 )) without the need for any further linear system solves. We will refer to this approach as SGFEM MCMC.
Notice that y ∈ Γ if, and only if, ζ(y) ∈ Ω λ × Ω I . For values of θ for which (λ, I) ∈ Ω λ × Ω I , we replace G in (34) with the approximate observation operator given by G hkτ (θ) := ū hkτ (t 1 , ζ −1 (e θ )), . . . ,ū hkτ (t n d , ζ −1 (e θ )) , whereū This induces an approximate target density [9,11,34] given by where L σ hkτ is given by (34) with G replaced by G hkτ . Note that we use a number of time steps n t which is a multiple of (n d − 1) so that the measurement times t n are a subset of the time steps τ n and we need not interpolate in time to computeū hkτ (t n , ζ −1 (e θ )).
We have no guarantee that the surrogate is accurate outside of Ω λ × Ω I . Hence, for values of θ for which (λ, I) / ∈ Ω λ × Ω I (equivalently y / ∈ Γ), we would instead need to approximate L σ (d | θ) using a standard deterministic spatiotemporal discretization (as in the plain MCMC approach). In that case, we denote the induced approximate target density (defined analogously to π hkτ in (40)) by π hτ .
To ensure that we can evaluate u hkτ for as many values of λ and I proposed by the selected MCMC method as possible, we need to choose µ λ , ν λ , µ I and ν I in (8) appropriately. That is, the distributions used for λ and I in the construction of the surrogate should sufficiently cover the supports of the prior distributions selected for the Bayesian inference. The uniform distribution selected for λ is shown in Figure 2b. Should the MCMC propose an appreciable number of samples outside of the region in which the surrogate can be used, leading to a large number of significantly more expensive likelihood approximations (as in the plain MCMC approach), then it may be necessary to compute a new surrogate approximation which is accurate on a larger region of the state space. This was not necessary in any of the examples that we present in Section 3.

Markov chain Monte Carlo.
As mentioned in Section 2.2.3, we use an MCMC algorithm to sample from the approximate target distribution, with density π hkτ . For clarity of exposition, we use the simple Random Walk Metropolis Hastings (RWMH) [7] algorithm, although more advanced MCMC algorithms may be used, for example MALA [31] or HMC [14].
The RWMH algorithm proposes values of θ from a proposal distribution q which is symmetric around the current state (usually a Gaussian), and then accepts/rejects them in the so-called Metropolis step in such a way that the resulting samples are the states of a Markov chain with stationary density equal to the target density. This means that after a sufficient number of samples have been produced, say n B , states can be considered draws from the target distribution. Further post-processing of the chain, such as thinning (where only a subset of the states are retained), can be implemented in order to reduce the correlation between samples. Pseudocode for the algorithm used is given in Algorithm 1.
if u < p then The SGFEM solve is performed once outside the Monte Carlo loop. There are n h n k equations to solve per time step and the cost of each of these solves is O(n h n k ), provided an optimal solver (see [4,29]) is available. This is an offline computation. Recall that n k is defined in (17). Since k denotes the chosen polynomial degree (typically k = 6 is sufficiently high in this work), then n k is orders of magnitude smaller than n h and n t . The evaluation of the approximate observation operator G hkτ is made once per iteration (online) and has only a small cost. Specifically, since the observed quantity is averaged in space, evaluatingū hkτ in (39) for all n z measurement times only requires a matrix-vector product with a pre-computed matrix of size n z × n k with entries depending on the coefficients u n ij . In each iteration, we simply need to compute a vector of length n k containing the Legendre polynomials evaluated at the point y ∈ Γ corresponding to the proposed θ. The total cost of the SGFEM MCMC approach consists of the offline cost (building the surrogate) and the online cost (evaluating the surrogate). This is summarized in Table 1.
In the plain MCMC approach, we can use the same finite element discretization and time-stepping method as in the SGFEM approach. At each iteration, however, we need to perform a forward solve to compute a new spatio-temporal approximation and evaluate it at the proposed value of θ. The cost of each solve behaves as n t × O(n h ), assuming an optimal solver for the FEM linear systems is available. Note that we do not include the cost of evaluating the approximation in Table 1 as the solve is the dominant cost.
It is clear that so long as the number of MCMC samples is larger than n k (the dimension of the polynomial space used to construct the surrogate), then a saving will be made with the SGFEM MCMC approach. Due to the slow convergence of MCMC methods, M is typically of order 10 6 or higher whereas, since we have only two unknowns, n k only grows like O(k 2 ), where k is the chosen polynomial degree. Furthermore, as we save computational effort on every iteration, the larger M is, the greater the savings made.

Convergence and error.
The computational cost of the SGFEM approach, but also the accuracy, depends on our choice of discretization parameters. Since we use piecewise linear finite elements, the spatial approximation error for the forward problem can be expected to behave as O(h), where h is the mesh size. The implicit Euler method yields an error which behaves as O(τ ), where τ is the time step size. For parabolic PDEs, under certain simplifying assumptions, the error associated with the polynomial approximation on the parameter domain decays exponentially with respect to the polynomial degree k (see [27,38]). Finally, the Monte Carlo error is O(M −1/2 ), where M is the number of samples.
In Section 3, we demonstrate that the approximation to the target density π hkτ obtained by the SGFEM MCMC approach converges as the polynomial degree k is increased. In future work, using the framework in [9] and [17] we hope to show convergence in the Hellinger distance [34] of π hkτ to the true target density π as the forward approximation is improved by decreasing h and τ and increasing k. In fact, under certain conditions [9] on the approximation of G by G hkτ , the rate of convergence in the forward approximation may be preserved in that of the target density. This means that we are able to sample from a distribution with density close to π as long as the forward approximation is sufficiently accurate.

Results
A laser flash experiment was performed and temperature measurements were taken at n d = 401 equally spaced time points for the duration T = 40 milliseconds. The chosen values of all the model inputs are given in Table 2. Prior distributions are stated for the unknowns λ and I rather than fixed values. First, we use the SGFEM MCMC scheme to sample from the approximate target density π hkτ in the case of a uniform laser profile. That is, when χ(r) = 1 in the definition of the source term Q in (3) and (14). We comment on the speedup achieved over the plain MCMC approach and demonstrate numerical convergence with respect to the polynomial degree k. Second, we look at how the modelling of the laser profile affects the results of the Bayesian inference, comparing the results for the uniform profile to those obtained with two Gaussian-shaped profiles.

Sampling the approximate target distribution using SGFEM MCMC
To construct the surrogate u hkτ , we use a spatial mesh with n h = 12, 206 vertices and characteristic element size h = 4.24 × 10 −5 m. The number of implicit Euler time steps is set to n t = 800. After investigating the accuracy of the solution to the forward problem, the polynomial degree for the parametric approximation is chosen to be k = 6. The resulting polynomial space has dimension n k = 28. The time taken 1 to complete the offline stage was 378 seconds. The surrogate RWMH algorithm described in Algorithm 1 in Section 2.2.4 was run to generate 100 million samples from the approximate target distribution. The time taken to produce these samples was 13,223 seconds. We approximate marginal densities π λ hkτ (λ | d) (top left) and π I hkτ (I | d) (bottom right) produced using 100 million MCMC samples with SGFEM surrogate computed using n h = 12, 206, n t = 800 and k = 6. Uniform laser profile χ(r) = 1. discarded the initial 10,000 samples as burn-in and the acceptance rate was tuned to near the optimal value of 23% (see [32]) by choosing the proposal standard deviation β = 1.55. Histograms of the 100 million samples produced by Algorithm 1 were formed to obtain the approximate target density π hkτ (λ, I | d) and the approximate marginal densities π λ hkτ (λ | d) and π I hkτ (I | d) for λ and I, respectively. These are shown in Figure 3.
Combining the offline and online steps, the total time required to approximate the target distribution was 13,601 seconds (1.36 × 10 −4 seconds per sample). For a finer spatio-temporal discretization, the cost of the offline stage would increase but, crucially, the cost of the online stage would remain unchanged (see Table 1). Like all MCMC algorithms, this approach is highly parallelizable. Once u hkτ is computed, we could use it to produce multiple MCMC chains in parallel and combine the resulting samples appropriately. However, we did not do this here.
No experiment was performed with the plain MCMC method. Since a single forward solve costs around 35 seconds when we use the same spatio-temporal discretization, we estimate that the (CPU) time required to compute 100 million samples (again without exploiting parallelization) would be around 111 years! This clearly shows that the surrogate approach represents a huge time saving (here, a reduction of 5 orders of magnitude) in comparison to the plain MCMC approach. Even with a more modest 100,000 samples, the plain MCMC approach would take 40.5 days, compared to 391 seconds with the SGFEM ap-proach. With this smaller number of samples, the Monte Carlo error would far exceed the difference between the two approximate target densities π hτ and π hkτ .
The distributions shown in Figure 3 now tell us about the values of the unknowns λ and I (conditional on the data) and the uncertainty in these values. We also gain information about the relationship between λ and I. Note that this is a huge advantage over optimization approaches. We obtain a mean value of 355.15 W m −1 K −1 for λ and a mean value of 1.1816 × 10 12 W m −3 for I. The standard deviations are 1.10 W m −1 K −1 and 1 × 10 9 W m −3 , respectively. The mean value for λ lies within the support of the prior (see Figure 2b), but is different to the prior mean µ λ = 328.5 W m −1 K −1 . Moreover, we observe that the standard deviation is much reduced from the prior value σ λ = 50 W m −1 K −1 .
From Figure 3 we see that there is negative correlation between λ and I in the target distribution. That is, a larger value of the thermal conductivity λ requires a smaller value of the laser intensity I (and vice versa) for the model output to be as consistent with the data as possible. The computed value of the correlation coefficient is -0.48. This highlights the importance of performing inference on both unknowns simultaneously, as fixing the value of one amounts to considering the conditional distribution for the other. The approximate densities of two such conditional distributions are displayed in Figure 4, along with the approximate marginal target density for λ. Observe that the three densities indicate very different likely values of λ. Note also that fixing a lower (resp. higher) value of the laser intensity I results in a larger (resp. smaller) likely value of λ being indicated. The histograms used to approximate the conditional densities are far less converged than those for the marginals as there are fewer samples which may be used to construct them. Conditional distributions for larger or smaller values of I than those used here did not have sufficient samples to create histograms, but would result in more severe differences in the distributions on λ. Correlation between λ and I, and hence the need for joint inference, has not previously been considered for the data set considered here.
In Figure 5, we show the results of a numerical convergence study. We plot the approximate marginal target density for λ obtained using a varying value of k in the construction of the surrogate. The spatio-temporal discretization is fixed as before. For k ≥ 6, the densities produced are almost indistinguishable visually. The approximation converges as k increases. Similar behaviour is witnessed when the discretization parameters h and τ are decreased.
In Figure 6, we compare the experimental data d with a model thermogram computed by evaluating the approximate observation operator G hkτ in (38) at the approximate target distribution mean (computed with k = 6). We observe that, modelling the laser profile as uniform (i.e., χ(r) = 1), a very good fit to the data has been achieved. This could potentially be improved further by treating other model inputs as unknown (such as the heat transfer coefficient κ in the boundary condition) and inferring their values from the data too.     Figure 6: Thermogram of experimental data d and model thermograms for three different laser profiles computed by evaluating the observation operator G hkτ at the approximate target distribution mean obtained using 100 million MCMC samples and SGFEM surrogate computed using n h = 12, 206, n t = 800 and k = 6.

Varying the laser profile
In this section, we consider the effect on our inference results of using a Gaussianshaped profile to describe the laser flash, instead of a uniform profile. That is, we now set χ(r) := exp(−r 2 /2r 2 f ) in (14), rather than χ(r) = 1 as in Section 3.1. By varying the parameter r f in (41), we are able to control how quickly the laser intensity in the model decays as the distance from the centre of the sample increases. We repeat the experiment in Section 3.1, now using the values r f = R and r f = R/3 to represent lasers whose profiles decay slowly and rapidly, respectively. The resulting estimates of the marginal target distribution means and standard deviations for λ are displayed in Table 3. Note that such investigations would be impossible without the rapid sampling provided by the SGFEM surrogate approach. We see that the profile of the laser has a significant effect on the reported values. The mean value of λ decreases as the laser becomes more focused at the centre of the sample. This indicates that checking the uniformity of the laser is important for accurate measurement and that uncertainties about the laser profile shape should be built into the model. Note that the computed values for I are not directly comparable. Figure 6 shows model thermograms computed by evaluating G hkτ at the target distribution mean for all three laser profiles. It is clear that a good level of fit to the experimental data has been obtained when using the uniform laser profile or the flatter Gaussian profile (r f = R). When using the rapidly decaying Gaussian profile however, the target distribution mean gives a bad fit to the data. This indicates that modelling the laser this way is incorrect and that the experiment has been performed in such a way that the laser profile is actually close to the ideal uniform shape. The bad fit observed for the sharply decaying Gaussian profile is indicative of the fact that in this model the heat energy travels rapidly through the centre of the sample and then slowly spreads out in the radial direction. The simulated temperature does not reach a steady state over the time scale of the experiment. Figure 7 shows the approximate temperature at the spatial locations (r, z) = (R, H) and (r, z) = (0, H) obtained by evaluating u hkτ at the target density mean for the duration of the experiment. At the point (r, z) = (0, H), the temperature increases rapidly initially and is then is still cooling at time T . However, at (r, z) = (R, H), the temperature increases more slowly and is still increasing at time T .
Note that, for each laser profile considered, there is a discrepancy at the early times where the measured temperature spikes. This is due to the fact that the laser flash itself is detected by the sensor. This feature has not been incorporated into the model.

Conclusions and future work
In this paper we formulated the determination of the thermal diffusivity (and hence the conductivity) of a material given laser flash experiment data as a Bayesian inverse problem. We demonstrated how an SGFEM surrogate may be used to accelerate MCMC sampling from an approximate target distribution, in a real life situation where the plain MCMC approach is infeasible. We explained how posing the problem as a Bayesian inverse problem allows for clear quantification of the uncertainty in the estimate of the thermal conductivity. We used this approach to find the joint distribution on the thermal conductivity and the laser intensity. We observed a correlation in the posterior between these unknowns, which indicates that incorrect estimates of the laser intensity without joint inference can lead to biased estimates of the thermal conductivity. A numerical convergence study was presented, showing the effect of varying the discretization parameters in the construction of the surrogate. Due to the speedup over plain MCMC provided by the SGFEM MCMC approach, we have been able to investigate the effect of the laser profile on the inference results. In particular, we have shown that the spatial uniformity of the laser profile strongly affects the value of the estimated thermal conductivity. This demonstrates the importance of both accurately modelling the laser flash and correctly setting up the physical experiment.
Combining a parametric surrogate with MCMC sampling offers a computationally efficient way to solve inverse problems involving PDEs. However, we caution that the appropriate choice of surrogate is problem-dependent. For problems where the forward solution is a non-smooth function of the parameters chosen to represent the uncertain inputs, the standard SGFEM approach adopted here is not recommended. Moreover, for problems where the solution becomes more nonlinear over time, using a fixed polynomial basis for all time steps, as has been done here, is not a sensible approach. When a surrogate is chosen, its accuracy with respect to the solution of the forward problem should first be investigated before using it to accelerate the numerical solution of an inverse problem.
In future, we shall consider how the multi-thermogram approach of [2] can similarly be incorporated into the SGFEM MCMC framework (for single and multi-layered materials) through simultaneous evaluation of surrogates for multiple forward problems. We will also incorporate more uncertain inputs into the model, for example the heat transfer coefficient κ in the definition of the boundary condition.