A Statistical Approach to Surface Metrology for 3D-Printed Stainless Steel

Surface metrology is the area of engineering concerned with the study of geometric variation in surfaces. This paper explores the potential for modern techniques from spatial statistics to act as generative models for geometric variation in 3D-printed stainless steel. The complex macro-scale geometries of 3D-printed components pose a challenge that is not present in traditional surface metrology, as the training data and test data need not be defined on the same manifold. Strikingly, a covariance function defined in terms of geodesic distance on one manifold can fail to satisfy positive-definiteness and thus fail to be a valid covariance function in the context of a different manifold; this hinders the use of standard techniques that aim to learn a covariance function from a training dataset. On the other hand, the associated covariance differential operators are locally defined. This paper proposes to perform inference for such differential operators, facilitating generalisation from the manifold of a training dataset to the manifold of a test dataset. The approach is assessed in the context of model selection and explored in detail in the context of a finite element model for 3D-printed stainless steel.


Introduction
The rate at which new manufacturing technologies are being developed and the range of components that can now be produced pose a challenge to formal standardisation, which has traditionally formed the basis for engineering design and assessment. An important motivation for new technologies is to trade the quality of the component with the speed and cost at which it can be manufactured. This trade-off results in the introduction of imperfections or defects, in a manner that is intrinsically unpredictable and could therefore be described as statistical. As such, there is a need to develop principled and general statistical methods that can be readily adapted to new technologies, in order to obtain reliable models as the basis for formal standards in the engineering context.
The recent emergence of additive manufacture technologies for stainless steel promises to disrupt traditional approaches based on standardised components and to enable design of steel components of almost arbitrary size and complexity (Buchanan and Gardner, 2019). This directed energy deposition technology arises from the combination of two separate technologies; traditional welding modalities and industrial robotic equipment (ASTM Committee F42, 2021). In brief, material is produced in an additive manner by applying a layer of steel onto the surface of the material in the form of a continuous weld. To ensure that this process is precisely controlled, the manufacture is performed by a robot on which the weld head is mounted. The complexity of this process is such that the geometric (e.g. thickness and volume) and mechanical (e.g. stiffness and strength) properties of additively manufactured (henceforth "3D-printed") steel are not yet well-understood. Indeed, conventional models for welding focus on the situation of a localised weld that is typically intended to bind two standard components together. The material properties of elementary welds, as a function of the construction protocol and ambient environment, have been modelled in considerable detail (Kou, 2003). In contrast, 3D-printed steel is a single global weld whose construction protocol is highly non-standard. Furthermore, the limited precision of the equipment results in imperfections in the material geometry, which can range in severity from aesthetic roughness of the surface of the material through to macroscale defects, such as holes in the printed component. These imperfections occur in a manner that is best considered as random, since they cannot be easily predicted or controlled.
The variation in the manufactured components precludes the efficient use of raw material, as large factors of safety are necessarily employed. An important task is therefore to gain a refined understanding of the statistical nature of 3D-printed steel, enabling appropriate factors of safety to be provided for engineering design and assessment (Buchanan et al., 2017). It is anticipated that geometric variation, as opposed to mechanical variation, is the dominant source of variation in the behaviour of a 3D-printed thin-walled steel component under loading. The focus of this work is therefore to take a statistical approach to surface metrology, with the aim to develop a generative model for geometric variation in 3D-printed steel. Specifically, the following problem is considered: Problem Statement. Given the notional and actual geometries of a small number of 3D printed steel components, construct a generative statistical model for the actual geometry of a 3D-printed steel component whose notional geometry (only) is provided.
For example, in Section 4.1 of this paper a statistical model is used to generate realistic simulations of geometric variation that might be encountered on a 3D-printed steel cylinder, based only on a training dataset of notionally flat panels of 3D-printed steel; see Figure 1.

A Brief Review of Surface Metrology
Surface metrology concerns small-scale features on surfaces and their measurement (Jiang et al., 2007b,a). The principal aim is to accurately characterise the nature of the surface of a  Table 1: Summary statistics traditionally used in surface metrology; here z : [0, L] → R is the profile through the surface of the material and µ is its average or intercept as per (1). Adapted from Chapters 7 and 8 of Thomas (1999).
given type of material, in order that such material can be recreated in simulations or otherwise analysed. Typical applications of surface metrology are at the nanometer-to-millimeter scale, but the principles and methods apply equally to the millimeter-to-centimeter scale variation present in 3D-printed stainless steel. The first quantitative studies of surfaces can be traced back to the pioneering work of Abbot and Firestone (1933), and concern measurement techniques to produce one-dimensional surface cross-sections (or "profiles"), denoted z : [0, L] → R, and the subsequent description of profiles in terms of a small number of summary statistics, such as amplitude and texture statistics (Thomas, 1999). Typical examples of traditional summary statistics are presented in Table 1. The use of summary statistics for surface profiles continues in modern surface metrology (e.g. as codified in standard ISO/TC 213, 2015). These include linear filters, for example based on splines and wavelets (Krystek, 1996), and segmentation filters that aim to partition the profile into qualitatively distinct sections (Scott, 2004). See Jiang et al. (2007a).
Although useful for the purposes of classifying materials according to their surface characteristics, these descriptive approaches do not constitute a generative model for the geometric variation in the material. Subsequent literature in surface metrology recognised that sur-faces contain inherently random features that can be characterised using a stochastic model (Patir, 1978;DeVor and Wu, 1972). In particular, it has been noted that many types of rough surface are well-modelled using a non-stationary stochastic model (see e.g. Majumdar and Tien, 1990). Traditional stochastic models for surface profiles fall into two main categories: The first seeks to model the surface profiles as time-series, typified by the autoregressive moving average (ARMA) model introduced in this context in DeVor and Wu (1972) and extended to surfaces with a non-Gaussian height distribution by Watson and Spedding (1982). An ARMA(p, q) model combines a pth order autoregressive model and a qth order moving average where z i = z(t i ) represents a discretisation of the surface profile and the i are residuals to be modelled. These techniques have been applied to isotropic surfaces. Patir (1978) have noted that the theory is less well-understood regarding anisotropy and pointed out that observations of surface profiles in differing directions may be used to establish anisotropic models. This represents a significant weakness where simulation of rough surfaces are concerned, as it becomes unwieldly and complex to simulate materials with a layered surface such as 3Dprinted steel. Patir (1978) introduced a second class of methods, wherein a two-dimensional surface is represented as a N × M matrix of heights (z i,j ), assumed to have arisen as a linear transform of a (N + n) × (M + m) random matrix (η i,j ) according to where the coefficients (α k,l ) are selected in order to approximate the measured autocorrelation function for the z i,j . The surface heights are often assumed to be Gaussian as the theory in this setting is well-understood. A number of authors have developed this method: A fractal characterisation was explored in Majumdar and Tien (1990), recognising that the randomness of some rough surfaces is multiscale and exhibits geometric self-similarity. Hu and Tonder (1992) proposed the use of the fast Fourier transform on the autocorrelation function to facilitate rapid surface simulation; see also Newland (1980) and Wu (2000). A possible drawback with this second class of approaches, compared to the former, is that it can be more difficult to account for non-stationarity compared to e.g. the ARMA model. The use of wavelets to model non-stationary and multi-scale features has received more recent attention, see Jiang et al. (2007b). Recent research in surface metrology concerns the simulation of rough surfaces for applications in other fields, including tribology (Wang et al., 2015), electronics (Shen et al., 2016) and wave scattering (Choi et al., 2018). Typically the approaches above are used directly, but alternative methods are sometimes advanced. For instance, Wang et al. (2015) proposed an approach utilising random switching to generate Gaussian surfaces with prescribed statistical moments and autocorrelations. Clearly it is necessary to specialise any statistical model to the specific material under study. For example, in the case of 3D-printed steel an approach ought to be taken which is able to natively account for anisotropy due to layering of the steel. More crucially in the present context, existing methods pre-suppose that surface profiles (z i ) or height matrices (z i,j ) are pre-defined with respect to some regular Cartesian grid on a Euclidean manifold. In applications to 3D-printed steel, where components have possibly complex macroscale geometries, the training and test data can belong to different manifolds and it is far from clear how an existing model, once fitted to training data from one manifold, could be used to simulate realistic instances of geometric variation on a different manifold.
An important application of surface metrology is surface quality monitoring; a classification task where one seeks to determine, typically from a limited number of measurements and in an online environment, whether the quality of a manufactured component exceeds a predefined minimum level (Zang and Qiu, 2018a,b;Jin et al., 2019;Zhao and Del Castillo, 2021). Existing research into quality monitoring has exploited Gaussian process models to reconstruct component geometries from a small number of measurements (Xia et al., 2008;Colosimo et al., 2014;Del Castillo et al., 2015), possibly at different fidelities Ding et al., 2020). Going further, performance metrics arising from surface metrology have been used as criteria against which the parameters of a manufacturing protocol can be optimised (Yan et al., 2019).

Outline of the Contribution
The aim of this paper is to explore whether modern techniques from spatial statistics can be used to provide a useful characterisation of geometric variation in 3D-printed stainless steel.
To provide a training dataset, a high-resolution laser was used to scan 3D-printed stainless steel panels, on whose surfaces the geometric variation of interest is exhibited. A typical panel is displayed in Figure 2. To characterise geometric variation in the panel dataset, in Section 2 a library of candidate models is constructed, rooted in Gaussian stochastic processes, and in Section 3 formal parameter inference and model selection are performed to identify a most suitable candidate model. Although quite natural from a statistical perspective, this systematic approach to parameter inference and model selection appears to be novel in the surface metrology context. The high-dimensional nature of the dataset introduces computational challenges that are familiar in spatial statistics and the stochastic partial differential equation (SPDE) techniques described in Lindgren et al. (2011) are leveraged to make the computation practical.
The aim is to characterise geometric variation not just on panels but on components whose notional geometry, represented abstractly by a two-dimensional manifold M embedded in R 3 , may be complicated. (The case study in Section 4.1 considers the case where M is cylindrical which, whilst not complicated, differs from a flat panel in a fundamental topological way.) This variation, present in the actual geometry of a 3D-printed component, is described by a two-dimensional vector field u : M → R 2 . As illustrated in Figure 3, the value of the first coordinate u (1) (x) of the vector field represents the Euclidean distance of the upper surface of the component from the manifold and similarly the value of u (2) (x) represents the distance of the lower surface of the component from the manifold. The absence of geometric variation corresponds to the functions u (1) (x) and u (2) (x) being constant, but actual 3Dprinted steel components exhibit geometric variation that can be statistically modelled. The need to consider diverse notional geometries, represented by different manifolds, precludes conventional parametric (Rasmussen and Williams, 2006), compositional (Duvenaud, 2014) and nonparametric (Bȃzȃvan et al., 2012;Oliva et al., 2016;Moeller et al., 2016) approaches that aim to infer an appropriate covariance function defined only on the manifold of the training dataset. Indeed, it is well-known that a radial covariance function ϕ(d M 1 (x, y)) defined on one manifold x, y ∈ M 1 with geodesic distance d M 1 can fail to be a valid covariance function when applied as ϕ(d M 2 (x, y)) in the context of a different manifold x, y ∈ M 2 with geodesic distance d M 2 . This occurs because positive-definiteness can fail; see Section 4.5 of Gneiting (2013). The key insight used in this paper to generalise across manifolds is that the Laplacian ∆ M 1 on M 1 can be associated with the Laplacian ∆ M 2 on M 2 . In doing so a suitable differential operator on M 1 , describing the statistical character of the vector field u via an SPDE, can be inferred and then used to instantiate an equivalent differential operator on M 2 , defining a Gaussian random field on M 2 via an SPDE. This allows, for instance, the simulation of high-resolution random surfaces on a cylinder whose local characteristics are related to the panel training dataset. (Note that this paper does not consider the question of when the covariance differential operator / SPDE approach defines a true random field rather than a generalised random field. The practical consequences of this are minimal, however, since the random field will be discretised by projection onto a finite element basis whose coefficients are well-defined linear functionals of the random field.) In Sections 4.1 and 4.2 it is demonstrated, using a held out test dataset consisting of a 3Dprinted steel cylinder, that random surfaces generated in this way can capture some salient aspects of geometric variation in the material. However, the conclusions of this paper are tempered by the substantial computational challenges associated with the fitting of model parameters in this context. The paper concludes with a brief discussion in Section 5.
This research forms part of a wider effort to develop realistic computer modelling tools for applications involving 3D-printed steel; these must also account also for other factors such as mechanical variation and aspects of the printing protocol (Dodwell et al., 2021). Specifically, other controllable factors that affect the quality of the manufactured component include the angle of the weld head, the speed of printing and the order in which the steel layers are printed. It is possible that an improved understanding of the nature of geometric variation can inform the improvement of manufacturing techniques and their protocols (Barclift and Williams, 2012;Franco et al., 2017), but these directions are beyond the scope of the present paper, which aims only to characterise the geometric variation in the material.

Statistical Models for Geometric Variation in 3D-Printed Steel
This section proceeds as follows: Section 2.1 contains the main conceptual contribution, that identification of Laplacians can be used to generalise from the training manifold to a test manifold. Section 2.2 presents the mathematical setting of a two-dimensional Gaussian random field characterised by a differential operator and Section 2.3 presents the resulting collection of candidate models.

Gaussian Models and Generalisation via the Laplacian
In this paper, geometric variation for a 3D-printed component with notional geometry M is modelled using a Gaussian stochastic process. Letting E and C denote expectation and covariance operators, set where the covariance function k M ((x, r), (y, s)) describes the statistical relationship between the upper and lower surfaces (r, s ∈ {1, 2}) at different spatial locations (x, y ∈ M). To describe geometric variation in the panel datasets, whose notional flat geometry is denoted M 1 , any existing method for estimating a suitable covariance function k M 1 from the dataset could be employed. Such methods are well-developed and examples include maximum likelihood estimation over a parametrised class of covariance functions (Rasmussen and Williams, 2006), inference based on cross-validation with arbitrarily complicated compositions of simple kernels (Duvenaud, 2014) and even fully nonparametric inference of a covariance function (Bȃzȃvan et al., 2012;Oliva et al., 2016;Moeller et al., 2016). However, in doing so one would be learning a covariance function that is defined on the manifold M 1 of the training dataset, being the Green's function of a stochastic differential equation defined on M 1 (see e.g. Fasshauer and Ye, 2013). As a consequence, one cannot expect a covariance function learned from the training dataset to carry the same meaning when applied on the manifold M 2 of the test dataset. Indeed, it is a classic result that the Matérn kernel ϕ on R 3 with d R 3 (x, y) = x − y can fail to be positive definite when applied as ϕ(d S 2 (x, y)) on the sphere x, y ∈ S 2 ⊂ R 3 with d S 2 the geodesic distance on S 2 (see e.g. Section 4.5 of Gneiting, 2013). Thus inferential approaches based on the covariance function are unsuitable in this context, since a description is required of geometric variation on a range of manifolds that includes potentially complicated manufacturing geometries.
The key insight in this paper is that, whilst covariance functions do not generalise across manifolds, their associated differential operators are only locally defined. In particular, a wide range of models for stochastic vector fields u can be cast as the solutions of a coupled system of SPDEs based on a Laplacian-type differential operator. The specific form of the differential operator, together with suitable boundary conditions, determine the statistical properties of the associated random field. The Laplacian ∆ M 1 on the manifold M 1 of the training dataset can be associated with the Laplacian ∆ M 2 on the manifold M 2 where generative models are to be tested. This provides a natural way in which to decouple the statistical properties of the random vector field u from the manifold on which it is defined. To sound a cautionary note, this assumption is likely to break down in situations where one of the manifolds is highly curved. For example, it is conceivable that the protocols used to control weld head movement may be less effective when the required trajectories are highly curved, leading to higher levels of geometric variation in corresponding regions of the component. As a more extreme example, if two distinct regions of the manifold come within millimetres due to the manifold curving back on itself when embedded in 3D, then through stochastic fluctuation there is a potential for the two corresponding regions of the printed component to become physically fused. The first of these phenomena could be handled by considering Gaussian models that allow covariances to be both position-and curvaturedependent, while an entirely different approach would be needed to model accidental fusing of distinct regions of a component. Neither of these extensions are pursued in the present work.
The SPDE formulation of Gaussian random fields has received considerable attention in the spatial statistics literature following the seminal work of Lindgren et al. (2011), generating an efficient computational toolkit which has been widely used (e.g. Coveney et al., 2019). Interestingly, the present account appears to describe the first instance of transfer learning using the SPDE framework, where observations made on one manifold are used to provide information on the equivalent stochastic process defined on a different manifold. As noted earlier, no attempt is made to address the technical issue of whether the covariance operator corresponds to a true random field rather than a generalised random field.

Mathematical Set-Up
The purpose of this section is to rigorously set up a coupled SPDE model for a random vector field u on a manifold M. For completeness, the required regularity assumptions are stated in Section 2.2.1; these are essentially identical to the presentation in Lindgren et al. (2011) and may be safely skipped if desired. The coupled SPDE model is introduced in Section 2.2.2, following a similar presentation to that of Hu et al. (2013a,b).

Regularity Assumptions
The domain of the printing process is R 3 and throughout the paper the convention is used that the printing occurs in the direction of the z-axis of R 3 . That is, each layer of steel is notionally perpendicular to the z-axis according to the printing protocol. The notional geometry of a component is described by a manifold M embedded in R 3 . Below are listed some strong regularity conditions on M that ensure the coupled SPDE model in Section 2.2.2 is well-defined.
Let M be a 2-dimensional manifold embedded in R 3 , so that the Hausdorff measure on M, denoted µ M , can be defined, and thus an inner product and an associated space L 2 (M) of functions φ : M → R for which φ, φ M < ∞. It is assumed that M is endowed with a Riemannian structure, induced by the embedding in R 3 , which implies that one can define differential operators including the gradient ∇ M , the Laplacian ∆ M and the normal derivative ∂ M,n on M. It is assumed that M is compact (with respect to the topology induced by R 3 ), which implies the existence of a countable subset can be taken as a basis for L 2 (M). From compactness it follows that M is bounded and it is further assumed that M has a boundary ∂M that is a piecewise smooth 1-dimensional manifold. (From a practical perspective, the assumption that the manifold has a boundary is not restrictive, since manifolds without boundary, such as the sphere, are likely to be challenging to realise as 3D-printed components.) The induced measure on ∂M is denoted µ ∂M . In the sequel the manifold M is fixed and the shorthand ∆ = ∆ M is used, leaving the dependence of the Laplacian on M implicit.

Two-Dimensional Gaussian Random Fields
Hu et al. (2013a,b) considered using coupled systems of SPDEs to model random vector fields, building on the constructions in Lindgren et al. (2011). In this work a similar coupled SPDE construction is used, defined next.
Let Ω be an underlying probability space, on which all random variables in this paper are defined, and let ω → Z (r) (x; ω), r ∈ {1, 2} be independent, centred, real-valued (true or generalised) random fields, whose distribution is to be specified. Following standard convention, the argument ω ∈ Ω is left implicit throughout. Let L (rs) , r, s ∈ {1, 2}, be differential operators on M, to be specified, and consider then the coupled system of SPDEs whose specification is completed with Neumann boundary conditions on ∂M. The system in (2) is compactly represented in matrix form as Lu = Z. The distribution of u, the vector field solution of (2), is therefore fully determined by the specification of the matrix differential operator L and the distribution of the random field Z. In particular, the components u (r) , r ∈ {1, 2} are independent if L (12) = L (21) = 0 and if Z (1) and Z (2) are independent. Suitable choices for L and Z are discussed next.

Candidate Models
A complete statistical description for the random vector field u follows from specification of both the differential operator L and the noise process Z. Sections 2.3.1-2.3.4 present four candidate parametric models for L, then Sections 2.3.5-2.3.7 present three candidate parametric models for Z. Thus a total of 12 candidate parametric models are considered for the random vector field u of interest. It is important to acknowledge that the SPDE framework does not currently enjoy the same level of flexibility for modelling of complex structural features as the covariance function framework. This is for two reasons; first, the indirect relationship between an SPDE and its associated random field presents a barrier to the identification of a suitable differential operator to represent a particular feature of interest and, second, each SPDE model requires the identification or development of a suitable numerical (e.g. finite element) method in order to facilitate computations based on the model. A discussion of the numerical methods used to discretise the SPDEs in this work is reserved for Appendix A.1, with model-specific details contained in Appendix A.2. For these two reasons, the aim in the sequel is not to arrive at a generative model whose realisations are indistinguishable from the real samples of the material. Rather, the aim is to arrive at a generative model whose realisations are somewhat realistic in terms of the salient statistical properties that determine performance in the engineering context. The models were therefore selected to exhibit a selection of physically relevant features, including anisotropy, oscillatory behaviour, degrees of smoothness and non-stationarity of the random field. It is hoped that this paper will help to catalyse collaborations with the surface metrology community that will lead in the future to richer and more flexible SPDE models for geometric variation in material.
The candidate models presented next are similar to those considered in Lindgren et al. (2011), but generalised to coupled systems of SPDEs in a similar manner to Hu et al. (2013b,a). Note that formal model selection was not considered in Lindgren et al. (2011) or Hu et al. (2013a,b).

Isotropic Stationary
For the differential operators L (rs) , r, s ∈ {1, 2}, the simplest candidate model that considered is where η (rs) , τ (rs) > 0 are constants to be specified. This model is stationary, meaning that the differential operator L (rs) does not depend on x, and isotropic, meaning that it is invariant to rotation of the coordinate system on R 3 . The differential order of the differential operator in (3) is fixed, which can be contrasted with Lindgren et al. (2011) where powers (η − ∆) α/2 , α ∈ N, of the pseudo-differential operator were considered. The decision to fix α = 2 was taken because (a) the roughest setting of α = 1 was deemed not to be realistic for 3D-printed steel, since the solutions u to the SPDE are then not well-defined, and (b) overparametrisation will be avoided by controlling smoothness of the vector field u through the smoothness of the driving noise process Z, rather than through the differential order of L. The model parameters in this case are therefore θ = {τ (rs) , η (rs) } r,s∈{1,2} . To reduce the degrees of freedom, it was assumed that η (12) = η (21) , η (11) = η (22) , τ (12) = τ (21) and τ (11) = τ (22) . This encodes an exchangeability assumption on the statistical properties of the upper and lower surfaces of the material. Such an assumption seems reasonable since the distinction between upper and lower surface was introduced only to improve the pedagogy; in reality such a distinction is arbitrary.

Anisotropic Stationary
The visible banding structure in Figure 2b, formed by the sequential deposition of steel as the material is printed, suggests that an isotropic model is not well-suited to describe the dataset. To allow for the possibility that profiles parallel and perpendicular to the weld bands admit different statistical descriptions, the following anisotropic stationary model was considered: Here (4) differs from (3) through the inclusion of a diffusion tensor H (rs) , r, s ∈ {1, 2}, a positive definite 3 × 3 matrix whose elements determine the nature and extent of the anisotropy being modelled. The model parameters in this case are θ = {η (rs) , τ (rs) , H (rs) } r,s∈{1,2} . To reduce the number of parameters, it was assumed that H (rs) ≡ H where H is a 3 × 3 diagonal matrix, so that the nature of the anisotropy is the same for each of the four differential operators being modelled. Moreover, anisotropy was entertained only in the direction of printing (the z-axis of R 3 ) relative to the non-printing directions, meaning that the model is in fact orthotropic with H = diag(h 1 , h 1 , h 2 ) for as yet unspecified h 1 , h 2 > 0. As before, an exchangeability assumption on the upper and lower surfaces of the material was enforced, so that η (12) = η (21) , η (11) = η (22) , τ (12) = τ (21) and τ (11) = τ (22) .

Isotropic Non-Stationary
A more detailed inspection of the material in Figure 2b reveals that (at least on the visible surface) the vertical centre of the panel exhibits greater variability in terms of surface height compared to the other portions of the panel. It can therefore be conjectured that the printing process is non-stationary with respect to the direction of the z-axis, in which case this nonstationarity should be encoded into the stochastic model. To entertain this possibility, a differential operator of the form was considered, which generalises (3) by allowing the coefficients η (rs) and τ (rs) to depend on the z-coordinate x 3 of the current location x on the manifold. Here it was assumed that and some functions τ , η, to be specified, whose scale is fixed. The nature of the non-stationarity, as characterised by τ and η, will differ from printed component to printed component and this higher order variation could be captured by a hierarchical model that can be considered to have given rise to each specific instance of the τ, η.
Recent work due to Roininen et al. (2019) considered the construction of non-stationary random fields that amounted to using a nonparametric Gaussian random field for both τ and η. However, this is associated with a high computational overhead that prevents one from deploying such an approach at scale in the model selection context. Therefore the functions τ and η were instead parametrised using a low order Fourier basis, for τ taking where the γ (·) τ are parameters to be specified. The units of x 3 are millimeters, so (5) represents low frequency non-stationarity in the differential operator. For η an identical construction was used, with coefficients denoted γ η , joint learning of these parameters across the panel datasets was not attempted. This can be justified by the fact that we have only 6 independent samples, drastically limiting the information available for any attempt to learn a hierarchical model from the dataset.

Anisotropic Non-stationary
The final model considered for the differential operator is the natural combination of the anisotropic model of Section 2.3.2 and the non-stationary model of Section 2.3.3. Specifically, the differential operator x ∈ M was considered, where the 3 × 3 matrices H (rs) and the functions τ (rs) , η (rs) are parametrised in the same manner, with the same exchangeability assumptions, as earlier described. Note that the matrices H (rs) are not considered to be spatially dependent in this model.

White Noise
Next, attention turns to the specification of a generative model for the noise process Z.
In what follows three candidate models are considered, each arising in some way from the standard white noise model. The simplest of these models is just the white noise model itself, described first. Let Z (j) := W (r) be a spatial random white noise process on M, defined as an L 2 (M)bounded generalised Gaussian field such that, for any test functions {ϕ i ∈ L 2 (M) : i = 1, . . . , n}, the integrals ϕ i , W (j) M , i = 1, . . . , n, are jointly Gaussian with It is not necessary to introduce a scale parameter for the noise process since the τ (rs) parameters perform this role in the SPDE. For this choice of noise model the random field u of interest is typically continuous but not mean-square differentiable, which may or (more likely) may not be an appropriate mathematical description of the material.

Smoother Noise
It is difficult to determine an appropriate level of smoothness for the random field u based on visual inspection, except to rule out models that cannot be physically realised (c.f. Section 2.3.1). For this reason, a smoother model for the noise process Z was also considered. In this case the noise process is itself the solution of an SPDE where W (r) is the white noise process defined in Section 2.3.5. To limit scope, it was assumed that the matrix H is the same as the matrix used (if indeed one is used) in the construction of the differential operator in Sections 2.3.2 and 2.3.4. The only free parameters are therefore η (1) , η (2) > 0 and the further exchangeability assumption was made that these two parameters are equal. For this choice of noise model the random field u of interest is typically twice differentable in the mean square sense, which may or may not be more realistic than the white noise model.

Smooth Oscillatory Noise
The final noise model allows for the possibility of oscillations in the realisations of Z. As explained in Appendix C4 of Lindgren et al. (2011), such noise can be formulated as arising from the complex SPDE Here η > 0 and 0 ≤ θ < 1 are to be specified, with θ = 0 corresponding to the smoother noise model of Section 2.3.6. The matrix H is again assumed to be equal to the corresponding matrix used (if indeed one is used) in the construction of the differential operator in Sections 2.3.2 and 2.3.4, and W (1) and W (2) are independent white noise processes as defined in Section 2.3.5. This completes the specification of candidate models for L and Z, constituting 12 different models for the random vector field u in total. These will be denoted M i , i = 1, . . . , 12 in the sequel. Section 3 seeks to identify which of these models represents the best description of the panel training dataset.

Data Pre-Processing and Model Selection
This section proceeds as follows: In Section 3.1 the training dataset is described and Section 3.2 explains how it was pre-processed. Section 3.3 describes how formal statistical model selection was performed to identify a "best" model from the collection of 12 models just described. The fitting of model parameters raises substantial computational challenges, which this paper does not fully solve but discusses in detail.

Experimental Set-Up
The data were obtained from large sheets of notional 3.5mm thickness manufactured by the Dutch company MX3D (https://mx3d.com) using their proprietary printing protocol. The dataset consisted of 6 panels of approximate dimensions 300mm × 300mm, cut from a large sheet. These panels were digitised using a laser scanner to form a data point cloud; the experimental set-up and a typical dataset are displayed in Figure 2. The measuring process itself can be considered to be noiseless, due to the relatively high (0.1mm) resolution of the scanning equipment. For each panel a dense data point cloud was obtained, capturing the geometry of both sides of the panel.

Data Pre-Processing
Before data were analysed they were pre-processed and the steps taken are now briefly described.

Cropping of the Boundary
First, as is clear from Figure 2b, part of the boundary of the panel demonstrates a high level of variation. This variation is not of particular interest, since in applications the edges of a component can easily be smoothed. Each panel in the dataset was therefore digitally cropped using a regular square window to ensure that boundary variation was removed. Note that no reference coordinate system was available and therefore the orientation of the square window needed to be determined (i.e. a registration task needed to be solved). To identify the direction perpendicular to the plane of the panel a principal component analysis of the data point cloud was used to determine the direction of least variation; this was taken to be the direction perpendicular to the plane of the panel. To orient the square window in the plane a coordinate system was chosen to maximise alignment with the direction of the weld bands. The weld bands are clearly visible (see Figure 2b) and the average intensity of a Gabor filter was used to determine when maximal agreement had been reached (see e.g. Weldon et al., 1996).

Global De-Trending
As a weld cools, a contraction in volume of the material introduces residual stress in the large sheet (Radaj, 2012). As a consequence, when panels are cut from the large sheet these stresses are no longer contained, resulting in a global curvature in each panel as internal forces are re-equilibrated. This phenomenon manifests as a slight curvature in the orthographic projections of the panel shown in Figure 2b. The limited size of the dataset, in terms of the number of panels cut from the large sheet, precludes a detailed study of geometric variation due to residual stress in this work. Therefore residual stresses were removed from each panel dataset by subtracting a quadratic trend, fitted using a least-squares objective in the direction perpendicular to the plane of the panel. This narrows focus to the variation that is associated with imprecision in the motion of the print head as steel is deposited. The data at this stage constitute a point cloud in R 3 that is centred around a two-dimensional, approximately square manifold M, aligned to the x and y axes in R 3 .

High-Frequency Filter
Closer inspection of Figure 2b reveals small bumps on the surface of the panel. These are referred to as "splatter" and result from small amounts of molten steel being sprayed from the weld head as subsequent layers of the sheet are printed. Splatter is not expected to contribute much to the performance of the material and therefore is of little interest. To circumvent the need to build a statistical model for splatter, a pre-processing step was used in order to remove such instances from the dataset. For this purpose a two-dimensional Fourier transform was applied independently to each surface of each panel and the high frequencies corresponding to the splatter were removed. Since data take the form of a point cloud, the data were first projected onto a regular Cartesian grid, with grid points denoted x i ∈ M, in order that a fast Fourier transform could be performed. The frequencies for removal were manually selected so as to compromise between removal of splatter and avoidance of smoothing of the relevant variation in the dataset. The resolution of the grid was taken to be 300×300 (i.e. millimeter scale), which was fine enough to capture relevant variation whilst also enabling the computational analyses described in the sequel.

Pre-Processed Dataset
The result of pre-processing a panel dataset is a collection of triples where x i ∈ M 1 represents a location in the manifold representation M 1 of the panel, u (1) i = u (1) (x i ) ∈ R represents the measured Euclidean distance of the upper surface of the panel from x i ∈ M 1 and u (2) i = u (2) (x i ) ∈ R represents the measured Euclidean distance of the lower surface of the panel from x i ∈ M 1 . The subscript on M 1 indicates that the training data are associated to this manifold; a test manifold M 2 will be introduced in Section 4.1. The latent two-dimensional vector field u describes the continuous geometry of the specific panel and it is the statistical nature of these vector fields which will be modelled. In the sequel a collection of candidate models for u are considered, each based on a Gaussian stochastic process.
The pre-processed data for a panel are collectively denoted D (j) , where j ∈ {1, . . . , 6} is the index of the panel. The collection of all 6 of these panel datasets is denoted D 1 , the subscript indicating that these constitute the training dataset.

Model Selection
To perform formal model selection among the candidate models an information criterion was used, specifically the Akaike information criterion (AIC) M is empty and no random effects are present. The panel datasets D (j) that together comprise the training dataset D 1 are considered to be independent given θ M and M , so that The maximisation of (9) to obtainθ M requires a global optimisation routine and this will in general preclude the exact evaluation of the AIC for the statistical models considered. Following standard practice,θ M was therefore set to be the output of a numerical optimisation procedure applied to maximisation of (9). A drawback of the SPDE approach is that implementation of an effective numerical optimisation procedure is difficult due to the considerable computational cost involved in differentiating the likelihood. To obtain numerical results, this work arrived, by trial-and-error, at a procedure that contains the following ingredients: Gradient ascent: In the case where M is a stationary model, so the θ (j) M are empty, the maximum likelihood parametersθ id M were approximated using the natural gradient ascent method. This extracts first order gradient information from (9) and uses the Fisher information matrix as a surrogate for second order gradient information, running a quasi-Newton method that returns a local maxima of the likelihood. Full details are provided in Appendix A.4.
Surrogate likelihood: The use of gradient information constitutes a computational bottleneck, due to the large dense matrices involved, which motivates the use of a surrogate likelihood within the gradient ascent procedure whose Fisher information is more easily computed (Tajbakhsh et al., 2018). For this paper a surrogate likelihood was constructed based on a subset of the dataset, reduced from the 300 × 300 grid to a 50 × 50 grid in the central portion of the panel. Natural gradient ascent was feasible for the surrogate likelihood. It is hoped that minimisation of this surrogate likelihood leads to a suitable valueθ M with respect to maximising the exact likelihood, and indeed the values that reported in the main text are evaluations of the exact likelihood atθ M .
Two-stage procedure for random effects: The case where M is non-stationary is challenging for the natural gradient ascent method, even with the surrogate likelihood, due to the much larger dimension of the parameter vector when the random effects θ (j) M are included. In this case a two-stage optimisation approach was used, wherein the common parameters θ id M were initially fixed equal to their values under the corresponding stationary model and then optimised over the random effects θ (j) M using natural gradient ascent, which can be executed in parallel.
These approximation techniques combined to produce the most robust and reproducible parameter estimates that the authors were able to achieve, but it is acknowledged that they do not guarantee finding a local maximum of the likelihood and this analytic/numerical distinction can lead to pathologies that are discussed next. The computations that we report were performed in Matlab R2019b and required approximately one month of CPU time on a Windows 10 laptop with an Intel ® Core™i7-6600U CPU and 16GB RAM. The number of iterations of natural gradient ascent required to achieve convergence ranged from 10 to 80 and was model-dependent. Models with ≥ 100 parameters required several days of computation, while the simpler models with < 10 parameters typically required less than one day of computation in total.
The twelve candidate models are presented in the first three columns of Table 2, together with the dimension dim(θ M ) of their parameter vector, the (approximately) maximised value of the surrogate and exact log-likelihood and the AIC. The main conclusions to be drawn are as follows: Isotropy: In all cases the anisotropic models for L out-performed the corresponding isotropic models according to both the (approximately) maximised exact log-likelihood and the AIC. (The same conclusion held for 5 of 6 comparisons based on the surrogate likelihood.) Since the anisotropic model includes the isotropic model as a special case, these higher values of the likelihood are to be expected, but the AIC conclusion is non-trivial. (Inspection of the log-likelihood does, however, provide a useful validation of the optimisation procedure that was used.)  Stationarity: In all cases the non-stationary models for L out-performed the corresponding stationary models according to the (approximately) maximised surrogate log-likelihood, which is to be expected since these models are nested. However, the conclusion was reversed when the parametersθ M , estimated using the surrogate likelihood, were used to evaluate the exact likelihood. These results suggest that the two-stage numerical optimisation procedure for estimation of random effects was robust but that the surrogate likelihood was lossy in respect of information in the dataset that would be needed to properly constrain parameters corresponding to non-stationarity in the model.

I s o t r o p ic S t a t io n a r y N o is e
Noise model: In all cases the smooth oscillatory noise models for Z led to substantially lower values of the surrogate and the exact log-likelihood compared to the white noise and smooth noise models. This is to be expected since these models are nested, but the fact that this ordering was recovered in both the values of the surrogate and exact log-likelihood indicates that the numerical procedure worked well in respect of fitting parameters pertinent to the noise model.

Best model:
The best-performing model, denoted M * in the sequel, was defined as the model that provided the smallest value of AIC, computed using the (approximately) maximised exact likelihood. This was the anisotropic stationary model L with smooth oscillatory noise Z, which had 8 parameters and achieved a substantially lower value of AIC compared to each of the 11 candidate models considered. Although the use of AIC in this context is somewhat arbitrary and other information criteria could also have been used, the substantially larger value of the log-likelihood for the best-performing model suggests that use of an alternative information criterion is unlikely to affect the definition of the best-performing model. If instead the surrogate likelihood had been used to produce values for AIC, it would have declared that the best-performing model also includes non-stationary behaviour. This suggests that the aforementioned "best" model is more accurately described as the "best model that could be fitted" and that, were the computational challenges associated with parameter estimation in the SPDE framework surmounted, a model that involves non-stationarity may have been selected. This suggests that additional models of further complexity may provide better descriptions of the data, at least according to AIC, but that fitting of model parameters presents a practical barrier to the set of candidate models that can be considered. The remainder investigates whether predictions made using the simple model M * may still be useful.

Transfer Learning
This section aims to assess the capacity of our selected statistical model M * to perform transfer learning, and proceeds as follows: Section 4.1 assesses the performance of the statistical model in the context of a 3D-printed cylinder, which constitutes a held-out test dataset. Then, Section 4.2 implements finite element simulations of compressive testing based on the fitted model and compares these against the results of a real compressive experiment.

Predictive Performance Assessment
To investigate the generalisation performance of the model M * identified in Section 3.3, a test dataset, denoted D 2 and displayed in Figure 4a, was obtained as a laser scan of a 3D-printed cylinder of diameter ≈170mm and length ≈581mm. The notional thickness of the cylinder was 3.5mm, in agreement with the training dataset. The manifold representation of the panel is denoted M 1 and the manifold representation of the cylinder is denoted M 2 . The fitted model M * induces a bivariate Gaussian field on any suitably regular manifold, in particular M 2 . A sample from this model is displayed in Figure 4b. At a visual level, the "extent" of the variability is similar between Figure 4a and Figure 4b, but the experimentally produced cylinder contains more prominent banding structure compared to the cylinder simulated from M * . The thickness of the cylinder wall is relevant to its mechanical performance when under load. Recall that M * was not explicitly trained as a predictive model for thickness; rather, thickness is a derived quantity of interest. It is therefore interesting to investigate whether the distribution of material thickness, accouring to M * instantiated on M 2 , agrees with the experimentally obtained test dataset. Results in Figure 4c indicate that the fitted model agrees with the true distribution of material thickness in as far as the modal value and the right tail are well-approximated. However, the true distribution is positively skewed and the model M * , under which thickness is necessarily distributed as a Gaussian random field, is not capable of simultaneously approximating also the left tail. From an engineering perspective, the model M * is conservative in the sense that it tends to under-predict wall thickness, in a similar manner to how factors of safety might be employed.
The predictive likelihood of a held-out test dataset is often used to assess the predictive performance of a statistical model. However, there are two reasons why such assessment is unsuitable in the present context: First, the pre-processing of the training dataset, described in Section 3.2, was adapted to the manifold M 1 of the training dataset and cannot be directly applied to the manifold M 2 of the test dataset. This precludes a fair comparison, since the fitted statistical model is not able to explain the high-frequency detail (such as weld splatter, visible in Figure 4a) that are present in the test dataset D 2 but were not included in the training dataset. Second, recall that the purpose of the statistical model is limited to capturing aspects of geometric variation that are consequential in engineering. The predictive likelihood is not well-equipped to determine if the statistical model is suitable for use in the engineering context. For a more detailed assessment of the statistical model and its predictive performance in the engineering context, finite element simulations of cylinders under load were performed, described next.

Application to Compressive Testing
As discussed in Section 2.3, the aim of this work is to model those aspects of variability that are salient to the performance of components manufactured from 3D-printed steel. To this end, predictions were generated for the outcome of a compressive test of a cylinder. This test, which was also experimentally performed, involved a cylinder of the same dimensions described in Section 4.1, which was machined to ensure that its ends are parallel to fit the testing rig. The test rig consisted of a pair of end plates which compress the column ends axially. The cylinder was placed under load until a set amount of displacement has been reached. Both strain gauge and digital image correlation measurements of displacement under a given load were taken; further details may be found in Buchanan et al. (2018).
To produce a predictive distribution over possible outcomes of the test, three geometries were simulated for the cylinder using M * and finite element simulations of the compressive test were performed, using these geometries in the construction of the finite element model. To facilitate these simulations a simple homogeneous model for the mechanical properties of the material was used, whose values were informed by the coupon testing undertaken in Buchanan et al. (2018). This is not intended as a realistic model for the mechanical properties of 3D-printed steel. Indeed, as discussed in Section 1, it is expected that 3D-printed steel exhibits local variation in mechanical properties such as stiffness; the focus of this paper is surface metrology (only) and joint modelling of geometric and mechanical variation for 3D-printed steel will be addressed in future work. Three typical end points from the finite element simulation are displayed in Figure 5, panels (a)-(c). Notice that local variation in geometry leads to non-trivial global variation in the location of the buckling point. Panel (d) in Figure 5 displays the buckled states of two real 3D-printed steel cylinders, experimentally obtained. At a visual level there is some qualitative agreement in the appearence of these failure modes. The approach taken in this paper can be contrasted with approaches in the engineering literature that do not generalise across manifolds and do not report full probability distributions (Wagner et al., 2020). However, an important caveat is that the precise quantitative relationship between load and displacement in these simulations is not yet meaningful due to the simplistic model used for material properties. Further research will be required to incorporate these aspects into a complete statistical model for 3D-printed steel.

Discussion
This paper identified surface metrology as a promising new application area for recent advances in spatial statistics. Though attention was focussed on 3D-printed stainless steel, the techniques and methods have the potential for application to a wide variety of problems in this engineering sub-field.
Our approach sought to decouple the local statistical properties of the geometric variation from the global shape of the component, which offered the potential for transfer learning; i.e. conclusions drawn from one component to be carried across to a possibly different component. This assumption is intuitively reasonable for components that do not involve high levels of curvature, but its appropriateness for general components remains to be tested. To achieve this we exploited the SPDE framework. However, these was a practical limitation on the complexity of SPDE that could be reliably fit, since gradients of the likelihood required large dense matrices to be computed. As a result, it is possible that models that are too simple but whose parameters can be successfully optimised may be prefered to more parsimonious models whose parameters cannot easily be optimised. In other applications of surface metrology where generalisation across manifolds is not required, model fitting and model selection could be more easily achieved using the traditional covariance function framework (e.g. Rasmussen and Williams, 2006;Duvenaud, 2014;Bȃzȃvan et al., 2012;Oliva et al., 2016;Moeller et al., 2016).
The principal mathematical assumption that made transfer learning possible in this work was the identification of Laplacians across different manifolds, corresponding to the different components that can in principle be manufactured. This assumption (together with knowledge of the printing direction) enabled the meaning of a statistical model to be unambiguously interpreted across different manifolds, despite the fact that in general there are an infinitude of different geometries with which a given manifold can be equipped. Gaussian process models arise naturally in this context, since in specific cases they can be conveniently expressed in terms of the Laplacian via the SPDE framework. Alternative statistical models could also be considered: For example, the eigenfunctions {E k : k = 0, 1, . . . } of the negative Laplacian −∆ M form a basis for L 2 (M), and one can consider more general generative models for the random fields u 1 , u 2 ∈ L 2 (M) based on linear combinations of the E k with non-Gaussian coefficients (see e.g. Hosseini and Nigam, 2017;Sullivan, 2017). However, computational methods for non-Gaussian models remain relatively under-developed at present.
The scope of this investigation was limited to describing the geometric variation of 3Dprinted steel and there are many interesting questions still to be addressed. These include statistical modelling of the mechanical properties of the material, optimisation of the printing protocol to maximise the performance of (or reduce the variability in performance of) a given component, and adaptive monitoring and control of the manufacturing process to improve the quality of the printed component. These important problems will need to be collaboratively addressed; see Dodwell et al. (2021). The same printing protocol that we studied was recently used to construct an entire pedestrian bridge using 3D-printed steel (Zastrow, 2020). To ensure the safety of the bridge for pedestrians, extensive engineering assessment and testing were required. A long-term goal of this research is to enable such assessment to be partly automated in silico before the component is printed. This would in turn reduce the design and testing costs associated with components manufactured with 3D-printed steel.

Supplementary Materials
The electronic supplement contains a detailed description of how computation was performed, together with the Matlab R2019b code that was used.

A Supplementary Material
This appendix explains in detail how the computations described in the main text were performed. The calculations are essentially identical to those described in Lindgren et al. (2011);Hu et al. (2013a,b). However, the authors believe it is valuable to include full details since previous work has left elements of these calculations implicit. For instance, compared to the earlier work of Hu et al. (2013a,b), this paper is explicit about how Neumann boundary conditions are employed.
Appendix A.1 recalls how a general coupled system of SPDEs can be discretised using the classical finite element method. Then, in Appendix A.2 the model-specific aspects of the calculations are discussed. Technical details regarding the choice of the finite elements are contained in Appendix A.3 and are essentially identical to Appendix A.2 in Lindgren et al. (2011). Lastly, in Appendix A.4 the natural gradient method is developed in the SPDE context and associated computational challenges are discussed.

A.1 Discrete Representation of the Random Field
This section discusses how a coupled system of SPDEs, as in Equation (2) of the main text, can be discretised using a finite element method. Following Lindgren et al. (2011), a finite-dimensional approximation is used, where {ψ (l) k } n k=1 are basis functions to be specified and {e l } 2 l=1 is the standard basis for R 2 . The solution of Equation (2) is random and this randomness will be manifested in (10) through the weights w (l) k . Thus in construction of an approximation u n , a distribution for these weights must be identified such that u n is an accurate approximation of the random field. To this end, let f, g := f 1 , g 1 M + f 2 , g 2 M for f, g : M → R 2 and consider the Galerkin method, which selects the w (l) k in order that the equations that the test functions and the basis functions coincide and that the same set of basis functions is used to describe both the upper and the lower surface of the material. The system (11) fully determines the weights w

A.2 Calculations for Specific Model Components
This appendix provides formulae for the matrices K and C(z) −1 (c.f. Corollary 1). that are needed to discretise each of the twelve coupled systems of SPDEs considered. The form of K will be determined by the choice of differential operator L, and this is considered first, in Sections A.2.1-A.2.4. The distribution of z will be determined by the choice of noise process and this is considered second, in Sections A.2.5-A.2.7.

A.2.3 Isotropic Non-stationary
For the isotropic non-stationary model, the approximation that η (rs) and τ (rs) are locally constant over the support of any of the ψ (r) i was used. Thus, letting x (r) i denote an aribitrary point in the support of ψ Thus, in matrix format, K (rs) = τ (rs) (η (rs) C (rs) + G (rs) ).

A.2.4 Anisotropic Non-stationary
For the anisotropic non-stationary model, the approximation that η (rs) and τ (rs) are locally constant over the support of any of the ψ Thus, in matrix format, K (rs) = τ (rs) (η (rs) C (rs) +G (rs) ).

A.2.5 White Noise
For the white noise model, it follows immediately from the definition that, since φ Thus, in matrix format, C(z (r) ) = C (rr) for this noise model.

A.2.6 Smoother Noise
For the smoother noise model, observe that the system of SPDEs that governs the noise process, in Equation (7) of the main text, has the same generic form as the original system of SPDEs of interest. The same discretisation techniques described in Appendix A.1 can therefore be applied and an argument essentially identical to Proposition 1 establishes that C(z (r) ) −1 = (K (rr) ) (C (rr) ) −1 K (rr) for this noise model. Here K (r,s) is as in (14).
See Section 3.3 of Lindgren et al. (2011) for further detail.

A.3 Calculations for Finite Elements
The purpose of this technical appendix is to provide explicit formulae for the matrices C, G andG appearing in Appendix A.2. This is included merely for completeness and almost exactly follows Lindgren et al. (2011). First note that, for the matrix G appearing in isotropic models, one can use Green's first identity to obtain G (rs) ij For the more general formG(H) appearing in anisotropic models, based on the weighted Laplacian ∇ · H∇, one can apply the divergence theorem to the vector field F = ψ Thus computation of C, G andG amounts to computing particular inner products over the manifold M.
The finite element construction renders such computation straightforward by enforcing sparsity in these matrices, that in turn leads to sparsity of the matrix K. To construct finite elements the locations x i at which the data are defined (see main text) are used to produce a triangulation M (n) of size n that forms an approximation to M in R 3 . Let the vertices of the triangulation M (n) be denoted v 1 , . . . , v n . The basis functions ψ k were then taken to be the canonical piecewise linear finite elements associated with the triangulation M n , such that ψ k is centred at v k ∈ R 3 . Consider a triangle T = (v i , v j , v k ) and denote the edges e i = v k − v j , e j = v i − v k , e k = v j − v i . Let ·, · denote the standard inner product on R 3 and · its associated norm. Then the area of T is |T | = e i × e j /2. As explained in Appendix A2 of Lindgren et al. (2011), the matrices C, G andG are obtained from summing the contributions from each triangle; the contribution from triangle T is C {i,j,k}×{i,j,k} ← C {i,j,k}×{i,j,k} + |T | 12   2 1 1 1 2 1 1 1 2   G {i,j,k}×{i,j,k} ← G {i,j,k}×{i,j,k} + 1 4|T | e i e j e k e i e j e k G {i,j,k}×{i,j,k} ← G {i,j,k}×{i,j,k} + 1 4|T | e i e j e k adj(H) e i e j e k where adj(H) = det(H)H −1 is the adjugate matrix of H.
It was argued in Lindgren et al. (2011) that the approximation of C byC still ensures convergence of the approximation of u n to u in the limit n → ∞ as the mesh is refined.

A.4 Natural Gradient Ascent
This appendix described the natural gradient ascent method used to (approximately) maximise the likelihood. Let D be a dataset as described in the main text and let y ∈ R N be a vectorised representation of D following the block matrix convention of Proposition 1, so that N = 2n. Suppose that the parameters θ M of a model M have been vectorised so that θ M is represented by a vector θ ∈ R p . The log-likelihood of a model M , with precision matrix Q(θ) parametrised by θ, is (θ) := log p(D|θ M , M ) = − N 2 log(2π) + 1 2 log det Q(θ) − 1 2 y Q(θ)y.
The required derivatives ∂ θ i Q can be computed using matrix calculus to obtain the identities (letting ∂ = ∂ θ i denote differentiation with respect to a specific element θ i of θ) ∂Q = (∂K) C(z) −1 K − K C(z) −1 (∂C(z))C(z) −1 K + K C(z) −1 (∂K) ∂K (rs) = (∂τ (rs) )(η (rs) C (rs) +G (rs) ) + τ (rs) ((∂η (rs) )C (rs) + ∂G (rs) ) Here K is the most general form of the matrices variously called K, presented in (15). The natural gradient method can be viewed as a quasi-Newton method that, if it converges, converges to a local maximum of the likelihood. An anonymous Reviewer helpfully pointed out that common sparsity structure in Q and ∂ θ i Q can be exploited by using Takahashi recursions to circumvent computation of the full dense matrix Q −1 ∂ θ i Q in (16), since only the trace is required (Zammit-Mangion and Rougier, 2018). However, this sparsity exploit is not immediately applicable to the Fisher information matrix. Indeed, the natural gradient ascent method requires a product of two matrices of the form Q −1 ∂ θ i Q, which unfortunately will be dense in general. One solution is simply to use the usual Euclidean gradient instead, which corresponds to performing standard gradient descent. However, this carries the substantial disadvantage that a step-size parameter must be introduced and manually tuned. An alternative solution is to develop an approximation strategy to circumvent the dense matrix computations; this has been considered in Tajbakhsh et al. (2018). In the present context, natural gradient ascent is performed only on the manifold M 1 of the training dataset, whose notional flat geometry is easily meshed, and dense matrix computation is not required for subsequent predictions to be produced on the potentially large and complex test manifold M 2 . For this reason one may be willing to perform computation with dense matrices when fitting the training dataset.