Spatial Regression With Partial Differential Equation Regularisation

This work gives an overview of an innovative class of methods for the analysis of spatial and functional data observed over complicated two-dimensional domains. This class is based on regression with regularising terms involving partial differential equations. The associated estimation problems are solved by resorting to advanced numerical analysis techniques. The synergistic interplay of approaches from statistics, applied mathematics and engineering endows the methods with important advantages with respect to the available techniques, and makes them able to accurately deal with data structures for which the classical techniques are unfit. Spatial regression with differential regularisation is illustrated via applications to the analysis of eco-colour Doppler measurements of blood-flow velocity, and of functional magnetic resonance imaging signals associated with neural connectivity in the cerebral cortex.


Introduction
This work reviews an innovative class of methods for the analysis of data that display complicated spatial or spatio-temporal dependencies. The complexity in the structure of spatial or spatio-temporal variation may be due to different reasons. In some cases, the complexity originates from the complex physics of the phenomenon under study. As an illustrative application, we will discuss the study of the velocity field of blood flow in human arteries, with data from eco-colour Doppler and magnetic resonance imaging. In other cases, the complexity is due to an external source that generates strong anisotropies and non-stationarities in the observed quantity of interest; this is the case, for instance, of streams or prevailing winds in the analysis of environmental and climate data. In yet other cases, the complex spatial variation might be influenced by the complicated conformation of the domain where the data are observed. As an example, Figure 1 shows the spatial distribution of all the crimes reported in 2012 in the city of Portland, Oregon, USA, each crime location being indicated by a dot. The domain of interest for this application, that is, the territory of the municipality, is characterised by a strong concavity, generated by the presence of the Willamette river that cuts the city in two parts, connected by a few bridges downtown. As shown in the figure, the variation of the phenomenon is not smooth across the river. This is particularly apparent in the north of the municipality, where the high criminality in the East of the river is opposed to a low criminality in the West of the river; this sharp difference cannot be fully explained by population density or other census information.
For this reason, when analysing these data, it is important to also take appropriately into account the domain conformation. In other applied problems, the domain is a two-dimensional manifold. Data distributed over non-planar domains with complicated geometries are in fact nowadays common in varied contexts. Figure 2 illustrates an example from the neurosciences, concerning the study of high-dimensional neuroimaging signals associated with neuronal activity in the cerebral cortex, a highly convoluted thin sheet of neural tissue that constitutes the outermost part of the brain, and where most neural activity is focused; see the left panel in Figure 2. Due to the highly convoluted nature of the cortex, functionally distinct areas that are far apart along the cortex may in turn be close in three-dimensional space. For this reason, neglecting the morphology of the cortex may lead to totally inaccurate estimates. Moreover, it is often crucial to comply with specific conditions that characterise the phenomenon at the boundaries of the spatial domain. As an example, in the already mentioned study of the velocity of blood flow, the domain of interest is the quasi-circular section of an artery, and there are physiological constraints at the boundary of this domain: the blood flow is confined within the artery, and its velocity is zero at the arterial wall, due to the friction between the blood particles and the wall.
This work focuses on a specific class of models we have recently been developing (Sangalli et al., 2013; Azzimonti et al., 2014, 2015; Ettinger et al., 2016; Lila et al., 2016; Bernardi et al., 2017, 2018; Arnone et al., 2019), that we name spatial regression with partial differential equation (SR-PDE) regularisation. These are regression methods with regularisation terms that involve a partial differential equation (PDE). PDEs are extensively employed in most fields of science and engineering to model the behaviour of complex phenomena. Their inclusion as regularising terms in a spatial regression model establishes a novel way of defining spatial variation, offering an important alternative to the classical paradigm in spatial data analysis. The differential regularisation permits modelling the spatial and spatio-temporal variation in a highly rich and flexible manner, that can naturally render varied forms of anisotropy and non-stationarity, and that can incorporate problem-specific information on the phenomenon under study. The remainder of this section briefly reviews related areas of statistics, setting the background for the proposed models, and points to other recent proposals to handle data over complicated domains.

Spatial Data Analysis
Spatial data analysis is one of the main branches of statistics and has played a leading role in the development of models and methods for the analysis of spatially (and spatio-temporally) distributed data, with a vast literature (see, e.g., the textbooks Cressie, 2015; Cressie & Wikle, 2011; Diggle & Ribeiro, 2007) and extensive software (see, e.g., Bivand et al., 2013). Most approaches to spatial data analysis are designed for data scattered over regular domains. See Section 2.4 for some proposals that try to overcome this limitation.
In the classical spatial data analysis approach, the unknown spatial field is assumed to be stochastic and the covariance of the stochastic field is used to model the second-order spatial dependence of the phenomenon under study. Instead, in the proposed SR-PDE, the unknown spatial field is assumed to be deterministic and the spatial structure of the phenomenon is modelled via the PDE in the regularising term.

Functional Data Analysis
Functional data analysis is concerned with samples whose atoms are noisy measurements of functions (see, e.g., Ramsay & Silverman, 2005; Ferraty & Vieu, 2006; Kokoszka & Reimherr, 2017). The connection between the proposed class of models and functional data analysis is made explicit in Section 8, which reviews a functional principal component method for samples of functions over complicated two-dimensional domains, based on SR-PDE. More generally, the use of differential regularisations is widespread in functional data analysis. In particular, Ramsay & Hooker (2017) describe approaches similar to the one proposed here, but in simpler one-dimensional settings and using ordinary differential equations.
Moreover, there is a very active area of research across spatial data analysis and functional data analysis. Some classical spatial data analysis methods such as kriging have been extended to deal with functional data (see, e.g., the reviews and collections in Delicado et al., 2010;Mateu & Romano, 2017;Menafoglio & Secchi, 2017;Martínez-Hernández & Genton, 2020). SR-PDE contributes to the literature on spatial functional data analysis by proposing innovative approaches for the study of spatially distributed data, that are influenced by the literature on functional data analysis. Moreover, the method extension discussed in Section 6.3 explicitly targets spatially dependent functional data.

Non-parametric and Semiparametric Models With Quadratic Differential Regularisation
Various forms of nonparametric smoothers and of semiparametric models with quadratic differential regularisation have been proposed in the literature, and their properties have been thoroughly studied in a well-established literature (see, e.g., the textbooks by Green & Silverman, 1994; Eubank, 1999; Bickel et al., 1998; Härdle et al., 2004, and references therein). With respect to these models, SR-PDE features more general regularising terms, which can involve linear second-order PDEs with space-varying second-order, first-order and zero-order differential operators, and space-varying forcing terms, instead of the simple differential operators, constant over space, typically considered by classical smoothers and semiparametric models. Moreover, as emphasised in Section 1, SR-PDE considers spatial domains with complex shapes, both planar and curved, and complies with general forms of boundary conditions. Such high flexibility comes at the price of a higher analytic complexity of the proposed models. Differently from classical smoothers and semiparametric models with differential regularisation, the solution to the estimation problem in SR-PDE is not available in closed form, and advanced numerical analysis techniques are needed to obtain approximate solutions.

Data Over Complex Domains
The problem of handling data over non-trivial planar domains, appropriately taking into account the shape of the domain, has recently attracted increasing interest, and some techniques have been proposed to tackle this issue. Regularised least-square smoothing methods that can handle data on complex planar domains include: FELsplines (Ramsay, 2002); soap film smoothing (Wood et al., 2008); bivariate splines over triangulations (see, e.g., Lai & Schumaker, 2007; Guillas & Lai, 2010; Ettinger et al., 2012; Lai & Wang, 2013); low-rank thin-plate spline approximations (Wang & Ranalli, 2007; Scott-Hayward et al., 2014); and SR-PDE (see, e.g., Sangalli et al., 2013; Azzimonti et al., 2015; Bernardi et al., 2017), which in turn expands the methods introduced by Ramsay (2002). Within the kriging framework, some proposals to address data distributed over complex planar domains have been advanced by Zhang et al. (2007) and by Menafoglio et al. (2018, 2021). Moreover, the stochastic PDE approach introduced by Lindgren et al. (2011) can handle data over irregularly shaped domains, and the same holds for the intrinsic Gaussian processes in Niu et al. (2019). Soap film smoothing (Wood et al., 2008), the stochastic PDE approach (Bakka et al., 2019) and the kriging technique in Zhang et al. (2007) comply with some boundary conditions. As detailed in Section 3.2, SR-PDE can handle general forms of boundary conditions (see, e.g., Azzimonti et al., 2014).

Spatial Regression With Differential Regularisation
This section introduces SR-PDE in its most basic formulation, while subsequent sections present various extensions. Consider $n$ locations $p_1, \dots, p_n$ over a spatial domain $\Omega \subset \mathbb{R}^2$, with boundary $\partial\Omega \in C^2$. At location $p_i$, we observe a variable of interest $z_i \in \mathbb{R}$, and possibly also a set of covariates $w_i \in \mathbb{R}^q$. We consider the following semiparametric generalised additive model:
$$z_i = w_i^\top \beta + f(p_i) + \varepsilon_i, \qquad i = 1, \dots, n, \qquad (1)$$
where $\beta \in \mathbb{R}^q$ is an unknown vector of regression coefficients, that describes the effect of the covariates on the mean of the variable of interest, $f: \Omega \to \mathbb{R}$ is an unknown deterministic field, that captures the spatial structure of the phenomenon under study, and $\varepsilon_1, \dots, \varepsilon_n$ are uncorrelated errors, with zero mean and finite variance. In Sangalli et al. (2013), we propose to estimate the unknown $\beta$ and $f$ by minimising the regularised sum-of-square-error functional
$$\sum_{i=1}^n \left( z_i - w_i^\top \beta - f(p_i) \right)^2 + \lambda \int_\Omega (\Delta f)^2 \, dp, \qquad (2)$$
where $\lambda$ is a positive smoothing parameter and $\Delta$ denotes the Laplace operator. To define this partial differential operator, consider a generic function $f: \Omega \to \mathbb{R}$. Denote the gradient of $f$ by $\nabla f = (\partial f/\partial p_1, \partial f/\partial p_2)^\top$, where $\top$ is the transpose operator. Moreover, given a vector field $\mathbf{f} = (f_1(p), f_2(p))^\top : \Omega \to \mathbb{R}^2$, define the divergence of the vector field as
$$\operatorname{div} \mathbf{f}(p) := \frac{\partial f_1}{\partial p_1}(p) + \frac{\partial f_2}{\partial p_2}(p).$$
Then, the Laplacian of $f$ is defined as $\Delta f := \operatorname{div}(\nabla f) = \partial^2 f/\partial p_1^2 + \partial^2 f/\partial p_2^2$. The Laplace operator provides a simple measure of the local curvature of the field $f$. Higher values of the smoothing parameter $\lambda$ result in smoother estimates of $f$, while lower values of $\lambda$ lead to more data-adapted estimates. It should be noticed that the Laplacian is invariant with respect to rigid transformations (rotations, translations and reflections) of the spatial coordinates; this ensures that the resulting smoothness of the estimate is independent of the arbitrary orientation of the coordinate system.
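As an illustration of the trade-off governed by $\lambda$ in (2), the following toy sketch (ours, not from the paper) solves a regularised least-squares problem on a one-dimensional grid, with a second-difference penalty standing in for the integrated squared Laplacian; the actual SR-PDE method instead discretises the two-dimensional problem with finite elements.

```python
import numpy as np

# Toy illustration of regularised least squares as in functional (2):
# minimise ||z - f||^2 + lam * f' P f, where P is a discrete roughness
# penalty (here a 1-D second-difference penalty, a stand-in for the
# integrated squared Laplacian).

def second_difference_penalty(n):
    """P = D'D, with D the second-difference (discrete curvature) matrix."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    return D.T @ D

def smooth(z, lam):
    """Closed-form minimiser: f_hat = (I + lam * P)^{-1} z."""
    n = len(z)
    P = second_difference_penalty(n)
    return np.linalg.solve(np.eye(n) + lam * P, z)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
z = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(50)
f_rough = smooth(z, lam=1e-4)   # small lambda: close to the data
f_smooth = smooth(z, lam=1e2)   # large lambda: nearly flat curvature
```

Higher values of `lam` yield estimates with smaller discrete curvature but larger residual sum of squares, mirroring the role of $\lambda$ in (2).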
The functional (2) is well defined for $\beta \in \mathbb{R}^q$ and $f \in H^2(\Omega)$, where $H^2(\Omega)$ is the Sobolev space of functions $h: \Omega \to \mathbb{R}$ such that $h$ and its first and second derivatives are in $L^2(\Omega)$; see, for example, Rudin (1991).
When covariates are not available, model (1) is replaced by $z_i = f(p_i) + \varepsilon_i$, $i = 1, \dots, n$, and $f$ can be estimated by minimising (2), or another of the estimation functionals considered in the following sections, but omitting $w_i^\top \beta$ in the least-square term. In the following, we consider the more general model that includes covariates, briefly commenting on simplifications for the model without covariates.

Modelling Problem-Specific Information via a Partial Differential Equation
Assume that some problem-specific information is available, coming for instance from the physics, mechanics, chemistry or morphology of the phenomenon under study, and that this information can be formalised in terms of a PDE $Lf = u$. Specifically, on the basis of the problem-specific information, we assume that the misfit $Lf - u$ is small, although we do not require it to be null. Then, it makes sense to estimate the unknown $\beta$ and $f$ by minimising the functional
$$\sum_{i=1}^n \left( z_i - w_i^\top \beta - f(p_i) \right)^2 + \lambda \int_\Omega (Lf - u)^2 \, dp, \qquad (4)$$
that trades off a data-fidelity criterion, the least-square term, and a model-fidelity criterion, the misfit with respect to the PDE (see Azzimonti et al., 2014, 2015). The regularising term enables a very rich modelling of spatial variation. In particular, $L$ is a general linear differential operator, including, for instance, second-order, first-order and zero-order differential operators. Consider a symmetric and positive definite matrix $K = \{K_{ij}\} \in \mathbb{R}^{2 \times 2}$, named diffusion tensor, a vector $b = \{b_j\} \in \mathbb{R}^2$, named transport vector, and a positive scalar $c \in \mathbb{R}^+$, named reaction term. The differential operator $L$ can include a second-order differential operator such as the divergence of the gradient, $\operatorname{div}(K \nabla f)$; a first-order differential operator involving the gradient, $b \cdot \nabla f$; and a zero-order differential operator such as $cf$. We thus consider general differential operators of the form
$$Lf = -\operatorname{div}(K \nabla f) + b \cdot \nabla f + cf. \qquad (5)$$
Furthermore, $K$, $b$ and $c$ can vary over space, that is, $K = K(p)$, $b = b(p)$ and $c = c(p)$. The three operators composing (5) model various forms of anisotropy and non-stationarity. The diffusion term $\operatorname{div}(K \nabla f)$ produces a smoothing in all the directions. When $K$ is a multiple of the identity matrix $I$, this term induces an isotropic smoothing effect; otherwise, it induces an anisotropic smoothing with a preferential direction identified by the first eigenvector of $K$, and a degree of anisotropy controlled by the ratio between the two eigenvalues of $K$.
Figure 3 visualises the diffusion term via ellipses whose axes are oriented according to the eigenvectors of $K$ and with lengths proportional to the corresponding eigenvalues: panel (a) visualises an isotropic and stationary diffusion; panels (b) and (c) provide two examples of anisotropic diffusion, with different directions and intensities of the anisotropy; panel (d) shows a non-stationary isotropic case and panel (e) a non-stationary anisotropic case. The transport term $b \cdot \nabla f$ produces a smoothing towards a unique direction, identified by $b$. This is visualised in Figure 3: panels (f) and (g) display two transport fields with different directions and intensities; panel (h) presents a non-stationary transport field. The reaction term $cf$ induces shrinkage of the field towards zero; this effect can as well be non-stationary.
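A diffusion tensor with a given preferential direction and degree of anisotropy can be constructed from a rotation and an eigenvalue ratio; the following sketch (an illustrative parametrisation of ours, not part of the method's specification) makes this explicit.

```python
import numpy as np

# Sketch: build a symmetric positive definite diffusion tensor K whose
# first eigenvector points at angle `theta` and whose eigenvalue ratio
# `ratio` controls the degree of anisotropy (ratio = 1: isotropic).

def diffusion_tensor(theta, ratio):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ np.diag([ratio, 1.0]) @ R.T

K_iso = diffusion_tensor(0.0, 1.0)           # isotropic: K = I
K_aniso = diffusion_tensor(np.pi / 4, 10.0)  # strong smoothing along 45 degrees

# The leading eigenvector identifies the preferential smoothing
# direction; the eigenvalue ratio the intensity of the anisotropy.
vals, vecs = np.linalg.eigh(K_aniso)
```

Such a construction corresponds to the ellipses of Figure 3: the eigenvectors give the ellipse axes, the eigenvalues their lengths.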
The forcing term $u \in L^2(\Omega)$ can either be a null function, $u = 0$ (so-called homogeneous case), or $u \neq 0$ (non-homogeneous case), further increasing the flexibility in the modelling of spatial variation. In the following sections, for simplicity of exposition, we consider the homogeneous case, and only briefly comment on the changes required by non-homogeneous forcing terms. We refer the reader to Azzimonti et al. (2014, 2015) and Arnone et al. (2019) for details on handling non-homogeneous forcing terms.
For $K = I$, $b = 0$, $c = 0$ and $u = 0$, we get the special case corresponding to (2), where the regularisation with the Laplacian $\Delta f$ implies a stationary and isotropic smoothing.
It should be pointed out that we are not here interested in looking for the PDE solution that is closest to the data, as done for instance with an analogous approach by Xun et al. (2013). In fact, we do not make the assumption that the unknown field satisfies the PDE. Rather, we use the PDE as a powerful tool to model spatial variation, as described above.

Boundary Conditions
When the domain has a boundary $\partial\Omega$, there can be problem-specific conditions that $f$ satisfies at $\partial\Omega$. SR-PDE handles different types of boundary conditions, including Dirichlet conditions, which fix the value of $f$ at $\partial\Omega$; Neumann conditions, which fix the value of its normal derivative, hence governing the flow across $\partial\Omega$; and Robin conditions, which are linear combinations of the above conditions. Moreover, different conditions may be imposed on different portions, $\Gamma_D$, $\Gamma_N$, $\Gamma_R$, of $\partial\Omega$, yielding a partition of the boundary. Denoting by $\nu$ the outward unit normal vector to $\partial\Omega$, we can summarise the admissible boundary conditions as
$$\begin{cases} f = \gamma_D & \text{on } \Gamma_D, \\ K \nabla f \cdot \nu = \gamma_N & \text{on } \Gamma_N, \\ K \nabla f \cdot \nu + \chi f = \gamma_R & \text{on } \Gamma_R, \end{cases}$$
where $\chi$ is a given coefficient function, and where the functions $\gamma_D$, $\gamma_N$ and $\gamma_R$ and the portions of the boundary $\Gamma_D$, $\Gamma_N$ and $\Gamma_R$ must satisfy some regularity conditions to ensure that the estimation functional is well defined (Azzimonti et al., 2014). The boundary conditions are said to be homogeneous when the functions $\gamma_D$, $\gamma_N$ and $\gamma_R$ are constant zero functions. For simplicity of exposition, in the following sections we consider homogeneous Dirichlet or Neumann conditions; see Azzimonti et al. (2014) for details on how to handle other boundary conditions.
In several applications, the ability to comply with specific boundary conditions is crucial to obtain meaningful estimates. An illustrative example in this respect is given in Section 7.

Characterisation of the Solution to the Estimation Problem
The functionals in (2) and (4) are well defined for $\beta \in \mathbb{R}^q$ and $f \in H^2(\Omega)$. Moreover, when the domain is bounded, the use of appropriate boundary conditions guarantees the uniqueness of the solution (see, e.g., Sangalli et al., 2013; Azzimonti et al., 2014, for details). We denote by $V(\Omega)$ the subspace of $H^2(\Omega)$ of functions satisfying the chosen boundary conditions. Set $z := (z_1, \dots, z_n)^\top$, the vector of observed data values, and, for any function $v: \Omega \to \mathbb{R}$, set $v_n := (v(p_1), \dots, v(p_n))^\top$, the vector of evaluations of $v$ at the $n$ spatial locations. Moreover, if covariates are present, let $W$ denote the $n \times q$ matrix having the $q$ covariates $w_i^\top$ observed at $p_i$ in the $i$-th row, and assume $W$ has full rank. Finally, set $Q := I - W(W^\top W)^{-1} W^\top$, the matrix that projects orthogonally onto the orthogonal complement in $\mathbb{R}^n$ of the subspace spanned by the columns of $W$.
The following proposition characterises the solution to the estimation problem, under homogeneous boundary conditions and forcing terms (see Azzimonti et al., 2014 for non-homogeneous boundary conditions and forcing terms).
Proposition 1. There exists a unique pair of estimators $(\hat\beta \in \mathbb{R}^q, \hat f \in V(\Omega))$ that minimises the estimation functional. Moreover, $\hat\beta = (W^\top W)^{-1} W^\top (z - \hat f_n)$, while $\hat f$ is characterised as the solution of a fourth-order differential problem, referred to as (7) in the following. In the model without covariates, $\hat f$ satisfies a problem like (7), but where $Q$ does not appear (or, equivalently, it is replaced by the identity matrix).

Numerical Solution of the Estimation Problem
The fourth-order differential problem (7) cannot be solved analytically. We hence use advanced numerical discretisation procedures. Here, we briefly review the discretisation based on a mixed finite element formulation. An alternative discretisation, based on isogeometric analysis with non-uniform rational B-splines, is described in the related literature.
The numerical discretisation reduces the estimation problem to the solution of a linear system. The spatial domain $\Omega$ is represented by an appropriate mesh, and the functions over $\Omega$ are approximated by a finite system of bases defined on this mesh. This makes it possible to consider spatial domains with complex shapes.

Construction of the Mesh
Convenient meshes of the spatial domain are obtained by constrained Delaunay triangulation; this is a generalisation of the Delaunay triangulation (see, e.g., Hjelle & Daehlen, 2006) that enables the definition of the boundary of the domain, forcing the required segments into the triangulation. The triangulated domain is denoted by $\Omega_{\mathcal{T}}$, where $\mathcal{T}$ is the set of all the triangles. It is important to note that the mesh nodes can differ from the data locations.
The R package fdaPDE (Lila et al., 2020) provides functions to construct the mesh, such as create.mesh.2D, which creates the mesh starting from a set of points to be used as triangle vertices and/or a set of edges to be used as boundary. The package also contains functions to refine the mesh, such as refine.mesh.2D, whose options allow the specification of the maximal allowed triangle area and/or the minimal allowed triangle angle. The right panel of Figure 1 shows a mesh of the municipality of Portland. The triangulation has been obtained starting from the boundary and refining it according to criteria of maximal allowed triangle area and minimal allowed triangle angle. The mesh represents this complicated domain very well, accurately rendering the Willamette river, which cuts through the city, and other detailed features. The package fdaPDE also handles non-planar triangular meshes, such as the one in the left panel of Figure 2, which represents the left hemisphere of the cerebral cortex of a template brain.
In general, the mesh should be fine enough to capture the features in the signal. For medium sample sizes, a regular mesh having a number of nodes higher than the number of data, but of the same order of magnitude, typically works efficiently. Triangles having too acute angles should be avoided (setting, for instance, a minimum angle of 25 degrees in the refinement function), as they may be associated with numerical instability. For data displaying highly localised features, it may be convenient to consider data-driven meshes, constructed starting from (a subsample of) the data locations. Such data-driven meshes make it possible to capture strongly localised features, while being parsimonious (i.e., using a limited number of mesh nodes) and thus requiring a lower computational cost.
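The mesh-quality criterion mentioned above (avoiding overly acute triangles) can be checked programmatically. The sketch below (ours) uses SciPy's unconstrained Delaunay triangulation as a stand-in: fdaPDE's create.mesh.2D additionally supports constrained triangulations that force boundary segments into the mesh.

```python
import numpy as np
from scipy.spatial import Delaunay

# Build a toy triangulation of random data locations and compute its
# smallest interior angle, a simple mesh-quality diagnostic.

rng = np.random.default_rng(1)
pts = rng.random((200, 2))     # data locations in the unit square
tri = Delaunay(pts)

def min_angle_deg(points, simplices):
    """Smallest interior angle over all triangles, in degrees."""
    worst = 180.0
    for s in simplices:
        a, b, c = points[s]
        for p, q, r in [(a, b, c), (b, c, a), (c, a, b)]:
            u, v = q - p, r - p
            cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
            ang = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
            worst = min(worst, ang)
    return worst

# Triangles with angles below roughly 25 degrees may cause numerical
# instability and would call for mesh refinement.
quality = min_angle_deg(pts, tri.simplices)
```

A refinement step would then split or retriangulate the offending triangles until the minimum-angle threshold is met.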

Finite Elements
Finite elements provide bases for globally continuous, piecewise polynomial functions; see, for example, Ciarlet (2002) and Quarteroni (2017) for an introduction to finite elements. The finite element space $V^r_{\mathcal{T}}(\Omega)$, with $r \in \{1, 2, \dots\}$, is defined as the space of continuous functions over $\Omega_{\mathcal{T}}$ that are polynomials of degree $r$ once restricted to any triangle in $\mathcal{T}$. It is convenient to introduce the nodes of the triangulation, $\xi_1, \dots, \xi_{N_{\mathcal{T}}}$. In the case of linear finite elements, the nodes are given by the vertices of the triangles in $\mathcal{T}$. In the case of quadratic finite elements, the nodes also include the middle points of the triangle edges. For higher orders $r$, the nodes include other specific points of the triangles as well. It is then possible to define a set of $N_{\mathcal{T}}$ basis functions $\psi_1, \dots, \psi_{N_{\mathcal{T}}}$, associated with these nodes and spanning the finite element space. In particular, for each node $\xi_j$, $j \in \{1, \dots, N_{\mathcal{T}}\}$, the basis $\psi_j$ is defined as a locally supported piecewise polynomial function of the desired order $r$, taking value 1 at the node $\xi_j$ and value 0 at all other nodes, that is, $\psi_j(\xi_k) = 1$ if $j = k$ and $\psi_j(\xi_k) = 0$ otherwise. This Lagrangian property is computationally very convenient. Homogeneous boundary conditions on $\Gamma_D$ can be simply enforced by discarding the finite element bases associated with nodes in $\Gamma_D$. The function create.FEM.basis of fdaPDE constructs a finite element basis of the desired order (linear and quadratic finite elements are currently implemented), requiring the mesh as input.
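On a single triangle, the linear ($r = 1$) basis functions are the barycentric coordinates, and the Lagrangian property $\psi_j(\xi_k) = \delta_{jk}$ can be verified directly; the following short sketch (ours) evaluates them by solving a small linear system.

```python
import numpy as np

# Sketch of linear (r = 1) finite element basis functions on one
# triangle: the three "hat" functions are the barycentric coordinates,
# each taking value 1 at its own vertex and 0 at the other two.

def linear_fe_basis(vertices, p):
    """Evaluate the three linear basis functions at point p.

    `vertices` is a 3x2 array of triangle vertices (the nodes). The
    barycentric coordinates solve the 3x3 system enforcing affine
    reproduction: sum_j psi_j = 1 and sum_j psi_j * vertex_j = p.
    """
    A = np.vstack([np.ones(3), vertices.T])  # rows: 1, x, y
    rhs = np.array([1.0, p[0], p[1]])
    return np.linalg.solve(A, rhs)

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Lagrangian property: psi_j(node_k) = 1 if j == k, else 0
vals = np.array([linear_fe_basis(tri, v) for v in tri])
# Partition of unity at an interior point
interior = linear_fe_basis(tri, np.array([0.2, 0.3]))
```

Because each basis function vanishes outside the triangles sharing its node, assembled matrices of cross-products are highly sparse, which is exploited in the next subsection.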

Solution of the Estimation Problem Using Finite Elements
Denote by $\Psi$ the $n \times N_{\mathcal{T}}$ matrix having as entries the evaluations of the $N_{\mathcal{T}}$ basis functions at the $n$ data locations, $\Psi = (\psi_j(p_i))_{ij}$, so that, for a function $v$ in the finite element space with vector of basis coefficients $\mathbf{v}$, $v_n = \Psi \mathbf{v}$. Moreover, let $R_0$ and $R_1$ be the $N_{\mathcal{T}} \times N_{\mathcal{T}}$ matrices arising from the finite element discretisation: $R_0$ is the mass matrix, with entries $\int_\Omega \psi_j \psi_k$, and $R_1$ is the matrix discretising the differential operator in the regularising term (in the case of Laplacian regularisation, the stiffness matrix with entries $\int_\Omega \nabla\psi_j^\top \nabla\psi_k$). Proposition 2 states that, with the introduction of the finite element space, the estimation problem becomes a linear system.
Proposition 2. There exists a unique pair of estimators $(\hat\beta \in \mathbb{R}^q, \hat f \in V^r_{\mathcal{T}}(\Omega))$ which solve the discrete counterpart of the estimation problem. Moreover, $\hat\beta = (W^\top W)^{-1} W^\top (z - \Psi \hat{\mathbf{f}})$, and the vector $\hat{\mathbf{f}}$ of basis coefficients of $\hat f$ solves, together with the vector $\hat{\mathbf{g}}$ of basis coefficients of the auxiliary function introduced by the mixed formulation, the linear system
$$\begin{pmatrix} \Psi^\top Q \Psi & \lambda R_1^\top \\ R_1 & -R_0 \end{pmatrix} \begin{pmatrix} \hat{\mathbf{f}} \\ \hat{\mathbf{g}} \end{pmatrix} = \begin{pmatrix} \Psi^\top Q z \\ 0 \end{pmatrix}. \qquad (8)$$
The system (8) is of order $2N_{\mathcal{T}}$, so it can be very large. On the other hand, it is highly sparse: $R_0$ and $R_1$ are in fact highly sparse matrices, because most of the cross-products of the basis functions and of their partial derivatives are null, due to the local support of the bases. As an example, the triangulation of the municipality of Portland in Figure 1 has 492 nodes, and only about 2% of the entries of $R_0$ and $R_1$ are non-zero. For this reason, the solution of (8) is fast.
From Proposition 2, we get that
$$\hat{\mathbf{f}} = (\Psi^\top Q \Psi + \lambda P)^{-1} \Psi^\top Q z, \qquad (9)$$
where the positive definite matrix $P := R_1^\top R_0^{-1} R_1$ discretises the regularising term in (4). This approximation of the regularisation only involves first-order derivatives. In fact, in the mixed finite element approach here considered, the fourth-order problem (7) is rewritten as a system of two coupled second-order problems; integrating by parts against test functions, these second-order problems are then reformulated in a version that only involves first-order derivatives. See, for example, Azzimonti et al. (2015) for details.
For non-homogeneous forcing terms $u \in L^2(\Omega)$, $u \neq 0$, the vector $0$ in the right-hand side of (8) is replaced by $\mathbf{u}$, where $\mathbf{u}$ is the discretisation of the forcing term $u$, that is, $\mathbf{u} = (u(\xi_1), \dots, u(\xi_{N_{\mathcal{T}}}))^\top$. For the model without covariates, $\hat{\mathbf{f}}$ is as in (8) and (9) with $Q = I$. SR-PDE estimates can be obtained by the function smooth.FEM of the package fdaPDE (Lila et al., 2020). The option PDE.parameters of this function permits the specification of the parameters $K$, $b$, $c$ and $u$ of the PDE in the regularising term; its default value corresponds to regularisation with the Laplace operator.
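The equivalence between the coupled mixed formulation and the reduced problem with penalty matrix $P = R_1^\top R_0^{-1} R_1$ can be checked numerically. The sketch below (ours) uses random stand-in matrices in place of the assembled finite element matrices, and an assumed block layout coupling the field coefficients with an auxiliary unknown; it is an illustration of the algebra, not the fdaPDE implementation.

```python
import numpy as np

# Verify on stand-in matrices that the coupled (mixed) system and the
# reduced system with P = R1' R0^{-1} R1 give the same field coefficients.

rng = np.random.default_rng(2)
n, N, lam = 30, 12, 0.5

Psi = rng.standard_normal((n, N))   # basis evaluations at data locations
W = rng.standard_normal((n, 2))     # covariates
Q = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)  # projection matrix
z = rng.standard_normal(n)

M = rng.standard_normal((N, N))
R0 = M @ M.T + N * np.eye(N)        # s.p.d. stand-in for the mass matrix
R1 = rng.standard_normal((N, N))    # stand-in for the stiffness matrix

# Reduced system: (Psi' Q Psi + lam * P) f = Psi' Q z
P = R1.T @ np.linalg.solve(R0, R1)
f_reduced = np.linalg.solve(Psi.T @ Q @ Psi + lam * P, Psi.T @ Q @ z)

# Coupled system in the pair (f, g), g being the auxiliary unknown:
# [Psi'QPsi  lam*R1'] [f]   [Psi'Qz]
# [R1        -R0    ] [g] = [0     ]
A = np.block([[Psi.T @ Q @ Psi, lam * R1.T],
              [R1, -R0]])
sol = np.linalg.solve(A, np.concatenate([Psi.T @ Q @ z, np.zeros(N)]))
f_coupled = sol[:N]
```

Eliminating the auxiliary unknown from the second block row recovers the reduced system, which is why the two solutions coincide; in practice the coupled system is preferred because it preserves sparsity.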

Properties of the Estimators
The estimators $\hat\beta$ and $\hat f$ are linear in the observed data values $z$. Let $S$ denote the $n \times n$ matrix $S := \Psi (\Psi^\top Q \Psi + \lambda P)^{-1} \Psi^\top Q$. Then $\hat\beta = (W^\top W)^{-1} W^\top (I - S) z$ and $\hat f_n = S z$. Assuming that the random errors $\varepsilon_1, \dots, \varepsilon_n$ in model (1) are uncorrelated, with zero mean and finite constant variance $\sigma^2$, we can readily derive
$$\mathrm{E}(\hat\beta) = \beta + (W^\top W)^{-1} W^\top (I - S) f_n, \quad \mathrm{Var}(\hat\beta) = \sigma^2 (W^\top W)^{-1} W^\top (I - S)(I - S)^\top W (W^\top W)^{-1}. \qquad (10)$$
Moreover, consider the field estimator $\hat f$ at the generic location $p \in \Omega$:
$$\hat f(p) = \psi(p)^\top (\Psi^\top Q \Psi + \lambda P)^{-1} \Psi^\top Q z,$$
where $\psi(p) := (\psi_1(p), \dots, \psi_{N_{\mathcal{T}}}(p))^\top$. Its mean and variance/covariance are given by
$$\mathrm{E}\{\hat f(p)\} = \psi(p)^\top (\Psi^\top Q \Psi + \lambda P)^{-1} \Psi^\top Q f_n \qquad (11)$$
and
$$\mathrm{Cov}\{\hat f(p_1), \hat f(p_2)\} = \sigma^2 \, \psi(p_1)^\top (\Psi^\top Q \Psi + \lambda P)^{-1} \Psi^\top Q \Psi (\Psi^\top Q \Psi + \lambda P)^{-1} \psi(p_2), \qquad (12)$$
for any $p, p_1, p_2 \in \Omega$. As highlighted by the expressions above, the first-order structure of the field estimator, the mean of $\hat f$, as well as its second-order structure, the covariance of $\hat f$, depend on the regularisation being considered.
A new observation at $p_{n+1}$, having covariates $w_{n+1}$, can be predicted by $\hat z_{n+1} = w_{n+1}^\top \hat\beta + \hat f(p_{n+1})$, and the mean and variance of this estimator can be analogously derived. The vector of fitted values at the $n$ data locations, $\hat z = W \hat\beta + \hat f_n = (I - Q + Q S) z$, is also linear in the observations $z$. The trace of the linear operator that maps the observed to the fitted values, $\mathrm{tr}(I - Q + Q S) = q + \mathrm{tr}(S)$, can be taken as the equivalent degrees of freedom of the model (see, e.g., Buja et al., 1989). The equivalent degrees of freedom are thus given by the sum of $q$, the number of covariates, and $\mathrm{tr}(S)$, the equivalent degrees of freedom of the field estimator. An estimator of $\sigma^2$ is given by
$$\hat\sigma^2 = \frac{1}{n - (q + \mathrm{tr}(S))} \, (z - \hat z)^\top (z - \hat z).$$
It is possible to automatically select the smoothing parameter $\lambda$ using, for instance, k-fold cross-validation or generalised cross-validation (Craven & Wahba, 1978), the latter minimising
$$\mathrm{GCV}(\lambda) = \frac{n \, (z - \hat z)^\top (z - \hat z)}{\left( n - (q + \mathrm{tr}(S)) \right)^2}.$$
Expressions (10), (11) and (12) point out that the estimators are biased. Specifically, $\hat\beta$ is biased as a result of the bias affecting $\hat f$. The bias of $\hat f$ in turn has two sources: the discretisation and the regularisation. The bias due to the discretisation is common to any non-parametric estimator that employs a basis expansion. The bias due to the regularisation is common to any penalised regression estimator, and affects also the infinite-dimensional estimators in Proposition 1, not only their discrete counterparts: unless the true function $f$ satisfies exactly the PDE in the regularising term (hence annihilating the penalty term), this term induces in fact a bias. Simulations show that this bias might be appreciable in small sample scenarios, especially under particularly irregular sampling designs. Azzimonti et al. (2014), considering for simplicity an SR-PDE model without covariates, investigate the convergence of the bias of the finite element estimator with respect to the size of the triangulation. Arnone et al. (2021), again considering a model without covariates, and under other simplifying hypotheses, derive some preliminary results on the consistency of the methods.
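The selection of the smoothing parameter by generalised cross-validation can be illustrated on the toy one-dimensional smoother (ours, not fdaPDE); the same recipe applies to the SR-PDE smoother matrix $S$.

```python
import numpy as np

# Illustrative GCV selection on a toy 1-D smoother without covariates:
# GCV(lam) = n * ||z - z_hat||^2 / (n - tr(S))^2,
# with S the matrix mapping observations to fitted values and tr(S)
# the equivalent degrees of freedom.

def smoother_matrix(n, lam):
    """S = (I + lam * P)^{-1}, with P a second-difference penalty."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    return np.linalg.inv(np.eye(n) + lam * (D.T @ D))

def gcv(z, lam):
    n = len(z)
    S = smoother_matrix(n, lam)
    resid = z - S @ z
    edf = np.trace(S)   # equivalent degrees of freedom of the smoother
    return n * (resid @ resid) / (n - edf) ** 2

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 60)
z = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(60)
grid = [10.0 ** k for k in range(-6, 4)]
lam_best = min(grid, key=lambda lam: gcv(z, lam))
```

As `lam` grows, the equivalent degrees of freedom `tr(S)` shrink towards the dimension of the penalty's null space, interpolating between a near-saturated fit and a heavily constrained one.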
SR-PDE has been thoroughly tested under various simulated settings (with and without covariates, with true fields generated parametrically or randomly, with uncorrelated and correlated noise, with various signal-to-noise ratios, sample sizes and sampling designs), and compared with kriging and with other regularised regression estimators such as those based on thin-plate splines or soap film smoothing (see, e.g., Sangalli et al., 2013; Azzimonti et al., 2014; Bernardi et al., 2017; Ettinger et al., 2016; Bernardi et al., 2018; Arnone et al., 2019). In all the simulation studies, the estimates provided by SR-PDE, with the smoothing parameter automatically selected by generalised cross-validation or by k-fold cross-validation, have shown comparative advantages over the alternatives, especially in the case of irregularly shaped domains and of fields exhibiting complicated localised features.

Areal Data
Instead of so-called geostatistical data, that is, data observed at point-wise locations $p_1, \dots, p_n$, we can consider data referred to areal subdomains of $\Omega$. Specifically, let $D_1, \dots, D_n$ be $n$ disjoint spatial subdomains of $\Omega$. In such a case, the model must involve areal evaluations of the unknown field $f$. Depending on the data and the problem being considered, these areal evaluations can, for instance, be either mean evaluations or integral evaluations of $f$. For instance, we may assume the model
$$z_i = w_i^\top \beta + \frac{1}{|D_i|} \int_{D_i} f(p) \, dp + \varepsilon_i, \qquad i = 1, \dots, n,$$
where $|D_i|$ denotes the area of $D_i$ and $\varepsilon_1, \dots, \varepsilon_n$ are uncorrelated zero-mean errors with $\mathrm{Var}(\varepsilon_i) \propto 1/|D_i|$. In such a case, $\beta$ and $f$ are estimated by minimising the regularised weighted sum-of-square-error functional
$$\sum_{i=1}^n |D_i| \left( z_i - w_i^\top \beta - \frac{1}{|D_i|} \int_{D_i} f(p) \, dp \right)^2 + \lambda \int_\Omega (Lf - u)^2 \, dp;$$
see Azzimonti et al. (2015) for details. Areal data can be handled in the function smooth.FEM of fdaPDE, specifying the option incidence.matrix to identify $D_1, \dots, D_n$ over the triangulated domain.
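The areal observation mechanism can be sketched on a one-dimensional toy grid (ours, not the fdaPDE implementation): subdomain averages are encoded by an averaging matrix, the analogue of fdaPDE's incidence matrix, and the fit uses weights proportional to the subdomain areas.

```python
import numpy as np

# Sketch of areal data: each observation is the average of the field
# over a subdomain D_i; the weighted penalised least-squares fit uses
# weights |D_i|, matching Var(eps_i) proportional to 1/|D_i|.

m = 60                                # grid nodes for the field f
f_grid = np.linspace(0, 1, m) ** 2    # "true" field on the grid

# Three disjoint subdomains, as index blocks of unequal size
blocks = [range(0, 10), range(10, 40), range(40, 60)]
A = np.zeros((len(blocks), m))
for i, blk in enumerate(blocks):
    A[i, list(blk)] = 1.0 / len(blk)  # row i averages f over D_i

z = A @ f_grid                        # noiseless areal observations
areas = np.array([len(b) for b in blocks], dtype=float)

# Weighted penalised least squares:
# minimise sum_i |D_i| (z_i - (A f)_i)^2 + lam * f' P f
D2 = np.zeros((m - 2, m))
for i in range(m - 2):
    D2[i, i], D2[i, i + 1], D2[i, i + 2] = 1.0, -2.0, 1.0
P = D2.T @ D2
lam = 1e-6
Wt = np.diag(areas)
f_hat = np.linalg.solve(A.T @ Wt @ A + lam * P, A.T @ Wt @ z)
```

With a small smoothing parameter, the recovered field reproduces the observed subdomain averages while the penalty fills in a smooth profile within each subdomain.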

Generalised Linear Model
The sum of square errors in (2) can be interpreted as a Gaussian (negative) log-likelihood. Indeed, the model in Section 3 can be extended to any distribution in the exponential family, via a generalised linear model framework (see, e.g., Hastie & Tibshirani, 1990; McCullagh & Nelder, 1989). Let $Z_1, \dots, Z_n$ be independent random variables having a distribution that belongs to the exponential family. We model the expected value of $Z_i$, at location $p_i$ and with covariates $w_i$ (whenever available), by
$$g\{\mathrm{E}(Z_i)\} = w_i^\top \beta + f(p_i),$$
where $g$ is the canonical link function associated with the considered distribution. We thus estimate $\beta$ and $f$ by maximising the penalised log-likelihood
$$\sum_{i=1}^n l(z_i; \theta_i) - \lambda \int_\Omega (Lf - u)^2 \, dp, \qquad (13)$$
where $l(\cdot\,; \theta)$ is the log-likelihood for the considered distribution. This of course coincides with the minimisation of the regularised least squares in (2) when the considered distribution is Gaussian. This generalisation significantly broadens the possible applications of SR-PDE, because most of the well-known (continuous and discrete) distributions belong to the exponential family. The functional in (13) is then optimised through iterative penalised least squares. The generalised linear version of SR-PDE can be implemented specifying the option family of the function smooth.FEM of fdaPDE.
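The iterative penalised least-squares scheme can be sketched for a Poisson response with canonical log link (a sketch of ours, not the fdaPDE implementation; the basis matrix, penalty and stopping rule are stand-ins).

```python
import numpy as np

# Sketch of penalised iteratively reweighted least squares (PIRLS) for
# a Poisson response with canonical log link: each step solves a
# weighted penalised least-squares problem in the working response.

def pirls_poisson(B, z, Pen, lam, n_iter=50):
    """B: n x k basis evaluation matrix; z: counts; Pen: penalty matrix."""
    eta = np.log(z + 1.0)             # initial linear predictor
    coef = np.zeros(B.shape[1])
    for _ in range(n_iter):
        mu = np.exp(eta)              # mean under the canonical link
        w = mu                        # IRLS weights for Poisson
        y_work = eta + (z - mu) / mu  # working response
        BtW = B.T * w                 # B' diag(w)
        coef = np.linalg.solve(BtW @ B + lam * Pen, BtW @ y_work)
        eta = B @ coef
    return coef

rng = np.random.default_rng(4)
n, k = 200, 8
B = np.abs(rng.standard_normal((n, k)))  # stand-in basis matrix
true = rng.uniform(-0.3, 0.3, k)
z = rng.poisson(np.exp(B @ true))
Pen = np.eye(k)                          # stand-in penalty matrix
coef = pirls_poisson(B, z, Pen, lam=1e-3)
```

For the Gaussian distribution with identity link, the weights are constant and a single step reproduces the penalised least-squares solution, consistently with the remark above.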

Spatio-Temporal Models
SR-PDE can also handle spatio-temporal data and spatially dependent functional data (Delicado et al., 2010; Mateu & Romano, 2017; Menafoglio & Secchi, 2017). To this aim, let $T$ denote the temporal domain of interest and consider the field $f$ defined over $\Omega \times T$. Different sampling designs may be considered, with point-wise or areal observations in space, and point-wise or interval observations in time, leading to different least-square terms (or, equivalently, different likelihood terms in a generalised linear framework). See Arnone et al. (2019) for details. As for the regularisation, we can consider two separate penalty terms,
$$\lambda_S \int_T \int_\Omega (Lf - u)^2 \, d\mathbf{p} \, dt \; + \; \lambda_T \int_\Omega \int_T \Big(\frac{\partial^2 f}{\partial t^2}\Big)^2 dt \, d\mathbf{p}:$$
the first penalty in (14) regularises the field in space, as in (4), while the second penalty regularises the field in time, considering the classical second-derivative regularisation. In the simple case where $L$ is the Laplace operator, this model is considered by Bernardi et al. (2017).
Analogous models are also considered by Marra et al. (2012) and Augustin et al. (2013). Alternatively, as detailed in Arnone et al. (2019), the regularisation can be based on the misfit with respect to a time-dependent PDE, jointly modelling the spatio-temporal behaviour:
$$\int_T \int_\Omega \Big(\frac{\partial f}{\partial t} + Lf - u\Big)^2 d\mathbf{p} \, dt.$$
An application of this model is shown in Section 7. Depending on whether the regularisation in (14) or the one in (15) is considered, convenient discretisations of the estimation problem involve finite elements in space, and either splines or finite differences in time. Appropriately stacking the data observed at the $nm$ spatio-temporal locations in the vector $\mathbf{z}$, and correspondingly defining the vector $\mathbf{f}_{nm}$ of evaluations of $f$ at the same $nm$ spatio-temporal locations, the covariate matrix $W$ and the matrix $Q = I - W(W^\top W)^{-1} W^\top$, it is possible to obtain the estimators in closed form, where $\tilde\Psi$ contains the evaluations of the spatial and temporal basis functions at the $nm$ spatio-temporal locations, $\tilde P$ is the discretisation of the considered regularising term, either (14) or (15), also including the corresponding smoothing parameters, and $\hat{\mathbf{f}}$ is the vector of basis-expansion coefficients that returns the estimate of the spatio-temporal field. The estimators have the same form as in Section 3, and it is possible to derive their properties along the same lines as in Section 5. See Bernardi et al. (2017) and Arnone et al. (2019) for details.
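A minimal numerical sketch of the separable space-time penalisation in (14), under simplifying assumptions: the field is evaluated on a regular space-time grid (so the basis matrix is the identity), and second-difference matrices stand in for the spatial and temporal penalty blocks. The Kronecker-product structure mirrors the way the discretised penalty $\tilde P$ combines spatial and temporal blocks.

```python
import numpy as np

rng = np.random.default_rng(2)
ns, nt = 25, 10                                # spatial / temporal nodes
xs = np.linspace(0, 1, ns)
ts = np.linspace(0, 1, nt)
F_true = np.outer(np.cos(np.pi * ts), np.sin(np.pi * xs))   # nt x ns field
z = (F_true + rng.normal(0, 0.05, F_true.shape)).ravel()    # spatial index fastest

Ds = np.diff(np.eye(ns), n=2, axis=0)          # space second differences
Dt = np.diff(np.eye(nt), n=2, axis=0)          # time second differences
Ps = np.kron(np.eye(nt), Ds.T @ Ds)            # penalise in space, each time
Pt = np.kron(Dt.T @ Dt, np.eye(ns))            # penalise in time, each location
lam_s, lam_t = 1e-3, 1e-3

# With evaluations at the grid nodes the "basis matrix" is the identity,
# so the estimator reduces to a single linear solve.
c_hat = np.linalg.solve(np.eye(ns * nt) + lam_s * Ps + lam_t * Pt, z)
F_hat = c_hat.reshape(nt, ns)
```

In the actual method, the identity would be replaced by $\tilde\Psi$ (finite elements in space, splines or finite differences in time), but the algebraic structure of the solve is the same.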
Both methods are implemented in the function smooth.FEM.time of fdaPDE. This function has options similar to those of smooth.FEM. For instance, the option PDE.parameters allows specification of the parameters of the PDE in the regularising term, while the option incidence.matrix allows areal data in space to be handled.

Non-Planar Domains
As mentioned in Section 1, SR-PDE can also deal with data whose locations lie on non-planar bi-dimensional domains. Specifically, let $M$ be a general bi-dimensional Riemannian manifold, and let $\mathbf{p}_1, \dots, \mathbf{p}_n \in M$ be the $n$ data locations where the variables $z_1, \dots, z_n$ and, whenever available, also the covariates $\mathbf{w}_1, \dots, \mathbf{w}_n$ are observed. The field $f$ is now defined on the manifold domain, $f : M \to \mathbb{R}$. In this case, the differential operator in the regularising term must be appropriately referred to the manifold domain $M$. In particular, we can consider the model (1) and estimate the unknown $\boldsymbol\beta$ and $f$ by minimising the regularised sum-of-square-error functional as in (2), with the standard Laplacian replaced by the Laplace-Beltrami operator $\Delta_M$, that is, with penalty $\int_M (\Delta_M f)^2$. The Laplace-Beltrami operator is invariant with respect to rigid transformations of the coordinates of the non-planar domain. Hence, the use of $\Delta_M f$ in the penalty guarantees that the concept of smoothness is independent of the orientation of $M$ and of the coordinate system. Both closed and bounded bi-dimensional domains can be considered (for bounded domains, appropriate boundary conditions must be imposed to guarantee uniqueness of the solution). The discretisation of the estimation problem and the properties of the estimators follow as in the case of planar domains. Lila et al. (2016) describe the discretisation via finite elements defined over non-planar triangulations. The left panel of Figure 2 displays a non-planar triangulation, and the right panel of Figure 4 provides an example of a linear finite element basis on a non-planar triangulation. The function smooth.FEM of fdaPDE can handle data over non-planar triangulations. An alternative discretisation by isogeometric analysis with non-uniform rational B-splines is also available in the literature.
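A toy sketch of one standard way to discretise the Laplace-Beltrami operator with linear finite elements on a triangulated surface, via the classical cotangent formula. This is a didactic stand-in for the assembly performed inside fdaPDE, and the tiny mesh below is purely illustrative.

```python
import numpy as np

def cotan_stiffness(verts, tris):
    """Linear finite-element stiffness matrix (discrete Laplace-Beltrami)
    for a triangulated surface, via the classical cotangent formula."""
    n = len(verts)
    K = np.zeros((n, n))
    for i, j, k in tris:
        # For each corner, the cotangent of its angle weights the opposite edge.
        for a, b, c in ((i, j, k), (j, k, i), (k, i, j)):
            u = verts[b] - verts[a]
            v = verts[c] - verts[a]
            cot = u @ v / np.linalg.norm(np.cross(u, v))
            K[b, c] -= 0.5 * cot
            K[c, b] -= 0.5 * cot
            K[b, b] += 0.5 * cot
            K[c, c] += 0.5 * cot
    return K

# Toy non-planar mesh: three triangles forming a small cap in 3-D.
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0.4, 0.4, 0.8]])
tris = [(0, 1, 3), (1, 2, 3), (2, 0, 3)]
K = cotan_stiffness(verts, tris)
```

The resulting matrix is symmetric and annihilates constants, as a discrete Laplacian should; because the construction uses only edge lengths and angles, it is invariant to rigid motions of the mesh, mirroring the coordinate-invariance of the Laplace-Beltrami operator.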

Illustrative Case Study: Analysis of Blood-Flow Velocity Field From ECD Data
We consider a problem that arose within the research project Mathematics for Carotid Endarterectomy @ MOX, concerning the analysis of the velocity field of blood flow in sections of carotid arteries, with data obtained from magnetic resonance imaging (MRI) and eco-colour doppler (ECD) (see Azzimonti et al., 2015; Arnone et al., 2019, for details). The project explored the role of vessel morphology and of blood fluid dynamics in the onset and progression of atherosclerotic plaques. Figure 5 displays one ECD image from the study. The upper part of this image shows the ultrasound scan of the vessel; the small grey box indicates the beam where the velocity of blood particles is acquired. The lower part of the same image displays the measured blood-flow velocity over a time span of about three heartbeats. The solid red line superimposed on this signal shows the mean velocity.
The study starts from ECD measurements of the velocity of the blood flow, at a cross-section of the common carotid artery located 2 cm before the bifurcation of the carotid, for patients having a high-grade stenosis (>70%) at the carotid bifurcation. The left panel of Figure 6 displays a schematic carotid artery: the lower part of this drawing shows the cross-section where the ECD measurements are taken; specifically, the ECD signal is acquired over seven beams displayed in a cross-shaped pattern. The right panel of the same figure shows a close-up of this cross-section: the quasi-circular patient-specific profile of the section is reconstructed from MRI data; the seven beams where the ECD signal is measured are coloured according to the mean velocity of blood particles at systolic peak. The left panel of Figure 7 shows the mean velocity over the seven beams during one heartbeat. The seven temporal profiles have different shapes, some displaying two peaks during the systolic phase, with the highest velocities attained in the second peak, and others displaying only one peak, earlier in time.
The first goal of the project consisted in accurately estimating, for each of the patients involved in the study, the time-dependent blood-flow velocity field over the whole carotid cross-section, starting from the ECD measurements and from the reconstructions of patient-specific artery sections obtained from MRI data. This application, though, features some peculiarities that hinder the applicability of standard techniques for spatio-temporal data and for spatially dependent functional data. Firstly, in this problem, the domain shape, the carotid cross-section, strongly influences the phenomenon under study: the blood flow is in fact confined within the artery, whose shape heavily reflects on the blood-flow velocity field. Secondly, there are specific conditions at the boundaries of this domain: in fact, the velocity of the blood flow is zero at the arterial wall, due to the friction between the wall and the blood particles. Thirdly, the irregular pattern of the observations, shown in the right panel of Figure 6, misleads stationary and isotropic methods, which yield non-physiological estimates.

[Figure 6 caption: Left: schematic carotid artery; the ECD measurements are taken at a cross-section of the common carotid artery (CCA), 2 cm before the bifurcation of the CCA into internal carotid artery (ICA) and external carotid artery (ECA); the ECD signal is acquired in seven beams, located in the cross-shaped pattern shown at the cross-section. Right: cross-section of the common carotid artery, as reconstructed from magnetic resonance imaging (MRI) data, with the seven beams where the ECD signal is taken; the colours correspond to the mean velocity over each beam at systolic peak. The figure also highlights the physiological boundary conditions: the velocity of blood flow is zero at the arterial wall, due to the friction between the wall and the blood particles.]

Figure 8, left panel, illustrates this issue, displaying the estimate of the blood-flow velocity field at systolic peak that is obtained by SR-PDE, imposing the appropriate boundary conditions but using a regularising term that involves the stationary and isotropic differential operator $L = \Delta$: penalising the local curvature of the field pulls the solution towards a plane where data are missing, returning non-physiological estimates with rhomboidal isolines.

[Figure 8 caption: Left: estimate of the blood-flow velocity field at systolic peak, obtained by spatial regression with partial differential equation regularisation (SR-PDE), imposing the appropriate boundary conditions (i.e., zero velocity at the arterial wall), but using $L = \Delta$ (stationary isotropic estimate in space). Right: estimate of the blood-flow velocity field at systolic peak, obtained by SR-PDE, imposing the appropriate boundary conditions and using the differential operator $L$ suggested by problem-specific information, with the diffusion and transport fields shown in Figure 9 (non-stationary anisotropic estimate in space).]
On the other hand, we can here take advantage of advanced problem-specific information about the phenomenon under study, concerning fluid dynamics and hemodynamics (see, e.g., Formaggia et al., 2009). This information can be appropriately formalised into a PDE, along with the physiological boundary conditions. Specifically, we can consider the parabolic PDE $\partial f/\partial t + Lf = 0$, where $L$ includes a non-stationary anisotropic diffusion tensor that induces a smoothing along concentric circles (Figure 9, centre) and a non-stationary transport field that induces a smoothing in the radial direction, from the centre of the artery to the arterial wall (Figure 9, right); the reaction term and the forcing term are instead not required by this application (see Azzimonti et al., 2015; Arnone et al., 2019, for the details). The right panel of Figure 8 displays the estimate of the blood-flow velocity field at systolic peak, obtained by SR-PDE, imposing the appropriate boundary conditions and with a regularising term that involves the non-stationary anisotropic differential operator $L$ illustrated above. This approach efficiently incorporates the problem-specific information and yields a realistic estimate that displays physiological isolines, unaffected by the cross-shaped pattern of the observations. Figure 10 shows the estimate at some temporal instants. The velocity field strongly varies in shape during the heartbeat, exhibiting asymmetries and eccentricities. These features are of particular interest to the medical doctors, who investigate their relationship with atherosclerosis. The right panel of Figure 7 shows the temporal profiles of the estimated velocity, evaluated at the central point of each beam. A visual comparison with the original data in the left panel of the same figure highlights that the estimate captures very well the main features of the ECD signals.
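A small sketch of how such problem-specific diffusion and transport fields can be encoded, under purely illustrative assumptions: at each point, the diffusion tensor has a large eigenvalue in the tangential (circumferential) direction and a small one in the radial direction, while the transport field points radially outward. The magnitudes below are invented for illustration, not the values calibrated in Azzimonti et al. (2015).

```python
import numpy as np

def anisotropic_fields(p, centre, k_tan=10.0, k_rad=1.0, b0=0.5):
    """Toy construction of a space-varying diffusion tensor K(p) and
    transport field b(p): K smooths along concentric circles around
    `centre`, while b pushes radially outward towards the wall."""
    d = p - centre
    r = d / np.linalg.norm(d)              # radial unit vector
    t = np.array([-r[1], r[0]])            # tangential unit vector
    K = k_tan * np.outer(t, t) + k_rad * np.outer(r, r)
    b = b0 * r
    return K, b

# At the point (1, 0) relative to a centre at the origin, the tangential
# direction is (0, 1): the tensor smooths ten times more strongly along it.
K, b = anisotropic_fields(np.array([1.0, 0.0]), np.array([0.0, 0.0]))
```

Such fields would enter the operator $Lf = -\mathrm{div}(K\nabla f) + \mathbf{b}\cdot\nabla f$ evaluated point by point during the finite-element assembly.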

Population Studies: A PCA Method Based on SR-PDE
In this section, we consider population studies. Consider a sample of $m$ realisations of the field, $\mathbf{z}_1, \dots, \mathbf{z}_m$, corresponding to $m$ statistical units, where $\mathbf{z}_j := (z_{j1}, \dots, z_{jn})^\top$ and $z_{ji}$ is the value of the variable of interest observed for the $j$th statistical unit at location $\mathbf{p}_i$. For simplicity of exposition, we assume that the $m$ realisations of the signal are observed at the same locations $\mathbf{p}_1, \dots, \mathbf{p}_n$; later, we shall remove this assumption. We are now interested in exploring the variability in the sample. Zhou & Pan (2014) propose a method for PCA of data spatially distributed over planar domains with irregular shapes; the authors in particular consider a mixed-effects model where the principal components are represented by splines over triangulations. Here, we review the method described in Lila et al. (2016), which is instead based on a regularised low-rank approximation of PCA and builds directly on SR-PDE. The method is implemented in the function FPCA.FEM of fdaPDE; it can tackle signals over complicated domains, such as those considered in the previous sections.
To this aim, we formulate PCA in a functional data analysis framework. Denote by $D$ the spatial domain of interest, either planar or curved. Consider the space $L^2(D)$ of square-integrable functions on $D$, equipped with its standard inner product $\langle f, g\rangle = \int_D fg$ and norm $\|f\|^2 = \int_D f^2$, where $f, g \in L^2(D)$. Consider the random field $Z(\cdot)$ taking values in $L^2(D)$, with mean $\mu(\cdot) = \mathbb{E}[Z(\cdot)]$ and finite second moment, that is, $\int_D \mathbb{E}(Z^2)\, d\mathbf{p} < \infty$; moreover, assume that its covariance $\Sigma(\mathbf{p}, \mathbf{q}) = \mathbb{E}[(Z(\mathbf{p}) - \mu(\mathbf{p}))(Z(\mathbf{q}) - \mu(\mathbf{q}))]$ is square-integrable. Mercer's Lemma (Riesz & Sz.-Nagy, 1990) ensures that there exist a non-increasing sequence of eigenvalues $\{\lambda_j\}_{j=1,2,\dots}$ of $\Sigma$ and an orthonormal sequence of corresponding eigenfunctions $\{f_j\}_{j=1,2,\dots}$ such that
$$\int_D \Sigma(\mathbf{p}, \mathbf{q})\, f_j(\mathbf{q}) \, d\mathbf{q} = \lambda_j f_j(\mathbf{p})$$
and that $\Sigma(\mathbf{p}, \mathbf{q}) = \sum_{j=1}^{\infty} \lambda_j f_j(\mathbf{p}) f_j(\mathbf{q})$, for each $\mathbf{p}, \mathbf{q} \in D$. Hence, the random field $Z$ can be expressed as $Z(\cdot) = \mu(\cdot) + \sum_{j=1}^{\infty} u_j f_j(\cdot)$, where $u_j = \langle Z - \mu, f_j \rangle$ are uncorrelated random variables, named scores. This is the so-called Karhunen-Loève expansion of $Z$.
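A discrete numerical illustration of the Karhunen-Loève ideas above, under toy assumptions (a field built from two known modes, sampled on a grid): eigendecomposing the sample covariance yields orthonormal discrete "eigenfunctions", the scores are empirically uncorrelated, and a truncated expansion reconstructs the field.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 40, 2000
x = np.linspace(0, 1, n)
# Each realisation is a random combination of two fixed modes.
Z = (rng.normal(size=(m, 1)) * np.sin(np.pi * x)
     + 0.3 * rng.normal(size=(m, 1)) * np.sin(2 * np.pi * x))

S = np.cov(Z, rowvar=False)                   # sample covariance Σ(p, q)
evals, evecs = np.linalg.eigh(S)
evals, evecs = evals[::-1], evecs[:, ::-1]    # non-increasing eigenvalues
scores = Z @ evecs                            # u_j = <Z, f_j>
Z_rec = scores[:, :2] @ evecs[:, :2].T        # truncated K-L expansion
```

Because the simulated field lies exactly in a two-dimensional subspace, the two-term expansion recovers it, and the score covariance is (numerically) diagonal, matching the uncorrelatedness of the $u_j$.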
The orthonormal functions $f_1, f_2, \dots \in L^2(D)$ are named principal components of the random field $Z$. These functions identify the strongest modes of variation of $Z$. Specifically, $f_1$ solves the following maximisation problem:
$$f_1 = \arg\max_{f \in L^2(D):\, \|f\| = 1} \mathrm{Var}\,\langle Z, f \rangle,$$
while $f_d$, for $d > 1$, solves an analogous maximisation problem, but with the added constraint of being orthogonal to the previous principal components, that is, $\langle f, f_j \rangle = 0$ for $j = 1, \dots, d-1$. In practice, the principal components are estimated by solving the sample counterpart of (16), with the true covariance replaced by the sample covariance. Unfortunately, it is impossible to have realisations of the random field $Z$ observed at any $\mathbf{p} \in D$ and without noise; only discrete and noisy realisations of $Z$ can be observed. The classical approach in functional data analysis is thus the following: first, smooth estimates of each single realisation of the signal, $\hat z_1(\cdot), \dots, \hat z_m(\cdot)$, are obtained through some appropriate smoothing procedure; then, these estimates are used to compute the sample mean $\hat\mu$ and sample covariance $\hat\Sigma$, and hence the estimates $\hat f_1, \hat f_2, \dots, \hat f_d$ of the first $d$ principal components are obtained solving numerically problem (17). This is the so-called presmoothing approach.
Instead, in Lila et al. (2016), we propose an alternative method for the estimation of the principal components, inspired by approaches based on regularised functional PCA (fPCA), developed for the case of functional data over one-dimensional domains by, for example, Rice & Silverman (1991), Silverman (1996) and Huang et al. (2008). The proposed approach relies on another characterisation of principal components, known as the best $K$-basis approximation property. Namely, for any $K = 1, 2, \dots$, the first $K$ principal components satisfy
$$\{f_1, \dots, f_K\} = \arg\min \; \mathbb{E}\, \Big\| Z - \mu - \sum_{j=1}^{K} \langle Z - \mu, f_j \rangle\, f_j \Big\|^2,$$
the minimisation being over orthonormal sets of $K$ functions in $L^2(D)$. If we consider $K = 1$ and take data already centred around the mean, the empirical version of the minimisation functional in (18) is given by
$$\sum_{j=1}^{m} \sum_{i=1}^{n} \big(z_{ji} - u_j f(\mathbf{p}_i)\big)^2,$$
which involves the raw data, without requiring any presmoothing. We can thus estimate the first principal component and the associated scores minimising a regularised sum-of-square-error functional, involving the least-square criterion (19) and a regularising term that encourages smoothness of $f$. This regularising term can, for instance, involve $\Delta_D f$, where $\Delta_D$ denotes the Laplace or Laplace-Beltrami operator, depending on whether $D$ is a planar or non-planar domain. This suggests estimating the first principal component function $f$ and the associated score vector $\mathbf{u} = (u_1, \dots, u_m)^\top$ by minimising
$$\sum_{j=1}^{m} \sum_{i=1}^{n} \big(z_{ji} - u_j f(\mathbf{p}_i)\big)^2 + \lambda \int_D (\Delta_D f)^2$$
over $\mathbf{u} \in \mathbb{R}^m$ such that $\mathbf{u}^\top \mathbf{u} = 1$, and $f \in H^2(D)$, with boundary conditions in case $D$ has boundaries. In particular, setting the constraint $\mathbf{u}^\top \mathbf{u} = 1$ makes the representation unique, without the need to impose that $f$ has unitary norm. Subsequent principal component functions are extracted sequentially by removing previous principal components from the data matrix $Z$, whose $j$th row is given by $\mathbf{z}_j$. The minimisation of (20) can be performed via an iterative algorithm that alternates the estimation of the scores $\mathbf{u}$ given the principal component $f$, and the estimation of the principal component $f$ given the scores $\mathbf{u}$. Specifically, the two steps of the algorithm are as follows: Step 1: $\mathbf{u}$ given $f$.
The unitary-norm vector $\mathbf{u}$ that minimises the functional in (20) for a fixed $f$ is given by
$$\mathbf{u} = \frac{Z \mathbf{f}_n}{\| Z \mathbf{f}_n \|},$$
where $\mathbf{f}_n = (f(\mathbf{p}_1), \dots, f(\mathbf{p}_n))^\top$ constitutes the loading vector. The expression for the scores is thus analogous to the one obtained in the multivariate setting.
Step 2: $f$ given $\mathbf{u}$. For a fixed $\mathbf{u}$ with unitary norm, let $y_i$ denote the $i$th entry of the vector $\mathbf{y} = Z^\top \mathbf{u}$. Then, finding the $f$ that minimises (20), for the fixed $\mathbf{u}$, is equivalent to finding the $f$ that minimises
$$\sum_{i=1}^{n} \big(y_i - f(\mathbf{p}_i)\big)^2 + \lambda \int_D (\Delta_D f)^2.$$
This is the typical SR-PDE problem (without covariates) that has been described in detail in the previous sections. The principal components and scores can then be rescaled to obtain principal components with unitary norms.
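The two steps above can be sketched with a toy alternating scheme, replacing the finite-element smoother in Step 2 with a simple ridge-type solve using a second-difference penalty; this is an illustrative stand-in, not the FPCA.FEM implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 100, 50
x = np.linspace(0, 1, n)
f_true = np.sin(2 * np.pi * x)                 # dominant mode of variation
Z = rng.normal(size=(m, 1)) * f_true + rng.normal(0, 0.2, (m, n))

D2 = np.diff(np.eye(n), n=2, axis=0)
P = D2.T @ D2                                  # stand-in smoothness penalty
lam = 1e-2

f = Z[0].copy()                                # crude initialisation
for _ in range(50):
    u = Z @ f
    u /= np.linalg.norm(u)                     # Step 1: scores, with u'u = 1
    f_new = np.linalg.solve(np.eye(n) + lam * P, Z.T @ u)  # Step 2: smooth fit
    if np.max(np.abs(f_new - f)) < 1e-10:
        f = f_new
        break
    f = f_new
f /= np.linalg.norm(f)                         # final rescaling to unit norm
```

With $\lambda = 0$ this reduces to a power iteration for the leading singular vector of $Z$; the penalty in Step 2 is what injects smoothness into the estimated component.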
Missing data can be considered as well. Specifically, assume that the $j$th statistical unit $\mathbf{z}_j := (z_{j1}, \dots, z_{j n_j})^\top$ is observed at locations $\mathbf{p}_{j1}, \dots, \mathbf{p}_{j n_j}$. Then, the least-square term in (20) can be replaced by
$$\sum_{j=1}^{m} \sum_{i=1}^{n_j} \big(z_{ji} - u_j f(\mathbf{p}_{ji})\big)^2,$$
while the regularising term is left unchanged. Similar sparse designs can be considered for the spatio-temporal problems in Section 6.3.

Illustrative Case Study: Analysis of Neural Connectivity on the Cortical Surface
This section illustrates a case study concerning the analysis of high-dimensional neuroimaging signals associated with neural activity in the cerebral cortex. The main goal is to explore the main neural connectivity patterns in a population of healthy subjects. The analysed data come from the Human Connectome Project (Essen et al., 2012) and are obtained from resting-state functional magnetic resonance imaging (fMRI) on 491 subjects. The fMRI signal measures neural activity through changes in the concentration of deoxy-hemoglobin, associated with energy use by neural cells. As is standard practice in the neuroimaging community, the signal of each individual is mapped to a common cortex template, to enable multi-subject analyses. The left panel of Figure 2 shows a triangular mesh representing the cortical surface of a template brain, onto which the signals of all subjects are projected. The figure highlights the highly convoluted morphology of the cortex. Even today, most neuroimaging studies are carried out either neglecting the spatial dependence in the signal, or employing standard methods that rely on the Euclidean distance, hence disregarding the complicated morphology of the cortex. As mentioned in Section 1, this may lead to inaccurate estimates. There is nowadays a growing awareness of the need to include the complex brain morphology, to advance our still limited knowledge about brain functioning (see, e.g., Glasser et al., 2013, and references therein). This has generated a strong momentum in the international community for the development of methods able to accurately analyse data arising from these complex imaging scans. Lila et al. (2016) offer in this respect a first technique for population studies of signals referred to bi-dimensional manifold domains.
The analysis is focused on functional connectivity maps, computed from fMRI data. In particular, we select a region of interest on the cortex template. This is the area whose behaviour, as compared with the rest of the cortical surface, is of interest for the investigator. In the current study, we select a region of interest in the precuneus, an area of the cerebral cortex devoted to higher mental processes. For each subject, we then compute the functional connectivity map with respect to the region of interest (see, e.g., Lila et al., 2016); this map highlights the areas of the cortex that are more highly connected to the selected region of interest. A snapshot of the functional connectivity map for the chosen region of interest, for one subject in the study, is shown in the right panel of Figure 2.
We now wish to explore the main modes of variation of these functional connectivity maps among the different subjects. The last column of Figure 11 shows the first three principal components estimated by the proposed regularised fPCA, and compares the estimates to those obtained by standard multivariate PCA on the data matrix $Z$ and to those obtained by the classical presmoothing approach, respectively shown in the first and second columns of the same figure. The smoothing parameters in the proposed regularised fPCA and in the presmoothing approach are chosen via cross-validation. As highlighted by Figure 11, the three principal components estimated by multivariate PCA display a strong local variability. This is due to the fact that the sample size is too small with respect to the very high dimensionality of the data; moreover, multivariate PCA completely disregards the morphology of the domain, because spatial information is ignored by this approach. Both the presmoothing approach and the proposed regularised fPCA instead return smooth principal components. On the other hand, the presmoothing approach smooths out sharp changes in the modes of variation, returning estimates that miss some localised features apparent in the estimates provided by multivariate PCA and by regularised fPCA. As an example, the third principal components estimated by multivariate PCA and by regularised fPCA display corresponding localised areas with high values (displayed in red) or low values (displayed in blue), which are instead missing in the corresponding estimate yielded by presmoothing PCA. Moreover, the presmoothing approach appears to introduce some artefacts: for instance, the third principal component estimated by presmoothing PCA displays high values in the upper part of the plot that have no counterpart in the multivariate PCA and regularised fPCA estimates.
In summary, the proposed regularised fPCA is able to combine the desired smoothness with the capacity to represent highly localised features of the modes of variation.

Discussion
Although the study of the theoretical properties of SR-PDE is still ongoing, and much remains to be done in this respect, this methodology has already shown itself to be very promising and capable of handling complex applied problems. Moreover, its flexible framework permits numerous model extensions, for instance to higher-dimensional domains (such as three-dimensional spatial domains with complicated shapes) and to density estimation problems (Ferraccioli et al., 2021).
As highlighted in this article, SR-PDE is based on a very rich blend of approaches from pure and applied mathematics and from engineering. This endows the methods with remarkable advantages with respect to the available techniques, and makes them able to deal accurately with data structures for which classical methods are unfit. Moreover, leveraging advanced numerical analysis techniques ensures that the methods are computationally highly efficient. We are confident that SR-PDE will gain popularity, proving highly valuable in a number of applied domains.