The SPDE approach for spatio-temporal datasets with advection and diffusion

In the task of predicting spatio-temporal fields in environmental science using statistical methods, introducing statistical models inspired by the physics of the underlying phenomena that are numerically efficient is of growing interest. Large space-time datasets call for new numerical methods to efficiently process them. The Stochastic Partial Differential Equation (SPDE) approach has proven to be effective for the estimation and the prediction in a spatial context. We present here the advection-diffusion SPDE with first order derivative in time which defines a large class of nonseparable spatio-temporal models. A Gaussian Markov random field approximation of the solution to the SPDE is built by discretizing the temporal derivative with a finite difference method (implicit Euler) and by solving the spatial SPDE with a finite element method (continuous Galerkin) at each time step. The ''Streamline Diffusion'' stabilization technique is introduced when the advection term dominates the diffusion. Computationally efficient methods are proposed to estimate the parameters of the SPDE and to predict the spatio-temporal field by kriging, as well as to perform conditional simulations. The approach is applied to a solar radiation dataset. Its advantages and limitations are discussed.

In the task of predicting spatio-temporal fields in environmental science using statistical methods, introducing statistical models inspired by the physics of the underlying phenomena that are numerically efficient is of growing interest.Large space-time datasets call for new numerical methods to efficiently process them.The Stochastic Partial Differential Equation (SPDE) approach has proven to be effective for the estimation and the prediction in a spatial context.We present here the advection-diffusion SPDE with first-order derivative in time which defines a large class of nonseparable spatio-temporal models.A Gaussian Markov random field approximation of the solution to the SPDE is built by discretizing the temporal derivative with a finite difference method (implicit Euler) and by solving the spatial SPDE with a finite element method (continuous Galerkin) at each time step.The "Streamline Diffusion" stabilization technique is introduced when the advection term dominates the diffusion.Computationally efficient methods are proposed to estimate the parameters of the SPDE and to predict the spatio-temporal field by kriging, as well as to perform conditional simulations.The approach is applied to a solar radiation dataset.Its advantages and limitations are discussed.

Introduction
Many areas of environmental science seek to predict a space-time variable of interest from observations at scattered points in the space cross time domain of study, e.g., among other possible applications, wind prediction (Lenzi and Genton, 2020;Huang et al., 2022), precipitation forecasting (Sigrist et al., 2011), urban air quality inference (Paciorek et al., 2009).Among modern techniques proposing efficient methods for estimation and prediction in a spatio-temporal framework, there is a distinction between two possible ways of constructing and treating spatio-temporal models (Wikle and Hooten, 2010): either one follows the traditional geostatistical paradigm, using joint space-time covariance functions (see for example Cressie and Huang (1999), Gneiting (2002), Stein (2005), as well as the recent reviews Porcu et al. (2021), Chen et al. (2021)), or one uses dynamical models, including functional time series of surfaces, see for example Wikle and Cressie (1999), Sigrist et al. (2012) and Martínez-Hernández and Genton (2023).
While the theoretical aspects of spatio-temporal geostatistics are well developed (Cressie and Wikle, 2011), their implementation faces difficulties.The geostatistical paradigm is computationally expensive for large spatio-temporal datasets, due to the factorization of dense covariance matrices, whose complexity scales with the cube of the number of observations.This well known problem is often referred to as the "big n problem" (Banerjee et al., 2014).Separable space-time covariance functions have often been used to take advantage of their computational convenience, even when they are not realistic in describing the processes due to the absence of space-time interaction.In most applications, separable models show poorer predictions than nonseparable models, see references above.Recent studies have focused on constructing nonseparable models, which are physically more realistic, albeit computationally more expensive, see Gneiting (2002), Porcu et al. (2006), Salvaña and Genton (2021) and Bourotte et al. (2016), Allard et al. (2022) in a multivariate context.Nonseparable space-time covariance models can be constructed from Fourier transforms of permissible spectral densities, mixtures of separable models, and partial differential equations (PDEs) representing physical laws (Carrizo-Vergara et al., 2022;Lindgren et al., 2022).They can be fully symmetric or asymmetric, stationary or nonstationary, univariate or multivariate, in the Euclidean space or on the sphere.See Porcu et al. (2021) and Chen et al. (2021) for recent comprehensive reviews.
In this paper, we follow the dynamic approach that makes use of physical laws and study models which are defined through Stochastic Partial Differential Equations (SPDEs), where the stochasticity is obtained by adding a random noise as a forcing term.The SPDE approach relies on the representation of a continuously indexed Gaussian Random Field (GRF) as a discretely indexed random process, i.e. a Gaussian Markov Random Field (GMRF, see Rue and Held (2005)).Passing from a GRF to a GMRF, the covariance function and the dense covariance matrix are substituted respectively by a neighborhood structure and a sparse precision matrix.Using GMRFs with sparse precision matrices implies computationally efficient numerical methods, especially for matrix factorization.The link between GRF and GMRFs in the purely spatial case has been pioneered by Lindgren et al. (2011), who proposed to construct a GMRF representation of the spatial Matérn field on a triangulated mesh of the domain through the discretization of a diffusion SPDE with a Finite Element Method (FEM).We refer to Bakka (2022) for a simple explanation of the FEM applied to the spatial SPDE and to Section 2.3 for a detailed generalization to the spatio-temporal SPDE.
In the spatial framework, major mathematical and algorithmic advances in the SPDE approach have been made (Fuglstad et al., 2015;Pereira and Desassis, 2019;Pereira et al., 2022), making it possible to efficiently process very large datasets, even in the presence of nonstationarities and varying local anisotropies.The development of SPDE-based approaches to Gaussian processes has led to several practical solutions, among which we find the R package for approximate Bayesian inference R-INLA (Rue et al., 2009;Lindgren and Rue, 2015) that uses SPDEs to sample from spatial and spatio-temporal models.
When generalizing to the spatio-temporal framework, a direct space-time formulation of the SPDE approach was first suggested in Lindgren et al. (2011), without any precise detail on estimation and prediction.In Cameletti et al. (2011), the SPDE approach was coupled with an AR(1) model in time, leading to a separable space-time model.Nonseparable spatio-temporal models have been elaborated in Särkkä et al. (2013), Krainski et al. (2018b) and Lindgren et al. (2023) as a spatio-temporal generalization of the diffusion-Matérn model of Lindgren et al. (2011).In the approaches overviewed above, the space-time processes are symmetrical in the sense that the spatio-temporal covariance does not change when the sign of the space and/or time lag changes.However, atmospheric and geophysical processes are often asymmetric due to transport effects, such as air and water flows.Sigrist et al. (2015) built nonsymmetrical and nonseparable space-time Gaussian models as a solution to an advection-diffusion SPDE with computationally efficient algorithms for statistical estimation using fast Fourier transforms and Kalman filters.Sharrock and Kantas (2022) used a similar method, but in an joint online parameter estimation and optimal sensor placement problem.Liu et al. (2020) extended this approach to spatially-varying advection-diffusion and non-zero mean sourcesink, leading to a space-time covariance which is nonstationary in space.The applicability of these approaches remains difficult however, especially with scattered data, as it relies on the Fourier transform of the data.Carrizo-Vergara et al. (2022) defined new spatio-temporal models incorporating the physical processes linked to the studied phenomena (advection, diffusion, etc.), but the estimation of the parameters and the conditioning to the observed data remained unaddressed.
In this work, we propose a new and efficient approach for dealing with spatio-temporal SPDEs that includes both a diffusion and an advection term.In contrast to Sigrist et al. (2015) and Liu et al. (2020), we make use of the sparse formulation of the spatio-temporal field which is the approximate solution of the SPDE obtained by a combination of FEM and finite differences (FD).This sparse formulation allows fast algorithms for parameter estimation and spatio-temporal prediction.We also treat the case of an advection-dominated SPDE, by introducing the Streamline Diffusion (SD) stabilization term in the SPDE (Hughes and Brooks, 1979).To the best of our knowledge, this work is the first statistical FEM/FD implementation of spatio-temporal SPDEs with advection.
The paper is organized as follows: Section 2 first presents background material on the spatio-temporal SPDE approach.The spatio-temporal advectiondiffusion model developed in this paper is presented, along with its discretization.Moreover, the stabilization of advection-dominated SPDEs is introduced.Section 3 explores fast and scalable estimation methods, kriging formula for prediction and conditional simulations.Section 4 presents an application of the proposed spatio-temporal SPDE approach to a solar radiation dataset.Section 5 discusses the advantages and the limitations of the approach and opens the way to further works.
2 The spatio-temporal advection-diffusion SPDE and its discretization

Background
In the SPDE representation, GRFs on R d are viewed as solutions to specific stochastic partial differential equations (Whittle, 1954(Whittle, , 1963)).In particular, Gaussian Whittle-Matérn fields, analyzed in details in Lindgren et al. (2011) and reviewed in Lindgren et al. (2022), are solutions to with α > d/2 and τ > 0. ∆ = is the Laplacian operator and W (•) is a standard spatial Gaussian white noise, whose definition is briefly recalled.
If {Z i } i∈N is a sequence of independent, standard Gaussian variables, then the function W defined over L 2 (R d ) by where {e j } j∈N denotes an orthonormal basis of L 2 (R d ), is a Gaussian white noise on R d .
In principle, GeRFs have only meaning when applied to test functions in some particular functional space, and not necessarily when evaluated in points of the space, but, for an easier reading, we will allow ourselves to write W (s) and X(s).
The covariance function of the Gaussian Whittle-Matérn field solution to Equation (1) is the well known Matérn covariance function with smoothness parameter ν = α − d/2 > 0, scale parameter κ and variance . K ν is the modified 2 nd order Bessel function and h = s − s ′ is the spatial lag between two locations s and s ′ in R d .In particular, when ν = 1/2, we get the exponential covariance function and when ν → +∞, after proper renormalization, (3) tends to the Gaussian covariance function (Genton, 2001).
In Lindgren et al. (2011), the smoothness parameter ν considered in the Matérn covariance function corresponds to integer values of α.When noninteger values of α are introduced in the modeling, the SPDE is said to be fractional.Recent reviews of results and applications of the fractional SPDE approach are available in Bolin et al. (2024); Bolin and Kirchner (2020); Roques et al. (2022), but this case will not be treated further in this work.
When generalizing to spatio-temporal processes X(t, s), we consider the framework proposed in Carrizo-Vergara et al. (2022) for extending the SPDE approach to a wide class of linear spatio-temporal SPDEs.Let us denote ξ ∈ R d a spatial frequency and ω ∈ R a temporal frequency.The spacetime white noise with unit variance, denoted W (t, s), is characterized by its spectral measure dµ W (ω, ξ) = (2π) −(d+1) d ξ dω.New spatio-temporal models were obtained from known PDEs describing physical processes, such as diffusion, advection, and oscillations with stochastic forcing terms.In particular, Carrizo-Vergara et al. (2022) provided sufficient conditions to the existence and uniqueness of stationary solutions to with β > 0. In (4), the spatial operator L g is defined using the spatial Fourier transform on R d , denoted F S , where g : R d → C is a sufficiently regular and Hermitian-symmetric function called the symbol function of the operator L g .The temporal operator ∂ β ∂t β is where F T is the temporal Fourier transform on R and where we have used the symbol function over R ω → (iω) β = |ω| β e i sgn(ω)βπ/2 .
The spatio-temporal symbol function of the operator involved in (4) is thus where g R and g I are the real and imaginary part of the spatial symbol function g(ξ).If |g R | is inferiorly bounded by the inverse of a strictly positive polynomial and g R cos βπ 2 ≥ 0, Theorem 1 and Proposition 3 in Carrizo-Vergara et al.
(2022) state that (4) admits a unique stationary solution for every arbitrary g I function .

The spatio-temporal advection-diffusion SPDE
The advection-diffusion equation is a Partial Differential Equation (PDE) that describes physical phenomena where particles, energy, or other physical quantities evolve inside a physical system due to two processes: diffusion and advection.Advection represents the mass transport due to the average velocity of all particles, and diffusion represents the mass transport due to the instantaneously varying velocity of individual particles.In this paper, we study the advection-diffusion SPDE on the domain [0, T ] × R d that writes where • the operator ∇ • H ∇ is a diffusion term that can incorporate anisotropy in the matrix H.When the field is isotropic, i.e. when H = λ I, this term reduces to the Laplacian operator λ∆; • the operator γ •∇ models the advection, γ ∈ R d being a velocity vector; • α ≥ 0 relates to the smoothness of X(t, •), κ 2 > 0 accounts for damping and c is a positive time-scale parameter; • τ ≥ 0 is a standard deviation factor and Z is a stochastic forcing term, detailed below.
This equation was mentioned in Lindgren et al. (2011), Carrizo-Vergara et al. (2022) and Lindgren et al. (2023), and was analyzed using spectral approaches in Sigrist et al. (2015) and Liu et al. (2020).The term Z(t, s) is assumed to be of the form i.e., a space-time separable stochastic (Generalized) random function given as the tensor product of a temporal Gaussian white noise W T and a spatial noise Z S .Z S is often chosen to be a spatial Gaussian white noise, denoted W S in this case.To ensure a sufficient regularity for Z, Z S can alternatively be a colored noise, such as for example the solution to the spatial Whittle-Matérn SPDE (Lindgren et al., 2011) where W S is a Gaussian white noise.Note that the parameters κ 2 and H in the noise term have been set identical to those in the diffusion term in the lefthand-side of (5) to ensure that the spatial marginalization of the process is a Matérn field, as detailed below.A relation can be found between the SPDE notation of our paper and the more classical notation of infinite dimensional SDEs of Da Prato and Zabczyk (1992).This relation is explained in Appendix A.
When α > 0, X(t, s) is a stationary nonseparable spatio-temporal field with covariance function Cov(u, h), with (u, h) ∈ R × R d .The advection-diffusion equation ( 5) is a particular first-order evolution model as in Equation ( 4) with β = 1.Its spatial symbol function verifies the sufficient condition for existence and uniqueness of a stationary solution recalled at the end of Section 2.1.We define the spatial trace of X as the spatial random field X(t, •) at any t ∈ [0, T ].Carrizo-Vergara et al. (2022) showed that the advection term does not affect the spatial trace of the solution.For some specific values of the parameters, the spatial trace of the solution to ( 5) is a Matérn field, as detailed in Proposition 1.In the following |H| denotes the determinant of the square matrix H.
Proposition 1 Let Z(t, s) be a spatio-temporal noise colored in space with Z S (s) satisfying (7), and let α tot = α + α S .If α tot > d/2, the spatial trace of the stationary solution X(t, s) of the SPDE (5) is the Gaussian Matérn field with covariance where h = s − s ′ is the spatial lag and C M αtot−d/2 (•) is the unit variance and scale Matérn covariance function defined in (3) with smoothness parameter equal to ν = α tot − d/2.
Proposition 1 is adapted from Proposition 1 in Lindgren et al. (2023).A proof is reported in Appendix B. The model reduces to a separable one in a particular case stated in the corollary below.
Corollary 2 Let the coefficients of the SPDE (5) be such that α = 0 and γ = 0; the spatial operator applied to the spatio-temporal field X(t, s) is then the constant value c −1 .Let Z(t, s) be a spatio-temporal noise colored in space, with Z S (s) satisfying (7).If α S > d/2, the stationary solution of the SPDE is a separable spatio-temporal field with covariance with smoothness parameter equal to ν = α S − d/2.

Discretization
The advection-diffusion SPDE in ( 5) is discretized in time and space, using finite differences and a finite element method, respectively (from now on, this type of discretization will be noted as FEM/FD discretization).The temporal domain [0, T ] is discretized in (N T +1) regular time steps of length dt = T /N T , and we note t k = kdt for k ∈ {0, . . ., N T }.Since implicit solvers are usually less sensitive to numerical instability than explicit solvers, the implicit Euler scheme is chosen for the temporal discretization.This choice implies stability, hence convergence towards the stationary solution.We denote X (k) = X(t k , •) the temporal approximation of the spatial trace of the SPDE (5) at time t k .The FEM method for the spatial discretization is the continuous Galerkin method with Neumann Boundary Conditions as detailed in Lindgren et al. (2011).
The solution in two dimensions is now detailed.The solution in three dimensions involve geometrical technicalities, but is otherwise very similar.Let Ω ⊂ R 2 be a compact and connected domain of R 2 .Ω is meshed using a triangulation T with N S vertices {s 1 , . . ., s N S } ⊂ Ω.Let h := max Tr∈T h Tr , where h Tr is the length of the longest side of the triangle Tr ∈ T .A first-order finite element representation X h of the solution to the spatial SPDE is a linear combination X h = N S i=1 x i ψ i of piecewise linear basis functions {ψ i } N S i=1 , each ψ i being equal to 1 at the vertex s i and 0 at all the other vertices.The weights {x i } N S i=1 define uniquely the values of the field at the vertices, while the values in the interior of the triangles are determined by linear interpolation.The continuous Galerkin solution is then obtained by finding the weights that fulfill the weak formulation of Equation ( 5) for test functions belonging to the space V spanned by {ψ i } N S i=1 .
Proposition 3 Let X(t, s) be the spatio-temporal process solution to Equation (5) with α ∈ {0, 1} and spatio-temporal white noise, i.e.Z(t, s) = W (t, s) = W T (t) ⊗ W S (s).Let T be a triangulation of Ω and {ψ i } N S i=1 be the piecewise linear basis functions defined over T .Let us define the mass matrix i,j=1 as follows: Then, at each time step, the continuous Galerkin finite element solution vector where z (k+1) ∼ N (0, I N S ), M 1/2 is any matrix such that M 1/2 M 1/2 = M and dt = T /N T .When the noise on the right-hand side is colored in space, i.e.Z(t, s) = W T (t) ⊗ Z S (s), the discretization reads where L S is the Cholesky decomposition of Q −1 S , the covariance matrix of the discretized solution Z S of the spatial SPDE (7), obtained with the continuous Galerkin FEM (Lindgren et al., 2011).
The elements of the matrices M, G and B are non-zero only for pairs of basis functions which share common triangles.This implies that the matrix (M + dt c (K + B)) is sparse and that Equation ( 9) can be solved by Cholesky decomposition in an efficient way.

Stabilization of advection-dominated SPDE
When the advection term is too strong with respect to the diffusion term, advection-domination occurs.In the framework outlined above, when α = 1, the non-symmetric matrix M + dt c (K + B) becomes ill-conditioned, which The SPDE approach for spatio-temporal datasets with advection and diffusion induces oscillations and unstable solutions for the continuous Galerkin approximation.Specifically, the advection-domination occurs when the Péclet number Pe h = ∥γ∥h 2λ > 1, where λ is the coefficient of the isotropic Laplacian operator (see for example Mekuria and Rao (2016) or Quarteroni (2008, Chapter 5)).
One possible solution is to decrease the mesh size h, i.e., to refine the triangulation, until the advection no longer dominates on the element-level, with Pe h < 1.However, in many cases this is not a feasible solution because it would increase the number of vertices beyond computation limits.Another solution, adopted here, is to introduce a stabilization term.Many stabilization approaches are possible, some being more accurate than others (Quarteroni, 2008, Chapter 5).In our case, we opt for the streamline diffusion stabilization approach (Hughes and Brooks, 1979), considered as a good trade-off between accuracy and computational complexity.Essentially, the SD approach consists in stabilizing the advection by introducing an artificial diffusion term along the advection direction.The following proposition presents the stabilized solution to (5).
Proposition 4 Assume the same hypotheses as in Proposition 3 with α = 1.The solution to Equation (5) in presence of SD stabilization is where S = [S ij ] N S i,j=1 is the matrix of the SD stabilization operator S, such that (|H|) 1/4 .When the noise on the right-hand side of Equation ( 5) is colored in space, the discretization becomes where L S is as in Proposition 3.
The proof of the discretized equation follows the same reasoning as that of Proposition 3 with the addition of the matrix S. The streamline diffusion approach can be seen as a perturbation of the original SPDE (Bank et al., 1990).Indeed, by making the classical hypothesis of Neumann boundary condition on Ω and by using the Green's first identity, we get As a consequence, the original SPDE (5) can be rewritten with an additional diffusion term as The term (h∥γ∥ −1 γ γ ⊤ ) acts as an anisotropic "diffusion" matrix that is added to the anisotropy (or identity) matrix H of the original diffusion.This extra diffusion stabilizes the advection directed along the direction γ.By following the proof of Proposition 1, we find that the marginal variance of the spatial field X(t, •) of Equation ( 11) is equal to For the variance to be equal to the variance in Proposition 1, τ must be

Spatio-temporal Gaussian Markov Random Field approximation
Proposition 5 In presence of an advection-dominated flow and a spatio-temporal white noise on the right-hand side of Equation (5), the discretized vector x (k+1) on the mesh T at each time step is the solution of the following equation: where and z (k+1) ∼ N (0, I N S ) is independent of x (0) , . . ., x (k+1) .In presence of a spatiotemporal noise colored in space on the right-hand side of Equation (5), the matrix E reads where L S is defined in Proposition 3.
Proof Starting from Equation ( 10), which represents the numerical scheme for the advection-diffusion spatio-temporal SPDE with stabilization, it is straightforward to obtain (12).□ When the SPDE is not advection-dominated, which implies that no stabilization term is needed, Equation ( 13) is replaced by the similar equation where the matrix S is deleted and τ is replaced by τ .
The covariance matrix Σ of the approximate spatial trace x (0) at the first time step, can be taken to be equal to any admissible positive definite matrix.The closer Σ is to the covariance C S of X(t, •), the faster the stationary solution is obtained.When the hypotheses of Proposition 1 are satisfied, an efficient option is to choose Σ as the Matérn covariance of Equation (8).
To obtain fast inference and prediction computations, the precision matrix of the spatio-temporal discretized solution x 0:N T = [x (0) , . . ., x (N T ) ] ⊤ must be sparse.For this reason, M is replaced by the diagonal matrix M , where M ii = ⟨ψ i , 1⟩ and M ij = 0 if i ̸ = j (Lindgren et al., 2011).This technique is called mass lumping and is common practice in FEM (Quarteroni, 2008, Chapter 5).From now on, we always use the diagonal matrix M, but for ease of reading, it will still be denoted M.
Proposition 6 Let x 0:N T = [x (0) , . . ., x (N T ) ] ⊤ be the vector containing all spatial solutions until time step N T of Equation (12).The global precision matrix Q of the vector x 0:N T of size (N S (N T + 1), N S (N T + 1)) reads Proof The proof is available in Appendix D. □ Remark 2 Matrix (14) has a tridiagonal structure in time and is sparse in each of its spatial blocks of size (N S , N S ) located on the three diagonals.The sparsity of the precision matrix relies on the choice of M and on the value of α (the higher α the less sparse the spatial blocks).Conversely, the precision of finite element method discretization is influenced by the mesh density (the smaller h, the more precise the solution); this factor plays a role in defining the size of the precision matrix, but not its sparsity pattern.The sparsity pattern will be useful for the following sections concerning estimation and prediction.

Estimation, prediction and simulation
This section presents an efficient implementation for parameter estimation and spatio-temporal prediction within the spatio-temporal SPDE framework described in Section 2. We consider the advection-diffusion SPDE (5) with d = 2, α = 1, H = I (isotropic diffusion) and colored noise in space with α S = 2. Similar computations can be generalized to other values of α S such that α S /2 is integer or to anisotropic diffusion.The spatio-temporal domain Ω × [0, T ] is discretized in space with a triangulation T with N S nodes and discretized in time by means of (N T + 1) regular time steps.This space-time discretization is denoted T ′ = T × {0, T /N T , . . ., T }.At each time step k ∈ {0, . . ., N T } there are n (k) observations scattered in the spatial domain Ω.There is thus a total of n = N T k=0 n (k)  spatio-temporal data collected in the vector y 0:N T = [(y (0) ) ⊤ , . . ., (y We consider a statistical model with fixed and random effects.The fixed effect is a linear trend on a set of covariates and the random effect is modeled as the FEM/FD discretization of the random field solution to the SPDE (5) (as described in Section 2.3), with the addition of random noise: where b is the vector of q fixed effects and Λ is a (n, q) matrix of covariates with [Λ] jk = λ k (t j , s j ), j = 1 . . ., n and k = 1, . . ., q.The matrix A is the (n, N S (N T + 1)) projection matrix between the data and the points in T ′ , and ε is a standard Gaussian random vector with independent components.When the observation locations do not change during the time window, A ⊤ A is a (N S (N T + 1), N S (N T + 1)) block-diagonal matrix with all (N S , N S ) equal blocks.
The discretization with FEM in space and FD in time is justified by a few considerations: compared to a Fourier approximation approach (such as the ones considered in Sigrist et al. (2015); Liu et al. (2020); Sharrock and Kantas ( 2022)), it is better suited for scattered data, it can be easily adapted to spatially and/or temporally varying parameters in the SPDE, leading to nonstationary extensions of the method, and it can be generalized to spatio-temporal fields on Riemannian surfaces with only a few changes in the approach (Pereira, 2023).

Estimation of the parameters
The parameters of the SPDE are estimated using maximum likelihood.We collect the parameters of the SPDE in the vector θ ⊤ = [κ, γ x , γ y , c, τ ], while all the parameters of the statistical model are collected in ψ ⊤ = [θ ⊤ , b ⊤ , σ 0 ].Following (15), y 0:N T is a Gaussian vector with expectation Λ b and covariance matrix Σ y 0:N T = A Q −1 (θ) A ⊤ +σ 2 0 I n , where Q(θ) is a precision matrix of size (N S (N T + 1), N S (N T + 1)) depending on the parameters θ.For ease of notation, we use Q instead of Q(θ).The log-likelihood is equal to (16) We use the Broyden, Fletcher, Goldfarb, and Shanno optimization algorithm (Nocedal and Wright, 2006), that makes use of the second-order derivative of the objective function.The gradients of the log-likelihood function ( 16) with respect to the different parameters included in ψ are approximately computed with FD.
The SPDE approach for spatio-temporal datasets with advection and diffusion We now propose and detail a computationally efficient formulation for both the terms of the log-likelihood (16), namely − 1 2 log|Σ y 0:N T | and − 1 2 (y 0: We first consider the logdeterminant.The quadratic form is addressed next.
Proposition 7 In the framework outlined above, we have where Proof To compute log|Σy 0:N T |, let us consider the augmented matrix Hence, Using block formulas, we have where the last equality is a consequence of the Woodbury identity.This leads to the result.
□ Proposition 8 The term log|Q| in Equation (17) can be computed with the computationally efficient formula with where Q S is the precision matrix of the discretized spatial noise Z S defined in Proposition 3.
Proof Following Powell (2011), let where the α (k) are defined by Q is a block-matrix organized as N N .Hence, the formula for |Q| is Applying the logarithm, we obtain Equation ( 20).□ Note that |F −1 | is now the determinant of a (N S , N S ) sparse, symmetric and positive definite matrix.The log-determinant can be computed through its Cholesky decomposition as chol is a triangular matrix, whose determinant is the product of the diagonal elements.
The term log|Q A | = log|Q +σ −2 0 A ⊤ A| requires a detailed analysis.The term σ −2 0 A ⊤ A is an (N S (N T + 1), N S (N T + 1)) diagonal block matrix, whose (N S , N S ) blocks are sparse.The computation of log|Q A | is not as straightforward as in the case of log|Q|, because there is no way of reducing the computation to purely spatial matrices.Depending on the size N S (N T +1), we can either apply a Cholesky decomposition of the (N S (N T + 1), N S (N T + 1)) matrix Q A or the matrix-free approach proposed in Pereira et al. (2022), here sketched.The logarithm function is first approximated by a Chebyschev polynomial P (•) (Chebyshev, 1853), then the Hutchinson's estimator (Hutchinson, 1990) is used to obtain a stochastic estimate of tr[P (Q A )].The method is detailed in Algorithm 5 in Pereira et al. (2022).Now, concerning the quadratic term of the log-likelihood (16), we can work with the more convenient expression obtained thanks to the Woodbury formula The second term of Equation ( 22) can be computed either by Cholesky decomposition or using the conjugate gradient (CG) method.This latter method solves Q A v = w with respect to v and computes v sol = w ⊤ v, with w = A ⊤ (y 0:N T − Λ b).In this case, it is useful to find a good preconditioner for the matrix Q A to ensure fast convergence of the CG method.We found that a temporal block Gauss-Seidel preconditioner (Young, 1971, Chapter 3) was a good choice in this case.A detailed presentation of the CG method is available in Pereira et al. (2022).

Prediction by Kriging
Under a Gaussian assumption, optimal prediction is the conditional expectation, also known in the geostatistics literature as kriging.We detail here two prediction settings: space-time interpolation and temporal extrapolation.
In the space-time interpolation setting, the spatio-temporal vector x 0:N T is predicted on the entire spatial mesh during the time window [0, T ], i.e. on T ′ , using the data y 0:N T defined in Equation ( 15).The kriging predictor is directly read from Equation ( 19): The computation of ( 23) requires the inversion of Q A , as detailed in Section 3.1.The conditional variance, also called kriging variance, is The computation of the diagonal of an inverse matrix is not straightforward when only the Cholesky decomposition of the matrix is available.Among the existing methods there is the Takahashi recursive algorithm described in Takahashi et al. (1973) and Erisman and Tinney (1975).Another way of computing the kriging variance is through conditional simulations, as detailed in Section 3.3.
In the temporal extrapolation setting, the vector x (N T +1) is predicted at time step (N T + 1) on T using all the data available until time T , i.e. from y 0:N T .Following Equation ( 12), we have where z (N T +1) is a standardized Gaussian vector, and D and E are defined in Proposition 5.The kriging predictor x ⋆(N T +1) is where x ⋆(N T ) is extracted from x ⋆ 0:N T .The same procedure can be iterated to predict x at further time steps.

Conditional simulations
To perform a conditional simulation, we use the conditional kriging paradigm presented below.This approach relies on the fact that kriging predictors and kriging residuals are uncorrelated (independent under Gaussian assumption, see Chilès and Delfiner (1999, Chapter 7)).First, a non-conditional simulation x (N C) 0:N T is performed on the spatio-temporal grid T ′ .From this simulation, kriging residuals are computed over the entire spatio-temporal grid T ′ .The conditional expectation is computed using the method presented in the previous section.In a second step, these independently generated residuals are added to the usual kriging of the data to get the conditional simulation Conditional simulations at further time steps are obtained by iteratively computing x (C) N T +k using the propagation equation ( 24) with k ≥ 1.Multiple independent realizations of conditional simulations can then be used to compute estimates of conditional variances or other quantities, such as probability maps of threshold exceedance.

Simulation study
We report here some results regarding the estimation of the parameters θ ⊤ = [κ, γ 1 , γ 2 , c, τ ] for 50 independent realisations of a spatio-temporal model simulated with the SPDE (5).We set H = I, α = 1 and α S = 2.The spatial domain is the [0, 30] 2 square with a grid triangulation of N S = 900 spatial points.The time window is [1,10] with unit time step and N T + 1 = 10.The n S = 100 observations are randomly located into the spatial domain and their position do not change during the N T time steps (hence n = 1000).Since the sizes of both the dataset and the spatio-temporal mesh are reasonable, we report the estimations computed with both the Cholesky decomposition approach and the matrix-free approach.
As initial values, we use estimated values obtained from the variograms of the spatial and temporal traces of the process.Specifically, the initial value for κ is the estimated scale parameter of a Matérn covariance function with smoothness parameter ν = α + α S − 1 = 2 considering independent temporal repetitions, the initial value for c is deduced from the estimated parameter of n S independent repetitions of AR(1) processes of length (N T + 1) and τ 2 is computed from Equation ( 8) with σ 2 being the empirical variance computed on the data.Finally, the initial value for γ is the null vector.The parameters for the matrix-free approach are set to the following: the order of the Chebychev polynomial to approximate the logarithm is set to 30 and the number of terms in the sum of the Hutchinson's estimator is set to 10.The results are reported in Table 1.They show that all parameters are accurately estimated with both approaches.In almost all cases, the true value of the parameter is The SPDE approach for spatio-temporal datasets with advection and diffusion within the mean ± 2 standard deviations interval.We remark how the matrixfree approach takes more time to estimate the parameters.This is due to the iterative computations, that increase the computational time.However, we know that the benefit of the matrix-free approach is the possibility of applying it to much larger spatio-temporal meshes, where the Cholesky decomposition cannot be applied at all.

Application to a solar radiation dataset
With the constantly increasing installation of photovoltaic (PV) power and its volatility due to weather conditions, characterizing short-term variability of generated solar power (from the minute resolution to the 15-minute resolution) is important for the integration of PV systems into the electrical grid, for balancing supply and demand (Kreuwel et al., 2020).Fluctuations in solar production can have a significant impact on grid stability, and accurate prediction allows for planning necessary adjustments, such as modulating the production of other energy sources, to avoid service interruptions and optimize resource utilization.
The approach detailed in the previous sections is now applied to a solar radiation dataset for which experts agree on the presence of advection due to Western prevailing winds transporting clouds from one side of the domain to the other.The HOPE campaign (Macke et al., 2017) recorded Global Horizontal Irradiance (GHI) (also called SSI, Surface Solar Irradiance) over a 10 × 16 km 2 region in West Germany near the city of Jülich from April 2 to July 2, 2013.The sensors were located at 99 stations located as pictured in Figure 1 and GHI was recorded every 15 seconds.A detailed description of the campaign can be found in Macke et al. (2017).
The dataset was cleaned for outlying values and non-operating sensors, and the temporal resolution was reduced from 15 seconds to 1 minute.Figure 2 (left panel) shows GHI as a function of time (in minute, during a full daythe 28th of May 2013) at 4 different stations.These stations, represented in color in Figure 1, are located at the border of the domain, far from each other.The GHI starts close to 0, increases after sunrise, peaks at midday and tends to 0 at sunset.The maximal theoretical amount of irradiance reaching the  sensor follows an ideal concave curve.The divergence between the measured irradiance and the optimal curve can be slight or important, depending on the presence of clouds.One can see on this example that the evolution among the 4 stations is similar, with variations accounting for spatio-temporal variations of the clouds.
A first preprocessing was made in order to stationarize the phenomenon.Oumbe et al. (2014) showed that the solar irradiance at ground level, GHI (denoted G for short from now on), computed by a radiative transfer model can be approximated by the product of the irradiance under clear atmosphere (called Clear Sky GHI, or G c ) and a modification factor due to cloud properties and ground albedo only (Clear Sky Index, or K c , Beyer et al. (1996)): The error made using this approximation depends mostly on the solar zenith angle, the ground albedo and the cloud optical depth.In most cases, the maximum errors (95th percentile) on global and direct surface irradiances are less than 15 W m −2 and less than 2 to 5 % in relative value, as recommended by the World Meteorological Organization for high-quality measurements of the solar irradiance (Oumbe et al., 2014).Practically, it means that a model for fast calculation of surface solar irradiance may be separated into two distinct and independent models: a deterministic model for G, under clear-sky conditions, as computed according to Gschwind et al. (2019), considered as known in this study, and a model for K c , which accounts for cloud influence on the downwelling radiation and is expected to change in time and space.K c is modeled as a random spatio-temporal process and will be the subject of our analysis.
Figure 2 (right panel) shows the variable K c corresponding to the variable G shown on the left panel.In general, K c lies between 0 and 1, but in rare occasions, values above 1 can be observed.This phenomenon is called overshooting (Schade et al., 2007) and is due to light reflection by surrounding clouds.
A time window of 20 minutes around 4 p.m. on May 28, 2013 is extracted with observations every minute at the 73 stations with well recorded values.The histogram and time series of the data are shown in Figure 3. Parameters are estimated on this 20-minute window using the method described in Section 3.1.The spatio-temporal grid contains N T + 1 = 20 one-minute time steps, from t = 1 to t = 20 and N S = 900 regular spatial mesh points.

Estimation and prediction
Six different models are fitted to the data and used for prediction: 3 models with advection (called "adv-diff") and 3 models without advection (called "diff") obtained by setting γ = 0.Both groups contain the three following submodels: (i) a model with diffusion included only in the stochastic forcing term, with a Matérn spatial trace with ν = 1; (ii) a nonseparable model that does not have a Matérn spatial trace, but a generalized covariance function instead (the spatial covariance function exists only if α + α S > d/2, where d = 2 in this case, see Proposition 1); in this case, the SPDE generates a process which only has meaning as a random measure, and cannot strictly be interpreted at individual locations; (iii) a nonseparable model with a Matérn spatial trace with ν = 2.In the general model of Equation ( 5) they correspond respectively to (α, α S ) = (0, 2), (1, 0), (1, 2).The parameters of the SPDE are estimated for each model separately.The results are reported in Table 2.
The log-likelihoods of the models that include advection are within a range of variations of 10 log-likelihood units and are between 34 to 80 units larger than those with diffusion only.As a point of comparison, if all spatio-temporal dependencies were ignored, the BIC penalization for the advection parameters would be 2 ln(1460) ≃ 14.5.These results indicate strong evidence in favor of models with advection, but no significant differences among them.The parameters vary substantially from one model to the other, but it must be remembered that, when considered independently, their physical interpretation is model dependent.Some combinations are interpretable however.For example, following Proposition 1, the overall variance is equal to (8π ) when α tot = 3. Accordingly, the estimated standard deviations for models (1), ( 3), ( 4) and ( 6) are equal to 0.160, 0.119, 0.174 and 0.199 respectively, with the experimental standard deviation being equal to 0.184.For the same models, the practical ranges computed as √ 8ν/κ (Lindgren et al., 2011) are equal to 1.915, 3.079, 2.281 and 4.255 respectively.Notice that among pairs of models that differ by the presence or absence of advection, the estimated range is larger for those without advection in an attempt to account for the larger correlation distance due to transport.With the objective of improving the predictions at the 15-minute resolution needed for the electrical grid management, we then perform prediction with two different validation settings containing 80% of conditioning data and 20% of validation data.In the first case (called "Uniform") the validation locations are uniformly randomly selected.In the second case (called "South-East") the validation locations are located downwind (i.e.South-East) with respect to the estimated advection direction.See Figure 4 for a representation of the validation settings.
Recall that a time window containing 20 unit time-steps, from t = 1 to t = 20, has been selected.For each validation setting and considering each time the end of the time window from T = 11 to T = 20, three prediction configurations using conditioning data from time (T − 9) to T are computed and compared to the real values, allowing us to compute a root mean square error (RMSE) validation score.First, the kriging is performed spatially only (hereafter referred to as "S" kriging).Second, a temporal extrapolation is computed at the conditioning locations at time horizons (T + 1), (T + 2) and (T + 3) ("T1", "T2", "T3" kriging).Third, the spatio-temporal prediction is computed at the validation locations at time horizons (T + 1), (T + 2) and (T + 3) ("ST1", "ST2", "ST3" kriging).We thus have a total of 6 models × 2 validation settings × 3 prediction configurations.RMSEs are averaged over the 10 repetitions.Results are reported in Table 3 and those of the ST prediction configuration are also shown in Figure 6.
For all tested validation settings and prediction configurations, the models with advection show better RMSE scores than models without advection.This result is a confirmation of the results already observed on log-likelihoods.Models with advection have similar prediction scores in the prediction configurations S and T, model (1) having slightly better performances in the configuration S. In the T and ST configurations, models (2) and (3) have in general quite similar RMSEs, except in the South-East setting with ST configuration where model ( 2) is clearly the best model.In this case, prediction is made in a space-time domain lying downstream with respect to the advection.It is thus expected that the model best representing the underlying physics should lead to the best prediction performances.
An example of prediction maps on T at time horizons (T + 1), (T + 2) and (T +3) is reported in Figure 5, along with the observed values (black contoured

Conditional simulations
Figure 7 shows 100 conditional simulations of K c computed at time T = 11 and horizons (T + 1), (T + 2), . . ., (T + 6) with the advection-diffusion model (3).Two validation stations have been selected: one in the North-West part of the domain (the orange star in the left panel of Figure 4) and one in the South-East part of the domain (the green star in the right panel of Figure 4).Given that there is an advection from NW to SE, it is therefore expected that the advection-diffusion model should be able to transport the information in that direction.The mean of the 100 simulations and the envelopes corresponding to twice the pointwise standard deviation have also been represented, along with the true values.As expected, most of the conditional simulations lie within the envelopes in both cases and at all time horizons.The remarkable result is that the variance of the conditional simulations at the green station is smaller than that at the orange one at every time step, especially when the time horizon increases.This is due to the advection term in model ( 3), able to propagate information from North-West to South-East.

Conclusion
The spatio-temporal SPDE approach based on advection-diffusion equations proposed in this work combines elements of physics, numerical analysis and statistics.It can be seen as a first step toward physics informed geostatistics, which introduces physical dynamics into a statistical model, accounting for possible hidden structures governing the evolution of the spatio-temporal phenomenon.The different terms of the SPDE (advection, diffusion) directly influence the spatio-temporal dependencies of the process, by controlling its variability in space and time.Compared to spatio-temporal models built on covariance functions such as the Gneiting class (Gneiting, 2002), we gain in interpretability since the parameters of the model can be linked to the physical coefficients of SPDEs.We showed that it is possible to build an accurate space-time approximations of the process driven by the advection-diffusion SPDE using a combination of FEM in space and implicit Euler scheme in time.It leads to sparse structured linear systems.We obtained promising results for the estimation and for the prediction of processes both in terms of precision and speed.When the size of the dataset is moderate, direct matrix implementation is possible.We showed how matrix-free methods can be implemented in order to obtain scalable computations even for very large datasets.The application to the solar radiation dataset demonstrated that the nonseparable advectiondiffusion model exhibited the best prediction performances on a phenomenon that is certainly governed by advection and diffusion processes.Nonetheless, further work is necessary to better assess the prediction accuracy and the computational complexity.Applications to larger and more complex datasets, in particular using the matrix-free approach, will be considered.
Further work is also necessary to compare the proposed approach to competing ones.In the spirit of Heaton et al. (2019), which focused on spatial models, a comparison aimed at spatio-temporal processes showing dominating advection would be of great interest.For example, Okasaki et al. (2022) considered spatio-temporal SPDEs without advection or with an advection that does not dominate diffusion.Moreover, the moderate size of the dataset considered allowed the use of direct matrix computations whereas, as discussed above, our approach is scalable.Comparison to models expressing the advection in a Lagrangian framework (Ailliot et al., 2011;Benoit et al., 2018;Salvaña and Genton, 2021) should also be performed.
A maximum likelihood procedure was implemented.As a follow-up work, it would be interesting to implement this space-time model as part of a Bayesian hierarchical construction, possibly within the INLA/SPDE framework (Rue et al., 2009;Krainski et al., 2018a), which already propose separable spatiotemporal models and will probably soon include the diffusion spatio-temporal SPDE model proposed in Lindgren et al. (2023).The Bayesian framework would enable the assessment of estimation and prediction uncertainty.Extending the INLA approach to deal with advection-diffusion models is left for future work.Different parameter estimation methods could also be investigated, such as "online" methods, which recursively estimate the unknown model parameters based on the continuous stream of observations (Sharrock and Kantas, 2022).
One of the main advantages of the SPDE formulation is that it is easy to generalize to nonstationary settings.Nonstationary fields can be defined by letting the parameters κ(t, s) and γ(t, s) be space-time-dependent.This generalization implies only minimal changes to the method used in the stationary case concerning the simulation, but needs more work for estimation and prediction, since the maximum likelihood approach becomes much more expensive.We can also incorporate models of spatially varying anisotropy by modifying the general operator ∇ • H(t, s)∇X(t, s) with a nonstationary anisotropic matrix H(t, s).The introduction of nonstationarities could allow to better describe phenomena where local variations are clearly present.The approaches by Fuglstad et al. (2015) and Pereira et al. (2022) have being investigated and generalized to the spatio-temporal framework, but are left for a follow-up publication.In a nonstationary context, a comparison to the echo state networks propossed in Huang et al. (2022) would also be of great interest.
Another interesting consequence of defining the models through local stochastic partial differential equations is that the SPDEs still make sense when R d is replaced by a space that is only locally flat.We can define nonstationary Gaussian fields on manifolds, and still obtain a GMRF representation.Important improvements were obtained in the spatial case (Pereira et al., 2022).The generalization to space-time processes is being explored further (Pereira, 2023).
Possible generalization to spatio-temporal SPDEs with a fractional exponent in the diffusion term could also be considered.A development of the methods proposed by Bolin and Kirchner (2020) and Vabishchevich (2015) should be explored.
The SPDE approach for spatio-temporal datasets with advection and diffusion function (Carrizo-Vergara et al., 2022), hence it can be written as Cov(0, h) = where S(ξ, ω) is the spectral density defined as Integrating over ω, we obtain the spatial spectral density We detail here the discretization scheme of the advection-diffusion spatiotemporal SPDE (5).
For the sake of a clearer exposition, we set H = I, α = 1 and we consider a spatio-temporal white noise Z(t, s) = W (t, s).The proof for the general case follows exactly the same lines as the proof below.The considered SPDE is The SPDE approach for spatio-temporal datasets with advection and diffusion Remark 3 When the diffusion term includes an anisotropy matrix H, i.e., when ∆ is replaced by ∇ • H ∇, the stiffness operator becomes G(v, w) = Ω H ∇v • ∇w d s, and the stiffness matrix changes consequently.

Appendix D Proof of Proposition 6
We present here the proof of Proposition 6.
Proof Let us denote x 0:N T = [x (0) , . . ., x (N T ) ] ⊤ the vector containing all spatial solutions until time step N T .Then, By denoting F = E E ⊤ , the global precision matrix reads .
By replacing the values of D and F and by defining J = M + dt c (K + B + S) , we obtain

Figure 1
Figure 1 Left: spatial domain of study (red square).Right: zoom on the spatial domain along with measuring stations.

Figure 2
Figure 2 GHI G and Clear Sky Index Kc for 4 different stations on the 28th of May 2013

Figure 3
Figure 3 Left: Histogram of Kc over 20 time steps.Right: Time series of Kc for all the stations over 20 time steps, along with the mean value for each time step (in red)

Figure 6
Figure 6 Averaged RMSE for ST prediction configuration for 6 different models and 2 validation settings: Uniform (left) and South-East (right)

Figure 7
Figure 7 Real Kc, mean of conditional simulations of Kc and ±2σ envelope at time horizons T, (T + 1), (T + 2), . . ., (T + 6).Left: orange station in the North-West part of the domain.Right: green station in the South-East part of the domain h ξ)S S (ξ) d ξ , (B3)

Table 2
Estimated parameters and log-likelihood for 6 different models from all data on a 20-minute window