Estimation of causal continuous‐time autoregressive moving average random fields

We estimate model parameters of Lévy‐driven causal continuous‐time autoregressive moving average random fields by fitting the empirical variogram to the theoretical counterpart using a weighted least squares (WLS) approach. Subsequent to deriving asymptotic results for the variogram estimator, we show strong consistency and asymptotic normality of the parameter estimator. Furthermore, we conduct a simulation study to assess the quality of the WLS estimator for finite samples. For the simulation, we utilize numerical approximation schemes based on truncation and discretization of stochastic integrals and we analyze the associated simulation errors in detail. Finally, we apply our results to real data of the cosmic microwave background.

(cf. Brockwell (2014) and the references therein). By contrast, considerably less is known about CARMA random fields indexed by R^d, which have been defined only recently. To the best of our knowledge, two different classes exist in the literature: the isotropic CARMA random field was introduced in Brockwell and Matsuda (2017) and the causal CARMA random field in Pham (2018). While Brockwell and Matsuda (2017) includes Bayesian parameter estimation, Pham (2018) only establishes stochastic properties of causal CARMA random fields. The goal of this article is to provide a semiparametric method to estimate model parameters of causal CARMA random fields from discretely observed samples.
A Lévy-driven causal CARMA random field (Y(t))_{t∈R^d} on R^d is given by the equation

Y(t) = ∫_{(−∞,t]} e_p^⊤ e^{A_1(t_1−s_1)} ⋯ e^{A_d(t_d−s_d)} b Λ(ds),  t ∈ R^d,  (1)

where A_1, …, A_d ∈ R^{p×p} are companion matrices, e_p = (0, …, 0, 1)^⊤, b ∈ R^p and Λ is a homogeneous Lévy basis, that is, the multiparameter analog of a Lévy process (see Section 2 for more details). Due to its similar structure, many commonly known properties of CARMA processes also hold for Y, such as càdlàg sample paths, exponentially decreasing autocovariance functions, and rational spectral densities. In fact, the random field Y reduces to a causal CARMA process if d = 1. Moreover, Y has an autocovariance function which is both anisotropic and nonseparable in the sense of Guttorp and Schmidt (2013).
Since the matrices A_1, …, A_d are in companion form, they are completely determined by their eigenvalues. These eigenvalues, in conjunction with the components of the vector b, form the model parameters. As our main tool for parameter estimation, we choose the variogram, which is broadly applied in spatial statistics. For stationary random fields it is defined as

γ(t) := E[(Y(t + s) − Y(s))²],  t ∈ R^d,

(cf. Section 2.2.1 of Cressie (1993)). Furthermore, it is pointed out in Section 2.4.1 of Cressie (1993) that variogram estimation performs better than autocovariance estimation in terms of bias and in the presence of trend contamination. Assuming that observations of Y are given on a regular lattice L = {Δ, …, NΔ}^d, we estimate the model parameters by a two-step procedure. First, we calculate an empirical version of the variogram γ(⋅) at different lags using a nonparametric estimator γ*_N(⋅). Second, we fit the empirical variogram to the theoretical one using a weighted least squares (WLS) method. More precisely, for a given set of strictly positive weights w_j, we estimate the true vector of CARMA parameters θ_0 by means of the WLS estimator

θ*_N ∈ argmin_{θ∈Θ} Σ_{j=1}^K w_j [γ*_N(t^{(j)}) − γ_θ(t^{(j)})]²,

where Θ is a compact parameter space containing θ_0 and K is the number of lags used (see also Equation (8)). An important task in connection with this approach is to determine sufficiently many lags t^{(1)}, …, t^{(K)} ∈ R^d in order to obtain identifiability of the model parameters. We tackle this problem and show that under certain conditions a small number of lags on the principal axes of the Cartesian coordinate system already suffices to recover the CARMA parameters. In particular, one does not need to assume the property of invertibility (or an analog thereof) as for CARMA processes. This fact differentiates the one-dimensional case from the higher-dimensional case, and we will investigate it in more detail.
Another part of this article is devoted to the study of different numerical simulation schemes for the causal CARMA random field. We derive approximation algorithms similar to those presented in Chen, Chong, and Klüppelberg (2016) and Nguyen and Veraart (2017), which are based on truncation or discretization of the stochastic integral in Equation (1). We show that the output converges in mean square and almost surely to the underlying CARMA random field. The algorithms are then used to conduct a simulation study in order to assess the quality of the WLS estimator. Subsequently, we apply the estimator to data of the cosmic microwave background (CMB).
Our article is organized as follows: we recall the definition and basic properties of causal CARMA random fields in Section 2. Therein, a new formula for the spectral density is also proven. Strong consistency and asymptotic normality of the nonparametric variogram estimator γ*_N(⋅) are shown in Section 3. Subsequently, Section 4 is concerned with the asymptotic properties of the WLS estimator θ*_N. Under identifiability conditions, we show strong consistency and asymptotic normality. Although it is easier to show identifiability of CAR parameters, we obtain identifiability of CARMA parameters by carefully analyzing algebraic properties of the variogram. In Section 5, we consider two different simulation methods and their associated algorithms. It is shown that the simulations converge pointwise both in L² and almost surely to the underlying true random fields as the truncation parameter tends to infinity and the discretization parameter tends to zero. The article concludes with a simulation study and an application to CMB data in Sections 6 and 7.

We use the following notation throughout this article: 1_{⋅} denotes the indicator function such that, for instance, 1_{t≥0} is the Heaviside function. Furthermore, A^⊤ denotes the transpose of a matrix (or a vector) A. The components of a vector u ∈ R^d are given by u_1, …, u_d if not stated otherwise. For u, v ∈ R^d, ‖u‖ is the Euclidean norm, u ⋅ v ∈ R is the scalar product, u ⊙ v ∈ R^d is the componentwise product, and u ≤ v if and only if u_i ≤ v_i for all i ∈ {1, …, d}. The d-dimensional interval [u, v] is defined as [u, v] := {s ∈ R^d : u ≤ s ≤ v} and we set R_+ = [0, ∞). In addition, e_1, …, e_d are the unit vectors in R^d and e := e_1 + ⋯ + e_d = (1, …, 1)^⊤. Diagonal matrices are denoted by diag(λ_1, …, λ_d) ∈ R^{d×d} and M_d(C[z]) is the space of all d × d matrix polynomials.
Finally, Re(z) and Im(z) are the real and imaginary part of a complex number z, Leb(⋅) is the Lebesgue measure, and i is the imaginary unit.

PRELIMINARIES
First and foremost, we summarize some important properties of causal CARMA random fields. To this end, let us fix a probability space (Ω, F, P) supporting all stochastic objects in this article. As stated in the introduction, CARMA random fields are defined as stochastic integrals driven by homogeneous Lévy bases. These are random measures that can be viewed as a generalization of Lévy processes, and their integration theory was developed in the seminal article Rajput and Rosiński (1989). For a homogeneous Lévy basis Λ, we denote its characteristic triplet by (β, σ², ν), where β ∈ R, σ² ∈ R_+ and ν is a Lévy measure. We say that Λ has a finite second moment and variance σ_Λ² := σ² + ∫_R z² ν(dz) if and only if ∫_R z² ν(dz) < ∞. The variance σ_Λ² always appears in conjunction with the variogram or mean squared errors throughout this article. For more details on Lévy bases, we refer to Section 2 in Pham (2018). The following definition of causal CARMA random fields is taken from the same reference.
Definition 1. Let q and p be two nonnegative integers such that q < p, b = (b_0, …, b_{p−1})^⊤ ∈ R^p with b_q ≠ 0 and b_i = 0 for i > q, e_p = (0, …, 0, 1)^⊤ ∈ R^p, and let A_i be the companion matrix of a monic polynomial a_i of degree p with real coefficients whose roots have strictly negative real parts, for i = 1, …, d. A random field (Y(t))_{t∈R^d} is called a (causal) CARMA(p, q) random field if it satisfies

Y(t) = ∫_{(−∞,t]} e_p^⊤ e^{A_1(t_1−s_1)} ⋯ e^{A_d(t_d−s_d)} b Λ(ds),  t ∈ R^d,  (2)

where Λ is a homogeneous Lévy basis on R^d with ∫_R (log|z|)^d 1_{|z|>1} ν(dz) < ∞. A (causal) CARMA(p, 0) random field is also called a (causal) CAR(p) random field.
Under the conditions specified in this definition, it was shown in Pham (2018) that CARMA random fields exist and are well defined. Furthermore, they are causal by definition since the value of Y(t) at t ∈ R^d only depends on the driving Lévy basis Λ on the set (−∞, t] := (−∞, t_1] × ⋯ × (−∞, t_d]. This type of causality can be interpreted as a directional influence. In addition, it is immediate to see that they have the moving average representation

Y(t) = ∫_{R^d} g(t − s) Λ(ds),  t ∈ R^d,  (3)

where the kernel g is given by

g(s) = e_p^⊤ e^{A_1 s_1} ⋯ e^{A_d s_d} b 1_{s≥0},  s ∈ R^d.  (4)

The kernel g is anisotropic, in contrast to the isotropic CARMA random field in Brockwell and Matsuda (2017). In addition, it is nonseparable, that is, it cannot be written as a product of the form g(s) = g_1(s_1) ⋯ g_d(s_d) with real-valued functions g_i, except in the CAR(1) case. If d = 1, we recover the classical kernel of a causal CARMA process (indexed by R).
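The kernel in Equation (4) is straightforward to evaluate numerically via matrix exponentials. The following minimal sketch (function names are ours; SciPy's `expm` computes the matrix exponential) builds the companion matrices from the coefficients of the monic polynomials a_i and applies the exponentials to b. Since the matrices generally do not commute, the order of the products matters.

```python
import numpy as np
from scipy.linalg import expm

def companion(coeffs):
    """Companion matrix of the monic polynomial z^p + c_1 z^(p-1) + ... + c_p.

    coeffs = [c_1, ..., c_p]; the eigenvalues of the returned matrix are
    exactly the roots of the polynomial.
    """
    p = len(coeffs)
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)
    A[-1, :] = -np.asarray(coeffs, dtype=float)[::-1]
    return A

def carma_kernel(s, A_list, b):
    """Evaluate g(s) = e_p^T exp(A_1 s_1) ... exp(A_d s_d) b, and 0 unless s >= 0."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    if np.any(s < 0):
        return 0.0
    v = np.asarray(b, dtype=float).copy()
    # Apply the matrix exponentials to b from the right, i.e. exp(A_d s_d) first.
    for A_i, s_i in zip(reversed(A_list), reversed(s)):
        v = expm(A_i * s_i) @ v
    # e_p^T picks out the last component of the resulting vector.
    return float(v[-1])
```

For d = 2 and p = 1 with a_i(z) = z + c_i, the kernel reduces to the separable CAR(1) form b_0 e^{−c_1 s_1 − c_2 s_2}, which provides a quick sanity check of the implementation.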
Remark 1. Causal CARMA random fields solve a system of stochastic partial differential equations, which generalizes the classical state-space representation of CARMA processes. For more details, see Section 3 in Pham (2018).
In this article, we always impose the following additional conditions:

Assumption 1.
• The Lévy basis Λ has mean zero and a finite second moment.
• The companion matrix A i has distinct eigenvalues for i = 1, … , d.
The first part of Assumption 1 ensures the existence of a second-order structure of Y, which is crucial for our estimation procedure in Section 4. In addition, the zero mean condition facilitates some computations; however, it is neither necessary nor restrictive. The autocovariance functions of CARMA random fields are nonseparable (except in the CAR(1) case), anisotropic, and integrable over R^d since they are exponentially decreasing (cf. Pham (2018)). This implies the existence of a spectral density, which can be shown to be rational as for CARMA processes.
The second part of Assumption 1 is analogous to Assumption 1 in Brockwell, Davis, and Yang (2011), where it is also pointed out that this condition is not critical since multiple eigenvalues can be handled as a limiting case. Furthermore, this assumption implies that the kernel g from Equation (4) can alternatively be represented as

g(s) = Σ_{λ_1} ⋯ Σ_{λ_d} d(λ_1, …, λ_d) e^{λ_1 s_1 + ⋯ + λ_d s_d} 1_{s≥0},  s ∈ R^d,  (5)

where the d(λ_1, …, λ_d) are (possibly complex) coefficients and Σ_{λ_i} denotes the sum over the distinct eigenvalues of A_i for i = 1, …, d (cf. Corollary 3.7 in Pham (2018)).
It is commonly known that an equidistantly sampled CARMA process is always an ARMA process. Under certain conditions, we also obtain an ARMA random field if we sample a CARMA random field on a regular lattice (see Section 4.3 in Pham (2018)). However, these conditions are rather restrictive (one of which is A 1 = … = A d ) and it is not known whether this sampling property can be generalized to the whole class of CARMA random fields. Nevertheless, we will see in Section 5 that a CARMA random field sampled on a regular grid can always be approximated arbitrarily well by a discrete-parameter moving average random field (of finite order) in terms of the mean squared error and almost surely.
From Equation (3), we observe that Y is strictly stationary, which in turn implies that the variogram γ(t) = E[(Y(t + s) − Y(s))²] is translation-invariant, that is, independent of s. In Section 4, we will estimate the CARMA parameters b and the eigenvalues of A_1, …, A_d by fitting an empirical version of γ(⋅) to its theoretical counterpart. Therefore, it is necessary to have the variogram structure of CARMA random fields at hand, which is given in the next proposition. Recall that u ⊙ v ∈ R^d is the componentwise product for u, v ∈ R^d.

Proposition 1. Suppose that (Y(t))_{t∈R^d} is a CARMA(p, q) random field such that Assumption 1 holds true. Then the variogram of Y has the form

γ(t) = 2σ_Λ² Σ_{λ_1} ⋯ Σ_{λ_d} d_v(λ_1, …, λ_d) (1 − e^{λ_1|t_1| + ⋯ + λ_d|t_d|}),  t ∈ R^d,  (6)

where the d_v(λ_1, …, λ_d) are (possibly complex) coefficients and Σ_{λ_i} denotes the sum over the distinct eigenvalues of A_i for i = 1, …, d.
Proof. The statement is a combination of Theorem 4.1 in Pham (2018) and the relation γ(t) = 2(κ(0) − κ(t)), where κ(⋅) is the autocovariance function of Y. ▪

As argued in Pham (2018), it is in general hard to find explicit formulas for the d_v(λ_1, …, λ_d) in terms of d, p, q, b, A_1, …, A_d. However, if we fix the dimension d and the orders p and q, we are able to compute the variogram explicitly. We consider the following example.
These formulas have been computed with the computer algebra system Mathematica.
The next result, which is of theoretical interest and will be useful later on, contains a formula for the spectral density of Y , which is more explicit than Equation (4.5) in Pham (2018).
Proposition 2. Suppose that (Y(t))_{t∈R^d} is a CARMA(p, q) random field such that Λ has a finite second moment. Furthermore, let a_i(z) = Σ_{j=0}^p a_{i,j} z^{p−j} for i = 1, …, d be the monic polynomials in Definition 1. Then the spectral density f of Y admits a rational representation in terms of the polynomials a_i and matrix polynomials Q_1, …, Q_d, where for each i = 1, …, d the (k, l)-entry of Q_i is given explicitly in terms of the coefficients a_{i,j}.

Proof. Proposition 11.2.2 in Bernstein (2005) and Equation (4) imply a factorization of the Fourier transform of the kernel g. Applying Lemma 3.1 in Brockwell and Schlemm (2013) and the relation between the spectral density and this Fourier transform yields the claimed assertion. ▪

ASYMPTOTIC PROPERTIES OF THE EMPIRICAL VARIOGRAM
Let (Y(t))_{t∈R^d} be a CARMA(p, q) random field satisfying Assumption 1. If we are given observations of Y on a lattice L = {Δ, …, NΔ}^d, we can estimate the variogram γ(⋅) by Matheron's method-of-moments estimator (cf. Section 2.4 in Cressie (1993) for more details)

γ*_N(t) := (1 / #N(t)) Σ_{s∈N(t)} (Y(s + t) − Y(s))²,  N(t) := {s ∈ L : s + t ∈ L}.

We aim to show strong consistency and multivariate asymptotic normality of γ*_N(⋅) as N tends to infinity. To this end, we make use of the asymptotic normality of the sample autocovariance estimator κ*_N(⋅), which was shown in Berger (2017) for moving average random fields by applying a blocking technique and a central limit theorem for m-dependent random fields.
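On a regular lattice, Matheron's estimator is a one-liner over shifted array slices. A minimal NumPy sketch (function name ours; the lag is given in grid units and assumed componentwise nonnegative) illustrates the averaging of squared increments:

```python
import numpy as np

def matheron_variogram(Y, lag):
    """Matheron's method-of-moments estimator of the variogram at a grid lag.

    Y   : d-dimensional array of observations on the lattice {Delta, ..., N*Delta}^d.
    lag : tuple of nonnegative integers, the lag t in units of Delta.

    Averages (Y(s + t) - Y(s))^2 over all pairs of lattice points at lag t.
    """
    fwd = tuple(slice(l, None) for l in lag)
    bwd = tuple(slice(None, -l) if l > 0 else slice(None) for l in lag)
    diff = Y[fwd] - Y[bwd]
    return np.mean(diff ** 2)
```

Note that this estimates γ(t) = E[(Y(t + s) − Y(s))²] as used in this article; parts of the literature reserve the name variogram for twice, and semivariogram for half, of related quantities, so the normalization should be checked against the model formula.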
Theorem 1. Suppose that (Y(t))_{t∈R^d} is a CARMA(p, q) random field such that Assumption 1 holds true, Λ has a finite fourth moment and observations of Y are given on the lattice L = {Δ, …, NΔ}^d.
where the two matrices F and V = (v_{i,j})_{i,j=0,…,K} are given explicitly for all i, j = 0, …, K.
Proof. First of all, we show strong consistency of the variogram estimator γ*_N(⋅). By Corollary 3.18 in Berger (2017), we have the desired almost sure convergence for all t ∈ ΔZ^d. It remains to show asymptotic normality of γ*_N(⋅). Since the kernel g in Equation (5) is a sum of exponentials, the required moment conditions hold for every i, j = 0, …, K. Moreover, Theorem 4.1 in Pham (2018) yields the required summability. We conclude that the conditions of Theorem 3.8 in Berger (2019) are satisfied, which in turn shows asymptotic normality of the autocovariance estimator. Consider now the mapping f whose Jacobian is the matrix F. The multivariate delta method (see, e.g., Proposition 6.4.3 in Brockwell and Davis (1991)) in combination with the mapping f and Equation (7) yields the assertion as N → ∞, and Slutsky's theorem finishes the proof. ▪

ESTIMATION OF CARMA RANDOM FIELDS
According to Definition 1, a CARMA random field is determined by the pair (p, q), the vector b, the companion matrices A_1, …, A_d and the Lévy basis Λ. To avoid redundancies in the model specification, one usually assumes that either b_0 or σ_Λ² is known. We assume the latter, and thus the goal of this section is to estimate b and A_1, …, A_d when p, q, and σ_Λ² are given. Since every companion matrix is uniquely determined by its eigenvalues, we define the CARMA parameter vector θ as the collection of the coefficients b_0, …, b_q and the eigenvalues of A_1, …, A_d. Recall that A_i is real by definition and thus its eigenvalues are real or appear in pairs of complex conjugates. In order to estimate θ, we fit the empirical variogram γ*_N(⋅) of the last section to the theoretical variogram γ_θ(⋅) using a WLS approach. In other words, we consider the estimator

θ*_N ∈ argmin_{θ∈Θ} Σ_{j=1}^K w_j [γ*_N(t^{(j)}) − γ_θ(t^{(j)})]²,  (8)

where Θ ⊆ R^{q+1} × C^{dp} is a compact parameter space containing the true parameter vector θ_0, the w_j > 0 are strictly positive weights and t^{(1)}, …, t^{(K)} ∈ R^d are prescribed lags. The article Lahiri, Lee, and Cressie (2002) determines asymptotic properties of least squares estimators for parametric variogram models subject to asymptotic properties of the underlying variogram estimators. We use these results in conjunction with Theorem 1 to show strong consistency and asymptotic normality of θ*_N. In the following, we denote the vector of first-order partial derivatives of (γ_θ(t^{(1)}), …, γ_θ(t^{(K)})) with respect to the ith coordinate of θ, and collect these vectors into the Jacobian matrix of the mapping θ ↦ (γ_θ(t^{(1)}), …, γ_θ(t^{(K)})).
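The WLS criterion in Equation (8) is a standard weighted nonlinear least squares problem. The following sketch (interface ours; the theoretical variogram is passed in as a callable, and a hypothetical exponential toy model stands in for γ_θ) minimizes the criterion with SciPy over a box-constrained parameter space:

```python
import numpy as np
from scipy.optimize import minimize

def wls_fit(emp_vario, lags, model_vario, theta0, weights=None, bounds=None):
    """Weighted least squares fit of a parametric variogram model.

    Minimizes sum_j w_j * (emp_vario[j] - model_vario(theta, lags[j]))**2,
    mirroring the WLS criterion in Equation (8).
    """
    emp_vario = np.asarray(emp_vario, dtype=float)
    w = np.ones_like(emp_vario) if weights is None else np.asarray(weights, dtype=float)

    def objective(theta):
        model = np.array([model_vario(theta, t) for t in lags])
        return float(np.sum(w * (emp_vario - model) ** 2))

    res = minimize(objective, theta0, bounds=bounds, method="L-BFGS-B")
    return res.x

# Hypothetical toy model for illustration: an isotropic exponential variogram
# with sill and rate parameters (not the CARMA variogram of Proposition 1).
def expo_vario(theta, t):
    sill, rate = theta
    return sill * (1.0 - np.exp(-rate * np.linalg.norm(t)))
```

In the CARMA setting, `model_vario` would evaluate the variogram of Proposition 1 at the prescribed lags, and `bounds` would encode the compactness of Θ.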
Theorem 2. Suppose that (Y(t))_{t∈R^d} is a CARMA(p, q) random field with true parameter vector θ_0 such that Assumption 1 holds true, σ_Λ² = 1, Λ has a finite fourth moment and observations of Y are given on the lattice L = {Δ, …, NΔ}^d. Furthermore, assume that
• the true parameter vector θ_0 lies inside a compact parameter space Θ ⊆ R^{q+1} × C^{dp}.
Then, we have both strong consistency and asymptotic normality of θ*_N.

Proof. We only have to check conditions (C.1)–(C.3) in Lahiri et al. (2002), since our assertions follow directly from Theorems 3.1 and 3.2 of this reference. Since γ_θ(t) = 2(κ_θ(0) − κ_θ(t)), it suffices to show that for each t ∈ R^d the autocovariance κ_θ(t) is continuously differentiable with respect to θ in order to check (C.2)(ii). Due to the exponential structure of the integrand, we recognize that κ_θ(t) is in fact infinitely often differentiable with respect to θ, and therefore condition (C.2)(ii) holds true for every t ∈ R^d_+. For each t ∈ R^d an analogous argument applies with a slightly different integrand. Moreover, condition (C.2)(ii) implies both (C.2)(i) and (C.1) in light of the identifiability criterion and the fact that Θ is compact. Finally, condition (C.3) is trivial since W does not depend on θ. ▪

An important task in connection with the previous theorem is to determine a sufficient set of lags t^{(1)}, …, t^{(K)} such that the identifiability condition is satisfied. For CAR(p) random fields it is enough to consider finitely many lags on the principal axes of the Cartesian coordinate system. Before examining this matter, we prepend an auxiliary lemma, which presents a simplified representation of the variogram on the principal axes.
Lemma 1. Suppose that (Y(t))_{t∈R^d} is a CARMA(p, q) random field such that Assumption 1 holds true. Then there exists a set of complex coefficients {d*_i(λ_i) : λ_i is an eigenvalue of A_i, i = 1, …, d} such that the variogram on the principal axes is given by

γ(t e_i) = Σ_{λ_i} d*_i(λ_i) (1 − e^{λ_i |t|}),  t ∈ R,  i = 1, …, d,

where Σ_{λ_i} denotes the sum over the distinct eigenvalues of A_i.
Proof. Proposition 1 implies the assertion for every i = 1, …, d and t ∈ R by setting all coordinates of the lag except the ith equal to zero. ▪

The next example displays more explicit formulas for the d*_i(λ_i) in the case of a CARMA(2, 1) random field.
Example 2. Let (Y(t))_{t∈R²} be a CARMA(2, 1) random field such that Assumption 1 holds true. Then, for every t_1, t_2 ∈ R, explicit formulas for γ(t_1 e_1) and γ(t_2 e_2) can be given in terms of the eigenvalues λ_{11}, λ_{12}, λ_{21}, λ_{22} and the coefficients b_0, b_1.

The next theorem establishes the identifiability of CAR(p) parameters. Note that replacing the vector b by −b would not change the variogram. Hence, we may assume that b_0 is nonnegative.
Proof. Assume without loss of generality that Δ = 1 and σ_Λ² = 1∕2, and set λ_{i0} = 0. The corresponding constant coefficient is twice the variance of Y and therefore nonzero. Introducing the polynomials R_i(z), we obtain a system of equations, where the last equation follows from the definition of R_i(z). Hence, we get the linear systems (12), where the system matrices Ψ_i are square Hankel matrices. We show that all Ψ_i are invertible.
To this end, fix i and suppose that Ψ_i u = 0 for some vector u with associated polynomial P(⋅). This gives a linear system whose system matrix is a regular Vandermonde matrix; since the coefficients d*_i(λ_{ik}) are nonzero, we conclude that P(e^{λ_{ik}}) = 0 for k = 0, …, p. The polynomial P(⋅) has p + 1 different roots and is of degree p. Consequently, it has to be the zero polynomial, which means that u = 0 and Ψ_i is invertible. By solving the linear systems (12), we obtain all r_{il}, which gives the e^{λ_{il}} by determining the roots of R_i(z). Finally, all eigenvalues λ_{il} can be obtained uniquely using the condition on the imaginary part Im(λ), and it is trivial to recover b_0 in light of Equations (6) and (10). ▪

Remark 2.
(1) The set of parameter vectors of CAR random fields which have at least one vanishing coefficient d*_i(λ_i) is a lower-dimensional algebraic variety in the parameter space R × C^{dp}. Thus, the Lebesgue measure of this set is zero and almost all θ ∈ R × C^{dp} satisfy the condition on the coefficients d*_i(λ_i) in Theorem 3. For instance, in the setting of Example 2, d*_2(λ_{21}) = 0 if and only if λ_{11}² + 3λ_{11}λ_{12} + λ_{12}² − λ_{21}² = 0.
(2) The condition −π∕Δ ≤ Im(λ) < π∕Δ is necessary due to the complex periodicity of the exponential function. In time series analysis this problem is associated with the aliasing effect, that is, the emergence of redundancies when sampling the process (cf., e.g., Section 3.4 and Assumption C5 in Schlemm and Stelzer (2012)).
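The recovery step in this proof is a Prony-type procedure: the variogram on an axis is a finite sum of exponentials, so the nodes e^{λ_il} can be computed by solving a Hankel system and rooting the resulting polynomial. The following self-contained sketch (our own illustration of this step, not the paper's exact system (12)) recovers the nodes z_l from samples of f(k) = Σ_l c_l z_l^k:

```python
import numpy as np

def recover_nodes(f, m):
    """Recover z_1, ..., z_m from samples f[k] = sum_l c_l * z_l**k, k = 0..2m-1.

    Solves the Hankel system H p = -f[m:2m] for the lower coefficients of the
    monic polynomial P(z) = z^m + p_{m-1} z^{m-1} + ... + p_0 whose roots are
    the nodes, then extracts the roots.
    """
    f = np.asarray(f, dtype=float)
    # Hankel matrix H[i, j] = f[i + j] encodes the linear recurrence of f.
    H = np.array([[f[i + j] for j in range(m)] for i in range(m)])
    p = np.linalg.solve(H, -f[m:2 * m])
    # np.roots expects coefficients from the highest degree downwards.
    return np.roots(np.concatenate(([1.0], p[::-1])))
```

In the setting of Theorem 3 the recovered nodes would be the quantities e^{λ_il Δ}, and the eigenvalues themselves follow by taking logarithms subject to the aliasing condition −π∕Δ ≤ Im(λ) < π∕Δ from the remark above.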
Having established identifiability for CAR(p) random fields, we now turn to CARMA(p, q) random fields. For classical CARMA processes on R it is well known that one needs to impose at least conditions like b_0 ≥ 0 and invertibility in order to identify the CARMA parameters from the second-order structure (i.e., either autocovariance, spectral density or variogram). For instance, if we consider the spectral density of a CARMA(p, q) process with AR polynomial a(⋅) and MA polynomial b(⋅), then the numerator of f yields the polynomial b(z)b(−z). Therefore, assuming invertibility, that is, the condition that every root of b(⋅) has a negative real part, is necessary to determine the MA polynomial b(⋅) uniquely. However, this reasoning cannot be carried over to the causal CARMA random field since two additional obstacles occur: first, the spectral density f of Proposition 2 is now a multiparameter function and, second, it is in general not separable, that is, it cannot be written as a product of the form f(ω) = f_1(ω_1) ⋯ f_d(ω_d). Therefore, we cannot iterate the previous argument over each dimension. In addition, the roots of the numerator Q(z)Q(−z) are no longer discrete points in C but, more generally, form algebraic varieties in C^d. This makes it harder to formulate a condition similar to invertibility in the multiparameter case. However, as we shall see by the end of this section, an invertibility condition is in fact not necessary. In order to show identifiability of CARMA random fields, we study the algebraic properties of the variogram and start with the following result.
where i = 1, …, d, λ_i is an eigenvalue of A_i and the M(i, λ_i) are matrices that only depend on the (known) eigenvalues of A_1, …, A_d. That is, we are given pd quadratic equations in the q + 1 unknowns b_0, …, b_q. Assumption 1 and Bézout's theorem (see, e.g., Theorem 18.3 in Harris (1992)) conclude the proof. ▪

The previous theorem shows that every fiber of the mapping Θ ∋ θ ↦ S is finite. This property is also called algebraic identifiability (cf. Section 1 in Améndola, Ranestad, and Sturmfels (2018)). To obtain statistical identifiability, we explicitly compute the variogram coefficients d*_i(λ_i) in Equation (13), which yields a polynomial system in terms of the CARMA parameters. One then has to show that this system has a unique solution. We demonstrate our method for the CARMA(2, 1) case and show how it can be applied to higher orders (p, q).
Proof. First of all, the eigenvalues λ_{11}, λ_{12}, λ_{21}, λ_{22} and the coefficients d*_1(λ_{11}), d*_1(λ_{12}), d*_2(λ_{21}), and d*_2(λ_{22}) can be recovered exactly as in Theorem 3. Hence, we only have to determine the parameters b_0 and b_1. Using the formulas of Example 2, it is an easy task to verify the corresponding equations. We have to show that b_0 and b_1 are identifiable from this system, where λ_{11}, λ_{12}, λ_{21}, λ_{22} and d*_1(λ_{11}), d*_1(λ_{12}), d*_2(λ_{21}), d*_2(λ_{22}) are known. It therefore suffices to consider all four numerators and show that the resulting system determines b_0 and b_1 uniquely, where we assume that b_0 is nonnegative and b_1 ≠ 0. Defining the variables x_1, x_2, x_3 as in Equation (15), we obtain a linear system and conclude. ▪

Remark 3.
(1) Note that in Proposition 3 we have not used the full variogram, but only values on the principal axes. Working with the full variogram, we would be able to dispose of the condition λ_{11}λ_{12} ≠ λ_{21}λ_{22}. However, imposing this weak condition has the advantage that we do not have to estimate the full variogram, and the set of parameters satisfying λ_{11}λ_{12} = λ_{21}λ_{22} is a Lebesgue null set in C⁴.
In a similar fashion, we can show the following result, since all factors in Equation (17) are nonzero.

Proof. The assertion can be proven analogously to the proof of Proposition 3. We therefore only highlight the differences. Instead of four, we have six different d*_i(λ_i) in this case. Defining x_1, x_2, x_3 as before, we obtain a linear system of size 6 × 3 similar to Equation (16). The system matrix has (6 choose 3) = 20 different 3 × 3 minors, one of which is −6 λ_{11}² λ_{12}² λ_{13}² (λ_{11} + λ_{12})² (λ_{11} + λ_{13})² (λ_{12} + λ_{13})² (λ_{21} − λ_{22}) ⋯. This minor is always nonzero under our assumptions. Thus, we conclude that b_0 and b_1 are uniquely determined. ▪

The method used to show identifiability for CARMA(2, 1) and CARMA(3, 1) random fields on R² relied on the definition of appropriate variables x_1, x_2, x_3 in Equation (15) and a system of pd = 4 equations in the first case and pd = 6 equations in the second case. Both systems have a unique solution provided that at least one of the minors of the coefficient matrix is nonzero. In the first case, four minors of the 4 × 3 coefficient matrix had to be considered, and in the second case 20 minors of the 6 × 3 coefficient matrix. The complexity of this method becomes too high for higher-order models. Moreover, we have observed that for the CARMA(3, 2) model on R² the method fails, since the determinant of the corresponding 6 × 6 coefficient matrix is always zero. However, this does not prevent parameter identifiability since, as we note from Equation (15), the components of the vector x display algebraic dependencies, that is, the variables of the corresponding linear systems are not independent.
As an alternative to the substitution (15), we can solve the original system of pd quadratic equations (13) for the q + 1 variables b_0, …, b_q directly by resorting to representations via Gröbner bases (see, e.g., Chapter 2 of Cox, Little, and O'Shea (2015)). As a test case, we have replicated Proposition 3 using the software Mathematica, where the pd = 4 quadratic equations in (14) were transformed into an equivalent system of 48 polynomial equations. From these, we could read off b_0 and b_1 immediately and again obtain identifiability.
Note that in Propositions 3 and 4, we have not assumed any extra conditions on b except for b 0 ≥ 0. In particular, it is not necessary to impose an analogous condition to invertibility in order to achieve identifiability. This illustrates a fundamental difference between CARMA processes and CARMA random fields with d ≥ 2.

SIMULATION OF CARMA RANDOM FIELDS ON A LATTICE
In this section, we develop two numerical simulation schemes for the causal CARMA random field. One is designed for compound Poisson noise and the other one for general Lévy noise. In both cases, we simulate on a lattice L = {Δ, … , NΔ} d with fixed Δ > 0 and N ∈ N. Techniques for simulating on more general lattices are discussed as well.

Compound Poisson noise
The homogeneous Lévy basis Λ is assumed to be compound Poisson in this subsection. That is, the characteristic triplet of Λ satisfies β = c∫_{(−1,1)} x F(dx), σ² = 0 and ν = cF, where c > 0 is the intensity parameter and F is a probability measure on R (cf. Section 1.2.4 in Applebaum (2009)). As a consequence, the resulting CARMA random field (Y(t))_{t∈R^d} in Equation (3) can be represented as a sum over the atoms of Λ. The random field Y_{S1} has the alternative representation (18) with Λ_{S1}(ds) := 1_D(s)Λ(ds); hence it arises by truncating the Lévy basis Λ to the set D. The advantage of Y_{S1} is that we can simulate it exactly.
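Since a compound Poisson basis has almost surely finitely many atoms in a bounded region, the truncated field can be generated exactly: draw a Poisson number of atoms in the truncation window, place them uniformly, draw jump sizes from F, and sum the kernel over the atoms. A minimal sketch under these assumptions (names and interface ours; the kernel g and the jump distribution are passed in):

```python
import numpy as np

def simulate_cp_field(points, kernel, c, draw_jumps, window, rng=None):
    """Truncation-based simulation of a compound-Poisson-driven moving average field.

    points     : array of shape (n, d) with evaluation sites (e.g. the lattice L).
    kernel     : callable g(s) for s in R^d, vanishing unless s >= 0 componentwise.
    c          : intensity of the compound Poisson Levy basis.
    draw_jumps : callable n -> n jump sizes distributed according to F.
    window     : pair (lower, upper) of corners of the truncation box D.
    """
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = (np.asarray(w, dtype=float) for w in window)
    volume = float(np.prod(upper - lower))
    n_atoms = rng.poisson(c * volume)                  # number of atoms in D
    locations = rng.uniform(lower, upper, size=(n_atoms, lower.size))
    jumps = draw_jumps(n_atoms)
    points = np.asarray(points, dtype=float)
    # Y(t) = sum over atoms (s, x) of x * g(t - s)
    return np.array([
        sum(x * kernel(t - s) for s, x in zip(locations, jumps))
        for t in points
    ])
```

The simulation is exact up to the choice of the truncation window D; enlarging D reduces the truncation error analyzed in the following theorem.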
In order to assess the accuracy of this approximation algorithm, we determine its mean squared error. Note that, as the simulation of Y_{S1} is exact, we only have to consider the approximation error between Y and Y_{S1}. Moreover, we show that the simulated random field Y_{S1}(t) converges for fixed t ∈ L both in L² and almost surely to the underlying true random field Y(t) as the truncation parameter M tends to infinity.
Theorem 5. Suppose that (Y(t))_{t∈R^d} is a CARMA(p, q) random field such that Assumption 1 holds true and Λ is compound Poisson with characteristic triplet (c∫_{(−1,1)} x F(dx), 0, cF), where c > 0 and F is a probability distribution. Then the mean squared error of Algorithm 1 satisfies Equation (19), where the coefficients d(⋅) are the same as in Equation (5). Furthermore, Y_{S1}(t) converges to Y(t) in L² and almost surely as M → ∞ for every t ∈ L = {Δ, …, NΔ}^d.

Proof. By the properties of Lévy bases and Equations (3) and (18), we observe that the mean squared error admits an integral representation, where in the first equation we have taken into account that the kernel g contains the indicator function 1_{s≥0}. In addition, Equation (5) yields a bound on the remaining integral. Plugging this into Equation (20), we arrive at Equation (19), which in turn shows that Y_{S1}(t) converges to Y(t) in L² for every t ∈ L = {Δ, …, NΔ}^d. It remains to show that the convergence also holds almost surely. Due to Chebyshev's inequality, we have for each t ∈ L = {Δ, …, NΔ}^d that the deviation probabilities are summable over the truncation parameter M, where we explicitly include M in the subscript of Y_{S1,M}(t). The right-hand side of the latter inequality is finite due to Equation (19). Finally, the assertion follows from the Borel–Cantelli lemma. ▪

Remark 4.
(1) Algorithm 1 can also be applied to pure-jump Lévy bases if small jumps are truncated. This technique has been analyzed in detail in Section 3 of Chen et al. (2016) for the simulation of stochastic Volterra equations in space-time. Furthermore, Section 4 of Chen et al. (2016) considers a simulation technique based on series representations for Lévy bases (see also Rosiński (2001)). However, we do not pursue this direction. Instead, in the next subsection, we consider a method which is not restricted to pure-jump Lévy bases, easy to implement, and sufficient for our simulation study in Section 6.
(2) One can readily replace L = {Δ, …, NΔ}^d in step (5) of Algorithm 1 with any finite subset of points in R^d. Algorithm 1 is not restricted to simulation on lattices.

General Lévy noise
Algorithm 1 is not suitable for CARMA random fields driven by general Lévy bases since a drift or a Gaussian part may be present in the noise. A different way to approximate a CARMA random field (Y(t))_{t∈R^d} is to discretize and truncate the stochastic integral in Equation (3). Introducing a truncation parameter M ∈ N, we first replace the integral in Equation (3) by an integral over a truncated domain. Discretizing this integral, we obtain the sum

Y_{S2}(t) := Σ_{s∈{0,Δ,…,MΔ}^d} g(s) Z(t − s).

Herein, the random field Z represents the spatial increments of Λ, or more precisely

Z(s) := Λ((s − Δe, s]),  s ∈ ΔZ^d.  (21)

This approach has also been applied in Nguyen and Veraart (2017) to simulate the so-called OU∧ process. Since we evaluate Y_{S2} only on the lattice L = {Δ, …, NΔ}^d, we actually simulate a discrete-parameter moving average random field of finite order driven by i.i.d. spatial noise as given in Equation (21). The set {g(s) : s ∈ {0, Δ, …, MΔ}^d} plays the role of the moving average coefficients, and Y_{S2} can be simulated exactly. Furthermore, it is easy to check that the random field Y_{S2} also has the representation

Y_{S2}(t) = ∫_{R^d} g_{S2}(t − s) Λ(ds),

where g_{S2} is the step function obtained from g by truncation and discretization. This allows us to observe that truncation and discretization of the stochastic integral in Equation (3) is in fact equivalent to truncation and discretization of the kernel g, which will be useful for establishing error bounds. We sum up the simulation scheme in the following algorithm, where (β, σ², ν) denotes the characteristic triplet of Λ.

Draw Z(s), s
If we collect the g(s) values from the second step of Algorithm 2 in an array A_g, and the Z(s) values from the third step in an array A_Z, then the Y_{S2}(s) values from the fourth step can be computed as the discrete convolution of the two arrays A_g and A_Z. This can be carried out efficiently using the fast Fourier transform (FFT). Built-in FFT-based convolution commands exist in computer software such as R or Matlab.
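In Python, the same convolution step can be sketched with SciPy's `fftconvolve`. In the sketch below (names ours, d = 2 for concreteness), Gaussian increments stand in for the noise Z purely for illustration; any i.i.d. noise with variance σ_Λ²Δ^d fits the scheme:

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_carma_grid(kernel_fn, delta, M, N, noise_sd, rng=None):
    """Approximate a random field on the grid {delta, ..., N*delta}^2 via FFT.

    The kernel is truncated/discretized on {0, delta, ..., M*delta}^2 (array A_g)
    and convolved with i.i.d. spatial noise (array A_Z), as in Algorithm 2.
    Gaussian increments with standard deviation noise_sd * delta (the d = 2
    cell volume scaling) stand in for Z.
    """
    rng = np.random.default_rng() if rng is None else rng
    ax = np.arange(M + 1) * delta
    s1, s2 = np.meshgrid(ax, ax, indexing="ij")
    A_g = kernel_fn(s1, s2)                     # moving average coefficients g(s)
    A_Z = rng.normal(0.0, noise_sd * delta, size=(N + M, N + M))
    # mode="valid" keeps exactly the N x N sites with full kernel overlap.
    return fftconvolve(A_Z, A_g, mode="valid")
```

The `"valid"` mode returns an N × N array, corresponding to the lattice L, since the noise array extends the grid by M points per axis to cover the kernel support.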
By approximating the CARMA random field Y by Y_S2, we introduce two sources of error: one originates from the kernel truncation, the other from the kernel discretization. A more detailed analysis yields the following result.
Proof. For notational convenience, we assume that all eigenvalues of A_1, …, A_d are real; the complex case can be shown analogously by similar arguments taking care of the imaginary parts. By stationarity, the mean squared error E[(Y(t) − Y_S2(t))²] is the same for each t ∈ L = {Δ, …, NΔ}^d, namely, up to a constant depending on Λ, the integral ∫_{[0,∞)^d} (g(s) − g_S2(s))² ds.
Herein, we have used the inequality (∑_{j=1}^{K} a_j)² ≤ K ∑_{j=1}^{K} a_j² for a_1, …, a_K ∈ R. In order to evaluate the latter integral, we consider, for fixed λ_1, …, λ_d, a collection of elementary identities. Summing these up, we obtain an expression involving a function f_1.
In addition, we compute the corresponding limits as Δ → 0 and ΔM → ∞. Moreover, we observe that for every ε > 0, the function f_3(v) is bounded and continuous on (0, ε). We therefore arrive at the asserted convergence of the mean squared error to zero. Hence, for every t ∈ L = {Δ, …, NΔ}^d, Y_S2(t) converges to Y(t) in L² as simultaneously Δ → 0 and ΔM → ∞.
As for the second part of our assertion, we note that if Δ_k = O(k^{−1−ε}), then every eigenvalue λ < 0 satisfies an exponential inequality, which can be shown with the Taylor expansion of the exponential function. Finally, the almost sure convergence follows, similarly to the proof of Theorem 5, from Chebyshev's inequality and the Borel–Cantelli lemma. ▪ Remark 5.
(1) Instead of simulating on the regular lattice L = {Δ, …, NΔ}^d, one can easily adjust Algorithm 2 for simulating on the more general lattice {Δ_1, …, N_1Δ_1} × ⋯ × {Δ_d, …, N_dΔ_d}. (2) In Section 4 of Pham (2018) it was shown that, under mild conditions, every CARMA random field has a version that is càdlàg with respect to the partial order ≤. By inspection of Algorithms 1 and 2, it is easy to see that both Y_S1 and Y_S2 are càdlàg as well.
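The equivalence between discretizing the stochastic integral and discretizing the kernel suggests a simple numerical check: for a one-dimensional exponential stand-in kernel (an illustrative choice, not the CARMA kernel), the L² distance between g and its truncated step version g_S2 shrinks as Δ decreases while the truncation horizon stays fixed.

```python
import numpy as np

def kernel_l2_error(delta, T=6.0, lam=1.0):
    """L2 distance between g(s) = exp(-lam * s) on [0, inf) and its truncated,
    discretized step version g_S2(s) = g(delta * floor(s / delta)) * 1_{[0, T)}(s)."""
    s = np.linspace(0.0, T + 5.0, 200001)   # fine quadrature grid
    ds = s[1] - s[0]
    g = np.exp(-lam * s)
    g_step = np.where(s < T, np.exp(-lam * delta * np.floor(s / delta)), 0.0)
    return float(np.sqrt(np.sum((g - g_step) ** 2) * ds))

# halving delta (with T = M * delta fixed) shrinks the approximation error
errors = [kernel_l2_error(d) for d in (0.4, 0.2, 0.1, 0.05)]
```

The residual error for very small Δ is then dominated by the truncation tail, which decays exponentially in T.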

SIMULATION STUDY
We conduct a simulation study in order to assess the empirical quality of the WLS estimator of the previous section for finite samples. We use Algorithm 2 to simulate 500 paths of a CARMA(2, 1) random field on a two-dimensional grid. As CARMA parameters, we take the estimates from Section 7, which are b_0 = 4.8940, b_1 = −1.1432, λ_11 = −1.7776, λ_12 = −2.0948, λ_21 = −1.3057, λ_22 = −2.5142.
We take a Gaussian Lévy basis Λ with mean zero and variance one. In accordance with the parameter estimation in Section 7, we first choose Δ = 0.04 for the grid size of Algorithm 2, M = 400 for the truncation parameter, and N² = 1,000² for the number of points of each path. However, this choice results in relatively high approximation errors, yielding only poor parameter estimates. By choosing a higher truncation parameter M and a smaller grid size Δ, the step function g_S2 in Equation (22) approximates the CARMA kernel g in Equation (4) better, which by the error analysis of the previous section also reduces the approximation error of the CARMA random field. We therefore simulate on a finer grid with Δ = 0.01, M = 600, and N² = 4,000². After simulation, we keep only every fourth point in each of the two axis directions of R² in order to return to the setting of Section 7 with Δ = 0.04 and N² = 1,000² points per path.
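The thinning step can be done by array slicing; in this sketch, Y_fine is a small placeholder array rather than an actual simulated field.

```python
import numpy as np

delta_fine, thin = 0.01, 4
Y_fine = np.arange(16 * 16, dtype=float).reshape(16, 16)  # placeholder for the simulated field

# keep every fourth point per axis, starting at the fourth one, so that the
# retained lattice points are the multiples of 4 * delta_fine = 0.04
Y_coarse = Y_fine[thin - 1::thin, thin - 1::thin]
```

With a 4,000 × 4,000 input, the same slice returns the 1,000 × 1,000 grid used for estimation.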
Having simulated the CARMA random fields on a grid, we proceed by estimating the variogram using the variogram estimator of Section 3. We calculate the empirical variogram at K = 100 different lags, namely,

{γ*_N(jΔe_i) : i = 1, 2; j = 1, …, 50} with Δ = 0.04. (27)

These lags lie on the principal axes of R² and are by Proposition 3 sufficient to identify the CARMA(2, 1) parameters. In the final step, we estimate the CARMA parameter vector with the WLS estimator θ*_N given in Equation (9). We consider the following choices for the weights and the number of lags used.
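An empirical variogram at lags along a principal axis can be sketched as follows; this uses a standard Matheron-type estimator as a stand-in for the estimator of Section 3, and a white-noise field for which the variogram is known.

```python
import numpy as np

def empirical_variogram_axis(Y, axis, max_lag):
    """Matheron-type estimator along one principal axis of a gridded field:
    gamma*(j) = (1 / (2 * #pairs)) * sum of squared increments at lag j grid steps."""
    out = np.empty(max_lag)
    for j in range(1, max_lag + 1):
        diff = Y[j:, :] - Y[:-j, :] if axis == 0 else Y[:, j:] - Y[:, :-j]
        out[j - 1] = 0.5 * np.mean(diff ** 2)
    return out

rng = np.random.default_rng(2)
Y = rng.normal(size=(200, 200))        # i.i.d. N(0, 1) field: variogram is 1 at all lags
gam = empirical_variogram_axis(Y, axis=0, max_lag=10)
```

For a CARMA field one would instead compare gam against the model variogram along e_1 and e_2.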
Cases 1 and 2 apply quadratically decreasing weights (Equations (28) and (29)), while Cases 3 and 4 apply exponentially decreasing weights (Equations (30) and (31)). The compact parameter space Θ is chosen to be Θ = [0, 10] × [−10, 10] × [−10, 0]^4, which contains the true parameter vector θ_0 = (b_0, b_1, λ_11, λ_12, λ_21, λ_22). For minimization of the objective function, we use the command DEoptim of the R package DEoptim, which implements the differential evolution algorithm (for more details see Mullen, Ardia, Gil, Windover, and Cline (2011)). This algorithm has the advantage that we do not need an initial value for the optimization procedure; instead, one can directly hand over the parameter space Θ as an input. The output of DEoptim itself is then used as an initial value for the standard R command optim.

TABLE 1 Parameter estimation results for CARMA(2, 1) on R² with K = 100 lags, quadratically decreasing weights as in Equation (28)

TABLE 2 Parameter estimation results for CARMA(2, 1) on R² with K = 50 lags, quadratically decreasing weights as in Equation (29)

The estimation results are summarized in Tables 1-4 below. Figure 1 shows the estimated variogram of a simulated path along with fitted variogram curves. Recall that in our parametrization b_0 actually plays the role of the white noise standard deviation (SD). Comparing Table 1 with Table 2 and Table 3 with Table 4, we observe that using K = 50 instead of K = 100 lags generally reduces the SD but increases the bias for most of the estimators. This indicates a typical variance-bias trade-off subject to the number of lags used. Moreover, we find that using exponential weights as in Equations (30) and (31) increases the SD and the root mean squared error for all components of θ*_N.

According to Theorem 2, the asymptotic properties of the WLS estimator θ*_N do not depend on the distribution of the Lévy basis Λ. To examine this statement for finite samples, we repeat the procedure above with variance gamma noise. More precisely, we simulate 500 independent CARMA(2, 1) paths driven by a variance gamma basis Λ with mean zero and variance one, compute the empirical variogram as in Equation (27), and estimate the CARMA parameters as in Cases 1-4. The results are summarized in Tables 5-8.

TABLE 5 Parameter estimation results for CARMA(2, 1) on R² with K = 100 lags, quadratically decreasing weights as in Equation (28)

TABLE 6 Parameter estimation results for CARMA(2, 1) on R² with K = 50 lags, quadratically decreasing weights as in Equation (29)

TABLE 7 Parameter estimation results for CARMA(2, 1) on R² with K = 100 lags, exponentially decreasing weights as in Equation (30)

Comparing the RMSEs in Tables 1-8, we observe that the WLS estimation is slightly, but not significantly, better for the variance gamma case than for the Gaussian case.
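The optimization strategy (global differential evolution over a compact Θ, followed by local refinement) can be sketched with scipy.optimize.differential_evolution standing in for R's DEoptim. The two-parameter exponential variogram model and the weight form w_j = j⁻² below are illustrative assumptions, not the paper's CARMA variogram or weight equations.

```python
import numpy as np
from scipy.optimize import differential_evolution

lags = 0.04 * np.arange(1, 51)                 # lags along one principal axis

def vario_model(h, theta):
    sigma2, lam = theta
    return sigma2 * (1.0 - np.exp(-lam * h))   # toy variogram, not the CARMA one

theta_true = (1.0, 1.5)
gamma_emp = vario_model(lags, theta_true)      # stands in for the empirical variogram

weights = 1.0 / np.arange(1, 51) ** 2          # quadratically decreasing weights (assumed form)

def wss(theta):
    # weighted least squares objective: fit model variogram to empirical variogram
    return float(np.sum(weights * (gamma_emp - vario_model(lags, theta)) ** 2))

# global search over the compact parameter space; no initial value is needed,
# and the default polishing step plays the role of the subsequent local optimizer
res = differential_evolution(wss, bounds=[(0.01, 10.0), (0.01, 10.0)], seed=0, tol=1e-12)
```

On this noiseless toy problem, res.x recovers theta_true; with real empirical variograms the fit is only approximate.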

APPLICATION TO CMB DATA
We apply our theory to CMB data from the Planck mission of the European Space Agency. The 2018 data release can be downloaded publicly from the Planck Legacy Archive https://pla.esac.esa.int. The CMB maps on this website cover the full sky and have been produced using four different methods. We choose the dataset created by the SMICA method and refer to Planck Collaboration (2018) for more information. We take data points between 50° and 70° longitude and 10° and 30° latitude; the unit is Kelvin. We save the data, which have mean −8.7316 × 10⁻⁶ and SD 9.6049 × 10⁻⁵, into an N × N matrix with N = 1,000, and plot the column-wise and row-wise means. Since we do not find any deterministic trend or seasonal component in Figure 2, we may assume that the data are stationary. This is in line with standard assumptions for the CMB (see, e.g., Section 2.1.1 of Giovannini (2008)). We normalize the data to have mean zero and variance one, and plot the empirical density against the standard normal density.
An inspection of Figure 3 reveals that the marginal distribution of the CMB data is Gaussian. Hence, we may also assume that the Lévy basis Λ is Gaussian. We proceed as in the previous section and compute the empirical variogram at 100 different lags on the principal axes, namely, {γ*_N(jΔe_i) : i = 1, 2; j = 1, …, 50} with Δ = 0.04. Assuming that the Lévy basis Λ has variance one, we estimate the parameters of CAR(1), CAR(2), and CARMA(2, 1) random fields with the WLS estimator and quadratically decreasing weights as in Equation (28). The resulting weighted sums of squares (WSS) are 7.6132 × 10⁻² for CAR(1), 2.5769 × 10⁻² for CAR(2), and 2.0113 × 10⁻² for CARMA(2, 1). For model selection, we compute the Akaike information criterion (AIC)

AIC = K log(WSS/K) + 2P,

where P is the number of model parameters and K is the number of lags used to calculate the WSS. The AIC values are −712.0453 for CAR(1), −816.3761 for CAR(2), and −839.1583 for CARMA(2, 1). These numbers are summarized in Table 9 and suggest that the CARMA(2, 1) model is optimal compared with the CAR(1) and CAR(2) models. For a visual comparison, we plot the heat map of the original CMB data together with heat maps of simulated fields in Figures 5-8. Although we cannot draw any conclusions from a single sample path, it is possible to observe some features of the fitted models. All three models exhibit clusters of high and low values similar to those in the original data. However, the cluster sizes of the CAR(1) random field are larger than those of the CMB data, whereas the CAR(2) and CARMA(2, 1) models display a better visual fit. Another common feature is the appearance of horizontal and vertical lines, which is most visible in Figure 6. These lines are a consequence of the nonsmoothness of the kernel function in Equation (4). One can therefore argue that the fitted CARMA random fields represent linear approximations to the spatial dependence structures of the CMB.
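Using AIC = K log(WSS/K) + 2P with K = 100 lags, the reported AIC values can be reproduced from the reported WSS values; the parameter counts P = 3, 5, 6 for CAR(1), CAR(2), and CARMA(2, 1) are inferred from the model orders, not stated explicitly above.

```python
import math

def aic(wss, K, P):
    # AIC = K * log(WSS / K) + 2 * P  (K lags, P model parameters)
    return K * math.log(wss / K) + 2 * P

K = 100
# reported WSS values and inferred parameter counts P
models = {
    "CAR(1)": (7.6132e-2, 3),
    "CAR(2)": (2.5769e-2, 5),
    "CARMA(2,1)": (2.0113e-2, 6),
}
aics = {name: aic(wss, K, P) for name, (wss, P) in models.items()}
# agrees with the reported -712.0453, -816.3761, -839.1583 up to rounding of the WSS
```

The CARMA(2, 1) model attains the smallest AIC, consistent with the model selection above.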

ACKNOWLEDGEMENT
We cordially thank Thiago do Rego Sousa for helpful discussions. The second author acknowledges support from the graduate program TopMath at the Technical University of Munich and the Studienstiftung des deutschen Volkes.