FRK: An R Package for Spatial and Spatio-Temporal Prediction with Large Datasets

FRK is an R software package for spatial/spatio-temporal modelling and prediction with large datasets. It facilitates optimal spatial prediction (kriging) on the most commonly used manifolds (in Euclidean space and on the surface of the sphere), for both spatial and spatio-temporal fields. It differs from many of the packages for spatial modelling and prediction by avoiding stationary and isotropic covariance and variogram models, instead constructing a spatial random effects (SRE) model on a fine-resolution discretised spatial domain. The discrete element is known as a basic areal unit (BAU), whose introduction in the software leads to several practical advantages. The software can be used to (i) integrate multiple observations with different supports with relative ease; (ii) obtain exact predictions at millions of prediction locations (without conditional simulation); and (iii) distinguish between measurement error and fine-scale variation at the resolution of the BAU, thereby allowing for reliable uncertainty quantification. The temporal component is included by adding another dimension. A key component of the SRE model is the specification of spatial or spatio-temporal basis functions; in the package, they can be generated automatically or by the user. The package also offers automatic BAU construction, an expectation-maximisation (EM) algorithm for parameter estimation, and functionality for prediction over any user-specified polygons or BAUs. Use of the package is illustrated on several spatial and spatio-temporal datasets, and its predictions and the model it implements are extensively compared to others commonly used for spatial prediction and modelling.


Introduction
Fixed rank kriging (FRK) is a spatial/spatio-temporal modeling and prediction framework that is scaleable, works well with large datasets, and can deal easily with data that have different spatial supports. FRK hinges on the use of a spatial random effects (SRE) model, in which a spatially correlated mean-zero random process is decomposed using a linear combination of spatial basis functions with random coefficients plus a term that captures the random process' fine-scale variation. Dimensionality reduction through a relatively small number of basis functions ensures computationally efficient prediction, while the reconstructed spatial process is, in general, non-stationary. The SRE model has a spatial covariance function that is always nonnegative-definite and, because any (possibly non-orthogonal) basis functions can be used, it can be constructed so as to approximate standard families of covariance functions (Kang and Cressie 2011). For a detailed treatment of FRK, see Johannesson (2006, 2008), Shi and Cressie (2007), and Nguyen, Cressie, and Braverman (2012).
There are numerous R (R Core Team 2021) packages available for modeling and prediction with spatial or spatio-temporal data (see Bivand 2021), although relatively few of these make use of a model with spatial basis functions. A few variants of FRK have been developed to date, and the one that comes closest to the present software is LatticeKrig (Nychka, Bandyopadhyay, Hammerling, Lindgren, and Sain 2015;Nychka, Hammerling, Sain, Lenssen, Smirniotis, and Iverson 2019). LatticeKrig implements what we call a LatticeKrig (LTK) model, which is made up of Wendland basis functions (that have compact support) decomposing a spatially correlated process. LatticeKrig models use a Markov assumption to construct a precision matrix (the matrix K −1 in Section 2.1) to describe the dependence between the coefficients of these basis functions. This, in turn, results in efficient computations and the potential use of a large number (> 10, 000) of basis functions. LatticeKrig models do not cater for what we term fine-scale-process variation and, instead, the finest scale of the process is limited to the finest resolution of the basis functions used.
The package INLA (Lindgren and Rue 2015) is a general-purpose package for model fitting and prediction. One advantage of INLA is that it contains functionality for fitting Gaussian processes that have covariance functions from the Matérn class (see Lindgren and Rue 2015, for details on the software interface) by approximating a stochastic partial differential equation (SPDE) using a Gaussian Markov random field (GMRF). Specifically, the process is decomposed using basis functions that are triangular 'tent' functions, and the coefficients of these basis functions are normally distributed with a sparse precision matrix. Thus, these models, which we term SPDE-GMRF models, share many of the features of LatticeKrig models. A key advantage of INLA over LatticeKrig is that once the spatial or spatio-temporal model is constructed, one has access to all the approximate-inference machinery and likelihood models available within the package. Kang and Cressie (2011) develop Bayesian FRK; they keep the spatial basis functions fixed and put a prior distribution on K. The predictive-process approach of Banerjee, Gelfand, Finley, and Sang (2008) can also be seen as a type of Bayesian FRK, where the basis functions are constructed from the postulated covariance function of the spatial random effects and hence depend on parameters (see Katzfuss and Hammerling 2017, for an equivalence argument). An R package that implements predictive processes is spBayes (Finley, Banerjee, and Carlin 2007;Finley, Banerjee, and Gelfand 2015). It allows for multivariate spatial or spatio-temporal processes, and Bayesian inference is carried out using Markov chain Monte Carlo (MCMC), thus allowing for a variety of likelihood models. Because the implied basis functions are constructed based on a parametric covariance model, a prior distribution on parameters results in new basis functions generated at each MCMC iteration. Since this can slow down the computation, the number of knots used in predictive processes is usually chosen to be small, which has the effect of limiting their ability to model finer scales.
Our software package FRK differs from spatial prediction packages currently available by constructing an SRE model on a discretized domain, where the discrete element is known as a basic areal unit (BAU; see, e.g., Nguyen et al. 2012). The BAU can be viewed as the smallest spatial area or spatio-temporal volume that can be resolved by the process and, to reflect this, the process itself is assumed to be piecewise constant over the set of BAUs. The BAUs serve many purposes in FRK: They define a fine grid over which to do numerical integrations for change-of-support problems; a fine lattice of discrete points over which to predict (although FRK implements functions to predict over any arbitrary user-defined polygons); and a set of bins within which to average large spatio-temporal datasets, if so desired, for computational efficiency. BAUs do not need to be square or all equal in size, but they do need to be 'small, ' in the sense that they should be able to reconstruct the (undiscretized) process with minimal error.
In the standard 'flavor' of FRK (Cressie and Johannesson 2008), which we term vanilla FRK (FRK-V), there is an explicit reliance on multi-resolution basis functions to give complex nonstationary spatial patterns at the cost of not imposing any structure on K, the covariance matrix of the basis function weights. This can result in identifiability issues and hence in over-fitting the data when K is estimated using standard likelihood methods (e.g., Nguyen, Katzfuss, Cressie, and Braverman 2014), especially in regions of data paucity. Therefore, in FRK we also implement a model (FRK-M) where a parametric structure is imposed on K (e.g., Stein 2008;Nychka et al. 2015). The main aim of the package FRK is to facilitate spatial and spatio-temporal analysis and prediction for large datasets, where multiple observations come with different spatial supports. We see that in 'big data' scenarios, lack of consideration of fine-scale variation may lead to over-confident predictions, irrespective of the number of basis functions adopted.
In Section 2, we describe the modeling, estimation, and prediction approach we adopt in FRK. In Section 3, we discuss further details of the package and provide a simple example on the classic meuse dataset. In Section 4, we evaluate the SRE model implemented in FRK in controlled cases, against LatticeKrig models and SPDE-GMRF models through use of the packages LatticeKrig and INLA. In Section 5, we show its capability to deal with change-of-support issues and anisotropic processes. In Section 6, we show how to use FRK with spatio-temporal data and illustrate its use on the modeling and prediction of columnaveraged carbon dioxide on the globe from remote sensing data produced by NASA's OCO-2 mission. The spatio-temporal dataset contains millions of observations. Finally, Section 7 discusses future work.

Outline of FRK: Modeling, estimation and prediction
In this section we present the theory behind the operations implemented in FRK. In Section 2.1 we introduce the SRE model, in Section 2.2 we discuss the EM algorithm for parameter estimation, and in Section 2.3 we present the spatial prediction equations.

The SRE model
Denote the spatial process of interest as {Y (s) : s ∈ D}, where s indexes the location of Y (s) in our domain of interest D. In what follows, we assume that D is a spatial domain but extensions to spatio-temporal domains are natural within the framework (Section 6).
Consider the classical spatial statistical model,
In order to cater for different observation supports {B j } (defined below), it is convenient to assume a discretized domain of interest D G ≡ {A i ⊂ D : i = 1, . . . , N } that is made up of N small, non-overlapping basic areal units or BAUs (Nguyen et al. 2012) and N is the number of BAUs. At this BAU level, where for i = 1, . . . , N, and ξ i is specified below. The SRE model specifies that the small-scale random variation is υ(·) = φ(·) η, and hence in terms of the discretization onto D G , so that υ = Sη, where S is the N × r matrix defined as follows: In FRK, we assume that η is an r-dimensional Gaussian vector with mean zero and r × r covariance matrix K, and estimation of K is based on likelihood methods; we denote this variant of FRK as FRK-V (where recall that 'V' stands for 'vanilla'). If some structure is imposed on VAR(η) in terms of parameters ϑ, then K = K • (ϑ) and ϑ needs to be estimated; we denote this variant as FRK-M (where recall that 'M' stands for 'model'). Frequently, the resolution of the BAUs is sufficiently fine, and the basis functions are sufficiently smooth, so that S can be approximated: where {s i : i = 1, . . . , N } are the centroids of the BAUs. Since small BAUs are always assumed, this approximation is used throughout FRK.
In FRK, we do not directly model ξ(s), since we are only interested in its discretized version. Rather, we assume that ξ i ≡ 1 |A i | A i ξ(s)ds has a Gaussian distribution with mean zero and variance ξ is a parameter to be estimated, and the weights {v ξ,1 , . . . , v ξ,N } are known and represent heteroscedasticity. These weights are typically generated from domain knowledge; they may, for example, correspond to topographical features such as terrain roughness (Zammit-Mangion, Rougier, Schön, Lindgren, and Bamber 2015). Since we specified ξ(·) to be 'almost' spatially uncorrelated, it is reasonable to assume that the variables representing the discretized fine-scale variation, {ξ i : i = 1, . . . , N }, are uncorrelated. From (2), we can write We now assume that the hidden (or latent) process, Y (·), is observed with m footprints (possibly overlapping) spanning one or more BAUs, where typically m r (note that both m > N and N ≥ m are possible). We thus define the observation domain as D O ≡ {∪ i∈c j A i : j = 1, . . . , m}, where c j is a non-empty set in 2 {1,...,N } , the power set of {1, . . . , N }, and m = |D O |. For illustration, consider the simple case of the discretized domain being made up of three BAUs. Then D G = {A 1 , A 2 , A 3 } and, for example, D O = {B 1 , B 2 }, where B 1 = A 1 ∪A 2 (i.e., c 1 = {1, 2}) and B 2 = A 3 (i.e., c 2 = {3}). Catering for different footprints is important for remote sensing applications in which satellite-instrument footprints can widely differ (e.g., Zammit-Mangion et al. 2015).
Each B j ∈ D O is either a BAU or a union of BAUs. Measurement of Y is imperfect: We define the measurement process as noisy measurements of the process averaged over the footprints where the weights, depend on the areas of the BAUs, and I(·) is the indicator function. Currently, in FRK, BAUs of equal area are assumed, but we give (6) in its most general form. The random quantities {δ i } and { i } capture the imperfections of the measurement. Better known is the measurement-error component i , which is assumed to be mean-zero Gaussian distributed. The component δ i captures any bias in the measurement at the BAU level, which has the interpretation of an intra-BAU systematic error. These systematic errors are BAU-specific, that is, the {δ i } are uncorrelated with mean zero and variance where σ 2 δ is a parameter to be estimated, and {v δ,1 , . . . , v δ,N } represent known heteroscedasticity.
We assume that Y and δ are independent. We also assume that the observations are conditionally independent, when conditioned on Y and δ. Equivalently, we assume that the measurement errors { j : j = 1, . . . , m} are independent with VAR( j ) = σ 2 v ,j .
We represent the data as Z ≡ (Z j : j = 1, . . . , m) . Then, since each element in D O is the union of subsets of D G , one can construct a matrix where the three components are independent, ≡ ( j : j = 1, . . . , m) , and VAR( The matrix Σ is assumed known from the properties of the measurement. If it is not known, V is fixed to I and σ 2 is estimated using variogram techniques (Kang, Liu, and Cressie 2009). Notice that the rows of the matrix C Z sum to 1.
It will be convenient to re-write where In practice, it is not always possible for each B j to include entire BAUs. For simplicity, in FRK we assume that the observation footprint overlaps a BAU if and only if the BAU centroid lies within the footprint. Frequently, point-referenced data are included in Z. In this case, each data point is attributed to a specific BAU and it is possible to have multiple observations of the process defined on the same BAU.
We collect the unknown parameters in the set θ ≡ {α, σ 2 ξ , σ 2 δ , K} for FRK-V and θ • ≡ {α, σ 2 ξ , σ 2 δ , ϑ} for FRK-M for which K = K • (ϑ); their estimation is the subject of Section 2.2. If the parameters in θ or θ • are known, an inversion that uses the Sherman-Woodbury identity (Henderson and Searle 1981) allows spatial prediction at any BAU in D G . Estimates of θ are substituted into these spatial predictors to yield FRK-V. Similarly, estimates of θ • substituted into the spatial-prediction equations yield FRK-M.
In FRK, we allow the prediction set D P to be as flexible as D O ; specifically, D P ⊂ {∪ i∈c k A i : k = 1, . . . , N P }, wherec k is a non-empty set in 2 {1,...,N } and N P is the number of prediction areas. We can thus predict both at the individual BAU level or averages over an area spanning multiple BAUs, and these prediction regions may overlap. This is an important change-ofsupport feature of FRK. We provide the FRK equations in Section 2.3.

Parameter estimation using an EM algorithm
In all its generality, parameter estimation with the model of Section 2.1 is problematic due to confounding between δ and ξ. In FRK, the user thus needs to choose between modeling the intra-BAU systematic errors (in which case σ 2 ξ is fixed to 0) or the process' fine-scale variation (in which case σ 2 δ is fixed to 0). We describe below the estimation procedure for the latter case; due to symmetry, the estimation equations of the former case can be simply obtained by replacing the subscript ξ with δ. However, which case is chosen by the user has a considerable impact on the prediction equations for Y (Section 2.3). Recall that the measurement-error covariance matrix Σ is assumed known from measurement characteristics, or estimated using variogram techniques prior to estimating the remaining parameters described below. For conciseness, in this section we use θ to denote the parameters in both FRK-V and FRK-M, only distinguishing when necessary.
We carry out parameter estimation using an expectation-maximization (EM) algorithm (similar to Katzfuss and Cressie 2011;Nguyen et al. 2014) with (7) as our model. Define the complete-data likelihood L c (θ) ≡ [η, Z | θ] (with ξ Z integrated out), where [ · ] denotes the probability distribution of its argument. The EM algorithm proceeds by first computing the conditional expectation (conditional on the data) of the complete-data log-likelihood at the current parameter estimates (the E-step) and, second, maximizing this function with respect to the parameters (the M-step). In mathematical notation, in the E-step the function is found for some current estimate θ (l) . In the M-step, the updated parameter estimates The E-step boils down to finding the conditional distribution of η at the current parameter estimates. One can use standard results in Gaussian conditioning (e.g., Rasmussen and Williams 2006, Appendix A) to show from the joint distribution, [η, Z | θ (l) ], that In FRK-V, the update for K (l+1) is while in FRK-M, where recall that K = K • (ϑ), the update is which is numerically optimized using the function optim with ϑ (l) as the initial vector.
The update for σ 2 ξ requires the solution to where The solution to (9), namely (σ 2 ξ ) (l+1) , is found numerically using uniroot after (8) is substituted into (10). Then α (l+1) is found by substituting (σ 2 ξ ) (l+1) into (8). Computational simplifications are possible when V ξ,Z and Σ are diagonal, since then only the diagonal of Ω needs to be computed. Further simplifications are possible when V ξ,Z and Σ are proportional to the identity matrix, with constants of proportionality γ 1 and γ 2 , respectively. In this case, where recall that m is the dimension of the data vector Z and α (l+1) is, in this special case, the ordinary-least-squares estimate given µ (l) η (see (8)). These simplifications are used by FRK whenever possible. Convergence of the EM algorithm is assessed using the (incomplete-data) log-likelihood function at each iteration, Efficient computation of the log-likelihood is facilitated through the use of the Sherman-Morrison-Woodbury matrix identity and a matrixdeterminant lemma (e.g., Henderson and Searle 1981). Specifically, the operations ensure that we only deal with vectors of length m and matrices of size r × r, where typically the fixed rank r m, the dataset size.

Prediction
The prediction task is to make inference on the hidden Y -process over a set of prediction regions D P . Consider the process {Y P (B k ) : k = 1, . . . , N P }, which is derived from the Y process and, similar to the observations, is constructed using the BAUs {A i : i = 1, . . . , N }.
Here, N P is the number of areas at which spatial prediction takes place, and is equal to |D P |. Then, where the weights arẽ Define Y P ≡ (Y P,k : k = 1, . . . , N P ) . Then, since each element in D P is the union of subsets of D G , one can construct a matrix, the rows of which sum to 1, such that where T P ≡ C P T, S P ≡ C P S, ξ P ≡ C P ξ and VAR(ξ P ) = σ 2 ξ V ξ,P ≡ σ 2 ξ C P V ξ C P . As with the observations, the prediction regions {B k } may overlap. In practice, it may not always be possible for eachB k to include entire BAUs. In this case, we assume that a prediction region contains a BAU if and only if the BAU centroid lies within the region.
Let l * denote the EM iteration number at which convergence is deemed to have been reached. The final estimates are then Recall from Section 2.2 that the user needs to attribute fine-scale variation at the BAU level to either the measurement process or the hidden process Y . This leads to the following two cases.
Case 1: σ 2 ξ = 0 and estimate σ 2 δ . The prediction vector Y P and covariance matrix Σ Y P |Z , corresponding to the first two moments from the predictive distribution [Y P | Z] when σ 2 ξ = 0, are Under the assumptions taken, [Y P | Z] is a Gau( Y P , Σ Y P |Z ) distribution. Note that all calculations are made after substituting in the EM-estimated parameters, and that σ 2 δ is present in the estimated parameters.
Case 2: σ 2 δ = 0 and estimate σ 2 ξ (default). To cater for arbitrary observation and prediction support, we predict Y P by first carrying out prediction over the full vector Y, that is, at the BAU level, and then transforming linearly to obtain Y P through the use of the matrix C P . It is easy to see that if Y is an optimal (squared-error-loss matrix criterion) predictor of Y, then A Y is an optimal predictor of AY, where A is any matrix with N columns.
Let W ≡ (η , ξ ) and Π ≡ (S, I). Then (5) can be re-written as Y = Tα + ΠW, and for and the block-diagonal matrix Λ ≡ bdiag( K, σ 2 ξ V ξ ), where bdiag(·) returns a block diagonal matrix of its matrix arguments. Note that all calculations are made after substituting in the EM-estimated parameters.
For both Cases 1 and 2 it follows that Y P = E(Y P | Z) = C P Y and Note that for Case 2 we need to obtain predictions for ξ P which, unlike those for η, are not a by-product of the EM algorithm of Section 2.2. Sparse-matrix operations (Zammit-Mangion and Rougier 2018) are used to facilitate the computation of (13) when possible.

FRK package structure and usage
In this section we discuss the layout and the interface of the package, and we show its use on the meuse dataset under 'simple usage' and 'advanced usage.' The former attempts to construct the SRE model automatically from characteristics of the data, while the latter gives the user more control through use of additional commands. The meuse dataset is not large and contains 155 readings of heavy-metal abundance in a region of The Netherlands along the river Meuse. For more details on the dataset see the vignette titled 'gstat' in the package gstat (Pebesma 2004).

Usage overview
By leveraging the flexibility of the spatial and spatio-temporal objects in the sp (Bivand, Pebesma, and Gómez-Rubio 2013) and spacetime (Pebesma 2012) packages, FRK provides a consistent, easy-to-use interface for the user, irrespective of whether the datasets have different spatial supports, irrespective of the manifold being used, irrespective of whether or not a temporal dimension needs to be included, and irrespective of the 'prediction resolution.' In Figure 1 we provide a partial unified modeling language (UML) diagram summarizing the important package classes and their interaction with the packages sp and spacetime, while in Table 1 we provide a brief summary of these classes. BAUs should be 'Spatial' or 'ST' pixel Class Description 'Basis' Defines basis functions on a specified manifold. 'Basis_obj' A virtual class that other basis classes inherit from. 'manifold' A virtual class that other manifold classes inherit from. 'measure' Defines objects that compute distances on a specified manifold. 'plane', 'real_line', 'sphere', 'STplane', 'STsphere', 'STmanifold' Subclasses that inherit from the virtual class 'manifold'. 'SRE' Defines the spatial-random-effects model, which is used to do FRK. 'TensorP_Basis' Tensor product of two basis functions. or polygon objects, while the data can also be point objects (although they are subsequently mapped to BAUs by FRK). Each 'Spatial' and 'ST' object is equipped with a coordinate reference system (CRS), which needs to be identical across objects. The main class is the 'SRE' class, the object of which incorporates all information about fitting and prediction using the data, BAUs, and basis functions.
The basis functions are constructed on a manifold which, at the time of writing, can be R (real_line), R 2 (plane), S 2 (surface of sphere), and their spatio-temporal counterparts (STplane and STsphere). Some consistency checks are made to ensure that the CRS in the BAUs and the data objects are compatible with the manifold on which the basis functions are constructed. As with spDists in the sp package, distances on the manifold are either Euclidean or great-circle. The function spDists in sp is not used, rather a function in an object of class 'measure' is used for abstraction -this redundant structure is intended to facilitate future implementation of FRK on arbitrary manifolds and with arbitrary distance functions. The package FRK has support for spatio-temporal data (see Section 6); in this case, basis functions are of class 'TensorP_Basis' and, as the name implies, are constructed through the tensor product of spatial and temporal basis functions.
The package is built around a straightforward model (outlined in Section 2) and has the capability of handling large datasets (up to a few hundred thousand data points on a standard desktop machine, and a few million on a big memory machine). For linear algebraic calculations, it leverages routines from the sparseinv package (Zammit-Mangion 2018, which is built from C code written by Davis 2021)  The user has two levels of control; for simple problems one can call the function FRK, in which case basis-function construction and BAU generation is done automatically based on characteristics of the data. Alternatively, for more (advanced) control, the user can follow the following six steps. • Step 1: Place the data into an object with class defined in sp or spacetime, specifically either 'SpatialPointsDataFrame' or 'STIDF' for point-referenced data, and either 'SpatialPolygonsDataFrame' or 'STFDF' for polygon-referenced data (Pebesma 2012

STFDF
. . .  Table 1 for a brief description of these classes. For conciseness, in each class diagram (yellow box) only a few attributes are shown and no class operations are listed. Italicized class names indicate virtual classes, an arrow with an open arrowhead indicates inheritance, and a line with a diamond at one end and an arrowhead at another indicates a compositional ("has a") relationship. The numbers on these lines indicate the number of instances involved in the relationship. For example, the 'SRE' class always has two or more sp or spacetime instances (BAUs and data), while a 'TensorP_Basis' object may or may not be needed when setting up the SRE model. In the former case we use the notation '2..*' to denote 'two or more,' while in the latter we use '0..1' to note that the user may have 0 or 1 'TensorP_Basis' objects when using FRK. • Step 2: Construct a prediction grid of BAUs using auto_BAUs, where each BAU is representative of the finest scale upon which we wish to carry out inference (the process is discretized at the BAU level). The BAUs are usually of class 'SpatialPixelsDataFrame' for spatial problems (or they could also be of class 'SpatialPolygonsDataFrame'), and they are of class 'STFDF' for spatio-temporal problems. • Step 3: Construct using auto_basis a set of regularly or irregularly spaced basis functions. The basis functions can be of various types (e.g., bisquare, Gaussian, or exponential functions). • Step 4: Construct an SRE model using SRE from an R formula that identifies the response variable, the covariates, the data, the BAUs, and the basis functions. • Step 5: Estimate the parameters within the SRE model using SRE.fit. Estimation is carried out using the EM algorithm described in Section 2.2. • Step 6: Predict either at the BAU level or over arbitrary polygons specified as 'SpatialPolygon's or 'SpatialPolygonDataFrame's in the spatial case, or as 'STFDF's in the spatio-temporal case, using predict.

Group Method/Function Use Basis functions auto_basis
Automatically constructs a set of basis functions on a given manifold based on a supplied dataset. local_basis Manually constructs a set of 'local' basis functions from a set of centroids and scale parameters. eval_basis Evaluates basis functions over arbitrary points or polygons. remove_basis Removes basis functions from an object of class 'Basis'. show_basis Visualizes basis functions. BAUs auto_BAUs Automatically constructs a set of BAUs on a given manifold around a supplied dataset. BAUs_from_points Constructs BAUs from point-level data. Information coef Returns regression coefficients from a fitted SRE model. info_fit Returns information from the EM algorithm (e.g., information on convergence). nbasis Returns the number of basis functions in a 'Basis' or 'SRE' object. nres Returns the number of basis-function resolutions in a 'Basis' or 'SRE' object.

opts_FRK$get
Returns current option settings.

opts_FRK$set
Sets an option. summary Returns information on the 'Basis' or 'SRE' object.

FRK operations FRK
Constructs and fits an SRE model from a supplied R formula and dataset. predict Predicts over BAUs or at newdata using a fitted SRE model.

SRE
Constructs an SRE model from an R formula, data, BAUs, and basis functions.

SRE.fit
Fits (estimates parameters in) an SRE model. In Table 2 we provide some of the important methods and functions, together with brief descriptions, available to the user of FRK.

Simple usage
In simple cases, the user constructs and fits the SRE model using the function FRK, and then prediction is carried out using the function predict. The main function FRK takes two compulsory arguments: A standard R formula f and a list of data objects data, and it returns an object of class 'SRE'. Each of the data objects in the list must be of class 'SpatialPointsDataFrame', 'SpatialPolygonsDataFrame', 'STIDF', or 'STFDF', and each must contain the dependent variable defined in f. If there are covariates, then the user must supply the covariate data with all the BAUs, that is, at both the BAU measurement locations and at the BAU prediction locations.
The BAUs should be of class 'SpatialPolygonsDataFrame' or 'SpatialPixelsDataFrame' (in the spatial case) or 'STFDF' (in the spatio-temporal case). Note that, unlike conventional spatial modeling tools, covariate information should not be supplied with the data, but with the BAUs. Also note that the intersection of the data support and that of the BAUs should never be null.
When no basis functions or BAUs are supplied, then these are elicited automatically based on characteristics of the supplied dataset(s). The number of basis functions used depends on whether K is unstructured or not, on whether the data are spatial only or are spatio-temporal, and on the number of data points. For details, see the package's manual (Zammit-Mangion and Sainsbury-Dale 2021). The number of BAUs depends on the domain boundary and on whether the dataset is spatial or spatio-temporal. Domain construction and basis-function placement may make use of geometric functions available in INLA. If INLA is unavailable, simple geometric methods are used instead.
FRK was not built for small datasets, for which standard exact kriging is fast and memory efficient. However, to illustrate the utility of FRK, we consider the meuse dataset in the package sp. We first consider a simple model with no covariates, in which we model the logarithm of zinc concentrations. Basis functions can either be arranged on a grid by setting regular = 1 or as a function of data density (using the INLA mesher) by setting regular = 0.
The meuse dataset is first loaded and cast into a 'SpatialPointsDataFrame'.
R> library("FRK") R> f <-log(zinc)~1 R> S <-FRK(f = f, data = list(meuse), regular = 0) The returned 'SRE' object S contains all the information about the fitted SRE model, which can be displayed using the summary command.
If we wish to use covariate information, we need to consider BAUs that have covariate information attached to them. Such BAUs are available for this problem in the package sp in meuse.grid, which we first cast into a 'SpatialPixelsDataFrame' using the function gridded before using them in the SRE model.
R> data("meuse.grid", package = "sp") R> coordinates(meuse.grid) <-~x + y R> gridded(meuse.grid) <-TRUE In this example, based on prior exploratory data analysis (see the vignette 'gstat' in the package gstat), we consider the square root of the distance from the centroid of a BAU to the nearest point on the river Meuse as the covariate. Recall that all covariates need to be supplied with the BAUs and not with the data, and FRK will throw an error if the data and BAUs have fields in common. In the code below, we first set any common fields to NULL in the data object, before running FRK using the user-specified BAUs.
The other core function, which is also needed for 'advanced usage,' is predict, which is used to compute prediction and prediction standard errors at all prediction locations. This function takes as compulsory argument the SRE model S. If no polygons are specified, prediction is carried out at the BAU level (in space and/or time). An important argument is the flag obs_fs, which acts as a choice between Case 1 (process' fine-scale variance σ 2 ξ = 0; obs_fs = TRUE) and Case 2 (systematic intra-BAU variance σ 2 δ = 0; obs_fs = FALSE) of Section 2.3.

R> Pred <-predict(S, obs_fs = FALSE)
The function Pred returns the polygons (or, in this case, the BAUs) containing the prediction mu and the prediction variance var, which can be readily used for visualization. The predictions and prediction standard errors of the model having sqrt(dist) as a covariate are depicted in Figure 2. In this instance, Case 2 was used, and the estimate of the fine-scale variance σ 2 ξ was positive. Hence, the prediction and prediction-error maps exhibit 'bulls-eye' features, where the prediction standard errors are much lower in BAUs containing data than in neighboring BAUs not containing data.

Point-level data and predictions
In many cases, the user has one data object or data frame containing both observations and prediction locations with accompanying covariates. Missing observations are then usually denoted as NA. Since in FRK all covariates are associated with the BAUs and not the data, that one data object needs to be used to construct (i) a second data object where no data are missing and that does not contain missing covariates, and (ii) BAUs at both the observation and prediction locations supplied with their associated covariate data.
For example, assume that the first 10 log-zinc concentrations are missing in the meuse dataset. R> data("meuse", package = "sp") R> meuse[1:10, "zinc"] <-NA Once the data frame is appropriately subsetted, it is then cast as a 'SpatialPointsDataFrame' as usual.
R> meuse2 <-subset(meuse, !is.na(zinc)) R> meuse2 <-meuse2[, c("x", "y", "zinc")] R> coordinates(meuse2) <-~x + y The BAUs, on the other hand, should contain all the data and prediction locations, but not the response variable itself. Their construction is facilitated by the function BAUs_from_points which constructs tiny BAUs around the data and prediction locations.

Advanced usage
The package FRK provides several helper functions for facilitating basis-function construction and BAU construction when more control is needed. Harnessing the extra functionality requires following the six steps outlined in Section 3.1.
Step 1: As before, we first load the data and cast it into a 'SpatialPointsDataFrame'. R> data("meuse", package = "sp") R> coordinates(meuse) <-~x + y Step 2: Based on the geometry of the data we now generate BAUs. For this, we use the helper function auto_BAUs, which takes several arguments (see help(auto_BAUs) for details). In the code below, we instruct the helper function to construct BAUs on the plane, centered around the data in meuse with each BAU of size 100 × 100 m. The type = "grid" input indicates that we want a rectangular grid and not a hexagonal lattice (type = "hex") and convex = -0.05 is a parameter controlling the shape of the domain boundary when nonconvex_hull = TRUE (see the help file of INLA::inla.nonconvex.hull and Lindgren and Rue 2015 for more details), and the extension of the convex hull of the data when nonconvex_hull = FALSE (default). For the ith BAU, we also need to supply the element v ξ,i (or v δ,i ) that describes the heteroscedasticity of the fine-scale variation for that BAU. As described in Section 2.1, this component encompasses all process variation that occurs at the BAU scale and only needs to be known up to a constant of proportionality, σ 2 ξ or σ 2 δ (depending on the chosen model); this constant is estimated using maximum likelihood with SRE.fit, which uses the EM algorithm of Section 2.2. Typically, geographic features such as altitude are appropriate, but in this illustration of the package we just set this value to be 1 for all BAUs. This field is labeled fs, and SRE will throw an error if it is not set.

R> GridBAUs1$fs <-1
The data and BAUs are illustrated using the plot function in Figure 3. At this stage, the BAUs only contain geographical information. To add covariate information to these BAUs from other 'Spatial' objects, the function sp::over can be used.
Step 3: FRK decomposes the spatial process as a sum of basis functions that can be constructed using the helper functions auto_basis as follows: R> G <-auto_basis(manifold = plane(), data = meuse, regular = 0, + nres = 3, type = "bisquare") The argument nres indicates the number of basis-function resolutions to use, while type indicates the function to use, in this case the bisquare function, where A is the amplitude and r is the aperture. Other options are "exp" (the exponential covariance function) and "Matern32" (the Matérn covariance function with smoothness parameter equal to 1.5). The basis functions do not need to be positive-definite and users may  (7) need to contain at least one non-zero element in order for η to be identifiable. See help(auto_basis) for details.
The basis functions can be visualized using show_basis(G); see Figure 4.
Step 4: The SRE model is constructed by supplying an R formula, the data, the BAUs, and the basis functions, to the function SRE. If the model contains covariates, one must make sure that they are specified at the BAU-level (and hence attributed to GridBAUs1). We use the following formula.

R> f <-log(zinc)~1
The SRE model is then constructed using the function SRE, which essentially bins the data into the BAUs, constructs all the matrices required for estimation, and provides initial guesses for the parameters that need to be estimated. By default, K_type = "block-exponential", which signals the construction of the matrices where d ijn is the distance between the centroids of the ith and jth basis functions at the nth resolution, r n is the number of basis functions at the nth resolution, n = 1, . . . , n res , n res is the number of resolutions, ϑ 1n is the marginal variance at the nth resolution, and ϑ 2n is the e-folding length-scale (i.e., the distance at which the correlation is 1/e) at the nth resolution. Then the default is K • (ϑ) = bdiag({K n (ϑ) : n = 1, . . . , n res }), where ϑ ≡ (ϑ 11 , ϑ 21 , ϑ 12 , . . . , ϑ 2nres ) and bdiag(·) returns a block-diagonal matrix constructed from its arguments.
R> S <-SRE(f = f, data = list(meuse), BAUs = GridBAUs1, basis = G, + est_error = TRUE, average_in_BAU = FALSE) K_type = "unstructured" can be used to invoke FRK-V. When calling the function SRE, we supplied the formula f containing information on the dependent variable and the covariates; the data (as a list that can include additional datasets); the BAUs; the basis functions; a flag est_error; and another flag average_in_BAU. The flag est_error = TRUE is used to estimate the measurement-error variance σ 2 (where Σ ≡ σ 2 I) using semivariogram methods (Kang et al. 2009). At the time of writing, est_error = TRUE was only available for spatial data, not for spatio-temporal data. When not set to TRUE, each dataset needs to also contain a field std, the standard deviation of the measurement error (that can vary with the measurement).
FRK is built on the concept of a BAU, and hence the smallest spatial support of an observation has to be equal to that of a BAU. However, in practice, several datasets (such as the meuse dataset) are point-referenced. We reconcile this difference by assigning a support to every point-referenced datum equal to that of a BAU. Multiple point-referenced data falling within the same BAU are thus assumed to be noisy observations of the same random variable; see (6). As a consequence of this, when multiple observations fall into the same BAU, the matrices V ξ,Z and V δ,Z will be sparse but not diagonal (since C Z will contain more than one non-zero element per column). This can increase the computational time required for estimation considerably. For large point-referenced datasets, such as the dataset considered in Section 4.2, it is reasonable to summarize the data at the BAU level. Since FRK is designed for use with large datasets, the argument average_in_BAU of the function SRE is defaulted to TRUE. In this default setting, all data falling into one BAU is averaged, and the standard deviation of the measurement error of the averaged data point is taken to be the average standard deviation of the measurement error of the individual data points. Consequently, the dataset is thinned. With large datasets and small BAUs, this thinning frequently does not cause performance degradation (see Section 4.2). Since the meuse dataset is relatively small, we set average_in_BAU = FALSE.
Step 5: The SRE model is fitted using the function SRE.fit. Maximum likelihood estimation is carried out using the EM algorithm of Section 2.2, which is assumed to have converged either when n_EM is exceeded or when the log-likelihood across subsequent steps does not change by more than tol. In this example, the EM algorithm converged in about 30 iterations; see Figure 5.
R> S <-SRE.fit(SRE_model = S, n_EM = 400, tol = 0.01, print_lik = TRUE) Step 6: Finally, we predict at all the BAUs with the fitted spatial model. This is done using the function predict. The argument obs_fs dictates whether we attribute the fine- scale variation to intra-BAU systematic error (Case 1) or to the process model (Case 2). In the code below, we use the default setting and allocate it to the process model.

R> GridBAUs1 <-predict(S, obs_fs = FALSE)
The object GridBAUs1 now contains the prediction vector, the prediction standard error, and the square of the prediction standard error at the BAU level in the fields mu, sd, and var, respectively. These can then be visualized using standard plotting commands.

Predicting over larger polygons/areas
Now, assume that we wish to predict over regions encompassing several BAUs such that the matrix C P in (11) contains multiple non-zeros per row. We can create this larger regionalization by using the function auto_BAUs and specifying the cell size. This gives a 'super-grid' shown in Figure 6.

R> Pred <-predict(S, newdata = Pred_regions, obs_fs = FALSE)
The predictions and the corresponding prediction standard errors on this super-grid are shown in Figure 6.

Computational considerations
While FRK beats the curse of 'data dimensionality' by dealing with matrices of size r × r instead of matrices of size m × m, one must ensure that the number of basis functions, r, remains reasonably small. The reasons are two-fold. First, the computational time required to invert an r × r matrix increases cubicly with r, and several such inversions are required  when running the EM algorithm. Second, it is likely that more EM-algorithm iterations are required when r is large. In practice, r should not exceed a few thousand. The number of basis functions r can usually be controlled through the argument nres. The function auto_basis also takes an argument max_basis that automatically finds the number of resolutions required to not exceed the desired maximum number of basis functions.
The fitting and prediction algorithms scale linearly with the number of data points m and the number of BAUs N . However, if one has millions of data points, then the number of BAUs must exceed this and a big-memory machine will probably be required. Irrespective of problem size, we have noted considerable improvements in speed when using the OpenBLAS libraries (Wang, Zhang, Zhang, and Yi 2013).
Some of the operations in FRK can be run in parallel. To use a parallel back-end, one needs to set an option as follows:

R> opts_FRK$set("parallel", numcores)
where numcores is of class 'integer' (e.g., numcores = 4L to use 4 cores). When this option is set, the parallel package is used to set up a parallel backend using makeCluster, which is subsequently used for parallel operations. Currently, parallelism is limited in FRK to • computing the integrals in (3) using Monte Carlo integration or, when appropriate, the approximation (4); • finding which data are influenced by which BAUs and computing the weights in (6).
Unfortunately the EM algorithm, which is the bottleneck in a spatial analysis using FRK, is serial in nature and difficult to parallelize. Hence, SRE.fit takes as argument method, in recognition that in the future other, possibly parallelizable, estimation methods might be implemented to speed up the fitting process.

Comparison studies
In this section we compare the utility of the SRE model in FRK to standard kriging using gstat, and to two other popular models for modeling and predicting with large datasets in R: the LatticeKrig model that can be implemented with the package LatticeKrig (Nychka et al. 2019), and the SPDE-GMRF model that can be implemented with the package INLA (Lindgren and Rue 2015). In both these models the spatial field is decomposed as and K −1 is modeled in such a way that it is sparse. These two models allow for feasible computation with large r, however neither includes an extra fine-scale effect ξ(·). The SPDE-GMRF model has the added interpretable feature that, for a given set of basis functions, K −1 is such that the resulting field approximates a Gaussian process with a stationary covariance function from the Matérn class.
In Section 4.1, we first analyze a 2D simulated dataset. We shall see that while FRK may sometimes perform less well in terms of prediction accuracy due to the practical limit on the number of basis functions it uses, it does not under-fit (i.e., it gives valid results) since fine-scale variation is taken into account. In fact, we see that the SRE model in FRK provides better coverage in terms of prediction intervals, even with large datasets, when compared to other models that use considerably more basis functions but that do not account for fine-scale variation. In the second case study (Section 4.2), we consider three days of column-averaged carbon dioxide data from the Atmospheric InfraRed Sounder instrument on board the Aqua satellite .

A 2D simulated dataset
Let D = [0, 1] × [0, 1] ⊂ R 2 , and consider a process Y (·) with covariance function COV(Y (s), Y (s + h)) ≡ σ 2 exp(− h /τ ), where σ 2 is the marginal variance of the process and τ is the e-folding length-scale. Further, let m be the number of observations, and define the signal-tonoise ratio (SNR) to be the ratio of the marginal variance σ 2 to that of the measurement-error process, σ 2 . In the inter-comparison, we consider cases where m is either 1, 000 or 50, 000, SNR is 0.2, 1, or 5, and τ is either 0.015 or 0.15. These choices of parameters help highlight the strengths and weaknesses of FRK with respect to the other approaches. For example, due to the relatively small number r of basis functions employed, we expect FRK to have lower prediction precision when the SNR is high and τ is low, but we expect the prediction intervals to be valid. We further split the domain into two side-by-side partitions, and we placed 95% of the observations in the left half (LH) and 5% in the right half (RH). This partitioning helps identify the different methods' capability of borrowing strength from a region with dense measurements to a region with sparse measurements. The measurement locations for the m = 1, 000 case are shown in Figure 7, left panel. We simulated the process on a 1,000 × 1,000 grid using the package RandomFields (Schlather, Malinowski, Menck, Oesting, and Strokorb 2015). We used the 10 6 cells of the grid as our set of BAUs, D G , and therefore each BAU was of size 0.001 × 0.001. One such spatial-process realization for τ = 0.15 and σ 2 = 1 is shown in Figure 8, left panel, while one with τ = 0.015 and σ 2 = 1 is shown in Figure 8, right panel. With gstat, which we used to implement simple kriging (denoted gstat), we assumed the true underlying covariance function was known. Hence, when available (for the m = 1, 000 case), the results of gstat should be taken as the gold standard. As m gets larger, simple kriging quickly becomes infeasible, since it is O(m 3 ) in computational complexity.
We implemented the LatticeKrig model (denoted LTK) using the package LatticeKrig. We used nlevel = 3 resolutions of Wendland basis functions, set the smoothness parameter nu = 0.5, and the number of grid points per spatial dimension at the coarsest resolution to NC = 33. The first resolution contained 1,849 basis functions, the second resolution contained 5,625, and the third resolution contained 19,321. In the case where m = 1, 000, we set findAwght = TRUE for the effective process range to be estimated by maximum likelihood methods. Setting findAwght = TRUE was prohibitive for m = 50, 000, but separate experiments showed that predictions from LTK were largely insensitive to this option for this value of m.
We implemented and fit the SPDE-GMRF model (denoted SPDE) using the package INLA.
We constructed a triangular mesh using inla.mesh.2d with max.edge = c(0.05, 0.05) and cutoff = 0.012. This gave a mesh with a higher density of basis function on the lefthand side of the domain and (as with the LatticeKrig model) a buffer to reduce edge effects. The basis functions are defined by the triangles, and their number was around 2,500 for m = 1, 000, and 5,000 for m = 50, 000, while the parameter α = 3/2 was used to reproduce Gaussian fields with a Matérn covariance function with smoothness parameter of 1/2 (i.e., an exponential covariance function). Unlike LatticeKrig and FRK, INLA uses an approximate Bayesian method for inference, and thus it requires the specification of prior distributions of the parameters, for which we use penalized complexity priors (Fuglstad, Simpson, Lindgren, and Rue 2019). For these simulation settings and our choice of prior distributions, we do not expect the inferential method to be a factor that largely influences the prediction and prediction errors.
For the SRE model implemented in FRK (denoted FRK) we put a block-exponential covariance structure on K • (ϑ) (K_type = "block-exponential"), and we set nres = 3, yielding, in total, 819 basis functions regularly distributed in the domain D. In this study we used LatticeKrig v7.0, INLA v18.07.12, and FRK v0.2.2. For each configuration in the simulation experiment (i.e., the factorial design defined by m ∈ {1, 000, 50, 000}, SNR ∈ {0.2, 1, 5}, and τ ∈ {0.015, 0.15}), we simulated L = 100 datasets. For prediction locations we took 1, 000 locations at random on the left-hand side of the gridded domain D G that coincided with measurement locations, and another 1, 000 that did not; and we did the same for the right-hand side. When there were less than 1,000 measurement locations on a given side, all measurement locations were chosen as prediction locations for that side. The sets of locations are denoted as D O LH , D M LH , D O RH , and D M RH , respectively. These locations were kept constant across all simulation experiments for a given m. In addition to the stationary, exponential, Gaussian process, we also simulated from the nonstationary process Y NS (·), where with COV(Y 1 (s), Y 1 (s+h)) = σ 2 1 exp(− h /τ 1 ) and COV(Y 2 (s), Y 2 (s+h)) = σ 2 2 exp(− h 2 /τ 2 ). For this additional experiment, we set m = 10, 000, σ 1 = σ 2 = 0.5, and τ 1 = τ 2 = 0.15, and we used all configurations in the original experiment as described above. The measurement locations for this case are shown in Figure 7, right panel.
As prediction-performance measures ('responses' of the experiment), we considered the following: • Root mean-squared prediction error: LetŶ X (s; l) denote the 'model-X' predictor of Y (s; l), where Y (s; l) is the lth simulated process evaluated at location s and X = gstat, LTK, SPDE, FRK. Then the model-X predictor root-mean-squared prediction error (RMSPE) for the lth simulation is where Since we are interested in benchmarking the model we use in FRK, we considered a measure of relative skill (RS), relative to FRK: where X = gstat, LTK, SPDE. Hence, RS > 1 (< 1) indicates that FRK has better (worse) prediction accuracy. We intentionally focus on coverage in order to highlight the strengths and weaknesses of the models in terms of uncertainty quantification. Other related measures, such as the interval score , penalize for both prediction interval width and poor coverage and are thus less suited to assess the issue of validity (i.e., whether the prediction intervals are correct, on average). The measures RS X and I 90 X were considered for {Y (s) : s ∈ D G } simulated from the stationary process with exponential covariance function and from the nonstationary process Y NS (·) in (14).
Distributions of RS X for the original experiment with m = 1, 000 and m = 50, 000 are shown in Figures 9 and 10, respectively. While it is possible to proceed with an analysis of variance to analyze these results (e.g., Zhuang and Cressie 2014), here we discuss their most prominent features. First, when there are few (m = 1, 000) data points (Figure 9), there is little difference between the methods for low SNR but, for high SNR, SPDE and LTK perform better in terms of RS when τ is small (τ = 0.015; see, for example, the bottom left two panels of Figure 9). This was expected since the number of basis functions used begins to play an important role as the SNR increases (Zammit-Mangion, Cressie, and Shumack 2018). As expected, all prediction methods perform worse than or as well as, simple kriging with gstat (under a known covariance function).
The comparison in terms of RS is less clear when m = 50, 000 ( Figure 10). First, at unobserved locations, FRK is frequently outperformed in terms of RS by the other methods, since the relatively small number of basis functions is unable to adequately reconstruct the optimal (simple-kriging) predictor. On the other hand, at observed locations, the performance is SNR dependent and data-density dependent. In much of the design space, FRK performs worse (in terms of RS) than the other predictors at the measurement locations, but it begins to outperform SPDE and LTK as the SNR increases and when τ is small (τ = 0.015). Now we turn to the question of 'validity' of the predictors.
An equally important performance measure to RMSPE is coverage. Distributions of I 90 X for m = 1, 000 and m = 50, 000 are shown in Figures 11 and 12, respectively. In the smalldata case (m = 1, 000), all methods are over-confident (more so in the left-hand part of the domain) and by varying degrees. In the large-data case (m = 50, 000), both SPDE and LTK perform poorly in terms of coverage, providing over-confident predictions, especially when the SNR is large (SNR = 5). This is a result of these models relying on basis functions to reproduce the fine-scale variation and not attempting to separate out fine-scale variation from measurement error. The model implemented in FRK places a white-noise process at the BAU level to capture the fine-scale variation and can thus yield good coverage despite the use of a relatively low-dimensional manifold. It is worth nothing that it is straightforward in INLA to include an extra fine-scale-variation term and fix the standard deviation of the measurement error to some pre-specified value, although this is rarely done. Here we are illustrating that  Figure 11 but with the number of data points m = 50, 000. Note that, in this case, simple kriging (with gstat) is computationally prohibitive and hence does not appear in the figure. not doing this may lead to severe deleterious effects on coverage. The model used in FRK was also found to yield 90% interval scores that were at least as good as, or better than, the other two models for the case with m = 50, 000 (results not shown).
To further investigate this issue, we re-ran the simulations and generated coverage diagnostics for predicted data, Y P + P , rather than for just Y P . The coverage for all methods was very good (results not shown), indicating that all methods are able to correctly apportion total variability. Consequently, these results show that inclusion of the fine-scale variation term is critical in reduced-rank approaches (irrespective of the number of basis functions) with large datasets when predicting the hidden process. (It is not critical when predicting missing data). The simple semivariogram method employed by FRK for estimating the measurement-error  Table 3: Root mean squared prediction error (RMSPE) and 90% coverage (CR) for the case where m = 10, 000 data were simulated from the nonstationary process Y NS (·).
variance is a step in the right direction, and it appears to yield good results in the first instance. However, ideally, the standard deviation of the measurement error is known from the application and fixed a priori.
Overall, all models have their own relative strengths and weaknesses, largely arising from the differences in (i) the type and number of basis functions employed, and (ii) the presence or otherwise of a fine-scale-process term. In this experiment we saw that the model employed by FRK produces predictions that are valid, on average. However for large-data situations, our experiment shows FRK predictions to be less efficient, as expected due to the restriction on the number of basis functions that can be used.
In the nonstationary case (14), all methods performed similarly, with LTK being slightly overconfident and FRK being slightly underconfident; see Table 3. This similarity is not surprising since in (14) we set τ 1 = τ 2 = 0.15, which results in a process that is highly spatially correlated as well as rather smooth. The resulting process has a similar overall length scale and SNR to that simulated in the original experiment that yielded the results shown in the second row (SNR = 1) of the third (LH, '10'), fourth (LH, '11'), seventh (RH, '10'), and eighth (RH, '11') columns of Figures 9 and 11. We see that all three methods performed similarly, and satisfactorily, in this case.

Modeling and prediction with data from the AIRS instrument
The US National Aeronautics and Space Administration (NASA) launched the Aqua satellite on 2002-05-04, with several instruments on board, including the Atmospheric Infrared Sounder (AIRS). AIRS retrieves column-averaged CO 2 mole fraction (in units of parts per million), denoted XCO 2 (with particular sensitivity in the mid-troposphere), amongst other geophysical quantities (Chahine et al. 2006). The data we shall use consist of XCO 2 measurements taken between 2003-05-01 and 2003-05-03 (inclusive). These data are a subset of those available with FRK. We compare LTK, SPDE, and FRK on the three-day AIRS dataset, and we assess the utility of the methods on a validation dataset that we hold out.
Modeling on the sphere with FRK proceeds in a very similar fashion to the plane, except that a coordinate reference system (CRS) on the surface of the sphere needs to be declared for the data. This is implemented using a 'CRS' object with string "+proj=longlat +ellps=sphere". We next outline the six steps required to fit these data using FRK.
Step 1: Fifteen days of XCO 2 data from AIRS (in May 2003) are loaded by using the command data("AIRS_05_2003", package = "FRK"). In this case study, we subset the data to include only the first three days, which contains 43,059 observations of XCO 2 in parts per million (ppm). We subsequently divide the data into a training dataset of 30,000 observations, chosen at random (AIRS_05_2003_t), and a validation dataset (AIRS_05_2003_v) containing the remaining observations. To instruct FRK to fit the SRE model on the surface of a sphere, we assign the appropriate 'CRS' object to the data as follows: R> coordinates(AIRS_05_2003) <-~lon + lat R> proj4string(AIRS_05_2003) <-CRS("+proj=longlat +ellps=sphere") Step 2: The next step is to create BAUs. This is done using the auto_BAUs function but this time with the manifold specified to be the surface of a sphere with radius equal to that of Earth. We also specify that we wish the BAUs to form an icosahedral Snyder equal area aperture 3 hexagon (ISEA3H) discrete global grid (DGG) at resolution 9 (for a total of 186,978 BAUs). Resolutions 0-6 are included with FRK; higher resolutions are available in the package dggrids (Zammit-Mangion 2020). By default, this will create a hexagonal grid on the surface of the sphere, however it is also possible to have the more traditional lon-lat grid by specifying type = "grid" and declaring a cellsize in units of degrees. An example of an ISEA3H grid, at resolution 5 (which would yield a total of 6,910 BAUs), is shown in Figure 13, left panel, while a 5 • × 5 • lon-lat grid using type = "grid" is shown in Figure 13, right panel.
R> G <-auto_basis(manifold = sphere(), data = AIRS_05_2003_t, nres = 3, + isea3h_lo = 2, type = "bisquare") Steps 4-5: Since XCO 2 , a column-averaged CO 2 mole fraction in units of ppm, has a latitudinal gradient, we use latitude as a covariate in our model. The SRE object is then constructed in the same way as in Section 3.3. The AIRS footprint is approximately 50 km in diameter, which is smaller than the BAUs we use (approximately 100 km in diameter), and hence it is possible that multiple observations fall into the same BAU. Recall from Section 3.3 (Step 4) that when multiple data points fall into the same BAU that these are correlated through either intra-BAU systematic error (Case 1) or fine-scale process variation at the BAU level (Case 2). However, recall also that when multiple observations fall into the same BAU, the matrices V ξ,Z and V δ,Z are sparse but not diagonal, and this can increase computational time considerably. For large datasets in which each datum has relatively small (relative to the BAU) spatial support, such as the AIRS dataset, it is frequently reasonable to let the argument average_in_BAU = TRUE (as it is by default) to indicate that one wishes to summarize the data at the BAU level.
In the code below we implement FRK using the default case (Case 2, where σ 2 δ = 0 and σ 2 ξ is estimated).  Step 6: To predict at the BAU level, we invoke the predict function.

R> pred_isea3h <-predict(S)
The prediction and prediction standard error maps obtained using FRK, together with the observations, are shown in Figure 15. We denote the implementation above of FRK as FRK-Ma, where "M" denotes the case for the modeled VAR(η) = K • (ϑ) and "a" denotes the case for average_in_BAU set to FALSE.
We evaluated the utility of the SRE model used in FRK, the LatticeKrig model (with Lat-ticeKrig), and the SPDE-GMRF model (with INLA), using out-of-sample prediction at the validation-data locations. We also re-ran FRK with average_in_BAU set to TRUE (denoted FRK-Mb) and K_type = "unstructured" (FRK-V). With INLA we approximated an SPDE with α = 2 on a global mesh of 6,550 basis functions and used penalized complexity prior distributions for the parameters (Fuglstad et al. 2019). With LatticeKrig we used three resolutions with a total of 12,703 basis functions on R 2 using the lon-lat coordinates to denote spatial locations. As comparison measures we used the RMSPE (15) between the validation values and their respective predicted observations, the continuous ranked probability score (CRPS, Gneiting, Balabdaoui, and Raftery 2007), and the actual coverage of a 90% prediction interval (16) but obtained with respect to the validation data instead of the process Y at the validation-data locations.
The results are summarized in Table 4. In this example, we see that there is little practical difference in performance between the five methods despite the large difference in the number of basis functions and the form of the models; FRK performs about 2% worse than the others in terms of RMSPE. As expected, since we are validating against data (and not against the true process, which is unknown here), all methods perform acceptably in capturing total variation. However, the FRK methods gave prediction standard errors of the process that were, on average, double those provided by LTK and SPDE. This mirrors what was seen in Section 4, where SPDE and LTK were generally overconfident, although in this case the Computation time for all three packages were similar under the chosen configurations (except for FRK-Ma that assumes intra-BAU correlations). For FRK, we computed the predictions and prediction standard errors directly using sparse-matrix operations, while we used INLA's predictor functionality to obtain the prediction and prediction standard errors for the SPDE-GMRF model. We obtained LatticeKrig's prediction errors using 30 conditional simulations.

Change-of-support, anisotropy, and custom basis functions
Sections 1-4 introduced the core spatial functionality of FRK. The purpose of this section is to present additional functionality that may be of use to the spatial analyst.

Multiple observations with different supports
In FRK, one can make use of multiple datasets with different spatial supports with little difficulty. Consider the meuse dataset. We synthesize observations with a large support by changing the meuse object into a 'SpatialPolygonsDataFrame', where each polygon is a square of size 300 m × 300 m centered around the original meuse data point (see Figure 16, left panel). For reference, the constructed BAUs are of size 50 m × 50 m. Once this object is set up, which we name meuse_pols, we assign zinc values to the polygons by fitting a spherical semi-variogram model to the log zinc concentrations in the original meuse dataset, generating a realization by conditionally simulating once at the BAU centroids, exponentiating the simulated values, aggregating accordingly, and adding on measurement error with variance 0.01. The analysis proceeds in precisely the same way as in Section 3, but with meuse_pols used instead of meuse.
The predictions and the prediction standard errors using meuse_pols are shown in Figure 16, center and right panels, respectively. We note that the supports of the observations and the BAUs do not precisely overlap: Recall that, for simplicity, we assumed that an observation is taken to overlap a BAU if and only if the centroid of the BAU lies within the observation footprint. (A refinement of this simple method will require a more detailed consideration of the BAU and observation footprint geometry and is the subject of future work.)

Anisotropy: Changing the distance measure
Highly non-stationary and anisotropic fields may be easier to model on a deformed space on which the process is approximately stationary and isotropic (e.g., Sampson and Guttorp 1992;Kleiber 2016). In FRK, a deformation can be introduced to capture geometric anisotropy by changing the distance measure associated with the manifold. As an illustration, we simulated an anisotropic, noisy, spatial process on a fine grid in D = [0, 1] × [0, 1] and assumed that 1,000 randomly-located grid points were observed. The process and the sampled data are shown in Figure 17.
In this simple case, to alter the modified distance measure we note that the spatial frequency in x is approximately four times that in y. Therefore, in order to generate anisotropy, we use a measure that scales x by 4. In FRK, a 'measure' object requires a distance function and the dimension of the manifold on which it is used, and it is constructed as follows: R> dist_fun <-function(x1, x2 = x1) { + scaler <-diag(c(4, 1)) + distR(x1 %*% scaler, x2 %*% scaler) + } R> asymm_measure <-measure(dist = dist_fun, dim = 2L) The distance function can be assigned to the manifold as follows.
R> TwoD_manifold <-plane(measure = asymm_measure) We now generate a grid of basis functions (here at a single resolution) manually. First, we create a 5 × 14 grid on D, which we will use as centers for the basis functions. We then call the function local_basis to construct bisquare basis functions centered at these locations with a range parameter (i.e., the radius in the case of a bisquare) of 0.4. Due to the scaling used, this implies a range of 0.1 in x and a range of 0.4 in y. Basis-function number 23 is shown in Figure 18.

Customized basis functions and BAUs
The package FRK provides the functions auto_BAUs and auto_basis to help the user construct the BAUs and basis functions based on the data that are supplied. However, these could be done manually. The object containing the basis functions needs to be of class 'Basis' that defines 5 slots: • dim: The dimension of the manifold.
• fn: A list of functions. By default, distances used in these functions (if present) are attributed to a manifold, but arbitrary distances can be used.
• pars: A list of parameters associated with each basis function. For the local basis functions used in this paper (constructed using auto_basis or local_basis), each list item is another list with fields loc and scale where length(loc) is equal to the dimension of the manifold and length(scale) is equal to 1.  • df: A data frame with number of rows equaling the number of basis functions and containing auxiliary information about the basis functions (e.g., resolution number).
• n: An integer equal to the number of basis functions.
The constructor Basis can be used to instantiate an object of this class.
There are less restrictions for constructing BAUs; for spatial applications, they are usually either 'SpatialPixelsDataFrame's or 'SpatialPolygonsDataFrame's. In a spatio-temporal setting, the BAUs need to be of class 'STFDF', where the spatial component is usually either a 'SpatialPixels' or a 'SpatialPolygons' object. In either case, the data slot of the object must contain • all covariates used in the model; • a field fs containing elements proportional to the fine-scale variation at the BAU level; and • fields that can be used to summarize the BAU as a point, typically the centroid of each polygon. The names of these fields need to equal those of the coordnames(BAUs) (typically c("x", "y") or c("lon", "lat")).

Spatio-temporal FRK
Fixed rank kriging in space and time is different from fixed rank filtering (Cressie, Shi, and Kang 2010), where a temporal autoregressive structure is imposed on the temporally evolving basis-function weights {η t }, and where Rauch-Tung-Striebel smoothing is used for inference on {η t }. In FRK, the basis functions can also have a temporal dimension; then the only new aspect beyond the purely spatial analysis given above is specifying these spatio-temporal basis functions. We describe how this can be done in Section 6.1. In Section 6.2 we show how this can be applied to modeling and prediction with data from the more recent Orbiting Carbon Observatory-2 (OCO-2) satellite that measures XCO 2 . To illustrate their construction, consider the following set of spatial basis functions.

Basis-function construction
R> centroids <-as.matrix(expand.grid(x = 1:3, y = 1:3)) R> G_spatial <-local_basis(manifold = plane(), loc = centroids, + scale = rep(2, 9), type = "bisquare") The function call above returns a set of bisquare basis functions centered at loc with aperture equal to scale. The same call, given below, can be used to construct temporal basis functions; note that now manifold = real_line(), and we choose Gaussian basis functions instead (in which case scale represents the standard deviation). As in the spatial case, other basis functions (such as bisquare) can also be used.
R> G_temporal <-local_basis(manifold = real_line(), loc = matrix(seq(2, + 28, by = 4)), scale = rep(3, 7), type = "Gaussian") The generated basis functions can be visualized using show_basis; see Figure 20. The spatiotemporal basis functions are then constructed using the function TensorP as follows: R> G <-TensorP(G_spatial, G_temporal) The object G can be subsequently used for constructing SRE models, as in the spatial case. Note that since we have nine spatial basis functions and seven temporal basis functions, we have 63 spatio-temporal basis functions in total. Care should be taken that the total number of basis functions does not become prohibitively large (say > 4000).

Global prediction of column-averaged carbon dioxide from OCO-2
The NASA OCO-2 satellite was launched on 2014-07-02, and it produces between 100,000 and 300,000 usable retrievals per day. Between the beginning of October 2014 and the end of February 2017, the satellite produced around 100 million retrievals. The specific data product we used was the OCO-2 Data Release 7R Lite File Version B7305Br (OCO-2 Science Team, Gunson, and Eldering 2015). Following pre-processing, we reduced the number of data entries in the product to around 50 million. Each retrieval produces a number of variables; in this example, we consider the most commonly used one, XCO 2 , the column-averaged mole fraction in ppm. Obtaining reliable global predictions of XCO 2 does not require consideration of all the data simultaneously. The atmosphere mixes quickly within hemispheres, and temporal correlation-length scales are on the order of days. Hence, we consider the OCO-2 data in a moving-window of 16 days, and for each 16 days we fit a spatio-temporal SRE model in order to obtain a global prediction of XCO 2 in the middle (i.e., the 8th day) of the window. We use this moving-window approach to obtain daily XCO 2 global prediction and prediction errors, between 2014-10-01 and 2017-03-01.
We first put the OCO-2 data into an 'STIDF' object, before using the function auto_BAUs to construct spatio-temporal voxels. The following code constructs gridded BAUs and instructs FRK to use 1 day as the smallest temporal unit and to limit the latitude grid to the minimum and maximum latitude of the OCO-2 data locations, rounded to the nearest degree.
Following pre-processing, we had about one million usable soundings per 16-day window. However, several of these are in quick succession and thus also in close proximity to each other, so that they fall within the same spatio-temporal BAU. We therefore keep the flag average_in_BAU set to TRUE when calling SRE as in Section 4.2. Following averaging, the number of observations per 16-day window reduces by a factor of about 100. We did not need to estimate the measurement-error variance σ 2 in this case, since measurement-error variances are provided with the OCO-2 data. However, we forced all measurement-error standard deviations that were below 2 ppm to be equal to 2 ppm, since the reported values are likely to be optimistic. The total time needed to fit and predict with FRK in a single 16-day window was about 1 hour on a standard desktop computer.
In Figure 21 we show the measurement locations and values for the 16 days, and we show the central day of the 16-day window centered on 2016-09-08. Predictions and prediction standard errors for the central day are shown in Figure 22. Note how the error map reflects the data coverage over the entire 16-day window and not just the day at the center of the window. Prediction standard error maps on other days clearly show when the satellite is only taking readings over the ocean and when it is not taking any readings due to instrument reset or satellite manoeuvers. An animation showing these and other interesting features of predicted column-averaged CO 2 (i.e., XCO 2 ) between 2014-10-01 and 2017-03-01 is available at https://www.youtube.com/watch?v=wEws67WXvkY.

Future work
There are a number of useful features that could be implemented in future versions of FRK, some of which are listed below: • Currently, FRK is designed to work with local basis functions having an analytic form. However, the package could also accommodate basis functions that have no known functional form, such as empirical orthogonal functions (EOFs) and classes of wavelets defined iteratively; future work will attempt to incorporate the use of such basis functions. Vanilla FRK (FRK-V), where the entire positive-definite matrix K is estimated, is particularly suited to the former (EOF) case where one has very few basis functions that explain a considerable amount of observed variability.
• There is currently no component of the model that caters for sub-BAU process variation, and each datum that is point-referenced is mapped onto a BAU. Going below the BAU scale is possible, and intra-BAU correlation can be incorporated if the covariance function of the process at the sub-BAU scale is known (Wikle and Berliner 2005).
• Most work and testing in FRK has been done on the real line, the 2D plane, and the surface of the sphere (S 2 ). Other manifolds can be implemented since the SRE model always yields a valid spatial covariance function, no matter the manifold. Some, such as the 3D hyperplane, are not too difficult to construct. Ultimately, it would be ideal if the package would allow the user to specify his/her own manifold, along with a function that can compute the appropriate distances on the manifold.
• Currently, only the EM algorithm is implemented and hence the argument method = "EM" is implicit in the function SRE.fit. The EM algorithm has been seen to be slow to converge to a local maximum in other contexts (e.g., McLachlan and Krishnan 2007).
Other methods for finding maximum, or restricted maximum, likelihood estimates for the SRE model (e.g., Tzeng and Huang 2018) will be considered for future versions of FRK. • Although designed for very large data, FRK begins to slow down when several hundreds of thousands of data points are used. The flag average_in_BAU can be used to summarize the data and hence reduce the size of the dataset, however it is not always obvious how the data should be summarized (and whether one should summarize them in the first place). Future work will aim to provide the user with different options for summarizing the data.
• Currently, in FRK, all BAUs are assumed to be of equal area even if they are not. Unequal BAU area (see, for example, the lon-lat grid shown in Figure 13) is important when aggregating the process or the predictions. In FRK there is the option to use equal-area icosahedral grids on the surface of the sphere, and regular grids on the real line and the plane for when areal data or large prediction regions are used. (Note that in Section 6.2 an equal-area grid was not used, but also note that we did not spatially aggregate our results and that our data were point-referenced). In conclusion, the package FRK is designed to provide core functionality for spatial and spatio-temporal prediction with large datasets. The low-rank model used by the package has validity (accurate coverage) in a big-data scenario when compared to high-rank models that do not explicitly cater for fine-scale variation. However, it is likely to be less statistically efficient (larger root mean squared prediction errors) than other methods when data density is high and the basis functions are unable to capture the spatial variability.
FRK is available on the Comprehensive R Archive Network (CRAN) at https://CRAN. R-project.org/package=FRK. Its development page is https://github.com/andrewzm/ FRK. Users are encouraged to report any bugs or issues relating to the package on this web page.