A generalized, compactly supported correlation function for data assimilation applications

This work introduces a new, compactly supported correlation function that can be inhomogeneous over Euclidean three-space, anisotropic when restricted to the sphere, and compactly supported on regions other than spheres of fixed radius. This function, which we call the Generalized Gaspari–Cohn (GenGC) correlation function, generalizes the compactly supported, piecewise rational approximation to a Gaussian introduced by Gaspari and Cohn in 1999 and its subsequent extension by Gaspari et al. in 2006. GenGC is a parametric correlation function that allows two parameters, a and c, to vary as functions over space, whereas the earlier formulations either keep both a and c fixed or allow only a to vary. Like these earlier formulations, GenGC is a sixth-order piecewise rational function (fifth-order near the origin), but its coefficients now depend explicitly on the values of both a and c at each pair of points being correlated. We show that, by allowing both a and c to vary, the correlation length of GenGC also varies over space, introducing inhomogeneous and anisotropic features into the resulting correlations.


INTRODUCTION
Parametric correlation functions play an essential role in data assimilation, where they are used to model covariances given a set of tunable parameters or applied as tapering functions to localize sample covariances in ensemble-based schemes (Daley, 1991; Houtekamer and Mitchell, 1998; Gaspari and Cohn, 1999; Gneiting, 1999; Hamill et al., 2001; Houtekamer and Mitchell, 2001; Gaspari et al., 2006). Compactly supported correlation functions are particularly useful for modeling covariances in which correlations can be assumed identically zero beyond some distance while maintaining positive semidefiniteness. In addition, correlation functions that do not have compact support can gain compact support through the Hadamard (elementwise) product with a compactly supported correlation function, which preserves positive semidefiniteness by the Schur product theorem (Horn and Johnson, 1985, p. 458). Compactly supported correlation functions are particularly useful in high-dimensional contexts, such as data assimilation and spatial statistics, since the resulting covariances are sparse and hence computationally tractable. One of the most commonly used compactly supported correlation functions in data assimilation is the piecewise rational, compactly supported approximation to a Gaussian of Gaspari and Cohn (1999, their equation 4.10), hereafter referred to as the GC99 function. This parametric correlation function depends on only two parameters: the tunable cut-off length parameter c and the parameter a that, when taken to be one-half, produces correlations with Gaussian-like behavior. For covariance tapering applications such as localization, the GC99 function has been used frequently since it was first applied by Houtekamer and Mitchell (2001), because of its Gaussian-like shape and compact support controlled through the cut-off length parameter c.
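The tapering mechanism described above can be sketched numerically. In this minimal sketch, a triangular (Bartlett) taper, a simple compactly supported correlation in one dimension, stands in for GC99; the Schur product theorem then guarantees that the Hadamard product with a dense Gaussian covariance remains positive semidefinite.

```python
import numpy as np

# Sketch: taper a dense Gaussian covariance with a compactly supported
# correlation via the Hadamard product. The triangular (Bartlett) taper is a
# stand-in for GC99 here; it is a valid 1-D correlation, so the Schur product
# theorem guarantees the tapered matrix remains positive semidefinite.
n = 50
x = np.linspace(0.0, 1.0, n)
d = np.abs(x[:, None] - x[None, :])          # pairwise distances

B = np.exp(-(d / 0.2) ** 2)                  # dense Gaussian covariance
c = 0.3                                      # taper cut-off length
T = np.maximum(1.0 - d / c, 0.0)             # compactly supported taper
B_tapered = B * T                            # Hadamard (elementwise) product

assert np.all(np.linalg.eigvalsh(B_tapered) > -1e-10)   # still PSD
print(np.count_nonzero(B_tapered) / B_tapered.size)     # fraction of nonzeros
```

The final line reports the fraction of nonzero entries, illustrating the sparsity gained by tapering; correlations beyond the cut-off length are identically zero.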
Though the GC99 function has several attractive features, it is limited by its homogeneity (invariance under translations: GC99, p. 729) and isotropy (invariance under rotations: GC99, p. 729), unless constructed using coordinate stretching (e.g. GC99, pp. 725, 732; Ménard et al., 2016, pp. 882–883). The fixed cut-off length parameter c of the GC99 function typically requires special tuning per application, and it has been shown that the optimal cut-off length, sometimes termed the cut-off radius or localization radius, depends on the observation type, spatial location, dynamics, and whether localization is applied in model space or observation space (Evensen, 2009, Ch. 15 and references therein).
There is thus a need for compactly supported correlation functions with more flexible features that can adjust to the situation. Several compactly supported, parametric correlation functions have been introduced that vary in their flexibility as correlation functions (e.g. Wendland, 1995; Wu, 1995; Gaspari and Cohn, 1999; Gneiting, 1999; Buhmann, 2000; Gneiting, 2002; Gaspari et al., 2006; Porcu et al., 2013; Kleiber and Porcu, 2015; Liang and Marcotte, 2016; Stanley et al., 2021). In particular, the follow-on work to GC99, namely Gaspari et al. (2006) (hereafter referred to as G06), presents a generalization of the GC99 function in which the parameter a can vary while c is kept fixed (see their equation 33). This generalization introduces inhomogeneity and anisotropy through variable a. However, like the GC99 function, the G06 function is limited by its fixed cut-off length c.
In this work, we generalize the compactly supported, piecewise rational correlation functions introduced in GC99 and G06 further by allowing both the parameters a and c to vary, as functions, over the spatial domain. This correlation function, which we refer to as the Generalized Gaspari-Cohn (GenGC) correlation function, is a compactly supported parametric correlation function where inhomogeneity and anisotropy are introduced as both a and c vary over space. By now allowing the cut-off length c to vary over the domain, the GenGC correlation function gains much more flexibility in its construction, and in particular is no longer restricted to support regions of fixed spheres. The parameters a and c can be estimated, tuned manually, or derived from dynamics. The added flexibility of GenGC while maintaining positive semidefiniteness and compact support can be useful for covariance modeling and tapering in data assimilation and spatial statistics applications.
We can see the impact of allowing both parameters a and c to vary over a spatial index k by considering the correlation length L_k of GenGC, given in Equation (1). The correlation length L_k of GenGC varies spatially as c_k and a_k vary over space, and its linear dependence on the variable c_k allows direct control of L_k. In contrast, the correlation length of the GC99 function, where a = 1∕2 and c is fixed, remains constant at L = c√0.3 ≈ 0.5477c. For the G06 function, the correlation length L_k does vary over space as a_k varies, but its dependence on a_k is such that L_k∕c_k in Equation (1) is tightly bounded, 0.2297 ≤ L_k∕c_k ≤ 0.5485, as discussed in Section 3.2. Therefore, varying c_k yields more control and influence over the correlation length, introducing flexibility that the GC99 and G06 functions lack.
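The GC99 correlation length can be checked numerically. The sketch below evaluates the GC99 function (their equation 4.10, written in the convention where the function vanishes for separations beyond 2c) and estimates L = [−C″(0)]^(−1∕2) by a centered finite difference, recovering L ≈ c√0.3.

```python
import numpy as np

def gc99(z, c):
    """GC99 piecewise rational correlation (their equation 4.10), in the
    convention where the function vanishes for distances beyond 2c."""
    x = np.abs(z) / c
    if x <= 1.0:
        return -x**5/4 + x**4/2 + 5*x**3/8 - 5*x**2/3 + 1
    if x <= 2.0:
        return x**5/12 - x**4/2 + 5*x**3/8 + 5*x**2/3 - 5*x + 4 - 2/(3*x)
    return 0.0

# Correlation length L = [-C''(0)]^(-1/2), estimated by a centered finite
# difference; GC99 is an even function of z, so C(-h) = C(h).
c, h = 1.0, 1e-4
second_deriv = 2.0 * (gc99(h, c) - gc99(0.0, c)) / h**2
L = 1.0 / np.sqrt(-second_deriv)
print(L, c * np.sqrt(0.3))   # both approximately 0.5477
```

The finite-difference estimate agrees with c√0.3 to about four decimal places, consistent with the constant correlation length stated above.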
This flexibility gained by allowing the correlation length L_k to vary through both a_k and c_k can be particularly useful when reconstructing correlations from correlation lengths L_k that are evolved dynamically. For example, Cohn (1993, pp. 3136–3144) derives explicit partial differential equations (PDEs) for the correlation length field associated with state dynamics governed by the continuity equation and related PDEs. By evolving the correlation length field according to its governing PDE and with additional information regarding c_k or a_k, GenGC and Equation (1) can be used to reconstruct full correlation functions, as we discuss in Section 3.2. Alternatively, correlation lengths L_k can be obtained by estimating the local correlation tensor from an ensemble of states (Michel et al., 2016, and references therein) and used in a similar manner to construct correlations with GenGC.
The layout of this article is as follows. In Section 2, we derive the GenGC correlation function, beginning with definitions and theory in Section 2.1, followed by its construction in Section 2.2 using convolutions and a brief discussion of its evaluation. Section 3 discusses important consequences of allowing both a and c to vary over space. Since the correlations modeled and tapered during data assimilation are generally continuous, it is natural to consider the conditions on a and c such that GenGC is continuous, which is done in Section 3.1. This is followed by Section 3.2, which defines the correlation length L k of GenGC and discusses methods for implementing GenGC given L k . In Section 4, we present two sets of examples, with one-dimensional examples in Section 4.1 and two-dimensional examples in Section 4.2, to highlight the flexibility gained by varying a and c. Concluding remarks are given in Section 5, followed by Appendices A and B, which contain a proof and an example to supplement Section 2, and the formulas for GenGC, respectively.

DERIVATION OF THE GENGC CORRELATION FUNCTION
The GC99 function (equation 4.10 of GC99) is constructed through self-convolution of a compactly supported, radially symmetric function over Euclidean three-space, R 3 . The G06 function (equation 33 of G06) and the GenGC correlation function in this work generalize this approach as follows. Rather than constructing a correlation function using one compactly supported, radially symmetric function h, we build a correlation function by convolving functions from a whole collection of m compactly supported, radially symmetric functions h k , k = 1, 2, ..., m on R 3 . Our particular choice of functions h k is given in Equation (18), though this approach can be generalized to other collections of radially symmetric functions, which we discuss at the beginning of Section 2.2.
The derivation of the GenGC correlation function is presented in two parts. In Section 2.1, we reprise definitions and theory from G06 that establish the theoretical framework in which we can build scalar covariance functions with inhomogeneous and anisotropic features. Using this framework, in Section 2.2 we construct scalar covariance functions by convolving functions from a collection of radially symmetric functions h k , k = 1, 2, ..., m. With the general approach established at the beginning of Section 2.2, we then specify the functions h k and construct the GenGC correlation function, followed by a brief discussion on its evaluation.

Definitions and theoretical framework
We begin by revisiting some of the theory established in G06, where particular definitions and discussions from G06 serve as the starting point for this work. We consider covariance functions defined on a domain Ω ⊆ R^N, which we take to be an open, connected set, with smooth boundary ∂Ω in the case in which Ω is not all of R^N, where the general theory applies. In later discussions, we will restrict ourselves to R^3 or to the surface of the sphere of radius r > 0 embedded in R^3, which we denote as S^2_r, as needed by applications.
We start by defining covariance functions of vector random fields, which will serve as the foundation for building inhomogeneous and anisotropic covariance functions.

Definition 1. (Def. 2.1 of G06)
The multilevel covariance function of a vector random field X(r) = (X_1(r), X_2(r), ..., X_m(r))^T, r ∈ Ω, is the m × m matrix function

B(r, s) = {B_kℓ(r, s)}, k, ℓ = 1, 2, ..., m, (3)

with entries

B_kℓ(r, s) = ⟨X_k(r) X_ℓ(s)⟩, (4)

where X_k(r), k = 1, 2, ..., m, are real-valued, zero-mean random fields on Ω, and where ⟨⋅⟩ denotes mathematical expectation.
The covariance defined in Equation (3) is a matrix function, where we use boldface, for example, B, to denote matrices. We use the term "multilevel", as done in G06, to give an intuitive sense of how Equation (3) will be used to generalize single-level, scalar covariance functions. As discussed in G06 (p. 1816), this terminology does not capture the full generality of this and the following definition, but it is useful for our purposes.
The matrix structure of covariance functions defined in Equation (3) becomes more explicit in the following definition, which is equivalent to Definition 1 in the case of continuous covariance functions.
Definition 2. An m × m matrix function

B(r, s) = {B_kℓ(r, s)}, k, ℓ = 1, 2, ..., m, r, s ∈ Ω, (5)

is a multilevel covariance function on Ω if, for each positive integer n and for each choice of points r_1, ..., r_n ∈ Ω, the mn × mn matrix

B = {B_kℓ(r_i, r_j)}, i, j = 1, 2, ..., n, k, ℓ = 1, 2, ..., m, (6)

is positive semidefinite, that is, for arbitrary scalars c_ik with i = 1, 2, ..., n and k = 1, 2, ..., m,

∑_{i,j=1}^{n} ∑_{k,ℓ=1}^{m} c_ik c_jℓ B_kℓ(r_i, r_j) ≥ 0. (7)

From any given matrix covariance function B(r, s) = {B_kℓ(r, s)}, we now want to extract a scalar (i.e., single-level) covariance function to evaluate in practice. To build such a scalar covariance function, we choose the indices k, ℓ = 1, 2, ..., m to index m subdomains or subregions Ω_k, Ω_ℓ ⊂ Ω, which are open, connected sets with smooth boundaries ∂Ω_k, ∂Ω_ℓ. These subregions partition the domain Ω into m nonoverlapping, nonempty subdomains, that is,

Ω̄ = ⋃_{k=1}^{m} Ω̄_k, Ω_k ∩ Ω_ℓ = ∅ for k ≠ ℓ, (8)

where the overbar denotes the closure, e.g. Ω̄_k = Ω_k ∪ ∂Ω_k. Now, for every r ∈ Ω_k and s ∈ Ω_ℓ and for each fixed k, ℓ = 1, 2, ..., m, we define the following scalar function B:

B(r, s) = B_kℓ(r, s), r ∈ Ω_k, s ∈ Ω_ℓ. (9)

Equation (9) is a piecewise-defined, scalar (single-level) covariance function on Ω, where the nonboldface B denotes a scalar function. The proof that Equation (9) is a covariance function is given in Appendix A. From the matrix covariance function in Equation (5), we have extracted particular entries that collapse the multilevel covariance function into the single-level covariance function^1 defined in Equation (9). For r, s ∈ Ω_k, B(r, s) = B_kk(r, s) is an autocovariance, while, if r ∈ Ω_k, s ∈ Ω_ℓ, and k ≠ ℓ, then B(r, s) = B_kℓ(r, s) is a cross-covariance. The piecewise nature of Equation (9) is inherited from the partitioning of Ω: different functions B_kℓ(r, s) are evaluated depending on the subregions r and s belong to, piecing together a scalar covariance function that is defined over the full domain Ω. In Appendix A, we illustrate how to construct Equation (9) in a simple example, which makes the piecewise nature of Equation (9) more explicit.
The advantage of scalar covariance functions defined via Equation (9) is that the subregions Ω k can have different properties, such as length-scale or cut-off length. Therefore, as these properties vary over the different subregions Ω k of the full domain Ω, inhomogeneous and anisotropic features are introduced to the resulting scalar covariance function. The choice of subregions Ω k can be quite general and can adapt to the needs of different applications. For example, Ω 1 , Ω 2 , ..., Ω m can represent different vertical levels of a global circulation model, which motivated the terminology "multilevel" in G06. Each vertical level can just as well be partitioned again into individual grid cells, so that we represent the subregions Ω k with multi-index k = (k lat , k lon , k vert ), for instance.

Construction
The construction of scalar covariance functions from matrix covariance functions described in the previous section provides the framework in which we can introduce inhomogeneous and anisotropic features by varying quantities such as length-scale or cut-off length over the different subregions Ω_1, Ω_2, ..., Ω_m. There are several different ways to use Equation (9) to build inhomogeneous and anisotropic covariance functions, as illustrated in examples 2.3, 2.5, and 2.6 of G06, for instance. The focus of the present work is on the construction of scalar covariance functions from a collection of radially symmetric, compactly supported functions. We will choose the functions B_kℓ(r, s) in Equation (9) to be convolutions of such functions, applying theory established in GC99 to the framework introduced here in Section 2.1. We begin by describing the construction of scalar covariance functions from convolutions first on R^N for general N, then restricted to N = 3. This general approach will lay the groundwork for the construction of GenGC on R^3 and on the surface of the sphere S^2_r, which we describe at the end of this section.

^1 A matrix covariance function, Equation (3) for example, can be interpreted as "multilevel" in the sense that each k corresponds to a "level" on which a single-level covariance function is defined over the full domain Ω, with m different levels as k = 1, 2, ..., m. Equation (9) extracts a portion from each of these k levels to collapse the multilevel covariance function into a single-level, piecewise-defined, scalar covariance function on Ω.

Suppose we have a collection of functions h_k : R^N → R, k = 1, 2, ..., m, with each h_k ∈ L^1(R^N) ∩ L^2(R^N). Assume each h_k is radially symmetric on R^N, meaning that there exists an even function h^0_k : R → R such that, for any r ∈ R^N,

h_k(r) = h^0_k(||r||), (10)

where each function h^0_k depends only on the distance || ⋅ ||, which here denotes the Euclidean norm on R^N. Other norms can be used in Equation (10); however, the Euclidean norm is the natural choice in this context. Now, for each fixed k, ℓ = 1, 2, ..., m and all r, s ∈ R^N, define B_kℓ(r, s) to be the following convolution over R^N of functions from this collection:

B_kℓ(r, s) = (h_k * h_ℓ)(r − s) = ∫_{R^N} h_k(r − x) h_ℓ(s − x) dx. (11)

As shown in G06 (see remarks on p. 1820), the resulting matrix of convolutions B(r, s) = {B_kℓ(r, s)} = {(h_k * h_ℓ)(r − s)} for k, ℓ = 1, 2, ..., m defines a multilevel covariance function. To extract the scalar covariance function from this matrix of convolutions, we partition the given domain Ω ⊆ R^N into subregions Ω_k, k = 1, 2, ..., m, where these indices k correspond to the indices of the radially symmetric functions h_k. Equation (9) then defines the scalar covariance function B(r, s) extracted from the matrix covariance function B(r, s) = {(h_k * h_ℓ)(r − s)} of Equation (11). That is, the scalar covariance function B(r, s) is obtained by evaluating Equation (11) only for r ∈ Ω_k and s ∈ Ω_ℓ, for each fixed k, ℓ = 1, 2, ..., m.
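To make the construction concrete, the following sketch carries it out numerically in one dimension (N = 1), with hypothetical triangular bump functions h_k whose cut-off lengths vary over the grid. Evaluated at grid points, the matrix B is a Gram matrix of the h_k and is therefore positive semidefinite by construction.

```python
import numpy as np

# Minimal 1-D (N = 1) numerical sketch of the construction above: each grid
# point r_k gets a compactly supported, symmetric "hat" function centered at
# r_k with its own cut-off length, and B[k, l] = integral of h_k(x) h_l(x) dx
# (the convolution (h_k * h_l)(r_k - r_l) for even h_k, h_l). The result is a
# Gram matrix, hence symmetric positive semidefinite.
m = 40
r = np.linspace(0.0, 1.0, m)                  # one point per subregion
c = 0.05 + 0.15 * r                           # cut-off length varies in space

xq = np.linspace(-1.0, 2.0, 3001)             # quadrature grid
dx = xq[1] - xq[0]

def hat(x, center, cutoff):
    """Triangular bump supported on [center - cutoff, center + cutoff]."""
    return np.maximum(1.0 - np.abs(x - center) / cutoff, 0.0)

H = np.array([hat(xq, r[k], c[k]) for k in range(m)])   # m x n_quad
B = H @ H.T * dx                              # Gram matrix of the h_k

assert np.allclose(B, B.T)
assert np.linalg.eigvalsh(B).min() > -1e-10   # positive semidefinite
```

The hat functions and the quadrature grid here are illustrative assumptions; the paper's actual construction uses the specific piecewise-linear functions of Equation (18) and evaluates the three-dimensional convolution in closed form via Theorem 1.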
For high-dimensional applications, we are interested in constructing covariance functions in which long-range correlations are identically zero. We can accomplish this by requiring the radially symmetric functions h k to have compact support, that is, each of these functions will go to zero at some fixed distance. For the rest of the section, we will require the functions h k to be supported on spheres of fixed radius c k > 0, where c k is referred to as the cut-off length.
With our scalar covariance function B(r, s) defined through Equation (9) by the convolutions in Equation (11), all that remains is to evaluate the convolution integrals. For evaluation, we restrict ourselves to R^3, which is sufficient for most applications and makes explicit computation of Equation (11) feasible. To simplify the calculation of Equation (11), observe that, for each pair k, ℓ, the function B_kℓ(r, s) = (h_k * h_ℓ)(r − s) is radially symmetric, a property inherited from the radial symmetry of h_k and h_ℓ (see theorem 3.a.1 in GC99). Therefore, for fixed k, ℓ and for r ∈ Ω_k and s ∈ Ω_ℓ, we can express B_kℓ(r, s) = B^0_kℓ(||r − s||) for some even function B^0_kℓ defined on the real axis, which simplifies the evaluation of Equation (11) significantly and is summarized in the following theorem from GC99.

Theorem 1. (Theorem 3.c.1 of GC99) Suppose that h_k ∈ L^1(R^3) ∩ L^2(R^3) is radially symmetric and supported on a sphere of radius c_k, 0 < c_k ≤ ∞, for k = 1, 2, ..., m. Let h^0_k and B^0_kℓ denote the even functions on R given by

h^0_k(||r||) = h_k(r), (12)
B^0_kℓ(||r − s||) = (h_k * h_ℓ)(r − s), (13)

for r, s ∈ R^3 and k, ℓ = 1, 2, ..., m, where || ⋅ || denotes Euclidean distance. If c_k ≤ c_ℓ, then B^0_kℓ(z) is given by the one-dimensional integral formulas of Equations (14) and (15), which reduce the three-dimensional convolution to iterated integrals of h^0_k and h^0_ℓ over [0, c_k] and [0, c_ℓ].

Therefore, for r ∈ Ω_k, s ∈ Ω_ℓ, and c_k ≤ c_ℓ, we evaluate the scalar covariance function B_kℓ(r, s) = B^0_kℓ(||r − s||) by evaluating Equations (14) and (15) with z = ||r − s||.
The correlation function C_kℓ(r, s) can be computed by normalizing the covariance B_kℓ(r, s) for r ∈ Ω_k and s ∈ Ω_ℓ as follows:

C_kℓ(r, s) = B_kℓ(r, s) ∕ [B_kk(r, r) B_ℓℓ(s, s)]^{1∕2}, (16)

where B_kk(r, r) and B_ℓℓ(s, s) are the variances at r ∈ Ω_k and s ∈ Ω_ℓ, respectively. Using Theorem 1, we can therefore evaluate the correlation function explicitly for each k, ℓ and r ∈ Ω_k, s ∈ Ω_ℓ as follows:

C_kℓ(r, s) = B^0_kℓ(||r − s||) ∕ [B^0_kk(0) B^0_ℓℓ(0)]^{1∕2}. (17)

Up to this point, we have described the construction of scalar covariance functions from convolutions of arbitrary compactly supported, radially symmetric functions h_k. To construct the GenGC correlation function, we specify the functions h_k as follows. Given scalars a_k ∈ R and c_k > 0, for k = 1, 2, ..., m, define h_k : R^3 → R as

h_k(r; a_k, c_k) = h^0_k(||r||; a_k, c_k), (18)

where

h^0_k(x; a_k, c_k) = 1 − 2(1 − a_k) x∕c_k for 0 ≤ x ≤ c_k∕2, = 2 a_k (1 − x∕c_k) for c_k∕2 ≤ x ≤ c_k, = 0 for x > c_k. (19)

The functions h_k(r; a_k, c_k) are piecewise linear, compactly supported, radially symmetric functions that are supported on a sphere of radius c_k. Each h_k(r; a_k, c_k) is parametrized by the scalar a_k and the cut-off length c_k. Equation (18) generalizes the piecewise linear functions defined in section 4c of GC99 and equation 32 of G06 by letting both parameters a_k and c_k vary over the index k. Examples of Equation (18) are given in Figure 1 for different values of a_k. The parameter a_k determines the value of h_k(r; a_k, c_k) at ||r|| = c_k∕2, subsequently determining the slopes of each line in its piecewise definition.
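A short numerical sketch of the piecewise-linear profile just described, under the normalization h^0_k(0) = 1 (an assumed convention; the correlation in Equation (17) is insensitive to any overall scaling of h_k, which cancels in the normalization). The profile equals a_k at c_k∕2 and vanishes at the cut-off, and a_k = 1∕2 recovers the triangle function whose self-convolution underlies GC99.

```python
import numpy as np

def h0(x, a, c):
    """Piecewise-linear, compactly supported radial profile: 1 at the origin,
    a at c/2, 0 at the cut-off c. A sketch under the normalization h0(0) = 1;
    the correlation in Equation (17) is invariant to this overall scaling."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= c / 2, 1.0 - 2.0 * (1.0 - a) * x / c,
           np.where(x <= c, 2.0 * a * (1.0 - x / c), 0.0))

a_k, c_k = 0.3, 1.0
print(h0([0.0, c_k / 2, c_k], a_k, c_k))       # value a_k at c_k/2, 0 at c_k
# a_k = 1/2 recovers the triangle (tent) function used to build GC99:
assert np.allclose(h0([0.0, 0.25, 0.5, 1.0], 0.5, 1.0), [1.0, 0.75, 0.5, 0.0])
```

Note that the two linear pieces agree at x = c∕2 (both give a), so the profile is continuous for any a, including negative values of a_k, which produce a negative lobe before the cut-off.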
To construct the GenGC correlation function, we partition the domain Ω ⊆ R^3 into subregions Ω_k, k = 1, 2, ..., m, where each subregion Ω_k has an associated cut-off length c_k and parameter a_k used to define h_k. The functions B_kℓ(r, s) for r ∈ Ω_k, s ∈ Ω_ℓ are defined using Equation (11) on R^3 with h_k, h_ℓ given in Equation (18). For fixed k, ℓ, we can use Theorem 1 to evaluate B_kℓ(r, s). Equation (17) then defines the GenGC correlation function. Like the GC99 and G06 functions, GenGC is sixth-order piecewise rational (fifth-order near the origin)^2 and three-times continuously differentiable in space for fixed a_k, a_ℓ, c_k, c_ℓ. In the case in which c_k = c is fixed for k = 1, 2, ..., m while a_k and a_ℓ vary, GenGC is the G06 function; similarly, if also a_k = 1∕2 for k = 1, 2, ..., m, GenGC is exactly the GC99 function. For general choices of cut-off lengths c_k and c_ℓ, GenGC is not supported on a sphere of uniform radius. Rather, its support depends on the values of c_k and c_ℓ. This, in combination with the parameters a_k and a_ℓ, allows for correlation functions that can be highly inhomogeneous and anisotropic.

^2 The GenGC, GC99, and G06 correlation functions are sixth-order piecewise rational functions in that the highest-degree terms are of the form z^6∕z (near the origin, the highest-degree terms are z^5, since the 1∕z terms vanish, and are therefore fifth-order). We therefore correct the misstated order of the GC99 function in section 4c of GC99 and of the G06 function in G06, p. 1821.
GenGC is a piecewise-defined scalar correlation function that is first split into six cases depending on the relationship between c_k and c_ℓ, and subsequently split again into 12 different functions that depend on how ||r − s||, for r ∈ Ω_k and s ∈ Ω_ℓ, relates to the particular c_k and c_ℓ. Appendix B describes the construction of GenGC in further detail and gives the explicit formulae. We verified numerically that this function is symmetric positive definite by generating a variety of different matrices and checking for symmetry and positive eigenvalues. Note that these formulae reduce to those of the G06 function (their equation 33 and appendix C) in the case in which c_k = c for k = 1, 2, ..., m.
Though the explicit formulae for the GenGC correlation function are lengthy (see Appendix B), its evaluation can be implemented efficiently in practice, for instance as an operator acting on a vector (Gilpin, 2023). Evaluation of the GenGC correlation function can be formulated as a two-step process: first, two sets of conditional statements determine the correct piecewise function to evaluate based on c_k, c_ℓ, and ||r − s||; then, a small number of scalar operations are performed to evaluate the chosen function. These two steps are not computationally expensive, can be optimized for each programming language, and simply extend existing implementations of the GC99 and G06 functions.
Remark. As the GenGC correlation function is constructed via convolution over the Euclidean space R^3, for r, s ∈ R^3 the norm ||r − s|| is taken to be the Euclidean norm. Restricting r, s ∈ R^3 to S^2_r for r > 0 embedded in R^3 requires restricting the Euclidean norm to S^2_r, which is the chordal distance,

||r − s|| = 2r sin(θ∕2), (20)

where θ is the angle between the vectors r and s (GC99, p. 737). This implies that the cut-off length c_k should also be specified in terms of chordal distance.
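The remark above can be verified directly: for two points on a sphere of radius r separated by angle θ, the Euclidean distance between them equals the chordal distance 2r sin(θ∕2). A quick numerical check, with an illustrative radius:

```python
import numpy as np

# Chordal distance on the sphere S^2_r: restricting the Euclidean norm of R^3
# to points on the sphere gives ||r1 - r2|| = 2 r sin(theta/2), where theta is
# the angle between the two position vectors.
radius = 6371.0                      # illustrative (e.g. Earth's radius in km)
theta = np.deg2rad(30.0)             # angle between the two points

p1 = radius * np.array([1.0, 0.0, 0.0])
p2 = radius * np.array([np.cos(theta), np.sin(theta), 0.0])

euclidean = np.linalg.norm(p1 - p2)
chordal = 2.0 * radius * np.sin(theta / 2.0)
assert np.isclose(euclidean, chordal)
```

This follows from ||p1 − p2||^2 = 2r^2(1 − cos θ) = 4r^2 sin^2(θ∕2).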

CONTINUITY PROPERTIES INVOLVING A AND C
With the GenGC correlation function derived in the previous section, we now discuss two of its properties resulting from a_k and c_k varying over the domain. Since correlation modeling and tapering applications in data assimilation typically require continuity, we first study the influence of a_k and c_k on the continuity behavior of GenGC. We show that GenGC is not, in general, continuous at the boundaries ∂Ω_k of the subregions Ω_k. If, however, a_k and c_k are defined as discretizations of continuous functions a and c on Ω, then continuity of GenGC, in an appropriate sense, ensues. We conclude this section with a discussion of the correlation length of GenGC, first introduced in Equation (1), and the implementation of GenGC when the correlation length is available over the domain.

Continuity of a and c as functions on Ω
The GenGC correlation function is defined piecewise on the domain Ω. The piecewise nature is inherited from the partitioning of Ω into subregions Ω_1, Ω_2, ..., Ω_m, with each subregion assigned constants a_k and c_k used to define the functions h_k, k = 1, 2, ..., m, in Equation (11). For fixed k, ℓ, the scalar function B_kℓ(r, s) in Equation (11) used to define GenGC in Equation (17) is continuous for r ∈ Ω_k, s ∈ Ω_ℓ, since the functions h_k and h_ℓ are continuous in their arguments. However, continuity of B_kℓ(r, s) over the full domain Ω (i.e., as k, ℓ vary), and hence that of the GenGC correlation function, is not always guaranteed. Rather, the continuity of B_kℓ(r, s) over Ω depends on the behavior of B_kℓ(r, s) across the boundaries of the subregions Ω_1, Ω_2, ..., Ω_m.
To see this, fix k, ℓ with k ≠ ℓ and let r* ∈ ∂Ω_k and s* ∈ ∂Ω_ℓ with r* = s* a point common to the two boundaries. We define B_kk(r*, r*) and B_ℓℓ(s*, s*) in terms of limits,

B_kk(r*, r*) = lim_{r → r*, r ∈ Ω_k} B_kk(r, r) = (h_k * h_k)(0), (21)
B_ℓℓ(s*, s*) = lim_{s → s*, s ∈ Ω_ℓ} B_ℓℓ(s, s) = (h_ℓ * h_ℓ)(0), (22)

where the last equality in each case comes from the convolutions defined in Equation (11). We also define B_kℓ(r*, s*) in a similar manner,

B_kℓ(r*, s*) = lim_{r → r*, r ∈ Ω_k; s → s*, s ∈ Ω_ℓ} B_kℓ(r, s) = (h_k * h_ℓ)(0). (23)

The limits in Equations (21)-(23) exist due to the continuity of B_kℓ(r, s) for r ∈ Ω_k, s ∈ Ω_ℓ, and fixed k, ℓ.
For the covariance function to be well-defined over all of Ω, including at the boundaries between subregions, and in particular for the variance to exist at each common boundary point r* = s*, we must have B_kk(r*, r*) = B_ℓℓ(s*, s*) = B_kℓ(r*, s*). It is clear, however, from Equations (21) and (22) that B_kk(r*, r*) ≠ B_ℓℓ(s*, s*) in general, since h_k and h_ℓ under the integral signs are different functions if a_k ≠ a_ℓ or c_k ≠ c_ℓ. We also have

B_kℓ(r*, s*)^2 ≤ B_kk(r*, r*) B_ℓℓ(s*, s*) (24)

by the Cauchy–Schwarz inequality. Though the quantity B_kℓ(r*, s*) is defined at the common boundary point r* = s* when a_k ≠ a_ℓ or c_k ≠ c_ℓ, it is not the variance, and all we are guaranteed is the bound given in Equation (24).

We illustrate the consequences of specifying a_k and c_k arbitrarily over the different subregions in the top row of Figure 2. For this example, we consider the GenGC correlation function on the domain (0, 1), which is discretized into 201 equally spaced grid points (including both endpoints 0 and 1). We then partition the one-dimensional line segment (0, 1) into two subregions Ω_1 = (0, 1∕2) and Ω_2 = (1∕2, 1), where a_1 ≠ a_2 and c_1 ≠ c_2 (values shown in the top left panel). The GenGC correlation matrix produced for this case, shown in the top middle panel, is continuous within the four blocks (which correspond to fixed k, ℓ = 1, 2), but is discontinuous where the boundaries of the subregions meet at r* = s* = 1∕2. This is more explicit in the one-dimensional correlations shown in the top right panel, where we see that, away from the boundary at 1∕2, the correlations are continuous, but each correlation has a jump discontinuity at 1∕2.

FIGURE 2. One-dimensional examples of the GenGC correlation function on (0, 1) for different choices of the functions a and c (and hence partitions of (0, 1)). In both cases, the domain (0, 1) is discretized into 201 equally spaced grid points (including both endpoints 0 and 1). Top row: the domain is partitioned into two subregions, Ω_1 = (0, 1∕2) and Ω_2 = (1∕2, 1), where the values of a and c are constant over these two subregions and chosen to mimic the constant values of the continuous case shown in the bottom left panel. The middle panel illustrates the corresponding GenGC correlation matrix, with horizontal lines indicating the selected one-dimensional correlations shown on the right for rows 50 (gray), 100 (orange), and 170 (yellow). Bottom row: both a and c vary continuously over (0, 1) according to the functions specified in the subplot title. The corresponding GenGC correlation matrix is in the middle (see text for the description of its construction), with the right panel illustrating rows 50, 100, and 170 from the correlation matrix.
Though a_1, a_2, ..., a_m and c_1, c_2, ..., c_m can be arbitrarily specified on the different subregions Ω_1, Ω_2, ..., Ω_m, the covariance function, and hence the GenGC correlation function, will not necessarily be continuous over the full domain Ω. Since data assimilation applications typically require continuous covariance functions, we must impose conditions on a_k, a_ℓ and c_k, c_ℓ so that the covariance function is well-defined and continuous where the boundaries of the subregions Ω_k and Ω_ℓ meet. This requires defining a_k, a_ℓ and c_k, c_ℓ from continuous functions on Ω, which we describe in the following theorem.
Theorem 2. Let a and c be continuous functions on Ω, with c > 0, and for each k = 1, 2, ..., m define a_k and c_k as the averages of a and c over the subregion Ω_k,

a_k = (1∕|Ω_k|) ∫_{Ω_k} a(r) dr, c_k = (1∕|Ω_k|) ∫_{Ω_k} c(r) dr, (25)

where |Ω_k| denotes the volume of Ω_k. Then, in the limit as the subregions Ω_k and Ω_ℓ shrink to a common boundary point r* = s*, the variances agree, B_kk(r*, r*) = B_ℓℓ(s*, s*) = B_kℓ(r*, s*), and the resulting covariance function is continuous at r* = s*.

Proof. In the limit as Ω_k and Ω_ℓ shrink towards their boundary points r* and s*, respectively, it follows from Equation (25) that a_k → a(r*), c_k → c(r*), a_ℓ → a(s*), and c_ℓ → c(s*). Since the boundary points satisfy r* = s*, we have that a_k = a_ℓ and c_k = c_ℓ in the limiting case. Therefore, as Ω_k and Ω_ℓ shrink to their common boundary point r* = s*, it follows from Equations (21) and (22) that B_kk(r*, r*) = B_ℓℓ(s*, s*), that is, the variances are equal and well-defined. In addition, it follows from Equations (21) and (23) that B_kℓ(r*, s*) = B_kk(r*, r*) = B_ℓℓ(s*, s*). ▪

Therefore, the variances are continuous throughout Ω when a_k and c_k are chosen as appropriate discretizations of continuous functions, in the sense described in Theorem 2, and this implies continuity of the GenGC correlation function in the same sense. In practice, the integrals in Equation (25) can be approximated numerically, for example with the midpoint rule.
Theorem 2 provides the framework to construct continuous GenGC correlation functions on Ω. Suppose the domain Ω is discretized into m grid cells. Take each grid cell to be a subregion Ω_k for k = 1, 2, ..., m, then define a_k and c_k according to Equation (25) given continuous functions a and c on Ω. The bottom row of Figure 2 constructs GenGC on (0, 1) in this framework. The bottom left panel plots the functions a(r) and c(r) for r ∈ (0, 1), both constructed using a hyperbolic tangent function to allow for a smooth transition between two constant values on either side of 1∕2. The domain (0, 1) is discretized into 201 equally spaced grid points (including both endpoints 0 and 1), and the subregions Ω_k for k = 1, 2, ..., m = 200 are the grid intervals of width 1∕200 centered halfway between each grid point. Values of a_k and c_k in Equation (25) are evaluated using the midpoint rule with four subintervals within each subregion Ω_k. The resulting GenGC correlation matrix is shown in the bottom center panel, along with selected rows of the correlation matrix on the right. We can see that, when a and c are continuous functions and each subregion is taken to be a grid cell on the discrete grid, the GenGC correlation matrix does not have the jump discontinuities seen in the top row, and instead transitions smoothly across 1∕2, corresponding to the smooth behavior of a and c shown on the left. In fact, by choosing the hyperbolic tangent to construct a and c, we have created a continuous version of the discontinuous case shown in the top row of Figure 2, as becomes clear under mesh refinement (not shown).
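The discretization just described can be sketched as follows. The tanh profiles below are illustrative stand-ins for the functions used in Figure 2 (whose exact parameters are not reproduced here); Equation (25) is approximated by the midpoint rule with four subintervals per grid cell, as in the text.

```python
import numpy as np

# Sketch of Equation (25): define a_k and c_k as cell averages of continuous
# functions a(r), c(r) on (0, 1), approximated by the midpoint rule with four
# subintervals per grid cell. The tanh profiles are illustrative stand-ins.
def a_fun(r):
    return 0.35 + 0.15 * np.tanh(40.0 * (r - 0.5))   # smooth step in a

def c_fun(r):
    return 0.20 + 0.10 * np.tanh(40.0 * (r - 0.5))   # smooth step in c

m = 200
edges = np.linspace(0.0, 1.0, m + 1)                 # grid-cell boundaries

def cell_average(f, lo, hi, nsub=4):
    """Midpoint rule with nsub subintervals on the cell [lo, hi]."""
    mid = lo + (hi - lo) * (np.arange(nsub) + 0.5) / nsub
    return f(mid).mean()

a_k = np.array([cell_average(a_fun, edges[k], edges[k + 1]) for k in range(m)])
c_k = np.array([cell_average(c_fun, edges[k], edges[k + 1]) for k in range(m)])

# Away from the transition, the cell averages match the plateau values.
assert np.isclose(a_k[0], 0.20, atol=1e-3) and np.isclose(a_k[-1], 0.50, atol=1e-3)
```

Because a and c are continuous, neighboring cell averages differ only slightly, which is what removes the jump discontinuities seen in the top row of Figure 2.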
We emphasize here that the choice of a and c and the partitioning of the domain have a significant impact on not only the inhomogeneity and anisotropy of the GenGC correlation function but also its continuity over space, which is not readily available from convolution theory alone. For applications where continuity of the GenGC correlation function is required, choosing a and c to be continuous functions over the spatial domain will ensure that continuity holds. In cases where a and c are not continuous functions, such as the piecewise constants shown in the top row of Figure 2, the GenGC correlation function will not be continuous across the boundaries of the subregions. There are applications where jump discontinuities like those shown in Figure 2 may be suitable, for instance at the interface between different fluids, such as the atmosphere and ocean (e.g. Stanley et al., 2021). Care should be taken to ensure that the GenGC correlation function constructed satisfies the requirements of the particular application.

Correlation length
Evaluation of the GenGC correlation function requires specifying a_1, a_2, …, a_m and c_1, c_2, …, c_m over the different subregions of the domain. GenGC imposes no restrictions on how these quantities are specified, so in principle they can be chosen arbitrarily. If GenGC must be continuous over the full domain, the only restriction is that a_k and c_k for k = 1, 2, …, m be defined from continuous functions, as described in the previous section; these functions are otherwise arbitrary.
In some applications, the cut-off length values c_k for k = 1, 2, …, m may be readily specified from physical or computational considerations, while specifying each a_k may be less straightforward; the a_k might instead be computed from other known quantities, such as desired correlation length values L_k. For example, for state dynamics governed by the continuity equation and related hyperbolic PDEs, an explicit PDE can be derived for the correlation length field (see Cohn, 1993, pp. 3137-3138). Therefore, if the PDE for the correlation length field is solved and the cut-off length field is known or specified, correlations can be reconstructed using GenGC, provided that we can relate the correlation length L_k and cut-off length c_k to a_k. We derive an expression here that does so, and we discuss methods of reconstructing correlations with GenGC in this context.
For fixed k, ℓ and r ∈ Ω_k, s ∈ Ω_ℓ, the GenGC correlation function, Equation (17), can be expressed as C_kℓ(r, s) = C_{0,kℓ}(z) for an even function C_{0,kℓ} defined on the real axis, with z = ||r − s||. Therefore, we can define the correlation length for GenGC in terms of its second derivative with respect to z, evaluated for k = ℓ at z = 0 (Daley, 1991, p. 110; Michel et al., 2016, special case of their equation 2): L_k = [−C''_{0,kk}(0)]^{−1/2}. From the explicit formula for GenGC (see Appendix B), we derive an expression, Equation (27), for the correlation length L_k as a function of a_k and c_k. As the quantities a_k and c_k vary over the different subregions Ω_k for k = 1, 2, …, m, the correlation length L_k also varies. In particular, if a_k and c_k are defined from continuous functions on Ω, for example by Equation (25), then it follows from Equation (27) that the correlation length L_k is as well. Observe that, while L_k depends linearly on c_k for fixed a_k, L_k depends only weakly on a_k away from a_k = 0, but strongly near a_k = 0, as shown in the left panel of Figure 3. In applications where the cut-off lengths c_k and correlation lengths L_k are available, one would need to recover a_k in order to construct GenGC. From Equation (27), solving for a_k as a function of L_k and c_k yields a quadratic equation with the two solutions given in Equation (28), where we define ε_k = c_k/L_k, and a_k^+(ε_k) (respectively a_k^−(ε_k)) refers to the solution where the square root is added (subtracted).
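The definition L = (−C''_0(0))^{−1/2} can be checked numerically for any smooth correlation function. The sketch below uses a finite-difference second derivative, with a Gaussian as a known test case (its correlation length equals its length-scale); it does not implement the GenGC-specific Equation (27).

```python
import numpy as np

def correlation_length(C0, dz=1e-4):
    """Correlation length L = (-C0''(0))**(-1/2) of an even correlation
    function C0 with C0(0) = 1, using a central second difference."""
    d2 = (C0(dz) - 2.0 * C0(0.0) + C0(-dz)) / dz**2
    return 1.0 / np.sqrt(-d2)

# Known check: a Gaussian with length-scale L has correlation length L.
L = 0.37
gauss = lambda z: np.exp(-0.5 * (z / L) ** 2)
```

For GenGC one would substitute C_{0,kk} evaluated from the Appendix B formulae, recovering Equation (27) numerically.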
Since the values of a_k in Equation (28) are solutions to a quadratic equation, there are restrictions on c_k and L_k required for the solutions to remain real, as well as restrictions on the choice of a_k^+ and a_k^− that should be considered. The right panel of Figure 3 plots a_k^+ and a_k^− as functions of ε_k = c_k/L_k and illustrates these restrictions. For the solutions in Equation (28) to remain real and bounded, we require that the discriminant remains positive and the denominator is nonzero. It follows that two real roots a_k^+ and a_k^− do not exist for all values of ε_k. Rather, we obtain two sets of restrictions, Equations (29) and (30), on the values of a_k^+ and a_k^− in Equation (28) and the corresponding values of ε_k, which we denote as ε_k^+ and ε_k^− for a_k^+ and a_k^−, respectively. The root a_k^+ (gray curves in the right panel of Figure 3) is asymmetric about the vertical asymptote at ε_k^+ = √(160/33) ≅ 2.2019 (dotted line in the right panel of Figure 3). Therefore, values of a_k^+ that cross the vertical asymptote at ε_k^+ will produce a GenGC correlation function that is not continuous. The root a_k^− (orange curve) remains continuous over the range of ε_k^− in Equation (30), but the values of a_k^− that can be achieved are limited, bounded by the black horizontal lines in the right panel of Figure 3. As ε_k varies between the limits expressed in Equations (29) and (30), 1/ε_k = L_k/c_k is restricted to vary only between 1/4.3536 ≅ 0.2297 and 1/1.8233 ≅ 0.5485.

FIGURE 3 Left: the correlation length L_k of Equation (27) as a function of a_k. Right: the roots a_k^+ (gray) and a_k^− (orange) of Equation (28) as functions of ε_k = c_k/L_k.
Solving for a_k directly from Equation (28) given ε_k is possible, but has the limitations expressed by Equations (29) and (30) and illustrated in the right panel of Figure 3. For applications where a_k need not vary substantially, an alternative to solving for a_k directly is to fix ε_k = ε constant, which is equivalent to fixing a_k = a. Given a set of correlation lengths L_k, one could then solve for the cut-off lengths by evaluating c_k = εL_k, making sure to observe the restrictions presented in Equations (29) and (30) when choosing ε. For example, correlation lengths L_k may be obtained dynamically, as in Cohn (1993, Sec. 4b); fixing ε = 1.8233, the lower bound of ε^− and ε^+, we recover the cut-off lengths c_k while fixing a. This choice of ε allows for recovery of the smallest possible cut-off lengths c_k given the correlation lengths L_k, which may be advantageous for reducing computational expense. Alternatively, one could simply take a = 1/2 fixed, as in the GC99 correlation function, which, according to the right panel of Figure 3, is nearly the same as choosing ε at its minimum value of 1.8233.
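The fixed-ε strategy is simple to implement. The sketch below assumes only the admissible range of ε = c_k/L_k quoted above; the function name and sample correlation lengths are hypothetical.

```python
import numpy as np

# Admissible range of eps = c_k / L_k implied by Equations (29)-(30):
# L_k / c_k lies between 1/4.3536 and 1/1.8233.
EPS_MIN, EPS_MAX = 1.8233, 4.3536

def cutoffs_from_lengths(L, eps=EPS_MIN):
    """Recover cut-off lengths c_k = eps * L_k for a fixed ratio eps,
    checking eps against the admissible range. eps = EPS_MIN yields
    the smallest admissible cut-off lengths."""
    if not (EPS_MIN <= eps <= EPS_MAX):
        raise ValueError("eps outside the admissible range")
    return eps * np.asarray(L, dtype=float)

L_k = np.array([0.05, 0.10, 0.20])  # hypothetical correlation lengths
c_k = cutoffs_from_lengths(L_k)     # smallest admissible cut-offs
```

Choosing eps = EPS_MIN mirrors the choice discussed above: the smallest cut-off lengths, and hence the sparsest correlation operator, for the given L_k.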

EXAMPLES
The advantage of GenGC is that both a k and c k can vary over the subregions Ω k , producing correlations that can be markedly inhomogeneous and anisotropic over the full domain Ω. This section presents simple examples that highlight the anisotropy of GenGC on the sphere by allowing these quantities to vary, first through a series of one-dimensional examples, followed by a two-dimensional example on the surface of the Earth.

One-dimensional examples
We begin with one-dimensional examples on the discretized unit circle S¹ with m = 200 equally spaced grid points. The subregions Ω_k for k = 1, 2, …, m = 200 are the grid intervals of width π/100 centered at the grid points. We choose c(r) and a(r) to be continuous functions on the unit circle and evaluate them at each grid point to define c_k and a_k, respectively (which is equivalent to evaluating Equation (25) using the midpoint rule with one subinterval equal to the subregion). The corresponding GenGC correlation matrices are constructed using the chordal distance, Equation (19), for different choices of the functions a(r) and c(r). We give three examples that illustrate the results of allowing a and c to vary as continuous functions over the spatial domain.
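Assuming Equation (19) is the standard chordal (straight-line) distance, the pairwise separations on the discretized unit circle can be computed as follows:

```python
import numpy as np

def chordal_distance(theta_i, theta_j, radius=1.0):
    """Chordal (straight-line) distance between points on a circle of
    the given radius, specified by angles in radians."""
    dtheta = np.abs(theta_i - theta_j)
    dtheta = np.minimum(dtheta, 2.0 * np.pi - dtheta)  # shorter arc
    return 2.0 * radius * np.sin(dtheta / 2.0)

m = 200
theta = 2.0 * np.pi * np.arange(m) / m                # grid points
Z = chordal_distance(theta[:, None], theta[None, :])  # m x m separations
```

The matrix Z then supplies the argument z of the GenGC formula when assembling the correlation matrices in the examples below.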

Example 1
The first example, shown in Figure 4, constructs the GenGC correlation matrix with a and c defined using simple sine functions illustrated in the left panel. The resulting correlation matrix in the middle panel of Figure 4 has a nonuniform region where correlations vanish (white), with larger nonzero regions in the right half of the domain and smaller nonzero regions on the left. This reflects the choice of function for c and its impact on the region of compact support. The specific one-dimensional correlations, as we see in the right panel, also illustrate the impact of the spatial variability of the cut-off parameter c, where the correlations are compactly supported and asymmetric about zero separation. Though a also varies over space in this example, these values of a do not have a significant impact on the shape of the resulting correlations.
Correlation functions like the GenGC correlation function shown in Figure 4 may be useful for covariance tapering, particularly for localization in ensemble-based data assimilation schemes. Current data assimilation schemes often implement the GC99 function for localization, which is isotropic on the sphere, since c is fixed and a = 1/2 is held constant. Alternate means of localization have been proposed, such as adaptive localization (Bishop and Hodyss, 2007), spatially dependent localization (Buehner, 2012), scale-dependent localization (Buehner and Shlyaeva, 2015), empirical localization functions (Anderson and Lei, 2013), and cross-localization for coupled data assimilation (Stanley et al., 2021). The GenGC correlation function removes the restriction of a single cut-off length and produces correlations that can be anisotropic and asymmetric about zero separation, as illustrated in Figure 4.

Example 2
We can see the impact of allowing a to vary over space on the anisotropy of the GenGC correlation function more clearly in Figure 5. In this example, the function a varies significantly on the unit circle and attains values less than zero. As a result, the GenGC correlation matrix shown in the middle panel of Figure 5 is highly anisotropic and has negative correlations, all while remaining compactly supported (white region). The one-dimensional correlations for this case are much more asymmetric about zero separation (right panel of Figure 5) and vary significantly in their behavior. The gray correlation of row 50 is quite narrow, as it lies in the part of the domain with the smallest values of c, while the correlation at row 150 (yellow) is very broad close to zero separation.
The GenGC correlation function becomes negative for values of a less than zero, due to the behavior of the piecewise linear functions defined in Equation (18). Correlations that become negative over certain regions occur naturally in several contexts, for instance for geostrophically balanced wind fields (Daley, 1991, pp. 160-161, 163). Allowing a to vary across a = 0 changes the interior shape of the correlations significantly and can be tuned to suit different applications.

Example 3
A common application of correlation functions with compact support is to compute the Hadamard (elementwise) product with a correlation function that does not have compact support. Figure 6 is an example of such a case, showing the Hadamard product of the GenGC correlation function in Figure 5 with the first-order autoregressive (FOAR) correlation function, defined as C_FOAR(z) = exp(−|z|/L_0), where L_0 is commonly referred to as the length-scale (not to be confused with the correlation length L defined in Section 3.2). The FOAR correlation function is not differentiable, due to the cusp at zero separation, and never becomes zero. We see in Figure 6 that taking the Hadamard product of the correlations of Figure 5 and the FOAR correlation with L_0 = π/4 produces correlations that have compact support and a mixture of features from both functions. The cusp-like behavior of the FOAR is preserved by the Hadamard product, but the result is now asymmetric about zero separation, contains negative correlations, and is compactly supported. The Hadamard product of GenGC with the FOAR combines the distinctive features of each function into one correlation function, which can be a powerful tool for covariance modeling, particularly for applications in which cusp-like behavior is expected.
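A minimal numerical illustration of this mechanism, using a simple triangular taper as a purely illustrative stand-in for the GenGC factor (the FOAR formula is as defined above):

```python
import numpy as np

def foar(z, L0):
    """First-order autoregressive correlation exp(-|z|/L0): a cusp at
    zero separation and no compact support."""
    return np.exp(-np.abs(z) / L0)

# A simple compactly supported taper standing in for GenGC here; any
# compactly supported correlation behaves the same under the product.
def taper(z, c):
    return np.maximum(1.0 - np.abs(z) / c, 0.0)

z = np.linspace(-np.pi, np.pi, 401)
product = taper(z, 1.0) * foar(z, np.pi / 4.0)  # Hadamard (pointwise) product
```

The product inherits the cusp of the FOAR at zero separation and the compact support of the taper: it is identically zero wherever the taper is.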

Two-dimensional example
In addition to one-dimensional examples, we consider a simple two-dimensional case motivated by the example presented in Paciorek and Schervish (2006). We construct correlations over the state of Colorado, which we divide into two regions: a western mountainous region with short cut-off lengths, and an eastern plains region with long cut-off lengths. Figure 7 plots the corresponding a and c fields as functions of latitude and longitude, where both fields are constructed using a hyperbolic tangent (as in Figure 2) and are latitudinally invariant. These fields are relatively homogeneous within each of the mountainous and plains regions, with a sharp transition between the two. Figures 8-10 plot correlations evaluated at the three locations marked in Figure 7. In each, panels (a) evaluate the GC99 function, with a = 1/2 and c both fixed; panels (b) evaluate the G06 function, where c is fixed and a varies according to the background field shown in Figure 7. Panels (c) evaluate GenGC where a is fixed at one-half and c is allowed to vary according to the background field in the right panel of Figure 7. Finally, panels (d) are the full GenGC correlation function, which uses the full c and a background fields shown in Figure 7. These examples highlight the anisotropy gained by letting a and c vary over the surface of the Earth and illustrate how this affects the correlation function when compared with the case where either one or both are fixed.
The western location in Figure 7 (marked +) is in a region of relatively constant values of a and values of c that are small. As a result, the correlations in Figure 8 are nearly identical, as the long-range behavior of a and c has little influence because of the very short local cut-off lengths. Panels (a) and (c) are nearly identical but differ slightly from panels (b) and (d) in their interior behavior, because each pair uses different a values for their construction. The correlations in this case are all fairly isotropic and decay to zero relatively quickly beyond the 0.05 contour.
Moving eastward to the central marked point in Figure 7 (× point), we obtain the correlations shown in Figure 9. This location sits on the western edge of the steep gradients in a and c, which heavily influences the behavior of the correlations that allow for a and/or c to vary (panels b-d). The GC99 function in panel (a), as expected, is isotropic and only differs from panel (a) in Figure 8 in how fast it decays to zero. Panels (b) and (d) have the most strikingly different interior behavior, due to the spatial dependence of a, where structures are no longer radially symmetric. Panels (c) and (d) capture the changes in c, noting that cut-off lengths are small to the west of the center location and increase when moving east. This is reflected in the eastward expansion of the correlations in panels (c) and (d), whereas the correlations in panels (a) and (b) go to zero within a fixed radius determined by the value of c at the center. Examining Figure 9 in its totality shows the step-by-step progression of the GC99 function to its most general form of GenGC in panel (d).
Correlations in Figure 10 are evaluated at the starred location in Figure 7 (⋆ point) on the eastern edge of the steep gradient in both the a and c fields. In this region, we obtain correlations that become negative for both the G06 function in panel (b) and GenGC in panel (d), which are not captured when a is fixed at one-half in panels (a) and (c). As in Figure 9, we see the influence of the gradient in the cut-off length field c, where correlations that allow the cut-off length field to vary (i.e., panels c and d) exhibit expansion eastward towards larger cut-off lengths and shrinkage westward for shorter cut-off lengths, relative to the fixed cut-off lengths in panels (a) and (b).

CONCLUDING REMARKS
This work presents a generalized version of the Gaspari and Cohn (1999) compactly supported, piecewise rational correlation function (the GC99 function) and its subsequent extension in equation 33 of Gaspari et al. (2006) (the G06 function). The function presented in this work, which we refer to as the Generalized Gaspari-Cohn (GenGC) correlation function, extends the GC99 and G06 functions to let the cut-off length c and parameter a vary over arbitrary subregions Ω 1 , Ω 2 , … , Ω m , such as individual grid cells, that partition the domain Ω on which GenGC is defined. As a consequence, the GenGC correlation function is a piecewise-defined, scalar correlation function that can be inhomogeneous and anisotropic while remaining compactly supported. In particular, by allowing the cut-off length c to vary over the different subregions, GenGC can be compactly supported on regions other than spheres of fixed radius with more direct control of the correlation lengths, introducing features that the GC99 and G06 functions lack. The general framework to construct GenGC and other scalar covariance functions that allow parameters to vary over the domain is built from matrix covariance functions of vector random fields. As we describe in Section 2.1, matrix covariance functions provide the initial structure into which we can introduce inhomogeneity and anisotropy, and from these matrix functions we can extract the piecewise-defined scalar covariance function evaluated in practice. In this way, we build the GenGC correlation function from the convolution of compactly supported, radially symmetric functions, as done in Section 2.2.
Since GenGC is defined piecewise, continuity over the full domain is not always guaranteed, as discussed in Section 3.1. In practice, choosing the subregions Ω_1, Ω_2, …, Ω_m to be the grid cells of the discretized domain and allowing a and c to vary continuously over the full domain will ensure that the GenGC correlation function remains continuous, in an appropriate sense. The one- and two-dimensional examples in Section 4 are constructed in this manner. These examples are intended to illustrate the flexibility of the GenGC correlation function while maintaining its compact support. They suggest potential areas of application where this function may be useful, such as localization or covariance modeling. In particular, the GenGC correlation function can play an important role in reconstructing covariances from correlation length fields and variance fields. Covariances can be approximated locally through the variance and correlation length fields, and, as shown by Cohn (1993), in some cases these fields satisfy partial differential equations that can be evolved forward in time. The correlation length fields are then supplied to a parametric correlation function, and the covariance is recovered by rescaling the resulting correlations by the evolved variances.
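The final rescaling step is the standard recovery of a covariance from a correlation matrix and a variance field, B = D C D with D the diagonal matrix of standard deviations. A small sketch with illustrative values:

```python
import numpy as np

def covariance_from_correlation(C, var):
    """Rescale a correlation matrix C by a variance field:
    B = D C D with D = diag(sqrt(var))."""
    s = np.sqrt(np.asarray(var, dtype=float))
    return C * np.outer(s, s)

# Illustrative 3-point correlation matrix and evolved variance field.
C = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
var = np.array([1.0, 4.0, 9.0])
B = covariance_from_correlation(C, var)
```

The diagonal of B reproduces the variance field exactly, while the off-diagonal structure, including any compact support, is inherited from C.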
The increased flexibility of the GenGC correlation function compared with the GC99 and G06 functions is reflected in its correlation length, Equation (27), which varies over the domain Ω as a_k and c_k vary over the different subregions Ω_k for k = 1, 2, …, m. This correlation length is a scalar value, independent of the dimensionality of Ω, due to the radial symmetry of the functions h_k in Equation (18) used to construct GenGC. This implies that, for Ω in two and three dimensions, GenGC is still locally isotropic, in the sense that in each subregion Ω_k there is only one, scalar correlation length defined, independent of direction. To be fully anisotropic in two and three dimensions requires correlation lengths that depend on direction (for example: Cohn, 1993, Sec. 5b, Eqs. 5.34a,b; local correlation tensors defined in Michel et al., 2016, Sec. 2.4). Though the GenGC correlation function is a generalization of the GC99 and G06 functions, further work is needed to construct a version that is fully anisotropic in two and three dimensions.

AUTHOR CONTRIBUTIONS
Shay Gilpin: conceptualization; formal analysis; funding acquisition; investigation; software; visualization; writing - original draft; writing - review and editing. Tomoko Matsuo: conceptualization; funding acquisition; writing - review and editing. Stephen E. Cohn: conceptualization; formal analysis; funding acquisition; writing - review and editing.

ACKNOWLEDGEMENTS
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF or of NASA. SG thanks Sage Shaw for his help refactoring the code used to generate the one-dimensional examples.

DATA AVAILABILITY STATEMENT
The code that implements the GenGC correlation function is shared on Zenodo at https://doi.org/10.5281/zenodo.7859258 (Gilpin, 2023), which includes a link to the corresponding GitHub repository.

APPENDIX A. MATRIX AND SCALAR COVARIANCE FUNCTIONS
Section 2.1 describes the general methodology for extracting a scalar covariance function, Equation (9), from a matrix covariance function. In this Appendix, we prove that Equation (9) is in fact a covariance function, followed by a concrete example of a matrix covariance function and the extracted scalar covariance function.
Example. Suppose we partition the domain Ω into m = 2 subregions Ω_1 and Ω_2. The matrix covariance function B(r, s) for r, s ∈ Ω is the following 4 × 4 block matrix, the arguments of which in each block we represent in abstract form with the subregions Ω_1 and Ω_2. The boxed terms correspond to the entries we extract to build the piecewise-defined scalar covariance function in Equation (9), given in Equation (A4).

APPENDIX B. COMPUTATION EXAMPLE AND FORMULAE
The explicit computation of GenGC is very lengthy. In this Appendix, we give an example of how to compute one of the functions using Equation (17), followed by the complete formulae for GenGC.
For these computations, assume that c_k ≤ c_ℓ. In the case in which c_ℓ ≤ c_k, the indices in the functions below simply need to be switched. Computation of C_kℓ(r, s) for r ∈ Ω_k, s ∈ Ω_ℓ in Equation (17) is first split into six cases, depending on the relationship between c_k and c_ℓ, with 12 subcases each. The six cases and 12 corresponding subcases can be determined by considering figures such as Figure B1. This figure shows the (z, r)-plane (with z and r defined as in Equation (14)), where the solid black lines mark the regions that determine the bounds of Equation (14) and the piecewise portions of h_ℓ(z; a_ℓ, c_ℓ).
The intersections of r = c_k/2 and r = c_k, shown in gray (which in Figure B1 are drawn for 0 < c_k ≤ c_ℓ/4), with the solid black lines divide the z-axis into 12 intervals, which define 12 different sets of the integrals in Equation (14) to evaluate. These are the j = 1, 2, …, 12 subcases in f_{p,j}(z). The 12 intersections shift along the z-axis as c_k changes relative to c_ℓ, defining the six functions f_p(z), p = 1, 2, …, 6.

FIGURE B1
Diagram in the (z, r)-plane used to evaluate the integrals for the function f_1(z) (0 < c_k ≤ c_ℓ/4). Solid black lines denote boundaries in z and r that define either the bounds in Equation (14) or the different piecewise definitions of h_ℓ. Gray horizontal lines mark r = c_k/2 and r = c_k for the case in which 0 < c_k ≤ c_ℓ/4. The intersections of the solid black lines with the gray lines r = c_k/2 and r = c_k (indicated by the black dashed lines) mark the different subintervals in z that define f_1(z), where the evaluation of Equation (14) changes.
We begin by computing the normalization constant in Equation (17). To do so, we first compute B_{0,kk}(0), which is equivalent to B_{0,kℓ}(0) for k = ℓ. From Equation (15), we evaluate the corresponding integral, where for simplicity we have written h_{k,1}(z) = n_k(2(a_k − 1)z/c_k + 1) for 0 ≤ z ≤ c_k/2 and h_{k,2}(z) = 2a_k n_k(1 − z/c_k) for c_k/2 ≤ z ≤ c_k; see Equation (18). The normalization constant then follows as in Equation (B5). For illustration, we now show how to compute f_{1,1}(z). The j = 1 interval in z, which determines f_{1,1}(z), is 0 ≤ z ≤ c_k/2. From Figure B1, we see three distinct regions which determine the bounds on the outermost integral: r = z ≤ c_k/2, r = c_k/2, and r = c_k. All three regions satisfy r < c_ℓ/2 − z, implying that r + z < c_ℓ/2, so h_ℓ(s; a_ℓ, c_ℓ) = h_{ℓ,1}(s) for the innermost integral in all cases. The lower bound |r − z| is split into three cases: r − z < 0, 0 ≤ r − z ≤ c_k/2, and c_k/2 ≤ r − z ≤ c_k. Thus, Equation (14) reduces to Equation (B6), where again for simplicity we have used h_{k,1}(z) and h_{k,2}(z) as defined above. The coefficients for f_{1,1}(z) are computed by evaluating Equation (B6) and normalizing by Equation (B5). These are given explicitly in the first row of Table B1.
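A small sketch of this piecewise linear function, reconstructed from the expressions for h_{k,1} and h_{k,2} quoted above (the function signature and the normalization argument n are assumptions for illustration):

```python
import numpy as np

def h(z, a, c, n=1.0):
    """Piecewise linear function in the form quoted in this appendix
    for Equation (18) (n is the normalization factor):
        n*(2*(a - 1)*z/c + 1)  for 0 <= z <= c/2,
        2*a*n*(1 - z/c)        for c/2 <= z <= c,
        0                      otherwise."""
    z = np.atleast_1d(np.abs(np.asarray(z, dtype=float)))
    out = np.zeros_like(z)
    left = z <= c / 2.0
    right = (z > c / 2.0) & (z <= c)
    out[left] = n * (2.0 * (a - 1.0) * z[left] / c + 1.0)
    out[right] = 2.0 * a * n * (1.0 - z[right] / c)
    return out
```

Both branches equal a·n at z = c/2, so h is continuous there, and the tail 2an(1 − z/c) is negative whenever a < 0, which is the mechanism behind the negative correlations seen in Example 2.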
All other functions f_{p,j}(z) are computed in a similar manner, although the integrals become more complicated, as do the resulting coefficients. Tables B1-B19 summarize the coefficients in each case. Again, in the case in which c_k = c_ℓ, GenGC reduces to equation 33 of G06; see appendix C of G06 for the formulae in this case. For these computations, we evaluated the integrals symbolically using Mathematica (Wolfram Research, Inc., 2022).
Since an explicit, analytical formula for GenGC is defined by Tables B1-B19, it can be implemented efficiently in a discrete setting. Once the coefficients in Tables B1-B19 are stored, two sets of conditionals determine which coefficients to evaluate. Since GenGC is a scalar covariance function, only scalar operations are necessary for its evaluation. Therefore, GenGC can be evaluated efficiently in practice, typically as a linear operator acting on a vector (e.g., Gilpin, 2023).
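The operator application can be sketched in matrix-free form; the correlation shape and the pairwise cut-off function below are illustrative stand-ins for the tabulated GenGC coefficients and support logic.

```python
import numpy as np

def apply_tapered_correlation(x, v, corr, radius):
    """Apply a compactly supported correlation operator to a vector
    without storing a dense matrix: for each i, only points j whose
    separation from x[i] lies inside the pairwise support radius
    contribute. (In practice a neighbor list would restrict j.)"""
    m = len(x)
    out = np.zeros(m)
    for i in range(m):
        for j in range(m):
            z = abs(x[i] - x[j])
            r = radius(i, j)  # pairwise cut-off, e.g. built from c_k, c_l
            if z < r:
                out[i] += corr(z, r) * v[j]
    return out

x = np.linspace(0.0, 1.0, 50)
v = np.sin(2.0 * np.pi * x)
corr = lambda z, r: 1.0 - z / r   # stand-in compactly supported shape
radius = lambda i, j: 0.1
y = apply_tapered_correlation(x, v, corr, radius)
```

Because entries vanish outside the support radius, the operator is sparse, and only the pair (i, j) determines which coefficients apply, mirroring the two sets of conditionals described above.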

TABLE B1
Coefficients for the function f_1(z) = n_k n_ℓ …

TABLE B2
Coefficients for the function f_1(z) = n_k n_ℓ …

TABLE B3
Coefficients for the function f_1(z) = n_k n_ℓ …

TABLE B4
Coefficients for the function f_2(z) = n_k n_ℓ …

TABLE B5
Coefficients for the function f_2(z) = n_k n_ℓ …

TABLE B6
Coefficients for the function f_2(z) = n_k n_ℓ …

Interval | f_{2,j} | Coefficients
f_{2,11}: same as f_{1,11}, see Table B3.
f_{2,12}: same as f_{1,12}, see Table B3.

TABLE B7
Coefficients for the function f_3(z) = n_k n_ℓ …

TABLE B8
Coefficients for the function f_3(z) = n_k n_ℓ …

TABLE B9
Coefficients for the function f_3(z) = n_k n_ℓ …
f_{3,11}: same as f_{1,11}, see Table B3.
f_{3,12}: same as f_{1,12}, see Table B3.

TABLE B10
Coefficients for the function f_4(z) = n_k n_ℓ …

f_{5,10}: same as f_{4,10}, see Table B12.
f_{5,11}: same as f_{4,11}, see Table B13.
(c_ℓ + c_k)/2 ≤ z ≤ c_ℓ + c_k: f_{5,12}: same as f_{4,12}, see Table B13.