Variograms for kriging and clustering of spatial functional data with phase variation

Spatial, amplitude and phase variations in spatial functional data are confounded. Conclusions from the popular functional trace-variogram, which quantifies spatial variation, can be misleading when analyzing misaligned functional data with phase variation. To remedy this, we describe a framework that extends amplitude-phase separation methods in functional data to the spatial setting, with a view towards performing clustering and spatial prediction. We propose a decomposition of the trace-variogram into amplitude and phase components, and quantify how spatial correlations between functional observations manifest in their respective amplitude and phase. This enables us to generate separate amplitude and phase clustering methods for spatial functional data, and develop a novel spatial functional interpolant at unobserved locations based on combining separate amplitude and phase predictions. Through simulations and real data analyses, we demonstrate advantages of our approach when compared to standard ones that ignore phase variation, through more accurate predictions and more interpretable clustering results.

: Boxplots of amplitude prediction errors ( -axis) over 200 simulation runs. The binned spatial distances between the unobserved site 0 and the initialization site are given on the -axis. Figure 3 further illustrates pairwise alignment of functions at neighboring sites to the two templates as well as amplitude prediction at site (−1, −2). At location (−1, −2), we show the true function in solid blue, the amplitude prediction constructed via global alignment to the estimated Karcher mean in dashed red, and the spatially-weighted template/prediction in dashed green (same as the green function in Figure 2). We use the true starting point to plot the two predictions in order to highlight amplitude differences between them and the true function. The local alignmentbased amplitude prediction shares many of the same shape features as the true blue function; this is not the case for the global alignment-based prediction. In the other eight panels (spatial locations), we show the observed function in blue as well as the alignment of this function to the Karcher mean (red) and to the spatially-weighted template (green). In most cases, the alignment to the spatially-weighted template is much more natural, resulting in an improved amplitude prediction at site (−1, −2). Convergence. In Figure 4, we empirically assess convergence properties of Algorithm 1 by plotting the prediction errors (top row) and relative change between estimates (bottom row), as a function of iteration number, for three different examples. Each example corresponds to one of the three simulation settings described in the main article: bimodal (left), B-spline with independent amplitude and phase (middle), and B-spline with dependent amplitude and phase (right). To generate the plots, we first calculate the amplitude prediction at site (0, 0) using the data observed on a 5 × 5 grid with coordinates (−2, −1, 0, 1, 2) (except for (0, 0)). The amplitude prediction errors for 0 at (0, 0), displayed in the first row in Figure 4, are measured via Figure 3: Amplitude kriging at site (−1, −2) and alignment of observed functions at neighboring sites. Each panel represents a single site with labeled coordinates. At site (−1, −2), we show the true function (blue), the Karcher mean-based amplitude prediction (dashed red), and the spatially-weighted template/prediction (dashed green); the starting point for both predictions was fixed to the true starting point. At the other eight sites, we show the observed function (blue), alignment of this function to the Karcher mean (red), and alignment of this function to the spatially-weighted template (green).
wherê( ) 0 is the amplitude prediction at the th iteration. We also monitor the relative change defined as As seen in the figure, there do not appear to be any convergence issues and the algorithm usually converges in a small number of steps.

Appendix B Proofs of Propositions 1 and 2
Proofs of Propositions 1 and 2. We first prove Proposition 1. Let ⟨⋅, ⋅⟩ denote the 2 inner-product and ‖ ⋅ ‖ the corresponding 2 norm. Then, the prediction error decomposes as follows: where the first equality holds due to the constraint ∑ =1 = 1, and the third equality uses Fubini's theorem. As per our assumption, given a template , the aligned function ( ,̂) recovers the underlying amplitude component : Amplitude prediction errors, as defined by (1) (top row), and relative changes, as defined by (2) (bottom row), in a single simulation run under three simulation scenarios: bimodal (left column), B-spline with independent amplitude and phase (middle column), and B-spline with dependent amplitude and phase (right column). The -axis denotes the iteration number.
where ℎ = ‖ − ‖ 2 , for , = 1, … , . The last equality results from the definition of the amplitude trace-variogram, which by Fubini's theorem and stationarity assumption where var{( , )( )} = 2 ( ) does not depend on the spatial location ∈  implies that The proof of Proposition 2 follows along almost identical lines. The changes are to replace the amplitude components in the proof of Proposition 1 by phase components, and to condition on the shape random field  in the expectation. The expected phase prediction error given  is Under the assumptions of the Proposition, we have ( ( ) | ) = ( ) and var( ( ) | ) are constant in space for any . Then, where ℎ , = √ ‖ − ‖ 2 2 + ⋅ ℎ ( , ) 2 ( = 0, 1, … , ; = 0, 1, … , ) and we use the property that for random variables , , , cov( ,

Appendix C Consistent alignment and convergence of amplitude kriging estimator under one-dimensional model
In Section 2.2 in the main article, we describe an approach for template-based alignment of independent functional data. A model-based formulation of the alignment procedure considers the model where is a template function, ∈ + and ( ) are realizations of an error process ( ). Unfortunately, it is well-known that the template is not identifiable except when ( ) ≡ 0; in other words, can only be estimated consistently (as → ∞) under a one-dimensional model where randomness arises purely through scale variation (Kurtek and Srivastava, 2011). This is not an artifact of the square-root slope transform framework, but rather relates to the fact that and cannot be disentangled in general when is infinite-dimensional; see Chakraborty and Panaretos (2021) for a different perspective that leads to the same conclusion. Thus, under a general functional model that is not onedimensional, the warping functions , = 1, … , cannot be consistently estimated. An important implication of this is that, in the spatial setting, it is also generally not be possible to consistently estimate the warping functions { , ∈ } ( = 1, … , ), and as a result, to construct consistent estimators of the amplitude and phase tracevariograms.
In the spatial setting, in order to decompose the trace-variogram into amplitude and phase components, consider the model for the observed data with unobserved and . Equivalently, ( , )( ) = ( ) + ( ). We assume that the data is generated from two functional random fields: the amplitude random field {( , ), ∈ }, and the corresponding phase random field { =̇1 ∕2 , ∈ }, associated with the functional random field { , ∈ }. The ∈  is a deterministic mean amplitude, constant in space, and { } and { } are independent functional random fields assuming values in  and Ψ, respectively, with E( ) ≡ 0 ∀ ∈ . The random field { } depends on { } only through the mean . Under the model, the amplitude of is ( , ), free of phase variation, and is thus the common mean amplitude. Consistent estimation of for alignment of spatial functional data under (7) is again unattainable for the same reasons as in the independent case. Instead, one can show that, under the simplified one-dimensional model, the amplitude kriging estimator̃0, at spatial location 0 ∈ , converges in 2 to an element of the true orbit [ 0 ]. This leads to the following proposition. Proposition 3. Consider observations 1 , … , and the function to predict 0 from the one-dimensional model where { } is a scalar, positive, isotropic random field on , ∈  is a deterministic mean function, and ∈ Γ is a random warping function. Suppose the variogram of { } is continuous in a neighborhood of 0 and 0 is a limit point of { 1 , … , } as → ∞. Then, the estimator̃0 converges in 2 to an element of the orbit [ 0 ] as → ∞. Proof. We simplify the notation by referring to , = 0, 1, … , as = 0, 1, … , , i.e., = , = , = . The proof is based on (i) establishing that minimizing the amplitude prediction error (‖̃0 − ( 0 , 0 )‖ 2 ) to obtain the coefficient vector can be reduced to solving the same problem for scalars 1 , … , pertaining to the scalar random field { }, and (ii) using known results on consistency of the kriging estimate based on 1 , … , to thus establish the required result.
Since is unknown, we choose 1 as a template to align the rest of the data and obtain their phase components; we will see that this choice has no bearing on the result. Under the one-dimensional model, it is possible to recover the phase functionŝ, = 2, … , in terms of 1 regardless of what is: where the third equality again uses the norm-preserving action of Γ on .
The key observation is that in the onedimensional model, optimal alignment under the norm-preserving action is independent of the scales of the functions. The aligned functions (with respect to 1 as the template) are then given by ( ,̂) = ( , −1 1 ), = 2, … , . Indeed,̂1 is then the identity map. In order to compute the prediction error, we require an estimate of 0 . Since any estimate of 0 will not depend on the scale 0 of 0 , we choosê0 = 0 • −1 1 . The next key observation is that despite not knowing 0 , it is true that ( 0 ,̂0) ∈ [ 0 ] regardless of what̂0 is, and it thus suffices to show that̃0 → ( 0 ,̂0) in 2 as → ∞. Using the aligned functions {( ,̂)} and the estimatê 0 , note that the coefficient that defines the estimator̃0 is obtained by minimizing the amplitude prediction error Thus, after eliminating the phase variation in the one-dimensional model, amplitude kriging reduces to univariate kriging of the scaling terms since ‖ ‖ 2 < ∞. We also observe that we could have chosen any of the , = 2, … , as a template. We now examine consistency of̂0 = ∑ =1̂, wherêis obtained by minimizing the amplitude prediction error. When the variogram of { } is known and continuous in a neighborhood of 0, and a limit point of { 1 , … , } is 0 as → ∞, the expected prediction error E{(̂0 − 0 ) 2 } → 0 as → ∞ (Yakowitz and Szidarovszky, 1985). When the variogram is unknown, it can be replaced by the empirical amplitude trace-variogram ‖ − ‖ 2 , and we have again used the fact that under the one-dimensional model = • −1 1 when 1 is used as the template. If observed locations are regularly spaced on a grid with fixed spacing but increasing domain, Davis and Borgman (1982) proved that the empirical variogram̂(ℎ) is a consistent estimator Figure 5: Convergence of prediction error for ordinary kriging (OK, red) and amplitude-phase kriging (APK, green) as sample size increases. The -axis is the sample size and the -axis is the median amplitude prediction error across 50 repetitions.
for the true variogram, provided that the random field { , ∈ } is -dependent; -dependence is a condition that ensures that elements in a random field with large enough distance are independent.
Under mild assumptions, the consistent empirical variogram enables consistent estimators of parameters in parametric variogram models, which are obtained by the least squares method (Lahiri, Lee and Cressie, 2002). Furthermore, Yakowitz and Szidarovszky (1985) and Stein (1988) discuss the effects of misspecification of the variogram on kriging. Given the true covariance function , the best linear unbiased estimator iŝ0. If the covariance function is misspecified as * , we have the best estimator̂ * 0 . Stein (1988) demonstrates that if * and are compatible (Ibragimov and Rozanov, 2012), then where * is the expectation under the misspecified covariance. Since the prediction error E((̂0 − 0 ) 2 ) → 0 as → ∞ given the true variogram of { }, the prediction error when using * also converges to 0. Proposition 3 further guarantees convergence of Algorithm 1, as specified in Section 4.1 in the main article, for the one-dimensional model. This is due to the fact that the spatial dependency in the data arises through the scaling coefficients only, and the computation of the weights at each iteration is not dependent on {̂}.
We report results of a simulation that validates Proposition 3. We generate spatial functional observations from the one-dimensional model on the spatial domain  = [0, 1] 2 . The spatial locations are drawn uniformly. We set ( ) = exp[−( − 2) 2 ∕(2×1.5 2 )]+exp[−( + 2) 2 ∕(2×1.5 2 )], ∈ [−5, 5]. The vector of scaling coefficients, [ 1 , … , ], is sampled from a multivariate normal distribution with mean vector (5, … , 5) and Matérn covariance with scale parameter 1, smoothing parameter 0.5, and range = 0.5; the error constants, [ 1 , … , ], are sampled from a multivariate normal distribution with mean (5, … , 5) and Matérn covariance with scale parameter 1, smoothing parameter 0.5 and range = 0.5. The warping functions , = 1, … , are generated using a Beta CDF with correlated uniform parameters as in the main manuscript; the parameters are chosen as = 1.5 and = 0.5. For different sample sizes , we simulate data in this fashion 50 times and focus on prediction at site (0.5, 0.5) via ordinary kriging and amplitude-phase kriging. For each sample size, we compute the median of the amplitude error (̂0, 0 )∕‖ 0 ‖ (̂0 and 0 are the SRSFs of the predicted and true functions, respectively), across the 50 repetitions, to assess consistency of amplitude prediction based on these two methods. The results are presented in Figure 5. As expected, the error associated with the amplitude-phase kriging predictions is much smaller than the one associated with the ordinary kriging predictions when sample size is small. As sample size increases, the two methods perform similarly and converge to a very small error value. This is expected since both approaches yield consistent estimators. We note that the final error is not 0 due to numerical approximations in the two procedures. In many real data scenarios, Table 1 Average prediction errors (SD) using metrics 1-5, across 50 different replicates, for amplitude-phase kriging (APK), two-stage kriging (TSK), ordinary kriging (OK), and universal kriging (UK), when sites are randomly sampled on the spatial domain. 2 is divied by 100 and 4 is multiplied by 10 to adjust the scale. controls the magnitude of phase variation and is the range parameter of spatial correlation for amplitude and phase. the number of observations available for kriging is limited. Thus, lower amplitude errors and faster convergence of the error, with respect to sample size, of amplitude-phase kriging as compared to ordinary kriging in Figure 5 clearly summarizes the benefits of the proposed approach in the case of the one-dimensional model.  (14) in the main manuscript. The same holds for the spatially weighted phase distance between the two sets of observations. Thus, regardless of the value of , one always merges the two sets observations with the smallest distance max{ , ∶ ∈ 1 , ∈ 2 }. While this example used complete linkage, the same holds for other linkage functions.
To show that this is true empirically, we generated simulated data using the model described in Section 6.2 in the main manuscript, and conducted amplitude-phase clustering using the new dissimilarities in (8) for = 2 , = −10, −9, … , 10. Choosing the estimated amplitude and phase partitions under = 1 as the baseline, we computed the rand index between estimated partitions for each value of as indicated above, and the baseline estimated partitions. Figure 6 shows that the estimated amplitude and phase clusters do not depend on the value of since all of the rand indices are exactly 1 for both amplitude and phase, i.e., the clustering results are identical for each value of .  additional finding based on this simulation is that stronger spatial dependency enhances prediction performance of all methods, as expected, but generally does not change their relative performance. In Figures 7 and 8, we also display an example of the empirical and fitted trace-variograms, and the corresponding values of kriging coefficients for the B-spline Scenario 1 simulation. The kriging maps in Figure 8 illustrate the contribution of each observed function to the kriging estimators. The results suggest that ordinary kriging tends to assign non-zero coefficients to more of the observed functions, resulting in a prediction with fewer distinguishable amplitude features. On the other hand, the amplitude and phase predictions are dominated by the observations at sites that are spatially close to site 1. Sensitivity to different levels of pre-smoothing. We perform the same leave-one-out cross-validation on the 24 daily average ozone functions as in Section 8.1 in the main article, but use three different levels of spline-based smoothing: = 1 × 10 −4 (low), = 3 × 10 −4 (medium; results also reported in the main article) and = 5 × 10 −4 (high). We report the mean of the five error metrics, 1-5, for each level of smoothing in Table 2. When the input data is relatively smooth ( = 5 × 10 −4 ), the difference in amplitude errors between the proposed method and ordinary kriging is very small; two-stage kriging results in the largest errors. This is due to the smoothing procedure, which smooths out notable shape characteristics of the functions. This is beneficial for ordinary kriging that generally underestimates prominent shape features in predicted functions due to misalignment of the observed data. On the other hand, the Table 2 Leave-one-out cross-validation average prediction errors of amplitude-phase kriging (APK), two-stage kriging (TSK), and ordinary kriging (OK) for the smoothed ozone data in North California. All numbers were multiplied by 1000. performance of two-stage kriging suffers in this case, because there are fewer prominent features that can be used to inform the global alignment. In contrast, when less smoothing is applied to the observed functions ( = 1 × 10 −4 ), the amplitude prediction errors are much smaller for the proposed method as compared to ordinary kriging; the amplitude mean squared error ( 3) is reduced by 14% compared to ordinary kriging. At the same time, the proposed approach has similar prediction accuracy as two-stage kriging in terms of amplitude. This is due to the fact that more shape characteristics of the observed functions are preserved after smoothing; further, the proposed approach is much more effective in capturing these features in the amplitude predictions than ordinary kriging. In terms of phase, the proposed method outperforms ordinary kriging for all values of the smoothing parameter; the reduction in the phase mean squared error ( 4) ranges from 8% to 12%. Amplitude-phase kriging significantly outperforms two-stage kriging in terms of phase prediction when a considerable amount of smoothing is applied to the data. On the other hand, two-stage kriging outperforms amplitude-phase kriging for the two lower values of the smoothing parameter. Overall, it is evident that amplitude-phase kriging is fairly robust under different magnitudes of smoothing of the observed functions. Canadian weather data clustering without spatial dependency. The clustering techniques discussed in the main article account for the spatial dependency across functional observations. The spatial dependency is encoded in the dissimilarity matrix via weights, where the discrepancy between pairs of functions near each other, with respect to the distance on the domain, is down-weighted. The weight, computed using the fitted trace-variogram, works as an empirical prior, where the dependency decays as the domain distance increases, and its range and rate of decay depend on the data. In particular, if the data suggest no spatial dependency, spatially-weighted clustering is the same as standard hierarchical clustering without spatial information. Spatial weighting tends to preserve connectivity of clusters and often results in more interpretable results.
To show the difference between clustering with and without spatial adjustment, we applied standard hierarchical clustering, with average linkage, to the same data as considered in Section 8.2 in the main article. The results are shown in Figure 9, and are directly comparable to Figure 3 in the main article. The partitions estimated using hierarchical clustering without accounting for spatial dependency tend to be more scattered, e.g., cluster 2 (yellow) in amplitude and phase clustering. Such results are difficult to interpret and relate to geological factors. In addition, standard hierarchical clustering of this data results in many more singleton clusters. : Clustering (average linkage, clusters in different colors) of functional residuals, after adjusting for latitude and longitude effects, obtained from the Canadian weather data when dissimilarity does not account for spatial dependency.