Some Recent Developments in Inference for Geostatistical Functional Data

We review recent developments related to inference for functions defined at spatial locations. We also consider time series of functions defined at irregularly distributed spatial points or on a grid. We focus on kriging, estimation of the functional mean and principal components, and significance testing, giving special attention to testing spatio–temporal separability in the context of functional data. We also highlight some ideas related to extreme value theory for spatially indexed functional time series.


Introduction
The objective of this paper is to review some recent developments in inference for geostatistical functional data. We thus consider data which are a collection of curves at spatial locations {s_k}. Each t → X(s_k; t) is a curve. For example, the {s_k} can be locations of meteorological stations, and X(s_k; t) can be some measurement, for example temperature or wind speed, at time t. A sample of geostatistical functional data has the form {X(s_k; t), s_k ∈ S, t ∈ T, k = 1, 2, ..., N}.
The spatial domain S is typically a subset of the two-dimensional plane or sphere. The points s_k at which observations are available are generally scattered over S in some irregular manner. In some applications, e.g. satellite images, these points form a grid, but the methodology we discuss in this paper has been motivated by, and generally applies to, irregularly spaced locations s_k. Some more general data structures consisting of time series of functions observed at spatial locations will also be considered.
Just as in the case of scalar geostatistical data, the model for functional geostatistical data is the random field {X(s), s ∈ S}, with the difference that now each X(s) is a random function. We assume that each X(s) is an element of the Hilbert space L² = L²(T), with the interval T = [0, 1] chosen merely for convenience. We assume that each X(s) is square integrable, i.e.

E‖X(s)‖² = E ∫_T X²(s; t) dt < ∞.
If we assume that S is the whole Euclidean space, we can define strict stationarity by the condition

{X(s_1 + h), X(s_2 + h), ..., X(s_m + h)} has the same distribution as {X(s_1), X(s_2), ..., X(s_m)}. (1)

If this is the case, then E‖X(s)‖² does not depend on s. Square integrability and strict stationarity imply that the mean function µ(t) = EX(s; t) and the covariances C(h; t, u) = Cov(X(s; t), X(s + h; u)) exist and do not depend on s. When working with functional geostatistical data, it is important to keep in mind that the function h → C(h; t, u) is nonnegative definite only if t = u. In that case, C(·; t) := C(·; t, t) is the spatial covariance function of the scalar field {X(s; t)}. It is often assumed that the covariances are isotropic, i.e. C(h; t, u) depends only on h = ‖h‖. Unless stated otherwise, in the remainder of this paper we assume that the field {X(s)} is square integrable, strictly stationary, and isotropic.
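To make the setting concrete, the following minimal simulation sketch (Python/NumPy; purely illustrative, not code from any of the packages cited later, and all names are ours) generates a stationary, isotropic functional random field with spatially correlated scores and computes the naive pointwise mean function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stationary functional random field on irregular sites:
# X(s_k; t) = mu(t) + xi_1(s_k) f_1(t) + xi_2(s_k) f_2(t),
# where the scores xi_j are spatially correlated Gaussian fields.
n_sites, n_time = 50, 100
t = np.linspace(0.0, 1.0, n_time)
sites = rng.uniform(0.0, 1.0, size=(n_sites, 2))
mu = np.sin(2 * np.pi * t)

# Isotropic exponential spatial covariance for the scores: C(h) = exp(-h/rho)
dists = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
spatial_cov = np.exp(-dists / 0.3)
L = np.linalg.cholesky(spatial_cov + 1e-10 * np.eye(n_sites))
xi1 = L @ rng.standard_normal(n_sites)
xi2 = L @ rng.standard_normal(n_sites)
X = mu + np.outer(xi1, np.sqrt(2) * np.sin(2 * np.pi * t)) \
       + np.outer(xi2, 0.5 * np.sqrt(2) * np.cos(2 * np.pi * t))

# Naive pointwise average of the curves; Section 3 explains why a
# spatially weighted average is a better estimator of mu(t).
mu_hat = X.mean(axis=0)
```

Because the scores are spatially correlated, the simple average `mu_hat` can deviate noticeably from µ even for moderate N, which motivates the weighted estimators of Section 3.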
We also consider a more complex data structure: at spatial locations we observe time series of functions. The data have the form X_n(s; t), where n is the year, s is a spatial location, and t is time within a year. For example, X_n(s; t) can be the curve of daily temperatures (the unit of t is one day) in year n at location s. Other examples are total daily precipitation, wind strength, wind direction, and solar radiation. Many of these daily time series are either maxima, minima or averages of data obtained at higher temporal resolutions. Historical records can be treated as geostatistical data, as meteorological stations are not uniformly distributed. However, computer weather prediction models produce complete records at dense grids that extend many decades into the future.
The paper is organized as follows. In Section 2, we review the subject of kriging in the context of functional data. Kriging is a fundamental tool in the field of spatial statistics, and so is a suitable starting point. We then turn, in Section 3, to the estimation of the mean function and the functional principal components from a sample of spatially distributed curves. Most methods of Sections 2 and 3 are implemented in the R package geofd. Examples of code are given in Chapter 9 of Kokoszka & Reimherr (2017). Section 4 is devoted to several significance tests in the context of spatial functional data. Testing of separability is presented in a separate section, Section 5, due to its importance in many inferential procedures for spatio-temporal data. We conclude with fairly recent research on extreme events defined in terms of spatially indexed functional time series. This work, many aspects of which are still under development, is outlined in Section 6.
It is not our objective to present a complete review of the subject, but rather to explain those recent developments which have attracted our interest. We strive not merely to provide references, but to explain the central points in enough detail to allow the reader to gain sufficient understanding to decide whether a specific development is relevant to her work or not.
We assume that the reader is familiar with the fundamental concepts of spatial statistics, explained, e.g., in the initial chapters of Wackernagel (2003), Schabenberger & Gotway (2005) or Sherman (2011). Chapter 3 of Gelfand, Diggle, Fuentes & Guttorp (2010) also provides a sufficient introduction. Some familiarity with the fundamental concepts of functional data analysis would also be helpful; Chapters 1 and 3 of Kokoszka & Reimherr (2017) provide enough background.

Kriging
We observe functions X(s_k) at spatial locations s_1, s_2, ..., s_N. Suppose s is a different location, and we want to predict the function X(s) using the functions X(s_1), X(s_2), ..., X(s_N). In the functional setting, the available data can be represented as a matrix of scalars if all functions are observed at the same points t_j. In some situations, the points t_j may depend on the location s_k, so the scalar data points are X(s_k, t_kj).
There are many approaches to computing a predictor X̂(s) of the unobserved curve X(s). We start with a method which is the most direct extension of the approach used for scalar data.
Taking a functional point of view, we treat each curve X(s_k) as an indivisible data object. We want to determine the weights w_k for a predictor of the form

X̂(s) = Σ_{k=1}^{N} w_k X(s_k). (2)

Since the X(s_k) are now functions, we calculate the weights by minimizing the expected L² distance between X(s) and X̂(s). We thus want to find weights w_1, w_2, ..., w_N which minimize E‖X(s) − Σ_{k=1}^{N} w_k X(s_k)‖². Define the functional spatial covariances

C(s, s′) = E⟨X(s) − µ, X(s′) − µ⟩. (3)

Notice that for any two spatial points, C(s, s′) is a real number, as opposed to an operator. Using this notation (and assuming for simplicity that µ = 0), we see that

E‖X(s) − Σ_{k=1}^{N} w_k X(s_k)‖² = C(s, s) − 2 Σ_{k=1}^{N} w_k C(s_k, s) + Σ_{k=1}^{N} Σ_{ℓ=1}^{N} w_k w_ℓ C(s_k, s_ℓ).

Finding the kriging weights is thus based on the same equations as for scalar data, but using the functional covariances (3). Therefore, the weights w_k are found by solving the system of equations

Σ_{ℓ=1}^{N} w_ℓ C(s_k, s_ℓ) = C(s_k, s), k = 1, 2, ..., N. (4)

To implement this approach in practice, we must estimate the covariances C(s_k, s_ℓ) and C(s_k, s). Using the assumptions of stationarity and isotropy, we see that C(s, s′) = C(‖s − s′‖) depends only on the distance between the two locations, so it can be estimated by fitting a parametric model to an empirical variogram computed from the observed curves. The kriging predictor is often expressed in a slightly different way. We want to find a predictor X⋆(s) of the form

X⋆(s) = Σ_{k=1}^{N} λ_k X(s_k), with Σ_{k=1}^{N} λ_k = 1.

By contrast, equations (4) do not imply that Σ_{k=1}^{N} w_k = 1. In many applications, the weights w_k are very close to the weights λ_k, and the predicted functions X̂(s) and X⋆(s) are also very close.
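Once the functional covariances are available, solving the kriging system (4) is a small linear-algebra exercise. Below is a minimal NumPy sketch (not the geofd implementation; the exponential trace-covariogram and all names are illustrative assumptions, standing in for a covariogram estimated from data):

```python
import numpy as np

def functional_kriging_weights(sites, s0, cov):
    """Solve the kriging system sum_l w_l C(s_k, s_l) = C(s_k, s0).

    `cov(h)` is an assumed known isotropic trace-covariogram
    C(h) = E<X(s) - mu, X(s + h) - mu>; in practice it would be
    estimated, e.g. via an empirical variogram.
    """
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    C = cov(d)                                    # N x N system matrix
    c0 = cov(np.linalg.norm(sites - s0, axis=1))  # right-hand side
    return np.linalg.solve(C, c0)

# Toy example: four sites at the corners of the unit square,
# prediction at the center, exponential covariogram exp(-h).
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = functional_kriging_weights(sites, np.array([0.5, 0.5]),
                               lambda h: np.exp(-h))
# The predicted curve is then X_hat(s0; t) = sum_k w[k] * X(s_k; t).
```

By symmetry of this toy configuration, all four weights are equal; note that they need not sum to one, in line with the remark following (4).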
In addition to these simple methods, Delicado, Giraldo, Comas & Mateu (2010) give references to several other methods of kriging functional data. Another relatively simple kriging method they discuss, which gives predictions as good as those of more complex methods, can be summarized as follows. The predictor has the form

X̂(s; t) = Σ_{k=1}^{N} w_k(t) X(s_k; t),

where the unknown functions w_k(·) satisfy Σ_{k=1}^{N} w_k(t) = 1 for each t. The observed functions X(s_k) and the unknown weight functions w_k(·) are expanded using some basis system {B_m}. In particular, one can use the estimated FPC's v̂_m as the B_m. The coefficient matrix is also found by minimizing E‖X(s) − X̂(s)‖². This approach is similar to the method of co-kriging used for multivariate data, e.g. Wackernagel (2003).

Estimation of the mean function and of the FPC's
Revista Colombiana de Estadística 42 (2019) 101-122

To estimate the mean function, we use the weighted average

µ̂ = Σ_{k=1}^{N} w_k X(s_k), (5)

and the problem becomes how to compute the optimal weights w_k. As in Section 2, in (5) we treat the curves X(s_k) as indivisible statistical objects, and so we apply one weight to the whole curve. One could consider weights w_k(t) defined as functions, but the simple approach we describe in this section is easy to implement and produces estimates which significantly improve on the simple average. Figure 1 illustrates the difference between the simple average (w_k = 1/N) and the weighted average we are about to derive, using annual temperature curves in Canada. There are plenty of locations in the southern part of Canada, especially in the densely populated South-East of the country, and very few in the North. If we postulate the model X(s_k; t) = µ(t) + ε(s_k; t), then µ should represent the typical annual temperature profile for the whole country. The simple average will mostly reflect temperature curves in the region where there are many stations, i.e. mostly in the South. To obtain a more informative kriged mean function, stations far in the North must receive larger weights. Some stations in the South will even receive slightly negative weights. We define the optimal weights as those that minimize E‖Σ_{k=1}^{N} w_k X(s_k) − µ‖², subject to the condition Σ_{k=1}^{N} w_k = 1, which ensures that Eµ̂ = µ. The method of Lagrange multipliers leads to the following system of N + 1 linear equations:

Σ_{ℓ=1}^{N} w_ℓ C(s_k, s_ℓ) + r = 0, k = 1, 2, ..., N; Σ_{k=1}^{N} w_k = 1,

in which w_1, w_2, ..., w_N, r are the N + 1 unknowns. The functional covariances C(s_k, s_ℓ) can be estimated as explained in Section 2.
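The (N + 1)-dimensional Lagrange system above can be solved directly. A minimal NumPy sketch (illustrative only; the covariance values would in practice be estimated as in Section 2):

```python
import numpy as np

def kriged_mean_weights(C):
    """Weights minimizing E|| sum_k w_k X(s_k) - mu ||^2 s.t. sum_k w_k = 1.

    C is the N x N matrix of functional covariances C(s_k, s_l).
    The Lagrange system, in block form, is
        [ C   1 ] [ w ]   [ 0 ]
        [ 1'  0 ] [ r ] = [ 1 ].
    """
    N = C.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = C
    A[:N, N] = 1.0
    A[N, :N] = 1.0
    b = np.zeros(N + 1)
    b[N] = 1.0
    return np.linalg.solve(A, b)[:N]

# Two tightly clustered sites plus one isolated site (exponential
# covariances in the distances): the isolated site should receive
# more weight than either clustered site.
d = np.array([[0.0, 0.1, 2.0],
              [0.1, 0.0, 2.0],
              [2.0, 2.0, 0.0]])
w = kriged_mean_weights(np.exp(-d))
```

The isolated third site indeed gets the largest weight, mirroring the Canada example: sparsely sampled regions are up-weighted so that µ̂ represents the whole domain.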
A fundamental tool of functional data analysis is the Karhunen–Loève expansion. Every function X in L² can be represented as

X(t) = µ(t) + Σ_{j=1}^{∞} ξ_j v_j(t),

where the v_j are orthonormal functions known as the functional principal components (FPC's). The main feature of this expansion is that the truncated sum

X_p(t) = µ(t) + Σ_{j=1}^{p} ξ_j v_j(t)

is a very good approximation of the function X(·) even for small p; in many applications p = 3 is sufficient. A small number of scores ξ_1, ξ_2, ..., ξ_p thus encodes an infinite-dimensional function. Details are explained in Chapter 11 of Kokoszka & Reimherr (2017). The functions v_j are the eigenfunctions of the covariance function c(t, u) = Cov(X(t), X(u)). Therefore, if the spatially indexed functions X(s) have the same distribution, then each of them admits the expansion

X(s; t) = µ(t) + Σ_{j=1}^{∞} ξ_j(s) v_j(t). (8)

We now explain how to estimate the v_j in (8) under the assumptions of stationarity and isotropy stated in Section 1. We also assume that µ(t) = 0; in practice we would subtract the estimated mean function. We describe one of the two methods proposed by Gromenko, Kokoszka, Zhu & Sojka (2012). Both methods have very similar finite sample performance, and have significantly smaller MSE's than the standard method, which does not take spatial dependence into account.
It is assumed that the v_j admit the expansion

v_j = Σ_{α=1}^{K} x_{jα} B_α, (9)

where the B_α are elements of an orthonormal basis, for example the Fourier basis (splines will not work because they are not orthogonal). The expansion (9) is already an estimation step. We take K so large that each observed function X(s_k) is well approximated by the sum

X(s_k) ≈ Σ_{α=1}^{K} b_α(s_k) B_α.

The b_α(s_k) form an observable field.
The v_j are determined (up to a sign) by the conditions C(v_j) = λ_j v_j, where

C(x) = E[⟨X(s), x⟩ X(s)]

is the covariance operator. Using the orthonormality of the B_j, we obtain

⟨C(B_j), B_i⟩ = E[b_i(s) b_j(s)]. (10)

Fix i and j, and define the scalar field z by z(s) = b_i(s) b_j(s). We can postulate a parametric model for the covariance structure of the field z(·), and use an empirical variogram to estimate µ_z = Ez(s) as a weighted average of the z(s_k). Denote the resulting estimate by r̂_ij. The empirical version of (10) is then

⟨Ĉ(B_j), B_i⟩ = r̂_ij. (11)

Relation (11) defines the estimator Ĉ, which acts on the span of B_1, B_2, ..., B_K. Writing an eigenfunction as x = Σ_i x_i B_i, we see that Ĉ(x) = Σ_i (Σ_j r̂_ij x_j) B_i. On the other hand, λx = Σ_i λx_i B_i. Since the B_i form an orthonormal basis, we can write the above as a matrix equation

R x = λ x, R = [r̂_ij]. (12)

Denote the solutions to (12) by

(λ̂_j, x̂(j)), 1 ≤ j ≤ K. (13)

The x̂(j) are orthonormal, so the functions

v̂_j = Σ_{i=1}^{K} x̂_i(j) B_i (14)

are also orthonormal (because the B_j are orthonormal). The v̂_j given by (14) are the estimators of the FPC's, and the λ̂_j in (13) of the corresponding eigenvalues.
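The final step, solving the matrix eigenproblem (12) and mapping eigenvectors back to functions, can be sketched in a few lines of NumPy. For brevity this illustration estimates r̂_ij by a plain average of b_i(s_k) b_j(s_k) rather than the spatially weighted variogram-based average described above; all names and the simulated scores are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_sites = 5, 200
true_vals = np.array([4.0, 2.0, 1.0, 0.5, 0.25])
# Simulated basis coefficients b_alpha(s_k); for simplicity they are
# independent across sites here, so the plain average is adequate.
b = rng.standard_normal((n_sites, K)) * np.sqrt(true_vals)

R = b.T @ b / n_sites               # plain-average estimate of [r_ij]
evals, evecs = np.linalg.eigh(R)    # solve R x = lambda x
order = np.argsort(evals)[::-1]
lam_hat = evals[order]              # estimated eigenvalues, decreasing
x_hat = evecs[:, order]             # columns x_hat(j): coefficients of v_j
# The estimated FPC's are v_hat_j(t) = sum_i x_hat[i, j] * B_i(t);
# they are orthonormal because the basis {B_i} is and the columns of
# x_hat are.
```

In the spatially dependent case, `R` would instead be filled with the weighted estimates r̂_ij; the eigen-decomposition step is unchanged.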
A natural question to ask is what happens if one applies standard software to the observed curves X(s_k), 1 ≤ k ≤ N, in order to estimate the FPC's v_j in (8). The usual method, implemented in the function pca.fd of the fda package (and in many other packages), is to use the eigenfunctions of the sample covariance operator. It is suitable for a sample of iid curves or for curves forming a regularly spaced functional time series, but not for geostatistical functional data. We illustrate with an example. A comprehensive theory and more examples were developed by Hörmann & Kokoszka (2013). This material is also presented in Chapter 18 of Horváth & Kokoszka (2012).
For mean zero functions, the sample covariance operator is defined by

Ĉ_N(x) = N⁻¹ Σ_{k=1}^{N} ⟨X(s_k), x⟩ X(s_k).

The usual estimated FPC's are defined by Ĉ_N(v̂_j) = λ̂_j v̂_j. Consider a functional random field

X(s) = Σ_{j=1}^{∞} ξ_j(s) e_j,

where {e_j, j ≥ 1} is a complete orthonormal system and the ξ_j(s) are mean zero random variables with Var[ξ_j(s)] = λ_j. Then

C(x) = Σ_{j=1}^{∞} λ_j ⟨e_j, x⟩ e_j,

so the λ_j are the eigenvalues of C, and the e_j the corresponding eigenfunctions, i.e. v_j = e_j. Now consider a sequence s_n → 0. Because of the positive dependence, X(s_n) is close to X(0), so Ĉ_N, as an arithmetic average, is close to the random operator X⋆ = ⟨X(0), ·⟩ X(0). Observe that X⋆(X(0)) = ‖X(0)‖² X(0). Thus X(0) is an eigenfunction of X⋆. Since it is random, it cannot be close to any of the v_j, even asymptotically.
The above is only an intuitive illustration of the point that if the spatial locations at which the functions are observed have dense clusters, or if the dependence is strong, then the eigenfunctions of Ĉ_N do not consistently estimate the FPC's v_j in (8).

Independence, Trend and Change Point Tests
In this section, we discuss several significance tests for geostatistical functional data. These tests are analogs of simpler tests for iid curves. The spatial dependence between the curves requires more complex forms of the test statistics. In addition, these statistics typically require complex numerical procedures to compute them. We therefore merely describe what has been done, and provide suitable references.
We begin with a test of independence of two functional random fields. It was proposed by Gromenko et al. (2012), and is described in detail in Chapter 17 of Horváth & Kokoszka (2012). The data are observed at N spatial locations s_1, s_2, ..., s_N. At location s_k, we observe two curves: X(s_k) and Y(s_k). We assume that the sample {X(s_k)} is a realization of a random field {X(s), s ∈ S}, and the sample {Y(s_k)} is a realization of a random field {Y(s), s ∈ S}. We want to test the null hypothesis

H₀: for each s ∈ S, the random functions X(s) and Y(s) are independent

against the alternative that H₀ does not hold. The test statistic detects departures from H₀ that manifest themselves in the lack of correlation between the projections ⟨x, X(s)⟩ and ⟨y, Y(s)⟩, for any x, y ∈ L². In practice, the lack of correlation can be tested only for x and y from sufficiently large finite dimensional subspaces, namely those spanned by the first p FPC's of X(s) and the first q FPC's of Y(s). The idea of the test thus requires that the pairs (X(s), Y(s)) have the same distribution for every s ∈ S. The construction of the test assumes that both fields, {X(s), s ∈ S} and {Y(s), s ∈ S}, are stationary. The test statistic S_N is constructed from the inner products ⟨X(s_k), v̂_j⟩, 1 ≤ j ≤ p, and ⟨Y(s_k), û_j⟩, 1 ≤ j ≤ q, where the v̂_j are estimated FPC's of the X-sample and the û_j of the Y-sample. The v̂_j and the û_j can be computed as described in Section 3. Under H₀, the test statistic S_N converges, as N → ∞, to a chi-square distribution with pq degrees of freedom. Once the v̂_j and the û_j are computed, the test statistic can be computed without difficulty using the package fda. Since the critical values are those of the chi-square distribution, the test is easy to apply.
The problem of testing the equality of two mean functions, each defined for functions observed over a different spatial region, is considered in Gromenko & Kokoszka (2012).
Our next test is motivated by an interesting and extensively studied problem of space physics. The account presented here is based on Gromenko & Kokoszka (2013) and Gromenko, Kokoszka & Sojka (2017). We first describe the space physics problem, and then explain the idea of the test. Increased concentration of greenhouse gases in the upper atmosphere is associated with global warming in the lower troposphere (the atmosphere roughly below 10 km). Roble & Dickinson (1989) suggested that the increasing amounts of these radiatively active gases, mostly CO₂ and CH₄, would lead to a global cooling in the ionosphere (the atmosphere roughly 300 km above the Earth's surface). Rishbeth (1990) pointed out that this would result in a thermal contraction of the ionosphere. The height of the ionosphere can be approximately computed using data from a radar-type instrument called the ionosonde. Relevant measurements have been made for many decades by globally distributed ionosondes. In principle, these observations could be used to quantitatively test the hypothesis of Roble & Dickinson (1989). The difficulty in testing the contraction hypothesis comes from several sources. The height of the ionosphere depends on magnetic coordinates, the season, long term changes in the strength and direction of the internal magnetic field, and, most importantly, on the solar cycle; more solar radiation leads to greater ionization. This is illustrated in Figure 2.
Another difficulty stems from the fact that ionosonde records are not complete. Most observation stations do not operate continuously for many decades. They start and end operation at different times; some of them are out of service for many years, or even decades. In the mid-latitude northern hemisphere, there are 81 ionosonde stations, but at any given time data from no more than 40 are available, as shown in Figure 3. This means that the estimation methods described in Section 3 cannot be used, as they require complete records to compute various integrals. More complex methods that work for incomplete records are needed. Let Y(s_k; τ_i) be the original record at location s_k, measured from 1958 to 2015, possibly with long gaps. The set of all locations is {s_k, 1 ≤ k ≤ K}, and the set of time points at which measurements may be available is {τ_i, 1 ≤ i ≤ T}; in Gromenko, Kokoszka & Sojka (2017) these are months from January 1958 to December 2015. The following spatio-temporal model is postulated:

Y(s; τ) = µ(s; τ) + ε(s; τ) + θ(s; τ),

where s is a generic location in a region of interest, and τ is continuous time. A simple form of the mean function relevant to the space physics problem is

µ(s; τ) = β₀ + β₁ SRF(τ) + β₂ τ + M(s; τ),

where SRF(τ) is the solar radio flux, cf. Fig. 2, and M(s; τ) is a suitable function computed from the coordinates of the internal magnetic field. The interest lies in the estimation of the mean function, and in testing whether it contains a linear trend, i.e.
testing H₀: β₂ = 0. The function µ(·, ·) is treated as an unknown deterministic functional parameter. The second term, ε(s; τ), describes the spatio-temporal variability away from the mean function. Stochastic modeling of this term is needed to develop inferential procedures. The term θ(s; τ) represents a random error, which can be associated with measurement error. The details of the estimation and testing procedures are too complex to describe here. The conclusion is that β₂ is significantly negative, confirming the hypothesis of global ionospheric contraction. The software to perform the test is available and can be used to test for the presence of global trends in other data of this type, for example in near-surface temperatures.
We conclude this section with a discussion of change point detection for the mean function. The presentation is based on Gromenko, Kokoszka & Reimherr (2017). General theory of change points for scalar data is presented in Csörgő & Horváth (1997); several extensions to functional data are presented in Horváth & Kokoszka (2012). In the spatio-temporal setting we consider here, the data are assumed to follow the model

X_n(s; t) = µ_n(t) + ε_n(s; t), n = 1, 2, ..., N.

In most applications, n denotes year, s, as before, spatial location, and t time within a year. For a fixed s, {X_n(s), n = 1, 2, ...} is a time series of functions, one function per year. While we assume that the curves are observed densely in time, they are observed at only a finite number of spatial locations {s_k : k = 1, ..., K}. The goal is to determine whether the mean functions µ_n are the same across n, or whether there is a change in the mean. In other words, we aim to evaluate the null hypothesis

H₀: µ₁ = µ₂ = · · · = µ_N.

The alternative of a sudden change at a single year n⋆ is to be viewed as a mathematical approximation. As with all testing problems, the null hypothesis of no change is tested, and the alternative specifies the violations of the null hypothesis which will be detected with the highest power. The tests described below will also detect other violations of H₀, including gradual changes taking place over a few years.
To carry out the tests, we assume that the covariance structure of the ε_n is separable, a test of which will be discussed in Section 5. In particular, we assume that

Cov(ε_n(s; t), ε_n(s′; t′)) = u(s, s′) v(t, t′).

Notice that the decomposition above is unique only up to a constant; we thus assume that ∫ v(t, t) dt = 1. Gromenko, Kokoszka & Reimherr (2017) consider three different test statistics based on CUSUM's (cumulative sums), each of which uses the expansion (8) applied to the annual curves in a different way. They do not require a specific procedure for estimating v(t, t′) and U = {u(s_k, s_{k′})}; instead, they formulate their tests assuming consistent estimates are available. Examples of how to estimate these quantities are given in their appendix, as well as in Aston, Pigoli & Tavakoli (2016) and Constantinou, Kokoszka & Reimherr (2017). The first test statistic uses a temporal FPCA, normalizes the CUSUM's of the scores by the corresponding eigenvalues, and then pools across space. The pooling weights, ŵ_k, can be chosen however the user prefers, but in practice the weights described in Section 3 work well. It has now become well recognized in the FDA community that dividing by estimated eigenvalues can cause stability problems with certain data. Thus a second test statistic, Λ̂₂, was also proposed, which omits the normalization by the λ̂_i. For p large, Λ̂₂ is essentially an approximation of the third and final test statistic. Each statistic has a slightly different limiting distribution; under H₀, the limits are expressed in terms of Brownian bridges B_ik(t). These asymptotics can be used for choosing rejection regions with proper Type 1 error. To do so, one can use Monte Carlo simulation, though Gromenko, Kokoszka & Reimherr (2017) also provide a normal approximation which works well when the number of spatial points is relatively large. These tests can be implemented using the scpt package in R, which is available from http://www.personal.psu.edu/~mlr36/codes.html and includes example code for
implementing the methodology. If a change point is detected, its location can be identified as the year at which a test statistic attains its maximum. Gromenko, Kokoszka & Reimherr (2017) applied the above test procedures (with suitable finite sample calibration) to test whether the precipitation pattern over the midwest US states has changed. All tests led to the same conclusion: one change point in the second half of the sixties. The pattern of the change is shown in Fig. 4. The biggest changes are in the area around Lake Michigan and south-west of the lake. To visualize the spatial distribution of change, the authors used a spatial field of L² distances between the mean functions before and after the change; the heat map shows a statistically significant change over a region, not a variation in the magnitude of change which may be due to chance.
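The CUSUM idea underlying these tests can be illustrated on a single projected score series. The sketch below (Python; illustrative only, not the scpt implementation) computes the CUSUM statistic for one sequence of scores; the statistics in the paper pool such quantities over FPC's and spatial sites with weights, which we omit here:

```python
import numpy as np

def cusum_stat(scores):
    """CUSUM statistic for a score series xi_1, ..., xi_N:
    max_m | sum_{n <= m} (xi_n - mean(xi)) | / sqrt(N).
    Large values indicate a change in the mean of the series.
    """
    N = len(scores)
    partial = np.cumsum(scores - scores.mean())
    return np.max(np.abs(partial)) / np.sqrt(N)

rng = np.random.default_rng(2)
no_change = rng.standard_normal(200)
with_change = np.concatenate([rng.standard_normal(100),
                              rng.standard_normal(100) + 2.0])
# The estimated change point is the index maximizing the CUSUM path.
cp = int(np.argmax(np.abs(np.cumsum(with_change - with_change.mean()))))
```

The statistic is markedly larger for the series with a mean shift, and the maximizing index locates the change, mirroring how the year of change is identified in the application above.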

Separability Tests
The second order structure of a random field of functions is described by the covariance function σ(s, s′; t, t′) = Cov(X(s; t), X(s′; t′)). Theoretical and computational aspects of most procedures can be significantly simplified if one can assume that

σ(s, s′; t, t′) = u(s, s′) v(t, t′), (17)

that is, if the spatio-temporal covariance function factors into the product of a purely spatial and a purely temporal covariance function. The above condition is, for example, required to develop the change point methodology described in Section 4. If (17) holds, we say that the functional random field is separable. In particular, the spatial dependence structure is then the same for every time t. Tests whose null hypothesis is (17) have been proposed by Liu, Ray & Hooker (2017), Aston et al. (2016) and Constantinou et al. (2017). We discuss the latter two approaches here.
We begin by discussing two different procedures for estimating v(t, t′) and U = {u(s_k, s_{k′})}. Aston et al. (2016) used partial trace operators to estimate these quantities, while Constantinou et al. (2017) combined basis expansions with a "flip-flop" approach, which basically iterates between estimating v and U. Partial trace operators are used to eliminate either the temporal or the spatial component of the covariance. In particular, starting from the empirical covariance function

σ̂(s_k, s_{k′}; t, t′) = N⁻¹ Σ_{n=1}^{N} X_n(s_k; t) X_n(s_{k′}; t′)

(in the above formula, and in the remainder of this section, we assume that the mean function is zero), the spatial covariance Û is estimated by integrating σ̂ over the temporal diagonal t = t′, and the temporal covariance v̂ is estimated by averaging σ̂ over the spatial diagonal k = k′. The normalization ensures that the trace of v̂(t, t′) is 1, though other normalizations are possible. One thing to note is that the full covariance σ does not actually need to be estimated, as one can move to estimating Û and v̂ directly.
The second approach to estimating v and U comes from Constantinou et al. (2017) and begins with basis expansions. In particular, one approximates each temporal curve using J basis functions:

X_n(s_k; t) ≈ Σ_{j=1}^{J} ξ_{jn}(s_k) e_j(t).

The basis functions e_j(t) can be either deterministic or data driven, as in FPCA. Testing separability of X_n is then translated into testing separability of the ξ_{jn}(s_k), so that the null becomes the factorization of the covariance of the score arrays into a J × J temporal matrix V and a K × K spatial matrix U. However, this is now a multivariate problem, and we can utilize multivariate methods for estimating V and U. In particular, the maximum likelihood estimators, assuming the data are Gaussian, satisfy

V̂ = (NK)⁻¹ Σ_{n=1}^{N} Ξ_n Û⁻¹ Ξ_nᵀ, Û = (NJ)⁻¹ Σ_{n=1}^{N} Ξ_nᵀ V̂⁻¹ Ξ_n,

where Ξ_n = {ξ_{jn}(s_k)} is a J × K matrix. To compute these estimators, one chooses an initial value, for example Û₀ = I, and then iterates between the two estimates, hence the name "flip-flop" method.
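The flip-flop iteration can be sketched in a few lines. The code below (NumPy; a sketch under the matrix-normal model, not the authors' implementation; the trace normalization is one of several possible conventions, since the factorization is unique only up to a constant) recovers V and U from simulated separable score matrices:

```python
import numpy as np

def flip_flop(Xi, n_iter=50):
    """Flip-flop MLEs for matrix-normal scores: each Xi[n] is a J x K
    matrix with Cov(vec(Xi[n])) = U kron V under the separability null
    (V temporal, J x J; U spatial, K x K). trace(V) is fixed to 1."""
    N, J, K = Xi.shape
    U = np.eye(K)
    for _ in range(n_iter):
        Ui = np.linalg.inv(U)
        V = sum(X @ Ui @ X.T for X in Xi) / (N * K)
        Vi = np.linalg.inv(V)
        U = sum(X.T @ Vi @ X for X in Xi) / (N * J)
        c = np.trace(V)
        V, U = V / c, U * c      # resolve the scaling non-uniqueness
    return V, U

# Simulate separable scores: Xi[n] = A Z B^T gives
# Cov(vec(Xi[n])) = (B B^T) kron (A A^T).
rng = np.random.default_rng(4)
A = np.diag([2.0, 1.0, 0.5])     # temporal factor: V_true = A A^T
B = np.diag([1.0, 3.0])          # spatial factor:  U_true = B B^T
Xi = np.array([A @ rng.standard_normal((3, 2)) @ B.T for _ in range(500)])
V, U = flip_flop(Xi)
```

With 500 replications the iteration recovers the diagonal structure of both factors up to the stated normalization; the estimates V̂ and Û then feed into the test statistics below.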
With these estimators in hand, one can now carry out a number of different tests. To construct them, first define the difference V̂ ⊗ Û − Σ̂, where Σ̂ is the empirical covariance matrix of the vectorized score matrices, vec is the vectorization operator, which stacks the columns of a matrix into a vector, and ⊗ is the Kronecker product. Under H₀ one can show that this difference, suitably scaled, is asymptotically normal. Constantinou et al. (2017) consider three different approaches. The first test uses the Moore–Penrose generalized inverse, Ŵ⁺, of the estimated covariance matrix of V̂ ⊗ Û − Σ̂ to normalize the difference and create a Wald type test statistic. The form of W is complex, but can be found in Constantinou et al. (2017). The generalized inverse is necessary since the symmetry of the matrices and the non-uniqueness of U and V create a number of linear constraints. This turns out to be a fairly unstable test statistic in terms of Type 1 error, and works well only for very small J and K. The second test is derived from the likelihood ratio and had previously been explored in the multivariate literature. This test is also quite unstable, but only when a chi-squared distribution is used as the asymptotic approximation; Mitchell, Genton & Gumpertz (2006) provided a Monte Carlo method which yields a much more stable (in terms of Type 1 error) test. The last test forgoes normalizing the test statistic and instead uses an asymptotic distribution given by a weighted sum of chi-squares. This test is very stable and, in the settings considered by Constantinou et al. (2017), exhibited excellent power. Lastly, we note that Aston et al. (2016) provided a testing procedure focused on testing separability in the directions dictated by the eigenfunctions. They utilized a bootstrap approach instead of asymptotic distributions, and their individual tests can be pooled to construct a global test for evaluating separability overall.

Spatio-Temporal Extremes
In this section, we summarize the work of French, Kokoszka, Stoev & Hall (2019), which deals with the computation of probabilities of heat waves. The raw data are spatially indexed time series of daily temperature measurements X(s_k, j), denoting the temperature at spatial location s_k, k = 1, ..., K, on day j. As argued above, due to the natural annual climate cycle, for each site we partition {X(s_k, j)} into years, and view the resulting 365-dimensional vectors as samples from a functional time series X_n(s_k; t). Here, t → X_n(s_k; t) is the temperature curve at site s_k for year n, viewed as a function of time t in days. In contrast to the setting of previous sections, the data used by French et al. (2019) are not historical records, but data generated by a computer climate model. These artificial data are of much higher quality than historical records; there are no gaps, and the daily records are available at 16,100 locations forming a grid covering much of North America. It is, at this point, not clear how to extend the methodology of French et al. (2019) to historical records. The advantage of using computer model data is that they are predicted future temperatures (French et al. 2019 work with the period 2041-2070), which are more relevant to the prediction of future heat waves. On the other hand, these data do depend on a model, while the poor quality, geostatistical historical records are the real data. French et al.
(2019) propose many functionals that can quantify a heat wave, but here we focus on one specific approach that explains the general idea. A heat wave is characterized by its spatial and temporal extents and by its intensity. The intensity is typically quantified by a threshold. Public health concerns call for a fixed threshold, like 105°F. However, in climate studies of large spatial regions, with many climatic zones, such a fixed threshold is not appropriate. Also, the variability of temperatures depends greatly on the geographical location, with coastal locations exhibiting much smaller variability than locations far away from large bodies of water. It is therefore reasonable to work with standardized temperatures

Z_n(s_k; t_i) = (X_n(s_k; t_i) − µ̂(s_k; t_i)) / σ̂(s_k; t_i), (20)

where µ̂(s_k; t_i) and σ̂(s_k; t_i) are, respectively, the estimated mean and standard deviation of the temperature at location s_k on day t_i. If the Z_n(s_k; t_i) exceed a fixed threshold z, e.g. z = 2, for a number of neighboring locations and over a period of time, then we have observed a heat wave (the Z_n(s_k; t_i) are practically normal). The severity of a heat wave increases with the size of the region, the duration, and the threshold z that is exceeded. Suppose the X_n(s_k; t_i) are maximum daily temperatures, and set

Z̄_n(s_k; t_j) = ℓ⁻¹ Σ_{i=j−ℓ+1}^{j} Z_n(s_k; t_i).

This is the standardized average maximum temperature over the ℓ days preceding day t_j. Next, define

Z⋆_n(t_j) = min_{1≤k≤K} Z̄_n(s_k; t_j).

If Z⋆_n(t_j) > z, then the average maximum temperature over ℓ days at each of K (neighboring) locations exceeds z; this corresponds to a heat wave defined by this specific functional. We are interested in the probability of a heat wave in any given year. We assume that this probability does not depend on the year n. We thus want to compute, for some relevant z > 0,

p(z) = P(∃ j : Z⋆_n(t_j) > z) = P(max_j Z⋆_n(t_j) > z).

The concatenated sequence Z⋆(t_j) is stationary and weakly dependent, so (see e.g. Beirlant, Goegebeur, Segers & Teugels 2006, Chapter 10) there are sequences a_J and b_J such that

P(a_J⁻¹ (max_{j≤J} Z⋆(t_j) − b_J) ≤ z) → H(z),

where H is a univariate Generalized Extreme Value distribution function. The function H depends on three parameters, which can be estimated, together with the constants a_J and b_J, using now standard
R implementations.
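The last step, fitting the GEV distribution to a sample of block maxima and reading off an exceedance probability, can be sketched with SciPy. The block below is purely illustrative: it simulates toy annual maxima (500 "years" of 365 iid standard normal "days" standing in for max_j Z⋆_n(t_j)) rather than computing Z⋆ from temperature fields:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(3)
# Toy annual maxima of a standardized daily series.
annual_max = rng.standard_normal((500, 365)).max(axis=1)

# Fit the three-parameter GEV by maximum likelihood and estimate the
# heat-wave probability p(z) = P(annual max > z) at threshold z = 3.
shape, loc, scale = genextreme.fit(annual_max)
p3 = genextreme.sf(3.0, shape, loc=loc, scale=scale)
```

For normal daily data the fitted location parameter lands near the typical maximum of 365 standard normals, and `p3` approximates the true yearly exceedance probability of the threshold z = 3.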
Figure 5 shows examples of regions corresponding to 50, 150 and 450 neighboring locations. Figure 6 shows a map of the probability of a heat wave for d = 50 for three durations ℓ, with (a) corresponding to ℓ = 3, (b) to ℓ = 10, and (c) to ℓ = 30. When ℓ = 3, there is a surprisingly high probability of localized heat waves over the Labrador Peninsula. Such short heat spells may occur with probability approaching 50%, that is, on average every second year. While our EVT approximation may break down for such high probabilities, it is nevertheless obvious that that part of Canada will see heat spells much more frequently than in the past. Generally, we see that the area around Hudson Bay will experience an increased frequency of hot spells lasting a few days. There is a noticeable drop in the probability of such a heat wave around the Rocky Mountain range. The probability is also very low along the Eastern seaboard of the United States. Increasing the duration to ℓ = 10 days dramatically reduces the probability of a heat wave of the corresponding magnitude. The reader will note the different probability scale. Many parts of Canada once again show an increased probability of a heat wave of this magnitude, as do parts of Iowa and Illinois, certain regions in Texas, and, most visibly, the Pacific Ocean off the Southern California coast. Increasing the duration to approximately one month (ℓ = 30) causes the probability of a heat wave to drop even further; generally, throughout North America, heat waves of this magnitude will occur with probability of less than 1%, i.e. once per one hundred years, on average. Over the Canadian plains and the Canadian Rockies, this probability increases only slightly, to about 1.5%. There are two patches, in Arizona and Southern Texas, with probabilities elevated to 2-3%.

Conclusions
This article has focused on aspects of the statistical analysis of functions, and of time series of functions, distributed over space that have attracted our interest. No claim is made that this survey is complete. There have been developments in the analysis of functional data arising in brain and other medical studies that have important spatial aspects, but do not readily fit into the more traditional and well understood paradigms of spatial statistics. One can expect that as new data sets consisting of functions that exhibit some form of spatial dependence arise, new tools may need to be developed to extract useful information.

Figure 1 :
Figure 1: Gray lines: annual temperature curves at 35 locations in Canada (averaged over several decades). Dashed line: simple average of these curves. Continuous black line: mean function estimated by functional kriging.

Figure 2 :
Figure 2: Gray lines represent ionosonde measurements obtained at observatories located in the mid-latitude northern hemisphere, with the scale on the left-hand side. The black line represents the observed solar radio flux, with the scale on the right-hand side.

Figure 3 :
Figure 3: Number of available stations in the mid-latitude northern hemisphere.

Figure 4 :
Figure 4: The spatial field showing the L² distance between the mean log-precipitation before and after 1966. There is an increase in precipitation throughout the year in the area around location 4, and a decrease in the first half of the year in the area around location 1. Locations close to 2 and 3 show neither a large change nor a consistent pattern.
They performed kriging with the exponential covariance model to obtain the heat map shown in Fig. 4. The application of the significance tests confirms that the change over this region is statistically significant.

Figure 5 :
Figure 5: A map of the neighborhood structures for different locations using 50, 150, and 450 nearest neighbors. Each × marks a neighborhood centroid and the sequences of grey shading mark the extents of the increasing neighborhood sizes.