Regular Variation in Hilbert Spaces and Principal Component Analysis for Functional Extremes

Motivated by the increasing availability of data of functional nature, we develop a general probabilistic and statistical framework for extremes of regularly varying random elements X in $L^2[0, 1]$. We place ourselves in a Peaks-Over-Threshold framework where a functional extreme is defined as an observation X whose $L^2$-norm ∥X∥ is comparatively large. Our goal is to propose a dimension reduction framework resulting in finite-dimensional projections for such extreme observations. Our contribution is twofold. First, we investigate the notion of Regular Variation for random quantities valued in a general separable Hilbert space, for which we propose a novel concrete characterization involving solely stochastic convergence of real-valued random variables. Second, we propose a notion of functional Principal Component Analysis (PCA) accounting for the principal ‘directions’ of functional extremes. We investigate the statistical properties of the empirical covariance operator of the angular component of extreme functions, by upper-bounding the Hilbert-Schmidt norm of the estimation error for finite sample sizes. Numerical experiments with simulated and real data illustrate this work.


Introduction
The increasing availability of data of functional nature, together with the various applications that may now rely on such observations, such as predictive maintenance of sophisticated systems (e.g. energy networks, aircraft fleets) or environmental risk assessment (e.g. air quality monitoring), opens new perspectives for Extreme Value Analysis. In particular, massive measurements sampled at an ever finer granularity offer the possibility of observing extreme behaviors, which may carry relevant information for various statistical tasks, e.g. anomaly detection or generation of synthetic extreme examples.
The main purpose of this paper is to develop a general probabilistic and statistical framework for the analysis of extremes of regularly varying random functions in the space $L^2[0, 1]$, the Hilbert space of square-integrable, real-valued functions over [0, 1], with immediate possible generalization to other compact domains, e.g. spatial ones. A major feature of the proposed framework is the possibility to project the observations onto a finite-dimensional functional space, via a modification of the standard functional Principal Component Analysis (PCA) which is suitable for heavy-tailed observations, for which second (or first) moments may not exist.
Recent years have seen a growing interest in the field of Extreme Value Theory (EVT) towards high-dimensional problems and modern applications involving ever more complex datasets. A particularly active line of research concerns unsupervised dimension reduction, for which a variety of methods have been proposed over the past few years, some of them equipped with non-asymptotic statistical guarantees relying on suitable concentration inequalities. Examples of such strategies include identification of a sparse support for the limiting distribution of appropriately rescaled extreme observations (Goix et al. (2017); Simpson et al. (2020); Meyer and Wintenberger (2021); Drees and Sabourin (2021); Cooley and Thibaud (2019); Medina et al. (2021)), graphical modeling and causal inference based on the notion of tail conditional independence (Hitz and Evans (2016); Segers (2020); Gnecco et al. (2021)), and clustering (Chautru (2015); Janßen and Wan (2020); Chiapino et al. (2019)); see also the review paper Engelke and Ivanovs (2021). In these works, the dimension of the sample space, although potentially high, is finite, and dimension reduction is a key step, if not the main purpose, of the analysis. On the other hand, functional approaches in EVT have a long history and are still the subject of recent developments in spatial statistics, see e.g. the recent review by Huser and Wadsworth (2022). For statistical applications, typically for spatial extremes, strong parametric assumptions must be made to make up for the infinite-dimensional nature of the problem. Dimension reduction is then limited to choosing a parametric model of appropriate complexity, and it is not clear how to leverage dimension reduction tools recently developed for multivariate extremes in this setting. The vast majority of existing works in functional extremes consider the continuous case, following in the footsteps of seminal works on max-stable processes (De Haan (1984); De Haan and Ferreira (2006)): the random objects under study are random functions in the space $C[0, 1]^d$, $d \in \mathbb{N}^*$, of continuous functions on the product unit interval, endowed with the supremum norm. Some exceptions exist: the functional Skorokhod space $D[0, 1]^d$ equipped with the $J_1$-topology has been considered in several works (see Davis and Mikosch (2008); Hult and Lindskog (2005) and the references therein), and upper-semicontinuous functions equipped with the Fell topology are considered in Resnick and Roy (1991); Molchanov and Strokorb (2016); Sabourin and Segers (2017); Samorodnitsky and Wang (2019). Again, it is not clear how to perform dimension reduction in these functional spaces.
In the present paper we place ourselves in the Peaks-Over-Threshold (POT) framework: the focus is on the limit distribution of rescaled observations, conditioned upon the event that their norm exceeds a threshold, as this threshold tends to infinity. In the continuous case, an observation is declared extreme whenever its supremum norm is large, i.e. above a high quantile. The limiting process arising in this context is a Generalized Pareto process. In the standard POT framework, the definition of an extreme event depends on the choice of a norm, which may be of crucial importance in applications. As an example, in air quality monitoring for public health matters, it may be more relevant to characterize extreme concentrations of pollutants through an integrated criterion over a full 24-hour period, rather than through the maximum hourly record. This line of thought is the main motivation behind the work of Dombry and Ribatet (2015), who consider alternative definitions of extreme events by means of a homogeneous cost functional, which gives rise to r-Pareto processes. However, the observations are still assumed to be continuous stochastic processes and the framework is no better suited for dimension reduction than those developed in the previously cited works.
A standard hypothesis underlying the POT approach is regular variation (RV), which, roughly, may be seen as an assumption of approximate radial homogeneity regarding the distribution of the random object X under study, conditionally on an excess of the norm ∥X∥ of this object above a high radial threshold. An excellent account of regular variation of multivariate random vectors is given in the monographs Resnick (1987, 2007). In Hult and Lindskog (2006) regular variation is extended to measures on arbitrary complete, separable metric spaces and involves $M_0$-convergence of measures associated with the distribution of rescaled random objects. One characterization of regular variation in this context is via weak convergence of the pseudo-angle $\Theta = \|X\|^{-1} X$ and regular variation of the (real-valued) norm ∥X∥. Namely, the law of Θ given that ∥X∥ > t (t > 0), $\mathcal{L}(\Theta \mid \|X\| > t)$, which we denote by $P_{\Theta,t}$, must converge weakly as t → ∞ towards a limit probability distribution $P_{\Theta,\infty}$ on the unit sphere (see e.g. Segers et al. (2017); Davis and Mikosch (2008)). In the present work we place ourselves in the general regular variation context defined through $M_0$-convergence in Hult and Lindskog (2006), and we focus our analysis on random functions valued in the Hilbert space $L^2[0, 1]$, which has received far less attention (at least in EVT) than the spaces of continuous, semi-continuous or càdlàg functions. One main advantage of the proposed framework, in addition to allowing for rough function paths, is to pave the way for dimension reduction of the observations via functional PCA of the angular component Θ. In this respect, the dimension reduction strategy that we propose may be seen as an extension of Drees and Sabourin (2021), who worked in the finite-dimensional setting and derived finite sample guarantees regarding the eigenspaces of the empirical covariance operator for Θ. However, their techniques of proof cannot be leveraged in the present context because they crucially rely on the compactness of the unit sphere in $\mathbb{R}^d$, while the unit sphere in an infinite-dimensional Hilbert space is not compact.
Several questions arise. First, when dealing with functional observations, the choice of the norm (thus of a functional space) matters, since not all norms are equivalent. In particular, there is no reason why regular variation in one functional space (say, C[0, 1]) would be equivalent to regular variation in a larger space such as $L^2[0, 1]$. Also, a recurrent issue in the context of weak convergence of stochastic processes is to verify tightness conditions in addition to weak convergence of finite-dimensional projections, in order to ensure weak convergence of the process as a whole. The case of Hilbert-valued random variables is no exception (see e.g. Chapter 1.8 in Vaart and Wellner (1996)). A natural question to ask is then: 'What concrete conditions regarding the angular and radial components in a RV/POT framework, which may be verified in practice on specific generative examples or even on real data, are sufficient in order to ensure tightness?'. Regarding the PCA of the angular distribution, one may wonder whether the eigenfunctions associated with the angular covariance operator above finite levels t > 0 indeed converge to the eigenfunctions of the covariance operator associated with the limit distribution $P_{\Theta,\infty}$ under the RV conditions alone, and whether the results of Drees and Sabourin (2021) regarding concentration of the empirical eigenspaces indeed extend to the infinite-dimensional Hilbert space setting.
Extreme Value Analysis of functional PCA with $L^2$-valued random functions has already been considered in the literature, from a quite different perspective however, leaving the above questions unanswered. In Kokoszka and Xiong (2018), the authors assume regular variation of the scores of a principal component decomposition (i.e. the random coordinates of the observations projected onto an $L^2$-orthogonal family) and they investigate the extremal behavior of their empirical counterparts. In Kokoszka et al. (2019) and Kokoszka and Kulik (2023), regular variation is assumed and various convergence results regarding the empirical covariance operators of the random function X (not the angular component Θ) are established, under the condition that the regular variation index belongs to some restricted interval, respectively 2 < α < 4 and 0 < α < 2. In contrast, in the present work the value of the regular variation index is unimportant, as the PCA that we consider is that of the angular component Θ of the random functions. As Θ belongs to a bounded subset of $L^2[0, 1]$, existence of moments of any order is automatically granted. Also, in the existing works mentioned above, regular variation in $L^2[0, 1]$, in the sense of Hult and Lindskog, is taken for granted and no attempt is made to translate the general, abstract definition from Hult and Lindskog (2006) into concrete, finite-dimensional conditions. In Kim and Kokoszka (2022), extremal dependence between the scores of the functional PCA of X is investigated. They prove on this occasion (see Proposition 2.1 therein) that regular variation in $L^2[0, 1]$ implies multivariate regular variation of finite-dimensional projections of X. However, the converse statement is not investigated.
The contribution of the present article is twofold. (i) We provide a comprehensive description of the notion of regular variation in a separable Hilbert space which fits into the framework of Hult and Lindskog (2006). In Section 3, we formulate specific characterizations involving finite-dimensional projections and moments of the angular variable Θ, and we discuss the relationships between regular variation in C[0, 1] and in $L^2[0, 1]$. It turns out that the former implies the latter, whereas the converse is not true. We provide several examples and counter-examples illustrating our statements. (ii) We make a first step towards bridging the gap between dimension reduction approaches and functional extremes by considering the functional PCA of the angular variable Θ. In Section 4, we investigate the convergence of the non-asymptotic covariance operator associated with the distribution $P_{\Theta,t}$. In the situation where n ≥ 1 independent realizations of the random function X can be observed, we additionally provide statistical guarantees regarding empirical estimation of the sub-asymptotic covariance operator associated with a radial threshold $t_{n,k}$, the 1 − k/n quantile of the radial variable ∥X∥, in the form of concentration inequalities regarding the Hilbert-Schmidt norm of the estimation error, whose leading terms involve the number k ≤ n of extreme order statistics considered to compute the estimator. These bounds, combined with regular variation of the observed random function X and the results from the preceding section, ensure in particular the consistency of the empirical estimation procedure. In Section 5 we present experimental results involving real and simulated data illustrating the relevance of the proposed dimension reduction framework. Certain technical details are deferred to the Appendix.
For clarity, we start off by recalling some necessary background regarding probability and weak convergence in Hilbert spaces, functional PCA and regular variation.

Background and Preliminaries
To begin, we recall some key facts about regular variation in metric spaces, probability in Hilbert spaces and Principal Component Analysis of random elements of a Hilbert space. Here and throughout, the indicator function of any event E is denoted by 1{E}. The Dirac mass at any point a is written as $\delta_a$ and the integer part of any real number u as ⌊u⌋. By $L^2[0, 1]$ is meant the Hilbert space of square-integrable, real-valued functions f : [0, 1] → R equipped with its standard inner product $\langle f, g \rangle = \int_0^1 f(s) g(s)\, ds$ and the $L^2$-norm $\|f\| = (\int_0^1 f^2(s)\, ds)^{1/2}$. Our results are valid in any separable Hilbert space H, for which, with a slight abuse of notation, we use the same notation for the scalar product and the norm as in the special case of $L^2[0, 1]$ when it is clear from the context. Also we write $H_0 = H \setminus \{0\}$. Finally, the arrow $\xrightarrow{w}$ stands for weak convergence of Borel probability measures: $P_n \xrightarrow{w} P$ if and only if $\int f\, dP_n \to \int f\, dP$ as n → +∞ for any bounded, continuous function f defined on the same space as $P_n$ and P.
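In numerical work, elements of the space are typically available only through their values on a grid. The sketch below approximates the inner product and norm recalled above with the trapezoidal rule; the grid size and the two sine functions are illustrative assumptions, not objects taken from the paper.

```python
import numpy as np

def l2_inner(f_vals, g_vals, grid):
    """Trapezoidal approximation of <f, g> = int_0^1 f(s) g(s) ds."""
    h = f_vals * g_vals
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(grid)))

def l2_norm(f_vals, grid):
    """Approximation of the L2-norm ||f|| = <f, f>^(1/2)."""
    return float(np.sqrt(l2_inner(f_vals, f_vals, grid)))

# Illustrative grid and functions (assumptions for this sketch only).
grid = np.linspace(0.0, 1.0, 2001)
e1 = np.sqrt(2.0) * np.sin(np.pi * grid)        # unit norm in L2[0, 1]
e2 = np.sqrt(2.0) * np.sin(2.0 * np.pi * grid)  # orthogonal to e1

print(l2_norm(e1, grid))       # close to 1
print(l2_inner(e1, e2, grid))  # close to 0
```

Any other quadrature rule could be substituted; the trapezoidal rule is used here only because it needs nothing beyond NumPy.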

Regular Variation in Euclidean and Metric Spaces
We recall here the main features of Regular Variation in metric spaces, a framework introduced by Hult and Lindskog (2006) as a generalization of the Euclidean case documented e.g. in Resnick (1987); Bingham et al. (1989).
Let (E, d) be a complete separable metric space, endowed with a multiplication by positive real numbers t > 0, such that the mapping $(t, x) \in \mathbb{R}_+ \times E \mapsto tx$ is continuous. One must assume the existence of an origin 0 ∈ E, such that 0x = 0 for all x ∈ E. In the present work we shall take E = H, a separable, real Hilbert space, and 0 will simply be the zero of H. Let $E_0 = E \setminus \{0\}$. For any subset A ⊂ E and t > 0, we write tA = {tx : x ∈ A}. Denote by $C_0$ the set of bounded and continuous real-valued functions on $E_0$ which vanish in some neighborhood of 0, and let $M_0$ be the class of Borel measures on $E_0$ which are finite on each Borel subset of $E_0$ bounded away from 0. Then the sequence $\mu_n$ converges to µ in $M_0$, and we write $\mu_n \xrightarrow{M_0} \mu$, if $\int f\, d\mu_n \to \int f\, d\mu$ for any $f \in C_0$. A measurable function f : R → R is called regularly varying with index ρ, and we write $f \in RV_\rho$, whenever for any x > 0 the ratio $f(tx)/f(t) \to x^\rho$ as t → ∞. A measure ν in $M_0$ is regularly varying if and only if there exists a nonzero measure µ in $M_0$ and a regularly varying function b such that $b(t)\, \nu(t\, \cdot) \xrightarrow{M_0} \mu(\cdot)$ as t → ∞. (2.1) The limit measure is necessarily homogeneous: for all t > 0 and every Borel subset A of $E_0$, $\mu(tA) = t^{-\alpha} \mu(A)$ for some α > 0. Then we say that ν is regularly varying with index −α and we write $\nu \in RV_{-\alpha}$. A random element X valued in E (that is, a Borel measurable map from some probability space (Ω, A, P) to E) is called regularly varying with index α > 0 if its probability distribution is in $RV_{-\alpha}$. In this case, one writes $X \in RV_{-\alpha}(E)$. A convenient characterization of regular variation of a random element X is obtained through a polar decomposition. Let r(x) = d(x, 0) for x ∈ E.
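The definition of a regularly varying function recalled above can be probed numerically. The following sketch uses a hypothetical function, $t^\rho \log(1 + t)$ with ρ = −1.5 (an illustrative choice, not taken from the paper), and shows that the ratio f(tx)/f(t) approaches $x^\rho$, although slowly because of the logarithmic factor.

```python
import numpy as np

def rv_ratio(f, t, x):
    """Ratio f(t x) / f(t), which approaches x**rho when f is in RV_rho."""
    return f(t * x) / f(t)

rho = -1.5  # illustrative index (assumption)

def f(t):
    # t**rho multiplied by the slowly varying factor log(1 + t)
    return t**rho * np.log1p(t)

x = 2.0
thresholds = [1e2, 1e4, 1e8, 1e100]
# Relative deviation of the ratio from its limit x**rho at each threshold.
rel_errors = [abs(rv_ratio(f, t, x) / x**rho - 1.0) for t in thresholds]

for t, err in zip(thresholds, rel_errors):
    print(f"t = {t:.0e}: relative error = {err:.4f}")
```

The slow decay of the error illustrates why regular variation is an asymptotic statement: at any finite threshold the slowly varying factor still contributes.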
For simplicity, and because it is true in the Hilbert space framework that is our main concern, we focus on the case where the distance to 0 is homogeneous, although this assumption can be relaxed, as in Segers et al. (2017). Notice that in H, r(x) = ∥x∥. Introduce a pseudo-angular variable Θ = θ(X), where for $x \in E_0$, $\theta(x) = r(x)^{-1} x$, and let R = r(X). Denote by S the unit sphere in E relative to r, S = {x ∈ E : r(x) = 1}, equipped with the trace Borel σ-field $\mathcal{B}(S)$. The map $x \mapsto (r(x), \theta(x))$, from $E_0$ to $(0, \infty) \times S$, is the polar decomposition. A key quantity throughout this work will be the conditional distribution of the angle given that R > t, for which we introduce the notation $P_{\Theta,t}(\cdot) = P(\Theta \in \cdot \mid R > t)$. Several equivalent characterizations of regular variation of X have been proposed in Segers et al. (2017) in terms of the pair (R, Θ), thus extending classical characterizations in the multivariate setting, see Resnick (2007). In particular, the next statement will prove useful in the subsequent analysis.
Proposition 2.1 (Proposition 3.1 in Segers et al. (2017)). A random element X in E is regularly varying with index α > 0 if and only if conditions (i) and (ii) below are simultaneously satisfied: (i) the radial variable R is regularly varying in R with index α; (ii) there exists a probability distribution $P_{\Theta,\infty}$ on the sphere $(S, \mathcal{B}(S))$ such that $P_{\Theta,t} \xrightarrow{w} P_{\Theta,\infty}$ as t → ∞.
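The polar decomposition and the conditional angular law given R > t have a direct empirical counterpart. The sketch below represents each observation by its coordinates in an orthonormal basis (a finite-dimensional stand-in for H), computes radii and angles, and keeps the angles whose radius exceeds a threshold. The simulated data (Pareto radii, uniformly distributed directions) and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def polar_decomposition(X):
    """Return radii R = ||x|| and angles Theta = x / ||x|| for the rows of X.

    Rows are assumed to be nonzero coordinate vectors in an orthonormal basis.
    """
    R = np.linalg.norm(X, axis=1)
    Theta = X / R[:, None]
    return R, Theta

def angles_above_threshold(X, t):
    """Angles of the observations with radius above t (a sample from the
    empirical version of the conditional law of Theta given R > t)."""
    R, Theta = polar_decomposition(X)
    return Theta[R > t]

# Illustrative data (assumption): Pareto(alpha) radii, independent directions.
rng = np.random.default_rng(0)
n, d, alpha = 5000, 20, 2.0
radius = (1.0 - rng.random(n)) ** (-1.0 / alpha)
direction = rng.standard_normal((n, d))
direction /= np.linalg.norm(direction, axis=1, keepdims=True)
X = radius[:, None] * direction

t = np.quantile(radius, 0.95)          # high empirical quantile of the radius
Theta_ext = angles_above_threshold(X, t)
print(Theta_ext.shape)                 # about 5% of the sample is retained
```

In this toy model the radius and the angle are independent, so the conditional angular law does not depend on t; in general it does, which is exactly what condition (ii) controls.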

Probability and Weak Convergence in Hilbert Spaces
Most of the background gathered in this section may be found with detailed proofs, references and discussions in the monograph Hsing and Eubank (2015), which provides a self-contained introduction to the mathematical foundations of functional data analysis. Other helpful resources regarding probability and measure theory in Banach spaces and Bochner integrals include Vakhania et al. (2012) or Mikusiński (1978).
Probability in Hilbert spaces. Consider a separable Hilbert space (H, ⟨·, ·⟩) and denote by ∥·∥ the associated norm. Let $(e_i)_{i \ge 1}$ be any orthonormal basis of H. Since a separable Hilbert space is a particular instance of a Polish space, it follows from basic measure theory (see e.g. Vakhania et al. (2012), Theorem 1.2) that the Borel σ-field $\mathcal{B}(H)$ is generated by the family of mappings $\{h^* : x \mapsto \langle x, h \rangle,\ h \in H\}$, or in other words, by the class of cylinders $\{x \in H : (\langle x, h_1 \rangle, \dots, \langle x, h_N \rangle) \in B\}$, where $N \ge 1$, $h_1, \dots, h_N \in H$ and $B \in \mathcal{B}(\mathbb{R}^N)$. In addition, since the countable family $(e_i^*)_{i \ge 1}$ separates points in H, it also generates the Borel σ-field, see Proposition 1.4 and its corollary in Vakhania et al. (2012). In other words, if we denote by $\pi_N$ the projection from H to $\mathbb{R}^N$ onto the first N ≥ 1 basis vectors, $\pi_N(x) = (\langle x, e_1 \rangle, \dots, \langle x, e_N \rangle)$, the family of cylinder sets $\mathcal{C} = \{\pi_N^{-1}(B) : N \ge 1,\ B \in \mathcal{B}(\mathbb{R}^N)\}$ generates $\mathcal{B}(H)$. We call H-valued random element (or variable) any Borel-measurable mapping X from a probability space (Ω, A, P) to H. A mapping X : Ω → H is Borel-measurable if and only if the real-valued projections ⟨X, h⟩ are Borel-measurable for all h ∈ H, and the distribution of X is entirely characterized by the distributions of these univariate projections, see Lemma 1.8.3 in Vaart and Wellner (1996) or Theorem 7.1.2 in Hsing and Eubank (2015). Since the family $\mathcal{C}$ of cylinder sets is a π-system generating $\mathcal{B}(H)$, it is also true that the distributions of all finite-dimensional projections $(\pi_N(X), N \in \mathbb{N})$ onto a given basis determine the distribution of X.
Integrability conditions for random elements in H are understood here in the Bochner sense. A random element X : (Ω, A, P) → H is Bochner-integrable if E∥X∥ < ∞, in which case its expectation E[X] is a well-defined element of H. A key property of the classic expectation is linearity, which is also satisfied by the expectation defined in the Bochner sense. Namely, if T is a bounded linear operator from $H_1$ to $H_2$, two Hilbert spaces, and if X is a Bochner-integrable random element in $H_1$, then T X is Bochner-integrable in $H_2$ and E[T X] = T E[X], see Hsing and Eubank (2015). Many other properties of the classic expectation of real-valued random variables are preserved, e.g. the dominated convergence theorem. In particular, a version of Jensen's inequality can be formulated for H-valued random variables, see e.g. pp. 42-43 in Ledoux and Talagrand (1991).
Weak convergence of H-valued random elements. As our main concern in Section 3 is to characterize regular variation in Hilbert spaces in terms of weak convergence of appropriately rescaled variables, we recall some basic facts regarding weak convergence in Hilbert spaces. Most of the material recalled next for the sake of completeness can be found in more detail in Chapter 1.8 of Vaart and Wellner (1996) and Chapter 7 of Hsing and Eubank (2015).
By definition, a sequence $(X_n)_{n \in \mathbb{N}}$ of H-valued random variables converges weakly (or converges in distribution) to an H-valued random variable X, and we write $X_n \xrightarrow{w} X$ (or equivalently, $\mu_n \xrightarrow{w} \mu$ if $\mu_n$ denotes the probability distribution of $X_n$ and µ that of X), if and only if, for every bounded, continuous function f : H → R, $E f(X_n) \to E f(X)$ as n → ∞. This abstract definition may be difficult to handle for verifying weak convergence in specific examples. However, weak convergence in H may equivalently be characterized via weak convergence of one-dimensional projections and an asymptotic tightness condition, as described next. Notice that, because H is separable and complete, the Prokhorov Theorem applies, i.e. uniform tightness and relative compactness of a family of probability measures are equivalent. Recall that a sequence of probability measures $(\mu_n)_{n \in \mathbb{N}}$ is uniformly tight if for every ε > 0, there exists a compact set K ⊂ H such that $\mu_n(K) \ge 1 - \varepsilon$ for all n ∈ N. Notice also that, because H is separable and complete, any single random element valued in H is tight, see Lemma 1.3.2 in Vaart and Wellner (1996).
Remark 2.1 (On measurability and tightness). Before proceeding any further, in order to clear up any potential confusion, we emphasize that measurability of the considered maps $X_n$ : Ω → H is not required in Vaart and Wellner (1996), while it is assumed in the present work, in which we follow common practice in functional data analysis focusing on Hilbert-valued observations (as e.g. in Hsing and Eubank (2015)). Notice also that the notion of tightness employed in Vaart and Wellner (1996) as a criterion for relative compactness of a family of random variables $(X_n, n \in \mathbb{N})$ is asymptotic tightness, that is: for all ε > 0, there exists a compact subset K of H such that for every δ > 0, $\liminf_{n \to \infty} P(X_n \in K^\delta) > 1 - \varepsilon$. Here, $K^\delta$ denotes the δ-enlargement of the compact set K, that is, $\{x \in H : \inf_{y \in K} \|x - y\| < \delta\}$. This is seemingly at odds with other presentations (Prokhorov (1956); Hsing and Eubank (2015)) where the argument is organized around the standard notion of uniform tightness, recalled above. However, in a Polish space such as H, the two notions of tightness (asymptotic or uniform) are equivalent (Vaart and Wellner (1996), Problem 1.3.9), so that the presentations of Vaart and Wellner (1996) and Hsing and Eubank (2015) are in fact closer to each other than they may appear at first sight.
A convenient criterion, which is the main ingredient to ensure tightness (hence relative compactness) of a family of H-valued random variables, is termed asymptotic finite-dimensionality in Vaart and Wellner (1996) and seems to originate from Prokhorov (1956). A sequence $(X_n)_{n \in \mathbb{N}}$ of H-valued random variables is asymptotically finite-dimensional if, given a Hilbert basis $(e_i, i \ge 1)$ as above, for all ε, δ > 0, there exists a finite subset $I \subset \mathbb{N}^*$ such that $\limsup_{n \to \infty} P\big(\sum_{i \notin I} \langle X_n, e_i \rangle^2 > \delta\big) < \varepsilon$. (2.3) It should be noticed that the above property does not depend on the specific choice of a Hilbert basis. Asymptotic finite-dimensionality combined with uniform tightness of all univariate projections of the kind $\langle X_n, h \rangle$, h ∈ H, is a sufficient condition for uniform tightness of the family of random variables $(X_n)_{n \in \mathbb{N}}$ (see Hsing and Eubank (2015), Theorem 7.7.4). Also, since knowledge of the distributions of all univariate projections characterizes the distribution of a random Hilbert-valued variable X, asymptotic finite-dimensionality combined with weak convergence of univariate projections (or of finite-dimensional ones on a fixed basis) is sufficient to prove weak convergence of a family of random elements in H, as summarized in the next statement.
Theorem 2.1 (Characterization of weak convergence in H). A net of H-valued random variables $(X_t)_{t \in \mathbb{R}}$ converges in distribution to a random variable X if and only if it is asymptotically finite-dimensional and either one of the two conditions below holds:
1. $\langle X_t, h \rangle$ converges in distribution to ⟨X, h⟩ as t → ∞, for every h ∈ H;
2. $\pi_N(X_t)$ converges in distribution to $\pi_N(X)$ as t → ∞, for every N ≥ 1.
The fact that asymptotic finite-dimensionality together with Condition 1. in the statement implies weak convergence results from Theorem 1.8.4 in Vaart and Wellner (1996) in the case where all mappings are measurable. To see that Condition 1. may be replaced with Condition 2. in order to prove weak convergence, note that asymptotic finite-dimensionality implies uniform tightness in the case of a Hilbert space (see Remark 2.1 above). Hence, weak convergence occurs if any two subsequential limits coincide. This is the case under Condition 2. because the family of cylinder sets $\mathcal{C}$ is a measure-determining class.

Principal Component Analysis of H-valued Random Elements
We recall the necessary definitions and mathematical background underlying principal component decomposition of H-valued random elements. A self-contained exposition of the topic may be found in Hsing and Eubank (2015), Chapter 7. In the sequel we use interchangeably the terminology principal component decomposition or principal component analysis (functional PCA, or PCA for short). Because of its optimality properties in terms of $L^2$-error when $H = L^2[0, 1]$, functional PCA is widely used for a great variety of statistical purposes in functional data analysis. A standard reference on this topic is the monograph Ramsay and Silverman (2005).
On H, a separable real Hilbert space as above, and for $(f, g) \in H^2$, the tensor product f ⊗ g is the linear operator on H defined by (f ⊗ g)(h) = ⟨f, h⟩g. Direct calculations show that f ⊗ g is a Hilbert-Schmidt operator with Hilbert-Schmidt norm $\|f \otimes g\|_{HS(H)} = \|f\| \|g\|$. We recall that a linear operator T on H is Hilbert-Schmidt if, given a Hilbert basis $(e_i)_{i \ge 1}$, we have $\sum_{i \in \mathbb{N}^*} \|T e_i\|^2 < \infty$. The square root of the latter quantity is then the Hilbert-Schmidt norm of T, denoted by $\|T\|_{HS(H)}$, and it does not depend on the choice of the Hilbert basis. Hilbert-Schmidt operators are compact, and the space HS(H) of Hilbert-Schmidt operators on H, equipped with $\langle \cdot, \cdot \rangle_{HS(H)}$, the scalar product associated with the HS(H) norm, is itself a separable Hilbert space.
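In finite dimension, the tensor product and the Hilbert-Schmidt norm reduce to an outer product and the Frobenius norm. The sketch below checks the two facts stated above on random vectors of $\mathbb{R}^d$, used here as a stand-in for coordinates in a Hilbert basis; the dimension and the random inputs are arbitrary illustrative choices.

```python
import numpy as np

def tensor_product(f, g):
    """Matrix of the operator f (x) g : h -> <f, h> g in an orthonormal basis."""
    return np.outer(g, f)

def hs_norm(T):
    """Hilbert-Schmidt norm: square root of sum_i ||T e_i||^2."""
    return float(np.sqrt(np.sum(T**2)))

# Arbitrary illustrative vectors (assumption).
rng = np.random.default_rng(0)
d = 50
f, g, h = rng.standard_normal((3, d))

T = tensor_product(f, g)
print(np.allclose(T @ h, (f @ h) * g))   # defining property of f (x) g
print(np.isclose(hs_norm(T), np.linalg.norm(f) * np.linalg.norm(g)))
```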
Let X be an H-valued random element as above and assume that $E\|X\|^2 < \infty$. Then also $E\|X \otimes X\|_{HS(H)} < \infty$, so that the tensor product inside the expectation is Bochner-integrable and one may define the (non-centered) covariance operator $C = E[X \otimes X]$. (2.4) By construction C is self-adjoint and C ∈ HS(H), thus C is compact. Also, by linearity of Bochner integration, for any $(h, g) \in H^2$, we have $\langle C h, g \rangle = E[\langle X, h \rangle \langle X, g \rangle]$. A key result in functional PCA is the eigendecomposition of the covariance operator (see Theorem 7.2.6 from Hsing and Eubank (2015) regarding the centered covariance operator, which is also valid for the non-centered one): $C = \sum_{i \ge 1} \lambda_i\, \varphi_i \otimes \varphi_i$, (2.5) where $\lambda_1 \ge \lambda_2 \ge \dots$ are the eigenvalues sorted in decreasing order and the $\varphi_i$'s are orthonormal eigenvectors. The set of nonzero eigenvalues $\lambda_i$ is either finite, or a sequence of nonnegative numbers converging to zero. The nonzero eigenvalues have finite multiplicity. The eigenfunctions $\varphi_i$ form a Hilbert basis of Im(C). As is the case for the centered version of C, the decomposition (2.5) immediately derives from the spectral theorem for compact, self-adjoint operators and the fact that C is nonnegative definite.
A useful property of the eigenfunctions $(\varphi_i)_{i \ge 1}$ is that they allow perfect signal reconstruction, since almost surely, X may be decomposed as $X = \sum_{i \ge 1} \langle X, \varphi_i \rangle \varphi_i$, (2.6) see Theorem 7.2.7 in Hsing and Eubank (2015). The scores $Z_i = \langle X, \varphi_i \rangle$ satisfy $E[Z_i^2] = \lambda_i$ and $E[Z_i Z_j] = 0$ for i ≠ j, so that the expansion (2.6) is called bi-orthogonal. For all N ≥ 1, the truncated expansion $\sum_{i \le N} \langle X, \varphi_i \rangle \varphi_i$ is optimal in the sense that it minimizes the integrated mean-squared error $E\|X - \sum_{i \le N} \langle X, u_i \rangle u_i\|^2$ over all orthonormal collections $(u_1, \dots, u_N)$ of H. The tail behavior of the (summable) eigenvalue sequence $(\lambda_i)_{i \ge 1}$ describes the optimal N-term approximation error, insofar as $E\|X - \sum_{i \le N} \langle X, \varphi_i \rangle \varphi_i\|^2 = \sum_{i > N} \lambda_i$. Notice that in the present paper we consider the non-centered covariance operator, mainly for the purpose of simplifying notation. We refer the reader to Cadima and Jolliffe (2009) for a comparison of centered and uncentered PCA.
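The eigendecomposition of the non-centered covariance operator and the link between the truncation error and the tail of the eigenvalue sequence can be checked on an empirical distribution, for which both identities hold exactly. The sketch below works with coordinate vectors in $\mathbb{R}^d$; the simulated data (Gaussian coordinates with decaying scales) and the function names are hypothetical and only serve the illustration.

```python
import numpy as np

def noncentered_covariance(X):
    """Empirical non-centered covariance (1/n) sum_i x_i (x) x_i."""
    return X.T @ X / X.shape[0]

def eigen_decomposition(C):
    """Eigenvalues in decreasing order with matching orthonormal eigenvectors."""
    lam, phi = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return lam[order], phi[:, order]

# Illustrative data (assumption): coordinates with decaying scales 1/i.
rng = np.random.default_rng(1)
n, d, N = 500, 30, 5
X = rng.standard_normal((n, d)) / np.arange(1, d + 1)

C = noncentered_covariance(X)
lam, phi = eigen_decomposition(C)

scores = X @ phi[:, :N]                   # Z_i = <X, phi_i>, i <= N
X_hat = scores @ phi[:, :N].T             # N-term truncated expansion
mse = float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))
tail = float(np.sum(lam[N:]))

print(np.isclose(mse, tail))              # truncation error equals eigenvalue tail
print(np.allclose(scores.T @ scores / n, np.diag(lam[:N])))  # uncorrelated scores
```

Because the covariance is not centered, the leading eigenvector may partly capture the mean of the data; this is the trade-off discussed in the comparison of centered and uncentered PCA cited above.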
Remark 2.2 (Functional PCA and Karhunen-Loève expansion). The functional PCA framework is closely related to the celebrated Karhunen-Loève expansion in the case where $H = L^2[0, 1]$. However, the two terms refer to subtly different frameworks, which deserves an explanation. The former framework (which is the one preferred in this work) relies on an H-valued random element X, with standard results concerning convergence of the expansions of X and its covariance operator in the Hilbert norm and Hilbert-Schmidt norm, respectively, recalled above. Then the trajectories of X are in fact equivalence classes of square-integrable functions and the specific value $X_s(\omega)$ of a realisation X(ω) at s ∈ [0, 1] is only defined almost everywhere. In contrast, the latter (Karhunen-Loève) framework relies on a second-order stochastic process $X = (X_s, s \in [0, 1])$, that is, a collection of random variables, which is continuous in quadratic mean with respect to the index s. Then one must impose additional joint measurability conditions on the mapping $(\omega, s) \mapsto X_s(\omega)$ in order to ensure that the process X is indeed an H-valued random element. In such a case the mean functions and the covariance operators defined both ways coincide. Also, the celebrated Karhunen-Loève Theorem (Loève (1978)) ensures convergence in quadratic mean of the expansion of $X_s$, uniformly over s ∈ [0, 1]. In order to avoid another layer of technicality, and because our main interest indeed lies in the eigenspaces of covariance operators rather than in pointwise reconstruction of the functions, we adopt in the present work the view where X is an H-valued random element, although additional joint measurability assumptions may be imposed in order to fit into the Karhunen-Loève framework.

Regular Variation in Hilbert Spaces
As a warm up we discuss a classic example in EVT, a multivariate multiplicative model within the framework of the multiplicative Breiman's lemma (Basrak et al. (2002), Proposition A.1) for which RV may be easily proved using existing general characterizations such as Equation (2.1).This example will serve as a basis for our simulated data example in Section 5.
Example 3.1. Let $Z = (Z_1, \dots, Z_d) \in \mathbb{R}^d$ be regularly varying with index α > 0 and limit measure µ, and let $A = (A_1, \dots, A_d)$ be a random vector of H-valued variables $A_i$, independent of Z, such that E (
Proof. In their Proposition A.1, Basrak et al. (2002) consider the case where $A_j \in \mathbb{R}^q$ and $A = (A_1, \dots, A_d)$ is a q × d matrix. In the proof, they use the operator norm for A, but because all norms are equivalent in that case, their argument remains valid with the finite-dimensional Hilbert-Schmidt norm. In this finite-dimensional context, ∥A∥ is equal to $(\sum_{j=1}^d \|A_j\|_2^2)^{1/2}$, where $\|\cdot\|_2$ is the Euclidean norm. An inspection of the arguments in their proof shows that they also apply to the case where $A_j \in H$, up to replacing $\|A_j\|_2$ with $\|A_j\|_H$ and ∥A∥ with $(\sum_{j=1}^d \|A_j\|_H^2)^{1/2}$. In particular Pratt's lemma is applicable because Fatou's Lemma is valid for nonnegative Hilbert space valued functions.
The remainder of this section aims at providing some insight into specific properties of RV in H, as compared with RV in general separable metric spaces as introduced by Hult and Lindskog (2006) or, at the other end of the spectrum, RV in a Euclidean space. On the one hand, we focus on possible finite-dimensional characterizations of RV in H, with a view towards statistical applications in which abstract convergence conditions in an infinite-dimensional space cannot be verified on real data, while finite-dimensional conditions may serve as a basis for statistical tests. Although we do not go as far as proposing such rigorous statistical procedures, we do suggest in the experimental section some convergence diagnostics relying on the results gathered in this section. On the other hand, we discuss the relationships existing between RV in C[0, 1] and RV in $H = L^2[0, 1]$.
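A multiplicative model of this kind is straightforward to simulate on a grid. The sketch below assumes that the random function has the form $X = \sum_{j=1}^d Z_j A_j$, which is the functional analogue of the matrix product considered by Basrak et al. (2002); this form, the i.i.d. Pareto coordinates for Z, and the sine-shaped random functions $A_j$ are all assumptions made for illustration and are not claimed to match the simulation design of Section 5.

```python
import numpy as np

def simulate_multiplicative_model(n, d, grid, alpha, rng):
    """Simulate n discretized functions X = sum_j Z_j A_j (assumed form).

    Z has i.i.d. Pareto(alpha) coordinates (a regularly varying vector in R^d);
    A_j(s) = U_j * sqrt(2) * sin(j * pi * s) with U_j uniform on [0.5, 1.5],
    drawn independently of Z. Both choices are hypothetical.
    """
    Z = (1.0 - rng.random((n, d))) ** (-1.0 / alpha)
    freqs = np.arange(1, d + 1)
    basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(freqs, grid))  # (d, len(grid))
    U = rng.uniform(0.5, 1.5, size=(n, d))
    A = U[:, :, None] * basis[None, :, :]                         # (n, d, len(grid))
    X = np.einsum("nj,njs->ns", Z, A)                             # sum over j
    return X, Z, A

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
X, Z, A = simulate_multiplicative_model(n=1000, d=4, grid=grid, alpha=2.0, rng=rng)
print(X.shape)
```

Since the $A_j$ are bounded here, any moment condition on A is trivially satisfied, so the heavy tail of the simulated functions comes entirely from Z.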

Finite-dimensional Characterizations of Regular Variation in H
RV random elements in H have been present in the literature for a long time, due to strong connections between RV and domains of attraction of stable laws in general and in separable Hilbert spaces in particular. As an example, Kuelbs and Mandrekar (1974) show (through their Lemma 4.1 and their Theorem 4.11) that a random element in H which is in the domain of attraction of a stable law with type $0 < \alpha < 2$ is regularly varying. However, this connection does not yield any finite-dimensional characterizations, which are our main focus here.
As a first step, we recall Proposition 2.1 from Kim and Kokoszka (2022), which makes a first connection between regular variation in H and regular variation of finite-dimensional (fidi, in abbreviated form) projections. Let $(e_i, i \in \mathbb{N})$ be a complete orthonormal system in H. For $I = (i_1, \dots, i_N)$ a finite set of indices with cardinality $N \ge 1$, denote by $\pi_I$ the 'coordinate projection' on the associated finite family, $\pi_I(x) = (\langle x, e_{i_1}\rangle, \dots, \langle x, e_{i_N}\rangle)$, $x \in H$. In particular we denote by $\pi_N : H \to \mathbb{R}^N$ the projection onto the first $N$ elements of the basis $(e_i, i \in \mathbb{N})$.
Proposition 3.1 (RV in H implies multivariate RV of fidi projections). If $X$, a random element of H, is regularly varying with index $\alpha > 0$, then for every finite index set $I$ of cardinality $N$, the projection $\pi_I(X)$ is regularly varying in $\mathbb{R}^N$ with index $\alpha$.

One natural question to ask is whether the converse of Proposition 3.1 is true. We answer in the negative in Proposition 3.2 below.

Proposition 3.2 (Multivariate RV of fidi projections does not imply RV in H). The converse of Proposition 3.1 is not true. In particular, there exists a random element $X$ in H which is not RV, while all its finite-dimensional projections $\pi_N(X)$, $N \in \mathbb{N}$, are RV.

Sketch of proof. We construct a random element $X$ in H in such a way that the probability mass of its angular component $\Theta$, given the radial component $R$, escapes at infinity as $R$ grows. Here, 'at infinity' must be understood as $\mathrm{span}(e_i, i \ge M)$ as $M \to \infty$. Namely, let $X := R\Theta$ with radial component $R = \|X\| \sim \mathrm{Pareto}(\alpha)$ on $[1, +\infty[$ (i.e. $\forall t \ge 1$, $P(R \ge t) = t^{-\alpha}$) and define the conditional distribution of $\Theta$ given $R$ as a mixture of Dirac masses at the basis vectors: for $i \le R$, we have $\Theta = e_i$ with probability proportional to $1/i$. The remainder of the proof, deferred to the Appendix, consists in verifying that (i) all finite-dimensional projections of $X$ are RV; (ii) asymptotic finite-dimensionality (see Equation (2.3)) of the family of conditional distributions $P_{\Theta,t}$ does not hold, hence it may not converge to any limit distribution, so that Condition (ii) from Proposition 2.1 does not hold and $X$ may not be RV.
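The escape of angular mass in this counter-example can be checked numerically. The sketch below is only an illustration under stated assumptions: it conditions on $R = r$ (rather than on $R > t$), it assumes that the mixture is supported on the integers $i \le \lfloor r \rfloor$, and the helper names `harmonic` and `mass_on_first_coordinates` are hypothetical. It computes the conditional probability that $\Theta$ lies in $\mathrm{span}(e_1, \dots, e_M)$, which is a ratio of partial harmonic sums.

```python
import numpy as np


def harmonic(m: int) -> float:
    """Partial harmonic sum 1 + 1/2 + ... + 1/m."""
    return float(np.sum(1.0 / np.arange(1, m + 1)))


def mass_on_first_coordinates(r: float, M: int) -> float:
    """P(Theta in span(e_1, ..., e_M) | R = r) in the counter-example.

    Assumption (illustrative): given R = r >= 1, Theta = e_i with
    probability proportional to 1/i for the integers i <= floor(r).
    """
    m = int(np.floor(r))
    return harmonic(min(M, m)) / harmonic(m)


if __name__ == "__main__":
    # The mass carried by the first 10 coordinates shrinks as r grows.
    for r in (10, 1e3, 1e5):
        value = mass_on_first_coordinates(r, M=10)
        print(f"r = {r:>8.0f}: mass on first 10 coordinates = {value:.3f}")
```

For any fixed $M$ the ratio decays like $\log M / \log r$, so no finite-dimensional subspace retains the angular mass in the limit, although the decay is slow.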
The counter-example above suggests that the missing assumption to obtain RV in H is some relative compactness criterion. This is partly confirmed in the next example, where the angular variable $\Theta_t$ again follows a mixture model supported by the $e_i$'s, but where the probability mass of the conditional distribution of $\Theta$ given $\|X\|$ concentrates around finite-dimensional spaces. The proof, postponed to the Appendix, proceeds by verifying both conditions from Proposition 2.1.
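In words, $\Theta \in \{e_1, e_2, \dots\}$ and the conditional probability that $\Theta = e_j$ given $R = r$ is specified for all $r \ge 1$ and all $j \in \mathbb{N}^*$ such that $j \le r$. Then, the random element $X = R\Theta$ is regularly varying in H with index $\alpha$, with limit angular random variable $\Theta_\infty$ such that $P(\Theta_\infty = e_j) = 6/(\pi j)^2$, $j \in \mathbb{N}^*$.

The next proposition confirms the intuition built up by the above examples that asymptotic finite-dimensionality is a necessary additional assumption to RV of finite-dimensional projections and of the norm.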
Proposition 3.3. Let $X$ be an H-valued random element. The three conditions below are equivalent.
1. $X$ is regularly varying in H with index $\alpha > 0$, limit measure $\mu$ and normalizing sequence $b_n > 0$, i.e. $\mu_n(\cdot) := n P(b_n^{-1} X \in \cdot) \to \mu(\cdot)$ in $M_0(H)$ as $n \to \infty$.
2. The family of measures $(\mu_n)_{n \ge 1}$ is relatively compact in the $M_0(H)$-topology, and for all $N \in \mathbb{N}$, $\pi_N X$ is regularly varying in $\mathbb{R}^N$ with limit measure $\mu_N = \mu \circ \pi_N^{-1}$, index $\alpha$ and scaling sequence $b_n$.
3. The family of measures $(\mu_n)_{n \ge 1}$ is relatively compact in the $M_0(H)$-topology, and for all $h \in H_0$, $\langle X, h \rangle$ is regularly varying in $\mathbb{R}$ with limit measure $\mu_h = \mu \circ (h^*)^{-1}$, index $\alpha$ and scaling sequence $b_n$, where $h^*(x) = \langle h, x \rangle$.
If $X$ is RV as in statement 1, then $(\mu_n)_{n \ge 1}$ converges in the $M_0(H)$ topology and the family is of course relatively compact. Also fix $N \ge 1$ and notice that $\pi_N$ is a continuous mapping from $(H, \|\cdot\|)$ to $\mathbb{R}^N$ endowed with the Euclidean norm. The same is true for the bounded linear functional $h^*$. The continuous mapping theorem in $M_0$ (see Hult and Lindskog (2006), Theorem 2.5) then yields statements 2 and 3.

2. ⇒ 1. If $(\mu_n)$ is relatively compact, the sequence $\mu_n$ converges in $M_0(H)$ if and only if any two subsequential limits $\mu_1$, $\mu_2$ coincide. However, it follows from the previous implication that in such a case, the finite-dimensional projections of $\mu_1$ and $\mu_2$ coincide, namely $\mu_1 \circ \pi_N^{-1} = \mu_2 \circ \pi_N^{-1} = \mu_N$ for every integer $N$. Consider the family of cylinder sets of H with measurable base, $\mathcal{C} = \{\pi_N^{-1}(A), A \in \mathcal{B}(\mathbb{R}^N), N \in \mathbb{N}^*\}$. On $\mathcal{C}$ the measures $\mu, \mu_1, \mu_2$ coincide. The family $\mathcal{C}$ is a π-system which generates the Borel σ-field, because it is associated to the family of bounded linear functionals $(e_i^*, i \in \mathbb{N})$, which separates points. Thus $\mu, \mu_1, \mu_2$ coincide on every Borel set, and the proof is complete.

3. ⇒ 1. As above, it is enough to show that two subsequential limits $\mu_1$, $\mu_2$ coincide. This holds because the Borel σ-field on H is generated by the mappings $h^*$ (e.g. Hsing and Eubank (2015), Theorem 7.1.1).
The line of thought of Proposition 3.3 may be pursued further by characterizing the property of relative compactness of a family $(\nu_n)_{n \in \mathbb{N}} \in M_0(H)$ through asymptotic finite-dimensionality (see Equation (2.3)), following the lines of the proof of Theorem 4.3 in Hult and Lindskog (2006), relying in particular on Theorem 2.6 of the cited reference. However, it is also possible to rely on known characterizations of relative compactness for probability measures, coupled with the polar characterization of RV (Proposition 2.1). We propose in this spirit the following simple characterization solely based on weak convergence of univariate and finite-dimensional projections, together with regular variation of the norm, without additional requirements regarding asymptotic finite-dimensionality.
Theorem 3.1. Let $X$ be a random element in H and let $\Theta_t$ be a random element in H distributed on the sphere $S$ according to the conditional angular distribution $P_{\Theta,t}$. Let $P_{\Theta,\infty}$ denote a probability measure on $(S, \mathcal{B}(S))$ and let $\Theta_\infty$ be a random element distributed according to $P_{\Theta,\infty}$. The following statements are equivalent.
3. $\|X\|$ is regularly varying in $\mathbb{R}$ with index $\alpha$, and

Proof. The fact that 1 implies 2 and 3 is a direct consequence of the polar characterization of RV (Proposition 2.1) and of the continuous mapping theorem applied to the bounded linear mappings $h^*$, $h \in H$, and $\pi_N$, $N \in \mathbb{N}$.

For the reverse implications (3 ⇒ 1) and (2 ⇒ 1), in view of Proposition 2.1, we only need to verify that for any sequence $t_n \to \infty$, the sequence $P_{\Theta,t_n}$ converges weakly to $P_{\Theta,\infty}$. If either Condition 2 or Condition 3 holds true, then it will be so if and only if the family $(P_{\Theta,t_n}, n \in \mathbb{N})$ is asymptotically finite-dimensional.

We use the fact, stated and proved in Tsukuda (2017), that if $(Z_n, n \in \mathbb{N})$ and $Z$ are H-valued random elements such that, as $n \to \infty$, and for all $j \in \mathbb{N}^*$,

$$E[\langle Z_n, e_j \rangle^2] \to E[\langle Z, e_j \rangle^2], \qquad (3.4)$$

then the sequence $(Z_n)_{n \in \mathbb{N}}$ is asymptotically finite-dimensional.

Regular Variation in
Turning to the case where $H = L^2[0, 1]$, we discuss the relationships between the notions of regular variation in $L^2[0, 1]$ and in $C[0, 1]$, the space of continuous functions on $[0, 1]$. Indeed, any continuous stochastic process $(X_t, t \in [0, 1])$ is also a random element in $H = L^2[0, 1]$, as proved in Hsing and Eubank (2015), Theorem 7.4.1, or 7.4.2. It is thus legitimate to ask whether regular variation with respect to one norm implies regular variation for the other norm for such stochastic processes.

Proposition 3.4. Let $X$ be a continuous process over ), as $t \to +\infty$, where $\Theta_{\infty,\infty}$ is the angular limit process w.r.t. the sup-norm $\|\cdot\|_\infty$. Then, $X \in RV_{-\alpha}(L^2[0, 1])$, and the angular limit process $\Theta_{\infty,2}$ (w.r.t. the $L^2$ norm $\|\cdot\|$) has distribution given by

where $B \in \mathcal{B}(S_2)$.

Dombry and Ribatet (2015) applies (upon choosing $\ell(X) = \|X\|$ with the notations of the cited reference), which yields regular variation of $X$ in $L^2[0, 1]$, together with the expression given in (3.5) for the angular measure associated with the $L^2$ norm $\|\cdot\|$.
Proposition 3.5. The reverse statement of Proposition 3.4 is not true in general. There exists a sample-continuous stochastic process over $[0, 1]$ which is regularly varying in $L^2[0, 1]$ but not in $C[0, 1]$.

Proof. We construct a 'spiked' continuous process with controlled $L^2$ norm, while the sup-norm is super-heavy tailed. Let $Z$ follow a Pareto distribution with parameter $\alpha_Z > 0$, and define a sample-continuous stochastic process

Straightforward computations yield $\|Y\|_\infty = \exp(Z)$ and $\|Y\|_2 = Z$. Let $\rho$ be another independent Pareto-distributed variable with index $0 < \alpha_\rho < \alpha_Z$. Finally, define $X = \rho Y$. Then $X$ is a sample-continuous stochastic process over $[0, 1]$. We have $\|X\|_\infty = \rho \exp(Z)$, which is clearly not regularly varying because (see e.g. Mikosch (1999)

On the other hand, the pair $(\rho, Y)$ satisfies the assumptions of Example 3.1 with $d = 1$. Hence, $X = \rho Y$ is regularly varying in $H = L^2[0, 1]$.
Propositions 3.4 and 3.5 together show that the framework of $L^2$-regular variation encompasses a wider class of continuous processes than standard $C[0, 1]$ regular variation. This opens a road towards applications of EVT in situations where the relevant definition of an extreme event has to be understood in terms of 'energy' of the (continuous) trajectory, as measured by the $L^2$ norm, rather than in terms of sup-norm.

Principal Component Analysis of Extreme Functions
This section gathers the main results of the paper. Motivated by dimension reduction purposes, our goal is to construct a finite-dimensional representation of extreme functions. In other words, our primary purpose is to learn a finite-dimensional subspace $V$ of $H = L^2[0, 1]$ such that the orthogonal projections of extreme functions onto $V$ are optimal in terms of angular reconstruction error. Throughout this section we place ourselves in the setting of regular variation introduced in Section 3 and consider a regularly varying random element $X$ in H as in Theorem 3.1, with the same notations. Our focus is thus on building a low-dimensional representation of the angular distribution of extremes $P_{\Theta,\infty}$ introduced in Section 2.1. We consider the eigen decomposition of the associated covariance operator

$$C_\infty = E[\Theta_\infty \otimes \Theta_\infty] = \sum_{j \ge 1} \lambda^j_\infty \, \varphi^j_\infty \otimes \varphi^j_\infty,$$

where $\Theta_\infty \sim P_{\Theta,\infty}$, and the $\varphi^j_\infty$'s and $\lambda^j_\infty$'s are eigenfunctions and eigenvalues of $C_\infty$ following the notations of Section 2.3. If $P_{\Theta,\infty}$ is sufficiently concentrated around a finite-dimensional subspace of moderate dimension $p$, a reasonable approximation of $P_{\Theta,\infty}$ is provided by its image measure via the projection onto $V^p_\infty = \mathrm{Vect}(\varphi^j_\infty, j \le p)$. Independently from such sparsity assumptions, the space $V^p_\infty$ minimizes the reconstruction error (2.7) of the orthogonal projection relative to $\Theta_\infty$. It is also the unique minimizer as soon as $\lambda^p_\infty > \lambda^{p+1}_\infty$, as discussed in the background Section 2.3.

Our main results bring finite sample guarantees regarding an empirical version of $V^p_\infty$ constructed using the $k \ll n$ largest observations. In this respect our work may be seen as an extension of Drees and Sabourin (2021), who consider finite-dimensional observations $X \in \mathbb{R}^d$, to an infinite-dimensional ambient space. However, our proof techniques are fundamentally different from the cited reference. Indeed, their analysis relies on Empirical Risk Minimization arguments relative to the reconstruction risk at infinity, where $\Pi_V$ denotes the orthogonal projection onto $V$. The main ingredients of their analysis are (i) the fact that $V^p_\infty$ minimizes the risk at infinity and (ii) compactness of the unit sphere (or of any bounded, closed subset of $\mathbb{R}^d$). In the present setting such compactness properties do not hold and we follow an entirely different path, as we investigate the convergence of an empirical version of $C_\infty$ in the Hilbert-Schmidt norm, and then rely on perturbation theory for covariance operators in order to control the deviations of its eigenspaces. We thus consider the pre-asymptotic covariance operator

$$C_t = E[\Theta_t \otimes \Theta_t], \qquad \Theta_t \sim P_{\Theta,t}. \qquad (4.1)$$

In the sequel, the discrepancy between finite-dimensional linear subspaces of H is measured in terms of the Hilbert-Schmidt norm of the difference between orthogonal projections, namely we define a distance $\rho$ between finite-dimensional subspaces $V, W$ of H by

$$\rho(V, W) = \|\Pi_V - \Pi_W\|_{HS(H)}.$$

It should be noticed that Drees and Sabourin (2021) denote by $\rho$ the operator norm of the difference between the projections, which is coarser than the Hilbert-Schmidt one.
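In a discretized setting, the distance between subspaces described above (the Hilbert-Schmidt norm of the difference between the two orthogonal projections) reduces to a Frobenius norm of a matrix difference. The following is a minimal sketch under that assumption; the function names `projector` and `rho` are illustrative and not taken from the paper.

```python
import numpy as np


def projector(basis: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the column span of `basis` (shape d x p).

    The columns are assumed linearly independent; they are orthonormalized
    by a (reduced) QR factorization.
    """
    q, _ = np.linalg.qr(basis)
    return q @ q.T


def rho(V: np.ndarray, W: np.ndarray) -> float:
    """Hilbert-Schmidt (Frobenius) norm of the difference of projectors."""
    return float(np.linalg.norm(projector(V) - projector(W), ord="fro"))


def rho_operator(V: np.ndarray, W: np.ndarray) -> float:
    """Operator (spectral) norm of the same difference, for comparison."""
    return float(np.linalg.norm(projector(V) - projector(W), ord=2))


if __name__ == "__main__":
    e = np.eye(4)
    V = e[:, [0]]  # span(e_1)
    W = e[:, [1]]  # span(e_2)
    print(rho(V, W), rho_operator(V, W))
```

For two orthogonal lines, the Hilbert-Schmidt distance is $\sqrt{2}$ while the operator-norm distance is 1, which illustrates that the operator norm is the coarser of the two.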
First, we show in Section 4.1 that the first $p$ eigenfunctions of the pre-asymptotic operator $C_t$ generate a vector space $V^p_t$ converging to $V^p_\infty$ whenever $\lambda^p_\infty > \lambda^{p+1}_\infty$. Second, we establish in Section 4.2 the consistency of the empirical subspace $\widehat{V}^p_k$ (the one generated by the first $p$ eigenfunctions of an empirical version of $C_t$) and we derive nonasymptotic guarantees for its deviations, based on concentration inequalities regarding the empirical covariance operator.

The Pre-asymptotic Covariance Operator and its Eigenspaces
Since perturbation theory allows one to control the deviations of eigenvectors and eigenvalues of a perturbed covariance operator, a natural first step in our analysis is to ensure that the pre-asymptotic operator $C_t$ introduced in (4.1) may indeed be seen as a perturbed version of the asymptotic operator $C_\infty$, as shown next.
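Theorem 4.1 (Convergence of the pre-asymptotic covariance operator). In the setting of Theorem 3.1, as $t \to \infty$, the following convergence in the Hilbert-Schmidt norm holds true,

$$\|C_t - C_\infty\|_{HS(H)} \to 0.$$

Proof. Recall from Proposition 2.1 that RV of $X$ implies weak convergence of the net $\Theta_t$ towards $\Theta_\infty$. Using the fact that the mapping $h \in H \mapsto h \otimes h \in HS(H)$ is continuous, $\Theta_t \otimes \Theta_t$ also converges weakly towards $\Theta_\infty \otimes \Theta_\infty$. Let $(t_n)_{n \in \mathbb{N}}$ be a nondecreasing sequence of reals converging to infinity.

Since the separability of $(H, \langle \cdot, \cdot \rangle)$ implies the separability of $(HS(H), \langle \cdot, \cdot \rangle_{HS(H)})$ (see Blanchard et al. (2007), Section 2.1), we may apply Skorokhod's representation theorem to the weakly converging sequence $\Theta_{t_n} \otimes \Theta_{t_n}$. Thus there is a probability space $(\Omega', \mathcal{F}, P')$, and random elements $Y_n$, $n \in \mathbb{N}$, and $Y_\infty$ defined on it, distributed respectively as $\Theta_{t_n} \otimes \Theta_{t_n}$ and $\Theta_\infty \otimes \Theta_\infty$, such that $Y_n \to Y_\infty$ almost surely with respect to $P'$. A Jensen-type inequality in Hilbert spaces (see Ledoux and Talagrand (1991), pp. 42-43) yields

$$\|C_{t_n} - C_\infty\|_{HS(H)} = \|E'[Y_n - Y_\infty]\|_{HS(H)} \le E'\|Y_n - Y_\infty\|_{HS(H)}.$$

The dominated convergence theorem applied to the vanishing sequence of random variables $\|Y_n - Y_\infty\|_{HS(H)}$ (which are bounded by the constant 2) completes the proof.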
Remark 4.1. An alternative way to obtain the weak convergence of $\Theta_t \otimes \Theta_t$, which is key in the proof of Theorem 4.1, is to leverage Proposition 3.2 in Kokoszka et al. (2019), which ensures that the operator $X \otimes X$ is regularly varying in $HS(H)$. Since $\Theta \otimes \Theta$ is indeed the angular component of $X \otimes X$, the result follows by an application of Proposition 2.1.
The next result concerns the convergence of eigenspaces and is obtained by combining tools from operator perturbation theory with the result from Theorem 4.1. In order to avoid additional technicalities we consider in the next statement an integer $p$ such that $\lambda^p_\infty > \lambda^{p+1}_\infty \ge 0$, that is, a positive spectral gap. Notice that such a $p$ necessarily exists since

Corollary 4.1 (Convergence of pre-asymptotic eigenspaces). Let $p \in \mathbb{N}^*$ be such that $\lambda^p_\infty > \lambda^{p+1}_\infty$. Then, as $t$ tends to infinity, $\rho(V^p_t, V^p_\infty) \to 0$.

Proof. According to Theorem 3 in Zwald and Blanchard (2005), for $A$ and $B$ two Hilbert-Schmidt operators on a separable Hilbert space, and an integer $p$ such that the ordered eigenvalues of $A$ satisfy λ is such that $A + B$ is still a positive operator, then the following inequality holds

where $V^p$ and $W^p$ are respectively the eigenspaces spanned by the first $p$ eigenvectors of $A$ and $A + B$. From Theorem 4.1, the operators $A = C_\infty$ and $B = C_\infty - C_t$ satisfy the required assumptions stated above for $t$ sufficiently large, and $\|B\|_{HS(H)}$ may be chosen arbitrarily small, which concludes the proof.
Remark 4.2 (Convergence of eigenvalues and choice of p). Even though the eigenvalues of $C_\infty$ are not the main focus of our work, they are involved in the conditions of Corollary 4.1 through the requirement of a positive spectral gap. Of course these eigenvalues are unknown; however, Weyl's inequality (see Hsing and Eubank (2015), Theorem 4.2.8) ensures that

$$\sup_{j \ge 1} |\lambda^j_t - \lambda^j_\infty| \le \|C_t - C_\infty\|_{op} \le \|C_t - C_\infty\|_{HS(H)} \to 0 \quad \text{as } t \to \infty.$$

Identification of an integer $p$ for which the eigen gap is positive may thus be achieved using consistent estimates of the $\lambda^j_t$'s for $t$ large enough.
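Weyl's inequality, as invoked in Remark 4.2, is easy to check numerically on symmetric matrices standing in for discretized covariance operators. The sketch below is purely illustrative: the function name `eigenvalue_deviation` and the random test matrices are assumptions made for the example, not part of the paper.

```python
import numpy as np


def eigenvalue_deviation(A: np.ndarray, B: np.ndarray):
    """Compare ordered eigenvalues of two symmetric matrices.

    Returns (max_j |lambda_j(A) - lambda_j(B)|, operator norm of A - B,
    Hilbert-Schmidt/Frobenius norm of A - B).
    """
    lam_a = np.sort(np.linalg.eigvalsh(A))[::-1]  # decreasing order
    lam_b = np.sort(np.linalg.eigvalsh(B))[::-1]
    max_dev = float(np.max(np.abs(lam_a - lam_b)))
    op_norm = float(np.linalg.norm(A - B, ord=2))
    hs_norm = float(np.linalg.norm(A - B, ord="fro"))
    return max_dev, op_norm, hs_norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.normal(size=(50, 8))
    A = G.T @ G / 50                    # a positive semi-definite matrix
    E = rng.normal(size=(8, 8)) * 0.05
    B = A + (E + E.T) / 2               # symmetric perturbation of A
    print(eigenvalue_deviation(A, B))
```

The three returned quantities are expected to be ordered increasingly, which is the chain of inequalities that makes a small Hilbert-Schmidt error sufficient for consistent eigenvalue estimates.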

Empirical Estimation: Consistency and Concentration Results
We now turn to statistical properties of empirical estimates of $C_t$ and its eigen decomposition based on an independent sample $X_1, \dots, X_n$ distributed as $X$. Following standard practice in Peaks-Over-Threshold analysis, we consider a fixed number of excesses $k$ above a random radial threshold chosen as the empirical $1 - k/n$ quantile of the norm, with $k \ll n$. Even though our main results are of non-asymptotic nature, letting $k, n \to \infty$ with $k/n \to 0$ yields consistency guarantees such as Corollary 4.3 below. Denote by $X_{(1)}, \dots, X_{(n)}$ the permutation of the sample such that $\|X_{(1)}\| \ge \|X_{(2)}\| \ge \dots \ge \|X_{(n)}\|$, and accordingly, let $\Theta_{(i)}, R_{(i)}$ denote the angular and radial components of $X_{(i)}$. Then $R_{(k)}$ is an empirical version of the $(1 - k/n)$ quantile of the norm $R$, which we shall sometimes denote by $\widehat{t}_{n,k}$. With these notations an empirical version of $C_{t_{n,k}}$ is

$$\widehat{C}_k = \frac{1}{k} \sum_{i=1}^k \Theta_{(i)} \otimes \Theta_{(i)}. \qquad (4.2)$$

Remark 4.3 (Choice of k). Choosing the number $k$ of observations considered as extreme is an important but difficult topic in EVT. A wide variety of methods have been proposed in univariate problems (Caeiro and Gomes (2016); Scarrott and MacDonald (2012)). Some rules of thumb exist in multivariate settings, based on visual inspection of angular histograms (Coles and Tawn (1994)) or stability under rescaling of the radial distribution (Stărică (1999)), with little theoretical foundation. We leave this question outside our scope, although visual diagnostics are proposed in our numerical study, based on Hill plots and on convergence checks relying on the finite-dimensional characterizations of RV stated in Theorem 3.1.
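The empirical covariance of extreme angles and its leading eigenspace can be sketched as follows for curves observed on a common grid. This is a minimal illustration, assuming that each observation is stored as a row of an `(n, d)` array and that the $L^2$ scalar product is approximated by the Euclidean one; the function name `extreme_angular_pca` and the simulated data are hypothetical.

```python
import numpy as np


def extreme_angular_pca(X: np.ndarray, k: int, p: int):
    """Uncentered PCA of the angles of the k observations with largest norm.

    X : (n, d) array, one discretized curve per row (nonzero norms assumed).
    Returns (eigenvalues, eigenvectors, C_k): the p largest eigenvalues,
    the corresponding eigenvectors as columns of a (d, p) array, and the
    (d, d) empirical covariance of extreme angles.
    """
    norms = np.linalg.norm(X, axis=1)
    top = np.argsort(norms)[::-1][:k]        # indices of the k largest norms
    theta = X[top] / norms[top, None]        # angular components
    C_k = theta.T @ theta / k                # (1/k) sum of theta theta^T
    eigvals, eigvecs = np.linalg.eigh(C_k)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:p]
    return eigvals[order], eigvecs[:, order], C_k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 5
    X = rng.normal(size=(n, d))
    X[:, 0] += rng.pareto(1.0, size=n) + 1.0  # heavy tail along e_1 only
    vals, vecs, C_k = extreme_angular_pca(X, k=50, p=2)
    print(vals, np.abs(vecs[:, 0]))
```

Since the angles have unit norm, the trace of the estimated operator equals 1, so the ordered eigenvalues can be read directly as proportions in a scree plot.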
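Our analysis of the statistical error $\|\widehat{C}_k - C_{t_{n,k}}\|_{HS(H)}$ involves the intermediate pseudo empirical covariance

$$\widetilde{C}_{t_{n,k}} = \frac{1}{k} \sum_{i=1}^n \Theta_i \otimes \Theta_i \, 1\{R_i \ge t_{n,k}\},$$

which is not observable, although its deviation from $\widehat{C}_k$ may be controlled by the classical Bernstein inequality (Proposition 4.2). Our point of departure is the following decomposition of the statistical error,

$$\|\widehat{C}_k - C_{t_{n,k}}\|_{HS(H)} \le \|\widetilde{C}_{t_{n,k}} - C_{t_{n,k}}\|_{HS(H)} + \|\widehat{C}_k - \widetilde{C}_{t_{n,k}}\|_{HS(H)}. \qquad (4.3)$$

We analyze separately the two terms in the right-hand side of (4.3) in the next two propositions.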
Proposition 4.1. Let $\delta \in (0, 1)$. With probability larger than $1 - \delta/2$, we have

Sketch of proof. A Bernstein-type concentration inequality from McDiarmid (1998), which is applicable to arbitrary functions of $n$ variables with controlled conditional variance and conditional range (Theorem 3.8 of the reference, recalled in Lemma B.1 from the Appendix), provides a high-probability bound on the deviation of $\|\widetilde{C}_{t_{n,k}} - C_{t_{n,k}}\|_{HS(H)}$ from its expectation. In order to control the expected deviation $E\|\widetilde{C}_{t_{n,k}} - C_{t_{n,k}}\|_{HS(H)}$ appearing in this bound, we use the fact that, if $A_1, \dots, A_n$ are independent centered H-valued random elements, $E\|\sum_{i=1}^n A_i\|^2 = \sum_{i=1}^n E\|A_i\|^2$ (Lemma B.3 in the Appendix). We apply this result to $A_i$ chosen as the deviation of the operator $\Theta_i \otimes \Theta_i \, 1\{R_i \ge t_{n,k}\}$ from its expectation, which yields the required bound on the expected deviation and finishes the proof, as detailed in Appendix B.
We now turn to the second term $\|\widehat{C}_k - \widetilde{C}_{t_{n,k}}\|_{HS(H)}$ in the error decomposition (4.3).
Proposition 4.2. Let $\delta \in (0, 1)$. With probability larger than $1 - \delta/2$, we have

Proof. First, the triangle inequality yields

The number of non-zero terms inside the sum in the above display is the number of indices $i$ such that '$R_i < R_{(k)}$ and $R_i \ge t_{n,k}$', or the other way around, thus

Solving for $\varepsilon$ and using the fact that $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for any nonnegative numbers $a, b$, we obtain the upper bound in the statement.
We are now ready to state a non-asymptotic guarantee regarding the deviations (in the HS-norm) of the empirical covariance operator.
Remark 4.4 (Tightness of the upper bound, asymptotics). The bound obtained in Theorem 4.2 constitutes a minimal guarantee regarding covariance estimation of the extremes. By no means do we claim optimality regarding the multiplicative constants, which we have not tried to optimize, as revealed by an inspection of the proof, where the decomposition of the adverse event into two events of the same probability may be sub-optimal. However, the leading term of the error as $k \to \infty$ is an explicit, moderate constant and the rate of convergence is $1/\sqrt{k}$, which matches known asymptotic rates in the literature of tail empirical processes in the univariate or multivariate case (see e.g. Einmahl and Mason (1988) or Aghbalou et al. (2023), Theorem 3). We leave to further research the question of the asymptotic behaviour of $\widehat{C}_k - C_{t_{n,k}}$ as $k, n \to \infty$, $k/n \to 0$, a problem which could be attacked by means of Lindeberg central limit theorems in Hilbert spaces (Kundu et al. (2000)).
Combining Theorems 4.1 and 4.2, the following consistency result is immediate.

Corollary 4.2 (Consistency). The empirical covariance of extreme angles
Theorem 4.2 also provides a control of the deviations of the empirical eigenspaces, with a proof paralleling the one of Corollary 4.1. In the following statement we denote by $\widehat{V}^p_k$ such an eigenspace, that is, the linear space generated by the first $p$ eigenfunctions of $\widehat{C}_k$.

Denote the pre-asymptotic eigen gap by

Let $n, k$ be large enough so that $\gamma^p_{t_{n,k}} > 0$ (see Remark 4.2 for the fact that $\gamma^p_{t_{n,k}} \to \gamma^p_\infty > 0$). For $\delta \in (0, 1)$, with probability larger than $1 - \delta$, we have

where $B(n, k, \delta)$ is the upper bound on the deviations of $\widehat{C}_k$ stated in Theorem 4.2. In particular, we have the following consistency result as $n, k \to \infty$ while $k/n \to 0$,

Illustrative Numerical Experiments
Two possible applications of PCA for functional extremes are considered here. In both contexts, our goal is to assess the usefulness of the proposed functional PCA method for extremes by comparing it with the closest alternative, namely functional PCA of the full sample (not only extremes). On the one hand, a typical objective is to identify likely profiles of extreme events, by which we mean a finite-dimensional subspace of H with basis given by the eigenfunctions of $C_\infty$ with the highest eigenvalues. In this context, extreme functional PCA serves as a pattern identification tool for a qualitative interpretation. This line of thought is illustrated in Section 5.1 on a toy simulated dataset in the multiplicative model of Example 3.1. On the other hand, functional PCA of extremes may be viewed as a data compression tool allowing one to represent functional extremes in a finite-dimensional manner, with optimal reconstruction properties which would not be achieved by standard functional PCA. The relevance of this approach is demonstrated in Section 5.2 with an electricity demand dataset which is publicly available on the CRAN network. On this occasion we also propose visual diagnostics for functional regular variation according to the finite-dimensional characterizations proposed in Section 3.
The electricity demand dataset thursdaydemand considered in Section 5.2 is available in the R package fds. It contains half-hourly electricity demands on Thursdays in Adelaide between 6/7/1997 and 31/3/2007. It is made of $n = 508$ observations $X_i$, each of them being represented as a vector of size 48, indicating the recorded half-hour demand on day $i$. Here an 'angle' is in practice the profile of the half-hour records over one day, i.e. the original curve rescaled by its $L^2$-norm.
In our toy example (Section 5.1) we generate a functional regularly varying dataset of the same dimension $d = 48$ with larger sample size n = 10e+3, according to Example 3.1. With the notations of the latter example, we choose $Z \in \mathbb{R}^4$ with independent components, with $Z_1 \sim \mathrm{Pareto}(0.5)$, $Z_2 \sim 0.8 * \mathrm{Pareto}(0.5)$, , where $\mathcal{N}(m, \sigma^2)$ is the normal distribution with mean $m$ and variance $\sigma^2$. The first two components have a heavier tail than the last two, which may be considered as noise above a sufficiently high level. The angular measure on the sphere of $\mathbb{R}^4$ is concentrated on the canonical basis vectors $(e_1, e_2)$.
From a numerical perspective, all scalar products in $L^2[0, 1]$ are approximated in this work by the Euclidean scalar product in $\mathbb{R}^{48}$, which corresponds to a Riemann midpoint rule. For simplicity, and because the choice of the unit scale is also arbitrary, we dispense with standardizing by the half-hour width between records. Several numerical solutions exist to perform the eigendecomposition of the empirical covariance operator. However, the considered datasets are only moderately high-dimensional, and because all observations are regularly sampled in time we may use the simplest strategy, which is to perform the eigendecomposition of the second moment matrix $X^\top X \in \mathbb{R}^{48 \times 48}$, where $X_{i,j}$ is the $j$-th time record on the $i$-th day. In practice we rely on the svd function in R, which computes the singular value decomposition of $X$ based on a LAPACK routine. This boils down to choosing as a basis for $L^2[0, 1]$ a family of indicator functions centered at the observation times. Alternative orthonormal families in $L^2[0, 1]$ (typically, the Fourier basis or a wavelet basis) may be preferred in higher-dimensional contexts or with irregularly sampled observations.
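The equivalence between the singular value decomposition of the data matrix and the eigendecomposition of the second moment matrix can be sketched as follows. The paper relies on the svd function in R; the snippet below swaps in `numpy.linalg.svd` as a stand-in, and the function name `pca_via_svd` and the random input are assumptions made for the example.

```python
import numpy as np


def pca_via_svd(theta: np.ndarray):
    """Eigen decomposition of (1/m) theta^T theta via the SVD of theta.

    theta : (m, d) array, one discretized (angular) observation per row.
    Returns (eigenvalues in decreasing order, eigenvectors as columns).
    """
    m = theta.shape[0]
    # Right singular vectors of theta are eigenvectors of theta^T theta,
    # and squared singular values are its eigenvalues.
    _, s, vt = np.linalg.svd(theta, full_matrices=False)
    return s**2 / m, vt.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(100, 48))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit-norm rows
    eigvals, eigvecs = pca_via_svd(theta)
    direct = np.sort(np.linalg.eigvalsh(theta.T @ theta / 100))[::-1]
    print(np.max(np.abs(eigvals - direct)))
```

Working with the SVD of the data matrix avoids forming the product $X^\top X$ explicitly, which is generally better conditioned numerically.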

Pattern Identification of functional extremes
With the synthetic dataset described above, we compare the output of functional PCA applied to extreme angular data to the one obtained using all possible angles, i.e. the eigen decomposition of $\widehat{C}_k$ with that of $\widehat{C}_n$. The scree-plot (i.e. the graph of ordered eigenvalues, normalized by their sum) for both operators is displayed in Figure 1. The gap between the first two eigenvalues and the remaining ones is more pronounced with $\widehat{C}_k$ than with $\widehat{C}_n$, indicating that the method we promote is able to uncover a sparsity pattern at extreme levels. The limit measure of extremes is indeed concentrated on a two-dimensional subspace, as opposed to the distribution of the full dataset, whose support has dimension four. In addition, the 'true' extreme angular pattern, which is a superposition of two periodic signals with frequencies (1, 7), is easily recognized on the first two eigenfunctions of the extreme covariance $\widehat{C}_k$ (solid lines, first two panels of the second row in Figure 1), while these frequencies are perturbed by shorter-tailed 'noise' with the full covariance $\widehat{C}_n$ (dotted lines). The discrepancy between extreme and non-extreme eigenfunctions vanishes for the third eigenfunction, which may be considered as 'noise' as far as extremes are concerned.

Optimal reconstruction of functional extremes on the electricity demand dataset
Here we investigate the $L^2$ reconstruction error when projecting new (test) angular observations on the eigenspaces issued from the spectral decomposition of the empirical covariance operator $\widehat{C}_k$. Another important goal of this section is to provide guidelines and graphical diagnostic tools allowing one to check whether functional regular variation in $L^2$ may reasonably be assumed for a given functional dataset. We choose to consider the component-wise square root of the records so that the (squared) $L^2$ norm of each vector $X_i$ is an approximation of the integrated demand over a full day, which seems meaningful from an industrial perspective. For simplicity we ignore in this illustrative study any temporal dependence from week to week.

As a first step, regular variation must be checked and an appropriate number $k$ of extreme observations should be selected for estimating $C_\infty$ with $\widehat{C}_k$. A Gaussian QQ-plot (not shown) suggests that the radial variable is potentially heavy-tailed. In view of Theorem 3.1, statement 2, one should check regular variation of the radial variable and weak convergence of univariate projections $\langle \Theta_t, h \rangle$. Regarding the radial variable $R = \|X\|$, we propose to inspect a Hill plot and a Pareto quantile plot (Beirlant et al. (2006), Chapter 2). Visual inspection (Figure 2) suggests a stability region for the Hill estimator of $\gamma = 1/\alpha$ (left panel) between $k = 50$ and $k = 200$. Choosing $k = 100$ corresponds to an empirical quantile level $1 - k/n \approx 0.8$, for which the Pareto quantile plot (right panel) is reasonably linear. For $k = 100$ the regular variation index estimated with the Hill estimator $\widehat{\gamma}$ is $\widehat{\alpha} = 1/\widehat{\gamma} = 22.5$ (0.95 CI: [18.8, 27.9]).
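For readers who wish to reproduce this kind of radial diagnostic, a minimal sketch of the Hill estimator follows. The function names are illustrative, and the confidence interval uses the standard asymptotic normal approximation $\widehat{\gamma} \pm z \, \widehat{\gamma}/\sqrt{k}$, which is an assumption on our part: the text does not state how its interval was computed.

```python
import numpy as np


def hill_estimator(radii, k: int) -> float:
    """Hill estimator of gamma = 1/alpha from the k largest radii.

    Uses the (k+1)-th largest radius as threshold, so k < len(radii)
    and all radii must be positive.
    """
    r = np.sort(np.asarray(radii, dtype=float))[::-1]  # decreasing order
    return float(np.mean(np.log(r[:k]) - np.log(r[k])))


def alpha_confidence_interval(gamma_hat: float, k: int, z: float = 1.96):
    """Interval for alpha = 1/gamma from gamma_hat * (1 +/- z / sqrt(k)).

    Assumption: asymptotic normality of the Hill estimator with
    variance gamma^2 / k; requires z < sqrt(k).
    """
    half_width = z / np.sqrt(k)
    return 1.0 / (gamma_hat * (1 + half_width)), 1.0 / (gamma_hat * (1 - half_width))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    radii = rng.pareto(2.0, size=20000) + 1.0   # exact Pareto, alpha = 2
    gamma_hat = hill_estimator(radii, k=1000)
    print(1.0 / gamma_hat, alpha_confidence_interval(gamma_hat, k=1000))
```

A Hill plot is obtained by evaluating `hill_estimator` over a range of values of `k` and looking for a region where the estimate is stable.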
The condition of weak convergence of projections $\langle \Theta_t, h \rangle$ is obviously difficult to check in practice, in particular because it must hold for any $h$. As a default strategy we propose to check convergence of the first absolute moment, namely convergence of $E|\langle \Theta_t, h \rangle|$ as $t \to \infty$, for a finite number of 'appropriate' functions $h_j$, $j \in \{1, \dots, J\}$. The context of daily records suggests a periodic family, namely we choose $h_j(x) = \sin(2\pi j x)$, for $j \in \{1, 2, 3, 4, 6, 8\}$.

Turning to performance assessment, we consider the squared reconstruction error of a validation subsample of extreme angles $\mathcal{V} \subset \{\Theta_{(1)}, \dots, \Theta_{(k)}\}$ ($k = 100$), after projection on the principal eigenspaces of dimension $p$ corresponding to three variants of the empirical uncentered angular covariance operator. In this experiment we choose $p = 2$. Namely, we consider uncentered covariances (i) $\widehat{C}_k$, built from an extreme training set $\mathcal{T} = \{\Theta_{(1)}, \dots, \Theta_{(k)}\} \setminus \mathcal{V}$; (ii) $\widehat{C}_n$, incorporating all angles (including non-extreme ones) except for the validation set, $\{\Theta_1, \dots, \Theta_n\} \setminus \mathcal{V}$; (iii) $\widehat{C}_{n,k}$, built from a subsample of $\{\Theta_1, \dots, \Theta_n\} \setminus \mathcal{V}$ of the same size as $\mathcal{T}$. The left panel of Figure 4 displays the boxplots of the cross-validation error obtained over 300 independent experiments where a validation set $\mathcal{V}$ of size 30 is randomly chosen among $\{\Theta_{(1)}, \dots, \Theta_{(k)}\}$. The right panel displays the out-of-sample error over a tail region, namely the validation set $\mathcal{V}$ is composed of the most extreme data $\{\Theta_{(1)}, \dots, \Theta_{(30)}\}$, and the boxplots represent the variability of the reconstruction error over the validation set. The conclusion is the same for both panels: performing functional PCA on the fraction of the angular data corresponding to the most extreme observations significantly reduces the reconstruction error, despite the reduced size of the training set. Comparison between the second and the third boxplot of each panel illustrates the negative impact of reducing the training sample size, while comparing the first and the third boxplots shows the bias reduction achieved by localizing on the tail region. On this particular example the bias-variance trade-off favors our approach.

$\frac{1}{i} \delta_{e_i}$. We have shown that for all $N \in \mathbb{N}$, $\pi_N(X)$ is regularly varying in $\mathbb{R}^N$ with tail index $-\alpha$.
We now show that $X \notin RV(H)$. Since $\|X\| \in RV_{-\alpha}(\mathbb{R})$, according to Proposition 2.1, we have to prove that $\mathcal{L}(\Theta \mid R \ge t)$ does not converge when $t$ tends to infinity. From Theorem 1.8.4 in Vaart and Wellner (1996), it is enough to show that the sequence of measures $P_{\Theta,n} = P(\Theta \in \cdot \mid R > n)$ is not asymptotically finite-dimensional, i.e. that $\exists \delta, \varepsilon > 0$, $\forall d \in \mathbb{N}^*$, lim sup

Hence, the asymptotic finite-dimensionality condition does not hold and $X$ is not regularly varying in H.
Proof of the claim in Example 3.2.
We show that $X$ is regularly varying in H. Following the lines of the proof of Proposition 3.2, it is enough to verify that $P_{\Theta,t}$ converges weakly to $P_{\Theta,\infty}$. Since the common support of $P_{\Theta,t}$ and $P_{\Theta,\infty}$ is discrete, we only need to show that $P(\Theta = e_j \mid R > t) \to 6/(\pi j)^2$ for fixed $j \in \mathbb{N}^*$.
In order to apply this inequality to our purposes we need to write the empirical pre-asymptotic operator (or its surrogate $\widetilde{C}_{t_{n,k}}$) as a function $f_t$ of the sample $X_{1:n}$. With this in mind, we introduce a thresholded angular functional

Observe that with this notation, $\Theta_i 1\{R_i > t\} = \theta_t(X_i)$. Consider now the function

Notice that $f_{t_{n,k}}(X_{1:n}) = \|\widetilde{C}_{t_{n,k}} - C_{t_{n,k}}\|_{HS(H)}$, which is the focus of Proposition 4.1.
Lemma B.2 (deviations of f t n,k (X 1:n )).With the above notations, we have .
Proof. We apply Lemma B.1 to the function $f = f_{t_{n,k}}$. To do so we derive upper bounds on the maximum deviation term $b$ and on the maximum sum of variances $\sigma^2$ from the statement. Let $x_{1:n} \in H^n$. The maximum deviation $b$ is bounded by $2/k$ since by independence among the $X_i$'s, with the notations of Lemma B.1,

where the first inequality comes from the reverse triangle inequality $|\|a\| - \|b\|| \le \|a - b\|$, and the second one from the fact that $\|s \otimes s\|_{HS(H)} = 1$ if $\|s\| = 1$.

There remains to bound the variance term. Since for every $1 \le i \le n$, by the tower rule for conditional expectations, $E[g_i(x_{1:i-1}, X_i)] = 0$, we may write, for $Y_i$ an independent copy of $X_i$, σ 2 i (f t n,k (x 1 , ..., x n )) = E (f t n,k (x 1 , ..., x i−1 , Y i , X i+1 , ..., X n ) − f t n,k (x 1 , ..., x i−1 , X i , X i+1 , ..., X n ))

Hence, $v$ is bounded from above by $2/k$. Injecting the upper bounds on $v$ and $b$ in Lemma B.1 concludes the proof.
The following intermediate lemma proves useful for bounding the expected deviation in the left-hand side of Lemma B.2.

Lemma B.3. Let $A_1, \dots, A_n$ be independent centered random elements in H. Then

$$E\Big\|\sum_{i=1}^n A_i\Big\|^2 = \sum_{i=1}^n E\|A_i\|^2.$$

Proof. The left-hand side equals $\sum_{i=1}^n E\|A_i\|^2 + 2 \sum_{1 \le i < l \le n} E[\langle A_i, A_l \rangle]$. Since the $A_i$'s are independent with mean 0, $E[\langle A_i, A_l \rangle] = 0$ for all $1 \le i < l \le n$, which concludes the proof.
We are now ready to obtain a bound on $E\|\widetilde{C}_{t_{n,k}} - C_{t_{n,k}}\|_{HS(H)}$.
Corollary 4.3 (Deviations of empirical eigenspaces). Let $p \in \mathbb{N}^*$ satisfy the same positive eigen gap assumption as in Corollary 4.1, that is γ p ∞

Figure 3 displays the six plots of the empirical conditional moment $\frac{1}{k} \sum_{i=1}^k |\langle \Theta_{(i)}, h_j \rangle|$. The plots confirm the existence of a relative stability region around $k = 100$.
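The empirical conditional moments plotted in Figure 3 can be computed along the following lines. This is a sketch under stated assumptions: curves are stored as rows of an `(n, d)` array sampled at the midpoints of a regular grid on $[0, 1]$, the scalar product is the unnormalized Euclidean one (as described for the numerical experiments), and the function name `conditional_abs_moment` is hypothetical.

```python
import numpy as np


def conditional_abs_moment(X: np.ndarray, j: int, k_values) -> np.ndarray:
    """Empirical version of E|<Theta_t, h_j>| with h_j(x) = sin(2 pi j x).

    For each k in k_values, averages |<Theta_(i), h_j>| over the k
    observations with largest norm (each k must satisfy 1 <= k <= n).
    """
    n, d = X.shape
    grid = (np.arange(d) + 0.5) / d              # midpoints of [0, 1]
    h = np.sin(2 * np.pi * j * grid)
    norms = np.linalg.norm(X, axis=1)
    order = np.argsort(norms)[::-1]              # decreasing norms
    theta = X[order] / norms[order, None]        # ordered angles
    running_sum = np.cumsum(np.abs(theta @ h))
    return np.array([running_sum[k - 1] / k for k in k_values])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(508, 48)) * (rng.pareto(3.0, size=(508, 1)) + 1.0)
    ks = np.arange(20, 301, 20)
    for j in (1, 2, 3, 4, 6, 8):
        print(j, np.round(conditional_abs_moment(X, j, ks), 3))
```

Plotting the returned values against `k` for each frequency `j` gives a visual check of whether the moments stabilize over a range of thresholds.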

Figure 1: Simulated data: Scree plots and first three eigenfunctions. Diamond shaped dots and dashed lines: angular functional PCA of extremes ($\widehat{C}_k$). Round dots and solid lines: angular functional PCA of the full dataset ($\widehat{C}_n$). Dotted lines on the first two plots, bottom left: (normalized) functions $A_1, A_2$, i.e. support of the angular measure for extremes.