Locally Stationary Functional Time Series

The literature on time series of functional data has focused on processes of which the probabilistic law is either constant over time or constant up to its second-order structure. Especially for long stretches of data it is desirable to be able to weaken this assumption. This paper introduces a framework that will enable meaningful statistical inference of functional data of which the dynamics change over time. We put forward the concept of local stationarity in the functional setting and establish a class of processes that have a functional time-varying spectral representation. Subsequently, we derive conditions that allow for fundamental results from nonstationary multivariate time series to carry over to the function space. In particular, time-varying functional ARMA processes are investigated and shown to be functional locally stationary according to the proposed definition. As a side-result, we establish a Cram\'er representation for an important class of weakly stationary functional processes. Important in our context is the notion of a time-varying spectral density operator of which the properties are studied and uniqueness is derived. Finally, we provide a consistent nonparametric estimator of this operator and show it is asymptotically Gaussian using a weaker tightness criterion than what is usually deemed necessary.


Introduction
In functional data analysis, the variables of interest take the form of smooth functions that vary randomly between repeated observations or measurements. Thus functional data are represented by random smooth functions Xpτ q, τ P D, defined on a continuum D. Examples of functional data are concentration of fine dust as a function of day time, the growth curve of children as functions of age, or the intensity as a function of wavelength in spectroscopy. Because functional data analysis deals with inherently infinite-dimensional data objects, dimension reduction techniques such as functional principal component analysis (FPCA) have been a focal point in the literature. Fundamental for these methods is the existence of a Karhunen-Loève decomposition of the process (Karhunen, 1947;Loève, 1948). Some noteworthy early contributions are Kleffe (1973); Grenander (1981); Dauxois et al. (1982); Besse and Ramsay (1986). For an introductory overview of the main functional data concepts we refer to Ramsay and Silverman (2005) and Ferraty and Vieu (2006).
Most techniques to analyze functional data are developed under the assumption of independent and identically distributed functional observations and focus on capturing the first-and second-order structure of the process. A variety of functional data is however collected sequentially over time. In such cases, the data can be described by a functional time series tX t pτ qu tPZ . Since such data mostly show serial dependence, the assumption of i.i.d. repetitions is violated. Examples of functional time series in finance are bond yield curves, where each function is the yield of the bond as a function of time to maturity (e.g. Bowsher and Meeks, 2008;Hays et al., 2012) or the implied volatility surface of a European call option as a function of moneyness and time to maturity. In demography, mortality and fertility rates are given as a function of age (e.g. Erbas et al., 2007;Hyndman and Ullah, 2007;Hyndman and Booth, 2008), while in geophysical sciences, magnometers record the strength and direction of the magnetic field every five seconds. Due to the wide range of applications, functional time series and the development of techniques that allow to relax the i.i.d. assumption have received an increased interest in recent years.
The literature on functional time series has mainly centered around stationary linear models (Mas, 2000;Bosq, 2002;Dehling and Sharipov, 2005) and prediction methods (Antoniadis et al., 2006;Bosq and Blanke, 2007;Aue et al., 2015). A general framework to investigate the effect of temporal dependence among functional observations on existing techniques has been provided by Hörmann and Kokoszka (2010), who introduce L p m approximability as a moment-based notion of dependence. Violation of the assumption of identically distributed observations has been examined in the setting of change-point detection (e.g. Berkes et al., 2009;Hörmann and Kokoszka, 2010;Aue et al., 2009;Horváth et al., 2010;Gabrys et al., 2010), in the context of functional regression by Yao et al. (2005); Cardot and Sarda (2006) and in the context of common principal component models by Benko et al. (2009).
Despite the growing literature on functional time series, the existing theory has so far been limited to strongly or weakly stationary processes. With the possibility to record, store and analyze functional time series of an increasing length, the common assumption of (weak) stationarity becomes more and more implausible. For instance, in meteorology the distribution of the daily records of temperature, precipitation and cloud cover for a region, viewed as three related functional surfaces, may change over time due to global climate changes. In the financial industry, implied volatility of an option as a function of moneyness changes over time.
Other relevant examples appear in the study of cognitive functions such as high-resolution recordings from local field potentials, EEG and MEG. It is widely known that these type of data have a time-varying spectral structure and their statistical treatment requires to take this into account. While heuristic approaches such as localized estimation are readily implemented and applied, a statistical theory for inference from nonstationary functional time series is yet to be developed.
The objective of the current paper is to develop a framework for inference of nonstationary functional time series that allows the derivation of large sample approximations for estimators and test statistics. For this, we extend the concept of locally stationary processes (Dahlhaus, 1996a) to the functional time series setting.
We show that fundamental results for multivariate time series can be carried over to the function space, which is a nontrivial task. Our work, which provides a basis for inference of nonstationary functional time series, focuses on frequency domainbased methods and therefore also builds upon the work by Panaretos and Tavakoli (2013a,b). Functional data carry infinite-dimensional intrinsic variation and in order to exploit this rich source of information, it is important to optimally extract defining characteristics to finite dimension via techniques such as functional PCA (FPCA). In the case of stationary dependent functional data, the shape and smoothness properties of the random curves are completely encoded by the spectral density operator, which has been shown to allow for an optimal lower dimension representation via dynamic FPCA (see e.g., Panaretos and Tavakoli, 2013a;Hörmann et al., 2015). Since the assumption of weak stationarity is often too restrictive, we aim to provide the building blocks for statistical inference of nonstationary functional time series and for the development of techniques such as time-varying dynamic FPCA. In particular, our framework will be essential for the development of optimal dimension reduction techniques via a local functional Cramér-Karhunen-Loève representation. Such a representation must not only take into account the between-and within curve dynamics but also that these are time-varying. Moreover, the frequency domain arises quite naturally in certain applications such as brain data imaging and is moreover very useful in nonparametric specifications. For example, Aue and van Delft (2017) use our framework to derive a test for stationarity of functional time series against nonstationary alternatives with slowly changing dynamics.
The paper is structured as follows. In section 2, we first introduce some basic notation and methodology for functional data and relate this in a heuristic manner to the concept of locally stationary time series and introduce the definition of a locally stationary functional time series. In section 3, we demonstrate that timevarying functional ARMA models have a causal solution and are functionally locally stationary according to the definition in section 2. This hinges on the existence of stochastic integrals for operators that belong to a particular Bochner space. In section 4, the time-varying spectral density operator is defined and its properties are derived. In particular, we will show uniqueness of the time-varying spectral density operator. In section 5, we derive the distributional properties of a local nonparametric estimator of the time-varying spectral density operator and deduce a central limit theorem. The results are illustrated by application to a simulated functional autoregressive process in section 6. Technical details and several auxiliary results that are of independent interest are proved in the Appendix.

Locally stationary functional time series
Let X " tX t u t"1,...,T be a stochastic process taking values in the Hilbert space H " L 2 pr0, 1sq of all real-valued functions that are square integrable with respect to the Lebesgue measure. While current theory for such processes is limited to the case where tX t u is either strictly or weakly stationary, we consider nonstationary processes with dynamics that vary slowly over time and thus can be considered as approximately stationary at a local level.
As an example, consider the functional autoregressive process X given by X t pτ q " B t`Xt´1˘p τ q`ε t pτ q, τ P r0, 1s, for t " 1, . . . , T , where the errors ε t are independent and identically distributed random elements in H and B t for t " 1, . . . , T are bounded operators on H. Assuming that the autoregressive operators B t change only slowly over time, we can still obtain estimates by treating the process as stationary over short time periods. However, since this stationary approximation deteriorates over longer time periods, standard asymptotics based on an increasing sample size T do not provide suitable distributional approximations for the finite sample estimators. Instead we follow the approach by Dahlhaus (1996aDahlhaus ( , 1997 and define local stationary processes in a functional setting based on an infill asymptotics. The main idea of this approach is that for increasing T the operator B t is still 'observed' on the same interval but on a finer grid, resulting in more and more observations in the time period over which the process can be considered as approximately stationary. Thus we consider a family of functional processes X t,T pτ q " B t{T`Xt´1,T˘p τ q`ε t pτ q, τ P r0, 1s, 1 ď t ď T, indexed by T P N that all depend on the common operators B u indexed by rescaled time u " t{T . Consequently, we in fact examine a triangular array of random functions that share common dynamics as provided by the continuous operatorvalued function B u , u P r0, 1s. For each T , a different 'level' of the sequence is thus considered where the dynamics change more slowly for increasing values of T . We will establish a class of functional time series with a time-varying functional spectral representation that includes interesting processes such as the above example and higher order time-varying functional ARMA models. The framework as provided in this paper will allow to investigate how nonstationarity affects existing methods, such as (dynamic) FPCA, and how these methods should be adjusted in order to be robust for changing characteristics. Similarly as Dahlhaus and Subba Rao (2006) and Vogt (2012) in the case of ordinary time series, we call a functional time series locally stationary if it can be locally approximated by a stationary functional time series. In the following definition, }¨} 2 denotes the L 2 -norm of H.
Definition 2.1 (Local stationarity). A sequence of stochastic processes tX t,T u tPZ indexed by T P N and taking values in H is called locally stationary if for all rescaled times u P r0, 1s there exists an H-valued strictly stationary process tX puq t u tPZ , such that › › X t,T´X puq t › › 2 ď`ˇˇt T´uˇ`1 T˘P puq t,T a.s.
for all 1 ď t ď T , where P puq t,T is a positive real-valued process such that for some ρ ą 0 and C ă 8 the process satisfies E`ˇP puq t,Tˇρ˘ă C for all t and T and uniformly in u P r0, 1s.
For the purpose of illustration, a very simple locally stationary functional time series is depicted in figure 2.1 (A). Note that visual interpretation of a functional time series can be extremely difficult, especially when it is driven by many interacting components. The process in figure 2.1 (A) is driven solely by two components and is generated as where φ 1 , φ 2 are basis functions of H and the random coefficients ξ t,T , χ t,T are independent Gaussian time-varying AR(2). The parameters of the time-varying AR(2) models are chosen in such a way that the magnitude and phase of the roots of the characteristic polynomials of ξ t,T and χ t,T vary cyclically with rescaled time t{T but in opposite direction as time progresses. The corresponding coefficient curves .1 (C) and 2.1 (D), respectively. The dependence structure of the two driving components vary from independence to close to unit root behavior and this varying cyclical behavior is also clearly visible in the resulting functional process tX t,T u. In order to contrast this with behavior observed under stationarity, figure 2.1 (B) depicts the closely related weakly stationary functional process of (2.1) where the random coefficients are generated using two stationary AR(2) with parameters specified as the time average of ξ t,T and χ t,T , respectively. A comparison of the plots for the locally stationary and the stationary case shows a clear difference in the dynamics of the two processes, which is particularly discernible in the projections on the Fourier components. The example thus indicates the effect of falsely misspecifying a locally stationary process as stationary on statistical inference.
Definition 2.1 is broad and is further investigated in Aue and van Delft (2017). It will allow for the development of statistical inference procedures for nonstationary functional time series and in particular encompasses nonlinear functional models. Nonlinear functional time series is a topic that is relatively unexplored. Possible relevant models that are worth investigating are, for instance, time-varying additive functional regression (Müller and Yao, 2008) and time-varying functional ARCH models (Hörmann et al., 2013). However, as the focus of this paper is on frequency domain based methods, it is more appropriate to work with an alternative characterization of local stationarity in terms of spectral representations, which we discuss below. We start by introducing the necessary terminology on operators and spectral representations for stationary functional time series.

Functional spaces and operators: notation and terminology
First, we introduce some basic notation and definitions on functional spaces and operators. Let pT, Bq be a measurable space with σ-finite measure µ. Furthermore, let E be a Banach space with norm }¨} E and equipped with the Borel σ-algebra. We then define L p E pT, µq as the Banach space of all strongly measurable functions f : T Ñ E with finite norm }f } p " }f } L p E pT,µq "´ż }f pτ q} p E dµpτ q¯1 p for 1 ď p ă 8 and with finite norm }f pτ q} E for p " 8. We note that two functions f and g are equal in L p , denoted as f L p " g, if }f´g} p " 0. If E is a Hilbert space with inner product x¨,¨y E then L 2 E pT, µq is also a Hilbert space with inner product xf, gy " xf, gy L 2 E pT,µq " ż xf pτ q, gpτ qy E dµpτ q.
For notational convenience, we use the shorter notation }f } p and xf, gy whenever no ambiguity about the space L p E pT, µq is possible. Similarly, if T Ă R k and µ is the Lebesgue measure on T , we omit µ and write L p E pT q, and if E " R we write L p pT, µq.
Next, an operator A on a Hilbert space H is a function A : H Ñ H. An operator A is said to be compact if the image of each bounded set under A is relatively compact. If H is separable, there exist orthonormal bases tφ n u and tψ n u of H and a monotonically decreasing sequence of non-negative numbers s n pAq, n P N converging to zero, such that A f " 8 ř n"1 s n pAq xf, ψ n y φ n (2.2) for all f P H. The values s n pAq are called the singular values of A and (2.2) is the singular value decomposition of A. For operators on H, we denote the Schatten p-class by S p pHq and its norm by~¨~p. More specifically, for p " 8, the space S 8 pHq indicates the space of bounded linear operators equipped with the standard operator norm, while for 1 ď p ă 8 the Schatten p-class is the subspace of all compact operators A on H such that the sequence spAq "`s n pAq˘n PN of singular values of A belongs to p ; the corresponding norm is given by~A~p " }spAq} p . For 1 ď p ď q ď 8, we have the inclusion S p pHq Ď S q pHq. Two important classes are the trace-class and the Hilbert-Schmidt operators on H, which are given by S 1 pHq and S 2 pHq, respectively. More properties of Schatten-class operators and in particular of Hilbert-Schmidt operators are provided in Appendix B1. Finally, the adjoint of A is denoted by A : while the identity and zero operator are given by I H and O H , respectively. As usual, the complex conjugate of z P C is denoted by z and the imaginary number by i.
The main object of this paper are functional time series X " tX t u that take values in the Hilbert space H " L 2 pr0, 1sq. More precisely, for some underlying probability space pΩ, F , Pq, let H " L 2 H pΩ, Pq be the Hilbert space of all H-valued random variables X with finite second moment E}X} 2 2 ă 8. To avoid ambiguities between the norms of H and H, we write }X} H for the norm in H and reserve the notation }X} 2 for the more frequently used norm in H. Throughout the paper, we assume that X t P H. For the spectral representation and Fourier analysis of functional time series tX t u, we also require the corresponding spaces H C " L 2 C pr0, 1sq and H C " L 2 H C pΩ, Pq. We recall some basic properties of functional time series. First, a functional time series X is called strictly stationary if, for all finite sets of indices J Ă Z, the joint distribution of tX t`j | j P Ju does not depend on t P Z. Similarly, X is weakly stationary if its first-and second-order moments exist and are invariant under translation in time. In that case, the mean function m of X is defined as the unique element of H such that xm, gy " ExX t , gy, g P H.
Furthermore, the h-th lag covariance operator C h is given by , g 1 , g 2 P H, and belongs to S 2 pHq. Since S 2 pHq is isomorphic to the tensor product, we call C h also autocovariance tensor. The covariance operator C h can alternatively be described by its kernel function c h satisfying xC h g 1 , g 2 y " ż 1 0 ż 1 0 c h pτ, σq g 1 pσq g 2 pτ q dσ dτ, g 1 , g 2 P H.
In analogy to weakly stationary multivariate time series, where the covariance matrix and spectral density matrix form a Fourier pair, the spectral density operator or tensor F ω is given by the Fourier transform of C h , A sufficient condition for the existence of F ω in S p pH C q is ř hPZ~C h~p ă 8. Since the setting of this paper allows for higher order dependence among the functional observations, we also require the notion of higher order cumulant tensors. The necessary derivations and definitions are given in Appendix B2. Throughout the remainder of this paper, time points in t1, . . . , T u will be denoted by t, s or r, while rescaled time points on the interval r0, 1s will be given by u and v. Additionally, angular frequencies are indicated with λ, α, β or ω and functional arguments are denoted by τ, σ.
Finally, we require the notion of stochastic integrals with respect to operatorvalued functions. To this end, let B 8 denote the Bochner space B 8 " L 2 S8pH C q pr´π, πs, µq of all strongly measurable functions U : r´π, πs Ñ S 8 pH C q such that where µ is a measure on the interval r´π, πs given by µpAq " ş A~F ω~1 dω for all Borel sets A Ď r´π, πs. The subspace B 2 is then defined similarly with Hilbert-Schmidt norms replacing the operator norms. We distinguish explicitly between the two spaces as the latter space allows for stronger results to be obtained but excludes interesting processes such as functional autoregressive processes.

Assumptions
In this section, we collect for better reference the assumptions required in subsequent sections. We start by the main assumptions needed for a frequency domain characterization of local stationarity. In contrast to Panaretos and Tavakoli (2013a), who only consider transfer functions in B 2 , we also prove the more general case of transfer functions in B 8 as it includes the important case of functional autoregressive processes. The necessary results are proved in section B2.3 of the Appendix.
Throughout the assumptions and the paper, we refer to S p pH C q and B p with p " 2 or p " 8 to make the distinction between the two cases. (A1) (i) tε t u tPZ is a weakly stationary white noise process taking values in H with spectral representation ε t " ş π π e iωt dZ ω , where Z ω is a 2π-periodic orthogonal increment process taking values in H C ; (ii) the functional process X t,T with t " 1, . . . , T and T P N is given by t,ω P B p and an orthogonal increment process Z ω .
(A2) There exists A : r0, 1sˆr´π, πs Ñ S p pH C q with A u,¨P B p and A u,ω being continuous in u such that for all T P N sup ω,t A pT q t,ω´A t T ,ω p " O`1 T˘. (A3) The function A¨,¨is Hölder continuous of order α ą 1{2 in u and ω.
(A4) The function A¨,¨is twice continuously differentiable in u and ω with second derivatives being uniformly bounded in u and ω. We note that a functional Cramér representation such as in (A1)(ii) can also be obtained when the spectral density operator is not well-defined; we refer the reader to van Delft and Eichler (2017b), in which a functional version of Herglotz Theorem is proved and frequency domain representations for stationary time series on the function space are further generalized. For the derivation of asymptotic results for kernel estimators of the spectral density operator, we require additional assumptions on the moments of k-order of the process ε t in (A1). The following assumption will be imposed for k ď 4 or k ă 8.
(A6) The function h : R Ñ R`is symmetric with compact support on r0, 1s and is of bounded variation.
(A7) The function K f : R Ñ R`is symmetric, has bounded variation and compact support r´1, 1s, and satisfies

Local stationarity in the frequency domain
The original definition of local stationary processes by Dahlhaus (1996a) has been formulated in the frequency domain. The following proposition can be viewed a generalization of Dahlhaus (1996a) to the functional setting.
Proposition 2.2. Suppose that assumptions (A1) and (A2) hold. Then tX t,T u is a locally stationary process in H.
Proof. For u P r0, 1s, we define the approximating stationary functional process tX puq t u tPZ by X puq t " ż π π e iωt A u,ω dZ ω .
Then we have the process satisfies the conditions of Definition 2.1 with ρ " 2.
As in the time series setting, we need the existence of a transfer operator A u,ω that is continuous in u P r0, 1s to guarantee locally an approximately stationary behavior without sudden changes. In order to include interesting cases such as autoregressive processes for which a time-varying functional spectral representation with a common continuous transfer operator A u,ω does not exist, we require that such a representation only holds approximately by condition (A2).
The previous result leads us to consider time-varying processes of the form where tε s u sPZ is a weakly stationary functional white noise process in H and tA pT q t,s u sPZ are sequences of linear operators for t " 1, . . . , T and T P N. The following result states the conditions under which such a process satisfies condition (A1).
Proposition 2.3. Suppose that tε t u tPZ satisfies assumption (A1) and, for p " 2 or p " 8, let tA pT q t,s u sPZ be a sequence of operators in S p pHq satisfying ř s~A pT q t,s~p ă 8 for all t " 1, . . . , T and T P N. Then the process satisfies assumption (A1) with A pT q t,ω P B p . The proof is relegated to section A1.1 of the Appendix. For p " 2, the proposition yields a time-varying version of the corresponding result of Panaretos and Tavakoli (2013a). The more general case p " 8 also includes linear models introduced by Bosq (2000) and Hörmann and Kokoszka (2010) as well as the important class of time-varying functional autoregressive processes, which we discuss in detail in the next section.
Remark 2.4. For processes of the form (2.4), we can alternatively verify Definition 2.1 in the time domain provided we impose some regularity conditions on the decay of the sequence of filter operators tA pT q t,s u sPZ . For example, sufficient conditions for (A1)-(A2) to be satisfied would be to assume that there exists a positive monotonically decreasing sequence t psqu sPZ that satisfies ř sPZ |s| psq ă 8 such that sup t,T~A pT q t,s~p ă K psq and that there exists a sequence tA s,u u sPN that satisfies sup t,T~A pT q t,s´As,u~p ď K psq T , for some constant K independent of T . The local asymptotic theory derived later in this paper relies however on additional smoothness conditions of the approximate transfer operators such as condition (A4). These could then be replaced by sup u~B 2 Bu 2 A s,u~p ă K psq, where B 2 Bu 2 A s,u denotes the second-order derivative of the function u Þ Ñ A s,u . Depending on the application, different conditions could be considered. The investigation of necessary restrictions on the time domain filter operators are beyond the scope of this paper and are left for future work.

Locally stationary functional autoregressive processes
Due to its flexibility as well as its simplicity, functional autoregressive processes have been found useful in numerous applications such as economics and medicine, especially for prediction purposes (see e.g., Damon and Guillas, 1982;Besse and Ramsay, 1986;Antoniadis and Sapatinas, 2003, for early work). Despite of being linear in the function space, the filter operators act on a Hilbert space of which the elements can still exhibit arbitrary degrees of nonlinearity and can therefore be seen to be highly nonlinear in terms of scalar records. Most estimation techniques are however still based on the assumption of i.i.d. functional errors. This assumption has been relaxed by Bosq (2000), where the assumption of independence of the errors of the causal solution is relaxed to uncorrelatedness in an appropriate sense, and by Hörmann and Kokoszka (2010) for functional AR(1) processes within the framework of L p -m-approximability.
In this section, we introduce a class of time-varying functional autoregressive processes for which inference and forecasting methods can be developed in a meaningful way. More specifically, we will show that time-varying functional autoregressive processes as well as the more general time-varying functional ARMA processes are locally stationary and that stationary functional ARMA(m,n) processes are a special case. For this we first need to establish that a causal solution exists for time-varying functional AR(m) processes. This is done in the theorem stated below.
Theorem 3.1. Let tε t u tPZ be a white noise process in H and let tX t,T u be a sequence of time-varying functional AR(m) given by with B u,j " B 0,j for u ă 0 and B u,j " B 1,j for u ą 1. Furthermore, suppose that (i ) the operators B u,j are continuous in u P r0, 1s for all j " 1, . . . , m; (ii ) for all u P r0, 1s, the operators satisfy ř m j"1~B u,j~8 ă 1. Then (3.1) has a unique causal solution of the form for all t P N with sup t,T In order to prove the theorem, note that we can represent the functional AR(m) process in state space form Here, Xt ,T is a m-dimensional random vector taking values in the Hilbert space H m with inner product xx, yy " ř m i"1 xx i , y i y H . Furthermore, Bů denotes a matrix of operators and thus is itself an operator on H m . Consequently, we can write the functional AR(m) process more compactly as Proof of Theorem 3.1. In order to show that a causal solution exists in the locally stationary setting, we require the following result which is proved in Appendix A1.2.
We note that this is a weaker assumption than~Bů~8 ă 1. Although~B˚k 0 u~8 ă 1 is usually stated as the condition for a causal solution in the stationary case, the condition ř m j"1~B u,j~8 ă 1 is easier to check in practice. As a consequence of this lemma, the assumption ř m j"1~B u,j~8 ă 1 for all u P r0, 1s implies that the spectral radius of Bů satisfies rpBůq " sup

From (3.3), this implies a solution is given by
where r¨s 1,1 refers to the upper left block element of the corresponding block matrix of operators. In order to prove the theorem we shall proceed in a similar manner as Künsch (1995) and derive that for some constant c and ρ ă 1. For this, we require the following lemma.
Lemma 3.3. Let BpHq be the algebra of bounded linear operators on a Hilbert space. Then for each A P BpHq and each ε ą 0, there exists an invertible element M of BpHq such that rpAq ď~M AM´1~8 ď rpAq`ε.
Since BpHq forms a unital C˚-algebra, this lemma is a direct consequence of a result in Murphy (1990)[p.74]. Lemma 3.3 together with (3.4) imply we can specify for fixed u a new operator M puq P BpHq such that Because of the continuity of the autoregressive operators in u, we have that for all u P r0, 1s, there exists a neighborhood Vpuq such that Define now the finite union Ť r i"1 Vpu i q with Vpu i q X Vpu l q " ∅ for i ‰ l. Due to compactness and the fact that Bů " B0 for u ď 0 this union forms a cover of p´8, 1s. The preceding then implies that there exists a constant c such that Bv~8 ď c~M pu i qBvM´1pu i q~8, i " 1, . . . , r. Now, fix t and T and define the set J i,l " ts ě 0 : t´s T P Vpu i qu X t0, 1, . . . , l´1u. Then specify ρ " 1 1`δ{3 to obtain `l´1 ś which gives the result.
Theorem 3.1 will be used to show that time-varying functional ARMA models for which a functional spectral representation exists satisfy conditions (A1) and (A2) and hence by Proposition 2.2 are locally stationary. Before we can consider general time-varying functional ARMA models we first need the following result, which shows that for time-varying functional autoregressive processes there exists a common continuous transfer operator A u,ω that satisfies condition (A2).
Theorem 3.4. Let tε t u tPZ be a white noise process in H and let tX t,T u be a sequence of functional autoregressive processes given by If the process satisfies, for all u P r0, 1s and p " 2 or p " 8, the conditions (i ) C u is an invertible element of S 8 pHq; (ii ) B u,j P S p pHq for j " 1, . . . , m with ř m j"1~B u,j~l ă 1 and B u,0 " I H ; (iii ) the mappings u Þ Ñ B u,j for j " 1, .., m and u Þ Ñ C u , are continuous in u P r0, 1s and differentiable on u P p0, 1q with bounded derivatives, then the process tX t,T u satisfies (A2) with and thus is locally stationary.
The proof of Theorem 3.4 is relegated to Appendix A1.2. As shown in Theorem 3.1, a sufficient condition for the difference equation (3.6) to have a causal solution is ř m j"1~B u,j~8 ă 1 or~B˚k 0 u~8 ă 1 for some k 0 ě 1. The moving average operators will then satisfy ř 8 l"0~A pT q t,l~8 ă 8, and Proposition 2.3 shows that X t,T satisfies assumption (A1) with A pT q t,ω P B 8 . It follows from (3.5) that timevarying functional AR(m) processes that have a causal solution with moving average operators satisfying ř 8 l"0~A pT q t,l~2 ă 8 do not exist. Instead we need at least A pT q t,0 to be an invertible element of S 8 pHq together with ř m j"1~B u,j~2 ă 1. By Proposition B1.6, this case is covered by Proposition 2.2 with A t T ,ω P S 2 pH C q in condition (A2).
For stationary functional AR(m) processes this is straightforward to verify using back-shift operator notation and by solving for the inverse of the autoregressive lag operator. Under slightly more restrictive assumptions it is possible to obtain uniform convergence results for processes with transfer operators A pT q t,ω P B 2 . We will come back to this in sections 4 and 5, in which we consider capturing the changing second-order dependence structure via the time-varying spectral density operator.
Using Theorem 3.4, it is now straightforward to establish that the time-varying functional ARMA processes are locally stationary in the sense of Proposition 2.2. A time-varying functional moving average process of order n has transfer operator where Φ t{T,j P S p pHq are the moving average filter operators. This follows from the spectral representation of the ε t . Setting A t T ,ω " A pT q t,ω gives the result. Finally, we can combine this with Theorem 3.4, to obtain that conditions (A1) and (A2) are satisfied for time-varying functional ARMA(m,n) with common continuous transfer operator given by For operators that do not depend on t, this result proves the existence of a welldefined functional Cramér representation for weakly stationary functional ARMA(m,n) processes as discussed in Bosq (2000) or as in Hörmann and Kokoszka (2010). The latter is easily seen by means of an application of the dominated convergence theorem and by defining the m-dependent coupling process by X pmq t,T " g t,T pε t , . . . , ε t´m`1 , εt´m, ε p˚q t´m´1 , . . .q for measurable functions g t,T : H 8 Ñ H with t " 1, . . . , T and T P N and where tεt u is an independent copy of tε t u.

Time-varying spectral density operator
We will now introduce the time-varying spectral density operator and its properties. We will show that the uniqueness property of the time-varying spectral density established by Dahlhaus (1996a) also extends to the infinite dimension. Let X t,T satisfy conditions (A1) and (A2) with A pT q t,ω " A pT q 1,ω for t ă 1 and A pT q t,ω " A pT q T,ω for t ą T . We define the local autocovariance operator as the cumulant tensor where tsu denotes the largest integer not greater than s. This operator belongs to S 2 pHq and hence has a local autocovariance kernel c pT q u,s P L 2 pr0, 1s 2 q given by xC pT q u,s g 1 , g 2 y " ż ż c pT q u,s pτ, σqg 1 pσqg 2 pτ qdσdτ g 1 , g 2 P H. that the Fourier transform of (4.1) is a well-defined element of S 2 pH C q and is given by For fixed T , this operator can be seen as a functional generalization of the Wigner-Ville spectral density matrix (Martin and Flandrin, 1985) and we shall therefore refer to it as the Wigner-Ville spectral density operator F pT q u,ω . It is easily shown that the Fourier transform of the autocovariance kernel c pT q u,s , for fixed t and T , forms a Fourier pair in L 2 with the kernel of F pT q u,ω , referred to as the Wigner-Ville spectral density kernel More specifically, given ř sPZ }c pT q u,s } p ă 8 for p " 2 or p " 8, the spectral density kernel is uniformly bounded and uniformly continuous in ω with respect to }¨} p . Additionally, the inversion formula holds in }¨} p for all s, u, T , τ , and σ. This formula and its extension to higher order cumulant kernels is direct from an application of the dominated convergence theorem. Under additional assumptions, certain results presented in this paper will hold uniformly rather than in mean square. Sufficient would be to assume that the functional white noise process tε t u tPZ is mean square continuous and that the sequence of operators tA pT q t,s u sPZ is Hilbert-Schmidt with continuous kernels for all t " 1, . . . , T and T P N. The process tX t,T u is then itself mean square continuous and a slight adjustment in the proof of Proposition 4.1 demonstrates that ř sPZ › › c pT q u,s › › 8 ă 8. This is also expected to be the case for yet-to-be-developed concepts such as a time-varying Cramér-Karhunen-Loève representation. The results obtained in the previous section however demonstrate that a representation under these stronger conditions excludes time-varying functional AR(m) models. We will therefore not impose them but merely remark where stronger results could be obtained.
The pointwise interpretation of the L 2 -kernels makes it easy to verify that the Wigner-Ville spectral operator F pT q u,ω is 2π-periodic in ω and self-adjoint. Namely, c pT q u,´s pσ, τ q " c pT q u,s pτ, σq implies f : pT q u,ω pσ, τ q " f pT q u,ω pτ, σq, where f : the kernel function of the adjoint operator F : . Moreover, F ε ω is trace-class by Parseval's identity and therefore Proposition B1.3 implies that (4.3) is actually an element of S 1 pH C q. We will show in the following that (4.3) converges in integrated mean square to the time-varying spectral density operator defined as The time-varying spectral density operator satisfies all of the above properties and is non-negative definite since for every ψ P L 2 C pr0, 1sq, u,ω ψy ě 0, which is a consequence of the non-negative definiteness of F ε ω . For any two elements ψ, ϕ in L 2 C pr0, 1sq, one can interpret the mapping ω Þ Ñ xψ, F u,ω ϕy " xF u,ω ψ, ϕy P C to be the local cross-spectrum of the sequences txψ, X puq t yu tPZ and txϕ, X puq t yu tPZ . In particular, ω Þ Ñ xψ, F u,ω ψy ě 0 can be interpreted as the local power spectrum of txψ, X puq t yu tPZ for all u P r0, 1s. In analogy to the spectral density matrix in multivariate time series, we will show below that the local spectral density operator completely characterizes the limiting second-order dynamics of the family of functional processes tX t,T : t " 1, . . . , T u T PN .
Using identity (B1.3), we have that We can therefore write the left-hand side of (4.7) as Consider the operator and its continuous counterpart By Hölder's inequality for operators (Proposition B1.3), both are trace-class and hence Hilbert-Schmidt. Another application of Hölder's inequality together with assumption (A2) yields It is therefore sufficient to derive a bound on A similar argument as in (4.8) shows that for some constant C ą 0. The operator-valued function G u,ω is therefore Hölder continuous of order α ą 1{2 in u. Note that (4.5) implies the inverse Fourier transform of this operator is a well-defined element of S 2 pH C q. We can therefore write (4.9) as whereG s can be viewed as the s-th Fourier coefficient operator of G s 2T ,λ . Because of Hölder continuity, these operators satisfy~G s~2 ď }π α`1~G u,ω~2 |s|´α " Ops´αq. Hence, Concerning the partial sum ř n´1 s"0 |ĝ s pτ, σq| 2 , we proceed as in Dahlhaus (1996a) and use summation by parts to obtain which follow from the properties ofG s and Lemma B3.1. It is straightforward to see that ř n´1 s"0~G´s~2 2 satisfies the same bound. Hence, Choosing an appropriate value n ! T completes the proof.
Intuitively, the value of n such that n logpnq T´α Ñ 0 can be seen to determine the length of the data-segment over which the observations are approximately stationary. Only those functional observations X t,T from the triangular array with t{T P " uǹ T , u´n T ‰ will effectively contribute to the time-varying spectral density operator at u. As T increases, the width of this interval shrinks and sampling becomes more dense. Because the array shares dynamics through the operator-valued function A u,ω , which is smooth in u, the observations belonging to this interval will thus become close to stationary as T Ñ 8. The theorem therefore implies that, if we have infinitely many observations with the same probabilistic structure around some time point t, the local second-order dynamics of the family are completely characterized by F u,ω .
The above theorem provides a promising result. It is well-known from the time series setting that a Cramér representation as given in Proposition 2.2 is in general not unique (e.g. Priestley, 1981). However, Theorem 4.2 shows that the uniqueness property as proved by Dahlhaus (1996a) generalizes to the functional setting. That is, if the family of H-valued processes tX t,T : t " 1, . . . , T u T PN has a representation with common transfer operator A u,ω that operates on H C and that is continuous in u, then the time-varying spectral density operator will be uniquely determined from the triangular array. This uniqueness of the time-varying spectral density operator is expected to be extremely valuable in the development of inference methods. For example, it would be of interest to determine whether this result will allow to develop Quasi Likelihood methods to fit parametric models in the functional setting. Such an extension is not direct and has to take into account the compactness of the operator and the properties of Toeplitz operators in the infinite dimension. This is however beyond the scope of this paper and the authors will consider this in future work.
Remark 4.3. If assumptions (A1) and (A2) hold with p " 2, we have by continuity of the inner product that the kernel a u,ω of A u,ω is well-defined in L 2 C pr0, 1s 2 q and hence is uniformly Hölder continuous of order α ą 1{2 in both u and ω. If we thus additionally assume that the tε t u tPZ are mean square continuous and the operator A u,ω is an element of B 2 of which the Hilbert-Schmidt component has a kernel that is continuous in its functional arguments, the error holds in uniform norm.

Estimation
The time-varying spectral density operator as defined in section 4.2 allows to capture the complete second-order structure of a functional time series with possibly changing dynamics. In order to consider inferential techniques such as dynamic FPCA for nonstationary functional time series, functional Whittle likelihood methods or other general testing procedures, we require a consistent estimator for the time-varying spectral density operator. In this section, we present a nonparametric estimator of the time-varying spectral density operator. It should be noted that this requires a careful consideration of certain concepts on the function space, details of which are relegated to sections B2-B3 of the Appendix, and that there are some discrepancies compared to existing results available in the Euclidean setting.
The section is structured as follows. First, we define a functional version of the segmented periodogram and derive its mean and covariance structure. We then consider a smoothed version of this operator and show its consistency. Finally, we provide a central limit theorem for the proposed estimator of the time-varying spectral density operator. Proofs of this section can be found in section A1.4 of the Appendix.

The functional segmented periodogram tensor
The general idea underlying inference methods in the setting of locally stationary processes is that the process X t,T can be considered to be close to some stationary process, say X pu 0 q t , on a reasonably small data-segment around u 0 . If this segment is described by tt : | t T´u 0 | ď b t {2u for some bandwidth b t , classical estimation methods from the stationary framework can be applied on this stretch. The estimated value is subsequently assigned to be the value of the parameter curve at the midpoint u 0 of the segment. The entire parameter curve of interest in time-direction can then be obtained by shifting the segment. We will also apply this technique in the functional setting.
First, let the length of the stretch considered for estimation be denoted by N T , where N T is even and N T ! T . In the following, we will drop the explicit dependence of N on T and simple write N " N T . Then the local version of the functional Discrete Fourier Transform (fDFT) is defined as in ω that takes values in H C . The data-taper is used to improve the finite-sample properties of the estimator (Dahlhaus, 1988): firstly, it mitigates spectral leakage, which is the transfer of frequency content from large peaks to surrounding areas and is also a problem in the stationary setting. Secondly, it reduces the bias that stems from the degree of nonstationarity of the process on the given data-segment, that is, the fact that we use the observations X t,T for estimation rather than the unknown stationary process X pu 0 q t . We define the data-taper by a function h : r0, 1s Ñ R and setting h s,N " h`s N˘; the taper function h should decay smoothly to zero at the endpoints of the interval while being essentially equal to 1 in the central part of the interval. Thus the taper gives more weight to data-points closer to the midpoint.
As a basis for estimation of the time-varying spectral density operator, we consider the normalized tensor product of the local functional Discrete Fourier Transform. This leads to the concept of a segmented or localized periodogram tensor is the finite Fourier transform of the k-th power of the data-taper. Given the moments are well-defined in L 2 C pr0, 1s 2 q, the corresponding localized periodogram kernel is given by Similar to the stationary case, sufficient conditions for the existence of the higher order moments of the localized periodogram tensor are obtained from To ease notation, we denote t u,r " tuT u´N {2`r`1 to be the r-th element of the data-segment with midpoint u. For u j " j{T we also write t j,r " t u j ,r and abbreviate u j,r " t j,r {T . The following result is used throughout the rest of the paper.
Proposition 5.1. Suppose that assumption (A1) holds with A pT q t,ω P B 8 , and additionally sup ω 1 ,...,ω k´1~F ε ω 1 ,...,ω k´1~2 ă 8. Then where the equality holds in the tensor product space H C b¨¨¨b H C . Moreover, for fixed t P t1, . . . , T u and T P N, the k-th order cumulant spectral tensor of the linear functional process tX t,T u, Note that under the stronger condition A pT q t,ω P B 2 , the tensor F pt,T q λ 1 ,...,λ k´1 will be trace-class for all k ě 2. The above proposition implies that the higher order cumulant tensor of the local fDFT can be written as Here, the function H N pG ‚ , ωq and similarly H k,N pG ‚ , ωq generalize the definitions of H N and H k,N to where pA r q r"0,...,N´1 and pB r q r"0,...,N´1 are vectors of tensors or operators. From the taper function h, we derive the smoothing kernel K t in rescaled time u by for x P r´1 2 , 1 2 s and zero elsewhere; furthermore, we define the bandwidth b t,T " N {T that corresponds to segments of length N , and set K t,T pxq " 1 b t,T K t`x b t,T˘. Finally, we define the kernel-specific constants The first order and second order properties of the segmented functional periodogram can now be determined.
Theorem 5.2. Suppose that assumptions (A1), (A2), (A4), and (A5) with k ď 4 hold. Then the mean and covariance structure of the local functional periodogram are given by and cov`xI pT q u,ω 1 , g 1 b g 2 y, xI pT q u,ω 2 , g 3 b g 4 y" for all g 1 , g 2 , g 3 , g 4 P H C .
The proof exploits assumption (A2) and is based on the theory of L-functions (Dahlhaus, 1983), which allows to provide upper bound conditions on the datataper function. Details of the extension of the latter to the functional setting can be found in section B3 of the Appendix.

Consistent estimation
Theorem 5.2 shows that the segmented periodogram tensor is not a consistent estimator. In order to obtain a consistent estimator we proceed by smoothing the estimator over different frequencies. That is, we consider convolving the segmented periodogram tensor with a window function in frequency direction where b f,T denotes the bandwidth in frequency direction. To ease notation, we also write K f,T pωq " 1 b f,T K f`ω b f,T˘. Additionally we use subsequently as an abbreviation for kernel-specific constants. ExF pT q u,ω g 1 , g 2 y " xF u,ω g 1 , g 2 y`1 Bu 2 xF u,ω g 1 , g 2 y`1 and covariance structure cov`xF pT q u,ω 1 , g 1 b g 2 y, xF pT q u,ω 2 , g 3 b g 4 y" 2π }K t } 2 2 b t,T T ż Π K f,T pω 1´λ1 q K f,T pω 2´λ1 q xF u,λ 1 g 3 , g 1 y xF u,´λ 1 g 4 , g 2 y dλ 1
The proof follows from a multivariate Taylor expansion and an application of Lemma P4.1 of Brillinger (1981). We note that the covariance has greatest magnitude for ω 1˘ω2 " 0pmod2πq, where the weight is concentrated in a band of width Opb f,T q around ω 1 and ω 2 respectively. The above result shows that the bandwidths need to decay to zero with an appropriate rate in order to obtain consistency.
The proof follows straightforwardly from a change of variables and a functional generalization of approximate identities (e.g., Edwards, 1967).
Since~F u,ω~2 is moreover uniformly bounded in u and ω, the statement follows directly from equation (A1.15).
Remark 5.6. Theorem 5.2, Theorem 5.3, Proposition 5.4, and Corollary 5.5 can be shown to hold in a stronger sense under additional assumptions. Namely, if the respective set of assumptions hold with p " 2 then the kernel function of the transfer operator A pT q t,ω is well-defined. If this function is continuous and the white noise process tε t u is moreover mean square continuous with sup ω 1 ,...,ω k´1 }f ε ω 1 ,¨¨¨,ω k´1 } 8 ă 8 for k ď 4, the aforementioned statements hold in uniform norm.
Theorem 5.7 (Convergence in integrated mean square). Suppose that assumptions (A1),(A2), and (A4) to (A8) with k " 2, 4 hold. Then the spectral density operator is consistent in integrated mean square. More precisely, we have Since it is uniform in ω P Π, we have pointwise mean square convergence where the error also satisfies E~F pT q The proof follows almost straightforwardly from decomposing the above in terms of its variance and its squared bias.
Remark 5.8 (Discrete observations). In practice, functional data commonly are observed not continuously but only on a discrete grid. In this case, the above consistency results continue to hold only under additional regularity assumptions. To illustrate the effect of discrete observations, suppose that, for given T , we observe the random functions X t pτ q, t " 1, . . . , T , on the grid points 0 ď τ 1 ă . . . ă τ M ď 1. The corresponding discrete data x i,t " X t pτ i q are interpolated to yield functions X M t pτ q in some M -dimensional subspace H M of H. Furthermore, let P M X t be the orthogonal projection of X t onto H M . Then the mean square error of the estimatorF u,ω pX M q based on the discrete observation can be bounded by application of Minkowski's inequality by E~F u,ω pX M q´F u,ω~2 2 ď 2 E~F u,ω pX M q´F u,ω pP M Xq~2 2`2 E~F u,ω pP M Xq´F u,ω~2 2 . (5.16) Here, the first term can be interpreted as the error due to discretization. In the second term, the mean square error ofF u,ω pP M Xq can be rewritten as where F u,ω pP M Xq is the time-varying spectral density operator of the process P M X t . The first term describes the estimation error and, by Parseval's equality, is bounded by E~F u,ω pP M Xq´F u,ω pP M Xq~2 2 ď E~F u,ω pXq´F u,ω~2 2 and hence converges to zero as T Ñ 8 in a local stationary framework. Finally, the second term is the approximation error due to replacing the functions X t by their projections P M X t . Under suitable regularity conditions on the functions X t and for an increasingly dense grid, the approximation error can be made arbitrarily small. Furthermore, the discretization error }X M t´P M X t } 2 converges to zero such that for an appropriate rate of M Ñ 8, the error due to discretization in 5.16 also tends down to zero. The detailed derivations are similar to those in section 5 of Panaretos and Tavakoli (2013b) and are therefore omitted.

Weak convergence of the empirical process
The results of the previous section give rise to investigating the limiting distribution ofF pT q u,ω , the local estimator of the spectral density operator. We will proceed by showing that joint convergence of its kernelf pT q u,ω to complex Gaussian elements in L 2 C pr0, 1s 2 q can be established.
Consider the sequence of random elements`Ê pT q u,ω˘T PN in L 2 C pr0, 1s 2 q, wherê or fixed ω P r´π, πs and u P r0, 1s. In order to establish convergence in L 2 C pr0, 1s 2 q, it is more appropriate to consider the representation ofÊ Hence, the distribution ofÊ pT q u,ω is fully characterized by the finite-dimensional distribution of the coefficients of its basis representation. Furthermore, weak convergence ofÊ pT q u,ω will follow from the weak convergence of`xÊ pT q u,ω , ψ mn y˘m ,nPN in the sequence space 2 C . Subsequently, we identifyÊ pT q u,ω with its dual pÊ pT q u,ω q˚P L 2 C pr0, 1s 2 q˚and writeÊ pT q u,ω pφq " xÊ pT q u,ω , φy for all φ P L 2 C pr0, 1s 2 q.
To show convergence to a Gaussian functional process, we make use of the following result by Cremers and Kadelka (1986), which weakens the tightness condition usually employed to prove weak convergence and generalizes earlier results by Grinblat (1976).
Lemma 5.9. Let pT, B, µq be a measure space, let pE, |¨|q be a Banach space, and let pX n q nPN be a sequence of random elements in L p E pT, µq such that (i ) the finite-dimensional distributions of X n converge weakly to those of a random element X 0 in L p E pT, µq and (ii ) lim sup Then X n converges weakly to X 0 in L p E pT, µq.
In our setting, the weak convergence of the processÊ pT q u,ω in L 2 C pr0, 1s 2 q will follow from the joint convergence ofÊ pT q u,ω pψ m 1 ,n 1 q, . . . ,Ê pT q u,ω pψ m k ,n k q for all k P N and the as T Ñ 8. In contrast, Panaretos and Tavakoli (2013b) employ the slightly stronger conditionˇˇÊ pT q u,ω pψ mn qˇˇ2 ď φ mn for all T P N and m, n P N and some sequence pφ mn q P 1 . In fact, the condition corresponds in our setting to the one given in Grinblat (1976). Finally, we note that condition (5.17) is sufficient for our purposes, but recently it has been shown (Bogachev and Miftakhov, 2015) that it can be further weakened to For the convergence of the finite-dimensional distributions, we show convergence of the cumulants of all orders to that of the limiting process. For the first and second order cumulants ofÊ pT q u,ω pψ mn q, this follows from Theorem 5.3. It therefore remains to show that all cumulants of higher order vanish asymptotically.
The distributional properties of the functional process can now be summarized in the following theorem. where E u,ω j , j " 1, . . . , J, are jointly Gaussian elements in L 2 C pr0, 1s 2 q with means E`E u,ω i pψ mn q˘" 0 and covariances cov´E u,ω i pψ mn q, E u,ω j pψ m 1 n 1 q" for all i, j P 1, . . . , J and m, m 1 , n, n 1 P N.
Proof of Theorem 5.11. For condition 5.17, we note that and it therefore is satisfied by Theorem 5.3. Together with the convergence of the finite-dimensional distributions this proves the asserted weak convergence.

Numerical simulations
To illustrate the performance of the estimator in finite samples, we consider a timevarying functional time series with representation where B u,1 P B 8 is continuous in u P r0, 1s and where tε t u is a collection of independent innovation functions. In order to generate the process, let tψ i u iPN be an orthonormal basis of H and denote the vector of the first k Fourier coefficients of X t,T by X pT q t " pxX t,T , ψ 1 y, . . . , xX t,T , ψ k yq 1 . Similar to Hörmann et al. (2015), we exploit that the linearity of the autoregressive operator implies the first k Fourier coefficients, for k large, approximately satisfy a VAR(1) equation. That is, where ε t " pxε t , ψ 1 y, . . . , xε t , ψ k yq 1 and B t T ,1 " pxB t T ,1 pψ i q, ψ j y, 1 ď i, j ď kq. Correspondingly, the local spectral density kernel will satisfy where f pT q u,ω is the spectral density matrix of the Fourier coefficients in (6.2). Implementation was done in R together with the fda package. For the simulations, we choose the Fourier basis functions on r0, 1s. The construction of the estimator in (5.11) requires specification of smoothing kernels and corresponding bandwidths in time-as well as frequency direction. Although the choice of the smoothing kernels usually does not affect the performance significantly, bandwidth selection is a well-known problem in nonparametric statistics. As seen from Theorem 5.3, both bandwidths influence the bias-variance relation. Depending on the persistence of the autoregressive process a smaller bandwidth in frequency direction is desirable around the peak (at λ " 0 for the above process), while slow changes in time direction allow for tapering (i.e., smoothing in time direction) over more functional observations. It would therefore be of interest to develop an adaptive procedure as proposed in van Delft and Eichler (2015) to select the bandwidth parameters. Investigation of this is however beyond the scope of the present paper. In the examples below, the bandwidths were set fixed to b t,T " T´1 {6 and b f,T " 2T´1 {5´b t,T . We chose as smoothing kernels which have been shown to be optimal in the time series setting (Dahlhaus, 1996b). In order to construct the matrix B t T ,1 , we first generate a matrix A u with entries that are mutually independent Gaussian where the pi, jq-th entry has variance ui´2 c`p 1´uqe´i´j.
The entries will tend to zero as i, j Ñ 8 , because the operator B t T ,1 is required to be bounded. The matrix B u,1 is consequently obtained as B u,1 " ηA u {~A u~8 . The value of η thus determines the persistence of the process. Additionally, the collection of innovation functions tε t u is specified as a linear combination of the Fourier basis functions with independent zero-mean Gaussian coefficients such that the l-th coefficient xε t , ψ l y has variance 1{rpl´1.5qπs 2 . The parameters were set to c " 3 and η " 0.4. To visualize the variability of the estimator, figure 6.1 depicts the amplitude of the true spectral density kernel of the process for various values of u and λ with 20 replications of the corresponding estimator superposed for different sample sizes T . For each row, the same level curves were used where each level curve has the same color-coding within that row. The first two rows of figure 6.1 give the different levels for the estimator around the peak in frequency direction, while the last row provides contour plots further away from the peak. Increasing the sample size leads to less variability, as can be seen from the better aligned contour lines. It can also be observed that the estimates become more stable as we move further away from the peak. Nevertheless, the peaks and valleys are generally reasonably well captured even for the contour plots in the area around the peak.
As a second example, we consider a FAR(2) with the location of the peak varying with time. More specifically, the Fourier coefficients are now obtained by means of a VAR(2) X pT q t 2~8 . The entries of the matrices A u,1 and A u,2 are mutually independent and are generated such that rA u,1 s i,j " N p0, e´p i´3q´pj´3q q and rA u,2 s i,j " N p0, pi 8{2`j2{2 q´1q, respectively. The norms are specified as η u,1 " 0.4 cosp1.5´cospπuqq and η u,2 "´0.5. This will result in the peak to be located at λ " arccosp0.3 cosr1.5´cospπuqsq. The collection of innovation functions tε t u is chosen such that the l-th coefficient xε t , ψ l y has variance 1{rpl´2.65qπs 2 . Figure 6.2 provides the contour plots for different local λ " 0 true T " 2 9 T " 2 12 T " 2 16 u " 0.25 time values where the frequency was set to λ " 1.5´cospπuq, i.e., the direction in which most change in time-direction is visible in terms of amplitude. We observe good results in terms of identifying the peaks and valleys overall where again the variability clearly reduces for T ą 512. For the value u " 0.5, one is really close to the location of a peak and observe wrongful detection of a small peak in the middle of the contour plot. This is an indication some over-smoothing occurs which, to some extent, is difficult to prevent for autoregressive models, even in the stationary time series case.

Concluding remarks
This paper forms a basis for the development of statistical techniques and methods for the analysis of nonstationary functional time series. We have provided a theoretical framework for meaningful statistical inference of functional time series with dynamics that change slowly over time. For this, the notion of local stationarity was introduced for time series on the function space. We focused on a class of functional locally stationary processes for which a time-varying functional Cramér representation exists. The second-order characteristics of processes belonging to this class are completely captured by the time-varying spectral density operator. We moreover introduced time-varying functional ARMA processes and showed that these belong to the class of locally stationary functional processes. In the last section, we considered the nonparametric estimation of the time-varying spectral density operator. To derive the asymptotic distribution, a weaker tightness criterion is used than what is common in the existing literature. The results derived in this paper give rise to consider Quasi-likelihood methods on the function space as well as the development of prediction and appropriate dimension reduction techniques for nonstationary time series on the function space. This is left for future work.
Acknowledgements. This work has been supported in part by Maastricht University, the contract "Projet d'Actions de Recherche Concertées" No. 12/17-045 of the "Communauté française de Belgique" and by the Collaborative Research Center "Statistical modeling of nonlinear dynamic processes" (SFB 823, Project A1, C1, A7) of the German Research Foundation (DFG). The authors sincerely thank the Editor and three Referees for their constructive comments that helped to produce an improved revision of the original paper.

Appendix A1. Proofs
In this appendix, we prove the main theoretical results of the paper.

A1.1. Proofs of section 2
Proof of Proposition 2.3. For fixed t P t1, . . . , T u and T P N, let U s,ω " e iωpt´sq A pT q t,s . We have where T is the mapping defined by linear extension of T pU 1 rα,βq q " U pZ β´Zα q (see section B2.3). By definition of the operator U s,¨, }U s,¨} The continuity of the mapping T then implies a.e. in H.
To ease notation, we shall write I and O for the identity and zero operator on H, respectively while we denote the identity operator on H m by I H m . Consider the bounded linear operatorP u pλq on H P u pλq " λ m I´λ m´1 B u,1´. ..´λB u,m´1´Bu,m , λ P C.
It is straightforward to derive that, under the assumption ř m j"1~B u,j~8 ă 1, noninvertibility ofP u pλq implies that λ has modulus strictly less than 1. Define the invertible matrices U u pλq and M u pλq on the complex extension H m by where P u,0 pλq " I and P u,j pλq " λ u P u,j´1 pλq´B u,j for j " 1, . . . , m. Then from which it follows that pλ I H m´Bůq is not invertible whenP u pλq is not invertible. In other words, the spectrum S u of Bů over the complex extension of H m , which is a closed set, satisfies S u " tλ P C : λ I H m´Bů not invertibleu Ă tλ P C :P u pλq not invertibleu " tλ P C : |λ| ă 1u.
Hence, the assumption that ř m j"1~B u,j~8 ă 1 for all u, implies that the spectral radius of Bů satisfies for some δ ą 0. The equality is a well-known result for the spectral radius of bounded linear operators 1 and can for example be found in Dunford and Schwartz (1958).
From (A1.1) it is now clear that there exists a k 0 P Z, α P p0, 1q and a constant c 1 such that for all k ě k 0~B˚k Finally, it has been shown in Bosq (2000, p.74) that this is equivalent to the conditioñ B˚k 0 u~8 ă 1 for some integer k 0 ě 1. Proof of Theorem 3.4. The moving average representation (3.2) and the difference equation (3.6) together imply that the process can be represented as Using the linearity of the operators and applying a change of variables l 1 " l`j, this can be written as For a purely nondeterministic solution we require Because ε t is white noise in H, it has spectral representation Since a solution of the form (3.2) exists, we also have t,l e´i ωl . Substituting the spectral representations of X t,T and ε t into (3.6), we get together with the linearity of the operators B u,j and A Given the operator A t T ,ω satisfies equation (3.7), the previous implies we can write 1 ? 2π

From the last equation, it follows that
where Ω pT q t,ω " O H , t ď 0. We will show that this operator is of order Op 1 T q in S p pH C q. By Proposition B1.6, the smooth transfer operator satisfies A u,ω P S p pH C q. Under the conditions of Theorem 3.4, we have that for any element ψ P H C and fixed ω P Π, the mapping u Þ Ñ A u,ω pψqpτ q is continuous and, from the properties of the B u,j , is differentiable and has bounded derivatives with respect to u. By the Mean Value Theorem, we obtain for all ω P Π, uniformly in u. It then easily follows from equation (A1.5) and Proposition B1.3 that~C t T Ω pT q t,ω~p " Op 1 T q uniformly in t, ω. From (A1.3), we additionally have Since the moving average operators are either in S 2 pHq or in S 8 pHq, the above together with another application of Hölder's inequality for operators yields sup t,ω~A for some constant K independent of T . We remark that the state space representation of the previous section allow similarly to derive that (A1) and (A2) still hold for p " 8 under the weaker assumption of Lemma 3.2.

A1.3. Proofs of section 4
Proof of Proposition 4.1. For fixed t and T , we have by Minkowski's inequality where t¨u A denotes the complement event. Now since A pT q t,ω " A 0,ω for t ă 1 and A pT q t,ω " A 0,ω for t ą T , we can write It thus corresponds to the cross-covariance operator of the two stationary processes X p0q t and X p1q t at lag s. By Propsosition B2.4, we can alternatively express this as cumpX p0q s , X p1q 0 q " ÿ l,k pA 0,l b A 1,k qcumpε t`s´l , ε t´k q.
Using then that ε t is functional white noise, we find for the second term in (A1.6) The result now follows.

A1.4. Proofs of section 5
Proof of Proposition 5.1. We have by Theorem B2.2 and by Proposition B2.1, where the equality holds in the tensor product space H C b¨¨¨b H C . Note that the last line corresponds to the inversion formula of the cumulant tensor of order k. For fixed t P t1, . . . , T u and T P N, the k-th order cumulant spectral tensor of the linear functional process tX t,T u can thus be given by F pt,T q λ 1 ,...,λ k´1 "´A pT q and is well-defined in the tensor product space Â k i"1 H C . In particular, Proposition B1.3 implies the corresponding operator is Hilbert-Schmidt for k ě 2 We therefore have that the kernel function f pt,T q λ 1 ,...,λ k´1 pτ 1 , . . . , τ k q is a properly defined element in L 2 C pr0, 1s 2 q. In case k " 2, we moreover have that F λ 1 P S 1 pH C q. This follows by the fact that the ε t are white noise and thus~F ε λ 1~1 ď ř t~c umpε t , ε 0 q~1 " C ε 0~1 " E}ε 0 } 2 2 ă 8.
Proof of Theorem 5.2. Under assumption (A1)-(A2) we have for all t " 1, .., T and T P N that X t,T are locally stationary random elements in H. Therefore, by Proposition 5.1 and (5.7), In order to replace the transfer function kernels with their continuous approximations, we write We focus on finding a bound on the first term as the second term can be bounded similarly. Since H N p¨,¨q is linear in its first argument, we have by the triangle inequality For the first term of this expression, assumption (A2) and Lemma B3.3 imply for some generic constant C independent of T . Next, we consider the second term. Similarly as in the proof of Lemma B3.3, we have H N`Au j,‚ ,λ´Au j,‚ ,ω , ω´λ" H N pω´λq`A u j ,λ´Au j ,ω˘`HN pω´λq`A u j,N´1 ,λ´Au j,N´1 ,ωN´1 ř r"0 "´A u j,r ,λ´Au j,r´1 ,λ¯´´Au j,r ,ω´Au j,r´1 ,ω¯ı H s pω´λq.
Since the transfer function operator is twice continuously differentiable in u and ω, we find by two applications of the mean value theorem ´A u j,r ,λ´Au j,r´1 ,λ¯´´Au j,r ,ω´Au j,r´1 ,ω¯ 8 ď sup uPr0,1s,ωPΠ Hence we obtain the upper bound Moreover, Lemma B3.3 implies pT q t j,‚ ,´λ , ω´λ˘ 8 ď C L N pω´λq.
With these bounds and Proposition B1.3 and Lemma B3.1, we now obtain ż The second term of (A1.8) is similar and thus the error from replacing A pT q t j,r ,λ and A pT q t j,s ,´λ by A u j,r ,ω and A u j,s ,´ω , respectively, is of order O`l ogpN q N q in L 2 . The expectation of the periodogram tensor can therefore be written as where the remainder term R T is of order O`l ogpN q N˘. Correspondingly, the local periodogram kernel is given by Since by the conditions of the theorem, the operator-valued function A u,ω is twice continuously differentiable with respect to u, Theorem B1.8 implies that the spectral density operator F u,ω is also twice continuously differentiable in u P p0, 1q. Hence, by a Taylor approximation of F u j,r ,ω about u j , we find for the mean of the periodogram tensor EpI pT q u j ,ω q " F u j ,ω`1 2 b 2 t,T κ t B 2 F u,ω Bu 2ˇu "u j`O´l ogpN q N¯, where we have used the definition in (5.10) of the smoothing kernel K t in time direction. As by the assumption on the taper function this kernel is symmetric about zero, the first order term in the Taylor approximation is zero.
This proves the first part of from Theorem 5.2. For the covariance, we note that the product theorem for cumulants (see section B2) and the fact that the means are zero imply cov`I pT q u j ,ω 1 , I pT q where S ijkl denotes the permutation operator on b 4 i"1 L 2 C pr0, 1sq that permutes the components of a tensor according to the permutation p1, 2, 3, 4q Þ Ñ pi, j, k, lq, that is, S ijkl px 1 b¨¨¨b x 4 q " x i b¨¨¨b x l . We first show that the first term of this expression is of lower order than the other two. By (5.7), the cumulant is equal to and hence, by Lemma B3.3, is bounded in L 2 -norm by Next we consider the second term of (A1.10). A similar derivation as for the expectation of the periodogram tensor shows that the term equals " H 2,N`Fu j,‚ ,ω 1 , ω 1´ω2˘b H 2,N`Fu j,‚ ,´ω 1 , ω 2´ω1P roceeding in an analogous matter for the third term of (A1.10), we obtain the stated result.
Proof of Theorem 5.3. By Theorem 5.2, the expectation of the periodogram tensor can be written as where the remainder term R T is of order O`l ogpN q N˘. Because the operator-valued function A u,ω is twice differentiable with respect to both u and ω, it follows from Theorem B1.8 that the tensor F u,ω is twice continuously differentiable in both u, and ω. We can therefore apply a Taylor expansion of F u j,r ,ω about to the point x " pu j , ω o q to obtain where the remainder can generally be bounded by In order to derive the mean of the estimator, we set vb t,T " r´N {2 T and recall that the taper function relates to a smoothing kernel K t in time direction by for v P r´1 2 , 1 2 s with bandwidth b t,T " N {T . It then follows from (A1.11) that a Taylor expansion about to the point x " pu j , ω o q yields EpF pT q u j ,ωo q " F u j ,ωo`2 Because the smoothing kernels are symmetric around 0, we obtain where the error terms follow from (A1.12) and Theorem 5.2, respectively. This establishes Result iq of Theorem 5.3.
We treat the two terms of the covariance tensor separately. Starting with the first term, we have Since F u,λ is uniformly Lipschitz continuous in u, we have~F ur,λ´Fu,λ~2 ď C N T and hence the first term on the right hand side is bounded by For the second term, we exploit uniform Lipschitz continuity of the kernel function K f to get the upper bound In total we obtain cov`F pT q u,ω 1 ,F pT q u,ω 2˘ 2 " O´l niformly in ω 1 , ω2 P r´π, πs and u P r0, 1s.
Proof of Proposition 5.4. A change of variables shows that (5.14) can be written as The error terms will tend to zero under assumption (A8). Since the product of the two kernels in the first integral is exactly zero whenever |λ´pω 1´ω2 q| ą b f,T or λ ą b f,T , the first integral vanishes for large enough T unless ω 1 " ω 2 . For ω 1 " ω 2 , the integral in the first term becomes ż Π K f,T p´λq K f,T pλq xF u,ω 1`λ g 3 , g 1 y xF u,´ω 1´λ g 4 , g 2 y dλ and further by symmetry of the kernel " ż Π K f,T pλq 2 xF u,ω 1`λ g 3 , g 1 y xF u,´ω 1´λ , g 4 yg 2 dλ.
We note that }K f }´2 2 K f,T pλq 2 satisfies the properties of an approximate identity (e.g., Edwards, 1967). Hence application of Lemma F.15 of Panaretos and Tavakoli (2013b), which covers approximate identities in a functional setting, yields that the integral converges to }K f } 2 2 xF u,ω 1 g 3 , g 1 y xF u,´ω 1 g 4 , g 2 y, with respect to }¨} 2 . Since the integral in the second term in (A1.15) vanishes unless ω 1 "´ω 2 , we can apply a similar argument, which proves the proposition.
Proof of Theorem 5.7. We decompose the difference in terms of its variance and its squared bias. That is, The cross term cancels which is easily seen by noting that E`F pT q u,ω´E pF pT q u,ω q˘" O H C and hence E´@F pT q u,ω´E pF pT q u,ω q, EpF pT q u,ω q´F u,ω D H C bH C¯" 0 for all u P r0, 1s and ω P r´π, πs. Consider the first term of (A1.16). Self-adjointness ofF pT q u,ω and E}X t,T } 4 2 ă 8 imply that tr`covpF pT q u,ω ,F pT q u,ω q˘" 8 ÿ n,m"1 @ Er`F pT q u,ω´E pF pT q u,ω q˘b`F pT q u,ω´E pF pT q u,ω q˘ψ nm , ψ nm  u,ω pτ, σq´Ef pT q u,ω pτ, σqˇˇ2 dτ dσ dω Proof of Proposition 5.10. We have cum`Ê pT q u,ω 1 pψ m 1 n 1 q, . . . ,Ê pT q u,ω k pψ m k n k q˘" pk, 0q pk, 1q , and, for p " pi, jq, γ p " γ ij " p´1q j λ i as well as r p " r ij " m 1´j i n j i for i " 1, . . . , k and j P t0, 1u. For the next steps, we further denote the elements of P l with |P l | " d l by p l1 , . . . , p ld l . Then, by (5.7), we obtain further for the above cumulant Noting that the inner integral is a inner product in the tensor product space, we geťˇˇA Noting that by Lemma B3.3 Substituting the upper bound for the cumulant in (A1.17) and noting that 1 N H 2,N p0q Ñ }h} 2 2 as N Ñ 8,we finďˇc umpÊ pT q u,ω 1 pψ m 1 n 1 q, . . . ,Ê pT q u,ω k pψ m k n k q˘ˇď It is sufficient to show that for each indecomposable partition tP 1 , . . . , P M u the corresponding term in the above sum tends to zero. First, suppose that M " k.
Bounding the factors K f,T pω i´λi q by }K f } 8 {b f,T for i " 2, . . . , k and integrating over λ 3 , . . . , λ k , we obtain by Lemma B3.2(i) as an upper bound where we have K f,T pωq ď b f,T L b´1 f,T pωq and repeatedly Lemma B3.1(iv). Next, if M ă k we select variables λ i 1 , . . . , λ i k´2 according to Lemma B3.2(ii) and bound all corresponding factors K f,T pω i j´λ i j q for j " 1, . . . , k´2 by }K f } 8 {b f,T . Then integration over the k´2 selected variables yields the upper bound , since }K f,T } 1 " 1. Since b f,T N " b f,T b t,T T Ñ 8 and k{2´1 ą 0, the upper bounds tend to zero as T Ñ 8, which completes the proof.
In Appendix B, we provide additional technical material necessary to complete the proofs of the main part of this paper. Section B1 contains background material on operator theory. Section B2 provide background on the higher order structure of random functions as well as an important result on the existence of a stochastic integral, necessary to define functional Cramér representations. Section B3 introduces the necessary background on tapering on function spaces. Appendix B1. Some operator theory We start with a general characterization of a tensor product of a finite sequence of vector spaces, which in particular holds for sequences of Hilbert spaces.
Definition B1.1 (Algebraic tensor product of Banach spaces). Given a finite sequence of vector spaces V 1 , . . . , V k over an arbitrary field F, we define the algebraic tensor product V 1 b¨¨¨b V k as a vector space with a multi-linear map V 1ˆ¨¨¨V k Ñ W given by pf 1 , . . . , f k q Ñ pf 1 b¨¨¨b f k q such that, for every linear map T : V 1ˆ¨¨¨ˆVk Ñ W , there is unique k-linear mapT : V 1ˆ¨¨¨ˆVk Ñ W that satisfies T pf 1 , . . . , f k q "T pf 1 b¨¨¨b f k q.
Here, uniqueness is meant up to isomorphisms. The tensor product can be viewed as a linearized version of the product space V 1ˆ¨¨¨ˆVk satisfying equivalence relations of the form apv 1 , v 2 q " pav 1 , v 2 q " pv 1 , av 2 q where a P K and v 1 P V 1 , v 2 P V 2 , which induce a quotient space. These relationships uniquely identify the points in the product space V 1ˆ. . .ˆV k that yield multi-linear relationships. In a way, the tensor product Â k j"1 V j can thus be viewed as the 'freest' way to put the respective different vector spaces V 1 , . . . , V k together. We mention in particular that the algebraic tensor product satisfies the associative law, i.e., pV 1 b V 2 q b V 3 " V 1 b pV 2 b V 3 q, and hence it will often be sufficient to restrict attention to k " 2.
The algebraic tensor product of two Hilbert spaces H 1 and H 2 is itself not a Hilbert space. We can however construct a Hilbert space by considering the inner product acting on H 1 b H 2 given by xx b y, x 1 b y 1 y H 1 bH 2 " xx, x 1 yxy, y 1 y, x, x 1 P H 1 , y, y 1 P H 2 and then taking the completion with respect to the induced norm }¨} H 1 bH 2 . The completed space, denoted by H 1 p bH 2 , is identifiable with the Hilbert-Schmidt operators and is referred to as the Hilbert Schmidt tensor product. Throughout this work, when reference is made to the tensor product space of Hilbert spaces we mean the latter space. When no confusion can arise, we shall moreover abuse notation slightly and denote H 1 p bH 2 simply by H 1 b H 2 .
Definition B1.2. The tensor product pA b Bq P S P pHq b S p pHq -S p pS p pHqq between two operators A, B P S p pHq is defined as that for any C P S p pHq, we have the identity where B : denote the adjoint operator of B.
Proposition B1.3 (Hölder's Inequality for operators). Let H be a separable Hilbert space and A, B P S 8 pHq. Then the composite operator AB also defines a bounded linear operator over H, i.e., AB P S 8 pHq. This operation satisfies the associative law. Moreover, let 1 ď p, q, r ď 8, such that 1 r " 1 q`1 p . If A P S q pHq and B P S p pHq then AB P S r pHq and AB~r ď~A~q~B~p. Corollary B1.5. Let A i , i " 1,¨¨¨, k for k finite belong to S p pHq and let ψ " pψ 1 b¨¨¨b ψ k q be an element of Â k i"1 H. Then we have that the linear mapping A "`A 1 b ... b A ks atisfies iq }Aψ} 2 ă 8 and iiq~A~p ă 8.
Proposition B1.6 (Neumann series). Let A be a bounded linear operator on H and I H be the identity operator. If~A~8 ă 1, the operator I H´A has a unique bounded inverse on H given by If A P S 2 pHq with~A~2 ă 1, then this equality holds in Hilbert-Schmidt norm.
Proof. We only show the case A P S 2 pHq. Note that the space S 2 pHq is a Hilbert space. Then for m ă n, which shows that the partial sum forms a Cauchy sequence and hence has a limit A˚in S 2 pHq. Furthermore, we have pI H´A q A˚" lim nÑ8 pI H´A q n ř k"0 A n " lim nÑ8`I H´A n`1 q " I H in S 2 pHq, which shows that A˚is the inverse of I H´A .
Proposition B1.7 (Hilbert-Schmidt operators as kernel operator). Let H " L 2 C pT, µq be a separable Hilbert space, where pT, µq is a measure space, and let A be an operator on H. Then A P S 2 pHq if and only if it is an integral operator, that is, there exists a function a P L 2 C pTˆT, µ b µq such that A xpτ q " ż apτ, σq xpσq dµpσq for all τ P T µ-almost everywhere. Moreover, we have~A~2 " }a} 2 .
Proof. First, suppose A is an integral operator on H with kernel a P L 2 C pTˆT, µbµq.
Because H is separable, it has a countable orthonormal basis tψ n u nPN . For fixed τ P M , the function a τ pσq " apτ, σq defines a measurable function on L 2 C pT, µq. We can therefore write Aψ n pτ q " ż apτ, σqψ n pσqµpσq " xa τ , ψ n y.

Dominated Convergence Theorem yields
Axpτ q " A´lim where pψ i q iPN is an orthonormal basis of H, provided that Ep}X} 2 2 q ă 8. For higher moments, it is appropriate to consider these as tensors in a tensor product space H b¨¨¨b H of appropriate dimension. More precisely, let X 1 , . . . , X k be random elements in H. Then we define the moment tensor EpX 1 b¨¨¨b X k q by EpX 1 b¨¨¨b X k q " Similarly, we define the cumulant tensor cumpX 1 , . . . , X k q by cumpX 1 , . . . , X k q " The cumulants on the right hand side are as usual given by cum`xX 1 , ψ i 1 y, . . . , xX k , ψ i k y˘" ÿ ν"pν 1 ,...,νpq where the summation extends over all unordered partitions ν of t1, . . . , ku.
Formally, we also abbreviate this by cumpX 1 , . . . , X k q " ÿ ν"pν 1 ,...,νpq where S ν is the permutation that maps the components of the tensor back into the original order, that is, S ν`b p r"1 b pi,jqPνr X ij˘" X 11 b¨¨¨b X kl k . Next, let A 1 , . . . , A k linear bounded operators on H. As in Appendix B1, let A 1 b¨¨¨b A k be the operator on H b¨¨¨b H given by pA 1 b¨¨¨b A k qpx 1 b¨¨¨b x k q " pA 1 x 1 q b¨¨¨b pA k x k q for all x 1 , . . . , x k P H. The next proposition states that moment tensors-and hence also cumulant tensors by the above definitions-transform linearly.
Proposition B2.1. Let A 1 , . . . , A k be bounded linear operators on H and X 1 , . . . , X k be random elements in H. Theǹ Proof. Let tψ i u iPN be an orthonormal basis of H. Using the definition of a moment tensor, we get pA 1 b¨¨¨b A k q EpX 1 b¨¨¨b X k q " ÿ i 1 ,...,i k PN E´k ś j"1 xX j , ψ i j y¯pA 1 ψ i 1 q b¨¨¨b pA k ψ i k q and further, by representing A j ψ i j with respect to the chosen orthonormal basis, " ÿ i 1 ,...,i k PN ÿ n 1 ,...,n k PN E´k ś j"1 xX j , ψ i j y¯k ś j"1 xA j ψ i j , ψ n j ypψ n 1 b¨¨¨b ψ n k q " ÿ n 1 ,...,n k PN where we have used linearity of the operators, of the inner product, and of the ordinary mean.
As a direct consequence of the above proposition, we also have linearity of cumulant tensors. More precisely, for i " 1, . . . , k, let X i be a random tensor in Â k j"1 H and let A i be a linear bounded operator on the same tensor product space. Theǹ A 1 b¨¨¨b A k˘c um`X 1 , . . . , X k˘" cum`A 1 X 1 , . . . , A k X k˘. (B2.4)

B2.1. Higher order dependence under functional stationarity
For stationary processes, we follow convention and write the cumulant tensor (B2.1) as a function of k´1 elements, i.e., we denote it by C t 1 ,...,t k´1 . Under regularity conditions this operator is in S 2 pHq and we can define the corresponding k-th order cumulant kernel c t 1 ,...,t k´1 of the process X by A sufficient condition that is often imposed for this to hold is E}X 0 } k 2 ă 8. Similar to the second-order case, the tensor (B2.1) will form a Fourier pair with a k-th order cumulant spectral operator given summability with respect to~¨~p is satisfied. The k-th order cumulant spectral tensor is specified as where the convergence is in~¨~p. Properties on the kernels that are relevant in the time-dependent framework are discussed in section 4 of the main paper.
Theorem B2.2. Let tX t u tPZ be a stationary stochastic process in L 2 pr0, 1sq such that E}X 0 } k 2 ă 8 for all k P N and ř 8 t 1 ,...,t k´1 "´8~C t 1 ,...,t k´1~2 ă 8. Furthermore let Then there exists a 2π-periodic stochastic process tZ ω u ωPR taking values in L 2 C pr0, 1sq with Z ω " Z´ω such that lim N Ñ8 E}Z pN q ω´Zω } 2 2 " 0. Furthermore, tZ ω u equals almost everywhere the functional orthogonal increment process of the Cramer representation of tX t u, that is, a.e. in H.
Finally, we have for k ě 2 cum`Z ω 1 , . . . , Z ω k˘" which holds almost everywhere and in L 2 .
Proof of Theorem B2.2. The theorem generalizes Theorem 4.6.1 of Brillinger (1981). Let µ be the measure on the interval r´π, πs given by for all Borel sets A Ď r´π, πs. Similar to the time series setting, it has been shown (Panaretos and Tavakoli, 2013a) that there is a unique isomorphism T of sptX t u tPZ onto L 2 C pr´π, πs, µq such that T X t " e itf or all t P Z. The process defined by Z ω " T´1`1p´π,ωs p¨q˘is then a functional orthogonal increment process of which the second order properties are completely determined by the spectral density operator F. We have T pZ ω´Zν q " 1 pν,ωs p¨q,´π ă ν ă ω ă π, and for b j P C, j " 1, . . . , N For the first part of the proof, we shall use that the function 1 p´π,ωs p¨q can be approximated by the N -th order Fourier series approximation where the Fourier coefficients are given bỹ b ω,t " 1 2π ż π π 1 p´π,ωs pλq e´i tλ dλ. (B2.8) The approximation satisfies the properties listed in the following proposition (Brockwell and Davis, 1991, Proposition 4.11.2).
Proposition B2.3. Let tb N u N ě1 be the sequence of functions defined in (B2.8).
Proof of Proposition B2.4. For the first equality, we need to show that lim N Ñ8 E}X pN q t,T´X t,T } k 2 " 0.
We will do this by demonstrating that the the tail series X´p " 0. We now prove (2). By Proposition B2.1 and (1), we have cumpA pT q t 1 ,s 1 ε t 1´s1 , . . . , A pT q t k ,s k ε t k´sk q "´A pT q t 1 ,s 1 b¨¨¨b A pT q t k ,s k¯c umpε t 1´s1 , . . . , ε t k´sk q.
It is therefore sufficient to show that cum´ř s 1 PZ A pT q t 1 ,s 1 ε t 1´s1 , . . . , ř s k PZ A pT q t k ,s k ε t k´sk q " ÿ s 1 ,...,s k PZ cumpA pT q t 1 ,s 1 ε t 1´s1 , . . . , A s k ,t k ,T ε t k´sk q.
Let tψ l u lPN be an orthonormal basis of H. Then tψ l 1 b¨¨¨b ψ l k u l 1 ,...,l k ě1 forms an orthonormal basis Â k j"1 H. For the partial sums N ř s j "1 A pT q t j ,s j ε t j´sj , j " 1,¨¨¨, k, we obtain by virtue of the triangle inequality, the Cauchy-Schwarz Inequality and generalized Hölder Inequality The result now follows by the dominated convergence theorem.

B2.3. Existence of the stochastic integral
In order to provide sufficient conditions for local stationarity of functional processes in terms of spectral representations, we turn to investigating the conditions under which stochastic integrals ş π π U ω dZ ω for S 8 pH C q-valued functions U ω are well-defined. For this, let µ be a measure on the interval r´π, πs given by µpAq " ż A~F ω~1 dω, (B2.11) for all Borel sets A Ď r´π, πs and let B 8 " L 2 S8pH C q pr´π, πs, µq be the corresponding Bochner space of all strongly measurable functions U : r´π, πs Ñ S 8 pH C q such that }U } 2 B8 " ż π π~U ω~2 8 dµpωq ă 8. (B2.12) Panaretos and Tavakoli (2013a) showed that the stochastic integral is well defined in H C for operators that belong to the Bochner space B 2 " L 2 S 2 pH C q pr´π, πs, µq, which is a subspace of B 8 . In particular, it contains all functions U : r´π, πs Ñ S 2 pH C q of the form U ω " gpωq I`A ω , where g and A are, respectively, C and S 2 pH C q-valued functions that are both càdlàg with a finite number of jumps and A additionally satisfies ş π π~A ω~2 2~F ω~1 dω ă 8. Here, continuity in S 2 pH C q is meant with respect to the operator norm~¨~8.
Because the space B 2 is too restrictive to include interesting processes such as general functional autoregressive processes, we first show that the integral is properly defined in H C for all elements of B 8 . To do so, consider the subspace Q 0 Ă B 8 of step functions spanned by elements U 1 rα,βq for U P S 8 pH C q and α ă β P r´π, πs. Additionally, denote its closure by Q " Q 0 . Define then the mapping T : Q 0 Þ Ñ H C by linear extension of T pU 1 rα,βq q " U pZ β´Zα q. (B2.13) The following lemma shows that the image of T is in H C .
The mapping T : Q 0 Þ Ñ H C is therefore continuous. Together with the completeness of the space H C this establishes that, for every sequence tU n u ně1 Ă Q 0 converging to some element U P Q, the sequence tT pU n qu ně1 forms a Cauchy sequence in H C with limit T pU q " lim nÑ8 T pU n q. By linearity and continuity of the mapping T , the limit is independent of the choice of the sequence. Furthermore, since Q 0 is the subspace spanned by step functions that are square integrable on r´π, πs with respect to the finite measure µ and hence is dense in L 2 S8pH C q pr´π, πs, µq, we have B 8 Ď Q. Since }T pU q} H C ď }U } B8 , the above extension is well-defined for all U P B 8 . Appendix B3. Data taper In order to show convergence of the higher order cumulants of the estimator in (5.11), we will make use of two lemmas from Dahlhaus (1993) (Lemma A.4 and A.5 resp.). Both rely on the function L T : R Ñ R, T P R`, which is the 2π-periodic extension of if 1{T ď |λ| ď π. (B3.1) The function L T satisfies some nice properties. The following lemma lists those required in the current paper: Lemma B3.1. Let k, l, T P N, λ, α, ω, µ, γ P R and Π : p´π, πs. The following inequalities then hold with a constant C independent of T .
In addition, we also make use of Lemma 2 from Eichler (2007).
(i ) If m " n then for any n´2 variables α i 1 , . . . , α i n´2 we have ż Π k´2 n ś j"1 L T pγ j q dα i 1¨¨¨d α i n´2 ď C L N pα i n´1˘α in q 2 logpT q n´2 .
where }R 1,N } 2 " O´sup Similarly if g ‚ T on the left hand side is replaced by the kernel function g pT q ‚ P H 2 C of G pT q ‚ . If the kernels are bounded uniformly in their functional arguments, Lemma A.5 of Dahlhaus (1993) is pointwise applicable.