Simultaneous inference of the mean of functional time series

: For functional time series with physical dependence, we con- struct conﬁdence bands for its mean function. The physical dependence is a general dependence framework, and it slightly relaxes the conditions of m-approximable dependence. We estimate functional time series mean functions via a spline smoothing technique. Conﬁdence bands have been con- structed based on a long-run variance and a strong approximation theorem, which is satisﬁed with mild regularity conditions. Simulation experiments provide strong evidence that corroborates the asymptotic theories. Addi-tionally, an application to S&P500 index data demonstrates a non-constant volatility mean function at a certain signiﬁcance level.


Introduction
Functional data often arise from measurements obtained by separating an almost continuous time period into natural consecutive intervals, for example days. The functions thus obtained form a time series {Y i , i ∈ Z}, where each Y i is a curve Y i (t), t ∈ [a, b]. Such data structures are referred as functional time series. Without loss of generality, we assume that the domain of the functions is [0,1]. A central issue in functional time series analysis is taking into account the temporal dependence of the observations, i.e. the dependence between curves of {Y i (t), i ≤ m} and {Y j (t), j ≥ m + l} for any l ∈ Z. Further, we assume that Y i (·), i ∈ Z, is strictly stationary (stationary in short); see [3]. If a functional time series is not stationary, it is assumed that it can be transformed into a stationary time series by a preprocessing procedure.
The functional data analysis focuses mostly on i.i.d. functional observations; see [18,22,23,17,21] and references therein. In spite of the methodological advancement with independent observations, the work on functional time series has been sparse and of a more theoretical nature; see, for example, [3]. Functional time series retains the merit of functional observations, while it is more flexible than purely i.i.d curves by allowing a dependence structure. Applications of functional time series analysis are as wide as in the i.i.d. case. For example, we consider S&P500 data. We interpret the intraday information as curves, and the dependence among curves is natural. Recently, more and more researchers work on this type of functional data. To the best of our knowledge, the available inference methods focus mostly on nonparametric estimations of mean and covariance functions of functional series; see Part IV of [10,1,13,14] among the others. A simultaneous confidence band is lacking in the literature.
Two distinct types of functional data have been studied. [22,23,16] studied sparse longitudinal data. While [5,19] and [10] concerned dense functional data. As in the i.i.d. case, functional time series has the same two scenarios, and we only deal with equally-spaced dense design in this paper.
The basic unit of information of functional data analysis is the entire observed curve, and an important theoretical and practical issue is the necessity of detecting the global trend of its mean function. Nonparametric simultaneous confidence bands are powerful tools of global inferences for functions, see [6,9,11,4,20,24,25] for the theory and applications. For i.i.d. functional data, [4] constructed simultaneous confidence bands for its mean function.
The paper seeks to construct simultaneous confidence bands of the mean function of dense functional time series data. Our goal is to develop a simple but flexible nonparametric method with a well-justified theory and a fast algorithm to implement in practice. This is done by approximating the nonparametric components via spline estimation. Our approach allows for formal derivation of the asymptotic properties of the proposed estimators. We establish the √ nconsistency of the proposed estimator for the mean function.
We organize our paper as follows. In Section 2 we describe the physical dependence and asymptotics under it. In Section 3 we state our main results of confidence bands. In Section 4 we provide the procedure to implement the construction of the proposed confidence band. In Section 5 we report findings of a simulation study. An empirical example in Section 6 illustrates how we use the confidence bands for statistical inferences. We end the paper with a concluding section. Technical proofs are in the Appendix.

Weak dependence
In this paper we use the following notation. Let F denote L 2 ([0, 1]); the Hilbert space on a compact domain of square integrable functions with the norm ||x|| 2 2 = 1 0 x 2 (s)ds and the inner product . || · || means || · || 2 when r is not specified. Now we focus on the temporal dependence of functional time series, and we extend the physical dependence for random variables ( [15] and [2]) to a functional setup. Recall that, in our paper, the strictly stationary time series is functional valued rather than real valued, i.e. each Y i is a random curve.
In the following, we directly use the description of the physical dependence in [2] to a functional setup. Assume EY 1 = 0 with E Y 1 p 2 < ∞, 2 < p ≤ 4. For a stationary sequence allowing the representation where i , i ∈ Z are i.i.d random elements, and g : 1/p . This quantity can be interpreted as the dependence of Y i on 0 and Y i,0 is a coupled version of Y i with 0 in the latter replaced by 0 . Throughout the paper we assume (2.1) We provide an example with the functional physical dependence.
In the literature in functional time series, m-approximable dependence has been extensively used, see [1] and [13] among the others. In their framework, weak dependence is quantified by a summability condition which intuitively states that the function g decays so fast that the impact of shocks far back in the past is so small that they can be replaced by their independent copies, with only a small change in the distribution of the process. To be specific, define The proof is in the appendix. The opposite of the statement in lemma 1 is not true, and a simple counter example is constructed.
i−n − i−n with { n ; n ∈ Z} being an i.i.d. copy of { n ; n ∈ Z}. Taking expectation on the inner product, we have , which entails the physical dependence.

Main results
For a standard process {ξ(t), t ∈ [0, 1]}, one defines its mean function where the random coefficients ξ ·k are of mean 0 and variance 1, and the func- are the eigenvalues and eigenfunctions of the covariance function C (t, s) = cov{ξ 1 (t), ξ 1 (s)} respectively. Recalling (1.1), the data generating process is now written as Practically, we express a functional observation Y ij as where κ is assume to be a finite positive integer. The number κ of the basis functions impacts the performance of some procedures. For the data studied in this paper, we generally choose κ so that the plotted functional objects resemble original data with some smoothing that eliminates the most obvious noise. A practical selection of κ is given in Section 4.

Confidence bands
In this subsection, we present the confidence band result. In the first step, we estimate the mean function via spline smoothing. To describe the spline functions, we first introduce a sequence of equally spaced points {t j } Np j=1 , called interior knots which divide the interval [0, 1] into (N p + 1) equal subintervals Np+1 is the distance between neighboring knots. Denote S p the space of pth order spline space, i.e. S p is spanned by B-spline basis {B j,p } of order p as described in [7].
Let the "infeasible estimator" of function m were observed, a view taken in [10]. We propose to estimate the mean function m(t) bŷ in which S p is a p-th degree spline space.
In i.i.d. functional samples, the sample covariance operator is used, but for functional time series this problem is more complicated with a dependence structure. For vector-valued time series, the variance of its sample mean is asymptotically approximated by the long-run variance G(t, s), here It is proven in [14] that the series in (3.4) is convergent in L 2 under some weak dependence assumptions. Denote by ζ(t) a standardized Gaussian process such that Eζ(t)≡0, Eζ 2 (t)≡1 with the correlation function We impose the following technical assumptions: Assumptions (A1) and (A2) are typical for spline smoothing ( [20]). Assumption (A3) ensures that the principle components have collectively bounded smoothness. Assumption (A4) concerns the number of observations for each subject and the number of knots of B-splines, which are needed to ensure the asymptotic result. Assumption (A5) provides the Gaussian approximation of estimation error process. Besides the above assumptions, we also assume regularity conditions for the physical dependence measure about functional time series Y , which will be stated in the Appendix.
We first present the asymptotic property of m(t), then we show thatm(t) has the same asymptotic property as m(t).

Theorem 1. Under Assumptions (A1)-(A5)
, for any α ∈ (0, 1), as n → ∞, the "infeasible estimator" m(t) converges at the √ n rate with and for any t ∈ [0, 1], The spline estimatorm is asymptotically equivalent to m, that is The explicit expression of the confidence band for m(t) is presented in the following corollary, which is a direct result of the theorem above.

Implementation
In this section, we show in detail the kernel bandwidth selection and how we choose the number of interior knots when we apply a spline smoothing technique. Given any data set (j/n, Y ij ) N,n j=1,i=1 from model (3.2), we obtain the spline estimatorm(t) through (3.3). According to Assumption (A4), the number of interior knots for estimating m(t) is taken to be N p = c[n 1/2p log(n)], where [a] denotes the integer part of a. To select the number of knots N p , we use BIC method as follows, here r ij s are residuals and n T is the total number of observations. We choose knots with the smallest BIC within the range of [.5n 1/2p log(n), 5n 1/2p log(n)]. When constructing the confidence bands, one needs to evaluate the function σ 2 n (t) by estimating the unknown functions f (t) and σ 2 Y (t) and then plugging in these estimators. The same approach from [20] will be taken for these estimators. Also, one needs to estimate the unknown functions C (t, s), G (t, s) and the quantile Q 1−α .
For two curves with lag deviation i, the empirical covariance function is defined asγ whereȲ n (t) = 1/n 1≤i≤n Y i (t). Then, the estimation of G is given bŷ where K(·) is a kernel function and h = h(n) is a smoothing bandwidth satisfying h(n) → ∞ and h(n)/n → 0. Under m-approximable dependence assumption, [14] showed thatĜ is a consistent estimator of the long-run covariance function G in the sense of {Ĝ n (t, s) − G(t, s)} 2 dtds → 0 in probability as n goes to infinity. Similarly, this consistency holds under the physical dependence.
We use the flat top kernel as suggested by [14].
Equation (4.1) provides a pilot estimator of the covariance function G(·, ·). The smoothed estimator of covariance function G (t, s) is througĥ where G ·jj =Ĝ(j/N, j /N ), 1 ≤ j, j ≤ N i , and S p ⊗ S p is a tensor product spline space. We then estimate C (t, s). A pilot estimator is by the sample covarianceĈ Then we obtainĈ sm (t, s) through bivariate spline smoothing similarly as in

Simulation
We carry out a set of simulation studies to illustrate the finite sample behavior of the proposed confidence bands obtained in Section 3. We consider all combinations of subject sizes n = 200, 500, 1000, 2000 and within subject observation sizes N = 50, 100, 200; each pair of data-generated processes was replicated 1000 times. We generate data from an autoregressive (FAR(1)) process

here β(·) is a (bounded) kernel operator defined by β(x)(t) = β(t, s)x(s)ds.
We consider the following two kernel settings for the dependence structure For the mean function m(·), we used a C (∞) [0, 1] function m(t) = sin(2πt), for 0 ≤ t ≤ 1. Suggested by a referee, we also considered a C (3) [0, 1] mean function standard Brownian bridges and N i s are i.i.d. standard Normals. Note that this gives E( 2 i (t)) = 1 for all 0 ≤ t ≤ 1, which is assumed by our estimation procedure. Different noise levels σ = 0.5, 1 were used to interpret the result. The first 200 trajectories were discarded to ensure stationarity.
We used cubic spline to estimate the mean functions in our simulation studies. For the long-run covariance functions estimation, we used the flat top kernel as described in section 4 with bandwidth h = [n 1/3 ]. This estimation was used for all data-generating processes. Tables 1, 2, 4 and 5 show the coverage frequencies from 1000 replications for the nominal confidence levels 1 − α = 0.90, 0.95 and  Tables 3 and 6 provide both the maximum and averaged widths of the confidence bands over the grid points under different combinations. The results of the simulation studies can be summarized as follows. The empirical coverage rates go to the nominal ones as the sample size increases in all cases. When the sample size is as large as n = 1000, the coverage percentages of the confidence bands are very close to the nominal confidence levels 0.90, 0.95 Table 6 Maximum and averaged widths of confidence bands for m(·) in (5.1) (Setting A)  and 0.99. The coverage frequencies under different noise levels (σ = 0.5 and 1) are comparable for both settings. However, for higher noise levels, the confidence bands are wider. One can also see that both the maximum and averaged widths decrease as the sample size increases. Visualizations of such confidence bands are shown in Figures 1 and 2 for combinations of different subject sizes

Application
In this section, we apply the confidence band to S&P500 intraday returns data. Let I k (t) denote S&P500 index on the day k at the time t. Then y k (t) can be viewed as the log-return of the stock, y k (t) = log I k (t) − log I k (t − h), during period h, where h is typically 1 or 5 minutes. We used h = 5 for 5-minute returns, which was recorded throughout each trading day. The volatility of the stock is then represented by σ 2 k (t) = V ar(y k (t)|F k−1 ) with F k−1 being the information up to day k − 1. The left panel in Figure 3 shows S&P500 indices for 10 consecutive days. It is natural to choose one trading day as the underlying time  interval. We show, in the right panel of Figure 3, the corresponding trajectories of the 10 daily volatility curves. Every day, the volatility curve exhibits a certain pattern, typically with some upward or downward momentum. To quantify such behavior, we use the confidence band of functional time series. Our data record is from January 2006 to December 2011, containing 1511 trading days. For each day, there are 79 observations from 9:30 am to 4 pm.
We estimated the mean function of daily volatility curves using cubic spline, shown as dashed-dotted line in the middle of Figure 4. Notice from there that the volatility, σ 2 k (t), tends to be larger at the beginning and the end of a trading day. We, therefore, constructed 90% and 95% confidence bands (upper and lower dashed lines and dotted ones in Figure 4) for the mean daily volatility function. To test the hypothesis that the mean daily curve is constant, one would like to see if a constant line can be fitted in the confidence bands. We failed to do it under 90% confidence level since the maximum value of the lower bound curve is larger than the minimum value of the upper bound curve as shown in Figure 4. Therefore, we conclude non-constant mean daily volatility with .10 significance level. However, we cannot reject constant daily volatility using .05 significance level as also shown in Figure 4 (The constant horizontal line with value 7.3 × 10 −8 goes through the 95% band completely). Higher volatility at the beginning and the end of a trading day has been observed by several authors (e.g. [12] and [8]). We confirm this phenomenon with statistical inference results.

Concluding remarks
In this article, we have constructed simultaneous confidence bands of mean of functional time series. It is not a simple extension of confidence bands for functional data analysis with independence assumption. We have rigorously established the consistency and asymptotic normality of the proposed estimators. The assumptions of asymptotics are based on a physical measure dependence structure. A widely used m-approximable dependence can be ensured in our dependence structure.
The current article has focused on fixed and dense design, and it can be used in financial data. For instance, the intraday return data are mostly obtained and recorded at fixed time points with high frequency. Scientific experiments may also be designed and recorded this way. However, in some applications, practitioners may look for random and/or sparsely designed functional data, and this is a future topic for our research.
As one referee pointed out, our results in Theorem 1 and Corollary 1 could be improved with the estimatedĜ(t, t) instead of the theoretical G(t, t). The distribution theory of mean function μ(t) withĜ(t, t) is much more challenging than the case with G(t, t). At this stage, we don't even have a nice expression of G(t, t). However, we believe this would be an interesting future research topic.
Combining the triangle inequality and Minkowski inequality, we have We get the desired result immediately.

A.1. Preliminaries
First, we state a strong approximation lemma for a real valued stationary process {X i } n i=1 with physical dependence, which is used in the proofs of Lemma A.3 and Theorem 1.

A.2. Proof of Theorem 1
With spline basis defined in section 3, denote Then the solution ofm (t) in (3.3) can be expressed aŝ To prove the theorem, we break the estimation errorm(t) − m(t) by the representation of Y ij , and we obtain the following crucial decomposition in which m = (m (1/N ) , . . . m (N/N )) T is the signal vector, e = (σ (1/N )¯ .,1 , . . . , σ (N/N )¯ .,N ) T is the noise vector with ., with the bias term m(t) − m(t) + ξ(t) and the noise term e(t). The next three lemmas concern m(t), e(t) andξ(t) given in (A.4), and Theorem 1 follows immediately.
Proof. By theorems of spline approximations in de Boor and Assump- , here the last equation is satisfied with Assumption (A4). hence for any α ∈ (0, 1) Proof. First note the fact that Z .,k,ξ are independent N 0, n −1 variables implies that max 1≤k≤κ Z .,k,ξ = O p n −1/2 . Similarly as in [4], we have, under Assumption (A3), The definition of ζ(t) in (A.5), together with the definition of m(t) in (3.1), the strong approximation in (A.1) and the above bound on max 1≤k≤κ Z .,k,ξ entail that