An autocovariance-based learning framework for high-dimensional functional time series

Many scientific and economic applications involve the statistical learning of high-dimensional functional time series, where the number of functional variables is comparable to, or even greater than, the number of serially dependent functional observations. In this paper, we model observed functional time series that are subject to errors, in the sense that each functional datum arises as the sum of two uncorrelated components, one dynamic and one white noise. Motivated by the fact that the autocovariance function of the observed functional time series automatically filters out the noise term, we propose a three-step procedure: we first perform autocovariance-based dimension reduction, then formulate a novel autocovariance-based block regularized minimum distance estimation framework to produce block sparse estimates, and finally obtain the functional sparse estimates from these. We investigate the theoretical properties of the proposed estimators, and illustrate the proposed estimation procedure via three sparse high-dimensional functional time series models. We demonstrate via both simulated and real datasets that our proposed estimators significantly outperform their competitors.


Introduction
Functional time series refers to functional data objects that are observed consecutively over time. Existing research on functional time series has mainly focused on extending univariate or low-dimensional multivariate time series methods to the functional domain. An incomplete list of the relevant references includes Bosq (2000), Bathia et al. (2010), Hörmann and Kokoszka (2010), Panaretos and Tavakoli (2013), Aue et al. (2015), Hörmann et al. (2015), Li et al. (2020) and Chen et al. (2022). The rapid development of data collection technology has made high-dimensional functional time series datasets increasingly common. Examples include hourly measured concentrations of various pollutants such as PM10 trajectories (Hörmann et al., 2015) collected at different measuring stations, daily electricity load curves (Cho et al., 2013) for a large number of households, cumulative intraday return trajectories (Horváth et al., 2014), daily return density curves (Bathia et al., 2010) and functional volatility processes (Müller et al., 2011) for a collection of stocks.
We consider in this paper a setting for modelling high-dimensional functional time series as follows.
Let $W_t(\cdot) = \{W_{t1}(\cdot), \ldots, W_{tp}(\cdot)\}^{\mathrm{T}}$, $t = 1, \ldots, n$, be the observed $p$-vector of functional time series defined on a compact interval $\mathcal{U}$, where the dimension $p$ is large in relation to $n$, and $p$ may be greater than $n$.
Suppose that $W_t(\cdot)$ is subject to an error:
$$W_t(\cdot) = X_t(\cdot) + e_t(\cdot), \qquad (1)$$
where $X_t(\cdot) = \{X_{t1}(\cdot), \ldots, X_{tp}(\cdot)\}^{\mathrm{T}}$ is the functional time series of interest, $e_t(\cdot) = \{e_{t1}(\cdot), \ldots, e_{tp}(\cdot)\}^{\mathrm{T}}$ is white noise in the sense of (3) below, and $\{X_t(\cdot)\}_{t=1}^{n}$ and $\{e_t(\cdot)\}_{t=1}^{n}$ are uncorrelated. Note that both $X_t(\cdot)$ and $e_t(\cdot)$ are latent. We assume that both $W_t(\cdot)$ and $X_t(\cdot)$ are weakly stationary, and $E\{W_t(\cdot)\} = 0$. For any integer $h$ and $u, v \in \mathcal{U}$, put
$$\Sigma^W_h(u,v) = \mathrm{Cov}\{W_{t-h}(u), W_t(v)\}, \quad \Sigma^X_h(u,v) = \mathrm{Cov}\{X_{t-h}(u), X_t(v)\}, \quad \Sigma^e_h(u,v) = \mathrm{Cov}\{e_{t-h}(u), e_t(v)\}. \qquad (2)$$
We call $e_t(\cdot)$ white noise if
$$E\{e_t(u)\} = 0 \ \text{ and } \ \Sigma^e_h(u,v) = 0 \ \text{ for any } u, v \in \mathcal{U} \text{ and } h \neq 0. \qquad (3)$$
Furthermore, we assume that $a^{\mathrm{T}} X_t(\cdot)$ is not white noise for any non-zero constant vector $a \in \mathbb{R}^p$. Under this setting, the linear dynamic structure of $W_t(\cdot)$ is entirely determined by that of $X_t(\cdot)$, and all white noise elements of $W_t(\cdot)$ are absorbed into $e_t(\cdot)$. The presence of $e_t(\cdot)$ reflects the fact that the signal curves $X_t(\cdot)$ are seldom completely observed. Instead, they are often only measured, with errors, on a grid. These noisy discrete data are smoothed to yield the 'observed' curves $W_t(\cdot)$. See Bathia et al. (2010) for the univariate version of model (1). When $W_1(\cdot), \ldots, W_n(\cdot)$ are univariate and independent, Hall and Vial (2006) considered the same model under a 'low noise' setting in which $e_t(\cdot)$ vanishes as $n \to \infty$.
To separate $X_t(\cdot)$ from $e_t(\cdot)$, e.g., via the covariance function, some special structures have been imposed even in the univariate case; see, e.g., the diagonal $\Sigma^e_0$ of Yao et al. (2005) and the banded $\Sigma^e_0$ of Descary and Panaretos (2019). In contrast, we do not impose any structure on $\Sigma^e_0$ in this paper, and our estimation filters out the impact of $e_t(\cdot)$ automatically.
The standard estimation procedures for univariate functional time series models usually consist of three steps (Aue et al., 2015). Dimension reduction is performed first via, e.g., functional principal components analysis (FPCA). Each observed curve is then approximated by a finite truncation, which effectively transforms the functional time series into a vector time series of FPC scores. In the second step, the estimation of the function-valued parameters in the model is transformed into that of appropriate parameter vectors/matrices based on the estimated FPC scores. Finally, the estimated principal component functions are utilized to obtain function-valued estimates based on the estimated parameter vectors/matrices. To overcome the difficulties caused by high dimensionality (i.e. large $p$ in relation to $n$), some functional sparsity assumptions are imposed, which results in estimation under block sparsity constraints in the second step, in the sense that variables belonging to the same block (or group) are simultaneously included or excluded. In regression setups, group-lasso penalized least squares estimation (Yuan and Lin, 2006) is often adopted in the second step to obtain block sparse estimates. Similar three-step procedures have been developed to estimate sparse high-dimensional functional models; see, e.g., vector functional autoregression (VFAR) (Guo and Qiao, 2022), scalar-on-function linear additive regression (SFLR) (Fan et al., 2015; Kong et al., 2016; Xue and Yao, 2021; Fang et al., 2022) and function-on-function linear additive regression (FFLR) (Fan et al., 2014; Luo and Qi, 2017; Fang et al., 2022). However, those estimation procedures are developed under the assumption that the signal curves are observed directly.
In our setting the observed curves $W_t(\cdot)$ are subject to the error contamination in model (1). Both FPCA and penalized least squares estimation based on the estimated covariance function $\widehat{\Sigma}^W_0$ are inappropriate, since $\Sigma^W_0 = \Sigma^X_0 + \Sigma^e_0$ and, hence, $\widehat{\Sigma}^W_0$ is no longer a consistent estimator of $\Sigma^X_0$. Motivated by the fact that $\Sigma^W_h = \Sigma^X_h$ for any $h \neq 0$, which automatically removes the impact of the noise $e_t(\cdot)$ and ensures that any estimator of $\Sigma^W_h$ is also legitimate for $\Sigma^X_h$, we propose an autocovariance-based three-step learning procedure. Differing from FPCA, which is based on the Karhunen-Loève expansion, our first dimension reduction step is formulated under an alternative data-driven basis expansion of $X_{tj}(\cdot)$, based on the eigenanalysis of a positive-definite operator defined in terms of the autocovariance functions of $W_{tj}(\cdot)$.
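The filtering fact above can be checked numerically. The following minimal numpy sketch (our own illustrative simulation, not the paper's code; the AR(1)-driven signal and all names are assumptions) simulates $W_t = X_t + e_t$ on a grid and compares lag-0 and lag-1 sample autocovariances of $W_t$ against those of the latent signal $X_t$:

```python
import numpy as np

# Illustrative check: with W_t = X_t + e_t observed on a grid, the lag-0 sample
# covariance of W_t is biased by the noise covariance, while lag-1 sample
# autocovariances estimate the signal term only.
rng = np.random.default_rng(0)
n, grid = 5000, np.linspace(0, 1, 21)
phi = np.sin(2 * np.pi * grid)                 # fixed loading curve
eta = np.zeros(n)
for t in range(1, n):                          # AR(1) scores give serial dynamics
    eta[t] = 0.7 * eta[t - 1] + rng.normal()
X = np.outer(eta, phi)                         # (n, 21) latent signal curves
e = rng.normal(size=X.shape)                   # white noise curves, iid over t
W = X + e

def cov_lag(A, B, h):
    """Sample Cov{A_{t-h}(u), B_t(v)} on the grid."""
    return A[: len(A) - h].T @ B[h:] / (len(A) - h)

bias0 = np.abs(cov_lag(W, W, 0) - cov_lag(X, X, 0)).max()  # inflated by Sigma^e_0
bias1 = np.abs(cov_lag(W, W, 1) - cov_lag(X, X, 1)).max()  # noise filtered out
print(round(bias0, 2), round(bias1, 2))
```

Here `bias0` is dominated by the noise variance on the diagonal, whereas `bias1` reflects only sampling error, consistent with $\Sigma^W_h = \Sigma^X_h$ for $h \neq 0$.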
Different from penalized least squares estimation, our second step makes use of the autocovariance of the basis coefficients to construct high-dimensional moment equations, and then applies the proposed block regularized method to estimate the associated block sparse parameter vectors/matrices. Our third step re-transforms the block sparse estimates into functional sparse estimates via the estimated basis functions obtained in the first step.
Our theoretical development stands at the intersection of high-dimensional statistics and functional time series, and faces several challenges due to non-asymptotics and infinite-dimensionality with serial dependence. Firstly, in the proposed second step we deal with estimated basis coefficients to produce block sparse estimates, whereas conventional sparse estimation is applied directly to observed data. Accounting for such approximation is a major undertaking. Secondly, under a high-dimensional and dependent setting, it is essential to develop non-asymptotic error bounds on the relevant estimated terms as a function of $n$, $p$ and the truncated dimension, and to assess how the serial dependence affects the non-asymptotic results. Thirdly, compared to non-functional data, the infinite-dimensional nature of functional data leads to additional theoretical complexity arising from specifying the block structure and controlling the bias terms formed by truncation errors from the dimension reduction step.
The main contribution of our paper is three-fold.
1. Our autocovariance-based learning framework can address the error contamination model (1) in the presence of infinite-dimensional signal curve dynamics with the addition of 'genuinely functional' noise. It makes good use of the serial correlation information, which is the most relevant in the context of time series modelling.
2. To provide theoretical guarantees for the first and third steps, and to verify the imposed high-level regularity conditions in the second step, we establish useful non-asymptotic error bounds on the relevant estimated terms under the autocovariance-based dimension reduction framework.
3. We utilize the autocovariance among basis coefficients to construct high-dimensional moment equations with a partitioned group structure, based on which we formulate the second step in a novel block regularized minimum distance (RMD) estimation framework to produce block sparse estimates. The group information can be explicitly encoded in a convex optimization problem that minimizes a block $\ell_1$-norm objective function subject to a block $\ell_\infty$-norm constraint. To theoretically support the second step, we investigate convergence properties of the block RMD estimator. Besides being useful in the second step, the block RMD estimation framework is of independent interest and can be applied more broadly.
Our paper is set out as follows. In Section 2, we present Step 1, i.e. the autocovariance-based dimension reduction technique. We also establish some essential deviation bounds on the relevant estimated terms.
In Section 3, we use an example to illustrate the construction of high-dimensional moment equations.
We then formulate a general block RMD estimation method (i.e. Step 2) and investigate its theoretical properties. In Section 4, we illustrate the proposed three-step framework using three examples of sparse high-dimensional functional time series models, i.e. SFLR, FFLR and VFAR. Theoretically, we study convergence rates of the associated estimators in these models. In Section 5, we examine the finite-sample performance of the proposed estimators through both simulations and an analysis of a real financial dataset. All technical proofs are relegated to the Appendix.
Notation. For a positive integer $q$, we denote $[q] = \{1, \ldots, q\}$. Let $L_2(\mathcal{U})$ be the Hilbert space of square-integrable functions on a compact interval $\mathcal{U}$. The inner product of $f, g \in L_2(\mathcal{U})$ is defined as $\langle f, g \rangle = \int_{\mathcal{U}} f(u) g(u)\, du$. For a Hilbert space $\mathbb{H} \subset L_2(\mathcal{U})$, we denote the $p$-fold Cartesian product by $\mathbb{H}^p = \mathbb{H} \times \cdots \times \mathbb{H}$ and the tensor product by $\mathbb{S} = \mathbb{H} \otimes \mathbb{H}$. For $f = (f_1, \ldots, f_p)^{\mathrm{T}}$ and $g = (g_1, \ldots, g_p)^{\mathrm{T}}$ in $\mathbb{H}^p$, we define $\langle f, g \rangle = \sum_{j=1}^{p} \langle f_j, g_j \rangle$. We use $\|f\| = \langle f, f \rangle^{1/2}$ and $\|f\|_0 = \sum_{j=1}^{p} I(\|f_j\| \neq 0)$, with $I(\cdot)$ being the indicator function, to denote functional versions of the induced norm and $\ell_0$-norm, respectively. For an integral operator $K: \mathbb{H}^p \to \mathbb{H}^q$ induced from the kernel function $K = (K_{ij})_{q \times p}$ with each $K_{ij} \in \mathbb{S}$, $K(f)(u) = \{\sum_{j=1}^{p} \langle K_{1j}(u, \cdot), f_j(\cdot) \rangle, \ldots, \sum_{j=1}^{p} \langle K_{qj}(u, \cdot), f_j(\cdot) \rangle\}^{\mathrm{T}} \in \mathbb{H}^q$ for any $f = (f_1, \ldots, f_p)^{\mathrm{T}} \in \mathbb{H}^p$. For notational economy, we will also use $K$ to denote both the kernel and the operator. We define functional versions of the Frobenius and matrix $\ell_\infty$-norms by $\|K\|_{\mathrm{F}} = (\sum_{i=1}^{q} \sum_{j=1}^{p} \|K_{ij}\|^2)^{1/2}$ and $\|K\|_\infty = \max_{i \in [q]} \sum_{j=1}^{p} \|K_{ij}\|$, respectively. For any real matrix $B = (b_{ij})_{q \times p}$, we write $\|B\|_{\max} = \max_{i \in [q], j \in [p]} |b_{ij}|$, and use $\|B\|_{\mathrm{F}} = (\sum_{i=1}^{q} \sum_{j=1}^{p} |b_{ij}|^2)^{1/2}$ and $\|B\|_2 = \lambda_{\max}^{1/2}(B^{\mathrm{T}} B)$ to denote its Frobenius norm and $\ell_2$-norm, respectively. For two sequences of positive numbers $\{a_n\}$ and $\{b_n\}$, we write $a_n \lesssim b_n$ or $b_n \gtrsim a_n$ if there exists a positive constant $c$ such that $a_n / b_n \leq c$. We write $a_n \asymp b_n$ if and only if $a_n \lesssim b_n$ and $b_n \lesssim a_n$ hold simultaneously.

Methodology
Our Step 1 approximates each curve $X_{tj}(\cdot)$ by a finite linear combination: we expand $X_{tj}(\cdot)$ using the data-driven orthonormal basis functions $\{\psi_{jl}(\cdot)\}_{l=1}^{\infty}$, and truncate the expansion at the first $d_j$ (to be specified in Section 5) terms:
$$X_{tj}(\cdot) \approx \sum_{l=1}^{d_j} \eta_{tjl}\, \psi_{jl}(\cdot) = \eta_{tj}^{\mathrm{T}} \psi_j(\cdot), \qquad (4)$$
where $\eta_{tjl} = \langle X_{tj}, \psi_{jl} \rangle$, $\eta_{tj} = (\eta_{tj1}, \ldots, \eta_{tjd_j})^{\mathrm{T}} \in \mathbb{R}^{d_j}$ and $\psi_j(\cdot) = \{\psi_{j1}(\cdot), \ldots, \psi_{jd_j}(\cdot)\}^{\mathrm{T}}$. Different from the conventional Karhunen-Loève expansion, the eigenvalues $\lambda_{j1} \geq \lambda_{j2} \geq \cdots > 0$ and the corresponding eigenfunctions $\psi_{j1}(\cdot), \psi_{j2}(\cdot), \ldots$ are taken from the spectral decomposition of the operator defined by
$$K_{jj}(u, v) = \sum_{h=1}^{L} \int_{\mathcal{U}} \Sigma^X_{h,jj}(u, z)\, \Sigma^X_{h,jj}(v, z)\, dz, \qquad (5)$$
where $L > 0$ is some prescribed fixed integer, and $\Sigma^X_{h,ij}(u, v)$ denotes the $(i, j)$-th element of $\Sigma^X_h(u, v)$ in (2). Also denote by $\Sigma^W_{h,ij}$ and $\Sigma^e_{h,ij}$ the $(i, j)$-th elements of $\Sigma^W_h$ and $\Sigma^e_h$, respectively. The idea of using non-zero lagged autocovariances was initiated by Bathia et al. (2010). A direct consequence is the identity
$$K_{jj}(u, v) = \sum_{h=1}^{L} \int_{\mathcal{U}} \Sigma^W_{h,jj}(u, z)\, \Sigma^W_{h,jj}(v, z)\, dz,$$
since $\Sigma^X_{h,jj}(u, z) = \Sigma^W_{h,jj}(u, z)$ for all $(u, z) \in \mathcal{U}^2$ and $h \neq 0$. This paves the way to estimating $K_{jj}$, and therefore also $\psi_j(\cdot)$, directly based on the observations $W_{1j}(\cdot), \ldots, W_{nj}(\cdot)$. The impact of the noise terms $e_{tj}(\cdot)$ is filtered out automatically. It is worth noting that we choose not to use the autocovariance functions $\Sigma^W_{h,jj}$ directly in defining $K_{jj}$, as they are not nonnegative definite. The definition of $K_{jj}$ in (5) ensures that it is nonnegative definite, and there is no cancellation of the information accumulated from lags 1 to $L$; hence the estimation is not sensitive to the choice of $L$. In practice, we choose a small $L$, say $1 \leq L \leq 5$, as the most significant autocorrelations typically occur at small lags.
In the standard Karhunen-Loève expansion, $\{\psi_{jl}(\cdot)\}_{l=1}^{\infty}$ is deduced from the spectral decomposition of $\Sigma^X_{0,jj}$. Since $\Sigma^X_{0,jj}(u, v) = \Sigma^W_{0,jj}(u, v) - \Sigma^e_{0,jj}(u, v)$, some strong assumptions have to be imposed to eliminate the impact of $\Sigma^e_{0,jj}(u, v)$ in order to obtain consistent estimates of $\psi_{jl}(\cdot)$. For example, Hall and Vial (2006) assume that $W_{1j}(\cdot), \ldots, W_{nj}(\cdot)$ are independent and that the noise $e_{tj}(\cdot)$ goes to 0 as $n$ grows to $\infty$. Note that dimension reduction via FPCA can also be performed based on the spectral decomposition of $\Sigma^W_{0,jj}$ instead of $\Sigma^X_{0,jj}$, as any basis can be used for expanding the data. However, because $\Sigma^W_{0,jj} = \Sigma^X_{0,jj} + \Sigma^e_{0,jj}$, using $\Sigma^W_{0,jj}$ may require a larger truncated dimension to capture sufficient signal information, leading to reduced statistical efficiency.
It is also worth mentioning that the penalized least squares approach adopted in the covariance-based second step relies on $\Sigma^X_{0,jk}(u, v) = \Sigma^W_{0,jk}(u, v) - \Sigma^e_{0,jk}(u, v)$, and is hence inappropriate under model (1).
With the available observations $\{W_t(\cdot)\}_{t \in [n]}$, a natural estimator of $K_{jj}$ in (5) is
$$\widehat{K}_{jj}(u, v) = \sum_{h=1}^{L} \int_{\mathcal{U}} \widehat{\Sigma}^W_{h,jj}(u, z)\, \widehat{\Sigma}^W_{h,jj}(v, z)\, dz,$$
where $\widehat{\Sigma}^W_{h,jj}$ denotes the lag-$h$ sample autocovariance function. Performing an eigenanalysis of $\widehat{K}_{jj}$ yields the estimated eigenvalues $\widehat{\lambda}_{j1} \geq \widehat{\lambda}_{j2} \geq \cdots > 0$ and the corresponding estimated eigenfunctions $\widehat{\psi}_{j1}(\cdot), \widehat{\psi}_{j2}(\cdot), \ldots$.
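On a discretized grid, the estimator $\widehat{K}_{jj}$ and its eigenanalysis can be sketched as follows. This is a minimal numpy illustration under our own simulation design (AR(1) scores loading on one sine basis); `autocov_basis` and all settings are assumptions, not the paper's implementation:

```python
import numpy as np

# Minimal sketch of the Step 1 eigenanalysis for one functional component j.
def autocov_basis(Wj, L=2, d=2):
    """Wj: (n, G) array of one functional component on a regular grid of [0, 1].
    Returns the leading d estimated eigenvalues and basis functions (columns)."""
    n, G = Wj.shape
    du = 1.0 / (G - 1)                        # grid spacing for Riemann sums
    Wj = Wj - Wj.mean(axis=0)                 # center the curves
    K = np.zeros((G, G))
    for h in range(1, L + 1):
        S = Wj[: n - h].T @ Wj[h:] / (n - h)  # lag-h sample autocovariance surface
        K += S @ S.T * du                     # K(u,v) += int S(u,z) S(v,z) dz
    evals, evecs = np.linalg.eigh(K * du)     # discretized operator eigenproblem
    idx = np.argsort(evals)[::-1][:d]
    return evals[idx], evecs[:, idx] / np.sqrt(du)  # L2-normalized eigenfunctions

rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 51)
n = 3000
eta = np.zeros(n)
for t in range(1, n):
    eta[t] = 0.8 * eta[t - 1] + rng.normal()  # serially dependent scores
# W_tj = eta_t * phi(u) + white noise curve, with phi L2-normalized
Wj = np.outer(eta, np.sqrt(2) * np.sin(np.pi * grid)) + rng.normal(size=(n, 51))
lam, psi = autocov_basis(Wj, L=2, d=2)
# The leading estimated basis function aligns with phi up to sign.
```

Because $\widehat{K}_{jj}$ is a sum of terms $S S^{\mathrm{T}}$, the discretized kernel is nonnegative definite by construction, mirroring the property of (5).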
Let $E\{\eta_{(t-h)j} \eta_{tk}^{\mathrm{T}}\} = \{\sigma^{(h)}_{jklm}\}_{l \in [d_j], m \in [d_k]}$, with its estimator $(n - h)^{-1} \sum_{t=h+1}^{n} \widehat{\eta}_{(t-h)j} \widehat{\eta}_{tk}^{\mathrm{T}} = \{\widehat{\sigma}^{(h)}_{jklm}\}_{l \in [d_j], m \in [d_k]}$ for $j, k \in [p]$ and $h \geq 0$, where $\widehat{\eta}_{tj} = (\widehat{\eta}_{tj1}, \ldots, \widehat{\eta}_{tjd_j})^{\mathrm{T}}$. Our proposed autocovariance-based Step 2 and Step 3 explicitly rely on the sample autocovariances among estimated basis coefficients, $\{\widehat{\sigma}^{(h)}_{jklm} : j, k \in [p], l \in [d_j], m \in [d_k], h \in [L]\}$, and the estimated basis functions $\{\widehat{\psi}_{jl}(\cdot) : j \in [p], l \in [d_j]\}$, respectively. See the details in Sections 3.1 and 4. Their convergence properties in elementwise $\ell_\infty$-norm under high-dimensional scaling are investigated in Section 2.2 below.
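The sample autocovariances among estimated scores have a direct array form. A small sketch (synthetic inputs; `score_autocov` is our own illustrative name) computing $\{\widehat{\sigma}^{(h)}_{jklm}\}$ blockwise:

```python
import numpy as np

# Sketch: lag-h sample autocovariances of estimated basis coefficients.
def score_autocov(eta_hat, h):
    """eta_hat: list over j of (n, d_j) arrays of estimated basis coefficients.
    Returns a dict (j, k) -> (d_j, d_k) matrix (n-h)^{-1} sum_t eta_{(t-h)j} eta_{tk}^T."""
    n = eta_hat[0].shape[0]
    out = {}
    for j, Ej in enumerate(eta_hat):
        for k, Ek in enumerate(eta_hat):
            out[(j, k)] = Ej[: n - h].T @ Ek[h:] / (n - h)
    return out

rng = np.random.default_rng(2)
eta_hat = [rng.normal(size=(100, 2)), rng.normal(size=(100, 3))]  # d_1 = 2, d_2 = 3
S1 = score_autocov(eta_hat, h=1)
print(S1[(0, 1)].shape)  # (2, 3)
```

Note that the truncated dimensions $d_j$ may differ across $j$, so the $(j, k)$-th block is a $d_j \times d_k$ matrix.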

Rates in elementwise $\ell_\infty$-norm
To characterize the effect of serial dependence on the relevant estimated terms, we use the functional stability measure of $\{W_t(\cdot)\}_{t \in \mathbb{Z}}$ (Guo and Qiao, 2022).
Condition 1. For $\{W_t(\cdot)\}_{t \in \mathbb{Z}}$, the spectral density operator $f^W_\theta = (2\pi)^{-1} \sum_{h \in \mathbb{Z}} \Sigma^W_h e^{-ih\theta}$ for $\theta \in [-\pi, \pi]$ exists, and the functional stability measure $\mathcal{M}^W$ defined in (9) is finite.
The quantity $\mathcal{M}^W$ in (9) is expressed in proportion to functional Rayleigh quotients of $f^W_\theta$ relative to $\Sigma^W_0$. Hence it can more precisely capture the effect of the small decaying eigenvalues of $\Sigma^W_0$ on the numerator in (9), which is essential for handling truly infinite-dimensional functional objects $\{W_{tj}(\cdot)\}$. We next define, in (10), the functional stability measure $\mathcal{M}^W_k$ of all $k$-dimensional subsets of $\{W_t(\cdot)\}_{t \in \mathbb{Z}}$, i.e. $\{(W_{tj}(\cdot) : j \in J)^{\mathrm{T}}\}_{t \in \mathbb{Z}}$ for $J \subset [p]$ with cardinality $|J| \leq k$. Under Condition 1, it is easy to verify that $\mathcal{M}^W_k \leq \mathcal{M}^W < \infty$. Our non-asymptotic results are developed using an infinite-dimensional analog of the Hanson-Wright inequality (Rudelson and Vershynin, 2013) in a general Hilbert space $\mathbb{H}$, for which we need to impose a sub-Gaussian condition.
Definition 1. Let $Z_t(\cdot)$ be a mean-zero random variable in $\mathbb{H}$ for any fixed $t$, and let $\Sigma_0 : \mathbb{H} \to \mathbb{H}$ be a covariance operator. Then $Z_t(\cdot)$ is a sub-Gaussian process if there exists a constant $c > 0$ such that $E(e^{\langle x, Z_t \rangle}) \leq e^{c^2 \langle x, \Sigma_0(x) \rangle / 2}$ for all $x \in \mathbb{H}$.
Condition 2. (i) $\{W_t(\cdot)\}_{t \in \mathbb{Z}}$ is a sequence of multivariate functional linear processes with sub-Gaussian errors, namely sub-Gaussian functional linear processes, $W_t(\cdot) = \sum_{l=0}^{\infty} B_l(\varepsilon_{t-l})$ for any $t \in \mathbb{Z}$, where $B_l = (B_{l,jk})_{p \times p}$ with each $B_{l,jk} \in \mathbb{S}$, $\varepsilon_t(\cdot) = \{\varepsilon_{t1}(\cdot), \ldots, \varepsilon_{tp}(\cdot)\}^{\mathrm{T}} \in \mathbb{H}^p$, and the components of $\{\varepsilon_t(\cdot)\}_{t \in \mathbb{Z}}$ are independent sub-Gaussian processes satisfying Definition 1; (ii) the coefficient functions satisfy $\sum_{l=0}^{\infty} \|B_l\|_\infty = O(1)$; (iii) $\omega^\varepsilon_0 = \max_{j \in [p]} \int_{\mathcal{U}} \Sigma^\varepsilon_{0,jj}(u, u)\, du = O(1)$, where $\Sigma^\varepsilon_{0,jj}(u, u) = \mathrm{Cov}\{\varepsilon_{tj}(u), \varepsilon_{tj}(u)\}$.
The multivariate functional linear process can be seen as a generalization of the functional linear process (Bosq, 2000) to the multivariate setting, and also as an extension of the multivariate linear process (Hamilton, 1994) to the functional domain. Condition 2(ii) ensures the stationarity of $\{W_t(\cdot)\}_{t \in \mathbb{Z}}$ and, together with Condition 2(iii), implies that $\omega^W_0 = \max_{j \in [p]} \int_{\mathcal{U}} \Sigma^W_{0,jj}(u, u)\, du = O(1)$ (see Lemma 5 in Appendix B), which is essential in deriving our non-asymptotic results. The sub-Gaussian condition is imposed on the functional process to facilitate the use of a Hanson-Wright-type inequality in our non-asymptotic analysis. We believe that a Nagaev-type concentration bound can be established to accommodate functional linear processes with functional errors under a weaker finite polynomial moment condition. It would also be interesting to develop non-asymptotic results for more general non-Gaussian functional time series under other commonly adopted dependence frameworks.
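A process satisfying Condition 2 can be simulated directly. The sketch below (our own toy construction, not from the paper) generates a Gaussian, hence sub-Gaussian, multivariate functional linear process with the lag expansion truncated at one and a bounded integral operator, so that Conditions 2(ii)-(iii) hold trivially:

```python
import numpy as np

# Sketch of a Gaussian multivariate functional linear process
# W_t = sum_l B_l(eps_{t-l}), truncated at lag 1 and discretized on a grid.
rng = np.random.default_rng(3)
p, G, n, burn = 3, 21, 200, 50
grid = np.linspace(0, 1, G)
fourier = np.array([np.sin((k + 1) * np.pi * grid) for k in range(3)])

def rand_eps():
    # independent smooth Gaussian error curves with decaying Fourier coefficients
    coefs = rng.normal(size=(p, 3)) * np.array([1.0, 0.5, 0.25])
    return coefs @ fourier                            # (p, G)

def apply_B(eps, a=0.5):
    # a simple bounded integral operator applied componentwise:
    # (B eps_j)(u) = a * int_0^1 eps_j(v) dv  (constant in u)
    return a * eps.mean(axis=1)[:, None] * np.ones(G)

W, eps_prev = [], rand_eps()
for t in range(n + burn):
    eps_t = rand_eps()
    W.append(eps_t + apply_B(eps_prev))               # B_0 = identity, B_1 as above
    eps_prev = eps_t
W = np.array(W[burn:])                                # (n, p, G) stationary sample
print(W.shape)  # (200, 3, 21)
```

The burn-in period discards transient values so that the retained sample is (approximately) from the stationary distribution.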
Condition 3(i) controls the lower bound of the eigengaps, with larger values of $\alpha$ yielding tighter gaps between adjacent eigenvalues. See similar conditions in Hall and Horowitz (2007) and Kong et al. (2016). To simplify notation, we assume the same $\alpha$ across $j$; this condition can be relaxed by allowing $\alpha$ to depend on $j$, and our theoretical results can be generalized accordingly.
We next establish deviation bounds on the estimated eigenpairs, $\{\widehat{\lambda}_{jl}, \widehat{\psi}_{jl}(\cdot)\}$, and the sample autocovariances among estimated basis coefficients, $\{\widehat{\sigma}^{(h)}_{jklm}\}$, in elementwise $\ell_\infty$-norm.
Theorem 1. Let Conditions 1-3 hold, and let $d$ be a positive integer possibly depending on $(n, p)$. For $n \gtrsim \log p$, there exist positive constants $c_1$ and $c_2$, independent of $(n, p, d)$, such that the deviation bound (11), taken over the maximum across $j \in [p]$ and $l \in [d]$, holds with probability greater than $1 - c_1 p^{-c_2}$, where $\mathcal{M}^W_1$ is defined in (10).
Theorem 2. Let the conditions of Theorem 1 hold and let $h \geq 1$ be fixed. For $n \gtrsim d^{2\alpha + 2} (\mathcal{M}^W_1)^2 \log p$, there exist positive constants $c_3$ and $c_4$, independent of $(n, p, d)$, such that the deviation bound (12), taken over the maximum across $j, k \in [p]$ and $l, m \in [d]$, holds with probability greater than $1 - c_3 p^{-c_4}$, where $\mathcal{M}^W_1$ is defined in (10).
Remark 2.1. (i) The parameter $d$ in Theorems 1 and 2 can be understood as the truncated dimension of the infinite-dimensional functional objects under the expansion in (4). In general, $d$ can depend on $j$, say $d_j$; then the maxima in (11) and (12) are taken over $j, k \in [p]$, $l \in [d_j]$, $m \in [d_k]$, and the corresponding right-hand sides remain the same.
(ii) Compared with the normalized deviation bounds under the FPCA framework established in Guo and Qiao (2022), we obtain slower rates in (11) and (12) for decaying eigenvalues. Note that $\{\nu_{jl}(\cdot)\}_{l \geq 1}$ provides the unique basis with respect to which $X_{tj}(\cdot)$ can be expressed via the Karhunen-Loève expansion with uncorrelated coefficients; it gives the most rapidly convergent representation of $X_{tj}(\cdot)$ in the $L_2$ sense. By comparison, the expansion of $X_{tj}(\cdot)$ through $\{\psi_{jl}(\cdot)\}_{l \geq 1}$ in (4) results in a suboptimally convergent representation with correlated coefficients. From a theoretical viewpoint, whether the rates in (11) and (12) are minimax optimal is of interest and requires further investigation.

Block RMD estimation framework
As a result of Step 1, the estimation of sparse function-valued parameters is transformed into the block sparse estimation of parameter vectors/matrices in Step 2. To identify these parameters, we choose $\{\widehat{\eta}_{(t-h)k} : h \in [L], k \in [p]\}$ as vector-valued instrumental variables and construct autocovariance-based moment equations, which we illustrate using an example of SFLR in Section 3.1. We then formulate a general block RMD estimation method in Section 3.2 and study its theoretical properties in Section 3.3.

An illustrative example
We illustrate via the high-dimensional SFLR:
$$Y_t = \sum_{j=1}^{p} \int_{\mathcal{U}} X_{tj}(u)\, \beta_{0j}(u)\, du + \varepsilon_t, \qquad t \in [n], \qquad (13)$$
where $\{X_{tj}(\cdot)\}_{t \in [n], j \in [p]}$ satisfy model (1), $\{\varepsilon_t\}_{t \in [n]}$ are i.i.d. mean-zero random errors, and $\{X_{tj}(\cdot)\}$ and $\{\varepsilon_t\}$ are independent. Given observations $\{(W_t(\cdot), Y_t)\}_{t \in [n]}$, our goal is to estimate the $p$ functional coefficients $\beta_0(\cdot) = \{\beta_{01}(\cdot), \ldots, \beta_{0p}(\cdot)\}^{\mathrm{T}}$. To guarantee a feasible solution under high-dimensional scaling, we assume that $\beta_0(\cdot)$ is functional $s$-sparse, i.e. only $s$ components of $\beta_0(\cdot)$ are nonzero, with $s \ll p$.
As a result of the truncated expansion of $X_{tj}(\cdot)$ via (4) in Step 1, (13) can be rewritten as
$$Y_t = \sum_{j=1}^{p} \eta_{tj}^{\mathrm{T}} b_{0j} + r_t + \varepsilon_t, \qquad (14)$$
where $b_{0j} = \int_{\mathcal{U}} \psi_j(u)\, \beta_{0j}(u)\, du \in \mathbb{R}^{d_j}$ and $r_t = \sum_{j=1}^{p} \sum_{l=d_j+1}^{\infty} \eta_{tjl} \langle \psi_{jl}, \beta_{0j} \rangle$ is the truncation error. Given some prescribed positive integer $L$, in Step 2 we choose $\{\eta_{(t-h)k} : h \in [L], k \in [p]\}$ as vector-valued instrumental variables. Then $b_0 = (b_{01}^{\mathrm{T}}, \ldots, b_{0p}^{\mathrm{T}})^{\mathrm{T}} \in \mathbb{R}^{\sum_{j=1}^{p} d_j}$ can be identified by the moment equations
$$E\{\eta_{(t-h)k}\, \varepsilon_t\} = g_{hk}(b_0) + R_{hk} = 0, \qquad k \in [p],\ h \in [L],$$
where $g_{hk}(b_0) = E\{\eta_{(t-h)k} Y_t\} - \sum_{j=1}^{p} E\{\eta_{(t-h)k} \eta_{tj}^{\mathrm{T}}\} b_{0j}$ and the bias term $R_{hk} = -E\{\eta_{(t-h)k} r_t\}$. With $\{\widehat{\eta}_{tj}\}_{t \in [n], j \in [p]}$ and $\{\widehat{\psi}_j(\cdot)\}_{j \in [p]}$ obtained in Step 1, for any $b = (b_1^{\mathrm{T}}, \ldots, b_p^{\mathrm{T}})^{\mathrm{T}} \in \mathbb{R}^{\sum_{j=1}^{p} d_j}$, we define
$$\widehat{g}_{hk}(b) = \frac{1}{n - h} \sum_{t=h+1}^{n} \widehat{\eta}_{(t-h)k}\, Y_t - \sum_{j=1}^{p} \Big\{ \frac{1}{n - h} \sum_{t=h+1}^{n} \widehat{\eta}_{(t-h)k}\, \widehat{\eta}_{tj}^{\mathrm{T}} \Big\} b_j, \qquad (15)$$
which provides the empirical version of $g_{hk}(b) = E\{\eta_{(t-h)k} Y_t\} - \sum_{j=1}^{p} E\{\eta_{(t-h)k} \eta_{tj}^{\mathrm{T}}\} b_j$. Applying the block RMD estimation introduced in Section 3.2 below results in a block sparse estimator $\widehat{b} = (\widehat{b}_1^{\mathrm{T}}, \ldots, \widehat{b}_p^{\mathrm{T}})^{\mathrm{T}}$.
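For the case of a common truncated dimension across components, the empirical moment functions above admit a compact array implementation. The sketch below (our own illustrative construction with synthetic scores and no truncation error; `g_hat` is a hypothetical name, not the paper's code) builds $\widehat{g}_{hk}(b)$ and checks that it nearly vanishes at the truth:

```python
import numpy as np

# Sketch of the empirical SFLR moment functions g_hat_{hk}(b) with d_j = d.
def g_hat(eta_hat, Y, b, h, k):
    """eta_hat: (n, p, d) estimated scores; Y: (n,) responses; b: (p, d) blocks.
    Returns (n-h)^{-1} sum_t eta_{(t-h)k} (Y_t - sum_j eta_{tj}^T b_j)."""
    n = len(Y)
    resid = Y[h:] - np.einsum('tjd,jd->t', eta_hat[h:], b)
    return eta_hat[: n - h, k, :].T @ resid / (n - h)

rng = np.random.default_rng(4)
n, p, d = 500, 4, 2
eta = rng.normal(size=(n, p, d))
b0 = np.zeros((p, d)); b0[0] = [1.0, -0.5]            # block sparse truth
Y = np.einsum('tjd,jd->t', eta, b0) + 0.1 * rng.normal(size=n)
# At the truth, every empirical moment function is close to zero.
vals = [np.linalg.norm(g_hat(eta, Y, b0, h=1, k=k)) for k in range(p)]
```

In this toy design the scores are independent over time, so the lagged instruments are exactly uncorrelated with the error; in the paper's setting a bias term from the Step 1 truncation would additionally appear.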

A general estimation procedure
In this section, we present the proposed Step 2 in a general block RMD estimation framework. Note that Step 2 considers the block sparse estimation of some matrix-valued parameters, $\theta_0 = (\theta_{01}^{\mathrm{T}}, \ldots, \theta_{0p}^{\mathrm{T}})^{\mathrm{T}} \in \mathbb{R}^{\sum_{j=1}^{p} d_j \times \tilde{d}}$ with each $\theta_{0j} \in \mathbb{R}^{d_j \times \tilde{d}}$; for SFLR with a scalar response, $\tilde{d} = 1$. Given some prescribed positive integer $L$ and $q = pL$ target moment functions $\theta \mapsto g_i(\theta)$, mapping $\theta \in \mathbb{R}^{\sum_{j=1}^{p} d_j \times \tilde{d}}$ to $g_i(\theta) \in \mathbb{R}^{d_k \times \tilde{d}}$ with $i = (h - 1)p + k$ and $k \in [p]$ for $h \in [L]$, where both $p$ and $q$ are large, we assume that $\theta_0$ can be identified by the moment equations
$$g_i(\theta_0) + R_i = 0, \qquad i \in [q], \qquad (16)$$
where the $R_i$'s are formed by autocovariance-based truncation errors due to the finite approximations in Step 1.
We are interested in estimating the block sparse $\theta_0$ based on the empirical mappings $\theta \mapsto \widehat{g}_i(\theta)$ of $\theta \mapsto g_i(\theta)$ for $i \in [q]$. See Sections 3.1 and 4 for detailed expressions of $g_i(\cdot)$ and $\widehat{g}_i(\cdot)$ in some exemplified models.
It follows from (16) that
$$\widehat{g}_i(\theta_0) \approx 0, \qquad i \in [q]. \qquad (17)$$
Based on (17), we define the block RMD estimator $\widehat{\theta} = (\widehat{\theta}_1^{\mathrm{T}}, \ldots, \widehat{\theta}_p^{\mathrm{T}})^{\mathrm{T}} \in \mathbb{R}^{\sum_{j=1}^{p} d_j \times \tilde{d}}$ as a solution to the convex optimization problem
$$\widehat{\theta} \in \arg\min_{\theta} \sum_{j=1}^{p} \|\theta_j\|_{\mathrm{F}} \quad \text{subject to} \quad \max_{i \in [q]} \|\widehat{g}_i(\theta)\|_{\mathrm{F}} \leq \gamma_n, \qquad (18)$$
where $\gamma_n \geq 0$ is a regularization parameter. The group information is encoded in the objective function, which forces the elements of each $\widehat{\theta}_j$ to be either all zero or all nonzero, thus producing the block sparsity in $\widehat{\theta}$. It is worth noting that, without the bias terms $R_i$ in (16), our proposed block RMD estimation framework can be seen as a blockwise generalization of the RMD estimation of Belloni et al. (2018), with $|\cdot|$ replaced by $\|\cdot\|_{\mathrm{F}}$. To solve the large-scale convex optimization problem in (18), we use the R package CVXR (Fu et al., 2020), which is easy to implement and converges fast. In Sections 4.1, 4.2 and 4.3, we illustrate the proposed autocovariance-based block RMD estimation framework using the examples of SFLR, FFLR and VFAR, respectively.
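To see the structure of (18) concretely, note that when $d_j = \tilde{d} = 1$ every block is a scalar, the Frobenius norms reduce to absolute values, and (18) becomes a Dantzig-selector-type linear program. The sketch below (toy data; `rmd_scalar` is our own illustration via scipy, not the paper's CVXR implementation) solves this special case:

```python
import numpy as np
from scipy.optimize import linprog

# Special case of (18) with scalar blocks:
#   min ||theta||_1  subject to  ||G theta + g0||_inf <= gamma,
# written as an LP in the variables (theta, u) with |theta_j| <= u_j.
def rmd_scalar(G, g0, gamma):
    q, p = G.shape
    c = np.concatenate([np.zeros(p), np.ones(p)])      # minimize sum of u_j
    A_ub = np.vstack([
        np.hstack([G, np.zeros((q, p))]),              #  G theta       <= gamma - g0
        np.hstack([-G, np.zeros((q, p))]),             # -G theta       <= gamma + g0
        np.hstack([np.eye(p), -np.eye(p)]),            #  theta_j - u_j <= 0
        np.hstack([-np.eye(p), -np.eye(p)]),           # -theta_j - u_j <= 0
    ])
    b_ub = np.concatenate([gamma - g0, gamma + g0, np.zeros(2 * p)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (2 * p))
    return res.x[:p]

rng = np.random.default_rng(5)
q, p = 40, 8
theta0 = np.zeros(p); theta0[0] = 2.0                  # sparse truth
G = rng.normal(size=(q, p))
g0 = -G @ theta0 + 0.005 * rng.normal(size=q)          # so G theta0 + g0 is small
theta_hat = rmd_scalar(G, g0, gamma=0.05)
```

For matrix-valued blocks the constraint involves Frobenius norms, so the problem is a second-order cone program rather than an LP, which is why a general conic solver such as CVXR is used in the paper.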

Theoretical properties
For a block matrix $B = (B_{ij})_{i \in [N_1], j \in [N_2]}$ with blocks $B_{ij} \in \mathbb{R}^{d \times \tilde{d}}$, we define its block $\ell_1$- and $\ell_\infty$-norms, $\|B\|^{(d, \tilde{d})}_1$ and $\|B\|^{(d, \tilde{d})}_\infty$, via the Frobenius norms of its blocks. To simplify notation, in this section and in the theoretical analysis of Section 4 we assume the same truncated dimension $d_j = d$ across $j \in [p]$; our theoretical results extend naturally to the more general setting where the $d_j$'s differ.
Let $g(\theta) = \{g_1(\theta)^{\mathrm{T}}, \ldots, g_q(\theta)^{\mathrm{T}}\}^{\mathrm{T}}$ and $R = (R_1^{\mathrm{T}}, \ldots, R_q^{\mathrm{T}})^{\mathrm{T}} \in \mathbb{R}^{q d \times \tilde{d}}$. We focus on the case in which the moment function $\theta \mapsto g(\theta)$, mapping from $\mathbb{R}^{p d \times \tilde{d}}$ to $\mathbb{R}^{q d \times \tilde{d}}$, is linear with respect to $\theta$, taking the form $g(\theta) = G\theta + g(0)$ for some $G \in \mathbb{R}^{q d \times p d}$. Together with (16), this implies that $G\theta_0 + g(0) + R = 0$, which can be easily verified for, e.g., the SFLR, FFLR and VFAR models considered in Section 4. We now reformulate the optimization task in (18) as
$$\widehat{\theta} \in \arg\min_{\theta} \sum_{j=1}^{p} \|\theta_j\|_{\mathrm{F}} \quad \text{subject to} \quad \max_{i \in [q]} \|\widehat{g}_i(\theta)\|_{\mathrm{F}} \leq \gamma_n, \qquad (20)$$
where $\widehat{g}(\theta) = \widehat{G}\theta + \widehat{g}(0)$ is the empirical version of $g(\theta)$. It is worth noting that $\theta_0$ is block $s$-sparse with support $S = \{j \in [p] : \|\theta_{0j}\|_{\mathrm{F}} \neq 0\}$ of cardinality $s = |S|$.
Before presenting properties of the block RMD estimator p θ, we impose some high-level regularity conditions.
Conditions 4(i) and (ii) together ensure that the empirical moment functions are well concentrated around the target moment functions. Using the non-asymptotic results derived in Section 2.2, we can easily specify the concentration bounds in Condition 4(i) for SFLR, FFLR and VFAR. With further smoothness conditions imposed on the coefficient functions, Condition 4(ii) can also be verified. Condition 4(iii) indicates that $\theta_0$ is feasible for the optimization problem (20) with high probability, in which case a solution $\widehat{\theta}$ of (20) exists and satisfies $\|\widehat{\theta}\|^{(d, \tilde{d})}_1 \leq \|\theta_0\|^{(d, \tilde{d})}_1$. The non-block version of this property typically plays a crucial role in tackling high-dimensional models in the literature.
Let δ " θ ´θ0 .We define a block 1 -sensitivity coefficient where Condition 4(iii) as justified in Lemma 1 in Appendix B, the lower bound of κpθ 0 q is useful to establish the error bound for } p δ} pd, dq 1 . See also Gautier and Rose (2019) for non-block q -sensitivity quantities to handle high-dimensional instruments.We then need Condition 5 below to determine such lower bound.
Note that $G$ can be divided into $q \times p$ blocks of size $d \times d$. Let $G_{J,M}$ be the submatrix of $G$ consisting of all the $(j, k)$-blocks with $j \in J \subset [q]$ and $k \in M \subset [p]$. For an integer $m \geq s$, define $\sigma_{\min}(m, G)$ and $\sigma_{\max}(m, G)$ in terms of the smallest and largest singular values, $\sigma_{\min}(G_{J,M})$ and $\sigma_{\max}(G_{J,M})$, over such submatrices.
In Condition 5, the quantity $\mu$ serves as a key factor in determining the lower bound of $\kappa(\theta_0)$, as justified in Lemma 4 in Appendix B. When $\mu$ is bounded away from zero, we have a strongly identified model; when $\mu \to 0$, it corresponds to the scenario with weak instruments. See also Belloni et al. (2018) for similar conditions.
Theorem 3. Let Conditions 4-5 hold. If $\|\theta_0\|^{(d, \tilde{d})}_1 \leq K$ for some $K > 0$ and the regularization parameter satisfies $\gamma_n \lesssim (K + 1)\epsilon_{n1} + \epsilon_{n2}$, then with probability at least $1 - (\delta_{n1} + \delta_{n2})$, the block RMD estimator $\widehat{\theta}$ satisfies the error bound in (22).
Remark 3.1. (i) The error bound in (22) exhibits the familiar variance-bias tradeoff commonly encountered in nonparametric statistics, suggesting that we carefully select the truncated dimension $d$ so as to balance the variance and bias terms for optimal estimation.
(ii) With commonly imposed smoothness conditions on the functional coefficients, it is easy to verify that $K = o(s)$ for the SFLR, FFLR and VFAR models in Section 4. (iii) The entries of $G$ are covariances of basis coefficients, which are scaled by factors of order $\lambda_{jl}^{1/2} \lambda_{km}^{1/2} \to 0$ as $l, m \to \infty$. Consider a general cross-covariance matrix $G = E(x y^{\mathrm{T}}) \in \mathbb{R}^{q d \times p d}$ with entries decaying to zero as $d \to \infty$, where $x = (x_1, \ldots, x_{qd})^{\mathrm{T}}$ with $E(x) = 0$ and $y = (y_1, \ldots, y_{pd})^{\mathrm{T}}$ with $E(y) = 0$. It is then more sensible to impose Condition 5 on the normalized version $\widetilde{G} = D_x G D_y$ instead of $G$ itself, where $D_x = \mathrm{diag}\{\mathrm{Var}(x_1)^{-1/2}, \ldots, \mathrm{Var}(x_{qd})^{-1/2}\}$ and $D_y = \mathrm{diag}\{\mathrm{Var}(y_1)^{-1/2}, \ldots, \mathrm{Var}(y_{pd})^{-1/2}\}$. For the three exemplified models, $D_x$ and $D_y$ are formed by $\{\lambda_{jl}^{-1/2} : j \in [p], l \in [d]\}$.
Remark 3.1(iii) motivates us to present the following proposition that will be used in the theoretical analysis of associated estimators for SFLR, FFLR and VFAR in Section 4.
Proposition 1. Suppose that all conditions in Theorem 3 hold, except that Condition 5 holds for $\widetilde{G}$; then with probability at least $1 - (\delta_{n1} + \delta_{n2})$, the block RMD estimator $\widehat{\theta}$ satisfies an error bound analogous to (22).

Applications
In this section, we illustrate the proposed estimation procedure with three concrete models, namely SFLR, FFLR and VFAR.

High-dimensional SFLR
Consider the high-dimensional SFLR in (13). We first perform autocovariance-based dimension reduction on $\{W_{tj}(\cdot)\}_{t \in [n]}$ for each $j \in [p]$. Following Section 3.1 and the optimization framework in (18), we then develop the block RMD estimator $\widehat{b}$ as a solution to the constrained optimization problem
$$\widehat{b} \in \arg\min_{b} \sum_{j=1}^{p} \|b_j\| \quad \text{subject to} \quad \max_{h \in [L],\, k \in [p]} \|\widehat{g}_{hk}(b)\| \leq \gamma_n, \qquad (23)$$
where $\gamma_n \geq 0$ is a regularization parameter and $\widehat{g}_{hk}(b)$ is defined in (15). Given that the recovery of functional sparsity in $\beta_0(\cdot)$ is equivalent to estimating the block sparsity in $b_0$, in Step 3 we estimate the functional sparse coefficients by
$$\widehat{\beta}_j(\cdot) = \widehat{\psi}_j(\cdot)^{\mathrm{T}}\, \widehat{b}_j, \qquad j \in [p]. \qquad (24)$$
We next present the convergence analysis of $\{\widehat{\beta}_j(\cdot)\}_{j \in [p]}$. To simplify notation, we assume the same truncated dimension $d_j = d$ across $j \in [p]$. We rewrite (14) in the form of (19), where $g = (g_{11}^{\mathrm{T}}, \ldots, g_{1p}^{\mathrm{T}}, \ldots, g_{L1}^{\mathrm{T}}, \ldots, g_{Lp}^{\mathrm{T}})^{\mathrm{T}}$, $R = (R_{11}^{\mathrm{T}}, \ldots, R_{1p}^{\mathrm{T}}, \ldots, R_{L1}^{\mathrm{T}}, \ldots, R_{Lp}^{\mathrm{T}})^{\mathrm{T}}$, and $G = (G_{ij}) \in \mathbb{R}^{pLd \times pd}$ whose $(i, j)$-th block is $G_{ij} = E\{\eta_{(t-h)k} \eta_{tj}^{\mathrm{T}}\} \in \mathbb{R}^{d \times d}$ with $i = (h - 1)p + k$ and $k \in [p]$ for $h \in [L]$. Applying Theorem 2 and Proposition 3 in Appendix A to $\widehat{G}$ and $\widehat{g}(0)$, respectively, we can verify Condition 4(i) with the choice $\epsilon_{n1} \asymp \mathcal{M}_{W,Y}\, d^{\alpha + 2} (n^{-1} \log p)^{1/2}$, where $\mathcal{M}_{W,Y}$ is specified in Proposition 3. Before presenting the main theorem, we list the regularity conditions below.
Condition 6. (i) For each $j \in S = \{j \in [p] : \|\beta_{0j}\| \neq 0\}$, $\beta_{0j}(\cdot) = \sum_{l=1}^{\infty} a_{jl} \psi_{jl}(\cdot)$, and there exists some positive constant $\tau > \alpha + 1/2$ such that $|a_{jl}| \lesssim l^{-\tau}$ for $l \geq 1$; (ii) let $\widetilde{G} = (\widetilde{G}_{ij})$ be the normalized version of $G = (G_{ij})$, obtained by replacing each $G_{ij}$ by $\widetilde{G}_{ij} = E\{D_k \eta_{(t-h)k} \eta_{tj}^{\mathrm{T}} D_j\}$, $i = (h - 1)p + k$, $k \in [p]$ for $h \in [L]$ and $j \in [p]$, where $D_j = \mathrm{diag}(\lambda_{j1}^{-1/2}, \ldots, \lambda_{jd}^{-1/2})$. There exist universal constants $c_6 > 0$ and $\mu > 0$ such that $\sigma_{\max}(m, \widetilde{G}) \geq c_6$ and $\sigma_{\min}(m, \widetilde{G}) / \sigma_{\max}(m, \widetilde{G}) \geq \mu$ for $m = 16 s / \mu^2$.
Condition 6(i) restricts each component of $\{\beta_{0j}(\cdot) : j \in S\}$ through its expansion in the basis $\{\psi_{jl}(\cdot)\}_{l \geq 1}$. The parameter $\tau$ determines the decay rate of the basis coefficients and hence controls the level of smoothness, with larger values yielding smoother functions in $\{\beta_{0j}(\cdot) : j \in S\}$. See similar conditions in Hall and Horowitz (2007) and Kong et al. (2016). Noting that the components of $G$ decay to zero as $d$ grows to infinity, we impose Condition 6(ii) on $\widetilde{G}$, which can be viewed as the normalized counterpart of Condition 5 for SFLR.
Applying Proposition 1 and Theorem 1 yields the convergence rate of the SFLR estimate $\widehat{\beta}(\cdot) = \{\widehat{\beta}_1(\cdot), \ldots, \widehat{\beta}_p(\cdot)\}^{\mathrm{T}}$ under the functional $\ell_1$-norm, stated in (25) in the following theorem. (i) To balance the variance and bias terms in (25) for optimal estimation, we can choose the optimal truncated dimension $d \asymp (\mathcal{M}^2_{W,Y}\, n^{-1} \log p)^{-1/(2\tau + 2\alpha + 3)}$. (ii) Since our convergence analysis relies on (12) rather than the normalized deviation bounds in Guo and Qiao (2022), the rate in (25) is slightly slower than that in Fang et al. (2022) by a multiplicative factor $d^{\alpha/2}$. For univariate functional linear regression, we similarly observe a slower rate for the autocovariance-based generalized method-of-moments estimator (Chen et al., 2022) compared to the covariance-based least squares estimator (Hall and Horowitz, 2007).
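The Step 3 re-transformation in (24) is a simple basis expansion, under which block sparsity in $\widehat{b}$ translates directly into functional sparsity in $\widehat{\beta}(\cdot)$. A minimal sketch with our own toy inputs (`step3` is an illustrative name, not the paper's code):

```python
import numpy as np

# Sketch of Step 3 in (24): a zero block b_hat_j yields an identically zero
# functional estimate beta_hat_j(.) = psi_hat_j(.)^T b_hat_j.
def step3(b_hat, psi_hat):
    """b_hat: list over j of (d_j,) vectors; psi_hat: list over j of (G, d_j)
    arrays of estimated basis functions on a grid. Returns a (p, G) array."""
    return np.array([P @ b for b, P in zip(b_hat, psi_hat)])

grid = np.linspace(0, 1, 11)
psi_hat = [np.column_stack([np.sin(np.pi * grid), np.cos(np.pi * grid)])] * 2
b_hat = [np.array([1.0, 0.0]), np.zeros(2)]            # second block is zero
beta_hat = step3(b_hat, psi_hat)
# beta_hat[1] is identically zero: block sparsity gives functional sparsity.
```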

High-dimensional FFLR
Consider the high-dimensional FFLR in the form of (26), where tX t p¨qu tPrns satisfy model (1) and are independent of the i.i.d. functional errors, and where Y t p¨q and e Y t p¨q correspond to the dynamic signal and white noise components of the observed response, respectively. Then Y t p¨q can be approximated under the autocovariance-based expansion in the sense of (4), and our subsequent analysis still follows.
For each j P rps, we expand X tj p¨q according to (4) truncated at d j . Some specific calculations lead to the representation of (26) as (27), where B 0j " ş U ˆV ψ j puqβ 0j pu, vqφpvq T dudv P R d j ˆd and r t " pr t1 , . . ., r t dq T is the truncation error with r tm " ř p j"1 ř 8 l"d j `1 η tjl xxψ jl , β 0j y, φ m y for m P r ds. Let B 0 " pB T 01 , . . ., B T 0p q T P R ř p j"1 d j ˆd . We choose tη pt´hqk : h P rLs, k P rpsu as vector-valued instrumental variables, which are assumed to be uncorrelated with the random error ε t in (27). Within the framework of (16), we assume that B 0 is the unique solution to the following moment equations: 0 " Etη pt´hqk ε T t u " g hk pB 0 q `Rhk , h P rLs , k P rps , where g hk pB 0 q " Etη pt´hqk ζ T t u ´řp j"1 Etη pt´hqk η T tj B 0j u and R hk " ´Etη pt´hqk r T t u. Given the recovery equivalence between functional sparsity in β 0 and the block sparsity in B 0 , we aim to estimate the block sparse matrix B 0 using the empirical versions B Þ Ñ p g hk pBq for h P rLs and k P rps, where t p ψ j puqu jPrps and p φpvq " t φ1 pvq, . . ., φ dpvqu T are obtained in Step 1.
In the following, we investigate the convergence property of t βj p¨, ¨qu jPrps in (29). To simplify the notation, we assume the same truncated dimension d j " d across j P rps. We first rewrite (28) in the form of (19) and apply Theorem 2 and Proposition 2 in Appendix A on p G and p gp0q to verify Condition 4(i) with the choice of n1 -M W,Y d α_ α`2 pn ´1 log pq 1{2 , where M W,Y is specified in Proposition 2. In a similar fashion to α, the parameter α as specified in Condition 10 in Appendix A determines the tightness of eigengaps of the covariance function of tY t p¨qu. We then impose the following smoothness condition on nonzero coefficient functions.
We are now ready to present the convergence rate of the FFLR estimate p βp¨, ¨q " t β1 p¨, ¨q, . . ., βp p¨, ¨qu T under functional 1 norm in Theorem 5.

High-dimensional VFAR
The high-dimensional VFAR of a fixed lag order H, namely VFAR(H), takes the form of (31), where tX t p¨qu satisfy model (1), the errors ε t p¨q " tε t1 p¨q, . . ., ε tp p¨qu T are i.i.d. sampled from a p-vector of mean-zero random functions, independent of X t´1 p¨q, X t´2 p¨q, . . ., and A ph 1 q 0 " tA ph 1 q 0,jj 1 p¨, ¨qu j,j 1 Prps is the unknown functional transition matrix at lag h 1 . In the special case H " 1 with A 0 " A p1q 0 , Theorem 3.1 of Bosq (2000) ensures the stationarity of tX t p¨qu if there exists an integer l 0 such that sup }f }ď1 }A l 0 0 pf q} ă 1 for f P H p . According to Guo and Qiao (2022), any VFAR(H) model can be reformulated as a VFAR(1) model, and hence it is not hard to adjust the stationarity condition for the general case H ą 1. To make a feasible fit to (31) under a high-dimensional regime based on observed curves tW t p¨qu tPrns , we assume tA ph 1 q 0 u h 1 PrHs is rowwise functional s-sparse with s " max jPrps s j ! p. To be specific, for the j-th row of components in tA ph 1 q 0 u, we denote the set of nonzero functions by S j " tpj 1 , h 1 q P rps ˆrHs : }A ph 1 q 0,jj 1 } S ‰ 0u and its cardinality by s j " |S j | for j P rps.
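In the truncated, finite-dimensional setting, the VFAR(H)-to-VFAR(1) reformulation is the familiar VAR companion form, and a sufficient stationarity check reduces to the spectral radius of the companion matrix. A sketch under that finite-dimensional assumption (helper names are our own):

```python
import numpy as np

def var_companion(A_list):
    # Stack the lag matrices A^(1), ..., A^(H) (each q x q, with q = p*d
    # after truncation) into the companion matrix of the equivalent VAR(1).
    H = len(A_list)
    q = A_list[0].shape[0]
    comp = np.zeros((H * q, H * q))
    comp[:q, :] = np.hstack(A_list)       # first block row: the lag matrices
    comp[q:, :-q] = np.eye((H - 1) * q)   # shift structure below
    return comp

def is_stationary(A_list):
    # sufficient finite-dimensional check: spectral radius < 1
    return np.max(np.abs(np.linalg.eigvals(var_companion(A_list)))) < 1
```

This mirrors the scalar VAR(H) case; the operator-norm condition sup over the unit ball in the text is the infinite-dimensional analogue.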
For each j P rps, we approximate X tj p¨q based on the expansion in (4) truncated at d j . With some specific calculations, model (31) can be rewritten row by row as in (32), where Ω ph 1 q 0,jj 1 " ş U 2 ψ j 1 puqA ph 1 q 0,jj 1 pu, vqψ j pvq T dudv P R d j 1 ˆdj and r tj " pr tj1 , . . ., r tjd j q T is the truncation error with each r tjm " ř H h 1 "1 ř p j 1 "1 ř 8 l"d j 1 `1 η pt´h 1 qj 1 l xxψ j 1 l , A ph 1 q 0,jj 1 y, ψ jm y for m P rd j s. Let Ω 0j " tpΩ p1q 0,j1 q T , . . ., pΩ p1q 0,jp q T , . . ., pΩ pHq 0,j1 q T , . . ., pΩ pHq 0,jp q T u T P R H ř p j 1 "1 d j 1 ˆdj . We choose tη pt´H´hqk : h P rLs, k P rpsu as vector-valued instrumental variables, which are assumed to be uncorrelated with the random error ε tj in (32). Within the framework of (16), we assume that Ω 0j is the unique solution to the following moment equations: 0 " Etη pt´H´hqk ε T tj u " g j,hk pΩ 0j q `Rj,hk , h P rLs , k P rps , where g j,hk pΩ 0j q " Etη pt´H´hqk η T tj u ´ř H h 1 "1 ř p j 1 "1 Etη pt´H´hqk η T pt´h 1 qj 1 Ω ph 1 q 0,jj 1 u and R j,hk " ´Etη pt´H´hqk r T tj u. Given that estimating the functional sparsity in the j-th row of tA ph 1 q 0 u h 1 PrHs is equivalent to estimating the block sparsity in Ω 0j for each j, our goal is to estimate the block sparse matrix Ω 0j using the empirical versions Ω j Þ Ñ p g j,hk pΩ j q for h P rLs and k P rps, where p g j,hk pΩ j q is the sample analogue with normalizing factor 1{pn ´H ´hq, and tp η tj u tPrns,jPrps are obtained in Step 1.
Step 2 follows (18) to formulate the block RMD estimator p Ω j by solving the following optimization task, where γ nj ě 0 is a regularization parameter.
Step 3 estimates functional transition matrices by Âph 1 q jj 1 pu, vq " p ψ j 1 puq T p Ω ph 1 q jj 1 p ψ j pvq , pu, vq P U 2 , j, j 1 P rps , h 1 P rHs , where t p ψ j p¨qu jPrps are obtained in Step 1.
We next present the convergence analysis of t Âph 1 q jj 1 p¨, ¨q : j, j 1 P rps, h 1 P rHsu. To simplify the notation, we assume the same truncated dimension d j " d across j P rps. For each j, we first express (33) as below: g j pΩ 0j q `Rj " G j Ω 0j `gj p0q `Rj " 0 , where g j " pg T j,11 , . . ., g T j,1p , . . ., g T j,L1 , . . ., g T j,Lp q T , R j " pR T j,11 , . . ., R T j,1p , . . ., R T j,L1 , . . ., R T j,Lp q T and G j " pG j,ii 1 q P R pLdˆpHd , whose pi, i 1 q-th block is G j,ii 1 " Etη pt´H´hqk η T pt´h 1 qj 1 u P R dˆd with i " ph ´1qp `k, k P rps for h P rLs and i 1 " ph 1 ´1qp `j1 , j 1 P rps for h 1 P rHs. Applying Theorem 2 on p G j and p g j p0q, we can verify Condition 4(i) with the choice of n1 -M W 1 d α`2 pn ´1 log pq 1{2 . Similar to Condition 6 for SFLR, we then give the following regularity conditions.

Condition 8. (i) For each j P rps and pj 1 , h 1 q P S j , A ph 1 q 0,jj 1 pu, vq " ř 8 l,m"1 a ph 1 q jj 1 lm ψ j 1 m puqψ jl pvq and there exists some constant τ ą α `1{2 such that |a ph 1 q jj 1 lm | À pl `mq ´τ ´1{2 for l, m ě 1; (ii) For each j P rps, let r G j " p r G j,ii 1 q be the normalized version of G j " pG j,ii 1 q by replacing each G j,ii 1 by r G j,ii 1 " EtD k η pt´H´hqk η T pt´h 1 qj 1 D j 1 u for i " ph ´1qp `k and i 1 " ph 1 ´1qp `j1 with k, j 1 P rps, h P rLs and h 1 P rHs, where D j " diagpλ ´1{2 j1 , . . ., λ ´1{2 jd q. There exist universal constants cj ą 0 and µ j ą 0 such that σ max pm, r G j q ě cj and σ min pm, r G j q{σ max pm, r G j q ě µ j for m " 16s j {µ 2 j .
We finally establish the convergence rate of the VFAR estimate t Âph 1 q jj 1 p¨, ¨qu j,j 1 Prps,h 1 PrHs in the sense of functional matrix 8 norm as follows.
Theorem 6. Suppose that Conditions 1-3 and 8 hold. If the regularization parameters satisfy γ nj - s j td α`2 M W 1 pn ´1 log pq 1{2 `d´τ`1{2 u for j P rps and µ " min jPrps µ j , then the estimate t Âph 1 q jj 1 p¨, ¨qu satisfies (34).

Remark 4.3. Similar to Remarks 4.1(ii) and 4.2(ii) for SFLR and FFLR respectively, the rate for t Âph 1 q jj 1 p¨, ¨qu in (34) is slightly slower than that for the covariance-based estimator in Guo and Qiao (2022) by the same factor d α{2 .
5 Empirical studies

Simulation study
In this section, we conduct a number of simulations to evaluate the finite-sample performance of the proposed autocovariance-based estimators for SFLR, FFLR and VFAR models.
VFAR: We generate block sparse Ω with 5% or 10% nonzero blocks for p " 80 or p " 40, respectively. Specifically, for the j-th block row, we set the diagonal block Ω jj " diagp0.60, 0.59, 0.58, 0.3, 0.2, 6 ´2, . . ., 25 ´2q and randomly choose one off-diagonal block equal to 0.4Ω jj and two off-diagonal blocks equal to 0.1Ω jj . Such a block sparse design on Ω guarantees the stationarity of the generated VFAR(1) process. It is worth noting that estimating VFAR(1) is a very high-dimensional task, since, e.g., even under the most 'low-dimensional' setting with p " 40, n " 400 and truncated dimension d " 3, one needs to estimate p40 ˆ3q 2 " 14,400 parameters based on only 400 observations. The p-vector of functional covariates tX t p¨qu tPrns for SFLR and FFLR below is generated in the same way as that for VFAR.
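A comparable block sparse design can be sketched as follows. This is a simplified illustration of the scheme just described, not the paper's exact generator: we use block size d = 5 and drop the 6 ´2 , . . ., 25 ´2 tail of the diagonal block, and the function name is our own. By a Gershgorin-type bound, each row of the resulting matrix has absolute row sum at most 0.6 ˆp1 `0.4 `0.1 `0.1q " 0.96 ă 1, so the implied VFAR(1) is stationary.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_block_sparse_omega(p=40, d=5):
    """Block sparse transition matrix: per block row, one diagonal block
    plus one 0.4x and two 0.1x off-diagonal copies at random positions."""
    diag_block = np.diag([0.60, 0.59, 0.58, 0.3, 0.2][:d])
    Omega = np.zeros((p * d, p * d))
    for j in range(p):
        Omega[j*d:(j+1)*d, j*d:(j+1)*d] = diag_block
        others = rng.choice([k for k in range(p) if k != j],
                            size=3, replace=False)
        for w, k in zip([0.4, 0.1, 0.1], others):
            Omega[j*d:(j+1)*d, k*d:(k+1)*d] = w * diag_block
    return Omega
```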
Implementing our proposed autocovariance-based learning framework (AUTO) requires choosing L and the d j 's. As our simulation results suggest that the estimators are not sensitive to the choice of L, we set L " 3 in simulations. To select d j , we take the standard approach of selecting the largest d j eigenvalues of p K jj in (6) such that the cumulative percentage of selected eigenvalues exceeds 90%. We select the regularization parameter by computing, on a validation sample, the squared error between observed and fitted values, e.g. ř t ř j } p η tj ´řk p p Ω jk q T p η pt´1qk } 2 for VFAR, and choosing the one with the smallest error.
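The 90% cumulative-eigenvalue rule for selecting d j can be written as a one-liner. A minimal sketch (function name is our own):

```python
import numpy as np

def select_truncation(eigenvalues, threshold=0.90):
    """Smallest d such that the top-d eigenvalues account for at least
    `threshold` of the total eigenvalue mass."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cum = np.cumsum(ev) / ev.sum()
    return int(np.searchsorted(cum, threshold) + 1)
```

For example, eigenvalues (5, 3, 1, 1) give cumulative proportions (0.5, 0.8, 0.9, 1.0), so the rule selects d " 3.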
We compare AUTO with the standard covariance-based estimation framework (COV), which proceeds in the following three steps. The first step performs FPCA on tW tj p¨qu tPrns for each j P rps, where the truncated dimension is selected in the same way as d j . Estimating the SFLR and FFLR models is thus transformed into fitting multiple linear regressions with a univariate response (Kong et al., 2016) and a multivariate response (Fang et al., 2022), respectively, and the VFAR estimation is converted to VAR estimation (Guo and Qiao, 2022). The second step minimizes the covariance-based criterion, essentially the least squares criterion with the addition of a group lasso type penalty. Such a criterion can be optimized using an efficient block fast iterative shrinkage-thresholding algorithm developed in Guo and Qiao (2022), which converges faster than the commonly adopted block coordinate descent algorithm (Fan et al., 2015). The third step recovers functional sparse estimates using the estimated eigenfunctions.
We examine the performance of COV and AUTO for the three models in terms of relative estimation errors, i.e. } p A ´A} F {}A} F for VFAR, p ř p j"1 } βj ´β0j } 2 q 1{2 {p ř p j"1 }β 0j } 2 q 1{2 for SFLR and the analogously defined ratio for FFLR. We ran each simulation 100 times. Figure 1 displays boxplots of relative estimation errors for the three models. Several conclusions can be drawn from Figure 1. First, AUTO significantly outperforms COV for all three models under every scenario we consider. Second, as discussed in Section 2.1, AUTO provides consistent estimates, while the consistency of COV estimates is jeopardized by the white noise contamination. This is demonstrated by our empirical results: AUTO provides more substantially improved estimates over COV as n increases from 100 to 400. Third, the performance of AUTO slightly deteriorates as p increases from 40 to 80, providing empirical evidence to support that the rates in (25), (30) and (34) for the SFLR, FFLR and VFAR models, respectively, all depend on the plog pq 1{2 term.
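The error metrics above are straightforward to compute on discretised estimates. A minimal sketch (function names are our own; for fully discretised coefficients the SFLR ratio coincides with the Frobenius-norm ratio up to the grid quadrature weight, which cancels):

```python
import numpy as np

def rel_error_frob(A_hat, A):
    # ||A_hat - A||_F / ||A||_F, used for VFAR
    return np.linalg.norm(A_hat - A) / np.linalg.norm(A)

def rel_error_functional(beta_hat, beta):
    # (sum_j ||bhat_j - b_j||^2)^{1/2} / (sum_j ||b_j||^2)^{1/2}, used for
    # SFLR; rows index j, columns a discretisation of each coefficient
    num = np.sqrt(np.sum((beta_hat - beta) ** 2))
    den = np.sqrt(np.sum(beta ** 2))
    return num / den
```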

Real data analysis
In this section, we illustrate our developed methodology using a public financial dataset, obtained from the WRDS database, which consists of high-frequency observations of prices for the S&P 100 index and its component stocks (list available in Table 2 in Appendix C; we removed several stocks for which the data were not available, so that p " 98 in our analysis) in 2017, comprising 251 trading days. We obtain one-minute resolution prices by using the last transaction price in each one-minute interval after removing the outliers, and hence convert the trading period (9:30-16:00) to minutes r0, 390s.
We construct cumulative intraday return (CIDR) trajectories (Horváth et al., 2014), in percentage, by W tj pu k q " 100rlogtP tj pu k qu ´logtP tj pu 1 qus, where P tj pu k q pt P rns, j P rps, k P rN sq denotes the price of the j-th stock at the k-th minute after the opening time on the t-th trading day. We work with mildly smoothed CIDRs obtained by expanding the data with respect to a 45-dimensional B-spline basis. The CIDR curves always start from zero and have nearly the same shape as the original price curves, but make the stationarity assumption more plausible. We performed the functional KPSS test (Horváth et al., 2014) on the CIDR curves for each stock using the R package ftsa (Shang, 2013). The p-values are all larger than 1%, indicating no overwhelming evidence against stationarity.
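The CIDR transformation itself is a one-line log-differencing against the opening price. A minimal sketch for a single stock-day (function name is our own):

```python
import numpy as np

def cidr(prices):
    """Cumulative intraday return in percent:
    W(u_k) = 100 * (log P(u_k) - log P(u_1)); the curve starts at zero."""
    logp = np.log(np.asarray(prices, dtype=float))
    return 100.0 * (logp - logp[0])
```

Because every curve is anchored at zero at the open, level shifts across days are removed, which is what makes the stationarity assumption more plausible than for raw price curves.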
Our target is to predict the intraday return of the S&P 100 index based on observed CIDR trajectories of the component stocks, W tj puq, u P U " r0, N s, up to time N, where, e.g., N " 360 corresponds to 30 minutes prior to the closing time of the trading day. With this in mind, we construct a sparse SFLR model with erroneous functional covariates as in (35), where Y t is the intraday return of the S&P 100 index on the t-th trading day, and X tj p¨q and e tj p¨q represent the signal and noise components in W tj p¨q, respectively. We split the whole dataset into three subsets: training, validation and test sets consisting of the first 171, the subsequent 40 and the last 40 observations, respectively. We apply the validation set approach to select the regularization parameters for AUTO and COV, based on which we estimate the sparse functional coefficients in (35) and calculate the mean squared prediction errors (MSPEs) on the test set. For comparison, we also implement the autocovariance-based generalized method-of-moments (AGMM) approach (Chen et al., 2022) and the covariance-based least squares method (CLS) (Hall and Horowitz, 2007) to fit the univariate version of (35) for each component stock, among which we choose the best models leading to the lowest test MSPEs. Finally, we include the null model (Mean) using the mean of the training responses to predict the test responses.
The resulting test MSPEs for different values of N and all comparison approaches are presented in Table 1. We observe a few apparent patterns. First, in all scenarios we consider, AUTO provides the best predictive performance, and the autocovariance-based methods are superior to their covariance-based counterparts. Second, the predictive accuracy of the functional regression type methods improves as N approaches 390, since larger N brings more recent information into the covariates. Third, AUTO and COV significantly outperform AGMM and CLS, while Mean gives the worst results. This indicates that using multiple selected functional covariates from the trading histories indeed improves the prediction results.
In analogy to (10), we can define the functional cross-spectral stability measure of all k 1 -dimensional subsets of tW t p¨qu and k 2 -dimensional subsets of tY t p¨qu for k 1 P rps and k 2 P rps. For scalar time series tZ t u, the stability measure degenerates to its non-functional counterpart, which is equivalent to that proposed in Basu and Michailidis (2015). The stability measure of all k-dimensional subsets of tZ t u, i.e. M Z k for k P rps, can be defined similarly according to (10). For each k P rps, we represent Y tk p¨q " ř 8 m"1 ζ tkm φ km p¨q under the Karhunen-Loève expansion, where ζ tkm " xY tk , φ km y and tpθ km , φ km qu mě1 are the pairs of eigenvalues and eigenfunctions of Σ Y 0,kk . Let tp θkm , φkm qu mě1 be the estimated eigenpairs of p Σ Y 0,kk and ζtkm " xY tk , φkm y. We next impose a condition on the eigenvalues tθ km u mě1 and then develop the deviation bound in elementwise 8 -norm on how σW,Y h,jklm " pn ´hq ´1 ř n t"h`1 ηpt´hqjl ζtkm concentrates around σ W,Y h,jklm " Covtη pt´hqjl , ζ tkm u, which plays a crucial role in the convergence analysis of the FFLR estimate in Section 4.2.
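The sample quantity whose concentration is being bounded is just a lagged cross-covariance of two (centred) score series. A minimal sketch for one pair of score sequences (function name is our own; the scores are assumed already centred):

```python
import numpy as np

def lagged_cross_cov(eta, zeta, h):
    """Sample lag-h cross-covariance between centred score series:
    (n - h)^{-1} * sum_{t=h+1}^{n} eta_{t-h} * zeta_t."""
    eta, zeta = np.asarray(eta, dtype=float), np.asarray(zeta, dtype=float)
    n = len(eta)
    return np.sum(eta[:n - h] * zeta[h:]) / (n - h)
```

The deviation bounds in Propositions 2 and 3 control the worst case of this quantity over all index combinations pj, k, l, mq simultaneously.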
Proposition 2. Suppose that Conditions 1-3, 9(i) and 10 hold, tY t p¨qu tPrns is a sub-Gaussian functional linear process and h is fixed. Let d and d be positive integers possibly depending on pn, p, pq. For n Á pd 2α`2 _ d2α`2 qpM W,Y q 2 logpppq, there exist some positive constants c 7 and c 8 independent of pn, p, p, d, dq such that the deviation bound holds with probability greater than 1 ´c7 pppq ´c8 , uniformly over j P rps, k P rps, l P rds and m P r ds.
We next consider a mixed process scenario consisting of tW t p¨qu and tZ t u and establish the deviation bound on how pσ X,Z h,jkl " pn ´hq ´1 ř n t"h`1 ηpt´hqjl Z tk concentrates around σ X,Z h,jkl " Covtη pt´hqjl , Z tk u, which is essential in deriving the convergence rate of the SFLR estimate in Section 4.1.
Proposition 3. Suppose that Conditions 1-3 and 9(ii) hold, tZ t u tPrns is a sub-Gaussian linear process and h is fixed. Let d be a positive integer possibly depending on pn, p, pq. Then there exist some positive constants c 9 and c 10 independent of pn, p, p, dq such that the deviation bound holds with probability greater than 1 ´c9 pppq ´c10 , uniformly over j P rps, k P rps and l P rds.

B Technical proofs
Throughout, we use c, c̄, ĉ, č and ċ to denote generic positive finite constants that may differ at each occurrence.
Proof. Let v j denote the j-th row of V T x for j P rrs. Write σ 2 r }x} 2 F ď }Ax} 2 F " trpx T A T Axq " trpx T VΛ 2 V T xq " ř r j"1 σ 2 j v T j v j ď σ 2 1 }x} 2 F , where, in the inequalities above, we have used }V T x} F " }x} F due to the orthonormality of V. Taking the square root on both sides completes the proof of this lemma. l

To simplify the notation, we will use σ min pmq and σ max pmq to represent σ min pm, Gq and σ max pm, Gq, respectively.
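The sandwich inequality σ r }x} F ď }Ax} F ď σ 1 }x} F proved above can be checked numerically for a full-column-rank matrix (an illustrative check only, with randomly generated inputs):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))          # full column rank almost surely
x = rng.standard_normal((3, 2))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
lower = s[-1] * np.linalg.norm(x)        # sigma_r * ||x||_F
upper = s[0] * np.linalg.norm(x)         # sigma_1 * ||x||_F
mid = np.linalg.norm(A @ x)              # ||Ax||_F, squeezed between the two
```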
The proof proceeds by (21). Let T 1 denote the index set of the m largest components of t}δ i } F u iPrps , T 2 that of the next m largest, and so on. Let V µ " diagpµ b 1 d q, where µ P R q with ř q i"1 Ip|µ i | ‰ 0q ď m and 1 d " p1, . . ., 1q T P R d . We let }µ} " p ř q i"1 µ 2 i q 1{2 and }µ} 8 " max iPrqs |µ i |. Then we have the decomposition below, where G ¨,T j is the block submatrix of G consisting of all rows and all block columns in T j of G for j ě 1.
Define J1 " arg max |J|ďm σ min pG J,T 1 q.We can let µ " pµ i q with µ i " 1 if i P J1 and 0 otherwise, so that }µ} 8 " 1.Then the first term in (B.1) becomes where the first inequality comes from Lemma 2. Define Jj " arg max |J|ďm σ max pG J,T j q for each j ě 2. By the similar arguments as above, the second term in (B.1) becomes By the construction of sets tT j u jě1 , we have }δ  on both sides of (B.7).l Lemma 4. Suppose that Condition 5 holds.Then there exists some positive constant c such that κpθ 0 q ě cµ 2 {p24sq.
´Ωph 1 q 0,jj 1 uψ j › › S " } p Ω ph 1 q jj 1 ´Ωph 1 q 0,jj 1 } F . (B.37) We next bound the fourth term. For pj 1 , h 1 q P S j , by the orthonormality of tψ jl u, the third term above is of smaller order than the second term due to (B.36). By max j } p Ω j } pd,dq 1 ď max j } p Ω j ´Ω0j } pd,dq 1 `max j }Ω 0j } pd,dq 1 , (B.35) and Theorem 1, the first term is of smaller order than the second term. Applying (B.36) with µ " min j µ j and s " max j s j completes our proof. l

C List of S&P 100 component stocks used in Section 5.2

Remark 4.2. (i) With the same expression of G for both SFLR and FFLR, Condition 6(ii) is required in both Theorems 4 and 5. Note that we can further remove the assumption d - d and establish the general convergence rate as a function of d, d and other parameters. (ii) The rate for the autocovariance-based estimator in (30) is slightly slower than that for the covariance-based estimator in Fang et al. (2022) by a multiplicative factor d α{2 .

By cancelling }θ 0,S } pd, dq 1 on both sides above, we obtain the bound on } p θ S c ´θ0,S c }. By the same techniques used to prove (B.29), we bound the first three terms of max j,j 1 } Âph 1 q jj 1 ´Aph 1 q 0,jj 1 } S ď max j } p Ω j } pd,dq 1 td 1{2 max jPrps,lPrds } ψjl ´ψjl } `d1{2 max j 1 Prps,mPrds } ψj 1 m ´ψj 1 m }u `max j } p Ω j ´Ω0j } pd,dq 1 `Ops j d ´τ `1{2 q.

Consider the high-dimensional FFLR in (26), where tX t p¨qu tPrns satisfy model (1) and are independent of the i.i.d. mean-zero functional errors tε t p¨qu tPrns , and tβ 0j p¨, ¨qu jPrps are functional coefficients to be estimated. With observed data tpW t puq, Y t pvqq : pu, vq P U ˆV, t P rnsu, we target to estimate β 0 " tβ 01 p¨, ¨q, . . ., β 0p p¨, ¨qu T under a functional sparsity constraint when p is large. Specifically, we assume β 0 is functional s-sparse with support S " tj P rps : }β 0j } S ‰ 0u and cardinality s " |S| ! p. Provided that each observed Y t p¨q is decomposed into the sum of dynamic and white noise components in (26), we approximate Y t p¨q under the Karhunen-Loève expansion truncated at d, i.e. Y t p¨q « ζ T t φp¨q, where ζ t " pζ t1 , . . ., ζ t dq T and φp¨q " tφ 1 p¨q, . . ., φ dp¨qu T . Note that we can relax the independence assumption for tε t p¨qu tPrns and model observed responses via r Y t p¨q " Y t p¨q `eY t p¨q, where Y t p¨q and e Y t p¨q correspond to the dynamic signal and white noise elements, respectively.

In Step 2, the block RMD estimator is obtained subject to max h,k } p g hk pBq} F ď γ n , where γ n ě 0 is a regularization parameter, p ζ t " p ζt1 , . . ., ζt dq T with ζtm " xY t , φm y for m P r ds, and tp η tj u tPrns,jPrps are obtained in Step 1. In Step 3, we estimate the coefficient functions by βj pu, vq " p ψ j puq T p B j p φpvq , pu, vq P U ˆV , j P rps .

To choose the regularization parameter(s) for each model and comparison method, there are several possible methods one could adopt, such as AIC, BIC and cross-validation. The AIC and BIC methods require the calculation of the effective degrees of freedom, a very challenging task given the high-dimensional, functional and dependent nature of the model structure, and hence are left for future research. In our simulations, we generate a training sample of size n and a separate validation sample of the same size. Using the training data, we compute a series of estimators with 30 different values of the regularization parameters, i.e. t p b pγ n q j u jPrps as a function of γ n for SFLR (or FFLR) and t p Ω pγ nj q jk u kPrps as a function of γ nj for VFAR, calculate the squared error between observed and fitted values on the validation sample, and choose the one with the smallest error.