Functional mixed effects wavelet estimation for spectra of replicated time series

Motivated by spectral analysis of replicated brain signal time series, we propose a functional mixed effects approach to model replicate-specific spectral densities as random curves varying about a deterministic population-mean spectrum. In contrast to existing work, we do not assume the replicate-specific spectral curves to be independent, i.e. there may exist explicit correlation between different replicates in the population. By projecting the replicate-specific curves onto an orthonormal wavelet basis, estimation and prediction is carried out under an equivalent linear mixed effects model in the wavelet coefficient domain. To cope with potentially very localized features of the spectral curves, we develop estimators and predictors based on a combination of generalized least squares estimation and nonlinear wavelet thresholding, including asymptotic confidence sets for the population-mean curve. We derive risk bounds for the nonlinear wavelet estimator of the population-mean curve, a result that reflects the influence of correlation between different curves in the replicate-population, and we derive consistency of the estimators of the inter- and intra-curve correlation structure in an appropriate sparseness class of functions. To illustrate the proposed functional mixed effects model and our estimation and prediction procedures, we present several simulated time series data examples and we analyze a motivating brain signal dataset recorded during an associative learning experiment.


Introduction
Spectral analysis of replicated time series has recently gained growing interest, in particular in the field of brain data analysis, where it is common to collect time series data (such as EEG or local field potential data) from multiple subjects, or over multiple trials in an experiment, and the inferential focus is not on the mean responses of the time series but on the stochastic variation of the time series about their means.Other applications can be found, for instance, in biomedical experiments, geophysical and financial data analysis, or speech modeling.While there is an extensive literature on spectral analysis and inference of individual time series, this is not necessarily the case for replicated time series, and existing approaches mostly work under simplifying assumptions such as independent or at least uncorrelated time series replications, which if not satisfied can lead to statistically inefficient estimators or even give misleading inferences.In this paper we address the specific problem of analyzing spectra of replicated time series showing potentially very localized features, allowing for explicit correlation between the time series replicates.To illustrate, one can think of subject-replicated time series data collected from multiple subjects in an experiment with possible correlation between subjects due to unknown covariates (age, gender, etc.), or data collected over multiple trials of an experiment, where the spectral characteristics of the trial-replicated time series evolve over the course of the experiment.A particular motivating example for the latter is spectral analysis of brain data trial-replicated time series in the context of learning experiments, such a dataset is analyzed in Section 7. As pointed out by [24] and [7] there is a strong need to generalize existing approaches into this direction, however only few modifications to the assumption of independent time series replicates have been developed by now.In the context of second-order spectral analysis for independent stationary replicated time series, [6] introduced a log-linear mixed effects model, which was later generalized by [14] and [16] by considering nonparametric estimation of the fixed effects curve.Also in the case of independent replicated time series, [8] developed a tree-structed wavelet method for log-spectral estimation, whereas [21] introduced a more general mixed-effects approach based on spline smoothing of empirical log-spectra handling two-level nested designs with replicated time series for a number of independent subjects, here different time series replicates within a subject are allowed to be correlated based on known covariates.[34] considered a covariate-indexed functional fixed effects model for time-varying spectra of independent replicated nonstationary time series, and [23] applied the Bayesian wavelet-based mixed effects approach developed by [25] to model time-varying spectra of replicated nonstationary time series, allowing for potential correlation between the time series replicates induced by the experimental design.More recently, in the context of learning experiments, [7] model log-spectra of replicated nonstationary time series trials by including a replicate-time effect that evolves over the course of the experiment.In a general functional data analysis context, not aimed at spectral analysis of time series in particular, nonparametric functional mixed effects models have been considered among others by [13], and [33] using smoothing-spline approaches, and in [3] using functional principal components.In order to avoid the modelling of functional data by inherently smooth curves, wavelet-based approaches have been considered by [25] and [26] using Bayesian wavelet shrinkage methods, by [11] using nonlinear wavelet thresholding, and by [2] focusing on inference in a wavelet-based functional mixed effects model (see [24] for a comprehensive overview).In this work we introduce an additive two-layer functional mixed effects model in the frequency domain for a collection {X s (t)} s=1,...,S of S individual time series, each with discrete observations over time.The time series replicates are modeled to have random replicate-specific log-spectra, which consist of a fixed effects curve on the first layer (population-average or -mean log-spectrum), additional to replicate-specific random effects curves on the second layer.We model explicit correlation between the random effects curves and do this in an appropriate way to allow for its fully nonparametric estimation, disposing of only a single realization for each of the S time series replicates.As we observe only the noisy replicate-specific log-periodograms, we face a denoising problem of log-periodogram curves in the presence of potentially very localized structure for the underlying log-spectra, this problem is addressed by nonlinear wavelet thresholding.By projection onto an orthonormal wavelet basis, we obtain an equivalent finite-dimensional linear mixed effects model in the coefficient domain.This allows us to apply traditional linear mixed model estimation methods combined with nonlinear wavelet thresholding in a unified framework for both the fixed-and random effects empirical wavelet coefficients.To achieve simultaneous estimation of the fixed effects curve and the correlation structure between different random effects curves, we propose an easy-to-implement iterative generalized least squares estimation algorithm.We complete our methodology by proposing predictors of the individual replicate-specific log-spectra, as well as asymptotic confidence regions for the population-mean log-spectrum, which is interesting in its own as the literature on inference in the context of nonlinear wavelet thresholding estimators is relatively sparse.The structure of the paper is as follows.In Section 2 we introduce the model set-up in both the frequency and wavelet coefficient domain with an appropriate combined 0 -sparseness constraint for the fixed-and random effects that allows for general inhomogeneous functional behavior over frequency.Some conditions on the variance-covariance structure of the random effects allow for its consistent estimation.In Section 3 we present estimators for the different components in the model, and we also propose predictors for the replicate-specific log-spectra.Section 4 provides consistency results for the estimators of the fixed effects curve and the variance-covariance-structure of the random effects curves, where we consider asymptotics in both the time series length T and the replicate sample size S.In particular, we derive bounds on the L 2 -risk of the nonlinear wavelet estimator of the fixed effects curve in an appropriate 0 -sparseness class, a result that reflects the influence of correlation between different curves in the replicate population.In Section 5 we derive asymptotic confidence regions for the population-mean log-spectrum based on the nonlinear wavelet estimator.Section 6 presents numerical results on the performance of the estimation and inference procedures for simulated time series data, and in Section 7 we analyze a motivating data example consisting of brain signal time series data recorded over the course of an associative learning experiment.The technical proofs are deferred to the Appendix section, which can be found in the supplementary material.

Model setup
Let {X s (t)} t>0 be a collection of mean-zero second-order stationary univariate time series for replicates s = 1, . . ., S. We assume that the replicated time series are weakly dependent, as detailed in Section 2.1.1 below, in order to ensure that the power spectra are well-defined as the Fourier transform of the replicatespecific autocovariance functions.If we observe a collection of discretely sampled time series {X s (t), t = 1, . . .2T }, their raw bias-corrected log-periodograms at frequencies ω = /(2T ) ∈ [0, 1) are computed as, where γ ≈ 0.577 is a bias-correction equal to the Euler-Mascheroni constant (see [41]).For convenience, we consider the time series length to be dyadic 2T = 2 J in order to avoid additional complications in the subsequent wavelet estimation.

Frequency domain functional mixed model
We model the replicate-specific log-spectra as random curves varying about a deterministic population-mean log-spectrum, which is common to all replicates, see Figure 1 for a simulated example.Similar approaches are considered in [6], [8], and [21] to model the (log-)spectra of stationary replicated time series.We express the raw log-periodograms in terms of the following functional mixed effects model in the frequency domain: where, ) is a population-mean log-spectrum (functional fixed effect).Hereafter, L p (X) always denotes the L p -space of measurable functions on X with respect to the Lebesgue measure.2. {U f s , s = 1, . . ., S} are mean-zero random processes (functional random effects) with realizations in L 2 ([0, 1/2]) for each replicate s.The distributional assumptions and variance-covariance structure of the functional random effects are detailed in Section 2.1.2and Section 2.2 respectively.

E
, where σ 2 e := π 2 /6 as shown in [41].Note that for ω 0 = 0, E f s (ω 0 ) d → log(χ 2 1 ) + γ, but since the influence of this term is negligible for T large enough, we consider the error term E f s (ω 0 ) to have the same asymptotic distribution as the other error terms in our subsequent analysis (as in [27], [21], and [8]).The errors E f s (ω ) are assumed to be independent between different replicates and independent of the functional random effects U f s for all s, .Since our main interest lies in the analysis of the spectral characteristics of the replicated time series, we have introduced a functional mixed model on the level of the (log)-spectra in the frequency domain.It is nonetheless important to examine the implications of this model in the time domain, as the frequency domain model does not map to an additive functional mixed model for the replicated time series in the time domain.Each stationary mean-zero time series replicate {X s (t)} t>0 has a Cramér representation of the form: where the replicate-specific transfer functions A f s (ω) are [0, 1]-periodic random Hermitian functions, i.e.A f s (−ω) = A f * s (ω) (here * denotes the complex conjugate).The random processes Z s (ω) are orthogonal increment processes that are independent between replicates and independent of the random transfer functions A f s (ω), such that: This is related to the stochastic transfer function models in [21] and [20] for replicated time series organized in multiple groups or units, whereas in our case we dispose only of a single time series replicate per group.Conditional on the functional random effects U f s (ω) = u f s (ω) in the frequency domain, for each s = 1, . . ., S, the time series replicate {X s (t)} t>0 has a replicate-specific spectrum: where the realized replicate-specific spectra are non-negative by construction.Furthermore, conditional on the random effects U f s (ω) in the frequency domain, we assume that the time series replicates are weakly dependent in the sense that This ensures that the realized replicate-specific spectra above are well-defined as the Fourier transforms of the realized replicate-specific autocovariance functions, and the reverse for the inverse Fourier transform.

Wavelet domain linear mixed model
Since the realizations of the random replicate-specific log-spectra are L 2 ([0, 1])periodic functions, we consider a periodized orthonormal wavelet basis of L 2 ([0, 1]), denoted by B = {ψ k } ∞ k=0 , constructed from the translated and dilated versions of a sufficiently smooth father and mother wavelet function, compactly supported on [0, 1].Here, for ease of notation we compress the usual scale and location indices (j, m) into a single scale-location index k, using classical lexicographical ordering.Since the log-periodograms Y f s are sampled over a discrete grid of frequencies, instead of true wavelet coefficients (projections of the replicate-specific log-spectra), we compute the empirical wavelet coefficients: More specifically, projecting the discrete sampled frequency domain model in eq.( 2.2) onto the wavelet basis B via its discrete wavelet transform (we denote the discrete wavelet transform-matrix by W B ), we obtain a linear mixed model in the wavelet coefficient domain given by, where, . This is a deterministic sequence of fixed effect wavelet coefficients shared by all replicates in the population.

U
) for s = 1, . . ., S. In particular, we assume that the sequences U •k = (U 1k , . . ., U Sk ) are Gaussian random vectors for each k = 1, . . ., T .The assumptions on the variance-covariance structure of the vectors U ) for s = 1, . . ., S. The random vectors s• are sequences of asymptotically independent wavelet noise coefficients with E[ sk ] = o T (1) and Var( sk ) = σ 2 e /T + o T (T −1 ).The noise coefficients are independent between different replicates and independent of the random effects sequences for all s, k.

Covariance matrix assumptions
Let U = (U •1 , . . ., U •T ) be the S × T -dimensional random matrix of stacked random effects sequences U s• .One of our main interests is in allowing for explicit correlation between the random effects sequences of different replicates, therefore we will not assume the covariance matrix of vec(U ) to be diagonal as is the case in [6], [21], and [8].However, some structure on the covariance matrix Cov(vec(U )) is necessary, since consistent estimation in a totally unstructured matrix is impossible (using only ST observations).We consider structural assumptions on the variance-covariance matrix of vec(U ) as proposed in [25] and [26] in a general functional data analysis context.Since the frequency domain model and the wavelet coefficient domain model are equivalent representations, the structural assumptions in the wavelet domain automatically transfer to assumptions on the variance-covariance structure of the functional random effects in the frequency domain.We assume that the covariance matrix G := Cov(vec(U )) consists of the Kronecker product of a T × T within-replicate diagonal covariance matrix and an S × S between-replicate correlation matrix: where G S is symmetric and positive-semidefinite.By considering a diagonal within-replicate covariance matrix G T , the random effects coefficients are assumed to be uncorrelated between scale-locations k = 1, . . ., T .Note that a diagonal within-replicate covariance matrix G T in the wavelet domain does not mean that the within-replicate covariance matrix in the frequency domain also has to be diagonal.To illustrate, a single non-zero variance component σ 2 u1 corresponding to the variance of the random scaling coefficient at scale-location (0, 0) translates to a random shift in the mean of the replicate-specific logspectra in the frequency domain, thus resulting in highly correlated behavior of the random log-spectra over frequency.Furthermore, the variance components are heterogeneous across coefficients, therefore allowing for very general spatially inhomogeneous behavior of the random log-spectra in the frequency domain across replicates.The unstructured between-replicate correlation matrix G S allows for correlation between the random effects coefficients of different replicates at matching scale-locations k.We observe that the correlation ρ ss between two different replicates remains the same across all locations.This is in order to keep the dimensions of the working covariance matrices small, but also to allow for consistent estimation of ρ ss as the length of the time series increases.Note that G is symmetric and positive-semidefinite, since it is the Kronecker product of two symmetric positive-semidefinite matrices.Also, there is no identification issue between the two matrices G T and G S , as G S is restricted to have unit diagonal.

Functional space assumptions
In order to develop the necessary estimation theory, we impose some regularity (smoothness) conditions on the realized replicate-specific sequences in the wavelet coefficient domain, or equivalently, on the realized discretely sampled log-spectra in the frequency domain.In particular, we assume that the fixed and random effects sequences are asymptotically sparse elements of the 0 -sequence space with respect to the wavelet basis B, defined as: Assumption (A1).Let h ∈ 0,T (k h,T ), with set of indices of non-zero coefficients K h,T .We assume that k uT ) ∈ 0,T (k u,T ) with set of indices of non-zero coefficients K u,T , such that sup k σ 2 uk < ∞.We assume that K u,T ⊆ K h,T and These regularity conditions assert that the fixed and realized random effects sequences or curves increase in complexity with T (almost surely for the random effects), but at a slower rate than T .Furthermore, we make the assumption K u,T ⊆ K h,T , this is a convenient way to ensure that the population-mean logspectrum h f and the realized replicate-specific log-spectra h f s |u f s = h f + u f s share the same smoothness properties.This complexity constraint allows to disentangle the different parts in the variance components coming from the random effects and the noise terms, but is also important for the sake of interpretation in a functional mixed effects model, as discussed in [13], [33], and [2].
Before presenting the estimation procedure, we recall some useful results on nonlinear thresholding methods in a classical Gaussian sequence model under 0 -sparsity constraints.Consider the 0 -Gaussian sequence model: with θ = (θ 1 , . . ., θ n ) ∈ 0,n (k n ) and z 1 , . . ., z n iid ∼ N (0, 1).Under the 0 -sparsity constraint k n → ∞ as n → ∞, but k n = o(n), the minimax 2 -risk of estimation for θ satisfies, where • denotes the Euclidian norm.It is well-known that hard (or soft) nonlinear thresholding of the coefficients asymptotically achieves the minimax risk.
In particular, the hard nonlinear thresholding estimator θi = y i 1{|y i | ≥ λ n } for i = 1, . . ., n, with λ n = n 2 log(n/k n ) is an asymptotic minimax estimator, see [19] for a detailed proof.We note that the nonlinear thresholding estimators { θi } i=1,...,n are nonadaptive in the sense that the threshold λ n depends on the (typically unknown) smoothness space parameter k n .[1] show that in the context of 0 -Gaussian sequence models, using False Discovery Rate (FDR) nonlinear thresholding, it is possible to asymptotically achieve the minimax risk without requiring knowledge of the smoothness space parameter k n .For details on this FDR-based procedure, and the appropriate choice of its tuning parameter q n , we refer to [1].

Population-mean log-spectrum
We estimate the population-mean log-spectrum h f (ω ) at frequencies ω ∈ [0, 1/2) by the projection estimator, where, 1. w k = (w 1k , . . ., w Sk ) are generalized least squares weights depending on the between-replicate correlation structure through . Here, V k denotes the asymptotic covariance matrix T I S , with I S the S × S-identity matrix.e /(ST ) 2 log(T /k h,T ).The motivation for this thresholded set comes from the observation that the replicate-specific sequences -conditional on the random effects-are independent between replicates and, individually for each replicate, follow an 0,T (k h,T )-sequence model with the same set of non-zero coefficients K h,T for each replicate and noise variance approximately σ 2 e /T .In the unconditional case, the distributional behavior of the sequences at scalelocations of zero coefficients (k / ∈ K h,T ) remains unchanged, allowing for the same control on the number of false positives as in the conditional case.Moreover, under some regularity conditions, the empirical wavelet noise coefficients are asymptotically normal for increasing T , this asymptotically justifies the threshold choice λ h,T based on a Gaussian sequence model (see Section 4.1).

Estimation of the within-replicate covariance matrix
The within-replicate random effects covariance matrix G T is assumed to be diagonal, with vector of variance components σ 2 u = (σ 2 u1 , . . ., σ 2 uT ) ∈ 0,T (k u,T ) on the diagonal.This vector is estimated by the thresholded sample-variances, with estimated set of indices of non-zero variance components Here the statistics T k (Y •k ) and the threshold λ u,T are given by, where ψ (0) (•) and ψ (1) (•) denote the digamma and trigamma function respectively.The motivation for this thresholded set, which has similar structure as the thresholded set K h (Y ), comes from the observation that -for uncorrelated replicates-the vector {T k (Y •k )} T k=1 is variance stabilizing and behaves approximately as an 0,T (k u,T )-Gaussian sequence model with noise variance ψ (1) (S/2).This justifies the threshold choice λ u,T based on a Gaussian sequence model.For a general between-replicate correlation matrix G S , it follows that the distributional behavior of the zero variance components with indices k / ∈ K u,T remains the same, thus allowing for the same control on the number of false positives as in the uncorrelated case.We note that the threshold λ u,T is slightly more conservative than the asymptotic minimax universal threshold as in Section 2.3.The reason for this is that in the context of a Gaussian sequence model, under the more conservative threshold, both the number of false positives and the number of false negatives in the estimated set of indices of non-zero variance components tend to zero almost surely (Corollary 4.3).Under the asymptotic minimax threshold, this only holds true for the number of false negatives.

Estimation of the between-replicate correlation matrix
The between-replicate correlation matrix G S is estimated elementwise by considering the following sample-correlation based estimators, 2) where δ > 0 is a small constant that ensures that the denominator is bounded away from zero, and Note that the rescaling does not affect the estimated zero variance components

Iterative estimation scheme
In order to estimate the population-mean log-spectrum h f (ω ), we consider a generalized least squares estimator with weights depending on the betweenreplicate correlation structure.On the other hand, estimation of the random effects covariance and correlation matrices G T and G S depends on the populationmean sequence h, since the sample-variances and sample-correlations need to be centered about their respective means.This is typically the case in linear mixed mdoel estimation where one allows for between-replicate correlation, and in this context, one easy-to-implement approach is to consider an iterative-generalized least squares scheme (see e.g.[17]).First, we compute the thresholded ordinary least squares estimator of h by equally weighting each of the observations across replicates.This does not require any information on the between-replicate correlation structure.Next, we iterate between estimation of G T and G S given the estimate of h, and estimation of h given estimates of G T and G S , and we continue iterating until some convergence criterion is satisfied.We note that under a similar random effects variance-covariance structure, considering only the linear part of the estimators (without the thresholding), [18] show that an iterative-generalized least squares scheme converges exponentially with probability tending to one as the number of replicates S increases.

Replicate-specific log-spectra
The replicate-specific log-spectra H f s (ω ) at frequencies ω ∈ [0, 1/2) are predicted by the projection estimators, the inverse discrete wavelet transform with respect to the basis B of the predicted replicate-specific sequence of coefficients H s (Y ) := { H sk (Y )} T k=1 .Prediction in the wavelet coefficient domain reduces to prediction in a linear mixed model, and we can find estimated predictors of the random effects sequences T I S are plug-in estimators, and 1 S denotes an S-dimensional vector of ones.Note that if G k and V −1 k are replaced by the true matrices G k and V −1 k , and ĥ is replaced by the best linear unbiased estimator, then the U •k are the best linear unbiased predictors of the random effects sequences U •k in a linear mixed model, see [37].In general, due to the nonlinear thresholding of coefficients, ĥ ceases to be an unbiased estimator of h.By combining the estimator for the fixed effects sequence h and the predictors for the random effects sequences U •k , the replicate-specific sequences of coefficients 4. Estimation theoretical results

Risk bounds for the population-mean log-spectrum
In this section we derive finite-sample upper bounds for the 2 -risk of the estimated fixed effects sequence of coefficients ĥ(Y ) = { ĥk (Y •k } T k=1 with respect to h.By Parseval's relation, for any given number of replicates, the 2 -risk in the wavelet coefficient domain is asymptotically equal to the L 2 -risk of the projected estimators in the frequency domain.It then follows that the same expression derived for the 2 -risk in the wavelet coefficient domain also gives an upper bound for the L 2 -risk of the estimated population-mean log-spectrum ĥf .The derivation of the 2 -risk bounds is based on the observation that, under some regularity conditions, the non-Gaussian linear mixed model in the wavelet coefficient domain (eq.(2.4)) is asymptotically equivalent to a Gaussian linear mixed model (T → ∞) as the empirical wavelet noise coefficients are essentially local averages of the log-periodogram ordinates, and the random effects wavelet coefficients are assumed to be normally distributed.This allows us to calculate the 2 -risk of the sequence ĥ(Y ) first under an accompanying Gaussian sequence model, and relate this to the 2 -risk under the non-Gaussian sequence model.The asymptotic equivalence between the two models is based on a uniform asymptotic normality result for the empirical wavelet noise coefficients.[29] already establishes uniform asymptotic normality for the empirical wavelet noise coefficients of periodogram ordinates for a general non-Gaussian process X(t) in the context of a single long time series.Since the technical considerations in [29] are not the main focus of this paper, here we derive uniform asymptotic normality of the empirical wavelet noise coefficients of log-periodogram ordinates only for a weakly dependent Gaussian process X s (t) as in [4, Chapter 5] with a given (log-)spectrum, i.e. conditional on the functional random effects in the frequency domain.By the (conditional) Gaussianity assumption for the time series replicates, we can derive cumulant bounds for the realized replicatespecific log-periodogram ordinates in the frequency domain using results from [38].These cumulant bounds are used to derive uniform asymptotic normality of the empirical wavelet noise coefficients for an increasing number of coefficients, along the same lines as [29] for a single time series replicate.Note that this result is conditional on the functional random effects in the frequency domain, however since the random effects wavelet coefficients in eq.( 2.4) are assumed to be normally distributed, unconditional uniform asymptotic normality of the wavelet coefficients of the random replicate-specific log-periodograms follows as well.We point out that asymptotic normality of empirical wavelet noise coefficients of the log-periodogram has already been suggested without proof by [9], [27], and [8] under the approximate additive noise model with k ∼ (0, π 2 /6).In order for a certain summation effect to work, we make the additional assumption that, for increasing T , the set of non-zero fixed effects coefficients is bounded away from the finest wavelet scale, intuitively this means that the finest wavelet scale contains virtually only noise and no signal as T increases.This is a typical assumption in the wavelet literature, and in an ordinary signal plus noise model this is commonly used for estimation of the noise variance through the empirical coefficients located only at the finest wavelet scale (see [40]).

Assumption (A2). Define the set,
for some constant C > 0. We assume that there exist some T * > 0 and 0 < α * < 1, such that for ).For T sufficiently large, the 2 -risk of ĥ(Y ) satisfies, 2) where denotes the inequality ≤ up to a multiplicative positive constant.
The first term on the right-hand side in eq.( 4.2) is equivalent to the minimax rate of estimation in an 0,T (k h,T )-Gaussian sequence model with noise variance of order (ST ) −1 .The second term arises from introducing the random effects and is an upper bound of the integrated error made in estimating h by taking a weighted sample average over a finite number of replicates.We observe that , and this term is minimized by the generalized least squares weights w k as in Section 3.1.In the case of sub-optimal ordinary least squares weights w k = ( 1 S , . . ., 1 S ) the second term becomes: This implies that if S 2 1 S G S 1 S → 0 as S, T → ∞ the thresholded ordinary least squares estimator remains a consistent estimator of h.For uncorrelated replicates this term is for instance of the order k u,T /S.The expression 1 S G S 1 S is always nonnegative, and is decreasing as replicates become more negatively correlated.This might seem surprising, but can be illustrated by the following simple bi-replicate example: suppose one observes two random replicate-curves that are highly negatively correlated, with high probability the true population mean-curve lies in between the two replicate-curves and the error term due the random effects should therefore be smaller than in the independent curve situation; if the curves are perfectly negatively correlated, the true population mean-curves lies exactly in between the two replicate-curves, and the error term due to random effects should disappear completely.

Consistent estimation of random effects covariance matrices
In this section, we derive some asymptotic results for the estimators σ2 uk (Y •k ) of the variance components in the within-replicate covariance matrix G T , and the estimators ρij (Y ) of the correlation coefficients in the between-replicate correlation matrix G S .The derived results crucially rely on the condition G S F = o(S), which controls the level of correlation between different replicates.Essentially it requires that the effective number of uncorrelated replicates increases with the total number of replicates S. To illustrate, for uncorrelated replicates G S F /S = 1/ √ S, whereas for perfectly correlated replicates G S F /S = 1.In order to simplify the proofs, as in [21], [9], [27], and [8], we work under the approximate model where the empirical wavelet noise coefficients are mean zero with variance σ 2 e /T , which is the asymptotic version of the model as T → ∞.Note that we do not assume normality of the empirical wavelet noise coefficients, nor independence between different scale-locations k within a single replicate.
for each s = 1, . . ., S and k = 1, . . ., T , and that there exist uniform consistent estimators , and C ≤ λ u,T = o(log(T )) for some constant C > 0, then 2), then for each i, j with i = j, ρij (Y ) ) for k / ∈ K u,T are needed in order to control the number of false positives in the set of estimated non-zero variance components in eq.( 4.3).Under assumptions (A1)-(A3), by the cumulant bounds derived in the proof of Theorem 4.1 (see Appendix), it follows that for the nonlinear estimators ĥk (Y •k ) in Theorem 4.1 with ordinary least squares weights w k = ( 1 S , . . ., 1 S ) .The following corollary gives the theoretical justification for the form of the statistics T k (Y •k ) as given in eq.(3.1), which are based on the accompanying Gaussian sequence model where the empirical wavelet noise coefficients are exactly normally distributed.Under the Gaussian sequence model, it is possible to adopt the threshold λ u,T = ψ (1) (S/2) 2 log(T − k u,T ), which converges to zero if log(T )/S → 0 as S, T → ∞, while preserving the consistency result that both the number of false positives and false negatives in the estimated set of non-zero variance components is zero with probability tending to one.
For correlated replicates, with general correlation matrix G S , it remains true that for S → ∞,

Confidence regions
In this section, we develop asymptotic confidence regions for the discretely sampled population mean log-spectrum, where it is important to take into account the possible correlation between different replicate-specific curves to avoid the use of erroneous confidence sets.In the wavelet coefficient domain, the estimated sequence of fixed effect coefficients is a nonlinear biased estimator of h, and for this reason it is generally difficult to derive asymptotic confidence bounds directly from the asymptotic distribution of ĥ(Y ), even under Gaussian model assumptions.As proposed in [35] and [10] among others, instead we consider an estimator of the squared norm h − ĥ 2 (conditional on ĥ), and we derive the asymptotic distribution of this estimator instead of the asymptotic distribution of the original estimator ĥ.Asymptotic 2 -confidence regions for h can then be constructed by restricting the norm of h with respect to the estimated sequence ĥ.Moreover, by the norm equivalence between the functional (i.e.frequency) domain and wavelet coefficient domain, we can easily transfer the confidence regions for h in the wavelet domain to confidence regions for h f in the frequency domain.For convenience, we work under the approximate model assumption that the wavelet noise coefficients are exactly normally distributed, with mean zero and variance σ 2 e /T .The derived confidence regions are therefore approximate in the sense that they are based on asymptotic distributional behavior of the estimator of the pivot quantity (S → ∞), but also on the fact that the empirical wavelet noise coefficients are only asymptotically normally distributed (T → ∞) under appropriate model conditions as discussed in Section 4.1.The method is based on the assumption that we can split the sample ξ = (ξ 1 , . . ., ξ T ) ∈ R S×T , with ξ k ∼ N (h k 1 S , V k ), into two sets of independent observations ξ (1) , ξ (2) ∈ R S×T .Suppose that the covariance matrices V k are known, one simple approach to split the sample into two independent samples at the cost of making the variance twice as large is to consider, where the vectors X k ∼ N (0, V k ) are independent of ξ k .We estimate the sequence h using only the observations in ξ (1) , and construct the confidence regions from the additional independent set of observations ξ (2) conditional on ĥ(ξ (1) ).The nature of the estimator ĥ is irrelevant for the construction of the confidence region, however, since the radius of the confidence region is proportional to h − ĥ , better estimators ĥ (in terms of 2 -risk) will lead to smaller confidence regions.Suppose that we have split the sample into two independent parts ξ (1) , ξ (2) , with ξ . ., T and we have computed ĥ := ĥ(ξ (1) ).The next step is to find an estimator of the pivot quantity h − ĥ 2 , conditional on ĥ, using only the set of observations ξ (2) .We consider the unbiased estimator, R(ξ (2) , ĥ) = where w k = (w 1k , . . ., w Sk ) is a vector of (generalized least squares) weights such that s w sk = 1.Straightforward calculus shows that, conditional on ĥ, R(ξ (2) , ĥ) is an unbiased estimator of h − ĥ 2 and, τ 2 (h, ĥ) := Var R(ξ (2) , ĥ) = where diag(w k ) is a diagonal matrix with the vector w k on the diagonal. (2), ĥ) with standard normal quantile z α , and ĥ = ĥ(ξ (1) ) such that ξ k independent for all k = 1, . . ., T .Then, lim inf The validity of the asymptotic confidence regions relies on the condition Γ k 1 / Γ k F → 0, which requires the maximum absolute row sum (or column sum by symmetry) of Γ k to be dominated by its Frobenius-norm for increasing S.This condition implies that the number of relevant principal components of the matrix Γ k is increasing, or in other words, the vector of eigenvalues of Γ k should not be dominated by one or a few large values as S increases.Although in a somewhat different spirit than the condition G S F = o(S), this condition also implies that the effective number of independent replicates should increase with the total number of replicates S. Note that considering a vector of equal weights w k = ( 1 S , . . ., 1 S ) , this condition can be restated in terms of the between-replicate correlation matrix as G S 1 / G S F → 0 for S → ∞.This ratio has an optimal rate 1/ √ S when G S is equal to the identity matrix, and it can be verified that this condition implies G S F = o(S), (the other direction does not hold).
In Theorem 5.1 we have constructed asymptotic confidence regions only for the sequence of fixed effect coefficients in the wavelet coefficient domain, however by the 2 -normalization of the wavelet basis we have that 1 √ T h f − ĥf = h − ĥ , thus we can consider the scaled confidence regions in the frequency domain given by, 2) , ĥ) and by Theorem 5.1, the asymptotic coverage probability also satisfies, Remark Note that the confidence regions are constructed under the assumption that the covariance matrices V k are known.In practice, these covariance matrices are unknown, and we therefore replace them by plug-in estimators V k .
The generalized least squares weights w k , which typically also depend on the covariance matrices V k , can be replaced for instance by the sub-optimal ordinary least squares weights w k = ( 1 S , . . ., 1 S ) .This does not change the asymptotic normality result of the estimator R(ξ (2) , ĥ) in the proof of Theorem 5.1, but it comes at the cost of increasing its variance, thereby increasing the radius of the confidence regions.

Simulated data examples
In this section, we assess the finite-sample performance of the developed estimators by some simulated data examples.In Algorithm 1 below, we describe a procedure to simulate replicated time series {X s (t), s = 1, . . ., S} with random log-spectra H f s by means of their discrete Cramér representations.In short, given a wavelet basis B, population-mean transfer function a f (with populationmean log-spectrum h f (ω) = log(|a f (ω)| 2 )), within-replicate covariance matrix G T , and between-replicate correlation matrix G S , we generate replicate-specific random transfer functions A f s (ω) which are inserted into discrete Cramér representations to generate the replicated time series.

8:
A f s ← a f exp(U f s ) 9: In Algorithm 1, ξ s denotes a complex-valued normal random variable with independent real and imaginary parts, such that ξ s = ξ s * − .

Population-mean log-spectrum
We consider data generated under a single population-mean log-spectrum h f coming from an ARMA(2, 2) process with parameters φ = (−0.2,−0.9), θ = (0, 1) and white noise variance σ 2 w = 1.The log-spectrum of this ARMA(2, 2) process is particularly difficult to estimate due to some sharp local features.In the right image of Figure 1  log-spectrum h f = (h f (ω 0 ), . . ., h f (ω T −1 )) is shown for ω ∈ [0, 1/2) with T = 1024.In fact, the displayed curve is a relatively sparse 0 -approximation of h f under a Daubechies extremal-phase wavelet basis B with N = 6 vanishing moments, where we have thresholded all coefficients with |h k | < T −1 .Here, we have used the WaveThresh package in R, see [28, Chapter 2] for more details.

Random effects covariance matrices
For the T × T -dimensional covariance matrix G T , we consider the diagonal matrix diag(σ 2 u ) with set of indices of non-zero variance components given by, which are simply all the indices contained in the wavelet scales {j : 0 ≤ j < J} with the additional constraint that K u,T ⊆ K h,T .For the magnitudes of the variance components, we consider σ 2 uk decaying with a factor 2 per increasing wavelet scale, i.e. for some constant C > 0, let σ 2 u1 = C and define for k > 1, Under this specific model, Figure 1 shows generated random log-spectra for three replicates (two of which are highly correlated) and corresponding simulated replicate-specific time series, using a Daubechies extremal-phase wavelet basis with N = 6 vanishing moments and parameters C = 0.5, J = 4, which are also the values used in the subsequent simulation studies.
For the between-replicate S × S-dimensional correlation matrix G S we consider two different scenarios: 1.A symmetric block-diagonal matrix containing a single (S/2×S/2) dimensional block of highly correlated replicates with ρ ij = 0.9 for 1 ≤ i, j ≤ S/2.The constructed correlation matrix satisfies G S F = o(S) and is positive-semidefinite.2. A symmetric contour-matrix that consists of layers of block matrices with decaying levels of correlation, see Figure 2. The layers are chosen such that again G S F = o(S) and G S is positive-semidefinite.The correlation matrix G S , for S ≥ 16 dyadic, is constructed as follows.Divide an S × Sidentity matrix into blocks B ij of size 8 × 8 with 1 ≤ i, j ≤ S/8.Fix j ≥ 2, for i < j, set all elements of B ij equal to {1 − j 2 S } + , and do the same for B jj only for its off-diagonal elements.Similarly, fixing i ≥ 2, for j < i, set all elements of B ij equal to {1 − i 2 S } + , and finally put B 11 = B 22 .

Simulation study
We assess the performance of the proposed estimation procedure and compare this to the performance of several related alternatives.First, we consider a naive ordinary least squares approach (OLS), where we smooth the replicate-specific log-periodograms using FDR thresholding with tuning parameter q = 0.001 and estimate h f by averaging the smoothed curves over replicates, thereby not taking into account the between-replicate dependence structure.Second, we consider the non-adaptive iterative-generalized least squares approach, where the smoothness space parameter k h,T is assumed to be known and the set K h,T is estimated using the universal threshold λ h,T as in Section 3.1.Third, we consider an adaptive iterative-generalized least squares approach, in which k h,T is not known.In order to estimate the set K h,T we use FDR thresholding with tuning parameters q = {0.1,0.001}.Finally, in order to assess the increase in estimation error due to the iteration scheme, we also compute oracle estimators ĥf and G S ( ĥ), where we assume the true generalized least squaresweights (depending on G S ) to be known in estimating h f , so that no iteration of the estimators is required.Note that in this final scenario we do not assume knowledge of k h,T nor k u,T , and the set K h,T is estimated as in the third scenario with FDR tuning parameter q = 0.001.The performance of the estimator ĥf = ( ĥf (ω 0 ), . . ., ĥf (ω T −1 )) is assessed through the squared error averaged over Fourier frequencies.Similarly, the performance estimators G T and G S is evaluated through the squared error averaged over matrix elements.
Block-diagonal correlation matrix G S (Case (a))   In Table 1 we show average squared errors with in parenthesis corresponding standard errors (σ/ √ M ) for M = 1000 replicated simulation experiments.It should not come as a surprise that knowledge of the true generalized least squares-weights significantly improves the estimation error.This is seen by comparing the estimation error for h f of the naive ordinary least squares approach, which does not take into account the between-replicate correlation structure, with that of the oracle estimator.The iterative-generalized least squares scheme then inflates the estimation error for h f relative to the oracle estimator, but still outperforms the ordinary least squares approach under all of the considered scenarios.Another important observation is that the performance of the adaptive estimators -regardless of the choice of the FDR tuning parameter-is similar to the performance of the nonadaptive estimators in essentially all of the considered scenarios.The adaptive estimators slightly outperform the nonadaptive estimators in some cases, which is most likely due to the fact that the asymptotic minimax threshold λ h,T is somewhat conservative in a finite-sample situation.In Figure 3 and 4, we give some visual representations of the estimates under a block-diagonal between-replicate correlation structure (T = 512, S = 64).Figure 3 shows the estimated population-mean log-spectrum and between-replicate correlation matrix for a single simulation experiment using the adaptive ap-proach with FDR tuning parameter q = 0.001, and Figure 4 shows true and predicted random effects curves for a single simulation experiment under the same scenario.

Confidence region coverage
We also assess the validity of the constructed confidence regions for h f by computing their empirical coverage under some of the simulated models considered before.Since we are interested in the (negative) impact on the empirical coverage caused by replacing the true covariance matrices V k by estimates V k we consider two different scenarios.In the first scenario, we assume the true matrices V k to be known, both in performing the sample splitting procedure and in constructing the confidence regions.In the second scenario, we consider the matrices V k to be unknown, therefore replacing them by plug-in estimators V k .For the weight vectors w k , we consider ordinary least squares weights, equally weighting each replicate.As a benchmark procedure, we also compute the empirical coverage of parametric bootstrap confidence regions for h f with B = 1000 bootstrap samples as proposed in [21] and [7].The bootstrap confidence regions are constructed using the true covariance matrices V k , and in this sense are oracle confidence regions, that should be compared to the asymptotic confidence regions under the first scenario.In Table 2, the empirical coverage for the different approaches is shown for M = 5000 replicated simulation experiments (M = 1000 for the bootstrap confidence regions), the simulation results are shown only for the adaptive approach (i.e.k h,T unknown) using FDR tuning parameter q = 0.001, the results for the other approaches considered in Table 1 are similar.

Analysis of brain signals: replicated LFP time series
To conclude, we analyze brain signal data recorded during an associative learning experiment.The dataset consists of local field potential (LFP) time series traces, measuring electrical activity in the brain over the course of the experiment, see [12] and [7] for a more detailed description.During the experiment, a male macaque learns the association between one set of objects (four pictures) and another set of objects (doors located in four different quadrants of the visual field) by means of trial-and-error.In each trial, the macaque was first presented with a picture and then required to select one of the four doors.Each time the macaque made the correct association it was given a reward in the form of a small quantity of juice.Over the course of the associative learning experiment, the electrical activity in the brain of the macaque was measured using local field potentials.Local field potentials measure electrical activity in the brain directly via chronically implanted probes, in contrast to other commonly-used non-invasive recording techniques, such as electroencephalograms (EEGs) and functional magnetic resonance imaging (fMRI).In this analysis, we consider univariate local field potential time series data recorded in the nucleus accumbens (NAc) region, which is a region in the brain that has been demonstrated to be highly implicated in cognitive processes involving memory and reward.After preprocessing of the LFP time series data, there remains a total of 590 (univariate) time series traces of length 2048 sampled at 1000 Hz, thus roughly corresponding to 2 seconds of data.The goal of the analysis is to study trial-population spectral characteristics, and in particular the evolving spectral properties of the time series over the course of the experiment.In Figure 5, we show the recorded LFP time series for three trials (start, middle, and end of the experiment) and their corresponding raw log-periodograms, which show clear common frequency behavior across the three different trials.The left image of Figure 7 shows initial smoothed logperiodograms across all the trials in the experiment, using FDR thresholding with tuning parameter q = 0.001, without taking into account the betweentrial dependence structure.Note that instead of the individually smoothed logperiodograms, we show blockwise-average log-periodograms over 10 adjacent individual trials in order to improve visibility of the image.
In general, the log-periodograms in Figure 7 (left) display very common frequency behavior across trials, however towards the end of the experiment the overall power in the middle-and high-frequency range (ω > 0.1) seems to increase (except for the frequency-band around ω ≈ 0.14), and we also note that power in the very low-frequency range (ω close to zero) fades out after approximately half of the trials.Simply averaging the spectral estimates across all trials will not take into account, nor give us any information, about the dependence structure of the underlying brain dynamics over the course of the experiment.Therefore, we need a model that allows for explicit correlation between trialreplicates in the population.In the data analysis we consider frequency content up to 256 Hz (ω = 0.25), since higher frequency behavior is typically attributed to noise and not physiological behavior in the brain, and we do not want the estimated betweentrial correlation matrix to be dominated by this very high-frequency content (ω > 0.25).In the left image of Figure 6, we show the estimated populationmean log-spectrum h f = (h f (ω 0 ), . . ., h f (ω 511 )) , with in grey several regions of interest for the neuroscientist: the α-band (8-16 Hz), the β-band (16-32 Hz), γ-band (32-100 Hz).The black dotted lines correspond to the variability (square root of the diagonal of the estimate of G f T = W B G T W B ) of the random effects curves in the frequency domain.Note that there are two small dips at 60 Hz and 180 Hz, these are artifacts remaining after the application of two Butterworth band-stop filters in order to filter out the power line frequency around 60 Hz (in North America) and one of its harmonics at 180 Hz.The right image in Figure 6 shows the estimate of the between-trial correlation matrix G S for blocks of 10 adjacent individual trials.Note that the estimated correlation matrix clearly demonstrates the correlation structure between the trials that we observed for the initial smoothed log-periodograms in Figure 7 (left).Trials at the beginning of the experiment are highly correlated and trials at the end of the experiment are highly correlated, and the correlation between trials decays as the lag between trials increases.This suggests that the trial-specific log-spectra evolve over the course of the learning experiment, as already discussed in [7], and it is important to take this behavior into account in the data analysis in order to improve estimation of the population-mean curve and especially prediction of the replicate-specific curves, but also to avoid misleading results in any subsequent inference procedures.The right image of Figure 7 shows the predicted trial-specific log-spectra (again for blocks of 10 adjacent trials).Comparing the two images in Figure 7, we observe that the predicted trial-specific log-spectra perform better in suppressing the overall noise than the individual smoothed log-periodograms, since the functional mixed-effects model pools information across different trials, on the other hand we are still able to capture most of the relevant features present for the individual smoothed log-periodograms.

Conclusion
In the context of spectral analysis for replicated time series, where the focus is on population spectral characteristics rather than the behavior of individual time series, we propose to model replicate-specific log-spectra as random curves based on a nonparametric functional mixed effects approach.We address the specific problem of analyzing spectra that are characterized by localized peaks or troughs by successfully using projection estimators, and in particular nonlinear wavelet thresholding.Here we benefit both from a convenient linear mixed effects structure in the wavelet coefficient domain, and the possibility to constrain the complexity of this nonparametric estimation problem by natural 0 -sparsity constraints.The paradigm of 0 -driven sparsity leads to simple constraints ensuring that the replicate-specific curves and the population-mean curve share the same level of complexity (as in [13]), but also allows to come up with practical near-optimal threshold choices for both fixed-and random effects estimation.The good performance of this smoothing device is also confirmed by our empirical results.
As an additional important ingredient, we introduce a generic correlation model for the population of random effects curves, where both intra-and inter-subject correlation are modeled in a convenient nonparametric way in the wavelet coefficient domain.The importance of including correlated functions in the general setting of functional response regression has already been underlined by [24].The author clearly expresses that the overwhelming majority of existing work cannot model correlation between curves and is hence only suitable for independently sampled functions.This is not realistic for many functional data problems, and can lead to estimators that are statistically inefficient, or even give misleading inferences.As an example of replicated time series data with explicit correlation between different replicates in the population, we analyze empirical brain signal data over the course of an associative learning experiment.There is a clear indication that the spectral behavior of trial-replicated time series evolves over the course of the experiment, and we are able to reproduce a meaningful correlation structure over time series replicates that demonstrates this evolu-tionary behavior over the course of the learning experiment.We note that our fully nonparametric approach, although developed in the framework of spectral analysis for time series data, can equally well be applied in a general functional data analysis context in the presence of correlated random curves, where we benefit from the twofold adaptation properties of wavelets towards sparse and localized structures.
To conclude, we discuss two important directions in which to generalize the proposed model.First, replicated time series spectral analysis of brain data eventually calls for a multivariate treatment in order to reveal dependence structures between different regions in the brain through cross-spectral analysis of different components of the multivariate time series (e.g.multivariate EEG time series data from different regions in the brain).Recent unpublished work by [22] treats the problem of analyzing associations between power spectra of multivariate time series and cross-sectional outcomes by an approach based on a tensor-product spline model, in frequency and outcome, of Cholesky components of outcome-dependent power spectra.However, to the best of our knowledge, no quantitative analysis that embeds replicate-specific spectral matrices into a multivariate functional mixed effects model exists so far, not even for the case of independent replicates.We are currently generalizing our functional mixed effects approach developed for replicated univariate time series to this more challenging setting.Second, there is considerable evidence (see e.g.[31]) that for long EEG recordings, the second-order stationary assumption for the time series is too strong.It is preferable to weaken this assumption and to consider for instance a variance-covariance structure that slowly changes over time.In the context of an individual time series, time-varying spectral analysis is a challenging task since it can lead to estimators with extremely high variance (see e.g.[30]).We expect that the methodology presented here will become very efficient in this context, since it allows for pooling of information across the different time series replicates.
Appendix: Proofs 9. Key components in the proof of Theorem 4.1 We outline the proof of Theorem 4.1 through several key lemmas.The first lemma gives the uniform asymptotic normality result for empirical wavelet coefficients of the log-periodogram ordinates and relates the 2 -risk of ĥλ (Y ) under the sequence model in the wavelet coefficient domain in Section 2.1.2to the 2risk of ĥλ (ξ) under an accompanying Gaussian sequence model.Here, λ ≥ 0 is an arbitrary nonlinear threshold.
Lemma 9.1.Under assumptions (A1) and (A3), uniformly in k ∈ J T,α , for arbitrary 0 < α < 1, This lemma implies that it suffices to derive an 2 -risk upper bound of ĥ(Y ) under the accompanying Gaussian sequence model.In the lemma below, we derive exact expressions of the mean-squared errors of ĥk (ξ k ) with respect to h k under the Gaussian model, which is equivalent to exact normality of the empirical wavelet noise coefficients.Lemma 9.2.Suppose that 1k , . . ., Sk iid ∼ N (0, σ 2 e /T ), such that ξ k ∼ N (h k 1 S , V k ) for each k = 1, . . ., T .An exact expression of the mean squared error of ĥk,λ (ξ k ) with threshold λ ≥ 0 is given by, e /T , and φ(•) denotes the standard normal probability density function.Furthermore, for each k These upper bounds are sharp when either λ The derivations follow from straightforward calculus and the proofs are therefore omitted.The expression for the mean squared error generalizes an expression for the mean squared error of a nonlinear hard threshold estimator in a classical Gaussian sequence model found in [5].Note that we find back the expression in [5] when S = 1 and σ2 k,T = 1.The main result in Theorem 4.1 on the 2 -risk of ĥ(Y ) now follows from combining Lemma 9.1, the upper bounds in Lemma 9.2, and plugging in the threshold λ h,T (see Section 10).9.1.Proof of Lemma 9.1 . ., Sk ) independent noise terms.Then, conditional on the random effects coefficients, we will show that, uniformly in k ∈ J T,α with α > 0, where −∞ < x s ≤ T ν for some ν > 0. In order to prove this asymptotic normality result, we derive upper bounds on the n-th order cumulants (n ≥ 2) of sk = E f s , ψ k T , the empirical wavelet coefficients of the log-periodogram error terms E f s with respect to the wavelet basis function ψ k .Since Var( sk ) = O(T −1 ), the n-th order cumulant of sk (for all s, k) can be written in terms of joint cumulants as, Under (A3), conditional on the random effects coefficients, for each s = 1, . . ., S, X s (t) is a stationary Gaussian process such that [38] it then follows that, conditional on the random effects coefficients, for each s = 1, . . ., S and n ≥ 2: where, n 1 + . . .+ n m = n, and Furthermore, since wavelet basis functions at wavelet scale j are of the order uniformly in ω for some C > 0.
From eq.( 9.1) we obtain, cum n ( sk / Var( sk )) where by orthonomality of the wavelet basis functions ψ k (ω) 2 dω = 1, and where B n denotes the n-th Bell number satisfying B n ≤ n! for all n ∈ N.
By the same arguments as in [29], due to the cumulant bounds in eq.( 9.2) and Lemma 1 in [36], for all s = 1, . . ., S (9.3) ) uniformly in ω by [38], thus for k ∈ J T,α , and since Var( sk ) = O(T −1 ), the standardized bias satisfies b : ). Rewriting eq.( 9.3) gives, Let w.l.o.g.b ≥ 0 and fix some c > 1, then for x s ≤ c On the other hand, for c < x s ≤ ∆ T by a formula for Mill's ratio (see [29]), we conclude from eq.( 9.5) that conditional on the random effects coefficients, for all s = 1, . . ., S for −∞ < x s ≤ ∆ T and uniformly in k ∈ J T,α .Moreover, for any given S, and since conditional on the random effects coefficients the terms Y sk − h u sk are all independent across replicates, this is equivalent to: T I S ).The random effects coefficients are assumed to be jointly multivariate normal, therefore the unconditional version follows as well, In the following part, we relate the mean squared error of ĥk,λ (Y •k ) w.r.t.h k to the mean squared error of ĥk,λ (ξ k ) w.r.t.h k , and show that they are asymptotically equivalent as T → ∞.We split up, According to eq.(9.6) above, there exist C ( ) In the argument below we use that for g : R S → R measurable, if a ∈ R S is such that P (g(X) ≥ g(a)) = 1, then , where we recall that ĥk,λ (•) is the (deterministic) thresholding rule that defines our estimator, and let Note that a exists and is attained for x ∈ B k , since B k is closed and bounded and therefore compact.
It remains to show that the remaining part behaves according to the claimed rates.
By (A1), h ∈ 0,T (k h,T ) and σ 2 u ∈ 0,T (k u,T ) with K u,T ⊆ K h,T .We decompose the first sum on the right-hand side above into three different regions {k / ∈ K h,T }, {k ∈ K h,T \ K u,T }, and {k ∈ K h,T ∩ K u,T }, and upper bound by Lemma 9.2, We observe that λ h,T = σe √ T S 2 log(T /k h,T ) > σe √ T S for T large, since k h,T /T → 0 as T → ∞.Therefore, for T sufficiently large, where in the second step we use z √ T Sλ h,T /σe ≥ 1 and in the third step By plugging eq.( 10.4) into eq.(10.1), we obtain for T sufficiently large Since 0 < µ < ∞ can be chosen arbitrarily large, for µ > 2 the term Proof.First we show that the inclusion K u,T ⊆ K u (Y ) holds with probability tending to 1 as S, T → ∞.
It can be verified that ψ (0) (x) = log(x)+O(x −1 ), therefore | log(2/S)+ψ (0) (S/2)| = o S (1), and the first term on the right-hand side is seen to grow at the rate ∼ log(T ) for S, T increasing.For the second term, we show that sup For k ∈ K u,T , term (ii) and term (iii) uniformly converge to zero in probability as S, T → ∞, which follows from the fact that sup k |h k − ĥk | = o S,T p (1) and For term (i), by linearity of the expectation and recalling that where we use that by assumption sk ∼ (0, σ 2 e /T ).Also, where in the second step we use that sk ⊥ U s k for all s, s , sk ⊥ s k for s = s , and the fact that E[ sk ] = 0 for all s, k.We observe that Var(( For the uniform convergence in probability of log(σ 2 k (Y •k )) over k ∈ K u,T , we show that for any , γ > 0, there exist S 0 , T 0 sufficiently large (depending on only , γ) such that for S > S 0 , T > T 0 and all k ∈ K u,T .For arbitrary > 0, by the law of total probability, Since log(x) has bounded derivative on x ∈ [δ/2, ∞) for δ bounded away from zero, it is uniformly continuous on the domain [δ/2, ∞).Thus, there exists Using that inf k∈K u,T σ 2 k ≥ δ, for this choice of 1 we get, By the uniform convergence in probability of σ2 k (Y •k ), there exist S (1) 0 , T 0 for S > S (1) 0 , T > T 0 and all k ∈ K u,T .Similarly, using again that inf k∈K u,T σ 2 k,T ≥ δ, there exist S for S > S We conclude that the left-hand side inside the probability in eq.(11.1)grows at the rate ∼ log(T ) in probability for increasing S, T .On the other hand, λ u,T = o(log(T )) by assumption.Combining these two results implies that K u,T ⊆ K u (Y ) with probability tending to 1 as S, T → ∞.
Next, we show that the other inclusion K u (Y ) ⊆ K u,T also holds with probability tending to 1 as S, T → ∞, which is equivalent to showing For k / ∈ K u,T , eq.(11.2) term (ii) and term (iii) are o S,T p (T −1 ), which is obtained by combining By continuity of the logarithm, for arbitrary > 0 and fixing x 0 = 1, there exists Since 1 only depends on x 0 = 1, and 1).On the other hand, λ u,T ≥ C for some constant C > 0 by assumption.Combining these two results implies that K u (Y ) ⊆ K u,T with probability tending to 1 as S, T → ∞.To conclude, since both In order to show the uniform convergence in probability of the nonlinear estimators σ2 uk (Y •k ) over k ∈ {1, . . ., T }, it suffices to show that for any , γ > 0, there exist S 0 , T 0 sufficiently large (depending only on , γ) such that for S > S 0 , T > T 0 and all k ∈ {1, . . ., T }.
Second, consider the case k / ∈ K u,T , again by the law of total probability for arbitrary > 0, We note that P (σ 2 uk > | k / ∈ K u,T ) = 0 for all > 0 since σ 2 uk = 0 for k / ∈ K u,T , and again by eq.(11.2) for any γ > 0 there exist S (2) 0 , T Thus for any , γ > 0, 0 , T > T In part (i) below it is shown that term (i) converges to ρ ij in probability as S, T → ∞.In part (ii) it is shown that the terms (ii) and (iii) converge to zero in probability as S, T → ∞.Then, combining these results, and observing that k u,T / ku P → 1 as S, T → ∞ since P ( K u = K u,T ) → 1, an application of Slutsky's lemma yields the claimed result.
Part (i) From the proof in Section 11.2, we know that sup k∈K . Therefore, term (i) in eq.(11.7)can be rewritten as, where we also use that inf k∈K u,T σ 2 uk ≥ δ.We show that 1 Term (ii), (iii), and (iv) converge to zero in probability as S, T → ∞.This is observed from combining sup k ( ĥk − h k ) = o S,T p (1), inf k∈K u,T σ 2 uk > 0, and respectively It remains to show that term (i) in eq.(11.9)converges to ρ ij in probability.By linearity of the expectation, where we use that, Cov(U ik , U jk ) = 0 for all i, j and k = k , U ik ⊥ jk for all i, j and k, k , and ik ⊥ jk for all i = j and k, k .We argue that ik and ik are asymptotically uncorrelated for all i and k = k .In the frequency domain, the noise terms at different frequencies E f i (ω ) and E f i (ω ) for = are asymptotically independent (see [4]), therefore where E f i = (E f i (ω 0 ), . . ., E f i (ω T −1 )) .Projecting to the coefficient domain with orthogonal discrete wavelet transform-matrix W and i = ( i1 , . . ., iT ) , this yields and by the rotational invariance of the Frobenius-norm, from which we conclude that ik and ik are asymptotically uncorrelated for all k = k.Combining the arguments above, it follows that, → 0, as T → ∞ using that k u,T → ∞ as T → ∞.From eq.(11.9),eq.(11.10) and Slutsky's lemma, we conclude that, Since P ( K u = K u,T ) → 1 as S, T → ∞, also P (#{ K u \ K u,T } = 0) → 1, thus the right-hand side above converges to zero as S, T → ∞.
Completely analogous, using that P (#{K u,T \ K u } = 0) → 1 as S, T → ∞, we find that term (iii) in eq.(11.7)converges to zero in probability as S, T → ∞.where the convergence is uniformly over k / ∈ K u,T , since σ 2 uk = 0 implies that T k (ξ k ) is independent of k.
Part (ii) By the same argument as in Section 11.1, it follows that K u,T ⊆ K u (ξ) with probability tending to 1 as S, T → ∞, using that λ u,T = o(log(T )) as in Theorem 4.2.
In order to show K u (ξ) ⊆ K u,T with probability tending to 1 as S, T → ∞, we use a standard argument (see e.g.[19,Chapter 8]).By the uniform weak convergence in eq.( 12 → 0, as S, T → ∞ from which we conclude that also K u (ξ) ⊆ K u,T with probability tending to 1 as S, T → ∞.
First, we derive the asymptotic distribution of term (i).Write , such that Z k ∼ N (0, I S ).Here V −1/2 k is a symmetric matrix square root of V −1 k .We can rewrite, Therefore, since by assumption Γ k 1 / Γ k F → 0, By an application of the Lindeberg-Feller central limit theorem, we conclude that which is also sufficient for the Lindeberg conditions to hold.

2 .
K h (Y ) = {k : | 1 S S s=1 Y sk | ≥ λ h,T } is the estimated set of indices of nonzero coefficients with universal threshold λ h,T = σ 2

Figure 3 .Figure 4 .
Figure 3. Estimates of h f (solid black line) and G S for a single simulation experiment, with ( ĥf 0.01 , ĥf 0.99 )-empirical pointwise quantiles for 1000 repititions of the experiment.

Figure 6 .
Figure 6.Estimates of the population-mean log-spectrum h f (left) and between-trial correlation matrix G S (right).
all k ∈ K u,T .The uniform convergence in probability in eq.(11.3)now follows from the above arguments with S 0 = S
the uniform convergence in probability with k ∈ {1, . . ., T } in accordance with Theorem 2 follows from the above arguments.11.3.Consistency of the estimators of ρ ijProof.We show consistency of the estimators ρij (Y ) marginally for each i, j with i = j.Writing K u = K u (Y ) and ku = | K u (Y )|, we decompose: ρij

Table 2
Empirical coverage (×10 2 ) of confidence regions for h f .