A joint design for functional data with application to scheduling ultrasound scans

A joint design for sampling functional data is proposed to achieve optimal prediction of both individual functional trajectories and a scalar outcome. The motivating application is fetal growth, where the objective is to determine the optimal times to collect ultrasound measurements so as to recover fetal growth trajectories and to predict birth outcomes. The joint design is formulated through an optimization criterion and implemented using estimates from a pilot study. Performance of the proposed design is evaluated via a simulation study and an application to fetal ultrasound data.


Introduction
Functional data analysis has been a popular statistical research area for the last two decades and has found application in many fields such as brain imaging (Jiang et al., 2009; Greven et al., 2010; Reiss and Ogden, 2010; Lindquist, 2012; Lu and Marron, 2014; Park and Staicu, 2015), biosignals (Crainiceanu et al., 2012; Randolph et al., 2012; Goldsmith and Kitago, 2016), genetics (Tang and Müller, 2009; Reimherr and Nicolae, 2014) and wearable computing (Morris et al., 2006; Li et al., 2014; Xiao et al., 2015). For a comprehensive treatment of functional data analysis see Ramsay and Silverman (2002, 2005) and Horváth and Kokoszka (2012). This paper considers sampling design for noisy growth data. The motivation arises from the study of fetal growth, where measurements of fetal size may be obtained during pregnancy using ultrasound. The particular question to be addressed is: given that a fixed number of ultrasound scans will be taken during pregnancy, what are the optimal time points for data collection? Optimality can be defined either in terms of recovering individual fetal growth trajectories or in terms of predicting a birth outcome, such as birth weight. In practice, however, it may be important to predict both individual growth trajectories and birth outcomes, and in such cases a joint optimality criterion must be formulated. We also consider the closely related question of the number of ultrasound scans required to achieve a desired level of optimality.
We address this question within the functional data framework. Design for functional data has received some interest recently. For example, Ferraty et al. (2010) considered a nonparametric model with a scalar response and a functional predictor, and Delaigle et al. (2012) studied a similar problem for classifying and clustering functional data. Both methods are restricted to densely sampled functional data and focus on dimensionality reduction for a dense functional predictor. For spatially correlated functional data, Rasekhi et al. (2014) and Bohorquez et al. (2015) considered the problem of selecting spatial sampling points.
✩ Supplementary materials are available with this article at the Computational Statistics and Data Analysis website.
Design for functional data has also been extended to longitudinal data. Ji and Müller (2017) proposed prediction-based criteria for sampling functional data with the target of either recovering individual functions or predicting a scalar outcome. Wu et al. (2017) exploited the mixed effects model representation of functional data and proposed a design criterion based on Fisher's information matrix of the eigenvalues of the covariance function. There are several limitations to these approaches. Wu et al. (2017) focused only on recovering individual functions, while Ji and Müller (2017) studied the two design goals separately and did not consider a joint design, which is the focus of our data application. In addition, in these works the number of design points was pre-fixed and no data-driven method for choosing it was developed. Finally, Ji and Müller (2017) did not compare functional data models with parametric mixed effects models for prediction-based designs. Our work addresses these gaps.
Following early work on design, such as Ylvisaker (1987) and the references therein and recent work by Ji and Müller (2017), we consider prediction-based designs and propose a unified design criterion for both recovering individual functions as well as predicting scalar outcomes from a functional predictor. We also propose a practical data-driven method for selecting the number of design points, building on the result that the larger the number of design points, the better the prediction will be (see Theorem 1). Finally we conduct a comprehensive simulation study to evaluate the performance of functional data models as compared to parametric mixed effects models, and demonstrate numerically that functional data models might be preferred over parametric mixed effects models for prediction-based optimal designs for longitudinal data.
The rest of the paper is organized as follows. In Section 2 we introduce functional data models and propose a unified prediction-based design criterion for sampling functional data. In Section 3 we study the theoretical properties of the proposed design. In Section 4 we discuss implementation of the design and propose a data-driven method for selecting the number of design points. In Section 5 we illustrate the proposed method using fetal ultrasound data. In Section 6, we investigate the performance of the design via simulation studies.

Optimal design for functional data
In this section, we first describe functional data models and then formulate two optimal design problems for sampling functional data: one design targets accurate prediction of individual functions while the other targets accurate prediction of a scalar outcome. Then, we propose a unified design criterion that targets both recovering individual functions and predicting a scalar outcome. In particular, the unified design contains the previous two designs as special cases.

Statistical models
Consider a random function X(t), t ∈ T, defined over a continuous and compact time domain T. Suppose that X(·) is a Gaussian process with mean function µ(t) = E{X(t)} and covariance function r(s, t) = Cov{X(s), X(t)}. We assume that X(·) is square integrable on T and without loss of generality we let T = [0, 1].
In practice, X(·) is observed at a finite number of time points and contaminated with noise. Hence, for a random function X_i(·) with subject index i observed at p time points (t_1, . . . , t_p)′ ∈ T^p, the observations are

W_ij = X_i(t_j) + ϵ_ij, j = 1, . . . , p,   (1)

where the ϵ_ij are i.i.d. N(0, σ_ϵ²) and independent of X_i(·). Let Y be a scalar outcome with functional predictor X(·), and consider the functional linear model

Y = α + ∫_T β(t) X̃(t) dt + e,   (2)

where α is an intercept, X̃(t) = X(t) − µ(t), β(t) is a smooth coefficient function, and e is white noise independent of X(·) with mean zero and variance σ_e². The fundamental element in functional data analysis is the covariance function r(s, t). By Mercer's theorem, r(s, t) can be written as ∑_{ℓ=1}^∞ λ_ℓ φ_ℓ(s) φ_ℓ(t), where λ_1 ≥ λ_2 ≥ · · · ≥ 0 is the collection of eigenvalues and the φ_ℓ(·) are the associated eigenfunctions, which satisfy ∫_T φ_k(t) φ_ℓ(t) dt = 1{k = ℓ}. Here 1{·} is 1 if the condition inside the brackets holds and 0 otherwise. To ensure that β(t) is identifiable, we assume that the coefficient function β(t) can be written as ∑_{ℓ=1}^K β_ℓ φ_ℓ(t), where the β_ℓ are scalars and the possibly infinite K is the number of non-zero eigenvalues.
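As a concrete illustration, the two models can be simulated in a few lines. The sketch below is purely hypothetical: the cosine basis, the eigenvalues `lam`, the coefficients `beta_coef` and all variance parameters are illustrative choices, not quantities from the paper. Because the basis is orthonormal, the integral in model (2) reduces to a dot product of the scores with the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(t, K=3):
    """First K orthonormal cosine basis functions on [0, 1] (illustrative eigenfunctions)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for l in range(1, K):
        cols.append(np.sqrt(2.0) * np.cos(np.pi * l * t))
    return np.column_stack(cols)

lam = np.array([4.0, 2.0, 1.0])           # eigenvalues lambda_1 >= lambda_2 >= lambda_3 > 0
beta_coef = np.array([1.0, -0.5, 0.25])   # hypothetical basis coefficients of beta(t)
sigma_eps, sigma_e, alpha = 0.5, 1.0, 2.0

def simulate_subject(t_obs):
    """One draw from models (1)-(2): noisy curve values W and scalar outcome Y (mu = 0)."""
    xi = rng.normal(0.0, np.sqrt(lam))                              # FPC scores, Var(xi_l) = lambda_l
    W = phi(t_obs) @ xi + rng.normal(0.0, sigma_eps, len(t_obs))    # W_ij = X_i(t_ij) + eps_ij
    Y = alpha + beta_coef @ xi + rng.normal(0.0, sigma_e)           # int beta(t) Xtilde(t) dt = sum_l beta_l xi_l
    return W, Y

W, Y = simulate_subject(np.linspace(0.0, 1.0, 5))
```

The same truncated representation underlies all design computations in the following sections.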

Optimal design for predicting functions
Fix p ≥ 1 and assume that p observations will be collected from a new subject. The goal is to select the p optimal sampling points in T for predicting the new subject's curve with the smallest possible error.
Let t = (t_1, . . . , t_p)′ ∈ T^p be the vector of sampling points and W_{i*}(t) = {W_{i*}(t_1), . . . , W_{i*}(t_p)}′ the noisy observations for a new subject i*. Under model (1), the best predictor of X_{i*}(t) conditional on W_{i*}(t) is the best linear unbiased predictor (BLUP) of X_{i*}(t),

X̂_{i*}(t) = µ(t) + r(t, t)′ Σ_W(t)^{−1} {W_{i*}(t) − µ(t)},

where r(t, t) = {r(t, t_1), . . . , r(t, t_p)}′ ∈ R^p and Σ_W(t) = Cov{W_{i*}(t)} ∈ R^{p×p}. For simplicity, we suppress the argument t from Σ_W(t) and write Σ_W. The optimal sampling points t can be selected by minimizing the mean integrated squared error of the BLUP,

M1(t) = E[∫_T {X̂_{i*}(t) − X_{i*}(t)}² dt].   (5)

The optimal design is then defined as t_opt := arg min_{t∈T^p} M1(t), and M1(t) simplifies to

M1(t) = ∫_T r(t, t) dt − tr(Σ_W^{−1} R),

where tr(·) is the trace operator and R = ∫_T r(t, t) r(t, t)′ dt ∈ R^{p×p}, whose (j_1, j_2) element is ∫_T r(t, t_{j_1}) r(t, t_{j_2}) dt.
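Under a rank-K covariance r(s, t) = ∑ λ_ℓ φ_ℓ(s)φ_ℓ(t), the quantities in M1(t) reduce to matrix algebra: Σ_W(t) = Φ(t)ΛΦ(t)′ + σ_ϵ² I, ∫ r(t, t) dt = tr(Λ) and R = Φ(t)Λ²Φ(t)′, where Φ(t) holds the eigenfunctions evaluated at the design points. The sketch below evaluates M1 and runs a full search over an equally spaced candidate grid; the basis, eigenvalues and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np
from itertools import combinations

def phi(t, K=3):
    """Illustrative orthonormal cosine eigenfunctions on [0, 1]."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t)] +
                           [np.sqrt(2.0) * np.cos(np.pi * l * t) for l in range(1, K)])

lam = np.array([4.0, 2.0, 1.0])   # illustrative eigenvalues
sigma_eps = 0.5                   # illustrative noise SD

def M1(t_design):
    """Mean integrated squared error (5) of the BLUP for the design t_design."""
    Phi = phi(np.asarray(t_design))                                        # p x K
    Sigma_W = Phi @ np.diag(lam) @ Phi.T + sigma_eps**2 * np.eye(Phi.shape[0])
    R = Phi @ np.diag(lam**2) @ Phi.T       # (j1, j2) entry: int r(t, t_j1) r(t, t_j2) dt
    return np.sum(lam) - np.trace(np.linalg.solve(Sigma_W, R))

cand = np.linspace(0.0, 1.0, 21)            # candidate grid s
t_opt1 = min(((t,) for t in cand), key=M1)  # full search, p = 1
t_opt2 = min(combinations(cand, 2), key=M1) # full search, p = 2
```

In this toy setting the best two-point design always achieves a smaller M1 than the best single point, consistent with the monotonicity result in Section 3.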

Optimal design for predicting an outcome
As in Section 2.2, assume that p observations will be collected from a new subject indexed by i*, but let the goal now be to select the p optimal sampling points in T for predicting the new subject's scalar outcome with the smallest possible error.
Using the same notation as in Section 2.2, let t = (t_1, . . . , t_p)′ ∈ T^p be the vector of sampling points and W_{i*}(t) = {W_{i*}(t_1), . . . , W_{i*}(t_p)}′ the noisy observations for subject i*. Under the functional linear model (2), E(Y_{i*} | X_{i*}) = α + ∫_T β(t) X̃_{i*}(t) dt. Then, under the functional data model (1), the best predictor of Y_{i*} conditional on W_{i*}(t) is the best linear unbiased predictor of Y_{i*},

Ŷ_{i*} = α + u(t)′ Σ_W^{−1} {W_{i*}(t) − µ(t)},  u(t) = ∫_T β(t) r(t, t) dt ∈ R^p,

and the mean squared error for predicting E(Y_{i*} | X_{i*}) is

M2(t) = ∫_T ∫_T β(s) r(s, t) β(t) ds dt − u(t)′ Σ_W^{−1} u(t).   (6)

Then the optimal design is t_opt := arg min_{t∈T^p} M2(t). Note that the mean squared error for predicting Y_{i*} itself is M2(t) + σ_e², which results in the same design. This design was studied in earlier work, including Ritter (1996) and, more recently, Ji and Müller (2017).
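With β(t) = ∑ β_ℓ φ_ℓ(t) and a rank-K covariance, the two terms of M2(t) become β′Λβ and u(t)′Σ_W^{−1}u(t) with u(t) = Φ(t)Λβ. A minimal sketch, with the same illustrative basis, eigenvalues and noise level as before and a hypothetical coefficient vector:

```python
import numpy as np
from itertools import combinations

def phi(t, K=3):
    """Illustrative orthonormal cosine eigenfunctions on [0, 1]."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t)] +
                           [np.sqrt(2.0) * np.cos(np.pi * l * t) for l in range(1, K)])

lam = np.array([4.0, 2.0, 1.0])           # illustrative eigenvalues
beta_coef = np.array([1.0, -0.5, 0.25])   # hypothetical coefficients of beta(t)
sigma_eps = 0.5

def M2(t_design):
    """Mean squared error (6) for predicting E(Y | X) from observations at t_design."""
    Phi = phi(np.asarray(t_design))
    Sigma_W = Phi @ np.diag(lam) @ Phi.T + sigma_eps**2 * np.eye(Phi.shape[0])
    u = Phi @ (lam * beta_coef)           # u_j = int beta(t) r(t, t_j) dt
    return beta_coef @ (lam * beta_coef) - u @ np.linalg.solve(Sigma_W, u)

cand = np.linspace(0.0, 1.0, 21)
t_out = min(combinations(cand, 2), key=M2)   # best two-point design for the outcome
```

Note that M2(∅), with no observations, equals β′Λβ, the total variation of the response explained by the predictor.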

A joint design for functional data
In practice, a design may have multiple goals, each resulting in its own optimal design; a design that is optimal for one goal need not be optimal for the others. Indeed, the optimal sampling points for predicting functions may not be the optimal sampling points for predicting a scalar outcome, and vice versa. It may thus be useful to consider a joint design that balances the different goals. Note that joint designs are also referred to as compound designs in the statistical design literature (Atkinson et al., 2007, Chapter 21).
Before formulating a joint design, consider first the design objective function

M(t; B) = tr(BΛ) − tr{B Λ Φ(t)′ Σ_W^{−1} Φ(t) Λ},   (7)

where Λ = diag(λ_1, λ_2, . . .), Φ(t) is the matrix with (j, ℓ) element φ_ℓ(t_j), and B is an arbitrary positive semidefinite matrix, which we call a ''linear design criterion matrix'' because the objective function depends linearly on the elements of B. The form in (7) is general, with different B leading to different designs. In particular, it includes as special cases the objective functions for the design for predicting functions (5) and for the design for predicting an outcome (6): for predicting the growth curve of a new subject, B is the identity matrix, while for predicting a scalar outcome of a new subject, B = ββ′ with β = (β_1, . . . , β_K)′. Additionally, if it is more important to predict a curve accurately at some time points than at others, one may consider a weighted mean integrated squared error E[∫_T w(t){X̂_{i*}(t) − X_{i*}(t)}² dt] for a non-negative weight function w(·). It can be shown that the objective function to be minimized still takes the form in (7) with a particular design criterion matrix; specifically, B has (k, ℓ) element ∫_T w(t) φ_k(t) φ_ℓ(t) dt, with a finite operator norm by Lemma 1 in Appendix A.

Now consider a bivariate continuous function f(·, ·) and define the joint objective function M(t) = f(M1(t), M2(t)). Suppose the joint design is to minimize M(t). It is reasonable to impose the following assumption on f(·, ·):

Assumption 1. f(x, y) is nondecreasing in both x and y, and f(0, 0) = 0. Moreover, lim_{x→0, y→0} f(x, y) = 0.
Let w 1 and w 2 be two fixed non-negative constants. Two sensible forms of f are: f 1 (x, y) = w 1 x + w 2 y and f 2 (x, y) = max(w 1 x, w 2 y). The former is a joint design that minimizes a linear combination of two prediction errors while the latter means that the joint design aims to minimize the maximum of the two prediction errors (up to multiplicative weights).
It is straightforward to show that both forms satisfy Assumption 1. The two constants w_1 and w_2 control the weights of the two design objective functions. One reasonable choice of w_1 and w_2 is to balance the two objective functions so that one design does not dominate the other. In view of (7) and Theorem 2 in the following section, we may let w_1 = 1/tr(Λ) and w_2 = 1/tr(ββ′Λ), and it can then be shown that 0 ≤ w_1 M1(t) ≤ 1 and 0 ≤ w_2 M2(t) ≤ 1.
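Both joint criteria can be evaluated with the same machinery as the individual designs. The sketch below implements f_1 (weighted sum) and f_2 (weighted maximum) with the balancing weights w_1 = 1/tr(Λ) and w_2 = 1/tr(ββ′Λ); the basis, eigenvalues, coefficients and noise level are illustrative assumptions carried over from the earlier sketches.

```python
import numpy as np
from itertools import combinations

def phi(t, K=3):
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t)] +
                           [np.sqrt(2.0) * np.cos(np.pi * l * t) for l in range(1, K)])

lam = np.array([4.0, 2.0, 1.0])           # illustrative eigenvalues
beta_coef = np.array([1.0, -0.5, 0.25])   # hypothetical coefficients of beta(t)
sigma_eps = 0.5

def _design_matrices(t_design):
    Phi = phi(np.asarray(t_design))
    return Phi, Phi @ np.diag(lam) @ Phi.T + sigma_eps**2 * np.eye(Phi.shape[0])

def M1(t_design):
    Phi, Sigma_W = _design_matrices(t_design)
    return np.sum(lam) - np.trace(np.linalg.solve(Sigma_W, Phi @ np.diag(lam**2) @ Phi.T))

def M2(t_design):
    Phi, Sigma_W = _design_matrices(t_design)
    u = Phi @ (lam * beta_coef)
    return beta_coef @ (lam * beta_coef) - u @ np.linalg.solve(Sigma_W, u)

# Balancing weights: w1 = 1/tr(Lambda), w2 = 1/tr(beta beta' Lambda).
w1 = 1.0 / np.sum(lam)
w2 = 1.0 / np.sum(lam * beta_coef**2)

def M_linear(t):   # f1(x, y) = w1*x + w2*y
    return w1 * M1(t) + w2 * M2(t)

def M_minimax(t):  # f2(x, y) = max(w1*x, w2*y)
    return max(w1 * M1(t), w2 * M2(t))

cand = np.linspace(0.0, 1.0, 21)
t_lin = min(combinations(cand, 2), key=M_linear)
t_mm = min(combinations(cand, 2), key=M_minimax)
```

With these weights both normalized criteria lie in [0, 1], so neither design goal dominates the joint objective.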

Properties of M(t)
In this section we study the properties of M(t) for any function f(·, ·) that satisfies Assumption 1. We assume that the random function X(t) is square integrable (i.e., E{∫_T X(t)² dt} < ∞) and that the coefficient function β(t) in the functional linear regression model (2) is also square integrable (i.e., ∫_T β(t)² dt < ∞). Proofs of the theorems are provided in Appendix A.
Theorem 1 implies that more observations (i.e., larger p) do not increase the value of the objective function M(t).
We also study the deterministic bound of M(t) as p diverges to infinity according to a fixed design.

Theorem 2. Suppose that the assumptions stated in Appendix
Theorem 2 provides the rationale for using a dense set of time points in T as the candidate sampling points. In practice, because of the cost of data collection and other considerations, a small number of sampling points with reasonable prediction power might be preferred. In Section 4.2, we propose a data-driven method for selecting the number of optimal time points.

Model estimation using pilot data
To implement the proposed optimal design, we need to estimate the covariance function r(s, t), the error variance σ_ϵ² and the coefficient function β(t) using pilot data. Many methods exist for covariance function estimation, including local polynomial regression (Yao et al., 2005), mixed effects models (James et al., 2000) and geometric PCA (Peng and Paul, 2009). We use the fast covariance estimation method (FACEs) of Xiao et al. (2017), which approximates the true covariance function by a penalized tensor product of cubic B-splines; the error variance σ_ϵ² can also be estimated by FACEs. For estimating β(t), we select K, the number of eigenfunctions, by the percentage of variance explained (PVE) with a threshold of 0.95.

Optimization algorithm and selection of number of design points
In practice, the optimal sampling points are selected from a pre-determined set of candidate time points, denoted by s. Theorem 2 suggests that equally spaced sampling points form a reasonable set of candidates. If the number of selected design points is small, we use a full search algorithm (i.e., we evaluate M for every combination of p points from s). If the number of selected design points is large, a full search becomes computationally difficult, and one may instead use the Monte Carlo sampling method of Wu et al. (2017) or the sequential search method of Ji and Müller (2017). In this paper we focus on the full search algorithm.
In many applications, the number of optimal time points p may not be known a priori. One approach is to choose the smallest p such that the expected error is smaller than some pre-determined tolerance. Alternatively, similar to Ferraty et al. (2010), one may take the cost of collecting more sampling points into consideration. Here we propose a new method for selecting p. First, when t = ∅, an empty set, we define M1(∅) = tr(Λ), M2(∅) = tr(ββ′Λ) and M(∅) = f(M1(∅), M2(∅)). Note that M1(∅) is the total variation of the functional predictor, while M2(∅) is the total variation of the response that can be explained by the functional predictor. It can then be easily verified by the definitions in (5) and (6) and the assumptions on f that M(t) ≤ M(∅) for any t of any dimension. Let t⋆_p = arg min_{t⊆s, t∈T^p} M(t). Then we select

p⋆ = min{p ≥ 1 : M(t⋆_p) − M(t⋆_{p+1}) ≤ δ M(∅)},   (8)

where 0 < δ < 1 is a fixed constant corresponding to the maximum percent reduction in expected squared error gained by augmenting the design with an additional design point; replacing M(·) by its estimate M̂(·) gives the data-driven selector p̂ in (9). By Theorem 1 in Section 3, M(t⋆_{p+1}) ≤ M(t⋆_p) for any p, so the reduction in (8) is always non-negative. Small values of δ seem preferable, and we use δ = 0.05 in both the data application and the simulations. This means we select a p such that the addition of a new design point results in no more than a 5% reduction in expected squared error, relative to the error reduction achieved by the fully observed functional predictor.
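The selection rule can be sketched directly on top of a design criterion. The code below assumes the stopping rule reads: pick the smallest p for which adding one more design point reduces the criterion by at most δ·M(∅); it uses the illustrative M1 criterion from the earlier sketches, so M(∅) = tr(Λ), and all numerical values are toy assumptions.

```python
import numpy as np
from itertools import combinations

def phi(t, K=3):
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t)] +
                           [np.sqrt(2.0) * np.cos(np.pi * l * t) for l in range(1, K)])

lam = np.array([4.0, 2.0, 1.0])
sigma_eps = 0.5

def M1(t_design):
    Phi = phi(np.asarray(t_design))
    Sigma_W = Phi @ np.diag(lam) @ Phi.T + sigma_eps**2 * np.eye(Phi.shape[0])
    return np.sum(lam) - np.trace(np.linalg.solve(Sigma_W, Phi @ np.diag(lam**2) @ Phi.T))

def select_p(M, M_empty, cand, delta=0.05, p_max=5):
    """Smallest p for which one more design point cuts the error by at most delta * M(empty)."""
    best = [min(M(c) for c in combinations(cand, p)) for p in range(1, p_max + 1)]
    for p in range(1, p_max):
        if best[p - 1] - best[p] <= delta * M_empty:  # reduction gained by the (p+1)-th point
            return p
    return p_max

cand = np.linspace(0.0, 1.0, 11)
p_hat = select_p(M1, np.sum(lam), cand)   # M1(empty set) = tr(Lambda)
```

The nested full search makes this feasible only for small candidate grids, matching the paper's focus on the full search algorithm.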

Software and shiny interactive graphic
The proposed optimal design method has been implemented in an R (R Core Team, 2016) package FDAdesign, which includes interactive graphics built with shiny (Chang et al., 2016) that can be used to evaluate the design objectives corresponding to different sampling designs. The interface is illustrated in the data application. Details about using the FDAdesign package and the interactive graphics can be found in Section S.1 of the Supplementary materials.

Application to fetal ultrasound
We apply the proposed methodology to fetal growth data, where ultrasound scans were performed at different weeks of gestational age (GA). For this analysis, we model ultrasound measurements of abdominal circumference (scaled to the range 0 to 1) to estimate individual fetal growth trajectories, and we use newborn birth weight as the scalar outcome.
The fetal growth dataset contains between 1 and 6 ultrasound scans for each of 2388 subjects, with most subjects having 5 scans. The spaghetti plot for the ultrasound measurements is shown in Fig. 1, with data from 3 subjects highlighted. While the 3 subjects do show some degree of curvilinearity, the overall pattern of trajectories raises the question of whether a linear mixed effects model would suffice for this data.
Thus, we compare the functional model and the linear mixed effects model using 10-fold cross-validation, and find that the linear mixed effects model has twice the prediction error of the functional model. Figure S.2 in the Supplementary materials illustrates the prediction performance of the two models for one particular case, where 90% of the data are used for model estimation and the remaining 10% for evaluation. The functional model therefore seems more appropriate for this application. When we predict birth weight using the functional linear model (2) with abdominal circumference as the functional covariate, about 99% of the variation in birth weight is explained by the functional covariate. The estimated coefficient function is shown in Figure S.3 of the Supplementary materials.
Finally, we consider a linear joint design with the target of accurately recovering the ultrasound measurements of fetal abdominal circumference and predicting the newborn outcome of birth weight. The objective function for the joint design is w_1 M̂1(t) + w_2 M̂2(t), where M̂1(t) is the estimated objective function for recovering individual functions and M̂2(t) is the estimated objective function for predicting a scalar outcome; see Section 2.4 for more details. To balance the two objective functions, we let w_1 = (∑_{ℓ=1}^3 λ̂_ℓ)^{−1} and w_2 = (∑_{ℓ=1}^3 λ̂_ℓ β̂_ℓ²)^{−1}, where the λ̂_ℓ and β̂_ℓ are estimated from the fetal data and the top 3 eigenvalues are selected using a PVE of 0.95. These weights ensure that 0 ≤ w_1 M̂1(t) ≤ 1 and 0 ≤ w_2 M̂2(t) ≤ 1, so that one objective function does not dominate the other. We let the set of candidate time points s be the collection of half weeks between 13 and 41 weeks of gestational age. Using the proposed method, we determine the optimal sampling points when the number of sampling points p is fixed at 1, 2 and 3, and calculate the relative error M̂(t)/M̂(∅); Fig. 3 displays the results. The top left panel of Fig. 3 shows that if only 1 sampling point is selected, then 37 weeks is the optimal time for collecting the ultrasound measurement, with a relative error of about 0.20 (bottom right panel of Fig. 3). If 2 sampling points are desired, then 32 and 38 weeks are the optimal times, with a relative error of 0.13, smaller than that with only 1 optimal sampling point. The bottom right panel of Fig. 3 displays the relative error for several values of p; as expected, the relative error decreases as p increases. Using the selection criterion (9) with δ = 0.05, we determine that, optimally, 2 sampling points would be selected.
To evaluate the uncertainty in the estimated optimal sampling points, we bootstrap the fetal ultrasound data at the subject level and select the optimal sampling points for 1000 bootstrapped datasets. Fig. 4 gives the histograms of the selected optimal sampling points, which show small variability in the estimated optimal sampling points. For example, for p = 1, week 37 is selected about 60% of the time.
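The subject-level bootstrap can be sketched as follows. This is a toy stand-in, not the fetal data: pilot information is reduced to a hypothetical matrix of FPC scores, each replicate resamples whole subjects, re-estimates the eigenvalues and re-selects the optimal single sampling point under the illustrative M1 criterion.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(t, K=3):
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t)] +
                           [np.sqrt(2.0) * np.cos(np.pi * l * t) for l in range(1, K)])

sigma_eps = 0.5
cand = np.linspace(0.0, 1.0, 21)

def M1(t_design, lam_hat):
    Phi = phi(np.asarray(t_design))
    Sigma_W = Phi @ np.diag(lam_hat) @ Phi.T + sigma_eps**2 * np.eye(Phi.shape[0])
    return np.sum(lam_hat) - np.trace(np.linalg.solve(Sigma_W, Phi @ np.diag(lam_hat**2) @ Phi.T))

# Toy "pilot study": FPC scores for 200 subjects with true variances (4, 2, 1).
scores = rng.normal(0.0, np.sqrt([4.0, 2.0, 1.0]), size=(200, 3))

picks = []
for _ in range(100):                                          # 1000 replicates in the paper
    boot = scores[rng.integers(0, 200, size=200)]             # resample whole subjects
    lam_hat = boot.var(axis=0)                                # re-estimate the eigenvalues
    picks.append(min(cand, key=lambda t: M1((t,), lam_hat)))  # re-select the optimal point
picks = np.array(picks)
```

A histogram of `picks` is the toy analogue of Fig. 4: its spread reflects the design's sensitivity to estimation error in the pilot data.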
Finally, we show in Fig. 5 screenshots of the Shiny interface for the fetal ultrasound data. The top panel displays the heat map of the objective function/prediction errors as a bivariate function of two scan weeks, with the optimal weeks highlighted. The heat map indicates that at least one sampling point needs to be no earlier than 33 weeks in order to obtain a relatively small prediction error. As these plots evaluate the prediction error of any combination of candidate sampling points, they can be used to find all candidate sampling points that give a prediction error smaller than a given threshold. The interface is interactive: users can select the first scan week, and the application will find the optimal second scan week. Users can go further by also selecting the second scan week and comparing the results with the optimal scan weeks. For example, as illustrated in the bottom panel, if 13 weeks is selected for the first scan, then 37 weeks is found to be the optimal second scan week (left plot in the bottom panel). If 16 weeks is also selected for the second scan, the result can be compared with several different choices, including the optimal scan weeks (right plot in the bottom panel). A similar screenshot with the goal of selecting just one scan is presented in Figure S.4 of the Supplementary materials.
Fig. 4. Histograms of selected optimal scan weeks from 1000 bootstrapped datasets for p = 1 and p = 2. The blue dashed lines are the estimated optimal scan weeks using the original fetal ultrasound data.

A simulation study
We conduct a simulation study to investigate the performance of the proposed design for (a) estimating the optimal sampling points and (b) selecting the number of optimal sampling points. We also compare functional data models against a parametric mixed effects model in terms of estimating optimal sampling points, when data are generated from either a functional data model or a parametric mixed effects model. We focus on the linear joint design, where the goal is to best predict both an underlying true curve and a scalar outcome, and we use the same design criterion matrix as in the data example, with weights w_1 = (∑_ℓ λ_ℓ)^{−1} and w_2 = (∑_ℓ λ_ℓ β_ℓ²)^{−1}.

Simulation settings
For each simulation scenario, we use 200 Monte Carlo samples from the model in (1). For simplicity, we let the mean function µ(t) be zero for all t. We generate X_i(t) by X_i(t) = ∑_{ℓ=1}^5 ξ_iℓ φ_ℓ(t), where {φ_1(t), . . . , φ_5(t)} is a set of orthonormal eigenfunctions (to be specified later) and ξ_iℓ is sampled from a normal distribution with mean zero and variance λ_ℓ = 10/2^ℓ. Random errors ϵ_ij are sampled independently from a normal distribution with mean zero and variance σ_ϵ² = 9.6875, so that the signal-to-noise ratio (∑_{ℓ=1}^5 λ_ℓ)/σ_ϵ² equals one. The number of observations per subject varies across subjects, and the sampling time points are drawn from the uniform distribution on the unit interval. We consider a factorial design with three experimental factors:
Note that in FLM-Case1 the coefficient function β(t) depends on the eigenfunctions φ ℓ (t) and is different for the periodic and non-periodic covariances (see Figure S.7 of the Supplementary materials). Random errors in (2) were sampled independently from a normal distribution with mean zero and variance σ 2 e = 4.
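The variance settings above can be checked directly: with λ_ℓ = 10/2^ℓ for ℓ = 1, . . . , 5, the eigenvalues sum to exactly 9.6875, so setting σ_ϵ² = 9.6875 gives a signal-to-noise ratio of one.

```python
import numpy as np

lam = 10.0 / 2.0 ** np.arange(1, 6)  # lambda_l = 10 / 2^l for l = 1, ..., 5
sigma_eps2 = 9.6875                  # error variance sigma_eps^2 from the simulation settings
snr = lam.sum() / sigma_eps2         # signal-to-noise ratio
```

Here `lam` is (5, 2.5, 1.25, 0.625, 0.3125), whose sum is 9.6875, so `snr` equals 1 exactly.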
(a) Heat map of the objective function M̂(t) evaluated with two scans. (b) Objective function M̂(t) evaluated with two scans, with one scan fixed at 16 weeks.

Results for estimation of optimal sampling points
We consider estimation of the optimal sampling points when the number of optimal points p is fixed at 3, 4 or 5. Let t*_p be the p optimal sampling points that minimize the true objective function M(·) and let t̂_p be the p selected sampling points that minimize the estimated objective function M̂(·). We evaluate the accuracy of the estimated optimal sampling points using the absolute relative error

ARE_{p,i_sim} = |M(t̂_{p,i_sim}) − M(t*_p)| / M(t*_p),   (10)

which measures how close the expected (integrated) squared error using observations collected at the p estimated optimal sampling points is to the expected (integrated) squared error using the p true optimal points. We compare M(t*_p) and M(t̂_{p,i_sim}), rather than t̂_p and t*_p, for the following reasons. First, when the covariance function r(s, t) is periodic, as the one shown in the top left panel of Figure S.6 of the Supplementary materials, t*_p is not identifiable: with a periodic covariance function, data (excluding random errors) collected at any sampling point in the left half of the domain are the same as data collected at one sampling point in the right half. The identifiability issue is illustrated in Section S.3.2 of the Supplementary materials. Second, as our ultimate goal is to minimize M(·), the expected (integrated) squared error, we consider the measure ARE_{p,i_sim} more appropriate.
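A minimal sketch of the ARE computation, assuming the ratio form |M(t̂_p) − M(t*_p)|/M(t*_p) for the criterion, with the illustrative M1 objective standing in for M(·) and a hypothetical estimated design:

```python
import numpy as np
from itertools import combinations

def phi(t, K=3):
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t)] +
                           [np.sqrt(2.0) * np.cos(np.pi * l * t) for l in range(1, K)])

lam = np.array([4.0, 2.0, 1.0])
sigma_eps = 0.5

def M1(t_design):
    Phi = phi(np.asarray(t_design))
    Sigma_W = Phi @ np.diag(lam) @ Phi.T + sigma_eps**2 * np.eye(Phi.shape[0])
    return np.sum(lam) - np.trace(np.linalg.solve(Sigma_W, Phi @ np.diag(lam**2) @ Phi.T))

def are(M, t_hat, t_star):
    """Absolute relative error of the selected design t_hat against the optimum t_star."""
    return abs(M(t_hat) - M(t_star)) / M(t_star)

cand = np.linspace(0.0, 1.0, 21)
t_star = min(combinations(cand, 2), key=M1)  # "true" optimum on the candidate grid
t_hat = (0.25, 0.75)                         # a hypothetical estimated design
err = are(M1, t_hat, t_star)
```

Comparing criterion values rather than the points themselves is exactly what makes the measure usable when t*_p is not identifiable.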
In addition to functional data methods, we consider the following linear mixed effects (LME) model for estimating the covariance function r(s, t): W_ij = b_i0 + b_i1 t_ij + ϵ_ij, where b_i0 and b_i1 are subject-specific random intercept and slope, respectively. This model leads to a quadratic covariance function. In the following tables we use the labels non-parametric and parametric to indicate covariance estimation using the functional data model and using the linear mixed effects model, respectively.

Table 1. Median of absolute relative errors {ARE_{p,i_sim} : i_sim = 1, . . . , 200} and the corresponding interquartile ranges (IQR) in parentheses for the case of the periodic covariance, for Joint-Case1, Joint-Case2 and Joint-Case3. Note: Joint-Case1 indicates that the scalar responses are generated using β(t) in FLM-Case1; similarly, Joint-Case2 corresponds to FLM-Case2 and Joint-Case3 to FLM-Case3. Non-parametric and parametric refer to covariance estimation using the fPCA and LME models, respectively.
The results with the periodic covariance function are summarized in Table 1. With the non-parametric covariance estimation, the absolute relative error decreases as the number of optimal sampling points increases. The same holds when we use the parametric covariance estimation, i.e., estimating r(s, t) by the LME model. However, because the LME model is misspecified for functional data, selecting more optimal points under the parametric estimation improves the prediction accuracy only slightly. In all cases the proposed method with the non-parametric covariance estimation gives a smaller prediction error than with the parametric estimation. When data are generated from the LME model, the proposed method performs equally well with both the non-parametric and parametric covariance estimation; see Section S.3.4 of the Supplementary materials. In conclusion, the proposed method with non-parametric covariance estimation using the fPCA model performs well on data with both simple and complex covariance structures.

Results for selection of number of optimal sampling points
Now we evaluate the performance of the proposed method in (9) for selecting the number of optimal sampling points p.
We use δ = 0.05, and the true number of optimal points p* determined by (8) is 3. The performance of the proposed method is assessed by the proportion of simulations selecting the correct number of optimal sampling points, (1/200) ∑_{i_sim=1}^{200} 1{p̂_{i_sim} = p*}, where p̂_{i_sim} is the number of optimal sampling points determined by (9) using the i_sim-th simulated dataset.
The simulation results are presented in Table 2. The performance of the proposed method is excellent in all cases. The results for the non-periodic covariance are similarly good and are presented in Table S.4 of the Supplementary materials.

Uncertainty of estimated optimal sampling points
To assess the uncertainty of the optimal sampling points estimated from the proposed method, we use a bootstrap approach as in the data application. For each simulated dataset, we bootstrap at the subject level, select optimal sampling points from the model estimated on the bootstrapped data, and calculate the third quartile and the 90th percentile of the absolute relative errors in (10). The medians of these percentiles are presented in Tables S.7 and S.8 of the Supplementary materials. The results show good stability of the estimated optimal sampling points, which improves when either the sample size or the number of observations per subject increases.

Appendix A. Proofs

Both matrices are positive semidefinite; let S(t) = Λ^{1/2} F(t) Λ^{1/2} and S(t̃) = Λ^{1/2} F(t̃) Λ^{1/2}. For simplicity we denote F(t) by F and F(t̃) by F̃.
Since F̃ − F is positive semidefinite, there exists a matrix (F̃ − F)^{1/2} such that F̃ − F = (F̃ − F)^{1/2} (F̃ − F)^{1/2}. In the above, ‖·‖_F denotes the Frobenius norm of a matrix. □

Lemma 2. Suppose t ∈ T^p and t̃ ∈ T^{p+c} for some fixed integer c > 0, and that T is a bounded interval.

Proof. Without loss of generality we assume that σ_ϵ² = 1. Let Ã = Φ(t̃)Λ^{1/2} and A_1 = Φ(t)Λ^{1/2}. Then Ã can be partitioned as Ã = (A_1′, A_2′)′ (after a proper permutation of the index in t̃), where A_1 and A_2 are of dimensions p × K and c × K, respectively. It follows that the difference of the two quantities can be written in terms of B_1 = A_{22·1}^{−1/2} A_{21} A_{11}^{−1} A_1 and B_2 = A_{22·1}^{−1/2} A_2, and is therefore always positive semidefinite. □

Proof of Theorem 2. In light of Assumption 1 and Eq. (7), it suffices to prove the theorem for the design objective function in (7) with an arbitrary positive semidefinite matrix B of finite operator norm. For the random design, the key is to establish that |v_kk²/(pλ_k) − 1| = o_P(1) and |d_k − pλ_k| = o_P(1) for all k ≤ K_0, which holds by a proposition (similar to Proposition 5.1 in Bunea and Xiao, 2015) in the Supplementary materials. □