Assessing Prediction Error at Interpolation and Extrapolation Points

Common model selection criteria, such as $AIC$ and its variants, are based on in-sample prediction error estimators. However, in many applications involving prediction at interpolation and extrapolation points, in-sample error cannot be used for estimating the prediction error. In this paper new prediction error estimators, $tAI$ and $Loss(Opt_{t})$, are introduced. These estimators generalize previous error estimators and are also applicable for assessing prediction error in cases involving interpolation and extrapolation. Based on these prediction error estimators, two model selection criteria in the same spirit as $AIC$ are suggested. The advantages of the suggested methods are demonstrated in simulations and real data analyses of studies involving interpolation and extrapolation in a Linear Mixed Model framework.


1 Introduction
Predicting a phenomenon at points different from those appearing in the training sample plays an important role across many research fields, such as Geostatistics (Li and Heap 2014; Kyriakidis and Journel 1999), health (Manton et al. 2012) and Econometrics (Baltagi 2008). In many of these use cases the new predicted points are interpolation or extrapolation points with respect to space or to time. For example, Brown and Comrie (2002) interpolated climate values in the Southwestern U.S., where the coverage of climate information is sparse. By predicting at interpolation points, they created a high-resolution map of seasonal temperature and precipitation in this area. Another example, given by Stewart et al. (2009), is forecasting the effects of obesity and smoking on U.S. life expectancy in 2020 by using a data set for the years 2003 through 2006.
Modeling approaches involving prediction at interpolation and extrapolation points have been studied in Machine Learning, mainly in the context of the transductive Support Vector Machine (Joachims 1999), but also in regression (Le et al. 2006).
Assessing prediction error at interpolation and extrapolation points, or more generally at transduction points, cannot be done using traditional in-sample prediction error estimators, as used in AIC (Akaike 1974) and its variants. Similarly, K-fold Cross-Validation, which estimates the generalization error, is also unsuitable in these cases, where the prediction points are specified. This paper introduces a prediction error estimator, tAI, which generalizes previous in-sample prediction error estimators like mAI and cAI (Vaida and Blanchard 2005); however, it doesn't assume that the predicted points are the same as the points appearing in the training sample, and it is therefore applicable to a wider range of use cases, such as those involving prediction at interpolation and extrapolation points. Since prediction error assessment is highly related to model selection, a new model selection criterion, tAIC, which is based on tAI, is proposed as well. tAI is suitable when the observations are normally distributed, whether they are correlated or not, and is therefore applicable to various parametric models with different variance structure assumptions, such as the Linear Mixed Model (LMM), Gaussian Process Regression (GPR), Generalized Least Squares (GLS) and Linear Regression. Relaxing the normality requirement of tAI, we also propose in Section 5 an approach for inference on interpolation and extrapolation that is based on squared error loss rather than likelihood, and hence generalizes the Optimism approach in model selection (Efron 1986).
In many use cases involving predicting at interpolation and extrapolation points, the dependent variable has a correlation structure (Li and Heap 2014; Kyriakidis and Journel 1999). For example, in the use case given by Brown and Comrie (2002), it is natural to assume a spatial correlation structure over the Southwestern U.S. area. Similarly, in repeated measures studies that forecast long-term treatment effects, a correlation structure with respect to time is commonly assumed (Ho et al. 2011). Therefore, use cases involving correlated data, and models that are implemented on correlated data, such as LMM, GPR and GLS, are good platforms for analyzing how predicting at interpolation and extrapolation points influences prediction error estimation and model selection. Before introducing tAI, a setup which puts LMM, GPR and GLS under a unified framework is defined. Let $y \in R^{n}$ and the fixed matrices $\{X \in R^{n \times p}, Z \in R^{n \times q}\}$ be a training sample, and $y^{*} \in R^{n^{*}}$ and the fixed matrices $\{X^{*} \in R^{n^{*} \times p}, Z^{*} \in R^{n^{*} \times q}\}$ be a prediction set, such that

$$\begin{pmatrix} y \\ y^{*} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu \\ \mu^{*} \end{pmatrix}, \begin{pmatrix} V & \mathrm{Cov}(y, y^{*}) \\ \mathrm{Cov}(y^{*}, y) & V^{*} \end{pmatrix} \right), \tag{1}$$

where $\mu = X\beta$, $\mu^{*} = X^{*}\beta$, $V$ is a function of $Z$ and $V^{*}$ is a function of $Z^{*}$. For example, in LMM it is typically assumed that the columns of $Z, Z^{*}$ are associated with normally distributed random effects with covariance matrix $G \in R^{q \times q}$, such that

$$V = ZGZ^{\top} + R, \qquad V^{*} = Z^{*}G{Z^{*}}^{\top} + R^{*},$$

with $R = \sigma^{2} I_{n}$ and $R^{*} = \sigma^{2} I_{n^{*}}$, where $I_{n}, I_{n^{*}}$ are the identity matrices with dimensions $n$ and $n^{*}$ respectively. In GPR it is often assumed that

$$V_{i,j} = K(Z_{i}, Z_{j}), \qquad V^{*}_{i,j} = K(Z^{*}_{i}, Z^{*}_{j}),$$

where $K$ is some kernel function.
By normality of $y$ and $y^{*}$,

$$E(y^{*}|y) = \mu^{*} + \mathrm{Cov}(y^{*}, y)\,V^{-1}(y - \mu).$$

Given $V$, $\mathrm{Cov}(y^{*}, y)$ and the ML estimator of $\beta$, $\hat\beta$, $E(y^{*}|y)$ can be used for predicting $y^{*}$ as follows:

$$f^{*} = X^{*}\hat\beta + \mathrm{Cov}(y^{*}, y)\,V^{-1}(y - X\hat\beta).$$

This procedure generalizes standard prediction practices in LMM, GPR and GLS. In addition, $f^{*}$ is the Best Linear Unbiased Predictor (BLUP) (Harville 1976).
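For concreteness, the following is a minimal numerical sketch of this predictor, assuming $V$, $\mathrm{Cov}(y^{*}, y)$ and the design matrices are known and fixed; the function name and interface are illustrative, not taken from any package.

```python
import numpy as np

def blup_predict(y, X, X_star, V, C_star):
    """Sketch of f* = X* b + Cov(y*, y) V^{-1} (y - X b), where b is the
    ML/GLS estimator of beta and C_star stands for Cov(y*, y) (n* x n)."""
    Vinv_X = np.linalg.solve(V, X)          # V^{-1} X without forming V^{-1}
    Vinv_y = np.linalg.solve(V, y)
    # ML/GLS estimator of the fixed effects: (X' V^{-1} X)^{-1} X' V^{-1} y
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    resid = y - X @ beta
    return X_star @ beta + C_star @ np.linalg.solve(V, resid)
```

Solving linear systems instead of inverting $V$ keeps the sketch numerically stable for moderately large $n$.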
tAI is an estimator of the following prediction error:

$$-\frac{1}{n^{*}} E_{y^{*}|y}\, l(y^{*}), \tag{3}$$

where $l$ denotes the log-likelihood of the candidate model. Correspondingly, given a set of candidate models, tAIC is defined as a model selection criterion selecting a model with the minimal tAI. This methodology, of estimating the prediction errors for different models and then selecting the model with the minimal prediction error, is the same as that implemented in AIC and its variants.
$\{X^{*}, Z^{*}, R^{*}\} = \{X, Z, R\}$ is not assumed in the setup above or in its associated prediction error measure, eq. (3). Therefore, tAI is applicable in various use cases that require flexibility in defining $\{X^{*}, Z^{*}, R^{*}\}$. For example, in the use case mentioned above of Brown and Comrie (2002), where GPR is used for predicting interpolated climate values (Kriging), it is reasonable to define $\{X^{*}, Z^{*}\}$ as the data points of the high-resolution spatial array rather than as the data points of the training sample, $\{X, Z\}$, which cover the area sparsely. Therefore, while prediction error estimators that are based on in-sample error estimation and generalization error estimation are unsuitable for this case, tAI is suitable. For similar considerations, tAIC is required in repeated measures studies in health and Biomedicine, when the main interest is to select the LMM minimizing the prediction error at long-term points, $\{X^{*}, Z^{*}, R^{*}\}$, which are different from the points that are used for model building, $\{X, Z, R\}$ (Pope III et al. 2002; Li et al. 2008).
Besides downscaling climate maps and estimating long-term effects in clinical studies, interpolation and extrapolation using LMM and Kriging are important tools for many research topics in mining engineering, agriculture and environmental sciences, especially when sampling is difficult and expensive, as in mountainous and deep marine regions (Li and Heap 2011; Stahl et al. 2006; Vicente-Serrano et al. 2003). tAI and tAIC are relevant for all these research topics, as well as for others which don't involve interpolation and extrapolation but still don't satisfy $\{X^{*}, Z^{*}, R^{*}\} = \{X, Z, R\}$. Various use cases are presented and analyzed in Sections 3 and 4.
2 tAI and tAIC

tAI is derived by estimating $-E_{y^{*}|y}\, l(y^{*})/n^{*}$ by the averaged log-likelihood of the training sample, $-l(y)/n$, plus a penalty correction, $C_{tAI}$:

$$tAI = -\frac{1}{n} l(y) + C_{tAI},$$

where the likelihood is evaluated at $f$, the estimated conditional expectation, $\hat{E}(y^{*}|y)$, when $\{X^{*}, Z^{*}, R^{*}\} = \{X, Z, R\}$. This approach, of estimating prediction error by deriving the bias of the training error, is also used in AIC and its variants (Akaike 1974). Consequently, tAI can be seen either as an estimator of $-E_{y^{*}|y}\, l(y^{*})/n^{*}$ or of its expectation, $-E_{y} E_{y^{*}|y}\, l(y^{*})/n^{*}$. The following theorem and corollary introduce a general expression for $C_{tAI}$ and therefore also for tAI and tAIC.
Theorem 1. Consider the setup given in eq. (1). In addition, let $Hy \in R^{n}$ and $H^{*}y \in R^{n^{*}}$ be predictors of $y$ and $y^{*}$ respectively, where $H$ and $H^{*}$ don't involve $y, y^{*}$ and satisfy the unbiasedness conditions $E(Hy) = \mu$ and $E(H^{*}y) = \mu^{*}$. Then $C_{tAI}$ has a closed-form expression in terms of $H$, $H^{*}$, $V$, $V^{*}$ and $\mathrm{Cov}(y^{*}, y)$.

The proof is attached in Appendix A.
Corollary 2. Under the setup described in eq. (1), the conditions in Theorem 1 are satisfied by the BLUPs $f$ and $f^{*}$, where $f^{*} = H^{*}y$ with

$$H^{*} = X^{*}A + \mathrm{Cov}(y^{*}, y)\,V^{-1}(I_{n} - XA), \qquad A = (X^{\top}V^{-1}X)^{-1}X^{\top}V^{-1},$$

and $H$ is defined analogously with $\{X^{*}, Z^{*}, R^{*}\} = \{X, Z, R\}$.

By Corollary 2 and Theorem 1, $C_{tAI}$ can be calculated under the setup that is described in eq. (1). Therefore, tAI can be implemented in LMM, GPR and other related models.
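To make this implementation concrete, here is a hedged sketch of how tAI would be assembled: the training term $-l(y)/n$ is the averaged Gaussian negative log-likelihood evaluated at the fitted mean, and the Theorem 1 penalty, whose closed form is not reproduced above, is passed in as a precomputed argument. All names are illustrative.

```python
import numpy as np

def avg_neg_loglik(y, mu_hat, Sigma):
    """-l(y)/n for y ~ N(mu_hat, Sigma) under the fitted model; the choice
    of Sigma (marginal or conditional covariance) depends on the model."""
    n = len(y)
    r = y - mu_hat
    _, logdet = np.linalg.slogdet(Sigma)
    quad = r @ np.linalg.solve(Sigma, r)
    return 0.5 * (n * np.log(2 * np.pi) + logdet + quad) / n

def tAI(y, mu_hat, Sigma, C_tAI):
    """tAI = -l(y)/n + C_tAI; C_tAI is the Theorem 1 correction, computed
    elsewhere from H, H*, V, V* and Cov(y*, y)."""
    return avg_neg_loglik(y, mu_hat, Sigma) + C_tAI
```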
Besides prediction error estimation, these results can be used for defining the following model selection criterion.
Definition 3. Given a set of models $\mathcal{H}$ satisfying the conditions in Theorem 1, tAIC is the following criterion:

$$tAIC = \arg\min_{h \in \mathcal{H}} \; tAI_{h},$$

where $tAI_{h}$ is tAI for model $h$.

Comparison with other prediction error estimators
The prediction error estimators that appear in cAIC and mAIC (Vaida and Blanchard 2005) were developed for normal linear models under different restrictions on the variance structure, but assuming $\{X, Z, R\} = \{X^{*}, Z^{*}, R^{*}\}$. cAIC is aimed at the LMM and GPR case, where $\mathrm{Cov}(y^{*}, y) \neq 0$, while mAIC considers the GLS case, where $\mathrm{Cov}(y^{*}, y) = 0$.
For cAIC the prediction error estimate is

$$cAI = -\frac{1}{n} l(y) + \frac{\mathrm{tr}(H)}{n},$$

while for mAIC it is

$$mAI = -\frac{1}{n} l(y) + \frac{p}{n}.$$

cAIC and mAIC are defined from cAI and mAI similarly to tAIC. (Vaida and Blanchard (2005) define this prediction error estimator with a factor of $2n$, i.e., as $2n \times cAI$, and denote it cAIC. Here, in order to distinguish between the prediction error estimator and the model selection procedure, the prediction error estimator is denoted cAI and the criterion cAIC; similarly for mAI and mAIC.)
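Under $\{X^{*}, Z^{*}, R^{*}\} = \{X, Z, R\}$ the two classical estimators differ from tAI only in the penalty term; a minimal sketch, reusing avg_neg_loglik from the sketch in Section 2:

```python
import numpy as np  # avg_neg_loglik is defined in the Section 2 sketch

def cAI(y, mu_hat, Sigma, H):
    # conditional estimator: penalty tr(H)/n
    return avg_neg_loglik(y, mu_hat, Sigma) + np.trace(H) / len(y)

def mAI(y, mu_hat, Sigma, p):
    # marginal estimator: penalty p/n, with p the number of fixed effects
    return avg_neg_loglik(y, mu_hat, Sigma) + p / len(y)
```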
It is easy to confirm that when $\{X, Z, R\} = \{X^{*}, Z^{*}, R^{*}\}$ our tAI formula indeed reduces to the cAI and mAI formulas.
In addition, for GLS, we can also show an interesting interpretation of the difference between the mAI and tAI expressions. With a little algebra, and using the fact that $\mathrm{Var}(\hat\beta)$ achieves the Cramér-Rao bound, $\mathrm{Var}(\hat\beta) = I^{-1}(\beta)$, where $I$ is the Fisher information, the difference can be expressed in terms of the determinants $|V|$ and $|V^{*}|$, which are often called the generalized variance (Wilks 1932; Johnson et al. 2014).

Relaxing Theorem 1 Conditions
Although this paper focuses on prediction error estimation and model selection for LMM and GPR, Theorem 1 is more general and doesn't assume the paradigm applied in LMM and GPR, i.e., predicting using $E(y^{*}|y)$ and estimating the marginal mean parameters with MLE. Theorem 1 assumes:

1. Normality of $y^{*}$ and $y$.

2. Linearity of the predictors, $Hy$ and $H^{*}y$, where $H$ and $H^{*}$ don't involve $y, y^{*}$.

3. Unbiasedness: $E(Hy) = \mu$ and $E(H^{*}y) = \mu^{*}$.

Therefore it can be used in other cases satisfying the above conditions.
When the normality assumption cannot be made, another model selection criterion, based on a similar approach to tAI, can be implemented instead; for more details see Section 5.
In case the normality assumption can be made but the fitted model doesn't satisfy condition 3 of unbiasedness, an extended version of the Theorem 1 result, which retains bias terms involving $\mu$ and $\mu^{*}$, can be used instead. The proof can be found in Appendix A as part of the proof of Theorem 1.

Note that this extended expression is less useful, as it depends on $\mu$ and $\mu^{*}$, which are unknown.

3 Use cases
In this section, typical use cases of tAI and tAIC are presented.

Predicting at interpolation and extrapolation points in spatial arrays and longitudinal temporal data
As described in the Introduction, predicting interpolated and extrapolated data points using LMM and GPR is common in Biomedicine, health, Climatology and other research fields where temporal and spatial datasets are common. The flexible definition of $X^{*}, Z^{*}, R^{*}$ and $V^{*}$ in tAI makes it applicable when the goal is to estimate prediction error at interpolated and extrapolated data points along time and space dimensions.
In Section 4 we analyze numerically a repeated measures clinical study containing child growth measurements (Potthoff and Roy 1964), where interpolation and extrapolation objectives can be defined and the application of tAI is demonstrated. The following example, built on the application of Tsanas et al. (2010), demonstrates that appropriate use of tAIC can also simplify and improve on existing methodology.

Example 3.1. Tsanas et al. (2010) introduced a new method for measuring the progression of Parkinson's disease. Their motivation is that the standard methodology for measuring Parkinson's progression, which uses the UPDRS score (Unified Parkinson's Disease Rating Scale), is costly and requires a physician visit. Their alternative methodology creates a formula that approximates the UPDRS score from speech signals, which are not costly. Six months of data were collected for their study, containing a large number of longitudinal speech-signal measurements per patient; however, UPDRS scores were collected only at a small number of the time points. In order to select the best covariates with respect to the whole speech-signal sample, they suggested interpolating the UPDRS scores using 'straightforward linear interpolation', then fitting several alternative models and selecting one of them using AIC and other model selection criteria. An alternative paradigm that doesn't require imputing the UPDRS score is to use tAIC. Since tAIC doesn't assume $\{X^{*}, Z^{*}\} = \{X, Z\}$, there is no need to interpolate the UPDRS score in order to select a model minimizing the estimated prediction error with respect to the whole speech-signal sample.

We note that in Example 3.1 one may think that $y^{*}$ is used twice, for model building and for prediction error estimation, and that over-fitting can therefore occur. However, in the tAI approach, unlike in the cross-validation approach, $y^{*}$ serves as a conceptual device in deriving $C_{tAI}$ and not as real observations; hence $y^{*}$ is not used twice.
In the spatial data analysis domain, common application areas include geographical data (Li and Heap 2014) and neuroimaging data (Salimi-Khorshidi et al. 2011). Such studies usually use GPR rather than LMM. Although GPR and LMM reflect different perspectives (GPR is rooted in functional data analysis, while LMM is rooted in multivariate analysis) and use different techniques for expressing the covariance matrices, both models use the conditional expectation, $f^{*}$, for prediction; hence tAI is also applicable for GPR. In the Introduction we demonstrated this with the use case of creating high-resolution climate maps (Brown and Comrie 2002). A similar use case, containing chemical concentrations in soil data, is analyzed numerically in Section 4.

Other Transductive Settings
LMM and GPR are also used for modeling data without spatial or temporal correlation structure, and the prediction problems that arise often involve prediction outside the training sample.
One interesting example is modeling the effect of SNPs (Single Nucleotide Polymorphisms) on a phenotype as part of a Genome-Wide Association Study (GWAS). In this case the common practice is to consider the SNPs as random effects and other explanatory variables (e.g., age, height and gender) as fixed effects (Zhang et al. 2010). When using LMM for modeling the effect of SNPs on a phenotype, tAI allows estimating the prediction error for an extended population compared to the training sample. It is directly useful in the important case when $\{X^{*}, Z^{*}\}$ can be collected from other studies which investigate a different phenotype but contain the SNPs and the explanatory variables that are used in the training sample (Wray et al. 2013).
Missing values of the dependent variable are a common phenomenon in statistical analysis, and in particular in clinical trials with a repeated measures study design (Wood et al. 2004; O'Neill and Temple 2012). There are many methods for handling missing values in repeated measures studies, some of them involving imputation of the missing values (Mallinckrodt et al. 2003). When the dependent variable is missing at some known points but the goal is to estimate the prediction error with respect to the original study design (Hogan et al. 2004), tAI can be used without imputing the missing values.

4 Numerical Results
This section compares tAI, cAI and mAI, as well as their corresponding model selection criteria, tAIC, cAIC and mAIC, using simulation and real data analyses.

Simulation Analyses
The goal of the following analyses is to investigate the accuracy of tAI, cAI and mAI in estimating $-E_{y^{*}|y}\, l(y^{*})/n^{*}$, for different sample sizes and variance setups. In addition, tAIC, cAIC and mAIC are analyzed and compared with respect to the oracle solution,

$$\arg\min_{h \in \mathcal{H}} \; -\frac{1}{n^{*}} E_{y^{*}|y}\, l_{h}(y^{*}).$$

Additional numerical results, with respect to a potentially different oracle solution based on $-E_{y} E_{y^{*}|y}\, l_{h}(y^{*})/n^{*}$, are presented in Appendix C.

Simulation setup
The simulation demonstrates prediction error estimation and model selection for an LMM setting with $S$ subjects and 12 repeated measurements per subject, where $i \in \{1, ..., S\}$ is the subject number and $j \in \{1, ..., 12\}$ is the measurement number.
Three linear mixed models were fitted given the true covariance matrices. All the models contain the time covariate; in addition, model 1 contains $x_{i,j,k}, \forall k \leq 2$, model 2 contains $x_{i,j,k}, \forall k \leq 4$, and model 3 contains $x_{i,j,k}, \forall k$, which is also the model that generates the data.
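Since the display with the generating model did not survive, the following sketch only illustrates the shape of such a simulation: a generic random-intercept LMM with twelve repeated measures per subject and nested candidate mean models. The numeric values, the covariance structure and the number of covariates are assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
S, m, K = 50, 12, 6            # subjects, measures per subject, covariates (assumed)
sigma2, tau2 = 1.0, 0.5        # residual and random-intercept variances (assumed)

time = np.tile(np.arange(m), S)
x = rng.normal(size=(S * m, K))                       # covariates x_{i,j,k}
b = np.repeat(rng.normal(0.0, np.sqrt(tau2), S), m)   # random intercept per subject
beta = np.ones(K)
y = 1.0 + 0.3 * time + x @ beta + b + rng.normal(0.0, np.sqrt(sigma2), S * m)

# candidate mean models: time plus the first 2, first 4, or all K covariates
designs = {h: np.column_stack([np.ones(S * m), time, x[:, :k]])
           for h, k in enumerate([2, 4, K], start=1)}
```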
Results

As can be seen from Figure 1, the tAI density is concentrated around the mean of $-E_{y^{*}|y}\, l(y^{*})/n^{*}$, while cAI and mAI are stochastically smaller than $-E_{y^{*}|y}\, l(y^{*})/n^{*}$, since their corrections, $\mathrm{tr}(H)/n$ and $p/n$, are unsuitable for this case of predicting at extrapolation points. In addition, since tAI, cAI and mAI share the same random part, $l(y)/n$, but different means ($-E_{y}\, l(y)/n$ plus $C_{tAI}$, $\mathrm{tr}(H)/n$ and $p/n$, respectively), their densities have the same shape, shifted according to the corrections. In contrast, $-E_{y^{*}|y}\, l(y^{*})/n^{*}$ has the same mean as tAI but a different variance, since $\mathrm{Var}\left(-E_{y^{*}|y}\, l(y^{*})/n^{*}\right)$ depends on $H^{*}$, $R^{*}$ and $n^{*}$, which do not appear in $\mathrm{Var}(tAI) = \mathrm{Var}(-l(y)/n)$; in our case, $H^{*}$ contains large values compared to $H$, and therefore $\mathrm{Var}\left(E_{y^{*}|y}\left[-l(y^{*})/n^{*}\right]\right) > \mathrm{Var}(tAI)$.

Figure 2 presents, for each criterion, tAIC, cAIC and mAIC, the error $E_{y^{*}|y}\left[-l_{h_{best}}(y^{*})/n^{*}\right]$, where $h_{best}$ is the model selected by the relevant criterion. This error reflects the true average error obtained when implementing the different model selection criteria. In addition, the average error of the oracle criterion is presented as well. As can be seen from Figure 2, tAIC obtains better results than cAIC and mAIC in all nine setups. Figure 3 presents a similar analysis with respect to the agreement rate of the criteria with the oracle criterion; as can be seen from Figure 3, tAIC achieves the best results in this case as well.

Real data analyses
Meuse data

Data description The Meuse data set contains measurements of concentrations of several heavy metals, along with location coordinates and additional covariates, sampled in a flood plain of the river Meuse; it is available in R software. The analyses below focus on comparison between tAI, cAI and mAI on this data set. Meuse.grid is a higher resolution grid of the same area, containing 3103 observations of location and some of the covariates that are available in the Meuse data set; however, it doesn't contain the metal concentration measurements. Meuse.grid is available in R software as well.

Results
The Meuse data set was partitioned randomly into training and test samples.
Four Gaussian process regression models were fitted to the log of the Lead concentration (only log(Lead) can be analyzed under the normality assumption). All the models share the same kernel structure, a squared-exponential kernel, where $Z_{i,1}$ refers to the latitude of measurement $i$, $Z_{i,2}$ refers to the longitude of measurement $i$, and $l_{1}, l_{2}$ and $\sigma_{f}$ lie in $R_{+}$. Each model has a different marginal mean; see Table 1. The descriptions of the covariates can be found in R software. As can be seen in Figure 4, tAI estimates $-l(y^{*})/n^{*}$ most accurately, while the other prediction error estimators consistently underestimate it. As can be seen from Figure 5, the differences between tAI, cAI and mAI are sustained and the results are consistent with the previous figures, i.e., cAI and mAI give lower error estimates, which likely underestimate the prediction error.
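The kernel above is a squared-exponential with separate length-scales for latitude and longitude; since the exact parameterization did not survive extraction, the following sketch uses one common form and should be read as an assumption:

```python
import numpy as np

def sq_exp_kernel(Z1, Z2, l1, l2, sigma_f):
    """Assumed form: K(z, z') = sigma_f^2 exp(-(d_lat^2/(2 l1^2) + d_lon^2/(2 l2^2))),
    with Z[:, 0] the latitude and Z[:, 1] the longitude."""
    d_lat = Z1[:, 0][:, None] - Z2[:, 0][None, :]
    d_lon = Z1[:, 1][:, None] - Z2[:, 1][None, :]
    return sigma_f**2 * np.exp(-(d_lat**2 / (2 * l1**2) + d_lon**2 / (2 * l2**2)))
```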

Growth data
Data description The Growth data was introduced by Potthoff and Roy (1964) and contains four skull length measurements for each of 27 children, at ages 8, 10, 12 and 14 (a total of 27 × 4 measurements), along with the child's age and gender.
Results The measurements at age 14 are designated as the holdout set. Three linear mixed models are fitted; all have the same variance structure, containing a random intercept per child and a random slope for the child's age, but each model has a different set of fixed effects (see Table 2).

Figure 6: Growth data, holdout age = 14. For each model, each symbol refers to a prediction error, estimated by a different method.
As can be seen in Figure 6, overall, tAI estimates $-l(y^{*})/n^{*}$ most accurately, while the other prediction error estimators underestimate it.
Figure 7 presents three analyses similar to that in Figure 6, where the measurements at the other time points are designated as the holdout. When age = 8, the results are similar to those in Figure 6; however, when age = 10 and age = 12, tAI and cAI have similar performance. This is not surprising, since in these cases $\{X^{*}, Z^{*}, R^{*}\}$ is similar to $\{X, Z, R\}$.
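For readers who want to reproduce the flavor of this analysis, here is a hedged sketch of fitting one such model (random intercept plus random age slope per child) with Python's statsmodels; the file and column names are illustrative, and the Potthoff-Roy data must be supplied separately.

```python
import pandas as pd
import statsmodels.api as sm

# assumed columns: 'skull_length', 'age', 'gender', 'child'
growth = pd.read_csv("growth.csv")       # hypothetical file with the Growth data
train = growth[growth["age"] < 14]       # hold out the age-14 measurements

model = sm.MixedLM.from_formula(
    "skull_length ~ age + gender",       # one candidate set of fixed effects
    data=train,
    re_formula="~age",                   # random intercept and random age slope
    groups="child",
)
fit = model.fit(reml=True)
print(fit.summary())
```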

5 Optimism for Prediction at Interpolation and Extrapolation Points
The formulation of tAI and the derivation of $C_{tAI}$ are based on the normality assumption on $y$ and $y^{*}$, which is commonly made when LMM and GPR are implemented. However, the approach used for developing tAI can also be used for creating other prediction error estimators that are not based on the normality of $y^{*}$ and $y$. For example, the standard formulation of the prediction error estimator based on the expected Optimism correction (Efron 1986) is

$$Loss(Opt) = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - (Hy)_{i}\right)^{2} + Opt,$$

where $Opt = \frac{2}{n}\sum_{i=1}^{n} \mathrm{Cov}\left((Hy)_{i}, y_{i}\right)$, and it is assumed that $y^{*}$ and $y$ are drawn from the same distribution and have the same predictor, $Hy$. However, as was already discussed in the previous sections, these conditions are not satisfied in many use cases. The following prediction error estimator generalizes $Loss(Opt)$:

$$Loss(Opt_{t}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - (Hy)_{i}\right)^{2} + w_{t},$$

where

$$w_{t} = E\left[\frac{1}{n^{*}}\left\|y^{*} - H^{*}y\right\|^{2}\right] - E\left[\frac{1}{n}\left\|y - Hy\right\|^{2}\right].$$

Similarly to the tAIC definition, given a set of models $\mathcal{H}$, $Loss(Opt_{t})$ can be used for model selection as follows:

$$\arg\min_{h \in \mathcal{H}} \; Loss_{h}(Opt_{t}),$$

where $Loss_{h}(Opt_{t})$ is $Loss(Opt_{t})$ for model $h$.
Lemma 4 introduces a general expression of w t for predictors that are linear in y.
Lemma 4. Let $y \in R^{n}$ be a random variable with mean $\mu$ and variance $V$. Similarly, let $y^{*} \in R^{n^{*}}$ be a random variable with mean $\mu^{*}$ and variance $V^{*}$. In addition, let $Hy \in R^{n}$ and $H^{*}y \in R^{n^{*}}$ be the predictors of $y$ and $y^{*}$ respectively, where $H$ and $H^{*}$ don't involve $y$ and $y^{*}$. Then

$$w_{t} = \frac{1}{n^{*}}\left[\mathrm{tr}(V^{*}) + \mathrm{tr}(H^{*}V{H^{*}}^{\top}) - 2\,\mathrm{tr}\left(H^{*}\mathrm{Cov}(y, y^{*})\right) + \|\mu^{*} - H^{*}\mu\|^{2}\right] - \frac{1}{n}\left[\mathrm{tr}(V) + \mathrm{tr}(HVH^{\top}) - 2\,\mathrm{tr}(HV) + \|\mu - H\mu\|^{2}\right].$$

Corollary 5. Given the definitions in Lemma 4, when $H\mu = \mu$ and $H^{*}\mu = \mu^{*}$, the bias terms vanish and

$$w_{t} = \frac{1}{n^{*}}\left[\mathrm{tr}(V^{*}) + \mathrm{tr}(H^{*}V{H^{*}}^{\top}) - 2\,\mathrm{tr}\left(H^{*}\mathrm{Cov}(y, y^{*})\right)\right] - \frac{1}{n}\left[\mathrm{tr}(V) + \mathrm{tr}(HVH^{\top}) - 2\,\mathrm{tr}(HV)\right].$$

In case $H^{*} = H$, $V^{*} = V$ and $V - \mathrm{Cov}(y, y^{*}) = \sigma^{2}I_{n}$, then

$$w_{t} = \frac{2\sigma^{2}\,\mathrm{tr}(H)}{n},$$

which is the same result as was introduced by Hodges and Sargent (2001) for Linear Hierarchical models.
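In that last special case the penalty reduces to the familiar degrees-of-freedom form; a minimal sketch of the resulting estimator for a linear smoother $Hy$ follows, under exactly those special-case assumptions:

```python
import numpy as np

def loss_opt_t_special(y, H, sigma2):
    """Loss(Opt_t) in the special case H* = H, V* = V and
    V - Cov(y, y*) = sigma^2 I_n, where w_t = 2 sigma^2 tr(H) / n."""
    fitted = H @ y
    train_err = np.mean((y - fitted) ** 2)
    w_t = 2.0 * sigma2 * np.trace(H) / len(y)
    return train_err + w_t
```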
$Loss(Opt_{t})$ is based on the squared error loss function, which reflects Euclidean distance. Other prediction error estimators, based on different distances, such as the Mahalanobis distance (Mahalanobis 1936), might be suggested as well. Corollary 6 presents a penalty correction for a prediction error estimator that is based on the Mahalanobis distance.
Corollary 6. Given the definitions in Lemma 4, when $H\mu = \mu$ and $H^{*}\mu = \mu^{*}$, an analogous penalty correction holds for the Mahalanobis-distance-based loss (eq. (6)). The relation between eq. (6) and $C_{tAI}$ arises due to the relation between the Mahalanobis distance and the normal likelihood on which tAI is based.
It is natural to use $Loss(Opt_{t})$ instead of tAI for linear predictors that don't assume normality, such as the predictors used in nearest neighbors, Nadaraya-Watson kernel regression and smoothing spline models. Moreover, due to the form of the normal density function, many predictors that seem to be based on the normality assumption can alternatively be interpreted as solutions of a least squares problem, or of more complex versions such as weighted least squares and penalized least squares problems.
For example, GLS can be interpreted as the solution of a weighted least squares problem with the weight matrix $V^{-1}$. Similarly, $f^{*}$ can be interpreted as the solution of the problem

$$\min_{a \in R^{n^{*}},\, B \in R^{n^{*} \times n}} \; E\left\|y^{*} - (a + By)\right\|^{2},$$

evaluated at the estimated parameters; the proof is attached in Appendix A. These alternative interpretations are free from the normality assumption and therefore $Loss(Opt_{t})$ can be suitable for them. Since many predictors can be interpreted in different ways, the assignment of predictors to tAI or to $Loss(Opt_{t})$ should refer to the possibility of assuming normality rather than to the predictor type.
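To make the weighted least squares interpretation concrete: with weight matrix $V^{-1}$ and Cholesky factor $V = LL^{\top}$, GLS is ordinary least squares on the whitened data $(L^{-1}X, L^{-1}y)$. A minimal sketch:

```python
import numpy as np

def gls_via_wls(y, X, V):
    """Solve min_beta (y - X beta)' V^{-1} (y - X beta) by whitening:
    with V = L L', this is OLS on (L^{-1} X, L^{-1} y)."""
    L = np.linalg.cholesky(V)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta
```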

6 Discussion and Conclusions
tAI extends the prediction error estimators used in cAIC and mAIC to estimating prediction error at interpolation and extrapolation points.
As demonstrated in Section 3, these use cases are common in various research fields, particularly in Geostatistics and health, where GPR and LMM are used for predicting at interpolation and extrapolation points. Since GLS, linear regression and smoothing splines can be expressed as LMM (Brumback et al. 1999), tAI is applicable for them as well.
The correction in tAI is more complicated than the corrections in cAIC and mAIC, which are $\mathrm{tr}(H)/n$ and $p/n$ respectively. The correction in tAI is affected by the relations between $\mathrm{Var}(y)$ and $\mathrm{Var}(y^{*})$, between $\mathrm{Var}(f)$ and $\mathrm{Var}(f^{*})$, and between $\mathrm{Cov}(y, y^{*})$ and $\mathrm{Var}(y)$. When interpreting the correction as a measure of over-fitting, the differences between the corrections give a new perspective on how the over-fitting is composed as a function of the variance structure of the problem.
In many cases the variance parameters are unknown in advance and therefore are estimated by various procedures prior to model fitting, e.g., REML in LMM (Verbeke 1997). Estimating the variance parameters introduces extra variation into tAI, especially when the sample size is small. Estimating the in-sample prediction error under the LMM setup when the variance parameters are unknown was addressed by Liang et al. (2008). Extending this to a transductive setup is a challenge for future work.
The numerical analyses emphasize the practical importance of using tAI in scenarios where $\{X^{*}, Z^{*}, R^{*}\} \neq \{X, Z, R\}$. This is especially noticeable when predicting at extrapolation points, since in this case the differences between $\mathrm{Var}(y)$ and $\mathrm{Var}(y^{*})$, and between $\mathrm{Var}(f)$ and $\mathrm{Var}(f^{*})$, can be large.
$Loss(Opt_{t})$ is another prediction error estimator for cases involving predicting at interpolation and extrapolation points. Unlike tAI, $Loss(Opt_{t})$ doesn't assume that the observations are normally distributed and therefore it is also applicable in various nonparametric applications. Since many predictors that are apparently based on the normal linear model can alternatively be interpreted as solutions of generalized least squares problems, the assignment of predictors to tAI or to $Loss(Opt_{t})$ should refer to the possibility of assuming normality rather than to the predictor formula.

A Proofs
Proof of Theorem 1.

Proof of the least squares interpretation of $f^{*}$ (Section 5). Minimizing $E\left\|y^{*} - (a + By)\right\|^{2}$ over $a \in R^{n^{*}}$ and $B \in R^{n^{*} \times n}$, the solution for $B$ is $B = \mathrm{Cov}(y^{*}, y)V^{-1}$, and the solution for $a$ is $a = \mu^{*} - B\mu$. Therefore the optimal linear predictor, $a + By = \mu^{*} + \mathrm{Cov}(y^{*}, y)V^{-1}(y - \mu)$, is the same as $f^{*}$.
B Scenarios in mixed models where R ≠ σ²I

Example 7. Consider a model in which the random effects include $b_{1}$, shared by $y$ and $y^{*}$, and $b_{2}$, which appears only in $y$. Since $b_{1}$ is common to $y$ and $y^{*}$, its estimate can be utilized for achieving better accuracy in predicting $y^{*}$. Since $y^{*}$ doesn't contain $b_{2}$, estimating it doesn't contribute to achieving better accuracy in predicting $y^{*}$. Therefore, in terms of predicting $y^{*}$ using $f^{*}$, a model definition that drops $b_{2}$ has the same prediction formula as the previous one. Since the second formulation is simpler, it can be preferred when the goal is predicting $y^{*}$.
Example 8. Consider the standard LMM setup where $y \in R^{n}$ is drawn from $K$ clusters, but each observation, $y_{i}$, is an average of $w_{i}$ i.i.d. observations, $y_{i} = \left(\sum_{l=1}^{w_{i}} \phi_{i,l}\right)/w_{i}$, where $\phi_{i,l} \sim N(0, \sigma^{2})$. Assume the $\phi_{i,l}$ are unobserved, but $w_{i}$ is known. The variance of the residual in this case is $\mathrm{Var}(\epsilon_{i}) = \sigma^{2}/w_{i}$, i.e., $R = \sigma^{2}\,\mathrm{diag}(1/w_{1}, ..., 1/w_{n})$. Another common use case is when, due to poor available data, technical restrictions or other reasons, part of the correlation of $y$ is not explained by the random effects. In those cases, this part is expressed by the residual, $\epsilon$, and therefore $\mathrm{Var}(\epsilon)$ will be a non-diagonal matrix.
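For example, the residual covariance of Example 8 can be constructed directly; a tiny sketch with made-up replicate counts:

```python
import numpy as np

w = np.array([4, 2, 8, 1])        # known numbers of averaged replicates per observation
sigma2 = 1.3                      # residual variance of a single observation
R = sigma2 * np.diag(1.0 / w)     # Var(eps_i) = sigma^2 / w_i
```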

C Additional Numerical Results
Figures 8a and 8b present the distributions of tAI, cAI, mAI and $-E_{y^{*}|y}\, l(y^{*})/n^{*}$ for models 1 and 2. Figure 9 presents the error, $-\frac{1}{n^{*}} E_{y} E_{y^{*}|y}\, l_{h_{best}}(y^{*})$, for each of the model selection criteria, tAIC, cAIC and mAIC, and for the oracle criterion, in the nine setups. Figure 10 presents the agreement rate of the criteria, tAIC, cAIC and mAIC, with the oracle criterion based on $E_{y} E_{y^{*}|y}\, l_{h}(y^{*})$. For more details see Section 4.


Figure 1: Densities of tAI, cAI, mAI and $-E_{y^{*}|y}\, l(y^{*})/n^{*}$ as a function of the number of subjects and $\sigma^{2}$.


Figure 2: For each setup, each symbol refers to the prediction error $E_{y^{*}|y}\left[-l_{h_{best}}(y^{*})/n^{*}\right]$ of the relevant criterion, mAIC, cAIC, tAIC and the oracle criterion.

Figure 3: For each setup, each bar refers to the agreement rate of the relevant criterion with the oracle criterion.

Figure 5 is based on the Meuse and Meuse.grid data sets, where the whole Meuse data set is used as training data and the Meuse.grid data set is used as the prediction set, $\{X^{*}, Z^{*}\}$. Since the Lead concentration is not given in the Meuse.grid data set, $-l(y^{*})/n^{*}$ is unknown; therefore tAI, cAI and mAI are compared without having a ground truth.

Figure 5: For each model, each symbol refers to a prediction error, estimated by a different method.

Figure 7: For each model, each symbol refers to a prediction error, estimated by a different method.

Figure 8: Densities of tAI, cAI, mAI and $-E_{y^{*}|y}\, l(y^{*})/n^{*}$ as a function of the sample size and $\sigma^{2}$.

Figure 9: For each setup, each symbol refers to the prediction error $E_{y^{*}|y}\left[-l_{h_{best}}(y^{*})/n^{*}\right]$ of the relevant criterion, mAIC, cAIC, tAIC and the oracle criterion.

Figure 10: For each setup, each bar refers to the agreement rate of the relevant criterion with the oracle criterion.

Table 2: Growth data: covariates.