Mixed-Effects Regression Splines to Model Myopia Data

Myopia is a disorder of ocular refraction with varying rates of progression. Although the disorder has a dynamic nature, prospective longitudinal studies with long term follow-ups have been remarkably few. In this paper, we show how mixed-effects regression splines with different choices of basis functions can be used to model myopia progression data in a flexible way. We show how the estimated model may be used to find prediction curves with corresponding confidence and tolerance intervals for a new myopic subject. We discuss alternative choices of the basis functions such as the truncated polynomial spline functions (2 types) and B-spline functions. Principal component functions may be used for an analysis of the variation of the curves in the population. The theory is collected together and presented in a coherent way as well as illustrated with a careful analysis of myopia progression data from a Finnish myopia study. Citation: Nordhausen K, Oja H, Pärssinen O (2015) Mixed-Effects Regression Splines to Model Myopia Data. J Biom Biostat 6: 239. doi:10.4172/21556180.1000239 J Biom Biostat ISSN: 2155-6180 JBMBS, an open access journal Page 2 of 9 Volume 6 • Issue 3 • 1000239 estimate of θ, namely


Introduction
Myopia, also known as nearsightedness, is defined as that state of ocular refraction in which parallel rays of light entering the eye at rest are brought to focus in front of the retina. In this situation, distant objects cannot be perceived distinctly. Myopia can occur at any age but in most cases it appears at school age and usually progresses several years, in few cases progression continues throughout life. In practice myopia is treated either by prescribing corrective lenses, such as glasses or contact lenses, or by performing refractive surgery. For people with myopia and for those who treat myopia and prescribe glasses, reliable predictions of myopia progression as early as possible would be extremely valuable.
The prevalence of myopia has markedly increased during recent decades in many countries. This increase is hard to explain only by hereditary factors. Epidemiological studies have confirmed that a longer education and higher occupational status, for example, are connected with an increased prevalence of myopia [1]. Many studies have shown that the younger the age of onset the faster is myopia progression [2]. Based on a small sample, Thorn et al. [3] estimated that myopia beginning in childhood "stabilizes" in 80% of cases by about 19-years of age. There has been an increased interest in long term follow-up studies to model the progression of myopia. The estimated models with possible covariates then often provide individual prediction curves as well. One may also wish to test, for example, whether the progression of myopia levels off at a certain age.
The dynamic progression of myopia has not been widely studied in the epidemiological or biostatistical literature. The studies mainly consider the individual progression curves from retrospective material or by means of progression in different materials [4]. Möttönen at al. [5] suggested to model myopia using a polynomial (quadratic or cubic) random coefficient model. The model is not satisfactory, however, as even cubic myopia progression curves are not flexible enough to explain strong changes between 10 and 20 years. In this paper we will show that mixed-effects regression analysis using quadratic or cubic splines is a flexible tool to model the progression of myopia that easily provides individual prediction curves. Furthermore, we show that different important questions concerning the progression of myopia may be answered simply by changing the spline basis functions. For a general theory for mixed-effects regression splines, see e.g. [6,7]. One of the aims of this paper is to collect and present the theory in an easy and coherent way for the users. Mixed-effects regression spline models have recently been applied to CD4 counts [8] and to the growth of cattle [9]. However, these models have never before been applied to myopia progression data.
The paper is structured as follows. In Section 2 we discuss the use of the random effect regression model in the case that subject i, i=1, ..., n, has p i measurements at time points t i1 , ..., t ip i . Note that the number of measurements as well as the time points vary individually as is always the case for myopia follow-up data. Each subject is then supposed to have his/her own myopia progression curve as a function of time. This curve is observed only through the measurements at the p i time points (with measurement errors coming from N (0, τ 2 )). The individual curves are linear combinations of k basis curves, and the k coefficients are assumed to come from a k-variate normal distribution with an unknown mean vector depending on parameter θ and unknown covariance matrix Σ. Parameters θ and Σ (providing the mean curve and variation of the curves) are the population quantities we are interested in. We discuss the estimation of parameters θ, Σ, and τ 2 and show how the estimated parameters may be used to find prediction curves with corresponding confidence and tolerance intervals. In Section 3 we discuss alternative choices of the basis functions; a special interest is in the set of principal component functions that can be used for a careful analysis of the variation of the curves in the population. In Section 4 we discuss the use of spline functions, truncated polynomial spline functions (2 types) and B-spline functions, as basis functions. In Section 5 the theory is then illustrated with a real data set from a Finnish longitudinal myopia study. The paper ends with some final comments in Section 6. estimate of θ, namely has a multivariate normal distribution The maximum likelihood (ML) estimates of the parameters θ, Σ, and τ 2 can be found by minimizing the -2 times the log-likelihood function The minimization can be done using the expectation maximization (EM) algorithm or other optimization routines. For more details about basis functions and their use in mixed models [6,7].

Prediction curves based on the model
Consider next the estimate of the mean progression curve with its pointwise 100(1-α)% confidence interval as well as its 100(1-α)% confidence band. One also often wishes to estimate the 100(1-α)% point wise tolerance interval and the 100(1-α)% tolerance band for the progression curve. These are found as follows. Notice that the confidence intervals and bands in 2 and 3 are based on the joint limiting normal distribution of the estimates and therefore only approximate.

The mean progression curve
An approximate 100(1-α)% pointwise confidence interval for the mean progression curve ( ( )) θ ′ ⊗ x f t at time t is given by Notice that pointwise confidence intervals do not give a confidence band for the (whole) mean progressive curve over all possible values of t. The confidence band is considered next.
An approaximate 100(1-α)% confidence band for the mean progression curve ( ( )) where c 1−α is given by . Note that c 1−α depends on limits t 0 and t 1 but can be easily found by simulation. Note also that  We then make the following model assumption.
The parameter Θ stands for the connection between x i and the myopia progression curve for the ith individual, Σ shows the variation of the curves (in the population of the progression curves), and τ 2 tells about the random variation of the measurements around the individuals curves. Note that, if we collect the assumptions together, the model can be also written as a mixed model where ⊗ denotes the Kronecker product. Then and, if Σ and τ were known, then the maximum likelihood (ML) the individual value of the refraction curve at time t, that is,

in the subpopulation of individuals having the same covariate vector value x) is given by
Finally an estimate for 100(1-α)% tolerance band for (in the subpopulation of individuals having the same covariate vector value x) is given by Assume next that the parameters in the model are known, and we predict the mean progression curve of individual i with covariate vector x i , that is, the curve The prediction is based on observation vector Then we have the following estimates and predictions.

Alternative choices of the basis curves
Let us assume that the model fitting is performed using k basis functions f 1 , ..., f k but, for the interpretation purposes, we wish to present the results using another set of basis functions 1 , , … k g g . Assume also that these two sets of basis functions span the same set of progression functions, that is, Then, if we write there exists a full-rank k × k matrix A such that f (t)=Ag(t), for all t.
and the model is transformed The parameters are then transformed in the following way Recall that, in the ML estimation, the parameter estimates are transformed in the same way.

Principal component analysis for the variation of the curves
One popular set of basis functions are principal component functions which often can be interpreted and are also ordered according to the amount of variation they explain.

Consider the random functions
where, as before, The magnitude of an eigenvalue related to a principal component function reflects its relevance and the associated score of it how strong expressed the feature represented by this function is, similar as in traditional PCA. For further ideas about the decomposition of the random effects covariance matrix, see for example [10].

Mixed-effects regression splines
In this section we consider spline functions which provide a flexible basis for smooth myopia progression modeling [6,11,12].

Truncated polynomial spline functions
We first consider the splines based on truncated power functions. For the definition, let 0 1 1 ... for the set of functions p  says that the mean curve is constant after the last knot point t k , and this hypothesis can be tested using 1( ,..., ) More generally, any linear hypothesis H 0 : Aβ=b for a chosen r × p matrix A having rank(A)=p and for a chosen r-vector b can be tested using β − A b that is, under the null hypothesis, approximate r-variate normal with zero mean vector and covariance matrix Under the null hypothesis, the limiting distribution of the squared form test statistic is then a chi squared distribution with r degrees of freedom.

B-spline functions
Yet another alternative is to use the basis of B-spline functions [13,14]. B-spline functions are constructed recursively using the original and additional knot points  Remark 1: Note that, naturally, all three parametric function sets p  , p  , and p  provide the same fit (see for example Section 3.7.1 of [6]). Estimation of parameters of p  has best numerical properties (see for example Chapter 5 of [11]) but the interpretation of the regression parameters in p  , and p  is often easier. The first p+1 parameters in p  , for example, give the p-polynomial on the last time interval [t m , t m+1 ]. The null hypothesis H 0 : β 1 =...=β p =0 then says that the mean value is constant (does not depend on time) on the last interval.

A Real Data Example
The data set We illustrate the theory with a data set from a Finnish longitudinal myopia study. 240 children in central Finland were recruited for this study in [1983][1984]. For details about the design and inclusion criteria, see [15]. In this analysis, the measurements are refraction values on the right eyes of 118 girls and 118 boys; 4 children were excluded for various reasons. In the beginning of the study the ages of the children were between 8.8 and 12.8 years old (mean age 10.9 years). In the first three years the children were measured yearly (they actually were also randomized into three different treatment arms which is ignored here; there were no statistical differences between the arms [16]. As a part of the follow-up, the subjects were supposed to see the same ophthalmologist for measurements 15 and 25 years after the start of the study. Available measurements based on glass prescriptions and files from different ophthalmologists and opticians were used to obtain additional observations between these time points. During the followup, 2 subjects died and 18 subjects dropped out because of refractive surgery. The measurements were not equally spaced and often sparse with the mean number of measurements per subject 8.1 (range 2-15). The total number of recordings is 1908 with the oldest age 39.0 years. Table 1 lists the numbers of measurements for different time intervals.
In the original study, measurements of several covariates were collected but, for our demonstration purpose, we consider only sex. The observed individual curves are then shown separately for boys and girls in Figure 1. These are the raw data for our modeling, and the aim is to build prediction curves (with confidence and tolerance bands) for a new subject based on these data. A minor question was to consider the age when the progression of myopia levels off. The statistical tools described in the previous sections are used for the analysis. The analysis was done using R 2.15.0 [17], especially the packages splines and lme4 [18].

Estimates of the parameters for B-spline basis functions
B-spline quadratic basis functions were used for the estimation with the knots at the ages 12, 16, 22. These choices were based on the consultations with specialists. If no prior knowledge were available, the number and locations of the knots could be determined using crossvalidation, model se-lection criteria (AIC or BIC), likelihood ratio tests or by using a roughness penalization approach, etc (see for example [7]). In our model the number of basis functions is then k=6. With one dichotomous explaining variable (sex), we have 12 parameters for θ (mean curves) and 21 parameters for Σ, and the residual variance τ 2 .
Using the R-function lmer [18], we obtain the fixed effects estimates of θ that are presented in Finally, the error variance estimate is 2 0.0633 τ =

Prediction
The estimated mean curves for boys and girls based on θ are shown in Figure 2. The girls seem to have a faster development of myopia but the differences between boys and girls seem to vanish with the time. After the last knot at age 22, the boys' mean curve seems still to be almost linearly descending while the girls' curve seems to level off.      From a medical point of view, it is interesting to predict the progression of myopia just using the measurement at the first visit. As an example, we provide in Figure 3 the progression chart for the first visit at the age of 9 years.
As explained before, the prediction curves may be based on several observations as well. To illustrate this, we randomly selected one boy and one girl with at least eight measurements. For both subjects, we found the prediction curves with pointwise 95% tolerance intervals and 95% tolerance bands based on the first three, first five, and first seven observations. As explained in Section 2.2, tolerance bands may be here based on For the girl, the prediction curves based on three or seven observations seem to work well. If five points were used, the (strange) later upward trend at later age is missed. The results for the boy look similar. The conservative tolerance band that is easy to compute is too conservative to be helpful here.

Alternative sets of basis functions
B-spline basis functions are mathematically and numerically attractive but the interpretation of the parameters in Table 2 Figure 5 (two first columns). It is remarkable that the first three PCA basis functions explain together over 98% of the variation. The first principal function is simply for the general progression level (with a typical progression). The second component seems to be an indicator for strong early progression as a subject with a large score will exhibit much faster progression than one with a small score. The third function roughly describes the contrast between the first and third interval reflected by the two opposing peaks of the curve.
Although the principal component functions may be used to find the main type of variations between the individual curves, they are not practical if one wishes to have parameters to tell what is happening in the beginning or in the end of the follow-up period. These studies can be performed with the truncated power splines. One may be interested, for example, whether the progression of myopia levels off after the last knot value. Figure 2 shows that on the average the stabilization does happen neither for boys nor for girls. Separate analyses were made for boys and girls, however, to confirm that. For both cases, we fitted the model using B-splines and then changed to the truncated power spline functions shown in Figure 5 (the rightmost column) as explained in Section 3.1. (The matrix A may be simply found by calculating the values of all 12 functions at six suitable time points).
To demonstrate the conversion Tables 3 and 4 gives the estimates for the fixed effects using B-spline functions and truncated power spline functions for boys and girls, respectively. The null hypothesis that the myopia progression levels off at 22, then says that the second and third coefficients for truncated power spline functions are zero. Both p-values are less than 0.0001 and the null hypotheses should therefore be rejected, see Section 4.1.

Discussion
In this paper we showed that the random regression analysis using polynomial spline functions is a flexible way to model myopia progression. The estimated model then also provides easy prediction charts for myopia progression depending on previous measurements. In the estimation of the parameters in the model, the B-spline functions are preferable but the results can be easily transformed to any set of basis functions spanning the same function space. The model also easily allows the use of covariates.
In our example, the theory was illustrated by testing the hypothesis that on the average the myopia progression levels off at the age of 22. The hypothesis was rejected but we believe that this may have happened partly due to the bias coming from the data collection procedure. The numbers of measurements p i and data points 1 , ..., i i i p t t are informative in the sense that fast myopia progression is naturally connected to a high number of measurements. The subjects with early stabilized level have sparse late measurements, and the estimates of the parameters describing the progression in the end of the follow-up are then mainly based on the measurements from subjects with late progression. This causes bias in the estimation of the model. This is similar to the analysis of missing data problems (with informative missingness) or to the analysis of clustered data with informative cluster size [19]. To correct the bias is an interesting topic for a future work. Furthermore, robust estimation procedures should be developed here as the data contain clear outliers, that is, the individual curves with atypical or even impossible behaviour. The outliers naturally have a strong effect on the fit of the data.