Introduction

The variance components (VC) approach is a method that expresses the phenotypic variances and covariances among individuals as a function of the estimated number of alleles shared identical by descent (IBD) at a given locus. The method was first described by Fisher.1 Penrose2 subsequently developed a method of detecting linkage by measuring the observation correlation between two graded (ordered categorical) traits based on theory that when two loci are linked, the range of covariance of their effects within the sibship should increase. Jayakar3 developed a similar VC approach in which he allowed for the inclusion of offspring from phase unknown matings. More recently, extensions of VC approaches were developed by various authors.4,5,6,7,8,9,10,11,12,13 VC methods have shown to have increased power to detect linkage for quantitative traits compared to sib-pair methods.14,15,16 However, the statistical properties of the VC methods are not yet well characterised and should be investigated further. In this study, the statistical properties of the VC approach, as implemented in SOLAR (Version 1.4.0) and GENEHUNTER (Version 2.0) were investigated using computer simulation. The type I error rate was determined for different null hypothesis; the average lod score was computed for several genetic models; the proportion of variance attributed to single locus, random environment, polygenic inheritance was estimated; and the power to detect linkage for both implementations was compared. Thus, this study also serves to compare two implementations of the VC method.

Materials and methods

In linkage analyses, the usual null hypothesis is assumed to be that of ‘no linkage’. However, there are several different underlying models that can be postulated for a hypothesis of no linkage. These include: a random environmental effect (no genetic effect on the trait), an unlinked single-locus genetic effect for the trait, and a genetic effect that is not due to a single locus, but rather to unlinked additional loci (oligogenic effects) or to polygenic effects.

To evaluate type I error rates, data were simulated for nine different null hypotheses where the phenotypic variance was due either to a random environment, or to a random environmental effect and different proportions of unlinked single locus components or polygenic effects (Table 1. Model ENV, the ‘traditional’ null hypothesis, assumed that all of the phenotypic variance was due to random environmental effects. Models labelled PG assumed that phenotypic variance was due to random environmental and polygenic effects. Models labelled SG assumed that the phenotypic variance was due to random and unlinked single-locus effects.

Table 1 Models simulated under various null hypotheses

The generating models for the alternative hypotheses are detailed in Table 2. In these models, 30% of the phenotypic variance was due to a single-locus effect (g1) with the remaining variation due either to random environmental (e), unlinked single locus (g2), polygenic effects (Vp) and/or common sibship environment (ce) effects.

Table 2 Models simulated under alternative hypotheses

Simulation methods

Simulations were performed using the software package GASP (V3.31).17 In GASP, the simulation model is defined on the basis of a set of components and parameters specified by the user. For the purpose of this study, the trait locus was assumed to have two equally frequent alleles with additive effects. Under a null hypothesis where the phenotypic variance was due to random environmental effects, the trait was normally distributed with mean 0 and variance 1, otherwise the trait mean was a mixture of normal distributions. Samples consisting of 100 nuclear families with a fixed sibship size of 5, and a single marker with four equal frequent alleles (heterozygosity equal to 0.75) were simulated for each model. Two thousand replicates were simulated for each model. The recombination fraction between the trait locus and marker locus was 0.05.

The VC method, as described by Almasy and Blangero,11 is an extension of the model proposed by Amos,9 and is implemented in SOLAR.11 The method assumes that the vector of phenotypic trait values x in a pedigree of size n follows a multivariate normal distribution. The In-likelihood of the pedigree, L, can be written as:

where μ is the vector of the theoretical trait means (which may include effects due to covariates), and Σ is the expected variance–covariance matrix. Covariate effects are introduced by modelling the trait vector as μ=γ+Aβ, where γ is the ovarall trait mean, and A is an n X m design matrix for the trait whose n rows contain m covariates which are uncorrelated with the additive genetic effects for each pedigree, and β is an m-vector of fixed covariate effects.

This method can be used to estimate the genetic variance attributable to a region around a genetic marker of interest. Although there is no detailed description of the VC approach implemented in GH2, both SOLAR and GH2 use scoring methods. However, unlike SOLAR, GH2 does not reparametrize the VC parameters in terms of heritability.

Because the VC approach allows for single locus, polygenic and residual components of variance, the corresponding components of variance (σa2, σg2, σe2) were estimated. A model estimating the variance attributable to the marker locus was compared to a model where the variance was fixed to be zero. In both models, the residual genetic variance was parameterised as an additive polygenic component. The −2 log-likelihood ratio was compared to a chi-square distribution with the number of degrees of freedom determined by the number of independent parameters.

Results

Table 3 presents the empirical type I error rate at a nominal level of 0.05 for nine null hypotheses for both SOLAR and GH2. In the ENV100 model, the estimated type I error rate (0.003) was well below the specified nominal level (0.05). In the PG models, the type I error rate generally increased with the increase in the percentage of variance due to the polygenic component, plateauing at about the nominal significance level. Similarly in the SG models, the type I error rate increased as the percentage of variance due to the unlinked single locus increased. Results are nearly identical for the GH2 and SOLAR implements.

Table 3 Empirical type I error rate (nominal P-value of 0.05) for different simulation models

Table 4 presents the results for the modelled effects obtained using GH2 and SOLAR, respectively. Average lod scores were nearly identical for both implementations for all four models (correlation coefficient over 0.997). As presented in Table 1, 30% of the variance was attributable to the single-locus effect for all models. For model 1, both GH2 and SOLAR estimated that 22% of the variance was due to the single locus effect. Eight per cent of the phenotypic variance was mistakenly attributed to the polygenic effect, when in fact a polygenic effect did not exist. In the presence of a second unlinked single locus (model 2) or a polygenic effect (model 3) each accounting for 30% of the total variance, about 5% of the total variance that should have been attributed to the single-locus effect was mistakenly attributed to the polygenic component. However, in the presence of a common sibship environment (model 4), the variance due to the single locus component was slightly overestimated (32–33% vs 30%) and 10–11% of the variance was mistakenly attributed to a polygenic component.

Table 4 Estimated average lods and proportion of variance attributable to single locus and polygenic effects

Table 5 presents the power to detect linkage based on models 1–4. The lod score corresponding to a type I error rate of 0.05 (one sided) is 0.588. Power was also evaluated for lod scores of 1, 2, 3 and 4. Overall, the addition of an unlinked second single locus, a polygenic component, or a common sibship environment increased the average lod scores. Power to detect linkage was highest in the presence of a common sibship environment.

Table 5 Power of detecting linkage under different alternative hypotheses

Discussion

SOLAR and GH2 were shown to yield very similar results when applied to the same data even though SOLAR actually reparametrises the VC parameters involved in heritability and GH2 does not. The other differences is that GH2 computes identical by descent (IBD) values exactly but SOLAR does it approximately. In this exercise, however, all simulations were performed with single markers therefore, the difference in calculating IBD sharing in two implementations should not introduce a noticeable variation in the results.

Interestingly, for both implementations, the type I error rate was conservative when there was no genetic component in the underlying model. This may be due to the fact that the likelihood surface is flat, thus making the global maximum more difficult to reach. When a polygenic or an unlinked single locus component was included in the generating model, the type I error rate increased, but remained close to the nominal level. However, when the unlinked single locus contributed 80% of the total variance, the type I error rate was somewhat liberal (0.07). Similar findings have been reported by other authors18 when there are outliers. One solution may be to utilise a multivariate normal distribution instead of the Chi-square distribution.19,20 Other possible solutions have been discussed in detail by Allison et al.18 Also, a few robust VC methods have been proposed by various investigators.21,22,23,24

Furthermore, the VC method implemented in both GH2 and SOLAR tends to underestimate the variance attributable to a linked single locus component except when the underlying generating model contains a common sibship environment. This behaviour becomes more pronounced when the model contains only a single locus and random environment component. This is due to the fact that the VC method always assumes a proportion of variance due to a polygenic component. As noted by Amos9 the use of samples with only one, or relatively few, type of pairs of relatives, may lead to confounding of the estimates of variance attributable to single-locus, polygenic and environmental effects. As shown in the simulations performed in Amos's study, there was a bias (underestimate) in the proportion of the variance attributable to the single-locus effect, and a substantial proportion of the single-locus variance was incorrectly attributed to the ‘polygenic’ effect. This effect appears to be inherent in the VC method, rather than the particular implementation.

We need to note that the shared environment component was not taken into account in either GH2 or SOLAR. We are aware of the fact that SOLAR allows the inclusion of a shared environment. But we were interested in investigating the effect of model misspecification under both implementations. Therefore, it is not surprising to see that two programs gave similar results.

On the other hand, it appeared that it was possible to localise a quantitative locus that accounts for 30% of the phenotypic variance with a reasonably sized sample of randomly ascertained sibships, which suggests that like model dependent methods of linkage analysis, the VC method is first and foremost a test of linkage. Like the estimator of the recombination fraction, estimates of the proportion of variance attributable to genetic and non-genetic effects should be interpreted with caution.

Our future work will further explore the statistical properties of the VC methods under weaker underlying genetic models by simulating traits with lower heritability using various family sizes and structures.