Abstract
In order to evaluate the statistical properties of the variance components method implemented in SOLAR and GENEHUNTER2, the type I error rate, power and estimated size of modelled effects were determined using computer simulation. Results suggested that the type I error rate was quite conservative when the variance was completely due to random effects. However, when either a polygenic effect or an unlinked single locus component effect was included in the generating models the type I error rate was closer to the nominal level. The size of the polygenic or single locus effect appeared to influence the type I error rate. Results suggested that the variance components method underestimated the variance attributed to a single locus effect and overestimated the variance attributed to a polygenic effect in the models considered regardless of the source of variation.
Similar content being viewed by others
Introduction
The variance components (VC) approach is a method that expresses the phenotypic variances and covariances among individuals as a function of the estimated number of alleles shared identical by descent (IBD) at a given locus. The method was first described by Fisher.1 Penrose2 subsequently developed a method of detecting linkage by measuring the observation correlation between two graded (ordered categorical) traits based on theory that when two loci are linked, the range of covariance of their effects within the sibship should increase. Jayakar3 developed a similar VC approach in which he allowed for the inclusion of offspring from phase unknown matings. More recently, extensions of VC approaches were developed by various authors.4,5,6,7,8,9,10,11,12,13 VC methods have shown to have increased power to detect linkage for quantitative traits compared to sib-pair methods.14,15,16 However, the statistical properties of the VC methods are not yet well characterised and should be investigated further. In this study, the statistical properties of the VC approach, as implemented in SOLAR (Version 1.4.0) and GENEHUNTER (Version 2.0) were investigated using computer simulation. The type I error rate was determined for different null hypothesis; the average lod score was computed for several genetic models; the proportion of variance attributed to single locus, random environment, polygenic inheritance was estimated; and the power to detect linkage for both implementations was compared. Thus, this study also serves to compare two implementations of the VC method.
Materials and methods
In linkage analyses, the usual null hypothesis is assumed to be that of ‘no linkage’. However, there are several different underlying models that can be postulated for a hypothesis of no linkage. These include: a random environmental effect (no genetic effect on the trait), an unlinked single-locus genetic effect for the trait, and a genetic effect that is not due to a single locus, but rather to unlinked additional loci (oligogenic effects) or to polygenic effects.
To evaluate type I error rates, data were simulated for nine different null hypotheses where the phenotypic variance was due either to a random environment, or to a random environmental effect and different proportions of unlinked single locus components or polygenic effects (Table 1. Model ENV, the ‘traditional’ null hypothesis, assumed that all of the phenotypic variance was due to random environmental effects. Models labelled PG assumed that phenotypic variance was due to random environmental and polygenic effects. Models labelled SG assumed that the phenotypic variance was due to random and unlinked single-locus effects.
The generating models for the alternative hypotheses are detailed in Table 2. In these models, 30% of the phenotypic variance was due to a single-locus effect (g1) with the remaining variation due either to random environmental (e), unlinked single locus (g2), polygenic effects (Vp) and/or common sibship environment (ce) effects.
Simulation methods
Simulations were performed using the software package GASP (V3.31).17 In GASP, the simulation model is defined on the basis of a set of components and parameters specified by the user. For the purpose of this study, the trait locus was assumed to have two equally frequent alleles with additive effects. Under a null hypothesis where the phenotypic variance was due to random environmental effects, the trait was normally distributed with mean 0 and variance 1, otherwise the trait mean was a mixture of normal distributions. Samples consisting of 100 nuclear families with a fixed sibship size of 5, and a single marker with four equal frequent alleles (heterozygosity equal to 0.75) were simulated for each model. Two thousand replicates were simulated for each model. The recombination fraction between the trait locus and marker locus was 0.05.
The VC method, as described by Almasy and Blangero,11 is an extension of the model proposed by Amos,9 and is implemented in SOLAR.11 The method assumes that the vector of phenotypic trait values x in a pedigree of size n follows a multivariate normal distribution. The In-likelihood of the pedigree, L, can be written as:
where μ is the vector of the theoretical trait means (which may include effects due to covariates), and Σ is the expected variance–covariance matrix. Covariate effects are introduced by modelling the trait vector as μ=γ+Aβ, where γ is the ovarall trait mean, and A is an n X m design matrix for the trait whose n rows contain m covariates which are uncorrelated with the additive genetic effects for each pedigree, and β is an m-vector of fixed covariate effects.
This method can be used to estimate the genetic variance attributable to a region around a genetic marker of interest. Although there is no detailed description of the VC approach implemented in GH2, both SOLAR and GH2 use scoring methods. However, unlike SOLAR, GH2 does not reparametrize the VC parameters in terms of heritability.
Because the VC approach allows for single locus, polygenic and residual components of variance, the corresponding components of variance (σa2, σg2, σe2) were estimated. A model estimating the variance attributable to the marker locus was compared to a model where the variance was fixed to be zero. In both models, the residual genetic variance was parameterised as an additive polygenic component. The −2 log-likelihood ratio was compared to a chi-square distribution with the number of degrees of freedom determined by the number of independent parameters.
Results
Table 3 presents the empirical type I error rate at a nominal level of 0.05 for nine null hypotheses for both SOLAR and GH2. In the ENV100 model, the estimated type I error rate (0.003) was well below the specified nominal level (0.05). In the PG models, the type I error rate generally increased with the increase in the percentage of variance due to the polygenic component, plateauing at about the nominal significance level. Similarly in the SG models, the type I error rate increased as the percentage of variance due to the unlinked single locus increased. Results are nearly identical for the GH2 and SOLAR implements.
Table 4 presents the results for the modelled effects obtained using GH2 and SOLAR, respectively. Average lod scores were nearly identical for both implementations for all four models (correlation coefficient over 0.997). As presented in Table 1, 30% of the variance was attributable to the single-locus effect for all models. For model 1, both GH2 and SOLAR estimated that 22% of the variance was due to the single locus effect. Eight per cent of the phenotypic variance was mistakenly attributed to the polygenic effect, when in fact a polygenic effect did not exist. In the presence of a second unlinked single locus (model 2) or a polygenic effect (model 3) each accounting for 30% of the total variance, about 5% of the total variance that should have been attributed to the single-locus effect was mistakenly attributed to the polygenic component. However, in the presence of a common sibship environment (model 4), the variance due to the single locus component was slightly overestimated (32–33% vs 30%) and 10–11% of the variance was mistakenly attributed to a polygenic component.
Table 5 presents the power to detect linkage based on models 1–4. The lod score corresponding to a type I error rate of 0.05 (one sided) is 0.588. Power was also evaluated for lod scores of 1, 2, 3 and 4. Overall, the addition of an unlinked second single locus, a polygenic component, or a common sibship environment increased the average lod scores. Power to detect linkage was highest in the presence of a common sibship environment.
Discussion
SOLAR and GH2 were shown to yield very similar results when applied to the same data even though SOLAR actually reparametrises the VC parameters involved in heritability and GH2 does not. The other differences is that GH2 computes identical by descent (IBD) values exactly but SOLAR does it approximately. In this exercise, however, all simulations were performed with single markers therefore, the difference in calculating IBD sharing in two implementations should not introduce a noticeable variation in the results.
Interestingly, for both implementations, the type I error rate was conservative when there was no genetic component in the underlying model. This may be due to the fact that the likelihood surface is flat, thus making the global maximum more difficult to reach. When a polygenic or an unlinked single locus component was included in the generating model, the type I error rate increased, but remained close to the nominal level. However, when the unlinked single locus contributed 80% of the total variance, the type I error rate was somewhat liberal (0.07). Similar findings have been reported by other authors18 when there are outliers. One solution may be to utilise a multivariate normal distribution instead of the Chi-square distribution.19,20 Other possible solutions have been discussed in detail by Allison et al.18 Also, a few robust VC methods have been proposed by various investigators.21,22,23,24
Furthermore, the VC method implemented in both GH2 and SOLAR tends to underestimate the variance attributable to a linked single locus component except when the underlying generating model contains a common sibship environment. This behaviour becomes more pronounced when the model contains only a single locus and random environment component. This is due to the fact that the VC method always assumes a proportion of variance due to a polygenic component. As noted by Amos9 the use of samples with only one, or relatively few, type of pairs of relatives, may lead to confounding of the estimates of variance attributable to single-locus, polygenic and environmental effects. As shown in the simulations performed in Amos's study, there was a bias (underestimate) in the proportion of the variance attributable to the single-locus effect, and a substantial proportion of the single-locus variance was incorrectly attributed to the ‘polygenic’ effect. This effect appears to be inherent in the VC method, rather than the particular implementation.
We need to note that the shared environment component was not taken into account in either GH2 or SOLAR. We are aware of the fact that SOLAR allows the inclusion of a shared environment. But we were interested in investigating the effect of model misspecification under both implementations. Therefore, it is not surprising to see that two programs gave similar results.
On the other hand, it appeared that it was possible to localise a quantitative locus that accounts for 30% of the phenotypic variance with a reasonably sized sample of randomly ascertained sibships, which suggests that like model dependent methods of linkage analysis, the VC method is first and foremost a test of linkage. Like the estimator of the recombination fraction, estimates of the proportion of variance attributable to genetic and non-genetic effects should be interpreted with caution.
Our future work will further explore the statistical properties of the VC methods under weaker underlying genetic models by simulating traits with lower heritability using various family sizes and structures.
References
Fisher RA . The correlation between relatives on the supposition of mendelian inheritance Roy Soc Edin 1918 52: 399–433
Penrose LS . Genetic linkage in graded human character Ann Eugenics 1937 8: 233–237
Jayakar SD . On the detection and estimation of linkage between a locus influencing a character and a marker locus Biometrics 1970 26: 451–464
Hopper JL, Mathews JD . Extensions to multivariate normal models for pedigree analysis Ann Hum Genet 1982 46: 373–383
Goldgar DE . Multipoint analysis for human quantitative genetic variation Am J Hum Genet 1990 47: 957–967
Schork NJ . Efficient computation of patterned covariance matrix mixed model in quantitative segregation analysis Genet Epidemiol 1991 8: 29–46
Schork NJ . Extended pedigree patterned covariance matrix mixed model for quantitative phenotype analysis Genet Epidemiol 1992 9: 73–86
Schork NJ . Extended multipoint identity-by-descent analysis of human quantitative traits: efficiency, power and modeling considerations Am J Hum Genet 1993 53: 1306–1319
Amos CI . Robust variance-components approach for assessing genetic linkage in pedigrees Am J Hum Genet 1994 54: 535–543
Fulker DW, Cherny SS . An improved multipoint sib-pair analysis of quantitative traits Behav Genet 1996 26: 527–532
Almasy L, Blangero J . Multipoint quantitative-trait linkage analysis in general pedigrees Am J Hum Genet 1998 62: 1198–1211
Sham PC, Cherny SS, Purcell S, Hewitt JK . Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data Am J Hum Genet 2000 66: 1616–1630
Blangero J, Williams JT, Almasy L . Variance component methods for detecting complex trait loci Adv Genet 2001 42: 151–192
Pugh EW, Jaquish CE, Sorant AJ, Doestsch JP, Bailey-Wilson JE, Wilson AF . Comparison of sib-pair and variance components methods for genomic screening Genet Epidemiol 1997 14: 705–710
Shugart YY, Goldgar DE . Multipoint genomic scanning for quantitative loci: effects of map density, sibship size, and computational approach Eur J Hum Gene 1999 7: 103–109
William JT, Blangero J . Comparison of variance components and sibpair-based approaches to quantitative trait linkage analysis in unselected samples Genet Epidemiol 1999 16: 113–134
Wilson AF, Bailey-Wilson JE, Pugh EW, Sorant AJM . The genometric analysis simulation program (GASP): a software tool for testing and investigating methods in statistical genetics Am J Hum Genet 1996 59: A193
Allison DB, Neale MC, Zannolli R, Schork N, Amos CI, Blangero J . Testing the robustness of the likelihood-ratio test in a variance component quantitative trait loci mapping procedure Am J Hum Genet 1999 65: 531–544
Self SG, Liang KY . Asymptotic properties of maximum likelihood estimators likelihood ratio tests under non-standard conditions J Am Stat Assn 1987 82: 605–610
Lange KL, Little JA, Taylor JMG . Robust statistical modeling using the t distribution J Am Stat Assn 1989 84: 881–896
Astemborski JA, Beaty TH, Cohen BH . Variance components analysis on forced expiration in families Am J Hum Genet 1985 21: 741–753
Beaty TH, Liang KY . Robust inference for variance component models in families ascertained through probands. I. Conditional on proband's phenotype Genet Epidemiol 1987 4: 203–210
Beaty TH, Liang KY, Seerey S, Cohen BH . Robust inference for variance Genet Epidemiol 1987 4: 211–221
Beaty TH, Self SG, Liang KY, Connolly MA, Chase GA, Kwiterovich PO . Use of robust variance components method to analyze triglyceride data in families Ann Hum Genet 1985 49: 315–328
Acknowledgements
We thank three anonymous reviewers for insightful comments. This work was supported in part by Cigarette Restitution Fund of State of Maryland, NCI grant R03 CA85135-01 and NIH grant AG16992.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Shugart, Y., O'Connell, J. & Wilson, A. An evaluation of the variance components approach: type I error, power and size of the estimated effect. Eur J Hum Genet 10, 133–136 (2002). https://doi.org/10.1038/sj.ejhg.5200772
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/sj.ejhg.5200772
Keywords
This article is cited by
-
Heritability and Genetics of Serum Dickkopf 1 Levels in African Ancestry Families
Calcified Tissue International (2015)
-
Refining genome-wide linkage intervals using a meta-analysis of genome-wide association studies identifies loci influencing personality dimensions
European Journal of Human Genetics (2013)
-
Genome-wide linkage and peak-wide association study of obesity-related quantitative traits in Caribbean Hispanics
Human Genetics (2011)
-
The power to detect genetic linkage for quantitative traits in the Utah CEPH pedigrees
Journal of Human Genetics (2005)