Compendium of genome-wide scans of lipid-related phenotypes: adding a new genome-wide search of apolipoprotein levels

The genetic dissection of complex inherited diseases is a major challenge. Despite limited success in finding genes, substantial data based on genome-wide scan strategies are now available for a variety of diseases and related phenotypes. This can perhaps best be appreciated in the field of lipid and lipoprotein levels, where the amount of information generated is becoming overwhelming. We have created a database containing the results from whole-genome scans of lipid-related phenotypes undertaken to date. The usefulness of this database is demonstrated by performing a new autosomal genomic scan on apolipoprotein B (apoB), LDL-apoB, and apoA-I levels, measured in 679 subjects of 243 nuclear families. Linkage was tested using both allele-sharing and variance-component methods. Only two loci provided support for linkage with both methods: an LDL-apoB locus on 18q21.32 and an apoA-I locus on 3p25.2. Adding those findings to the database highlighted the fact that the former is reported as a lipid-related locus for the first time, whereas the latter has been observed before. However, concerns arise when displaying all data on the same map, because a large portion of the genome is now covered with loci supported by at least suggestive evidence of linkage.
—Bossé, Y., Y. C. Chagnon, J-P. Després, T. Rice, D. C. Rao, C. Bouchard, L. Pérusse, and M-C. Vohl. Compendium of genome-wide scans of lipid-related phenotypes: adding a new genome-wide search of apolipoprotein levels. J. Lipid Res. 2004. 45: 2174–2184.
Supplementary key words lipoproteins • quantitative trait locus • cardiovascular risk factors • linkage • dyslipidemia
Abbreviations: apoB, apolipoprotein B; CHD, coronary heart disease; cM, centimorgan; IBD, identical by descent; LOD, logarithm of the odds; QFS, Québec Family Study; QTL, quantitative trait locus.
¹To whom correspondence should be addressed. e-mail: marie-claude.vohl@crchul.ulaval.ca
The online version of this article (available at http://www.jlr.org) contains three additional tables.
Manuscript received 26 July 2004 and in revised form 26 August 2004. Published, JLR Papers in Press, September 17, 2004. DOI 10.1194/jlr.R400008-JLR200
Downloaded from www.jlr.org at Laval Univ Periodiques-Bibliotheque, on June 22, 2020.
Supplemental Material can be found at: http://www.jlr.org/content/suppl/2004/11/17/R400008-JLR200.DC1.html

Mapping genes involved in complex human diseases is one of the major challenges in human genetics. With the increasing incidence of chronic diseases in industrialized societies, finding these genes is clinically and economically relevant. During the past few years, considerable research resources have been deployed to study the genetic causes of complex human diseases to better understand their pathogenesis and, ultimately, improve prevention strategies, diagnostic tools, and therapies (1). Encouraged by the early success in the identification of genes responsible for monogenic diseases, many investigators have embraced genome-scan strategies. This trend has resulted in an enormous amount of information that is now often difficult to synthesize and interpret for a given complex disease.
The importance ascribed to lipid and lipoprotein levels in risk estimation and in the treatment of coronary heart disease (CHD) (2) has stimulated molecular studies to investigate the genetic causes underlying human variation in these traits. A large number of genome-wide screens of serum lipid-related phenotypes have been performed to date, and a review of such studies seems timely. Because linkage results must be replicated to be credible (3), a compendium of published quantitative trait loci (QTLs) may facilitate the identification of replicated findings. To provide an example of how such information can be useful, we add the results of a new genome scan of apolipoprotein B (apoB) and apoA-I levels to this compendium.
ApoB and apoA-I levels are good markers of CHD risk (4,5). A number of studies have clearly established that genetic factors contribute to interindividual differences in apolipoprotein levels. An elegant study comparing identical and fraternal twins reared together with twins reared apart has shown that a large portion of the variance in apoB and apoA-I levels is attributable to genetic factors, with heritability estimates greater than 50% (6). In addition, based on complex segregation analyses, major gene effects have been reported for these two phenotypes (7,8). Mutations in genes that encode apoB, LDL receptor, and ABCA1 have been implicated in monogenic disorders altering plasma apolipoprotein levels, including familial hypobetalipoproteinemia [Online Mendelian Inheritance in Man (OMIM) 605019], familial hypercholesterolemia (OMIM 143890), and hypoalphalipoproteinemia (OMIM 604091). However, these mutations do not account for the variation in plasma apoB and apoA-I levels in the general population. In an attempt to identify the responsible genes, a large number of association and linkage studies have been performed with candidate genes. These studies have been difficult to interpret because of conflicting results, lack of replication, and the occurrence of positive findings only in specific subgroups. Perhaps the highest linkage signal for apoB levels was reported in Dutch pedigrees on chromosome 1p31 [logarithm of the odds (LOD) = 4.7] (9). Other suggestive linkages (LOD > 1.7) have been found on chromosome 12q24 for apoA-I (10) and on 1p, 11q24, 21q21, and Xq23 for apoB (11,12). However, other genome-wide scans failed to identify QTLs for apoB levels (10,13). To search for additional loci influencing apoB and apoA-I levels or to replicate previous findings, we performed an autosomal genome scan among 243 nuclear families participating in the Québec Family Study (QFS).

Population
Subjects were participants in the QFS, an ongoing project with French-Canadian families investigating the genetics of obesity and its comorbidities (14). In this study, 679 subjects of 243 nuclear families had apolipoprotein measurements available. This cohort represents a mixture of random sampling and ascertainment through obese (body mass index > 32 kg/m²) probands. Table 1 presents the characteristics of subjects in each of the sex and generation groups. The study was approved by the Laval University Medical Ethics Committee, and all subjects provided written informed consent. All procedures followed were in accordance with institutional guidelines.

Apolipoprotein measurements
Blood samples were obtained from an antecubital vein in the morning after a 12-h overnight fast. The apolipoprotein measurements were performed with the rocket immunoelectrophoretic method (15). ApoB concentrations were measured in plasma, whereas LDL-apoB and apoA-I concentrations were measured in the infranatant (d > 1.006 g/ml) obtained after separation of very low density lipoprotein from the plasma by ultracentrifugation. The measurements were calibrated with reference standards obtained from the Centers for Disease Control and Prevention (Atlanta, GA).

Linkage analysis
A total of 443 markers spanning the 22 autosomal chromosomes, with an average intermarker distance of 7.2 centimorgans (cM), were genotyped as described by Chagnon et al. (16). The apolipoprotein traits were adjusted for the effects of age (up to a cubic polynomial to allow for nonlinearity), gender, and body mass index using a stepwise multiple regression procedure retaining only significant covariates (P < 0.05), as described previously (17). Adjustments of the phenotypes were performed using SAS (version 8.2).
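The adjustment step amounts to replacing each trait value by its residual from a multiple regression on the covariates. The following is a minimal pure-Python sketch, assuming the full covariate set (age, age², age³, sex, and body mass index) is kept. It omits the stepwise retention of covariates at P < 0.05 that the authors ran in SAS, and the function names are hypothetical.

```python
"""Covariate adjustment of a trait by least-squares residuals (illustrative sketch)."""
from typing import Sequence


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjust_trait(y: Sequence[float], age: Sequence[float],
                 male: Sequence[int], bmi: Sequence[float]) -> list[float]:
    """Return residuals of y after regressing it on age, age^2, age^3, sex and BMI."""
    mean_age = sum(age) / len(age)
    mean_bmi = sum(bmi) / len(bmi)
    rows = []
    for a, s, b in zip(age, male, bmi):
        # Centring and scaling keeps the cubic age terms well conditioned;
        # the residuals are unchanged because the column space is the same.
        c = (a - mean_age) / 10.0
        rows.append([1.0, c, c ** 2, c ** 3, float(s), b - mean_bmi])
    p = len(rows[0])
    # Normal equations (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    return [yi - sum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(rows, y)]
```

Because an intercept is included, the residuals sum to zero. Each adjusted trait is therefore centred and uncorrelated with the covariates before it enters the linkage analyses.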
We conducted quantitative trait linkage analyses using both allele-sharing and variance-component methods. For the allele-sharing method, we used the new Haseman-Elston regression-based method (18), which models the mean-corrected cross-product of the sibs' trait values instead of the squared sib-pair trait difference used in the original method (19). Two-point and multipoint (at 1 cM intervals) estimates of alleles shared identical by descent (IBD) were generated using GENIBD software, and linkage was tested using SIBPAL2 software from the S.A.G.E. 4.0 statistical package (20). The maximum number of sib pairs was 347. Empirical P values of the test statistic were also computed using a Monte Carlo permutation procedure with 10,000 replicate permutations for genomic regions containing two-point linkage markers with suggestive evidence of linkage (P < 0.0023).
Linkage was also tested with a variance-component model using the QTDT (quantitative transmission disequilibrium test) computer program (21). Under this model, the phenotypic variance is partitioned into the additive variance of a QTL ($\sigma^2_q$), a residual familial variance attributable to polygenes ($\sigma^2_g$), and a residual nonfamilial variance ($\sigma^2_e$). Hypothesis testing was performed with the likelihood ratio test. The likelihood of the null hypothesis is obtained by restricting the additive genetic variance attributable to the QTL to zero ($\sigma^2_q = 0$). The test is conducted by contrasting this restricted model with the alternative, in which $\sigma^2_q$ is estimated ($\sigma^2_q \neq 0$). The difference in minus twice the log likelihoods between the null and alternative hypotheses is approximately distributed as a $\chi^2$, from which the LOD score was computed as $\mathrm{LOD} = \chi^2/(2\ln 10)$.
We have taken a LOD score of ≥ 3.00 (P ≤ 0.0001) as evidence of linkage and a LOD score of ≥ 1.75 (P ≤ 0.0023) as evidence of suggestive linkage (22). We have also retained LOD scores of ≥ 1.18 (P ≤ 0.01) to identify potential independent confirmation of a previously reported significant linkage (23).
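The two allele-sharing statistics can be illustrated with simplified versions of the Haseman-Elston regressions. The original method regresses the squared sib-pair trait difference on the estimated proportion of alleles shared IBD, where a negative slope points to linkage. The new method regresses the mean-corrected cross-product instead, where a positive slope points to linkage. This sketch uses plain unweighted least squares on independent sib pairs and a one-sided permutation P value obtained by shuffling the IBD estimates across pairs. SIBPAL2 adds weighting and corrections for nonindependent pairs within sibships that are not reproduced here, so this is not its implementation. Function names are hypothetical.

```python
"""Haseman-Elston sib-pair regression with a permutation P value (illustrative sketch)."""
import random
from typing import Sequence


def _slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def he_original(y1: Sequence[float], y2: Sequence[float], pi_hat: Sequence[float]) -> float:
    """Original HE: squared sib difference on IBD sharing (negative slope suggests linkage)."""
    return _slope(pi_hat, [(a - b) ** 2 for a, b in zip(y1, y2)])


def he_new(y1: Sequence[float], y2: Sequence[float], pi_hat: Sequence[float]) -> float:
    """New HE: mean-corrected cross-product on IBD sharing (positive slope suggests linkage)."""
    mu = (sum(y1) + sum(y2)) / (len(y1) + len(y2))
    return _slope(pi_hat, [(a - mu) * (b - mu) for a, b in zip(y1, y2)])


def permutation_p(y1: Sequence[float], y2: Sequence[float], pi_hat: Sequence[float],
                  n_perm: int = 10_000, seed: int = 1) -> float:
    """One-sided empirical P: share of shuffles whose slope is at least the observed one."""
    observed = he_new(y1, y2, pi_hat)
    rng = random.Random(seed)
    shuffled = list(pi_hat)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between IBD sharing and trait similarity
        if he_new(y1, y2, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Adding 1 to the numerator and denominator keeps the empirical P value above zero. With 10,000 permutations, the smallest attainable value is therefore about 0.0001.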

Database
The initial search for genome-wide scan publications on lipid-related phenotypes was carried out with keywords (genome scan + lipoprotein, and linkage + lipoprotein + genome) at the bioinformatics site of the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov). The publication list was completed and verified by examining both the discussion section and the reference list of the publications found in the initial search. The search focused on results published before the end of April 2003 and excluded abstracts presented at meetings. A whole-genome scan Excel database for lipid-related phenotypes was established. The database contained bibliographic details (first author, source, and year), study population (ethnicity), ascertainment scheme, phenotypic traits, sample size details (number of individuals, sib pairs, and families), linkage analysis methods, and results. Any evidence of linkage at the suggestive level or stronger (LOD score ≥ 1.7 or P ≤ 0.0023) was treated as an observation (a hit). Results were entered in the database with the name of the linked marker/gene, its location (megabase and chromosomal band), and its maximum LOD score, Z score, or P value. For most studies, markers were provided in the papers and were those defining the peak or closest to the signal. When the marker's name or the specific location of the QTL (hit) was not available in the original paper, the authors were contacted and asked to provide the missing information. To identify possible replication and compare loci across studies, the location of each linked marker/gene was positioned on a single map provided by the Human Genome browser of the University of California, Santa Cruz (assembly, June 2002; http://genome.ucsc.edu). When a two-stage strategy was reported in the publication, the P value of the second stage was favored unless it did not reach the criteria for inclusion in Table 4 (criteria based on whole-genome scans).
This decision was made to make the best use of these studies, considering that the criteria for claiming significant linkage differ between the first and second stages of the analysis. Similarly, when multiple linkage methods were used in the same publication, the most significant result was kept for the database.
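The inclusion rules above can be expressed as a small record type plus two filters. This sketch assumes Python dataclasses standing in for the authors' Excel sheet, with one record per reported linkage result. When a publication reports the same phenotype and marker with several methods, the most significant result is kept. A result then counts as a hit at LOD ≥ 1.7 or P ≤ 0.0023. To rank a P value against a LOD score, the sketch converts P to an equivalent LOD under a one-sided normal approximation. That conversion, the field names, and the function names are assumptions, not part of the published database.

```python
"""Record type and inclusion rules for a linkage-scan compendium (illustrative sketch)."""
from dataclasses import dataclass
from math import log
from statistics import NormalDist
from typing import Iterable, Optional

SUGGESTIVE_LOD = 1.7   # inclusion threshold quoted in the text
SUGGESTIVE_P = 0.0023  # point-wise P threshold quoted in the text


@dataclass(frozen=True)
class LinkageResult:
    study: str                      # e.g. first author and year
    phenotype: str
    marker: str                     # linked marker or gene
    band: str                       # chromosomal band
    position_mb: float              # position on one common genome assembly
    method: str                     # linkage method that produced this result
    lod: Optional[float] = None
    p_value: Optional[float] = None


def as_lod(r: LinkageResult) -> float:
    """Reported LOD, or the LOD equivalent of a one-sided P value (assumed conversion)."""
    if r.lod is not None:
        return r.lod
    if r.p_value is None:
        raise ValueError("result carries neither a LOD score nor a P value")
    z = max(NormalDist().inv_cdf(1.0 - r.p_value), 0.0)
    return z * z / (2.0 * log(10.0))


def is_hit(r: LinkageResult) -> bool:
    """Suggestive or better: LOD >= 1.7 or P <= 0.0023."""
    if r.lod is not None and r.lod >= SUGGESTIVE_LOD:
        return True
    return r.p_value is not None and r.p_value <= SUGGESTIVE_P


def compile_hits(results: Iterable[LinkageResult]) -> list[LinkageResult]:
    """Keep the most significant result per study, phenotype and marker, then keep hits."""
    best: dict[tuple[str, str, str], LinkageResult] = {}
    for r in results:
        key = (r.study, r.phenotype, r.marker)
        if key not in best or as_lod(r) > as_lod(best[key]):
            best[key] = r
    return [r for r in best.values() if is_hit(r)]
```

Selecting the strongest result before applying the threshold mirrors the rule that the most significant method wins. A locus that is suggestive under any reported method therefore enters the table once.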
To evaluate whether QTLs were randomly distributed across the genome, we regressed the observed hit ratio against the expected hit ratio as reported previously (24). The observed hit ratio of each chromosome was obtained as (number of hits on a specific chromosome/number of hits across all chromosomes) × 100; the expected hit ratio of each chromosome was obtained as (number of genes on a specific chromosome/total number of genes in the genome) × 100. The gene content of each chromosome and of the whole genome was taken from Venter et al. (25). A significant association (positive slope) between the observed and expected hit ratios would suggest that the positive linkages reported in the literature are distributed randomly across the genome, that is, in proportion to gene content. In contrast, the absence of such an association would suggest that the observed hits are concentrated within specific chromosomes containing the genes controlling lipid and lipoprotein levels.

Genome scan on apoB, LDL-apoB, and apoA-I in the QFS cohort
Detailed results for all chromosomes and phenotypes are available in the three supplementary tables online. Table 2 summarizes the markers showing weak to moderate evidence of linkage (P ≤ 0.01 or LOD score ≥ 1.18) with the allele-sharing (two-point and multipoint) and the variance-component linkage methods. The highest variance-component LOD score was obtained for LDL-apoB on chromosome 18q21.32 (LOD = 2.05) (Fig. 1). Hits were also observed by the variance-component method for total apoB on 6p22.3-p21.1 and 6q23.1, for LDL-apoB on 2q35 and 11q22.3, and for apoA-I on 3p25.2.
In this study, the new Haseman-Elston linkage method yielded more linked loci than the variance-component method. For LDL-apoB, single-point evidence of linkage was observed on 20p13. In addition, the apoE and LIPE locus on 19q13 suggested the presence of a susceptibility locus for LDL-apoB as well as for apoB levels. The search for loci influencing apoA-I concentrations was the most productive. Indeed, single-point linkages were observed in five genomic regions: 3p25.2 (Fig. 2), 5q21.3, 9q31.3, 12q24.21, and 15q11.2. Suggestive evidence was also observed on 10q21.1, 11p15.1, 16p13.11, and 16q22.2. Multipoint linkage analysis, on the other hand, revealed strong evidence of linkage (P < 0.000001) in a 2 cM region (151–153 cM) flanked by the UCP1 and D4S1586 markers. Additional multipoint linkages were observed on 13q33.3 and 16p13.11, with the strongest signals observed with markers D13S796 and D16S405, respectively. Finally, a multipoint linkage was observed on 16q12, with the highest peak (P = 0.000003) located between markers D16S261 and D16S3253 at 54.4 cM.
Most of the strong linkage evidence observed with the allele-sharing linkage method (both single point and multipoint) was not supported by the variance-component method. Only two loci, one at 18q21.32 (marker D18S38; Fig. 1) for LDL-apoB and the other at 3p25.2 (D3S1259; Fig. 2) for apoA-I, were supported by both the allele-sharing and the variance-component methods. These findings were added to the accumulating database derived from the published genome-wide scans for lipid-related phenotypes.

Descriptive statistics of the database containing published genome-wide scans for lipid-related phenotypes
The database included 32 citations published from 1998 through 2003. Phenotypes incorporated in the database and the number of genome scans for each phenotype are presented in Table 3. The most frequently studied phenotypes were total cholesterol (n = 10), LDL-cholesterol (n = 11), HDL-cholesterol (n = 18), and triglyceride (n = 16). Studies of familial hypercholesterolemia, familial combined hyperlipidemia, and familial hypobetalipoproteinemia typically used a disease affection status (affected or unaffected) based on lipid and nonlipid criteria. The other phenotypes were treated as either quantitative or qualitative variables. The study design, the sample size, and the linkage methods varied greatly between studies. Only 15.6% of the investigations were conducted among randomly ascertained families. The remaining studies ascertained families based on specific clinical criteria such as familial combined hyperlipidemia, familial hypercholesterolemia, familial hypobetalipoproteinemia, CHD, myocardial infarction, low HDL-cholesterol concentrations, hypertension, obesity, and type 2 diabetes. Few studies were from genetically isolated populations, such as the Hutterites, North-Eastern Indians, and Pima Indians. Table 4 presents a summary of the loci providing evidence of linkage from the compendium of whole-genome scans. A total of 152 hits were identified, corresponding to an average of 4.8 positive loci per study reaching the suggestive threshold of significance (P ≤ 0.0023 or LOD ≥ 1.7). To evaluate whether positive loci were randomly distributed across the genome, we plotted the observed number of hits against the expected number of hits for chromosomes 1–22 (Fig. 3; see Materials and Methods). A close relationship between positive loci and theoretical gene content was apparent, which suggests that the null hypothesis of random linkage across the genome cannot be rejected. On the other hand, some chromosomes showed an increased number of observed hits relative to expected hits: chromosomes 21, 13, 15, and 2 had observed-to-expected hit ratios of 2.7, 2.4, 1.8, and 1.5, respectively.
Figure caption (fragment): ApoA-I is adjusted for the effects of age, age², age³, gender, and body mass index. The horizontal dotted line is a reference corresponding to P = 0.01.

DISCUSSION
The avalanche of information anticipated from whole-genome linkage scans (23) has certainly materialized in the field of blood lipids and lipoproteins. The accumulating information may soon be overwhelming even for specialists. Here, we have produced a summary of the loci providing evidence of linkage from published genome-wide scans carried out on blood lipid-related phenotypes (Table 4). We believe that such a compendium will be useful to others in the field. For instance, it may help investigators quickly access the linkage data for a specific genomic region or a particular phenotype. We have integrated all linkage signals on the same map to facilitate comparisons across studies.
To provide an example of the usefulness of this compendium, we performed a new genome-wide search of apoB, LDL-apoB, and apoA-I levels. The results suggested the existence of a susceptibility locus for LDL-apoB on 18q21.32 and a second one for apoA-I on 3p25.2. Additional linkages were observed with the allele-sharing linkage method, but the lack of consistency across linkage methods makes the significance of these findings quite doubtful. From Table 4, we can easily identify the other QTLs that have been reported in the same regions by previous genome-wide scans. Interestingly, the apoA-I locus on 3p overlaps with the locus for low HDL-cholesterol levels reported in Finnish families (26) and with the locus for LDL-3 (phenotype defined as the cholesterol concentration in small LDL particles) observed in Mexican Americans (27). The region is also close to the locus for familial hypobetalipoproteinemia (28). In contrast, the LDL-apoB locus (18q21.32) observed in this study represents a newly identified locus. Although some genome-wide scans have been performed on apoB levels before (9–11), this study was the first to investigate the LDL-apoB subfraction. Genome-wide scans of subphenotypes have been successful in the past (27, 29), and the use of a subphenotype here may explain the identification of this new locus on 18q21.32.
Our biggest challenge in the compilation of Table 4 was the choice of a significance level for inclusion of a linkage result. This question is related to the ongoing debate concerning significance levels appropriate for reporting evidence of linkage from genome-wide scans of complex traits (23, 30–34). With the emergence of genome-wide scans to identify loci underlying complex traits, geneticists have proposed refinements of the originally proposed LOD score threshold of 3 (35). Although some advocated a continuation of the more stringent guideline to control false-positives (23), others suggested more flexible guidelines to hunt down genes with small effects believed to be involved in complex traits (31). Rao (32) proposed a middle ground, for the purpose of carrying out follow-up studies, to deal with both false-positive and false-negative claims. The recommendation was to increase tolerance from one false-positive in 20 genomic scans assuming a continuous map, as suggested by Lander and Kruglyak (23), to one per scan assuming a more realistic map density of 400 markers, and to additionally rely on replication. These modifications set the nominal P value to 0.0023, which corresponds to a LOD score of 1.75 (22, 36). Interestingly, this new threshold corresponds to what was called "putative" linkage by Thomson (31) and "suggestive" linkage by Lander and Kruglyak (23). Accordingly, all point-wise P values at or below this threshold were included in Table 4.
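The LOD-to-P pairings used throughout (3.00 and 0.0001, 1.75 and 0.0023, 1.18 and 0.01) can be checked numerically. This sketch assumes that the likelihood-ratio statistic χ² = 2 ln 10 × LOD follows a 50:50 mixture of a point mass at zero and a χ² with 1 df, as is common when testing a variance on the boundary of its range. Under that assumption, the point-wise P value is half the χ²₁ upper tail. The assumption is ours, chosen because it reproduces the figures in the text, and the function names are hypothetical.

```python
"""Converting between LOD scores and point-wise P values (illustrative sketch)."""
from math import erfc, log, sqrt
from statistics import NormalDist

LN10 = log(10.0)


def lod_from_loglik(loglik_null: float, loglik_alt: float) -> float:
    """LOD = [(-2 ln L0) - (-2 ln L1)] / (2 ln 10), i.e. chi-square / (2 ln 10)."""
    chi2 = 2.0 * (loglik_alt - loglik_null)
    return chi2 / (2.0 * LN10)


def lod_to_p(lod: float) -> float:
    """One-sided P value: half the upper tail of chi-square(1 df) at 2 ln 10 * LOD."""
    if lod <= 0.0:
        return 1.0
    chi2 = 2.0 * LN10 * lod
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return 0.5 * erfc(sqrt(chi2 / 2.0))


def p_to_lod(p: float) -> float:
    """Inverse of lod_to_p for 0 < p < 0.5, via the standard normal quantile."""
    z = NormalDist().inv_cdf(1.0 - p)
    return max(z, 0.0) ** 2 / (2.0 * LN10)


if __name__ == "__main__":
    for lod in (3.00, 1.75, 1.18):
        print(f"LOD {lod:.2f} -> P {lod_to_p(lod):.5f}")
```

Run as a script, it prints P values of approximately 0.0001, 0.0023, and 0.01 for the three thresholds. Under a plain two-sided χ²₁ reference without the mixture, each P value would double.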
For complex traits, independent replication of an earlier finding gives substantial credibility to the results. Accordingly, it is standard practice in the literature to compare newly identified loci with those previously published, even if the lipid-related phenotypes are not the same. However, this practice is questionable considering the large number of genome scans performed to date and the uncertainty about the location estimates of a QTL. Indeed, determining whether a given study has replicated an earlier finding is not simple, particularly when different markers have been used. When do we accept that two location estimates in a genomic region represent the same QTL? This issue has been addressed before, and it has been proposed that the location estimate may sometimes be several centimorgans away from the true locus (37). In fact, the 95% confidence interval of the location estimate can span tens of centimorgans depending on the size and number of families, the penetrance of the locus, and heterogeneity. Based on the above, the cumulative evidence from genome-wide screens for lipid-related phenotypes now covers a very large portion of the genome (Table 4). It is likely that the entire genome will be covered with at least suggestive evidence of linkage in a few years, and replication of findings will be guaranteed in future genome-wide scans if the lipid-related phenotypes are grouped together. This phenomenon is not unique to lipid-related phenotypes. The evolution of the human obesity gene map is a good example of this trend, with more than 300 genes, markers, and chromosomal regions that have now been associated or linked with human obesity phenotypes (38).
Fig. 3. Regression analysis of observed and expected hits on the autosomal chromosomes. The observed hit ratio of each chromosome was obtained as (number of hits on a specific chromosome/all 152 hits) × 100, and the expected hit ratio of each chromosome was obtained as (number of genes on a specific chromosome/total number of genes in the genome) × 100. The gene content of each chromosome and of the genome is from Venter et al. (25).
Despite the large number of QTLs reported to date, a coherent and comprehensive picture of the loci contributing to variation in lipid and lipoprotein levels has not been achieved. This is demonstrated by the inability to reject the hypothesis of random positive linkage (Fig. 3). We have learned that the genetic mechanisms underlying the predisposition to favorable or unfavorable plasma lipoprotein-lipid levels are more complicated than previously thought. The emergence of such a large number of potential susceptibility loci for lipid-related phenotypes should be interpreted with caution before replication is claimed. It is commonly accepted that a P value of less than 0.01 from an independent study sample is sufficient to declare replication of an earlier significant linkage (23). However, a large part of the data in Table 4 shows only suggestive linkage, which implies that some of the loci are false-positives. In addition, given the large number of genome scan reports and the inability to precisely localize the loci (37), many regions are likely to be replicated solely by chance. For example, more than 30 loci reached the P < 0.01 threshold in the present genome scan of apolipoprotein levels, and many of them could be considered replicated linkage. New strategies to deal with these issues are urgently needed.
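How quickly chance "replication" becomes likely can be illustrated with a deliberately crude toy model. Earlier hits are treated as points placed independently and uniformly along the genome, chromosome ends are ignored, and a new locus counts as replicating an earlier one if it falls within ±w cM of any hit. The genome length and window width in the example are assumptions, not values from this study. Only the count of 152 hits comes from Table 4. Function names are hypothetical.

```python
"""Chance 'replication' under uniformly scattered hits (toy model, illustrative only)."""


def chance_overlap(n_hits: int, window_cm: float, genome_cm: float) -> float:
    """Probability that a random new locus lies within +/- window_cm of at least one hit.

    Assumes n_hits positions placed independently and uniformly on a genome of
    genome_cm centimorgans and ignores chromosome ends.
    """
    covered_by_one = min(1.0, 2.0 * window_cm / genome_cm)
    return 1.0 - (1.0 - covered_by_one) ** n_hits


def expected_chance_replications(n_new: int, n_hits: int,
                                 window_cm: float, genome_cm: float) -> float:
    """Expected number of new loci that would 'replicate' an earlier hit purely by chance."""
    return n_new * chance_overlap(n_hits, window_cm, genome_cm)
```

With 152 hits, an assumed ±10 cM window, and an assumed autosomal map length of 3,500 cM (a round figure), `chance_overlap(152, 10, 3500)` is about 0.58. Under this model, more than half of randomly placed new loci would land near an earlier hit. Real hits cluster, which would lower this figure, while wider confidence intervals would raise it.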

CONCLUSION
In summary, the identification of genes for complex human diseases and their associated biological traits has had limited success to date. This limited success may be explained by genetic heterogeneity, incomplete penetrance, epistasis, phenocopy, and pleiotropy (39), and undoubtedly by other factors. In this paper, we provide a compendium of previous results from genome scan studies of lipid-related phenotypes. We have recorded a large number of loci covering a large portion of the genome. The number of false-positives is difficult to assess but is likely to be high because positive findings are more frequently published. Because of this publication bias, many of the positive hits presented in Table 4 will eventually turn out to be false-positives. Accordingly, even though a single tool summarizing the extensive literature on the subject may prove useful, it should be used with caution. Caution is also advised when claiming replication, because a large number of loci have been reported and the probability of claiming replication just by chance is increasingly high. We also report a new genome scan of apolipoprotein levels. Linkage was tested using both allele-sharing and variance-component methods. Many loci provided weak to moderate evidence of linkage, but only two QTLs were supported by both analytical methods.