Introduction

Coronary heart disease (CHD) is one of the leading causes of death.1 Multiple genetic risk variants with small to moderate effects on the susceptibility to CHD have been identified in genome-wide association (GWA) studies; however, these variants explain only a small fraction of the heritable component of the risk of CHD.2, 3 Therefore, many genetic variants remain to be discovered. Among the discovered genes, many are related to lipid metabolism.4, 5 New study designs will be necessary for uncovering additional associated variants. Populations that already have high lipid levels are well suited to the search for genes that increase the risk of CHD beyond dyslipidemia.6 Therefore, we performed a GWA study in a selected sample of familial hypercholesterolemia (FH) patients with severe hypercholesterolemia caused by mutations in the low-density lipoprotein receptor (LDLR) gene (MIM 143890).7

Traditional GWA studies are performed on extremely large samples that include tens of thousands of individuals.8 In contrast, we used an extreme genetics approach to enhance the statistical power. In this method, only the individuals at the extremes of the phenotype are genotyped. In our study, we genotyped FH patients who had CHD at a very young age as cases, and elderly patients who, despite their high low-density lipoprotein (LDL) cholesterol level, had not experienced CHD as controls. We hypothesized that this design would enhance the identification of additional genetic risk variants of CHD in FH.

Materials and methods

The schematic study design is shown in Figure 1. We performed the study in two stages. Stage I included a GWA study in the Dutch ‘Association of CHD Risk in a Genome-wide Old-versus-young Setting’ (ARGOS) sample. Stage II consisted of genotyping of a second case–control sample, referred to as Mutations Associated with Risk of Cardiovascular disease in volunteers with Hypercholesterolemia (MARCH), and a large FH cohort, referred to as FH Follow-up (FHFU). All patients were of Caucasian descent. In ARGOS, this was confirmed by multi-dimensional scaling of identical-by-state pairwise distances. All patients gave informed consent and the local ethics committees approved the protocol.

Figure 1

Study design. Abbreviations: CHD, coronary heart disease; FH, familial hypercholesterolemia; GWA, genome-wide association; SNP, single nucleotide polymorphism.

Description of the populations

Gene-finding stage

ARGOS. The ARGOS sample consists of 500 patients, who were selected from 17 000 Dutch FH patients with a mutation in the LDLR gene. They were recruited in the Netherlands by the nationwide molecular screening program of the ‘Stichting Opsporing Erfelijke Familiare Hypercholesterolemie’.9 Phenotypic data (including CHD events) were acquired from general practitioners and by reviewing medical records at the lipid and cardiologic clinics. We selected the 264 youngest patients with premature CHD and the 236 oldest patients without any CHD, stratified for sex. The maximum age of the female cases was 60 years and that of the male cases 45 years. The minimum age of the controls was 65 years for males and 70 years for females. First and second degree family members were excluded. CHD was defined as the presence of at least one of the following: (i) myocardial infarction (MI), proved by at least two of the following: (a) classical symptoms (>15 min), (b) specific abnormalities on electrocardiography, and (c) elevated cardiac enzymes (>2 × upper limit of normal); (ii) percutaneous coronary intervention or other invasive procedures; (iii) coronary artery bypass grafting (CABG). Patients with angina pectoris were excluded because in the majority of cases this diagnosis could not be assured by objective data.

MARCH

The MARCH study group consisted of 413 FH patients (190 cases, 223 controls), from Italy, Norway, Spain, and the United Kingdom. A few patients were born in another country but all were Caucasian. All patients had clinically proven FH and a mutation in either the LDLR or the APOB gene. The maximum age of the female cases was 59 and that of the male cases 45 years. The minimum age of the controls was 50 years for males and 60 for females. For the cases in this cohort, the same CHD definition was applied as described above for the ARGOS sample with the addition of (iv) angina pectoris (AP), as this phenotype was accurately addressed in this cohort. AP was diagnosed as classical symptoms in combination with at least one unequivocal positive result of one of the following: (a) exercise test, (b) nuclear scintigram, (c) dobutamine stress ultrasound, or (d) >70% stenosis on a coronary angiogram. The controls had no manifest CHD.

FHFU study

The second replication cohort consisted of Dutch clinically proven heterozygous FH patients who were recruited from 27 lipid clinics in the Netherlands between 1989 and 2002.10, 11 For the cases in this cohort, the same CHD definition was applied as described above for the MARCH sample. The controls had no manifest CHD. The DNA of 2073 unrelated patients was available for the present analysis. A total of 51 FH patients had already been included in the ARGOS sample and were, therefore, removed from the FHFU group, leaving 2022 DNA samples for analyses.

Additional cohorts

In addition to these three FH cohorts, selected single nucleotide polymorphisms (SNPs) were also genotyped in population-based cohorts that have been described in detail previously: The Rotterdam Study (n=5207, a cohort of elderly inhabitants of a suburb of Rotterdam, from which we selected patients with prevalent CHD to ascertain that CHD occurred at relatively young age; 777 CHD cases), deCODE (n=12 848, selected from the national Icelandic deCODE database; 726 cases with premature MI) and GerMIFSII (n=2520, a German cohort with 1222 proven MI cases before the age of 60).12, 13, 14

Genotyping in ARGOS

The samples of participants of the ARGOS group were assayed with Illumina Infinium HumanHap550K Chips (Illumina, San Diego, CA, USA) at Erasmus University Medical Center in Rotterdam, the Netherlands. Samples were processed according to the Illumina Infinium II manual. In brief, each sample was whole-genome amplified, fragmented, precipitated, and resuspended in the appropriate hybridization buffer. After hybridization, these denatured samples were processed for the single-base extension reaction and were stained and imaged on an Illumina Bead Array Reader. Normalized bead-intensity data obtained for each sample were loaded into the Illumina Beadstudio software where the fluorescent intensities were converted into SNP genotypes. Using Quanto software (http://biostats.usc.edu/software), considering ARGOS as a conventional case–control study and assuming large effect sizes owing to genotyping only the phenotypic extremes, we estimated that we had >80% statistical power to detect effect sizes >2.5 at a genome-wide significance level and effect sizes >1.46 at a P-value threshold of 0.05.
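The kind of power estimate described above can be illustrated with a rough sketch. This is not the Quanto calculation: it assumes a simple two-sided allelic test that compares allele frequencies between cases and controls under a normal approximation, and the control allele frequency (`control_freq`) is a hypothetical input that the text does not report. The function name `allelic_power` is likewise only illustrative.

```python
import math
from statistics import NormalDist


def allelic_power(odds_ratio, control_freq, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumption (not from the paper): the per-allele odds ratio shifts the
    risk-allele frequency in cases relative to controls, and each person
    contributes two independent alleles.
    """
    if not 0.0 < control_freq < 1.0:
        raise ValueError("control_freq must lie strictly between 0 and 1")
    norm = NormalDist()
    p0 = control_freq
    # Allele frequency in cases implied by the per-allele odds ratio.
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    # Standard error of the frequency difference (2 alleles per person).
    se = math.sqrt(p1 * (1.0 - p1) / (2 * n_cases)
                   + p0 * (1.0 - p0) / (2 * n_controls))
    z_alpha = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(p1 - p0) / se
    # Probability that the test statistic falls in either rejection region.
    return norm.cdf(shift - z_alpha) + norm.cdf(-shift - z_alpha)


# Illustrative call: 249 cases and 217 controls (the genotyped ARGOS sample),
# with a hypothetical control allele frequency of 0.30.
power_gw = allelic_power(2.5, 0.30, 249, 217, alpha=5e-8)
power_nominal = allelic_power(1.46, 0.30, 249, 217, alpha=0.05)
```

Because the result depends strongly on the assumed allele frequency and on the test used, the numbers from this sketch should not be expected to match the Quanto estimates.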

Genotyping frequencies have been deposited at the European Genome-phenome Archive (EGA, http://www.ebi.ac.uk/ega/) which is hosted at the EBI, under accession number EGAS00001000734. The genotyping data are also available via the Dutch Biobanking and Biomolecular Research Infrastructure (https://catalogue.bbmri.nl/biobanks/, accession number 199).

Quality-control filtering

After genome-wide genotyping, a call-rate threshold above 98% was used for inclusion of the samples. Subjects were excluded from the sample if their sex was inconsistent with genetic data from the X chromosome or if duplicate samples produced inconsistent genotypes. SNPs were excluded if they (i) were successfully genotyped in <90% of the cases and controls, (ii) had a minor allele frequency <1% in the population, (iii) showed deviation from Hardy–Weinberg equilibrium (P-value<0.0001), or (iv) were monomorphic across all samples. After these exclusions, 535 179 SNPs remained.
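The SNP-level exclusion criteria listed above can be sketched as a single filter function. This is a minimal illustration, not the pipeline used in the study: it assumes genotype counts pooled over cases and controls, and it uses a 1-df chi-square test for Hardy–Weinberg equilibrium as a simplification (the text does not state which test was applied). The function names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: the test is undefined, report no deviation
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa, n_ab, n_bb, n_samples,
                  min_call_rate=0.90, min_maf=0.01, hwe_alpha=1e-4):
    """Apply the four SNP exclusion criteria to one SNP's genotype counts."""
    called = n_aa + n_ab + n_bb
    if n_samples == 0 or called / n_samples < min_call_rate:
        return False  # (i) genotyped in too few samples
    p = (2 * n_aa + n_ab) / (2 * called)
    maf = min(p, 1.0 - p)
    if maf == 0.0:
        return False  # (iv) monomorphic
    if maf < min_maf:
        return False  # (ii) minor allele frequency too low
    # (iii) deviation from Hardy-Weinberg equilibrium
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha
```

The default thresholds mirror the values quoted in the text (90% call rate, 1% minor allele frequency, HWE P-value of 0.0001).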

Genotyping in MARCH and FHFU

In MARCH and FHFU, the genotypes of selected SNPs were determined using fluorescence-based TaqMan allelic discrimination assays and analyzed on an ABI Prism 7900 Sequence Detection System (Applied Biosystems, Foster City, CA, USA). Reaction components and amplification parameters were based on the manufacturer’s instructions using an annealing temperature of 60 °C. Results were scored blinded to CHD status. SNPs were excluded following the same quality criteria as in the gene-finding stage.

Statistical analysis

The genomic inflation factor was calculated using the mean of the χ2-tests generated on all SNPs that were tested. For each SNP which passed the quality control in the ARGOS population, the association with risk of CHD was examined in an additive genetic model using a logistic regression model adjusted for sex. Given the fact that age was an inclusion criterion to generate the contrast between cases and controls of the ARGOS population, we did not adjust for age.
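As an illustration of the genomic inflation factor, the sketch below computes λ from a list of 1-df χ2 statistics. The text states that the mean of the test statistics was used; the median-based variant is included only for comparison, because it is also commonly reported. The function name and the `method` parameter are hypothetical.

```python
from statistics import NormalDist, fmean, median

# Expected mean and median of a chi-square distribution with 1 df.
CHI2_1DF_MEAN = 1.0
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.455


def genomic_inflation(chi2_stats, method="mean"):
    """Ratio of the observed to the expected chi-square summary statistic."""
    if not chi2_stats:
        raise ValueError("chi2_stats must not be empty")
    if method == "mean":
        return fmean(chi2_stats) / CHI2_1DF_MEAN
    if method == "median":
        return median(chi2_stats) / CHI2_1DF_MEDIAN
    raise ValueError("method must be 'mean' or 'median'")
```

A value close to 1 indicates that the test statistics follow the null distribution overall, which argues against systematic inflation such as that caused by population stratification.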

We selected all SNPs that were associated with CHD with a P-value <1.00 × 10−4 in ARGOS to analyze in the second stage. In the second stage, the association with risk of CHD was examined using logistic regression in MARCH and the FHFU. In MARCH, we adjusted for sex only, as age was a selection criterion similar to ARGOS. In the FHFU cohort, we adjusted for age, sex, and statin use, as statin use was expected to be a confounder and, in contrast to ARGOS and MARCH, it was well documented in that cohort. Using Bonferroni correction, significance threshold was 0.0017 (0.05/29). We performed a z-based meta-analysis to combine the results of MARCH and FHFU in the second stage. Furthermore, we combined the results of all three studies using z-based meta-analysis.
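The Bonferroni threshold and the z-based meta-analysis described above can be sketched as follows. The text does not specify how the cohorts were weighted; this sketch assumes weights proportional to the square root of the sample size (a weighted Stouffer method), and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def signed_z(odds_ratio, p_value):
    """Convert a two-sided P-value and the effect direction into a z-score."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return z if odds_ratio >= 1.0 else -z


def z_meta(z_scores, sample_sizes):
    """Combine z-scores with square-root-of-sample-size weights.

    Returns the combined z-score and its two-sided P-value.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z = numerator / denominator
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# 29 SNPs were carried forward to stage II, giving the 0.05/29 threshold.
threshold = bonferroni_threshold(0.05, 29)
```

Signed z-scores let cohorts with opposite effect directions cancel each other, so a SNP only gains support in the combined analysis when the direction of effect is consistent across cohorts.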

We used Plink version 1.03 to run the GWA study in ARGOS, SPSS version 15 to run the logistic regression models in MARCH and FHFU, and the ‘meta’ and ‘rmeta’ packages running under R to perform the meta-analysis.15, 16, 17

Replication of well-known SNPs associated with CHD

To test the effect of previously identified genetic risk variants on CHD in ARGOS, we analyzed the SNPs described in a large-scale meta-analysis in the CARDIoGRAM consortium (22 233 cases and 64 762 controls). Whenever a SNP was not available on the Illumina HumanHap550 chip, we used a proxy as identified by SNAP (http://www.broadinstitute.org/mpg/snap/ldsearchs.php).

Results

Characteristics of the ARGOS cohort

Out of 17 000 Dutch FH patients with a known LDLR mutation, we selected the 264 youngest FH patients with CHD and the 236 oldest FH patients without any CHD. The mean±SD (range) age was 41.7±8.3 (23–59 years) in cases and 75.6±5.9 (65–years) in controls. A total of 249 cases and 217 controls were successfully genotyped. There were no significant differences in age, smoking, or plasma cholesterol levels between the patients who were and those who were not successfully genotyped (data not shown). General characteristics of the genotyped patients are shown in Table 1 and the age distribution in Supplementary Figure 1.

Table 1 Characteristics of the ARGOS population and the replication populations

Eighty-one cases (32.5%) had a receptor-negative LDLR mutation, that is, a mutation leading to complete loss of function of the LDL receptor, whereas only 42 controls (19.4%) had a receptor-negative mutation (P=0.004). This was mainly due to an overrepresentation of the c.1359-1G>A mutation, which was present in 46 cases and in only 21 controls. The other mutations were equally distributed (Supplementary Table 1). On average, the controls were 34 years older than the cases (P-value <0.001). Consequently, hypertension and diabetes mellitus were more often present in the controls than in the cases. More cases than controls were ever smokers (Table 1).

Stage I (GWA analysis)

After quality-control filtering, we included 535 179 SNPs in the GWA analysis. A quantile–quantile plot of the observed against expected P-value distribution is shown in Figure 2. The genomic inflation factor (λgc) was 1.01 in the total sample. Supplementary Figure 2 illustrates the primary findings from the GWA analysis in the ARGOS population and presents P-values for each of the interrogated SNPs across the chromosomes. For a total of 40 SNPs clustered around 21 loci on all chromosomes except 3, 6, 12, 15, 16, and 18–21, the P-value was lower than the threshold of 1 × 10−4 (Table 2). Of these, 11 were in complete linkage disequilibrium (LD) with a leading SNP in the same locus. Twelve out of 40 SNPs were located in a cluster on chromosome 11p15; they were located in four different LD blocks and could be tagged by six SNPs. We took 29 SNPs, including the 6 SNPs in 11p15, to stage II.

Figure 2

QQ Plot.

Table 2 SNPs influencing coronary heart disease risk in the ARGOS population (with P<1.0 × 10−4)

Stage II

We successfully genotyped 28 SNPs in MARCH and all 29 SNPs in FHFU (Supplementary Table 2). For none of the SNPs was the P-value lower than the Bonferroni-corrected threshold of 0.0017 in either MARCH or FHFU. The smallest P-value was found for rs176388 (odds ratio (OR) 0.33, P=0.042) in MARCH. Although the direction of the effect was the same in FHFU, the association was not significant (OR 0.80, P=0.36). We performed a meta-analysis to combine the results of the analysis in MARCH and FHFU. None of the SNPs reached the Bonferroni-corrected significance threshold after combining the results of MARCH and FHFU.

Finally, we combined the results of ARGOS, MARCH, and FHFU; however, none of the top SNPs identified in ARGOS reached the genome-wide significance level. The smallest combined P-value was 4 × 10−4 for rs176388, for which the A allele showed a protective effect in all cohorts. Second best was rs1380945, for which all cohorts showed that the G allele was associated with increased CHD risk (Table 3 and Supplementary Figure 3).

Table 3 Results of the genome-wide association study in the two FH replication populations

Genotyping additional cohorts from the general population did not show any genome-wide significance.

Replication of well-known SNPs associated with CHD in the general population

To examine whether the genetic risk variants identified in GWA studies in the general population also showed an effect in ARGOS and to test whether effect sizes were larger in our ‘extreme genetics’ population, we examined the association of 25 previously reported genetic risk factors in CARDIoGRAM. In this meta-analysis, 9 out of 12 previously reported CHD loci were confirmed with a P-value of <5.0 × 10−8 and 13 new ones were identified (Table 4).5 None of the studied SNPs were significantly associated with CHD in ARGOS. The lowest P-values were obtained for rs4977574 at 9p21 (CARDIoGRAM OR 1.29, 95% CI 1.23–1.36; P=1.35 × 10−22; ARGOS OR 1.28, 95% CI 1.00–1.67; P=0.05) and for the SNP in the proprotein convertase subtilisin/kexin type 9 (PCSK9) gene, which did not reach genome-wide significance in CARDIoGRAM (CARDIoGRAM OR 1.08, 95% CI 1.05–1.11; P=9.1 × 10−8; ARGOS OR 1.52, 95% CI 1.10–2.12; P=0.01).

Table 4 Results of earlier identified genetic risk factors in ARGOS5

Discussion

In this study, by using an extreme genetics approach, we aimed to fortify our statistical power to identify novel genetic risk variants for CHD. However, none of the suggestive findings were either confirmed in the second stage or reached the genome-wide significant threshold in a meta-analysis of all populations combined.

We expected to identify larger effect sizes in our GWA study, compared with traditional GWA studies on CHD, for two reasons. First, we studied genetic risk variants for CHD in a cohort of FH patients, and hypercholesterolemia is one of the most important risk factors for CHD. Based on Rothman’s model of causation, one would expect that in the presence of similar environmental factors, risk variants in genes will be associated with larger outcome effects.6 Second, we applied an extreme selection approach. Therefore, the genetic contrast between cases and controls was expected to increase.18 This incremental contrast has been shown to increase the power in simulation studies of quantitative traits. The effect sizes found when selectively genotyping only the phenotypic extremes will be increased.18, 19, 20 Plomin suggested that common disorders could also be considered as quantitative traits, as the risk of a common disease in the population could be regarded as a distribution of ‘polygenetic liability’ to a disease. The extreme sampling approach should, therefore, produce larger effect sizes in our study as well. Thus, the power was expected to be relatively high, despite the reduction in number of individuals as a result of the selection criteria.18, 19, 21 Our study is the first to apply this method using real data. Using Quanto, we confirmed that the study was sufficiently powered to identify risk variants with large effect sizes in this high-risk group of subjects. Our findings, more specifically our odds ratios, indicate that the leverage in effect size using this approach is, in fact, quite modest. This is also clear from the CARDIoGRAM data in our study: for example, the well-replicated locus at 9p21.3 had an OR of 1.28, very close to the value of 1.25 that was found in the original study of subjects from the general population.
Most other known CHD risk SNPs identified in earlier studies and published by Schunkert et al.5 were not associated with statistically significant effects in this FH sample. However, our study was underpowered for this analysis if the true effect sizes were the same as reported and were increased less by our design than anticipated, so we can only look at the direction of the effect. Out of 18 SNPs that could be tested, 11 ORs were in the same direction, whereas only 6 were not (Table 4). Of these 11, the most interesting was the SNP in PCSK9. This SNP is associated with an 8% higher risk of CHD in the general population; however, the odds in ARGOS were increased by >50%. Although the P-value in CARDIoGRAM did not reach genome-wide significance, it is tempting to speculate that this finding and the enhanced effect in ARGOS are not due to chance, but might reflect an interaction between this gene and cholesterol levels. PCSK9 encodes proprotein convertase subtilisin/kexin type 9, which is involved in the intracellular degradation of LDL receptors. PCSK9 levels correlated inversely with LDL cholesterol levels in FH patients.22 PCSK9 inhibitors are now being tested in phase III trials23 and are expected to be highly effective in FH patients.24, 25

Our definition of extreme groups might have been too restrictive, as it constitutes a very small proportion of the FH sample (500 out of 17 000).

Nevertheless, our findings may help to understand why genetic risk prediction models have not yet succeeded. In our approach, we attempted to identify novel genes by selecting the extreme groups of the CHD risk distribution spectrum. Genetic risk prediction studies, on the other hand, start with genetic information and estimate who will end up in the low- and the high-risk group.26 As we could demonstrate that the effect sizes in this design are not as enlarged as we expected, the contrast needed for prediction might be less than anticipated as well: genetic risk factors most likely have a more gradual distribution over the population instead of being over- or underrepresented in the phenotypic extremes and the combination of classical and genetic risk factors defines the CHD risk.

Our approach has a number of limitations. Finding extreme cases is a challenging effort. Although FH is a relatively common genetic disorder, collecting a large sample of subjects either with early onset CHD or healthy aging is difficult. The low genomic inflation factor (1.01) in the total sample indicated that population admixture was not likely.27 We do realize that from a statistical point of view our sample size was not large enough to detect genes with small effects. Another issue might be whether the difference in age between cases and controls introduces extra confounding. Although only Caucasian patients were studied, first and second degree family members were excluded, and we had phenotypic data, we cannot exclude the presence of an unknown confounder. A last limitation of a GWA study in FH subjects is that the results may not necessarily apply to the general population.28 Results may be restricted to FH patients or to hypercholesterolemic patients in general. Also, it might be possible that CHD risk in the young is genetically different from CHD risk at an older age.

We conclude that the genetics of CHD risk in FH is complex and that, even when applying an ‘extreme genetics’ approach, we did not identify new genetic risk variants. Most likely, this method is not as effective in leveraging effect size as anticipated, and may, therefore, not lead to significant gains in statistical power. Also, this study might explain why genetic risk prediction modeling has so far yielded disappointing results. The odds ratio associated with genetic variation at the PCSK9 locus points to important consequences for PCSK9 activity in FH patients and provides hope for the novel approach to lower these levels through monoclonal antibodies in order to prevent CHD.