Analyses of more than 60,000 exomes question the role of numerous genes previously associated with dilated cardiomyopathy

Abstract
Background: Hundreds of genetic variants have been described as disease-causing in dilated cardiomyopathy (DCM). Some of these associations are now being questioned. We aimed to determine the prevalence of variants previously associated with DCM in the Exome Aggregation Consortium (ExAC) data, in order to identify potentially false-positive DCM variants.
Methods: Variants listed as DCM disease-causing variants in the Human Gene Mutation Database (HGMD) were searched for in ExAC. Pathogenicity predictions for these variants were mined from the dbNSFP v2.9 database.
Results: Of the 473 DCM variants listed in HGMD, 148 (31%) were found in ExAC. The expected number of individuals with DCM in ExAC is 25, based on the prevalence in the general population. Yet, 35 variants were found in more than 25 individuals. In 13 genes, all variants previously associated with DCM were identified in ExAC; four of these genes contained variants above our estimated cut-off. Prediction tools found ExAC variants to be significantly more tolerated than variants not found in ExAC (P = 0.004).
Conclusion: In ExAC, we identified a higher genotype prevalence of variants considered disease-causing than expected. More importantly, we found 13 genes in which all variants previously associated with DCM were identified in ExAC, questioning the association of these genes with the monogenic form of DCM.


Introduction
Familial dilated cardiomyopathy (DCM) is a heart muscle disorder defined by the presence of dilatation and systolic impairment of the left or both ventricles, in the absence of hypertension, coronary artery disease, or valvular abnormalities (Codd et al. 1989;Elliott et al. 2008;Jefferies and Towbin 2010). Dilated cardiomyopathy is a known risk factor for sudden cardiac death (SCD) and a major cause of heart failure (HF), and end-stage disease may necessitate cardiac transplantation in both children and adults (Elliott et al. 2008;Everly 2008;Jefferies and Towbin 2010). The disorder affects approximately 1:2500 individuals (Codd et al. 1989;Maron et al. 2006). Factors such as myocarditis, radiation, or toxins may lead to the development of the DCM phenotype. However, a large proportion of cases are idiopathic, and a genetic etiology is reported in 48% of these cases (Haas et al. 2015).
Remarkable progress has been made regarding the genetic background of DCM, associating the disease with hundreds of rare variants across many genes (Harakalova et al. 2015;The Human Gene Mutation Database 2015). Owing to recent advances in the field of genetic testing, recent reports have found variants previously associated with DCM to be either nonpathogenic or low-frequency polymorphisms (Andreasen et al. 2013). These findings now question the role of many variants associated with the monogenic form of DCM.
In 2014, the Exome Aggregation Consortium (ExAC) published a browser providing exome data on approximately 61,000 individuals, thereby providing information on low-frequency polymorphisms (Exome Aggregation Consortium).
We aimed to identify possible false-positive genetic variants previously associated with DCM by investigating the prevalence of previously reported DCM-associated variants in the ExAC data and comparing it with the expected prevalence of DCM in the same population.

Materials and Methods
In ExAC, next-generation sequencing of protein-coding regions of the genome was performed in 60,706 unrelated individuals from different population groups, divided into South Asians, Europeans (non-Finnish), Finnish, Africans, East Asians, Latinos, and others. No clinical data are available on the ExAC population. The Human Gene Mutation Database (HGMD) was searched with the term "dilated, cardiomyopathy", and missense, stop gained, and splice variants associated with DCM were identified (The Human Gene Mutation Database 2015). Variants found in HGMD were then systematically searched for in ExAC.
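The lookup step can be pictured as matching variant keys between two tables. The following is a minimal sketch only: the `Variant` record, the `split_by_presence` function, and every variant and count in it are hypothetical placeholders, not records from HGMD or ExAC, and the actual search procedure used in the study is not specified beyond the description above.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_by_presence(
    hgmd_variants: List[Variant],
    exac_allele_counts: Dict[Variant, int],
) -> Tuple[Dict[Variant, int], List[Variant]]:
    """Split disease-database variants into those present in, and absent from, the reference data."""
    found = {v: exac_allele_counts[v] for v in hgmd_variants if v in exac_allele_counts}
    absent = [v for v in hgmd_variants if v not in exac_allele_counts]
    return found, absent


# Placeholder records for illustration only (not real variants).
hgmd = [
    Variant("1", 1000, "A", "G"),
    Variant("2", 2000, "C", "T"),
    Variant("3", 3000, "G", "A"),
]
exac = {
    Variant("1", 1000, "A", "G"): 40,
    Variant("3", 3000, "G", "A"): 2,
    Variant("4", 4000, "T", "C"): 7,
}

found, absent = split_by_presence(hgmd, exac)
print(f"found in reference data: {len(found)}, not found: {len(absent)}")
```

Keying on chromosome, position, and both alleles is one common convention; matching on transcript-level annotation would be an equally plausible choice.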
Based on the DCM prevalence of 1:2500 (Codd et al. 1989;Maron et al. 2006), the estimated number of individuals with DCM in ExAC is approximately 25. Thus, a given variant with complete penetrance can be present 25 times in ExAC and still theoretically be the monogenic cause of DCM.
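The arithmetic behind this cut-off can be checked with a short sketch. The cohort size and prevalence are taken from the text; the constant and function names are illustrative, and rounding up to a whole number of individuals is an assumption about how the figure of approximately 25 was reached.

```python
import math

EXAC_INDIVIDUALS = 60_706      # cohort size stated in the text
DCM_PREVALENCE = 1 / 2500      # population prevalence stated in the text


def expected_cases(n_individuals: int, prevalence: float) -> float:
    """Expected number of affected individuals in a cohort of the given size."""
    return n_individuals * prevalence


expected = expected_cases(EXAC_INDIVIDUALS, DCM_PREVALENCE)
cutoff = math.ceil(expected)   # assumption: round up to whole individuals

print(f"expected DCM cases: {expected:.1f}, cut-off used: {cutoff}")
```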
The pathogenicity of stop gained and splice variants was not predicted by prediction tools.
The American College of Medical Genetics and Genomics (ACMG) recently proposed a new set of criteria for interpretation of variants (Richards et al. 2015). They recommended a five-tier terminology system using the terms "pathogenic", "likely pathogenic", "benign", "likely benign", and "uncertain significance". The term "likely" was defined as a 90% certainty. Necessary information was extracted from ExAC, HGMD, Ensembl, the Exome Variant Server (EVS), and published literature (The Human Gene Mutation Database 2015; Ensembl; Exome Aggregation Consortium; Exome Variant Server). In silico analysis was used as a tool to incorporate all information and classify variants (In silico ACMG program). This method was used to further examine and classify the variants found in the 13 genes questioned.
Differences in the distribution of the LR prediction categories between variants identified in ExAC and variants not identified in ExAC were assessed by a chi-squared test. A two-sided P-value <0.05 was considered significant.
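For readers unfamiliar with the test, the following is a minimal sketch of a Pearson chi-squared test on a 2x2 table (for example, tolerated versus not tolerated, in ExAC versus not in ExAC). The function name is illustrative, the counts in the example are hypothetical and are not the study's data, and the use of a 2x2 layout without continuity correction is an assumption, since the exact table layout is not given in the text.

```python
import math
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic and two-sided P-value for the table [[a, b], [c, d]].

    No continuity correction is applied. With one degree of freedom the
    P-value equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    if any(e == 0 for e in expected):
        raise ValueError("an expected cell count is zero; the test is undefined")
    observed = [a, b, c, d]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for illustration only:
# rows = found in ExAC / not found; columns = tolerated / not tolerated.
statistic, p_value = chi_squared_2x2(20, 10, 10, 20)
print(f"chi-squared = {statistic:.2f}, P = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```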

Results
We found 473 variants previously associated with DCM in HGMD, and 148 (31%) of these were identified in ExAC (Table 1). Eight were stop gained variants, nine were splice variants, and 131 were missense variants (Table S1). The missense variants were found in both homozygous and heterozygous individuals. The stop gained variants were all heterozygous and were found in five different genes (BAG3, LAMA4, PLN, TTN, and VPS13A). The splice variants were found in four different genes (FLNC, TNNT2, TTN, and LMNA) and were likewise all heterozygous.
The 148 variants were identified in 7928 alleles corresponding to 7743 individuals carrying a variant when taking homozygous alleles into account. However, this is only valid if we assume that no individual in the ExAC population has more than one DCM-associated variant. On average, the 148 variants found in ExAC have been screened for in 55,366 individuals. This corresponds to a DCM genotype prevalence of 1:7 (7743:55,366) in ExAC.
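The relationship between these figures can be reproduced with a short sketch. All three input numbers come from the text; the variable names are illustrative, and the derived count of homozygous carriers holds only under the stated assumption that no individual carries more than one DCM-associated variant.

```python
TOTAL_ALLELES = 7928          # alleles carrying one of the 148 variants (from the text)
CARRIER_INDIVIDUALS = 7743    # individuals carrying a variant (from the text)
MEAN_SCREENED = 55_366        # average number of individuals screened per variant (from the text)

# A heterozygous carrier contributes one allele and a homozygous carrier two,
# so the surplus of alleles over carriers equals the number of homozygous carriers.
homozygous_carriers = TOTAL_ALLELES - CARRIER_INDIVIDUALS

# Genotype prevalence expressed as "1 in N".
one_in_n = MEAN_SCREENED / CARRIER_INDIVIDUALS

print(f"homozygous carriers implied: {homozygous_carriers}")
print(f"genotype prevalence: 1 in {one_in_n:.2f}")
```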
When comparing variants found in ExAC with those not found in ExAC, we found a significantly larger number of tolerated variants in ExAC when using LR prediction (P = 0.004). However, there was no difference in the number of damaging and neutral variants between the two groups (P = 0.07 and P = 0.09, respectively).
The pathogenicity of variants found in the 13 genes in which all HGMD listed variants were identified in ExAC was further classified according to the ACMG guidelines. Of the 22 variants, 14 were of uncertain significance, three were classified as likely benign, three as benign, and two as pathogenic (Table 3).

Discussion
In a recently published exome browser, ExAC, we identified 148 variants previously associated with DCM. This corresponds to 31% of all variants associated with DCM, questioning a substantial proportion of the genetic background. The estimated prevalence of DCM is 1:2500 in the general population (Codd et al. 1989;Maron et al. 2006); we would therefore expect around 25 individuals to have DCM in the ExAC data. The 148 variants found in ExAC were identified in 7743 individuals, corresponding to a genotype prevalence of 1:7. Thus, the genotype prevalence was 300- to 400-fold higher than the expected phenotype prevalence of DCM. This could relate to low penetrance, though that is unlikely to explain a difference of this size, or it might relate to lower pathogenicity of these variants. We found 35 variants to be present in more than 25 individuals. Recent guidelines state that an allele frequency higher than expected for a disease in a control population is a strong indication for a benign interpretation (Richards et al. 2015). This supports the notion that these genetic variants are less likely to be monogenic causes of DCM.
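The size of the gap between genotype prevalence and expected phenotype prevalence follows directly from the reported figures, as this brief sketch shows. The inputs are taken from the text; the function name is illustrative.

```python
def fold_difference(carriers: int, screened: int, disease_prevalence: float) -> float:
    """Ratio of observed genotype prevalence to expected phenotype prevalence."""
    genotype_prevalence = carriers / screened
    return genotype_prevalence / disease_prevalence


# 7743 carriers among an average of 55,366 screened individuals,
# compared with a population prevalence of 1:2500.
fold = fold_difference(7743, 55_366, 1 / 2500)
print(f"genotype prevalence is about {fold:.0f}-fold the expected phenotype prevalence")
```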
The remaining 113 variants found in ExAC were present in <25 alleles and an interpretation of these variants is therefore more complicated. LR predicted variants found in ExAC to be significantly more tolerated (P = 0.004). This could indicate that variants identified in ExAC are less likely disease-causing in comparison to those not found in ExAC.

Table 1. DCM-associated variants identified in HGMD and found in ExAC. (Columns: No. of variants found in ExAC; No. of variants found in HGMD.)
The most important finding of this study is the identification of 13 genes in which all variants previously associated with DCM were found in ExAC (Table 3). This strongly questions the role of these genes in DCM pathogenesis and further suggests that they could be innocent bystanders with regard to DCM. Four genes, TMPO, NEBL, FLT1, and ISL1, are especially worth noting, as variants in these genes were found in more than 25 individuals. Interestingly, variants in TMPO and FLT1 have previously been identified repeatedly in a large control population (Andreasen et al. 2013). Our study validates this in a control population ninefold larger. The four genes, TMPO, NEBL, FLT1, and ISL1, are examples of genes initially associated through a candidate gene approach because of a plausible association with DCM. NEBL and [...]
The new ACMG guidelines provide a more stringent approach to classifying variants. We applied these guidelines to the variants found in the 13 genes questioned and found a large proportion to be of uncertain significance (Table 3), owing to either conflicting evidence or lack of evidence (Table S3). This simply demonstrates that we do not have sufficient evidence for classification of these variants.
Owing to advances in the genetic field, exome sequencing has become more accessible in recent years. Thus, a large number of studies have been conducted to separate false-positive variants from truly disease-causing variants in different cardiomyopathies and channelopathies (Norton et al. 2012;Refsgaard et al. 2012;Andreasen et al. 2013;Mogensen et al. 2015). Taken together, these findings suggest a revision of previously DCM-associated variants and an optimized approach for the clinical application of genetic findings.
Table abbreviations: AN, allele number, shows how many individuals were exome sequenced at the given locus; AC, allele count, a count of how many alleles of a given variant were found; LR, logistic regression; Homoz., homozygous.
Genetic screening is applied in daily clinical practice more than ever, stressing the importance of accurately classifying genetic variants. Presymptomatic testing of relatives for a false-positive variant will lead to an incorrect distinction between carriers at risk and non-carriers not at risk, which can ultimately lead to devastating outcomes. It is therefore of the utmost importance that variants reported as causative truly are disease-causing. In addition, our data suggest that a positive genetic test result should not be considered pathogenic beyond doubt but should be considered a supplement to the clinical assessment.
In conclusion, we found a much higher genotype prevalence of previously DCM-associated variants than expected in a newly published population-based exome browser.
Our findings suggest that some of these previously DCM-associated variants could represent false-positive findings or at best disease modifiers. More importantly, we found 13 genes in which all variants previously associated with DCM were identified, which seriously questions these genes as likely monogenic causes of DCM.
The study was limited by the lack of clinical data. However, the penetrance of the disease and the age at onset are highly variable. Thus, clinical data would not have changed the conclusion of this study (Mestroni et al. 1999). We acknowledge that the cut-off can be debated. However, the cut-off is arbitrary and is provided only as a tool to highlight variants too numerous to be a monogenic cause of DCM. We chose a very conservative cut-off based on the prevalence of DCM (1:2500), as it implies that all DCM cases in the ExAC browser are caused by one variant, which is indeed unlikely. A small overrepresentation of DCM in the ExAC data is possible. Nevertheless, it is unlikely that this would contribute more than a few alleles to the total allele count for each reported variant, as most DCM variants reported in the literature are very rare and specific to a family. In other words, we still consider 25 a reasonable cut-off. The assumption that no individual has more than one DCM-associated variant in ExAC is of course a limitation. However, this is the condition for the analysis. We rely on the fact that the genotype prevalence in ExAC is so high that the difference between it and the prevalence of DCM in the general population is very unlikely to be explained by individuals carrying more than one variant.

Supporting Information
Additional Supporting Information may be found online in the supporting information tab for this article:
Table S1. DCM-associated variants identified in the Exome Aggregation Consortium.
Table S2. Additional analysis for Polyphen-2 and SIFT predictions; chi-squared test for Polyphen-2 predictions.
Table S3. ACMG classification of variants in 13 genes.