Evaluating the Clinical Validity of Hypertrophic Cardiomyopathy Genes

Supplemental Digital Content is available in the text.


Supplementary Tables and Figures:
For supplementary tables, please see the included Excel spreadsheets.
- S Table 1: Full list of unique genes identified on n=24 NCBI Genetic Testing Registry panels
- S Table 2: 24 GTR panels showing percentage of panels representing the gene
- S Table 3: Gene lists showing compilation of the final curation gene list
- S Table 4: Full list of curated genes showing ExAC missense and loss of function constraint scores and GTEx expression in left ventricle
- S Table 5: Full list of curated genes and matrix scores
- S Table 6: Full list of curated genes and OMIM phenotype description
- S Table 7: 26 OMIM genes reported to be associated with "HCM"
- S Table 8: All n=4191 ClinVar assertions identified using the search strategy
- S Table 9

[...] There is no convincing case-control evidence for an association between ACTA1 and HCM. Although this gene-disease association is supported by expression studies, no reports have convincingly implicated the gene in humans. In summary, there is no reported evidence to support this gene-disease association with HCM. This classification was approved by the Hypertrophic Cardiomyopathy Gene Curation Expert Panel on November 1, 2016.

ACTC1: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/8766 The ACTC1 gene has been associated with autosomal dominant hypertrophic cardiomyopathy (HCM) in at least 6 probands in 4 publications. Four unique variants (missense) with convincing evidence of pathogenicity have been reported in humans, including de novo inheritance with maternity and paternity confirmed in two cases and segregation with disease in 26 additional family members. ACTC1 was first associated with this disease in humans in 1999 (Mogensen et al, PMID 10330430). More evidence is available in the literature, but the maximum score for genetic evidence (12 pts.) has been reached. The mechanism for disease is unknown. The ACTC1 gene was significantly enriched for missense variants in Walsh et al, 2016 (PMID 27532257). Overall, the gene was found to have an odds ratio of 8.59 (5.06-14.5) for HCM. This gene-disease association is supported by expression studies, in vitro functional assays, and an animal model. In summary, ACTC1 is definitively associated with autosomal dominant HCM. This has been repeatedly demonstrated in both the research and clinical diagnostic settings, and has been upheld over time. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Expert Panel on September 5, 2017.

ANKRD1: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/10043 The ANKRD1 gene has been associated with autosomal dominant hypertrophic cardiomyopathy (HCM) in 5 probands in 2 publications. Three unique missense variants have been reported in humans with no family history of cardiomyopathy. ANKRD1 was first associated with this disease in humans in 2009 (Arimura et al, PMID 19608031). Evidence suggests a gain-of-function mechanism for variants in this gene (Crocini et al, 2013, PMID 23572067). This gene-disease association is supported by expression studies in mature rat cardiomyocytes and the examination of contraction parameters using engineered heart tissues. In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on September 19, 2017.

CACNB2: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/10044 No convincing evidence for a causal role for CACNB2 in hypertrophic cardiomyopathy (HCM) has been reported. Although this gene-disease association is supported by expression in heart, no reports have directly implicated the gene in HCM. In summary, there is no reported evidence to support this gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on June 6, 2017.

CALR3: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/fd8036e5-bf01-4b89-b4fc-94080d835d99--2018-08-06T13:14:27 The CALR3 gene has been associated with hypertrophic cardiomyopathy (HCM) in 2 probands in 1 publication. Two unique heterozygous variants of unknown significance (missense), with no experimental evidence to support their pathogenicity, have been reported in humans (Chiu et al, 2007, PMID 17655857). One additional variant was reported in a proband with variants in a number of HCM-associated genes, possibly indicating a modifying role for CALR3 as suggested by the authors (Botillo et al, 2016, PMID 26656175). The mechanism for disease is unknown. There is no experimental evidence to support the gene-disease association. In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on February 7, 2017.

[...] The mechanism for disease is unknown. The gene-disease association is supported by expression studies, in vitro assays, and animal models. In summary, there is moderate evidence to support this gene-disease association. While more evidence is needed to establish this association definitively, no convincing contradictory evidence has emerged. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on July 18, 2017.

KCNQ1: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/10059 The KCNQ1 gene has been associated with autosomal dominant hypertrophic cardiomyopathy (HCM) using the ClinGen Clinical Validity Framework. This association was made using case-level data alone. Only one missense variant has been reported in an individual that presented with HCM and long QTc intervals (D'Argenio et al, 2014, PMID 24183960). Furthermore, the individual also harbored variants in two other genes associated with hypertrophic cardiomyopathy, MYBPC3 (c.3627+2T>A) and TNNT2 (c.459+175G>A). MYBPC3 (c.3627+2T>A) is considered pathogenic and segregated in individuals in the family presenting with hypertrophic cardiomyopathy. While the D'Argenio paper asserts pathogenicity for the KCNQ1 variant, it is unclear whether this assertion is for the Long QT or HCM phenotypes observed in the proband. Furthermore, the current ClinGen Clinical Validity Framework is for Mendelian inheritance and does not support classifications for disease in which multiple genes may be contributory. Of note, this gene has also been implicated in Long QT syndrome, and this gene-disease relationship will be assessed separately. There is no experimental evidence to support the association at this time. In summary, there is no reported evidence of Mendelian inheritance to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts or refutes the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Expert Panel on January 4, 2017.

KLF10: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/10077 The KLF10 gene has been associated with autosomal dominant hypertrophic cardiomyopathy using the ClinGen Clinical Validity Framework. This association was made using case-level data and case-control data. At least 5 unique missense variants have been reported, along with an additional variant that is predicted to be benign because its maximum allele frequency in the general population is above the recommended cutoff for pathogenicity (Bos et al, 2012, PMID 22234868). KLF10 was first associated with this disease in humans as early as 2012. The association was observed in only 6 probands from one publication (Bos et al, 2012, PMID 22234868). No segregation data is available. The mechanism for disease is unclear, but predicted to be loss of function (LOF) [...]

[...] More evidence is available in the literature, but the maximum score for genetic evidence and/or experimental evidence (12 pts.) has been reached. Of note, MYBPC3 has been shown to cause HCM in an autosomal recessive fashion, with earlier and more severe presentation of phenotypes associated with HCM, and represents a semi-dominant condition. The molecular mechanism for HCM is loss of function (LOF), and missense, nonsense, frameshift and splice site mutations in MYBPC3 have been shown to be pathogenic for cardiomyopathy. Of note, this gene has been implicated in dilated cardiomyopathy and left ventricular noncompaction. This gene-disease association is supported by biochemical, expression, protein interaction, and animal model evidence. In summary, MYBPC3 is definitively associated with autosomal dominant HCM. This has been repeatedly demonstrated in both the research and clinical diagnostic settings, and has been upheld over time. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Expert Panel on September 5, 2017.

[...] but, upon review, were considered too common to be disease-causing. The mechanism for disease is unknown. Experimental evidence to support the gene-disease association includes expression data (Gorza et al, 1984, PMID 6234108) and a biochemical function similar to a known HCM gene, MYH7 (unconventional myosin). In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on November 21, 2017.

[...] that also harbored a variant of unknown significance in MYH7 (p.E743D), a gene that is also implicated in hypertrophic cardiomyopathy. No segregation data is available. The mechanism for disease is unknown, but predicted to be gain of function (GOF) from functional assays performed by Davis et al., 2001. This gene-disease association is supported by expression studies showing restricted expression of MYLK2 in skeletal muscle and heart, a cell culture model system, and an animal model. In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Expert Panel on April 17, 2017.

MYOM1: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/8ff00fbf-6b20-49cd-8af8-0f6404240db5--2018-08-01T18:51:28 The MYOM1 gene has been associated with hypertrophic cardiomyopathy (HCM) in 1 proband in 1 publication. A single missense variant, p.Val1490Ile, segregated in 2 affected family members with HCM and destabilized dimerization of the MYOM1 protein (Siegert et al, 2011, PMID 21256114). The mechanism for disease is unknown. This gene-disease association is supported by an expression study (Schoenauer et al, 2011, PMID 21069531). In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on July 20, 2017.

[...] However, the frequencies of these variants in the ExAC database (exac.broadinstitute.org) are consistent with benign variation. Another unique variant (missense) was identified in a proband that also harbored a variant in MYH7 and was excluded as causative (Guo et al, 2017, PMID 28296734). The mechanism for disease is unknown. The gene-disease association is supported by expression data, an interaction with ACTN2, and a mouse model. In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on June 20, 2017.

MYPN: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/10048 The MYPN gene has been associated with hypertrophic cardiomyopathy (HCM) in 9 probands in 2 publications. MYPN was first associated with this disease in humans in 2010 (Bagnall et al, PMID 20801532). The proband identified in this paper was shown to also harbor a pathogenic variant in MYH7. Five unique variants of unknown significance (4 missense, 1 nonsense) with no experimental evidence to support their pathogenicity have been identified (Purevjav et al, 2012, PMID 22286171) in addition to 4 variants that are predicted to be benign. The mechanism for disease is unknown. The gene-disease association is supported by expression studies, a mouse model, and an in vitro assay. In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on November 1, 2016.

[...] were also excluded as causative after expert review. The mechanism for disease is unknown. The gene-disease association is supported by expression data in addition to an in vitro assay. In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on February 1, 2017.

PDLIM3: Hypertrophic cardiomyopathy
https://search.clinicalgenome.org/kb/gene-validity/76cba060-8080-42b1-b9ea-2193fea658b0--2018-08-06T13:21:24 The PDLIM3 gene has been associated with HCM in 2 probands in 2 publications. One variant of unknown significance and 1 multi-exon deletion have been reported in humans with HCM. PDLIM3 was first associated with HCM in humans in 2010 (Bagnall et al, 2010, PMID 20801532). PDLIM3 is expressed in heart and interacts with ACTN2. However, there is no known mechanism through which PDLIM3 causes HCM. In summary, there is limited evidence to support this gene-disease association with HCM. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on November 1, 2016.

[...] The frequencies of 2 variants in the ExAC database (exac.broadinstitute.org) are consistent with benign variation. Two probands were found to harbor disease-causing variants in the TNNI3 gene and MYBPC3 gene, respectively, in addition to a variant of unknown significance in the TCAP gene. The mechanism for disease is unknown. The gene-disease association is supported by expression data and an interaction with CSRP3. In summary, there is limited evidence to support this gene-disease association. Although more evidence is needed to support a causal role, no convincing evidence has emerged that contradicts the gene-disease association. This classification was approved by the ClinGen Hypertrophic Cardiomyopathy Gene Curation Expert Panel on November 1, 2016.

BACKGROUND:
ClinGen's gene curation process is a method designed to aid in evaluating the strength of a gene-disease relationship based on publicly available evidence. Information about the gene-disease relationship, including genetic, experimental, and contradictory evidence curated from the literature, is compiled and used to assign a clinical validity classification per criteria established by the ClinGen Gene Curation Working Group (GCWG) [1]. This protocol details the steps involved in curating a gene-disease relationship and subsequently assigning a clinical validity classification. This curation process is not intended to be a systematic review of all available literature for a given gene or condition, but instead an overview of the most pertinent evidence required to assign the appropriate clinical validity classification for a gene-disease relationship at a given time. While the following protocol provides guidance on the curation process, professional judgment must be used when deciding on the strength of different pieces of evidence that support a gene-disease relationship.
Optional: Microsoft Office (Word, Excel, or PowerPoint) to record your data from curation

OVERVIEW OF GENE CURATION:
The gene curation framework consists of the following steps.
• Collection of evidence: The evidence is collected primarily from published peer-reviewed literature, but can also be present in publicly accessible resources, such as variant databases, which can be used with discretion. Literature searches can be conducted using PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and/or Google Scholar (https://scholar.google.com/intl/en/scholar/help.html#searching).
• One need not comprehensively curate all evidence for a gene-disease pair (particularly for "Definitive" associations), but instead focus on curating and evaluating the relevant pieces of evidence described in this protocol.
• Identifying different evidence types: The curator needs to identify and curate genetic and experimental evidence separately (details are defined later in the "Genetic Evidence" and "Experimental Evidence" sections). Genetic evidence is divided into two categories: case-level data and case-control data. Typically, studies describing individuals or families with variants in the gene of interest will be scored as case-level data, while studies using statistical analysis to determine the enrichment of variants in case and control groups will be scored as case-control data. The gene-level experimental data used in this framework to assess a gene-disease relationship are in vitro and in vivo functional studies that implicate the causative role of a gene in disease. These are based on MacArthur and colleagues and described in detail below [2].
• Assignment of clinical validity classification using gene curation matrix: Next the curator evaluates the evidence and assigns points to the evidence using the scoring matrices provided below (Fig. 3,8). This information is then summarized and tallied to generate a total score and calculated clinical validity classification, which will be reviewed by a committee of appropriate disease experts.

Figure 1: Gene Curation Workflow

CLINICAL VALIDITY CLASSIFICATIONS:
The gene curation working group members have developed a method to qualitatively define the "clinical validity" of a gene-disease relationship using a classification scheme based on the strength of evidence that supports or refutes any claimed relationship. This framework allows the "clinical validity" of a gene-disease relationship to be transparently and systematically evaluated. These classifications can then be used to prioritize genes for analysis in various clinical contexts. The suggested minimum criteria needed to obtain a given classification are described for each clinical validity classification. These criteria include both genetic and experimental evidence, which are described below in this document. The default classification for genes without an identified variant in humans is "No Reported Evidence." The level of evidence needed for each supportive gene-disease relationship category builds upon the previous category (e.g., "Moderate" builds upon "Limited"). Gene-disease relationships classified as "Contradictory" likely have evidence supporting as well as opposing the gene-disease association, but are described separately from the classifications for supportive gene-disease relationships.

Figure 2: Clinical Validity Classifications (Evidence Level and Evidence Description)
Supportive Evidence

DEFINITIVE
The role of this gene in this particular disease has been repeatedly demonstrated in both the research and clinical diagnostic settings, and has been upheld over time (in general, at least 3 years). No convincing evidence has emerged that contradicts the role of the gene in the specified disease.

STRONG
The role of this gene in disease has been independently demonstrated in at least two separate studies providing strong supporting evidence for this gene's role in disease, including both of the following types of evidence:
• Strong variant-level evidence demonstrating numerous unrelated probands harboring variants with sufficient supporting evidence for disease causality (Note 1)
• Compelling gene-level evidence from different types of supporting experimental data (Note 2)
In addition, no convincing evidence has emerged that contradicts the role of the gene in the noted disease.

MODERATE
There is moderate evidence to support a causal role for this gene in this disease, including both of the following types of evidence:
• At least 3 unrelated probands harboring variants with sufficient supporting evidence for disease causality (Note 1)
• Moderate experimental data (Note 2) supporting the gene-disease association
The role of this gene in disease may not have been independently reported, but no convincing evidence has emerged that contradicts the role of the gene in the noted disease.

LIMITED
There is limited evidence to support a causal role for this gene in this disease, such as:
• Fewer than three observations of variants with sufficient supporting evidence for disease causality (Note 1), OR
• Variants have been observed in probands, but none have sufficient evidence for disease causality.
• Limited experimental data (Note 2) supporting the gene-disease association
The role of this gene in disease may not have been independently reported, but no convincing evidence has emerged that contradicts the role of the gene in the noted disease.

NO REPORTED EVIDENCE
Evidence for a causal role in disease has not been reported. These genes might be "candidate" genes based on linkage intervals, animal models, implication in pathways known to be involved in human diseases, etc., but no reports have directly implicated the gene in human disease cases.

CONFLICTING EVIDENCE REPORTED
Although there has been an assertion of a gene-disease association, conflicting evidence for the role of this gene in disease has arisen since the time of the initial report indicating a disease association. Depending on the quantity and quality of evidence disputing the association, the association may be further defined by the following two sub-categories:
1. Disputed
a. Convincing evidence disputing a role for this gene in this disease has arisen since the initial report identifying an association between the gene and disease.
b. Disputing evidence need not outweigh existing evidence supporting the gene-disease association.

2. Refuted
a. Evidence refuting the role of the gene in the specified disease has been reported and significantly outweighs any evidence supporting the role.
b. This designation is to be applied at the discretion of clinical domain experts after thorough review of available evidence.
c. While it is nearly impossible to entirely refute a gene's potential role in disease, this category is to be used when all existing data has been fully refuted, leaving the gene with essentially no valid evidence remaining after an original claim.

NOTES
1. Variants that disrupt function and/or have other strong genetic and population data (e.g. de novo occurrence, absence in controls, strong linkage to a small genomic interval, etc.) are considered convincing of disease causality in this framework. See "Variant Evidence" on p.13 for more information.
2. Examples of appropriate types of supporting experimental data are based on those outlined in MacArthur et al. 2014 [2].

LITERATURE SEARCH:
Many human genes are implicated in more than one disorder. Therefore, prior to starting a curation and entering the details into the Gene Curation Interface, the curator should be absolutely clear on which disease entity is being curated. The expert group can give guidance if needed.
1. The initial search should be broad and inclusive. A good way to start is by searching "gene symbol/name AND disease" (in some cases it may be sufficient to search for the gene name/symbol alone [...]
   a. Curation may occur from that publication ONLY when sufficient details are included in the review article.
   b. If sufficient details are NOT included in the review article then the curator will need to return to each individual publication to curate the information.
3. Additional searches are often necessary to identify sufficient gene level experimental evidence. Note that additional gene level experimental evidence may exist in publications BEFORE the gene:disease association was first made.
   a. Search PubMed for experimental data (examples below):
      • "gene AND function"
      • "protein AND function"
      • "gene AND animal"
   b. Additional information may also be available in OMIM (www.OMIM.org) in the "Gene function" or "Biochemical Features" sections.
   c. GeneReviews (http://www.ncbi.nlm.nih.gov/books/NBK1116/) often has information in the "Molecular Genetics" section of the disease entries that may be useful.
   d. Other databases such as UniProt (www.uniprot.org/), MGI (www.informatics.jax.org/), etc. may also be useful, provided that primary references are given that can be curated. For a list of databases that may be helpful for the curation process, see Appendix A.
   e. GeneRIFs (Gene References Into Functions), within NCBI Gene, list article links that summarize experimental evidence for a given gene. The link itself leads to an article in PubMed and can serve as an additional source for experimental evidence.
4. An additional component of the curation process is to determine if the original gene-disease association has been replicated; therefore, it is critical to find the original paper with the proposed relationship. OMIM and GeneReviews often cite the first publication and should be cross-referenced. Additionally, a recent review article may be helpful in ruling out any contradictory evidence that may have been reported since the original publication.
   a. The "Allelic Variants" section of OMIM and the "Molecular Genetics > Pathogenic allelic variants" section of GeneReviews may have relevant information.
   b. Be sure to extract information from the original publication, NOT directly from these websites.
Once all of the relevant literature about the gene-disease relationship has been assembled, curation of the different pieces of evidence can begin.
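
As a small illustration of steps 1 and 3a, the search strings can be generated from a gene symbol, protein name, and disease term. This is only a sketch: the function names are hypothetical, and the strings are meant to be pasted into PubMed by hand rather than sent to any particular API.

```python
def initial_query(gene: str, disease: str) -> str:
    """Broad starting search from step 1: gene symbol/name AND disease."""
    return f"{gene} AND {disease}"


def experimental_evidence_queries(gene: str, protein: str) -> list[str]:
    """The three example searches for experimental data listed in step 3a."""
    return [
        f"{gene} AND function",
        f"{protein} AND function",
        f"{gene} AND animal",
    ]


if __name__ == "__main__":
    print(initial_query("MYBPC3", "hypertrophic cardiomyopathy"))
    for query in experimental_evidence_queries("MYBPC3", "myosin-binding protein C"):
        print(query)
```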

GENETIC EVIDENCE
Genetic evidence may be derived from case-level data (studies describing individuals or families with variants in the gene of interest) and/or case-control data (studies in which statistical analysis is used to evaluate enrichment of variants in cases compared to controls). While a single publication may include both case-level and case-control data, individual cases should NOT be double-counted (e.g., an individual case that is part of a case-control cohort should not be given points from both the "case-level data" and "case-control data" categories). For example, although this would be an unlikely situation, if a case from a case-control study were singled out and a pedigree was provided, this case could be evaluated with case-level data and segregations counted, but the case-control data itself should not be counted. In this scenario, a note should be made for expert review.

Genetic Evidence Summary Matrix
A matrix used to categorize and quantify the genetic evidence curated for a gene-disease pair is provided below.
NOTES: All variants under consideration should be rare enough in the general population to be consistent with prevalence of disease.
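
One way a curator might operationalize the rarity requirement is to compare each variant's population allele frequency against a committee-set maximum. The sketch below is illustrative only: the `Variant` record, the example counts, and the cutoff value are hypothetical placeholders, since the protocol leaves the actual threshold to each expert panel and disease.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration only; the real value is disease-specific
# and is set by the gene curation committee.
MAX_ALLELE_FREQUENCY = 1e-4


@dataclass
class Variant:
    name: str
    allele_count: int    # alleles carrying the variant in the reference population
    allele_number: int   # total alleles genotyped at that site


def allele_frequency(v: Variant) -> float:
    """Population allele frequency; 0.0 when the site was not genotyped."""
    if v.allele_number == 0:
        return 0.0
    return v.allele_count / v.allele_number


def too_common(v: Variant, threshold: float = MAX_ALLELE_FREQUENCY) -> bool:
    """True when the variant is more frequent than the chosen cutoff."""
    return allele_frequency(v) > threshold


# Made-up counts, not taken from any database.
variants = [
    Variant("variant_A", allele_count=2, allele_number=120_000),
    Variant("variant_B", allele_count=60, allele_number=120_000),
]
flagged = [v.name for v in variants if too_common(v)]
```

Variants flagged this way would still go to expert review; the cutoff only prioritizes which observations deserve closer scrutiny.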

Case-Level Data
Assessing case-level data requires knowledge of the inheritance pattern of the disease in question and careful interrogation of the individual variants identified in each case. Within this framework, a case should only be counted towards supporting evidence if the variant identified in that individual has some indication of a potential role in disease (e.g. impact on gene function, recurrence in affected individuals, etc.). Each case may be given points for both variant evidence (see below for details on interpretation) and segregation evidence (see p. 15 for details on calculation).

TOTAL ALLOWABLE POINTS for Genetic Evidence: 12
General Notes for variant scoring:
1. When curating an autosomal dominant disease or an X-linked disorder, consider the evidence types in row "A". If you are curating an autosomal recessive disease, consider the evidence types in row "B". In X-linked disorders, affected probands will often be hemizygous males and/or manifesting heterozygous females. There can be rare cases of females affected by X-linked recessive disorders (due to chromosomal aneuploidy, skewed X inactivation, or homozygosity for a sequence variant), or males who carry an X-linked variant but are unaffected or mildly affected (due to Klinefelter syndrome, 47, XXY), so evaluators must be aware of the nuances of interpretation of individual cases and X-linked pedigrees. Points can be assigned at the discretion of the expert reviewer taking into account the available evidence. Furthermore, there are known cases of female carriers of X-linked recessive conditions manifesting symptoms that are milder or later in onset compared to males, and scoring of genetic evidence in these examples should be subject to expert review with regard to the assigned gene/disease/inheritance combination.
2. Computational scores (such as conservation scores, constraint scores, in silico prediction tools, variation intolerance scores, etc.) are often disease and context-dependent and should not be considered as strong pieces of evidence for variant pathogenicity. However, they can be recorded during curation and used as supporting evidence for variant scoring to be confirmed by expert review.
3. For a variant to be considered potentially disease-causing, its frequency in the general population should be consistent with phenotype frequency, inheritance pattern, disease penetrance, and disease mechanism (if known). These pieces of information can often be located in the literature (See "Literature Search" p. 8), but may also be contributed by experts. If such information is available, the prevalence of the variant in affected individuals should be enriched compared to controls. The Genome Aggregation Database (gnomAD; http://gnomad.broadinstitute.org) provides a reference set of allele frequencies for various populations and can be used to assess whether the frequency of the variant in question is consistent with the prevalence of the disease. Gene curation committees may find it helpful to set a maximum allele frequency (MAF) above which a variant would be considered benign. Generally, MAF thresholds will vary as a function of disease prevalence. This MAF threshold is specific to the disease and should apply to all variants being evaluated, in the context of that disease.
4. For each case information category, a suggested number of points per case is provided. However, the points may be altered, within a defined range, to account for the strength of evidence available to indicate that a variant is deleterious (see Figure 3). Within each range, the curator may choose one of the following scores: 0.1, 0.25, 0.5, followed by 0.5 point increments up to the maximum possible score for that category. However, the curator should always document reasons for any deviation in suggested scores for expert review.
5. When scoring variants for autosomal recessive disorders in individuals who are compound heterozygotes, there should be some evidence to suggest that the variants are in trans in order to be scored. For example, for an individual who is compound heterozygous for two variants in the gene of interest, both parents should be tested to show that the variants are in trans. Molecular methods showing that variants are in trans are also acceptable. For individuals who appear to be homozygous for a variant, testing of the parents is not required in order to count the case.
a. Some functional impact to the gene product must be demonstrated for the case to be given default points. Examples of functional impact include reduced activity of an enzyme in cells expressing a variant in that gene, or reduced expression of a gene product in cells from an individual with a variant(s) in the gene. Impact based on functional validation can score 0.5 or above (up to 1.5/case) depending on the validation quality and disease relevance of the functional assay.
b. In silico predictions do not provide strong evidence for functional impact; therefore, impact based on in silico predictions only would score less than the default 0.5 points. It may be appropriate to award default points if in-depth in silico modeling studies (e.g. based on impact on 3D structure) have been done, but this requires discussion with an expert.
c. Sum up the number of points. The suggested points per case can be found in column "E" (dominant) and "G" (recessive). Total up all of the variant evidence points and place them in "J" (dominant) or "L" (recessive), as appropriate.
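The permitted score values described in note 4 above can be enumerated with a short sketch. The function name `allowed_scores` and the example maximum of 1.5 points are illustrative assumptions, not part of the scoring matrix itself.

```python
def allowed_scores(max_score):
    """List the scores a curator may choose within one category:
    0.1, 0.25, 0.5, then 0.5-point increments up to the category maximum."""
    # Sub-default values, kept only if they fit under the maximum
    scores = [s for s in (0.1, 0.25, 0.5) if s <= max_score]
    # 0.5-point increments above the 0.5 value
    step = 1.0
    while step <= max_score:
        scores.append(step)
        step += 0.5
    return scores


# Hypothetical category with a maximum of 1.5 points per case
print(allowed_scores(1.5))  # [0.1, 0.25, 0.5, 1.0, 1.5]
```

Any value chosen away from the suggested default should still be documented for expert review, as the note requires.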

Predicted or observed null variants
Some types of variants can be assumed to disrupt gene function. This category includes nonsense, frameshift, canonical +/-1 or 2 splice site variants, single or multi-exon deletion, whole gene deletion, etc. For missense and small in-frame insertions and deletions, see #1:
a. Assign fewer points if there is alternative splicing or if the null variant is near the C terminus and/or nonsense-mediated decay (NMD) is not predicted (NOTE: NMD is not expected to occur if the stop codon is downstream of the last 50 bp of the penultimate exon).
b. Consider assigning fewer points if a gene product is still made, albeit altered. For example, cDNA analysis and Western blot for an individual with a canonical splice site change show that an exon is skipped but that the reading frame is maintained and a protein is produced.
c. Gene constraint scores can be helpful when assessing disease mechanism. For example, the disease mechanism could be assumed to be loss of function (LOF) if the gene is LOF constrained. Constraint scores can be found by searching the gene in ExAC (exac.broadinstitute.org) and viewing the "constraint metric" at the top right of the page. The closer the probability of LOF intolerance (pLI) is to 1, the more LOF-constrained the gene. However, constraint scores must be interpreted in the context of the gene and disease in question. For example, if the gene is associated with multiple diseases, LOF constraint could be associated with a disease other than the one being curated. In addition, genes associated with severe, pediatric-onset disorders may appear to be more constrained than adult-onset conditions where overall fitness is not impacted.
d. Individuals with large deletions, duplications, and other chromosomal rearrangements encompassing genetic material outside the gene of interest should not be counted because the impact of the loss/gain for the additional material cannot be assessed.
e. Sum up the number of points. The suggested points per case can be found in column "D". Total up all of the variant evidence points and place them in "I".
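The NMD note in item a can be sketched as a simple positional check. This is a heuristic only, assuming the commonly used 50-nucleotide rule (a premature stop in the final exon, or within the last 50 bases of the penultimate exon, is taken to escape NMD). The function name, the transcript-coordinate convention, and the exon lengths are hypothetical.

```python
def nmd_predicted(stop_pos, exon_lengths, window=50):
    """Heuristic check of whether a premature stop codon is expected to trigger NMD.

    stop_pos      1-based position of the premature stop in the spliced transcript
    exon_lengths  exon lengths in transcript order (hypothetical input format)
    window        size of the escape window before the last junction (assumed 50)
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so NMD is not expected
    # Position of the last base of the penultimate exon
    last_junction = sum(exon_lengths[:-1])
    # Stops in the final exon, or within `window` bases of the end of the
    # penultimate exon, are assumed to escape NMD
    return stop_pos <= last_junction - window


exons = [100, 200, 150]          # hypothetical three-exon transcript
print(nmd_predicted(120, exons))  # True: well upstream of the last junction
print(nmd_predicted(280, exons))  # False: inside the last 50 bases of exon 2
print(nmd_predicted(350, exons))  # False: in the final exon
```

A result of False would correspond to the situation in which item a suggests assigning fewer points; the final decision remains with the curator and expert reviewers.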

De novo variants:
a. These can be any type of variant, but should be given points depending on statistical expectation of de novo variation in the gene in question, if known. In some cases, this can be found in the literature and should be noted if found (See "literature search" p. 8). However, the curator may also leave this to be supplied by experts during curation review. b. In order for a variant to be considered de novo, both parents must be tested to show that they do not carry the variant. Consider awarding default points to null variants (e.g. nonsense, frameshift, canonical splice site) that appear to be de novo based on testing parents for the variant, but award fewer points to missense variants and small, in-frame deletions. The scores can be increased if the maternity and paternity of the proband are confirmed e.g. by short tandem repeat analysis or trio whole exome sequencing (WES). For example, a case with a missense variant could receive default points if maternity and paternity are confirmed. Additional points can be added for any variant if functional evidence supports a deleterious impact for the variant. c. Sum up the number of points. The suggested points per case can be found in column "C". Total up all of the variant evidence points and place them in "H". NOTE: In addition to meeting the above criteria, the variant should not have data that contradicts a pathogenic role, such as an unexplained non-segregation, etc. If the points given above for the summary matrix exceed the max score, use the Max score found in "M-P" for the summary matrix.

Segregation Analysis:
The use of segregation studies in which family members are genotyped to determine if a variant co-segregates with disease can be a powerful piece of evidence to support a gene-disease relationship.
For the purposes of this framework, we are employing a simplified analysis in which we assume the recombination fraction (θ) is zero (i.e. non-recombinants are not observed) to estimate a LOD score (see equations below). We suggest awarding different amounts of points depending on the methods used to investigate the linkage interval. For that reason, it is critical that the curator make a note of testing methodologies in families counted towards the segregation score. See below for a) instructions on how to count segregations and calculate a simplified LOD score and b) how to evaluate the sequencing methods for the linkage interval and award points accordingly. Note that these are general guidelines; if you encounter cases where you are unsure how to evaluate/score segregation, please discuss with your expert group and/or the ClinGen Gene Curation working group.

Counting Segregations and Calculating Simplified LOD Scores:
If a LOD score has been calculated by the authors of a paper: This LOD score should be documented and may be used to assign segregation points (according to the sequencing methods used to investigate the linkage region and identify the variants) in the scoring matrix (see Fig 6 for scoring suggestions). If a LOD score is provided by the authors, the ClinGen curator should not use the formula(s) below to estimate a new LOD score. If for some reason you do not agree with the published LOD score, do not assign any points and discuss the concerns with the expert reviewers. See below for more guidance on scoring. Fill out the "Segregation evidence" portion of the matrix. The number of points should be recorded in "Q".
If a LOD score has NOT been calculated by the authors of a paper: Curators may estimate a LOD score using the simplified formula(s) below if the following conditions are met:
o The disorder is rare and highly penetrant.
o Phenocopies are rare or absent.
o For dominant or X-linked disorders, the estimated LOD score should be calculated using ONLY families with 4 or more segregations present. The affected individuals may be within the same generation, or across multiple generations.
o For recessive disorders, the estimated LOD score should be calculated using ONLY families with at least 3 affected individuals in the pedigree (including the proband). Genotypes must be specified for all affected and unaffected individuals counted; specifically, parents of affected individuals must be genotyped or other methods must be used to show that the variants are in trans if the affected individuals are noted to be compound heterozygotes.
o Families included in the calculation must not demonstrate any non-explainable non-segregations (for example, a genotype-/phenotype+ individual in a family affected by a disorder with no known phenocopies). Families with non-explainable non-segregations should not be used in LOD score calculations.
If any of the previous conditions are not met, do not use the formula(s) below to estimate a LOD score.
To be conservative in our simplified LOD score estimations, for autosomal dominant or X-linked disorders, only affected individuals (genotype+/phenotype+ individuals) or obligate carriers (regardless of phenotype) should be included in calculations. An obligate carrier is an individual who has not been tested for the variant in question but who is inferred to carry the variant by virtue of their position in the pedigree (for example, an individual with a parent with the variant and a child with the variant, an individual with a sibling with the variant and a child with the variant, etc.).
Within a given gene-disease curation, if more than one family meets the criteria above for scoring segregation information, sum their LOD scores before assigning points (using the tables in Figures 4 or 5). For example, if Family A has an estimated LOD score of 1.2 and Family B has an estimated LOD score of 1.8, the summed LOD score will equal 3. See the discussion on sequencing method below for guidance on assigning points to the LOD score.
Expert reviewers may choose to specify the most appropriate way to approach segregation scoring within their disease domain, including enacting more formal, rigorous LOD score calculations. NOTE: Segregation implicates a locus in a disease, NOT a variant. Therefore, all linkage studies should be carefully assessed to ensure that appropriate measures have been taken to rule out other possible causative genes within the critical region (see guide on point assignment based on methods to investigate a linkage region below).

For dominant/X-linked diseases:
$$Z\ (\text{LOD score}) = \log_{10}\left(\frac{1}{(0.5)^{\text{Segregations}}}\right)$$

In general, the number of segregations in the family will be the number of affected individuals minus one, the proband, to account for the proband's genotype phase being unknown. However, as there may be exceptions, segregations should be counted carefully, as outlined below.
For example, pedigree A shows a family with hypertrophic cardiomyopathy. a. There are four segregations that can be counted beginning at the proband. This includes the mother (II-2) who is an obligate carrier and can be assumed to be genotype-positive even though she was not tested. Using four segregations in the formula above results in an estimated LOD score of 1.2.

An obligate carrier (an individual who can be definitively inferred to be genotype positive based on the genetic status of other family members, as discussed above) should also be included, regardless of phenotype. In this case, the absence of a phenotype in two genotype+ individuals (III-2 and III-5) is considered irrelevant as they can be explained by delayed onset and/or reduced penetrance. However, these individuals are not included in the calculation because they are unaffected.

2. When estimating LOD scores for autosomal recessive disorders, count unaffected individuals as those who would be at the same risk to inherit two altered alleles as an affected individual, i.e. homozygous normal or heterozygous carrier siblings of a proband. For example, there are two unaffected individuals in Pedigree B, one unaffected individual in Pedigree C, and two unaffected individuals in Pedigree D.

3. For reasonably penetrant Mendelian disorders, a single LOD score can be calculated across multiple families, provided that each family meets the criteria above. For example, in pedigrees B, C and D, each with fully penetrant recessive hearing loss, the LOD scores can be added ((1.5 for B) + (1.3 for C) + (1.5 for D)) to give a total LOD score of 4.3. However, pedigree E cannot be included in this LOD score total because this family does not have enough affected individuals.
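The simplified LOD estimates discussed above can be reproduced with a few lines of code. The dominant/X-linked function follows the formula given in the text. The recessive function is an assumption: the recessive formula is not reproduced in this excerpt, so the sketch uses the form log10(1 / (0.25^(affected - 1) × 0.75^unaffected)). The count of three affected individuals per recessive pedigree is also an assumption, since the pedigree figures are not shown here.

```python
import math


def lod_dominant(segregations):
    """Simplified LOD (theta = 0) for dominant or X-linked disorders."""
    return math.log10(1 / 0.5 ** segregations)


def lod_recessive(affected, unaffected):
    """Simplified LOD for recessive disorders (assumed form, see lead-in)."""
    return math.log10(1 / (0.25 ** (affected - 1) * 0.75 ** unaffected))


# Pedigree A: four segregations counted from the proband
print(round(lod_dominant(4), 1))       # 1.2
# Assumed: three affected individuals, two unaffected at-risk siblings
print(round(lod_recessive(3, 2), 1))   # 1.5
# Assumed: three affected individuals, one unaffected at-risk sibling
print(round(lod_recessive(3, 1), 1))   # 1.3
```

Per-family estimates that meet the size criteria are then summed across families before any points are assigned.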

Assigning points to LOD scores:
While segregation evidence can be convincing for a particular locus, tens or even hundreds of genes can lie within a linkage interval. Thus, segregation does not necessarily implicate a single gene or variant. Many publications do not thoroughly investigate other genes or variants found in the linkage interval, and those that do cannot rule out the effects of thousands of other variants in the interval. Thus, it is critical for a curator to evaluate the methods used to identify candidate variants. Some publications more thoroughly investigate the genes and variants in a linkage interval than others. Accordingly, more points are awarded for segregation evidence in cases where whole exome/genome sequencing was performed or if the entire linkage interval was sequenced. These methods provide more convincing evidence than a candidate gene approach in which only one or a handful of genes in a linkage region are sequenced. See figure 6 below for suggested point ranges for LOD scores.

NOTE: For this scoring matrix, LOD scores from all families meeting size requirements must be summed before awarding segregation points, regardless of the sequencing methodology used. Sequencing methodology (e.g., candidate gene sequencing, whole exome sequencing, etc.) should be taken into account when deciding on the most appropriate score for this evidence. See example 2 below for an example of scoring multiple families with variants ascertained via different methodologies. Note that simply having a single family meeting the minimum size requirements is not necessarily enough to warrant any points. As the methods in each publication vary, the suggested points in figure 6 are merely a guide for the curator.
Example 1: There are 11 affected individuals in the pedigree (phenotype +, genotype +), and using our simplified LOD score formula, this corresponds to a LOD score of 3 (see Figure 4). The linkage region for this family contained 15 genes; the authors sequenced all of the genes in the linkage interval, and the HCM variant was the only suspicious variant. Looking at Figure 6, you can award this LOD score 2 points.
Example 2: Let's return to Pedigrees B, C, and D above, assuming now that we know more about how the linkage intervals were investigated or how the variants were identified.
Pedigree B: LOD Score 1.5, Variants identified using whole exome sequencing
Pedigree C: LOD Score 1.3, Variants identified using whole exome sequencing
Pedigree D: LOD Score 1.5, Variants identified using candidate gene analysis. Only the gene of interest was sequenced.
First, we would sum the LOD scores across families, which gives us a LOD score of 4.3. Because the variants were detected using two different methods, we can opt to split the difference between the suggested point values of 1 for candidate gene sequencing and 2 for whole exome sequencing and award this segregation analysis 1.5 points.
We recognize that the methods in each publication vary, therefore the suggested points in figure 6 are merely a guide for the curator. If curators are unsure of segregation scoring based on genotyping method, please consult experts.

Case-Control Data:
Case-control studies are those in which statistical analysis is used to evaluate enrichment of variants in cases compared to controls. Each case-control study should be independently assessed based on the criteria outlined in this section to evaluate the quality of the study design. Consensus with a clinical domain expert group is highly recommended. Case-control studies should be assigned points at the discretion of expert opinion based on the overall quality of each study. Assign each study a number of points between 0 and 6, then sum the points given to all studies, and fill in "S". NOTE: If the points given exceed the max score, use the Max score found in "T" for the summary matrix.
3. The quality of each case-control study should be evaluated using the following criteria in aggregate:
a. Variant Detection Methodology: Cases and controls should ideally be analyzed using methods with equivalent analytical performance (e.g. equivalent genotype methods, sufficient and equivalent depth and quality of sequencing coverage).
b. Power: The study should analyze a number of cases and controls given the prevalence of the disease, the allele frequency, and the expected effect size in question to provide appropriate statistical power to detect an association. (NOTE: The curator is NOT expected to perform power calculations, but to record the information listed in this section for expert review.)
c. Bias and Confounding factors: The manner in which cases and controls were selected for participation and the degree of case-control matching may impact the outcome of the study. The following are some factors that should be considered:
i. Are there systematic differences between individuals selected for study and individuals not selected for study (i.e. do the cases and controls differ in variables other than genotype)?
ii. Are the cases and controls matched by demographic information (e.g., age, ethnicity, location of recruitment, etc.)? Are the cases and controls matched for genetic ancestry? If not, did investigators account for genetic ancestry in the analysis?
iii. Have the cases and controls been equivalently evaluated for presence or absence of a phenotype, and/or family history of disease?
d. Statistical Significance: The level of statistical significance should be weighed carefully.
i. When an odds ratio (OR) is presented, its magnitude should be consistent with a monogenic disease etiology.
ii. When p-values or 95% confidence intervals (CI) are presented for the OR, the strength of the statistical association can be weighed in the final points assigned.
iii. Factors, such as multiple testing, that might impact the interpretation of uncorrected p-values and CIs should be considered when assigning points.
NOTE: Point totals should NOT exceed the max score. If the totals from "H-Q" exceed the max score, use the max score found in "U" for the genetic evidence portion of the summary matrix. Please prioritize curating genetic evidence over experimental evidence to reach a definitive score.
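For readers who want to see what the statistics in item d look like in practice, the following sketch computes an odds ratio and an approximate 95% confidence interval from a 2×2 table of carriers and non-carriers. It uses the standard log (Woolf) method, which is one common choice and not necessarily the method used in any given publication. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(case_pos, case_neg, ctrl_pos, ctrl_neg, z=1.96):
    """Odds ratio with an approximate 95% CI (log/Woolf method).

    case_pos / case_neg: cases with / without a qualifying variant
    ctrl_pos / ctrl_neg: controls with / without a qualifying variant
    All four counts must be non-zero for this approximation.
    """
    odds_ratio = (case_pos * ctrl_neg) / (case_neg * ctrl_pos)
    # Standard error of log(OR)
    se = math.sqrt(1 / case_pos + 1 / case_neg + 1 / ctrl_pos + 1 / ctrl_neg)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical study: 20 of 1,000 cases and 10 of 5,000 controls carry a variant
odds, low, high = odds_ratio_ci(20, 980, 10, 4990)
print(f"OR = {odds:.1f}, 95% CI {low:.1f} to {high:.1f}")
```

A confidence interval that excludes 1 indicates a nominal association, but the points awarded still depend on the study design criteria in items a to c and on expert review.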

Figure 7: Case-control Genetic Evidence Examples
Detailed explanations for assigned points are provided below.
This study can be assigned 2 points because a population database was used rather than appropriately-matched controls (i.e. the study is not matched demographically) and the p-value is not very significant. A population database could be used as controls for 2 reasons: a. Both the cases and controls were sequenced for the entire gene Y. b. The total number of individuals with null variants (i.e. nonsense, canonical splice-site, and frameshift) was compared between cases and controls.
Study receiving no score (0 points): While this study is similar to the study receiving 2 points, the detection method differed between cases and controls (i.e. cases were sequenced, controls were genotyped). In the cases, gene Z was sequenced. However, only the controls with a specific variant were used for comparison to the cases.
Although this study cannot be counted as case-control data, it can be counted as case-level data.
The suggested points/evidence can be found in column "K". Total up all of the experimental points and place them in the points given section found in "V". NOTE: If the sum of all models and rescue points exceeds the max score of 4, use the Max score found in "Y" of the experimental evidence summary matrix.

EXPERIMENTAL EVIDENCE
Sum the experimental evidence points from Rows "W-Y" and enter the total on Row "Z". NOTE: If the total experimental evidence points exceed the max score, use the Max score of 6 points for the summary matrix. Please prioritize curating genetic evidence over experimental evidence to reach a definitive score.
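The capping rules stated above (a maximum of 4 points for models and rescue, and 6 points for experimental evidence overall) can be sketched as follows. The function and parameter names are illustrative. The caps for the function and functional alteration categories are not given in this excerpt, so those inputs are assumed to be capped already.

```python
def experimental_total(function_pts, alteration_pts, models_rescue_pts,
                       models_rescue_max=4.0, overall_max=6.0):
    """Sum experimental evidence points, applying the two caps given in the text."""
    # Models and rescue points are capped at 4 before summing
    capped_models = min(models_rescue_pts, models_rescue_max)
    # The experimental evidence total is capped at 6
    return min(function_pts + alteration_pts + capped_models, overall_max)


print(experimental_total(2, 1, 5))    # 6.0
print(experimental_total(0.5, 0, 2))  # 2.5
```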
For specific examples of different pieces of experimental evidence, please see Appendix B.

Variant evidence vs experimental evidence
Not all evidence supports the role of the gene in the disease. Therefore, the curator must carefully consider whether to count functional evidence in the experimental evidence section or in the case-level data section. Only evidence that supports the role of the gene in the disease should be counted in the experimental evidence section. Experimental evidence that does not directly support the role of the gene in the disease but indicates that the variant is damaging to the gene function can, instead, be used to increase points in the case-level data section. Some very general examples are given below. Please note that these examples are a guide. Each piece of evidence should be carefully considered when deciding on which category to assign points. These decisions should be discussed with experts in the disease area, if needed.

Variant evidence, general examples:
• Immunolocalization showing that the gene product is mislocalized in cells from a patient or in cultured cells. This would be counted as case-level evidence UNLESS mislocalization/accumulation of an altered gene product is a known mechanism of disease, in which case this evidence could be counted as experimental evidence (functional alteration).
• Mini-gene splicing assay or RT-PCR showing that splicing is impacted by a splice-site variant.
• A variant in a gene encoding an enzyme is expressed in cultured cells and enzyme activity is deficient.
• A variant is shown to disrupt the normal interaction of the gene product of interest (protein A) with another protein (protein B). NOTE: If protein B is implicated in the same disease, the interaction can be counted in experimental data (Function: protein interaction), and the lack of interaction due to the variant can be counted as case-level data.

Experimental evidence, general examples:
• A signaling pathway is known to be involved in the disease mechanism. Expression of a missense variant in cells shows that the gene product can no longer function as part of this pathway.
• The variant is shown to be associated with a known hallmark of the disease, e.g. abnormal deposition or mislocalization of a gene product, abnormal contractility of cells etc., either in patient cells or cultured cells expressing the variant.
• Study showing enzyme deficiency in tissues of many patients, leading to the conclusion that deficiency of the enzyme causes the disease, e.g. early studies showing enzyme deficiency in individuals with a metabolic disorder.
• Any model organism with a variant initially identified in a human with the disorder.

CONTRADICTORY EVIDENCE
NOTE: This designation is to be applied at the discretion of clinical domain experts after thorough review of available evidence. The curator will collect the contradictory evidence and the classification (Disputed/Refuted) is to be determined by the clinical domain experts. Below are a few examples of contradictory evidence. Note that this list is not all-inclusive and if the curator feels that a piece of evidence offers evidence that does not support the gene-disease relationship, this data should always be recorded (Summary and PMIDs) and pointed out for expert review.
1. Case-control data is not significant: As case-control studies evaluate variants in healthy vs affected individuals, if there is no statistically significant difference in the variants between these groups, this should be marked as potentially contradictory evidence for expert review. See case-control examples above (p.22, Fig. 7). NOTE: Evidence contradicting a single variant as causative for the disease does not necessarily rule out the gene:disease relationship.
2. Minor allele frequency is too high for the disease: Many diseases have published prevalence, which can often be found in the GeneReviews entry. If ALL minor alleles in a gene are present in a specific population or the general population (ExAC, ESP, 1000Genomes) at a frequency that is higher than what is estimated for the disease, this could suggest lack of gene-disease relationship and should be marked as potentially contradictory evidence for expert review. For example, Adams-Oliver syndrome is an autosomal dominant disease and has a prevalence of 0.44 in 100,000 (4.4e-6) live births. If a new gene were being curated for this disease and supposedly pathogenic variants were identified with an allele frequency in ExAC of 0.4882, this could be potentially contradictory evidence. NOTE: Evidence contradicting a single variant as causative for the disease does not necessarily rule out the gene:disease relationship. Additionally, disease prevalence can vary in different populations, so read the GeneReviews entry thoroughly and keep demographic information in mind during this evaluation.
3. The gene-disease relationship cannot be replicated: One measure of a gene-disease relationship is its replication both over time and across multiple studies and disease cohorts. If a study could not identify any variants in the gene being curated in an affected population that was negative for other known causes of the disease, this could be considered potentially contradictory evidence and should be marked for expert review. However, when assigning this designation, a curator needs to consider disease prevalence. If a disease is rare, a small study may not identify any variants in the curated gene. For example, Perrault syndrome is characterized by hearing loss in males and ovarian dysfunction in females and only 100 cases have been reported. Thus, if a study with a small cohort does not identify any variants in a gene being curated for this syndrome, this may not necessarily be evidence against gene-disease association. In any case, if a curator suspects that any evidence supports a lack of gene-disease association, it should be marked for expert review.

4. Non-segregations: Non-segregations should be considered carefully, as age-dependent penetrance and phenotyping of relatives could have an impact on the number of apparent non-segregations within a family. Thus, unaffected variant carriers should be of similar age to the affected variant carriers. If a curator suspects non-segregations, these should be noted for expert review.
5. Non-supporting functional evidence: The types of different experimental evidence are detailed in the "Experimental Evidence" Section (p. 23). If any of this experimental evidence suggests that variants, although found in humans, do not affect function or that the function is not consistent with the established disease mechanism, this evidence should be marked as potentially contradictory evidence for expert review. For example, if a gene were being curated for a disease association and the mouse model did not have any phenotype, this could be potentially contradictory evidence.
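The allele frequency check described in item 2 above can be expressed as a direct comparison. This is a minimal sketch of the comparison as worded in the text (all reported alleles more frequent than the disease itself). The function name is hypothetical, and the check does not adjust for penetrance, inheritance pattern, or population-specific prevalence, all of which the expert reviewers would weigh.

```python
def all_alleles_too_common(allele_frequencies, disease_prevalence):
    """True when every reported allele frequency exceeds the disease prevalence."""
    return bool(allele_frequencies) and all(
        af > disease_prevalence for af in allele_frequencies
    )


# Adams-Oliver syndrome example from the text: 0.44 per 100,000 live births
prevalence = 0.44 / 100_000
print(all_alleles_too_common([0.4882], prevalence))  # True
```

A True result only marks the evidence as potentially contradictory for expert review; it does not by itself refute the gene-disease relationship.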

SUMMARY AND FINAL MATRIX
A summary matrix was designed to generate a "provisional" clinical validity assessment using a point system consistent with the qualitative descriptions of each classification. This final gene curation matrix and instructions for filling it out can be found below.
Fill in the "Gene/Disease Pair" at the top of the matrix.