Prioritizing genes for systematic variant effect mapping

Abstract Motivation When rare missense variants are clinically interpreted as to their pathogenicity, most are classified as variants of uncertain significance (VUS). Although functional assays can provide strong evidence for variant classification, such results are generally unavailable. Multiplexed assays of variant effect can generate experimental ‘variant effect maps’ that score nearly all possible missense variants in selected protein targets for their impact on protein function. However, these efforts have not always prioritized proteins for which variant effect maps would have the greatest impact on clinical variant interpretation. Results Here, we mined databases of clinically interpreted variants and applied three strategies, each building on the previous, to prioritize genes for systematic functional testing of missense variation. The strategies ranked genes (i) by the number of unique missense VUS that had been reported to ClinVar; (ii) by movability- and reappearance-weighted impact scores, to give extra weight to reappearing, movable VUS and (iii) by difficulty-adjusted impact scores, to account for the more resource-intensive nature of generating variant effect maps for longer genes. Our results could be used to guide systematic functional testing of missense variation toward greater impact on clinical variant interpretation. Availability and implementation Source code available at: https://github.com/rothlab/mave-gene-prioritization Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
Clinical genetic testing is frequently performed for carrier screening and for the detection and diagnosis of disease. The rapid decline in sequencing costs, coupled with increased demand for genetic testing and a shift in the standard of care toward larger gene panels, has resulted in rapid increases in the number of previously unseen variants (Blazer et al., 2015).
Clinical variant interpretation is required to determine which variants are worthy of a physician's concern. In ClinVar, a widely used resource for reporting clinical variants, $40% of all variants reported in disease-implicated genes are missense variants (Landrum et al., 2016). Unfortunately, more than half of these missense variants are classified as variants of uncertain significance (VUS) (Starita et al., 2017;Weile and Roth, 2018). Although physicians and genetic counselors do flag VUS emerging from a diagnostic genetic test for future follow-up and reinterpretation, VUS results are generally not considered clinically actionable (Hoffman-Andrews, 2017). Therefore, evidence that can help reclassify VUS as 'benign', 'likely benign', 'pathogenic' or 'likely pathogenic' could substantially impact the value of genetic testing for patient care. Although well-established functional assays, such as complementation (Lee and Nurse, 1987;Osborn and Miller, 2007) or in vitro biochemical activity assays (Guidugli et al., 2014;Millot et al., 2012), are used as evidence for variant interpretation under current American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines (Richards et al., 2015;Brnich et al., 2018;Brnich et al., 2019), such assays are resource-intensive and have not generally been performed for rare clinical missense variants. Multiplexed assays of variant effects (MAVEs) are an emerging tool to provide systematic experimental testing of nearly all missense variants for selected protein targets (Starita et al., 2017). Missense variant effect maps have, for some genes, been shown to outperform general computational predictors of pathogenic variation (previously trained on variants across many genes) (Sun et al., 2020;Weile et al., 2017). For example, a variant effect map for the human CBS gene was generated in yeast cells based on the ability of each variant to complement the loss of the yeast CBS ortholog CYS4. This MAVE study detected almost four times more pathogenic variants than a computational approach, at the same stringency (Sun et al., 2020).
Because MAVE technology has emerged only recently (Fowler and Fields, 2014), full-length variant effect maps are available for fewer than 30 human proteins (Esposito et al., 2019;Weile and Roth, 2018). Moreover, many of these maps cover proteins for which very few difficult-to-interpret missense VUS have been reported. As the output of MAVE studies rapidly increases (Weile and Roth, 2018), it would be helpful to have more guidance on the protein targets for which variant effect maps could have the greatest impact on clinical variant interpretation. Here, we implement three strategies, each building on the previous, to prioritize gene targets of missense variant effect mapping studies.

Clinical variants
We used ClinVar (Landrum et al., 2016) to assemble missense variants that had been interpreted in the context of clinical genetic testing ('clinical testing'), as opposed to variants in the literature that were not clearly interpreted in the context of clinical testing ('literature-only'), and had been assigned a clinical interpretation of 'uncertain significance' by all submitters. Variants with conflicting interpretations were removed.
Missense variants were also extracted from Invitae's clinical variant database, as were the number of patients in which each variant was found, the clinical area of the patients' tests, and the most recent interpretation of each variant. Because our repeated observation (i.e. reappearance) measure could potentially over-weight genes that had been subject to more intensive cascade screening of individuals related to the primary proband (i.e. extensive family member testing), we included only variants from probands who were not known to be related. Variants were interpreted using the Sherloc system, for which both methods and validation have been published (Nykamp et al., 2017). Interpreted variants were stripped of all protected health information (i.e. de-identified) under an approved protocol from the Western Institutional Review Board (IRB #20161796). As previously described, the Sherloc system offers a series of detailed refinements to the ACMG/AMP variant classification criteria to capture exceptions and special cases. For all VUS in the database, we retrieved the semiquantitative scores that had been assigned given supporting evidence types. Briefly, each evidence criterion was awarded a preset number of points on the benign (1B-5B) and pathogenic (1P-5P) scales, and total benign and pathogenic points were calculated separately (see Nykamp et al., 2017 for details on the Sherloc system). To be classified as 'likely benign' or 'likely pathogenic', a variant had to receive at least three benign points (i.e. 3B) or four pathogenic points (i.e. 4P), respectively. For instance, a variant that is absent from the Exome Aggregation Consortium (ExAC), a large-scale reference human genetic variation dataset (Lek et al., 2016), would receive 1P, and another 3P if it were seen in four unrelated clinical case reports, so that it would receive a total of four pathogenic points and be classified as 'likely pathogenic', in the absence of other evidence.

Accounting for 'movability' of variants
Variants annotated as VUS were considered 'movable' if new functional evidence (i.e. evidence from functional assays) would be sufficient to change the variant interpretation to a non-VUS category. We computed the number of movable VUS under two scenarios: functional evidence was 'strong' and functional evidence was 'weak', conveying either two and a half points or one point, respectively, within the Sherloc system (Nykamp et al., 2017). For VUS to be 'moved', a variants's total (after considering new strong or weak functional evidence) would need to either reach four pathogenic points (4P) or three benign points (3B). For example, VUS previously receiving two pathogenic points (2P) and no benign points (0B) based on other scoring criteria, if found to significantly reduce fitness in a human cell line assay and if the interpreter considered this to be strong functional evidence worth 2.5P, would now have a total of 4.5P and be moved to 'likely pathogenic'. Alternatively, variants that previously received no pathogenic or benign points (0P, 0B) could not reach 4P even with the same piece of strong functional evidence, and would not be considered 'movable'. This analysis accounted for pre-existing functional evidence (e.g. VUS for which strong functional evidence had already been considered in the classification would not be considered movable). Any VUS with substantial evidence for a benign classification (>3B) and substantial evidence for a pathogenic classification (>4P) was not considered movable, because new functional evidence is unlikely to resolve the conflict.

Giving extra weight to reappearing variants
To score the tendency of a gene to harbor clinical variants that are repeatedly observed, we divided (for each gene) the number of patients in which a variant had been observed by the number of unique variants observed for that gene. To reduce the impact of common variants on our analysis, we capped the number variant observations at seven (treating variants seen in more than seven patients as though they had been observed in only seven). Applying this cap, 95% of VUS were left unchanged ( Supplementary Fig.  S1a).

Modeling the reappearance-weighted fraction of movable variants
We used the Invitae clinical variant database to calculate two coefficients that were subsequently used to modify variant effect map impact estimates for each gene. The first coefficient, the fraction of all variants that were movable (i.e. movability fraction, or M), was calculated as: (1) where CðgÞ is the unique VUS count for gene g and C movable ðgÞ is the unique movable VUS count for gene g. To limit the effects of small sample size for genes with fewer observed variants, we used a more conservative estimation approach (i.e. regularized movability fraction, orM) that returned values closer to the average M value (M avg ) where less data were available. This was calculated using: where CðgÞ is the unique VUS count for gene g and C pseudo is a pseudocount (see next page for a description of pseudocount selection) added to allow processing genes for which no movable VUS were observed in the Invitae dataset. The second coefficient we calculated gives extra weight for repeated observations (R) of a variant. This 'reappearance coefficient', which captures the ratio of patients to unique variants while limiting the extra weight provided by capping the number of occurrences, was calculated by: where PðgÞ is the number of patients observed to have a VUS for gene g. To limit the effects of small sample size for genes for which fewer patients had been observed, we calculated a regularized R (R) that returned values closer to the average R value (R avg ) when less data were available:R where CðgÞ and C pseudo are the unique VUS count for gene g and a pseudocount, respectively. For each regularization step, we used C pseudo ¼ 8. Thus, e.g. a gene with only two observed VUS had an M or R value that was 80% [8 pseudocounts/(8 pseudocounts þ 2 observed VUS) ¼ 80%] driven by the average value across all genes, while M or R for a gene with more than 72 VUS was more than 90% [72 observed VUS/(8 pseudocounts þ 72 observed VUS) ¼ 90%] driven by VUS observed in that gene. A pseudocount enabled the model to process datasets (e.g. ClinVar) where movability fraction (M) and reappearance (R) are not available, as long as a dataset (e.g. Invitae) is present to provide the average movability and reappearance.
Our simplest measure of impact score (S ClinVar ) was calculated simply as the unique VUS count: where CðgÞ is the unique VUS count for gene g in ClinVar. We then used the above-described coefficients of movability and reappearance to calculate a movability-and reappearance-weighted impact score (MARWIS) (S MARWIS ) for each gene with at least one VUS in the ClinVar database: where CðgÞ is the unique VUS count for gene g. MARWIS was calculated either using the count of unique VUS from ClinVar (C ClinVar ) or from Invitae (C Invitae ), yielding S MARWIS ðg; C ClinVar Þ and S MARWIS ðg; C Invitae Þ, respectively. In calculating S ClinVar and S MARWIS scores, the goal was to estimate absolute impact on clinical variant interpretations that a variant effect map might provide, so we did not consider measures of difficulty of the project, e.g. by considering the length of each protein.

Difficulty-adjusted impact score (DAIS)
For each protein, a DAIS (S DAIS ) was calculated to account for the increased difficulty of variant effect mapping for longer proteins: where D L ðgÞ is the protein length [i.e. number of amino acids, based on the canonical isoform according to the Ensembl database (Yates et al., 2019)]. The denominator estimated the resources that would be required for variant effect mapping, as the sum of length-dependent (D L ) and -independent (D fixed ) costs, with length-independent costs modeled as equivalent to the length-dependent costs of a 300 amino acid protein (D fixed ¼ 300). Length-independent costs were introduced to capture costs such as assay development and assay validation experiments, as well as length-independent aspects of downstream analysis. Figure 1 shows the three strategies we used to prioritize genes according to how useful a variant effect map could be in classifying variants for those genes. The first strategy ranked the genes by the number of unique missense VUS that had been reported to ClinVar ('VUS count'). The second strategy ranked genes by MARWIS to give extra weight to reappearing, movable VUS. The third strategy ranked genes by DAIS, which rescale the MARWIS to account for the more resource-intensive nature of generating variant effect maps for longer genes. Analysis scripts, datasets and raw results are available via GitHub (https://github.com/rothlab/mave-gene-prioritization).

Application of prioritization strategies
The three strategies-using (i) 'unique VUS count', based on the number of unique VUS in ClinVar; (ii) MARWIS, the ClinVar VUS count modified by Invitae-derived movability and reappearance coefficients and (iii) DAIS, which adjusted the MARWIS with a rough estimate of the relative difficulty of producing a variant effect map-resulted in three unique lists of the 20 highest-priority genes ( Genes in bold were common to all three top 20 lists.
not be addressed using only the public ClinVar resource, which does not provide information about movability. Although ClinVar can contain multiple reports for the same variant, the number of such reports is a poor proxy for the rate of repeated observation, given that providers will typically only provide a new report to ClinVar upon observation in a new patient if their interpretation has changed.
To obtain information about movability and reappearance for each VUS, we mined data from the Invitae database, which encompassed 411 782 missense VUS observations (218 096 unique missense VUS in 1921 genes). Following ACMG/AMP guidelines, as implemented by the Sherloc system (Nykamp et al., 2017) used by Invitae for interpretation, we established that in the Invitae database, only 9.6% of VUS were potentially movable on the basis of strong, new, functional evidence (3.7% to benign or likely benign, 6.9% to pathogenic or likely pathogenic and 1% to either category given the direction of functional evidence).
After calculating movability and reappearance coefficients for each gene (see Section 2), we combined the two coefficients with the VUS count from ClinVar to estimate an MARWIS for each of the 3646 genes with missense VUS in ClinVar. To limit the dependence of gene prioritization on the experience of any one clinical genetics provider, we only used ClinVar variants to derive the unique VUS count. However, results based on unique VUS counts from Invitae yielded similar MARWIS (Pearson correlation, r ¼ 0.90) (Fig. 2). We applied reduced major axis regression, which is more appropriate than ordinary least-square regression when a bivariate relationship is symmetrical (Smith, 2009).
Ranking ClinVar genes by unique VUS count and by MARWIS yielded substantial agreement (Pearson correlation, r ¼ 0.92) (Fig. 3), but with many noteworthy exceptions. For most genes that were extreme outliers, the MARWIS were higher than those for other genes with similar unique VUS counts, indicating that consideration of variant reappearance and movability can substantially alter prioritization. For example, as shown in Figure 3, the MYH7 MARWIS was nearly five times the score projected from the MYH7 unique VUS count by the linear regression model Although the MARWIS approach allows ranking of genes for which a variant effect map would provide the greatest impact on clinical variant interpretation, it does not account for the idea that producing a variant effect map is more resource-intensive for a longer protein than for a shorter one. The top 20 MARWIS-ranked genes encoded proteins with an average length of 3674 amino acids (based only on the annotated canonical isoform), as compared with an average length of 842 amino acids over all proteins scored. Therefore, we developed the DAIS, which divides the MARWIS by a gene-specific estimate of relative difficulty. This estimate was necessarily crude, as determining the best functional assay (e.g. in vitro test versus complementation in human cells) for each disease-associated protein and its relative cost is not currently feasible. A detailed analysis for limitation of this estimation and ways to further prioritize ranked genes are explored in Section 4.

Comparison of rankings
Examining the 'top 20' genes from each of the three lists, we found 10 genes common to all lists: BRCA2, ATM, APC, MSH6, NF1, BRCA1, POLE, MSH2, PALB2 and BRIP1 (Table 1) (Gossage et al., 2015). Its importance in genetic disease already suggested that it should be a high-priority target for proactive functionality testing, but the fact that it is a relatively short protein (178 amino acids) suggests that it should be further prioritized. The 'top 40' list for this preferred strategy is provided in Table 2, annotated by previously conducted systematic variant mapping studies, number of unique missense variants and VUS in ClinVar, typical mode of inheritance, whether or not pathogenic variants tend to be gain-of-function or loss-of-function, and the clinical categories to which each gene belongs.

Sensitivity of results to parameter choices
To assess whether our results were sensitive to the parameters of our analysis, pseudocounts ranging from 1 to 10 were explored. High correlation between rankings generated with different pseudocount choices suggested that our results were not highly sensitive to pseudocount choice ( Supplementary Fig. S2, lowest observed r ¼ 0.999). Next, we varied the choice of cap on the number of repeated observations from 1 to 10. High correlation between ranks suggested that our results were also not highly sensitive to this choice ( Supplementary Fig. S1b, lowest observed r ¼ 0.96). Finally, lengthindependent costs of a protein ranging from 0 amino acids (i.e. no length-independent cost) to 500 amino acids (assuming lengthindependent costs are equivalent to the length-dependent costs for a protein of length 500) were explored. Our rankings were also not highly sensitive to the choice of length-independent cost ( Supplementary Fig. S3, lowest observed r ¼ 0.94).

Discussion
Previous studies have provided metrics to rank human genes based on how likely they are to tolerate functional genetic variation (Lek et al., 2016;Petrovski et al., 2013). Despite being useful for inferring gene essentiality, such measures do not predict the burden of clinical missense variants annotated as VUS. Indeed, there are many examples of absolutely essential genes that are not associated with genetic disease (e.g. because defects yield embryonic lethality).
Here, we examined three strategies for ranking genes according to the potential for MAVEs to assist clinical variant interpretation, and further provided a list of prioritized genes using each strategy. We recommended the most nuanced DAIS strategy because it takes into account protein length as at least one element of difficulty in variant effect mapping.
Some of the prioritized genes we identified had already been subjected to systematic functional testing. For example, a recent MAVE study on functionally critical domains of the BRCA1 gene identified more than 400 non-functional missense variants that might improve clinical interpretation of BRCA1 variants (Findlay et al., 2018). High throughput assays performed on the TP53 gene (Bhagavatula et al., 2017;Giacomelli et al., 2018;Kotler et al., 2018) revealed novel loss-of-function and gain-of-function missense variants that might facilitate interpreting TP53 clinical variants. However, variant effect mapping studies are currently available for only 4 of the top 40 DAIS-ranked genes (Table 2). To our knowledge, no variant interpretation currently reported in ClinVar has yet made use of MAVE evidence.
As the MAVE community continues publishing high-quality MAVE studies, it is potentially beneficial to re-evaluate the effectiveness of improving clinical variant interpretation with MAVE when more MAVE data become available.
We note that our ranking schemes, which sought to maximize impact on clinical variant interpretation, should not be confused with rankings by overall clinical impact of variant interpretation. Such a ranking would also need to consider clinical actionability. A more complex accounting of the clinical impact gained by improved clinical interpretation would be a valuable future direction. Although this would be an enormous undertaking that falls outside the scope of this work, we expect that the scoring schemes described here could contribute to such an effort.
Another limitation was that we did not separately account for the effect of somatic variants when estimating impact scores. Although some of the variants reported in ClinVar have been observed somatically, most are germline variants (Landrum et al., 2018). While TP53 is one of the most frequently somatically mutated genes in human cancers (Olivier et al., 2010), with missense variants reported in more than 25 000 cancer samples in the Catalogue of Somatic Mutations in Cancer (Tate et al., 2019), somatic variants represent only 74 (11%) of the 649 missense VUS for TP53 in ClinVar. Thus, our prioritization largely ignored the impact that variant effect maps could have on interpreting somatic variants. Prioritizing genes based on a more inclusive combination of somatic and germline variation, as well as other variant types (e.g. in-frame deletions or insertions, intronic or promoter variation) is an avenue for future investigation.
We did not consider reasons to study the functional impact of human variation beyond that of improving clinical variant interpretation. For example, MAVE studies can provide clues about functional protein domains (e.g. by revealing potential protein interaction interfaces) (Weile et al., 2017).
Our study calculated movability in terms of what fraction of VUS could potentially be reclassified with the appearance of strong new functional evidence. Of course, the number and identity of movable variants will ultimately depend on the quality of the new functional evidence and on the variant interpreter's judgment (following ACMG/AMP guidelines) of the strength of evidence it provides. As a result, the impact of a variant effect map on clinical interpretation should ultimately be assessed for each individual gene and map. As such analysis emerges, we imagine that any trends could be considered in a priori prioritization of other genes (e.g. to upweight priorities for well-conserved metabolic enzymes if it were found that maps for these genes tended to provide stronger evidence).
It would also be of interest to determine the genes and variants for which systematic gathering of other types of evidence (e.g. cosegregation of phenotypes within families), either alone or together with functional evidence, could affect variant classification. This type of analysis could also consider synergism between functional evidence and other evidence types. For example, systematic analysis of cosegregation could not only enable reclassification for some VUS, but also place others within range of reclassification based on new functional evidence, thereby increasing the impact of a variant effect map.
Although current ACMG/AMP guidelines make no recommendations as to whether variant effect maps provide strong or weak evidence for benign or pathogenic classification, published studies have demonstrated the value of variant effect maps in identifying pathogenic variants (Sun et al., 2020;Weile et al., 2017). However, although this study defined a VUS to be movable to pathogenic/likely pathogenic or to benign/likely benign given strong functional evidence, no similar studies have investigated the value of these maps in identifying benign variants with confidence. Thus, the determination of movability should be revisited as a better understanding of the evidentiary performance of MAVEs emerges.
Our prioritization strategies implicitly considered reclassification from VUS to pathogenic/likely pathogenic to have the same clinical value as reclassification from VUS to benign/likely benign. However, future ranking schemes might place more clinical value on reclassification toward pathogenicity. Our model also did not account for the fact that VUS in known cancer driver genes might have more clinical impact than a VUS in a gene where disease association has not been firmly established. In theory, a less disease-relevant gene with many VUS could be ranked higher than a known diseasecausing gene with very few VUS. However, genes for which a disease association is not clear are less likely to be included in clinical genetic tests; as a result, these genes should have fewer VUS identified and hence lower prioritization by our ranking strategies. Although we recommend DAIS as it balances benefits against costs, the estimation of difficulty used in DAIS was necessarily crude, as no systematic measure is available to determine the best functional assay for each disease-associated protein and its relative cost. Therefore, after identifying a 'short list' of genes, users may wish to reweight MARWIS based on their own estimates of relative difficulty. There are other ways to refine the relative priorities of genes that were top-ranked by one of our scores. For instance, one might prefer essential genes in more tractable cell lines given the relative ease of assay development. When multiple alternative functional assays are available, users may wish to separately adjust not only for the difficulty of each alternative assay, but also by basing weight on expected fidelity or evidentiary value of that assay. However, because such criteria tend to vary between users, they were not included as general adjustment parameters in this study.
Because the quality of an MAVE study depends on the validity of the functional assays it uses, a quality assurance step is required to benchmark the robustness of functional assays by evaluating a list of variants known to be pathogenic or likely pathogenic and variants known to be benign or likely benign. As a result, in Supplementary Table S1, we provide (i) the number of unique pathogenic, likely pathogenic, likely benign and benign variants in ClinVar and (ii) the number of unique VUS, the number of occurrence of VUS, the number of movable VUS and the number of occurrence of movable VUS in the Invitae database for the top 100 genes ranked by DAIS. A threshold might be applied to further prioritize genes with enough variants with clear clinical classification. To facilitate functional assay selection, we developed MaveQuest, an online resource for planning experimental tests of human variant effects (Kuang et al., 2020). For example, for CHEK2, which was ranked third by DAIS, MaveQuest suggests multiple potential functional assays. One possibility is a CRISPR knockout assay supported by a negative viability phenotype observed in a systematic CRISPR knockout screen (Wang et al., 2015). Another is a trans-species complementation assay based on the ability of human CHEK2 to complement the loss of the yeast ortholog RAD53 (Roeb et al., 2012). Calculation of MARWIS was only possible given coefficients derived from data on movability and repeated observations of VUS, such as those provided by Invitae. Although VUS counts were based on the more inclusive ClinVar dataset, genes for which Invitae offers tests had greater opportunity for increased (or decreased) prioritization based on movability and reappearance. We note that the composition of variants in ClinVar necessarily inherits biases from Invitae and major submitters to ClinVar. These biases could be viewed as a feature rather than a limitation, in that they should collectively tend to favor genes for which genetic assays are of greatest clinical interest. However, different clinical genetic services may differ in terms of their focus on particular genes and disease areas (e.g. somatic variation within tumor genomes). The modeling process we describe can be tuned to blend coefficients from multiple sources. Therefore, other organizations could enhance this prioritization process through similar sharing of de-identified clinical variation data collected under approved protocols.
In conclusion, our study explored three ways to prioritize genes according to the clinical impact of systematically testing missense variant functions. Through use of the DAIS strategy in particular, we offer a list of genes prioritized by potential benefit in clinical variant interpretation, as weighted by a measure of difficulty in producing the maps. As more VUS are identified in annotated disease genes, and as more genes are implicated in disease, the priority list of genes will evolve. Therefore, the prioritization exercise reported here should be periodically revisited.