Utility of Exome Sequencing Databases in Validating Genetic Variants Associated with Multiple Endocrine Neoplasia

Background: Multiple Endocrine Neoplasia (MEN) syndromes and Familial Medullary Thyroid Cancer have a well-documented genetic origin; however, it is not always clear whether a given genetic variation represents a pathologic variant or a normal rare variant. Objectives: We aim to assess the validity of published variants using online exome sequencing databases, and to identify undiscovered variants which potentially cause disease. Methods: A literature search of PubMed, OMIM, and online MEN2 databases was conducted to include genetic variants thought to be causative of MEN. Published variants were compared against the Exome Variant Server (EVS) and 1000 Genomes Project. Results: There were 40 publications which yielded 170 unique variants implicated in the pathogenesis of MEN. Of these, 47 variants were found within exome sequencing databases. Six published variants were found within sequenced populations at inappropriately high frequencies. Exome sequencing data analysis revealed seven potentially causative variants not found in the literature. Conclusions: The current MEN literature is robust in that most published variants are absent or rare within exome sequencing databases. However, some variants may be inappropriately implicated in causing disease. Additionally, the EVS identified undescribed variants which may be of interest.


Introduction
Multiple Endocrine Neoplasia (MEN) syndromes and Medullary Thyroid Cancer (MTC) have a pathogenesis with a well-documented genetic origin. This group of diseases is divided into MEN1, MEN2a, and MEN2b based on the specific constellation of tumors and phenotype that arises. Mutations in the MENIN and RET genes cause MEN1 and MEN2a/b, respectively, and specific mutations have significant prognostic implications. As attention has turned toward understanding the genetics of tumorigenesis, a library of implicated mutations has been generated. Most of these take the form of Single Nucleotide Polymorphisms (SNPs) that result in missense mutations. While some of these mutations have a very strong link with development of the phenotype, many published variants occur too infrequently within study populations to reliably determine a causative role in disease. The process by which these mutations are deemed abnormal has traditionally relied on inference stemming from the degree of sequence conservation across species [1,2]. This method of determining genetic normality can unintentionally classify non-pathologic variants as potentially disease-causing, while individuals may harbor rare variants of unknown functional consequence [3,4,5]. Therefore, it will become increasingly important to classify variants accurately for prognostic and therapeutic decision-making.
Recently, advances in genomic sequencing have made it possible to assess genetic variation across broad populations. It has become clear that individuals may harbor rare variants of unknown functional consequence. To reduce this problem, it has been proposed that genome databases such as the 1000 Genomes Project or the Exome Variant Server may be used as controls when investigating genetic variation in individuals with rare disease states [3,6,7,8]. This has previously been demonstrated with Mendelian diseases [9]. These databases are compiled from high-throughput sequencing data from large populations and mirror the genetic variability present in the general population. Charapenova, et al. examined the state of the literature on genetic pediatric epilepsy syndromes using these databases in order to make a statement on the robustness of prior publications [3]. As with MEN, analysis of small sample sizes may incorrectly classify mutations as pathogenic when they in fact represent a rare normal human variant. As these technologies and genetic databases develop, the information can be utilized to minimize reporting errors associated with disease pathogenesis. In addition to compiling allelic frequencies of sequenced variants, certain databases utilize the protein folding software PolyPhen to predict the impact of amino acid substitutions on protein structure and function for each sequenced variant. These predictions are generated based on computer modeling of protein folding and the subsequent anticipated impacts on domains vital to protein function. The combined use of PolyPhen scores and variant allele frequency may allow for better characterization of novel variants. In the present study, we aimed to validate the use of exome sequencing databases for the broad and rapid evaluation of novel mutations by applying this method to previously published MEN-causing gene variants.

Materials and Methods
A systematic review of the English-language literature conducted with the MeSH keywords "Multiple Endocrine Neoplasia" AND "Point Mutation" yielded 163 publications. The references of these papers were reviewed for key articles. Additionally, the OMIM database entries for MEN1, MEN2, and sporadic or familial Medullary Thyroid Cancer (sMTC, FMTC) were reviewed for additional references. We also included the published MEN2 Database variants within the analysis [10]. Variants were tabulated and queried in the Exome Variant Server and 1000 Genomes Project databases. The allelic frequency and PolyPhen scores of the obtained variants were then recorded. A Minor Allele Frequency (MAF) of 0.005 was used as the threshold above which a variant is considered too common to be directly responsible for pathogenesis given the known disease prevalence [4,5].
Each gene implicated by the literature search was also analyzed directly within the EVS. The total number of variants found within the sample population was tabulated. Those variants with PolyPhen scores of "Possibly Damaging" or "Probably Damaging" were tabulated and included, as were those with a MAF > 0.005.

Literature review and presence of identified variants within exome sequencing databases
The literature review produced 40 publications which met inclusion criteria (Figure 1). There were 170 unique variants across 7 different genes reported to be involved in the pathogenesis of MEN syndromes (Table 1). Of these, 100 variants were located within MENIN (previously called MEN1) associated with MEN1 syndrome. Only a single published variant, R171W, appeared in exome sequencing data and was predicted to be "Possibly Damaging" according to PolyPhen. It was found at an expected low allelic frequency of 0.000308. Additionally, ... [10,13]
Of those published variants, 46 were found within the exome sequencing databases (Table 2) with only 6 variants previously reported as "Damaging" found at MAF > 0.005 (Table 3). Of these 6 variants, five were synonymous mutations, and all have been implicated in sMTC pathogenesis. The variant G691S was associated with a MAF of 0.157 and is predicted to be benign [11]. Of the 38 published variants with expectedly low frequencies, 27 were predicted to be damaging to the final gene product by PolyPhen, resulting in 33 total damaging variants of any MAF identified within the databases.

Assessment of overall variability of target genes within exome sequencing databases
The overall variance in each implicated gene was summarized (Table 4). EIF4G1 demonstrated the highest variability, followed by EGFR, with 339 and 317 variants, respectively. Each of these two genes had a total of 55 variants predicted to be damaging. Within EIF4G1, the variant R1223H was associated with a MAF of 0.0113 and predicted to be damaging by PolyPhen. RET had a relatively intermediate level of variability with 244 total variants. Of these, 47 were predicted to be damaging. There were three variants predicted to be damaging which also appeared at a MAF > 0.005. MENIN had a relatively low level of variance with only 72 variants within the database populations, 15 of which were predicted to be damaging. Only a single variant, R176Q, was both predicted to be damaging and appeared at a high MAF. The Gsp oncogene is a variant of GNAS. This gene is associated with 270 variants, only 11 of which are predicted to be damaging and none of which appear at high MAF. SDHB and SDHD have been implicated in MEN2B and were associated with 62 and 20 variants, respectively. Of these, 6 and 5, respectively, are predicted to be damaging. Each gene had 1 variant appearing at a high MAF. None of these variants appeared among those found in our literature search (Table 5).

Discussion
The present article represents a non-exhaustive systematic review of the published literature on the genetic basis of Multiple Endocrine Neoplasias. As previously stated, the goal was to critically evaluate the current state of the literature utilizing open-access databases such as EVS. The current gold standard of assessing abnormality of novel genetic variance relies on conservation across species. We aimed to adapt a method previously described in the pediatric neurology literature to diseases relevant to oncologic surgery [3]. Most of the published variants collected here were absent from the EVS. Of those that appeared, the majority were present at sufficiently low rates such that they could represent undiagnosed individuals. Because EVS was initially designed with a focus on cardiopulmonary and hematologic disorders, this population may harbor individuals with undiagnosed MEN diseases. It is assumed that the frequency with which these individuals appear in such databases should mirror that of the general population; however, this cannot be guaranteed [7,8,12]. The low rate at which published variants do appear highlights the robustness of the current literature. As expected, many key mutations previously discussed in the literature did not appear within the exome sequencing databases, most notably those with prognostic and treatment implications. Specific RET codons, including 883, 918, and 922, drive recommendations for the most aggressive treatment, including thyroidectomy before 6 months of age. The absence of such mutations in these populations is encouraging considering the potential unnecessary morbidity that could be associated with falsely attributing prognostic value to them. However, those variants which did appear in the databases at unexpectedly high rates raise important questions about the genetic pathogenesis of MEN syndromes and endocrine cancers. We included in this study a summary of SNPs which have been previously implicated in sMTC which yield silent mutations. For example, the variant G691S was associated with a MAF of 0.157 and is predicted by PolyPhen to be benign, although in one study it was more prevalent in patients with sMTC versus healthy controls [11].
The authors who previously published these variants note that the DNA-level mutation may affect splice variants which act to enhance or silence expression [13]. However, many of these variants occur at a greater allelic frequency than our 0.005 MAF threshold for variants associated with disease pathogenesis [11]. Therefore, either it is unlikely that these variants are involved in disease pathogenesis or they are subject to other epigenetic or regulatory effects that warrant further investigation. While this may reduce their prognostic significance, it is possible that they represent benign variants which co-segregate with other yet undescribed deleterious mutations.
A portion of our analysis focused on assessing the overall level of variation within an implicated gene rather than looking at specific variants themselves, which relied heavily on PolyPhen predictions. While there was a great deal of concordance among predictions in terms of rarity and severity of the deleterious effect, a subset of variants within the database were associated with a high PolyPhen score for the given MAF. None of these variants appeared at extraordinarily high MAF, and individuals harboring these variants could represent those with undiagnosed disease. These variants may be targets of future investigation. However, it should be noted that PolyPhen scores may require further validation before being used solely to guide future research.
Interestingly, a search for the sole MENIN variant yields a single publication regarding adrenal cortical cancer, which is not traditionally associated with MEN1 [14]. Of the three variants found within RET, R982C was mentioned once in a patient with MEN1 with MEN2-like features or rarely in combination with other variants [15].
Ideally, all variants under consideration must be germline rather than acquired and accumulated within the tumor cells. Much of the available literature on cancer genetics includes tumor genetics and acquired mutations. These are not necessarily amenable to analysis by population genetics among otherwise healthy persons. However, an important caveat to this statement is that rare somatic mutations may play a questionable role in a disease process if they are found at high rates among healthy populations. The MEN family of diseases was selected initially because of the relatively high penetrance of disease resulting in a more robust body of published variants that fit this requirement. Nevertheless, we believe that utilization of exome sequencing databases will play an increasing role in the validation of newly discovered germline mutations implicated in the pathogenesis of any hereditary cancer.
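The screening rule stated in Materials and Methods (a MAF above 0.005 marks a variant as too common to be directly causative, and PolyPhen calls of "Possibly Damaging" or "Probably Damaging" are tabulated as damaging) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Variant` record, the function names, and the text labels returned by `screen` are hypothetical, and variants are assumed to have already been retrieved by hand from EVS or the 1000 Genomes Project. The two example rows use the MAF and PolyPhen values reported in this article for RET G691S and MENIN R171W.

```python
from dataclasses import dataclass

# Threshold and PolyPhen categories as stated in Materials and Methods.
MAF_THRESHOLD = 0.005
DAMAGING_CALLS = {"possibly damaging", "probably damaging"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant looked up in a database."""
    gene: str
    change: str      # amino acid change, e.g. "G691S"
    maf: float       # minor allele frequency recorded from the database
    polyphen: str    # PolyPhen category recorded from the database


def is_too_common(v: Variant) -> bool:
    # Strictly above the threshold, matching "MAF > 0.005" in the text.
    return v.maf > MAF_THRESHOLD


def is_predicted_damaging(v: Variant) -> bool:
    return v.polyphen.strip().lower() in DAMAGING_CALLS


def screen(v: Variant) -> str:
    """Return an illustrative label; the wording is an assumption."""
    if is_too_common(v):
        return "too common to be directly causative"
    if is_predicted_damaging(v):
        return "rare and predicted damaging"
    return "rare, not predicted damaging"


def summarize_by_gene(variants):
    """Per-gene counts of total, predicted-damaging, and high-MAF variants."""
    summary = {}
    for v in variants:
        row = summary.setdefault(
            v.gene, {"total": 0, "damaging": 0, "high_maf": 0}
        )
        row["total"] += 1
        row["damaging"] += int(is_predicted_damaging(v))
        row["high_maf"] += int(is_too_common(v))
    return summary


# Values for these two variants are taken from the text of this article.
variants = [
    Variant("RET", "G691S", 0.157, "Benign"),
    Variant("MENIN", "R171W", 0.000308, "Possibly Damaging"),
]

for v in variants:
    print(v.gene, v.change, screen(v))
print(summarize_by_gene(variants))
```

Keeping the frequency test and the PolyPhen test as separate functions mirrors the way the two criteria are tabulated independently in the per-gene analysis, so a variant can be counted as both predicted damaging and high MAF.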

Conclusions
The present article stands as a proof of concept for the use of exome sequencing databases to evaluate published variants implicated in the pathogenesis of diseases relevant to surgical oncology. Through a retrospective evaluation of the literature, we identified some well-published variants present in the sequenced populations at unexpectedly high rates. The validity of such variants with regard to their role in disease is called into question. This illustrates how these databases allow for rapid evaluation of novel mutations and give investigators an opportunity to quickly validate their findings. An independent analysis of EVS data revealed several mutations which have not yet been implicated in the associated MEN syndromes. Further work is warranted to determine what role such variants play in the disease process. These findings may lead to novel genetic targets for further research.

Grant
None.