Germline Mutations in Cancer Predisposition Genes are Frequent in Sporadic Sarcomas

Associations of sarcoma with inherited cancer syndromes implicate genetic predisposition in sarcoma development. However, due to the apparently sporadic nature of sarcomas, little attention has been paid to the role of genetic susceptibility in sporadic sarcoma. To address this, we performed targeted genomic sequencing to investigate the prevalence of germline mutations in known cancer-associated genes within an Asian cohort of sporadic sarcoma patients younger than 50 years old. We observed that 13.6% (n = 9) of 66 patients harboured at least one predicted pathogenic germline mutation in 10 cancer-associated genes: ATM, BRCA2, ERCC4, FANCC, FANCE, FANCI, MSH6, POLE, SDHA and TP53. The most frequently affected genes are involved in the DNA damage repair pathway, with a germline mutation prevalence of 10.6%. Our findings suggest that genetic predisposition plays a larger role than expected in our Asian cohort of sporadic sarcoma. Clinicians should therefore be aware that young sarcoma patients may be carriers of inherited mutations in cancer genes and should be considered for genetic testing, regardless of family history. The prevalence of germline mutations in DNA damage repair genes implies that therapeutic strategies exploiting the vulnerabilities resulting from impaired DNA repair may be promising areas for translational research.

TP53 mutations. LFS is characterized by a tumor spectrum that includes sarcoma, typically developing before age 45 years 4,5. In hereditary retinoblastoma, patients with germline RB1 mutations are at increased risk of second primary tumors, most of which are sarcomas 2,4. These associations of sarcoma with cancer syndromes implicate genetic predisposition in sarcoma development. However, given the heterogeneity and rarity of sarcomas, few studies have investigated genetic susceptibility in sporadic sarcoma. A greater understanding of genetic predisposition in sarcoma development will help refine our interpretation of the clinical implications of genetic alterations in sarcomas, as well as facilitate identification of sarcoma patients who may be at risk for other cancers. This will guide patient-care strategies, such as offering predictive testing for cancer syndromes and preventive surveillance.
Few studies to date have been published regarding germline alterations in sarcomas [6][7][8][9][10][11]; those that have been published mostly focused on specific sarcoma subtypes, such as Ewing's sarcoma 8, rhabdoid tumors 10,11 and osteosarcoma 9. The largest study to date was performed by the International Sarcoma Kindred Study (ISKS), in which a predominantly kindred-oriented cohort of 1192 sarcoma probands was interrogated for germline mutations in a panel of cancer-associated genes. They reported that 55% of sarcoma cases harboured at least one pathogenic mutation 7. This is a strikingly large proportion, although it should be noted that, as a kindred study, their cohort is likely to be biased towards individuals with a family history of cancer and thus towards potential germline mutation carriers. Moreover, it is unknown whether findings from a predominantly Caucasian cohort extrapolate to an Asian population. To address this, we interrogated an Asian cohort of sarcoma patients for the prevalence of germline alterations in 52 cancer-associated genes using a combined approach of targeted genomic sequencing and digital multiplex ligation-dependent probe amplification (digitalMLPA).
Mutation spectrum in Asian sporadic sarcoma cohort. Using our variant prioritization pipeline, we found 65 non-silent mutations, of which 32 (49.2%) were VUS, 20 (30.8%) were benign and 13 (20.0%) were predicted to be pathogenic. Of the 13 predicted pathogenic mutations, 12 were identified by targeted sequencing, comprising eight missense, two nonsense and two frameshift mutations (Table 2). These mutations affected nine genes: two each in ATM, ERCC4 and FANCI, and one each in BRCA2, FANCC, FANCE, MSH6, POLE and SDHA. One copy number alteration affecting TP53 was detected through digitalMLPA (Table 2). Seven of these variants have been observed at very low frequencies in the East Asian population of the 1000 G and ExAC databases, whereas the remaining variants are novel (Table 2). Of these mutations, 11 (84.6%) occurred in genes associated with DNA damage repair (DDR) and the remaining two (15.4%) in other known cancer predisposition genes (Fig. 1).
The predicted pathogenic mutations were found in 9 patients (13.6%, 95% CI: 6.8-24.8%) across 10 genes (Table 2). A recent study on germline mutations in pediatric cancers compared the prevalence of mutations in 60 autosomal-dominant (AD) genes in their cancer cohort with that in the 1000 G population 9. Using their 1000 G data, we repeated the comparison for our cohort, restricted to the subset of our genes that overlapped with their 60 AD genes. The prevalence of predicted pathogenic mutation carriers in our cohort was 6.1%, significantly higher than the 1.1% in the 1000 G population (Fisher's exact test, P = 0.01) (Supplementary Table S1).
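The comparison of carrier proportions above can be illustrated with a minimal sketch of a two-sided Fisher's exact test on a 2x2 table. The function name `fisher_exact_two_sided` is hypothetical, and the control counts in the usage lines are placeholders chosen only for illustration; the actual 1000 G counts are not given in this text, so the resulting P-value should not be read as the study's result.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. cohort, controls); columns are carrier / non-carrier.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k carriers in row 1 with fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )


# Cohort: 4 carriers of 66 (the count implied by 6.1% of 66 patients).
# Controls: 11 carriers of 1000 is a HYPOTHETICAL placeholder, not study data.
p_value = fisher_exact_two_sided(4, 62, 11, 989)
print(f"two-sided P = {p_value:.4f}")
```

The two-sided P-value here sums all tables that are no more probable than the observed one, which is one common convention; other software may define the two-sided test slightly differently.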
Of the 13 mutations predicted to be pathogenic by our pipeline, five met the criteria for pathogenic/likely pathogenic classification recommended by the American College of Medical Genetics (ACMG) guidelines 12 (Table 2, Supplementary Table S2). These include the two frameshift mutations in BRCA2 and FANCC, two ERCC4 nonsense mutations and one TP53 copy number alteration. Additionally, three predicted pathogenic mutations occurred in genes that the ACMG recommends for return as incidental findings to patients 13: one missense mutation in MSH6, one BRCA2 frameshift deletion and one TP53 copy number alteration. These alterations occurred in three patients of differing sarcoma histologies (Table 2). Age at sarcoma diagnosis was ≥45 years in all three patients except the MSH6 mutation carrier, who was diagnosed at 24 years old.
Mutations in DNA damage repair (DDR) genes. Among the 11 predicted pathogenic mutations detected in DDR genes, two were frameshift deletions, two were nonsense and seven were missense mutations (Table 2). Eight DDR genes were affected: ATM, BRCA2, ERCC4, FANCC, FANCE, FANCI, MSH6 and POLE (Fig. 1). Truncating mutations, including frameshift deletions and nonsense mutations, occurred in BRCA2, FANCC and ERCC4. The ERCC4 nonsense mutation (p.Cys723*) was found in two patients, who were diagnosed with giant cell tumor of bone and alveolar rhabdomyosarcoma at 16 and 24 years old, respectively (Table 2). This mutation occurred within the ERCC4 domain (Fig. 2), which is the nuclease catalytic site of ERCC4 14. Some of the remaining DDR mutations also mapped to functionally important protein domains (Fig. 2), for instance the TAN (Tel1/ATM N-terminal) domain of ATM and the catalytic domain of POLE.
Seven patients harbored the 11 predicted pathogenic mutations, giving a germline DDR mutation carrier frequency of 10.6% (95% CI: 4.7-21.2%) within our cohort. The mutations were not observed to be associated with any particular sarcoma histology. However, two patients were found to harbour multiple germline mutations in DDR genes; both were female, one diagnosed with alveolar rhabdomyosarcoma at 24 years old and the other with undifferentiated pleomorphic sarcoma (UPS) at 48 years old (Supplementary Table S3). Interestingly, the former carried four germline mutations, all affecting DDR genes: ATM, ERCC4, FANCI and MSH6. A review of her family history revealed an uncle with nasopharyngeal cancer. Unfortunately, we were not able to reach the patient for more detailed familial information, nor were we able to establish the somatic status of these variants, as her tumor specimen was unavailable. In the latter patient with UPS, sequencing of her tumor showed loss of heterozygosity of the BRCA2 variant, supporting the pathogenic prediction of this variant (Supplementary Figure S4).

Mutations in other known cancer predisposition genes.
Through targeted sequencing, one predicted pathogenic missense mutation was identified in SDHA, a known cancer predisposition gene, in a patient diagnosed with epithelioid sarcoma at 24 years old (Table 2, Fig. 1). The mutation mapped to the fumarate reductase C-terminal domain of SDHA (Fig. 2), a catalytic domain in which germline mutations have been reported to be deleterious in patients presenting with paragangliomas and pheochromocytomas as well as Leigh syndrome [15][16][17]. Additionally, digitalMLPA revealed a gross deletion of TP53 exon 1 in a female patient with leiomyosarcoma diagnosed at 49 years old and no record of familial cancer (Table 2). Validation by quantitative PCR (qPCR) confirmed the heterozygous germline deletion and the loss of heterozygosity at this site in the patient's tumor (Supplementary Figure S4).
Association of mutations with sarcoma histology and family history. The nine patients carrying predicted pathogenic mutations had varying sarcoma histological diagnoses (Table 2). To explore potential associations between affected genes and tumor spectrum, the varying histological subtypes were categorized into three categories (Supplementary Table S5); however, we did not find any clear associations between mutated genes and histological subtypes (Fig. 1). We also assessed potential correlation of the predicted pathogenic germline mutations with family history. Clinical records of these patients were revisited to determine any family history that may have been missed by the treating clinician; however, no evident correlation was observed (Table 2), suggesting that family history should not be the sole inclusion factor when considering genetic predisposition in sarcoma patients.
Figure 1. Predicted pathogenic germline variants found in the sporadic sarcoma cohort of this study. Diagram depicting distribution of the predicted pathogenic germline mutations across 10 cancer-associated genes according to three histological categories based on an arbitrary genetics-driven classification. Each column represents one patient. The mutation type is colour-coded as shown below the diagram.
Variants of uncertain significance. In our analysis, a total of 32 VUS (Fig. 1)

Discussion
To our knowledge, our study is the first to screen for germline cancer gene mutations in Southeast Asian sporadic sarcoma patients. Whereas the ISKS 7 recently indicated that a large proportion of sarcomas may harbor a germline component, our study differs in that our cohort was prospectively recruited solely on the basis of young age at diagnosis (<50 years). Taken together with the ISKS findings, our study independently confirms in an entirely Asian cohort that a substantial fraction of apparently sporadic sarcomas may harbour a germline component. In our cohort of 66 sarcoma patients, 13.6% (95% CI: 6.8-24.8%) had at least one predicted pathogenic germline mutation in the 52 cancer-associated gene panel. Although this is lower than the 55% reported in the ISKS 7, the difference can potentially be explained by a combination of factors. First, the ISKS gene panel is larger than ours (72 vs 52 genes). Second, while family history is not an inclusion criterion for the ISKS, patients with suspicious family histories may be more likely to be referred to the study. This is consistent with 17% of informative families in the ISKS meeting the criteria for recognized cancer syndromes.
For this study, we used a local database of germline variants detected in a healthy matched cohort, which allowed us to remove 16 candidate variants that are probably rare, population-specific polymorphisms not found in databases such as ExAC and 1000 G. This illustrates the importance of having an ancestry-matched cohort of decent size to filter rare polymorphisms, echoing recent findings in which African-American patients had variants misclassified as pathogenic that were subsequently reclassified as benign in light of additional population data 18.
Figure 2. Visualization of protein domains for the genes with predicted pathogenic mutations identified in this study. TP53 with copy number alteration is not visualized.
Approximately half of the predicted pathogenic variants identified in this study are novel, and these mutations mostly affect DDR genes. Interestingly, a truncating mutation in ERCC4 (p.Cys723*) was found in two patients with sarcoma diagnosed under age 25 years (Table 2). Apart from playing a key role in DDR, ERCC4 is also involved in maintaining genomic stability 19. This truncating mutation has been observed in gastric tumors, and was shown to impair DNA repair capacity in CHO-K1 cells 20. Incidentally, the ISKS reported an excess of pathogenic variants in ERCC2. The observation of predicted pathogenic mutations in ERCC4 in our study and in ERCC2 in the ISKS, coupled with the early age of onset in our two patients, suggests a potential role for the nucleotide excision repair (NER) pathway in sarcoma predisposition.
The prevalence of predicted pathogenic DDR gene mutation carriers in our cohort (10.6%) suggests that constitutional defects in this pathway may be associated with sarcoma. This is consistent with the enrichment of pathogenic mutations in DDR-related genes such as ATM and BRCA2 seen in the ISKS 7. Repair of DNA double-strand breaks is highly conserved and crucial for chromosome structure maintenance and genomic stability. From our analysis of TCGA sarcoma data 21 for pathogenic somatic mutations in these eight DDR genes, we observed a 3.4% prevalence, suggesting that these genes may indeed have a role in sarcomagenesis. It is also noteworthy that only one patient in our cohort harboured a germline TP53 deletion, consistent with the relatively lower prevalence of germline 6 versus somatic [21][22][23] TP53 mutations in sarcomas.
The presence of multiple predicted pathogenic DDR gene germline mutations (ERCC4, ATM, FANCI, MSH6) in an early-onset sarcoma in our study suggests that multiple pathogenic mutations may have an additive effect on sarcoma predisposition, a hypothesis supported by the ISKS, in which an earlier age at diagnosis was correlated with the cumulative burden of multiple pathogenic mutations 7. Notably, all the variants found in our two patients with multiple predicted pathogenic germline mutations occurred in DDR genes (Table 2). Both patients have one protein-truncating variant co-occurring with predicted pathogenic single nucleotide variants. It is conceivable that even if the deleterious effect of each mutation is not significant on its own, the collective impact of these co-occurring predicted pathogenic mutations may lead to impaired DNA repair and genomic instability, thereby conferring susceptibility to tumorigenesis. Recent findings showing frequent germline mutations in DNA homologous recombination genes within a metastatic prostate cancer cohort suggest the potential application of targeted therapies, such as PARP1 inhibition and platinum-based chemotherapy 24. The excess of predicted pathogenic DDR gene germline mutations in our sporadic sarcoma cohort suggests that a subset of sarcomas may be candidates for such targeted therapies.
Several limitations were encountered in this study. First, the cohort size is constrained by the rarity of sarcomas. Second, heterogeneous histology and sparse patient family history precluded any associations with genotype. Third, the performance of in silico variant pathogenicity prediction algorithms can be variable, and there remains no consensus on the choice of algorithms for predicting variant pathogenicity 12. Thus, interpretation of disease causality for variants, especially missense variants, remains a challenge despite proposed guidelines 12,25. We sequenced tumors of the patients harbouring the missense variants as a means of assessing pathogenicity, but tumor DNA was not available for most of the patients; hence the missense variants of these patients were interpreted with caution. The only two variants we successfully validated, FANCE (p.Glu448Lys) and FANCI (p.Asp728Gly), did not show loss of heterozygosity; however, structural data have shown that these positions in the two FANC proteins are involved in the important protein-protein interaction with FANCD2 26,27. These genes are members of the Fanconi anemia (FA) pathway, which is known to predispose to FA and other malignancies when impaired 28,29. FANCE has been demonstrated to play a key role in the architecture of the FA core complex by mediating interactions with FANCD2 30,31, whereas FANCI forms a heterodimer with FANCD2 known as the ID complex 26; both are critical for the activity of the FA pathway 28. As the Glu448 residue of FANCE is highly conserved across species and important for FANCD2 binding 27, the Glu448Lys substitution is likely to affect the interaction between FANCE and FANCD2 owing to the change in residue size and charge. The two FANCI variants seen in our cohort, Asp728Gly and Asn580Ser, correspond to residues in the FANCI helical domain 2 that are highly conserved across species 26.
In particular, Asn580 is located in a region concentrated with polar residues shown to interface with FANCD2. While the specific effect of these mutations remains to be functionally confirmed, the potential deleterious consequence for the activity of FANCD2, together with functional studies in the literature demonstrating loss of protein function in the FANC-family genes 32,33, provides some evidence favouring the assignment of pathogenicity to the missense mutations we observed in this study. Importantly, the consistency of our findings with a larger, better-powered study such as the ISKS indicates that our bioinformatics approach can reasonably discover potentially pathogenic germline mutations in our cohort. Despite these limitations, our findings show that a considerable proportion of sporadic sarcomas may have an underlying genetic predisposition.
In summary, our study is the first to investigate and identify an excess of potentially pathogenic germline mutations in a Southeast Asian cohort of young sarcoma patients. Our findings, together with those of the ISKS, show that the prevalence of pathogenic germline mutation carriers in an apparently sporadic sarcoma cohort may be higher than anticipated and that sarcoma has a significant hereditary component. Additionally, the frequent observation of potentially pathogenic germline mutations in the DDR pathway suggests that inherited defects in this pathway may contribute to sarcoma predisposition. Sarcoma patients encountered in the clinic, especially young ones, should therefore be treated as potential carriers of germline pathogenic mutations in cancer predisposition genes regardless of family history and considered for genetic testing. Insights from this study will help direct further efforts to enhance our understanding of genetic predisposition in sarcoma, with potentially significant impact on patient care.

Materials and Methods
Patients. Patients seen at our sarcoma subspecialty clinic at the National Cancer Centre Singapore were prospectively recruited for this study. Sixty-six patients under age 50 years with varying sarcoma subtypes (excluding GIST) were selected for sequencing. Patient clinical data including sarcoma histology and personal and family history of cancer were collated (Table 1). Patient-derived peripheral blood was used to obtain genomic DNA for sequencing. This study was approved by the SingHealth Centralised Institutional Review Board (IRB 2010/426/B) with signed informed consent from all patients. All study procedures were carried out in accordance with the approved guidelines.
Targeted genomic sequencing. A panel of 52 genes associated with cancer predisposition and DNA damage repair was customized using Agilent SureDesign (Agilent, Santa Clara, CA, USA). Purified patient genomic DNA was sheared to 150-200 base pair (bp) fragments for targeted capture of the customized gene panel.
Variant prioritization pipeline. Sequenced reads were aligned to the human reference genome (hs37d5) as detailed in Supplementary Methods. Missense variants and micro-indels were identified, then filtered by read depth and quality score. Variants overlapping target regions were retained for analysis. To prioritize candidate germline variants, filtered variants were annotated and common polymorphisms were removed by excluding variants present in >1% of the East Asian or South Asian populations in the Exome Aggregation Consortium (ExAC) and 1000 Genomes (1000 G) databases 34,35. Variants found in an in-house database of common polymorphisms in our local population (n = 454) were also excluded, and the remaining variants were filtered to retain only splice-site and non-synonymous exonic variants. Frameshift, nonsense and splice-site variants were deemed pathogenic. Missense variants were classified as potentially pathogenic, variant of uncertain significance (VUS) or benign using the in silico prediction algorithms SIFT, PolyPhen2 HDIV, Mutation Assessor, FATHMM and CADD. Variants were considered potentially pathogenic if ≥3 algorithms predicted the variant to be damaging, and benign if none considered the variant damaging. Remaining variants were categorized as VUS. Analysed sequencing data were deposited in the European Nucleotide Archive (accession no. PRJEB20843). Variant predictions were checked on InterVar 36 for interpretation based upon the American College of Medical Genetics and Genomics (ACMG) guidelines 12.
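The filtering and classification rules described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the consequence labels and the input format are hypothetical, and each algorithm's call is assumed to be already reduced to a damaging/not-damaging boolean, since the per-algorithm score cut-offs are not stated in this text.

```python
# Consequence labels are hypothetical; annotation tools use their own vocabularies.
TRUNCATING = {"frameshift", "nonsense", "splice_site"}
ALGORITHMS = ("SIFT", "PolyPhen2_HDIV", "MutationAssessor", "FATHMM", "CADD")


def passes_population_filter(max_asian_af, in_local_db):
    """Keep a variant only if it is not a common polymorphism.

    max_asian_af: highest allele frequency across the East Asian and South
    Asian populations of ExAC and 1000 Genomes (None if never observed).
    in_local_db: True if the variant is in the in-house polymorphism database.
    """
    if in_local_db:
        return False
    return max_asian_af is None or max_asian_af <= 0.01


def classify_variant(consequence, damaging_calls=None):
    """Apply the stated rules: truncating and splice-site variants are deemed
    pathogenic; missense variants are scored by how many of the five
    algorithms call them damaging (>=3 potentially pathogenic, 0 benign,
    otherwise VUS)."""
    if consequence in TRUNCATING:
        return "pathogenic"
    if consequence != "missense":
        raise ValueError(f"unsupported consequence: {consequence}")
    calls = damaging_calls or {}
    n_damaging = sum(bool(calls.get(name, False)) for name in ALGORITHMS)
    if n_damaging >= 3:
        return "potentially pathogenic"
    if n_damaging == 0:
        return "benign"
    return "VUS"


# Illustrative input only, not a variant from the study
example_calls = {"SIFT": True, "PolyPhen2_HDIV": True, "CADD": True}
print(classify_variant("missense", example_calls))
```

Treating a missing algorithm prediction as "not damaging" is a simplifying assumption of this sketch; a real pipeline would need an explicit policy for variants that some tools cannot score.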

Validation of variants.
Candidate variants were validated by Sanger sequencing using BigDye Terminator v3.1 (ABI, ThermoFisher Scientific Corporation). Resulting chromatograms were analyzed using Mutation Surveyor (Softgenetics, PA, USA). Copy number variants detected through digitalMLPA were validated by quantitative PCR (qPCR). Cycle threshold (Ct) values were normalized to the GAPDH endogenous control, and fold-change in gene dosage was calculated using the ΔΔCt method by normalizing against a pool of three healthy controls. For validation of the somatic status of candidate variants, Sanger sequencing was performed on tumor DNA extracted from fresh frozen or formalin-fixed paraffin-embedded tumors using QIAamp DNA mini (Qiagen, 51304) or QIAamp FFPE tissue (Qiagen, 56404) kits.
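As a worked illustration of the ΔΔCt calculation, the sketch below computes a gene dosage fold-change from Ct values. It assumes roughly 100% amplification efficiency (so fold-change is 2 raised to the power of -ΔΔCt) and represents the control pool by the mean ΔCt of the controls; both are assumptions of this sketch, the function name is hypothetical, and the Ct values are invented for illustration rather than taken from the study.

```python
from statistics import mean


def gene_dosage_fold_change(ct_target, ct_reference, control_pairs):
    """Fold-change in gene dosage by the delta-delta-Ct method.

    ct_target, ct_reference: patient Ct for the target region and for the
    endogenous control (GAPDH in the text above).
    control_pairs: (ct_target, ct_reference) for each healthy control.
    """
    # Step 1: normalize the target to the endogenous control
    delta_ct_patient = ct_target - ct_reference
    delta_ct_controls = mean(t - r for t, r in control_pairs)
    # Step 2: normalize the patient to the healthy controls
    delta_delta_ct = delta_ct_patient - delta_ct_controls
    # Step 3: convert to fold-change, assuming doubling per cycle
    return 2.0 ** (-delta_delta_ct)


# Invented Ct values: the patient target amplifies one cycle later than in controls
controls = [(25.0, 20.0), (25.5, 20.5), (24.5, 19.5)]
fold = gene_dosage_fold_change(26.0, 20.0, controls)
print(fold)  # 0.5, the dosage expected for a heterozygous deletion
```

A fold-change near 1.0 would indicate two copies, whereas a value near 0.5 is what a heterozygous deletion is expected to produce under these assumptions.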
Statistical analyses. Patient characteristics and sequencing results were tabulated with descriptive statistics including medians, means and standard deviations; proportions are reported with 95% confidence intervals (CI). Proportions were compared using Fisher's exact test. All P-values are two-tailed.
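The method used for the 95% confidence intervals is not stated in this text. As a sketch under that caveat, the Wilson score interval with continuity correction appears to reproduce the reported intervals for 9 of 66 (6.8-24.8%) and 7 of 66 (4.7-21.2%); the choice of method and the function name `wilson_cc_interval` are assumptions, not a description of the authors' analysis.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc_interval(successes, n, confidence=0.95):
    """Wilson score interval with continuity correction for a proportion.

    Intended for 0 < successes < n; bounds are clipped to [0, 1].
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    z2 = z * z
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z2)
    lower_root = sqrt(z2 - 2 - 1 / n + 4 * p * (n * q + 1))
    upper_root = sqrt(z2 + 2 - 1 / n + 4 * p * (n * q - 1))
    lower = (2 * n * p + z2 - 1 - z * lower_root) / denom
    upper = (2 * n * p + z2 + 1 + z * upper_root) / denom
    return max(0.0, lower), min(1.0, upper)


# Carrier counts from the text: 9 of 66 overall, 7 of 66 for DDR genes
for carriers in (9, 7):
    lo, hi = wilson_cc_interval(carriers, 66)
    print(f"{carriers}/66: {100 * lo:.1f}-{100 * hi:.1f}%")
```

Other common choices, such as the Clopper-Pearson exact interval or the uncorrected Wilson interval, give slightly different bounds for the same counts, which is why the method is worth stating alongside the interval.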