Unmasking Intra-Tumoral Heterogeneity and Clonal Evolution in NF1-MPNST

Sarcomas are highly aggressive cancers that have a high propensity for metastasis, fail to respond to conventional therapies, and carry a poor 5-year survival rate. This is particularly true for patients with neurofibromatosis type 1 (NF1), in which 8%–13% of affected individuals will develop a malignant peripheral nerve sheath tumor (MPNST). Despite continued research, no effective therapies have emerged from recent clinical trials based on preclinical work. One explanation for these failures could be the lack of attention to intra-tumoral heterogeneity. Prior studies have relied on a single sample from these tumors, which may not be representative of all subclones present within the tumor. In the current study, samples were taken from three distinct areas within a single tumor from a patient with an NF1-MPNST. Whole exome sequencing, RNA sequencing, and copy number analysis were performed on each sample. A blood sample was obtained as a germline DNA control. Distinct mutational signatures were identified in different areas of the tumor as well as significant differences in gene expression among the spatially distinct areas, leading to an understanding of the clonal evolution within this patient. These data suggest that multi-regional sampling may be important for driver gene identification and biomarker development in the future.


Introduction
Malignant peripheral nerve sheath tumor (MPNSTs) is the sixth most common soft tissue sarcoma [1] and has an incidence rate of 0.1-0.2 per 100,000 persons per year [2]. MPNSTs are often associated with neurofibromatosis type 1 (NF1). The incidence rate of MPNSTs in patients with NF1 is much higher than that of the general population, estimated to be 1.6 per 1000 per year, or a lifetime risk of 8-13% [3]. Approximately 50% of MPNSTs occur in patients with neurofibromatosis [4][5][6][7], and the other 50% of MPNSTs occur sporadically or in the setting of previous radiation therapy [4,6].

Histology
Images of the hematoxylin-eosin sections were taken (20X magnification) using an Olympus BX-51 microscope using an Olympus DP71 digital camera, and DP Controller software. Tumor purity was estimated based on morphologic review of the entire hematoxylin-eosin stained section estimating the number of tumor cells, stromal cells, lymphocytes, and extravasated red blood cells. Two pathologists reviewed these slides independently providing an estimated percentage of total tumors cells per slide.

Sequencing and Bioinformatics Analysis
Whole exome sequencing (WES), RNA sequencing (RNA-Seq), and copy number analysis (CNVkit) [26] were performed on each sample and compared to a blood sample as a germline DNA control. Both Illumina Whole Genome Sequencing (eWGS) of 3 tumor samples and 1 PBMC normal sample, and Illumina RNA Sequencing of the 3 tumor samples were generated from the sampled areas.

Library Construction and Sequencing
Each tumor had 2 enriched libraries constructed (n = 6), and the PBMCs had a single enriched library constructed (n = 1). Exome libraries were captured with an IDT exome reagent, then pooled with a WGS library for sequencing on an Illumina HiSeq4000 with at least 1000x coverage. RNA was prepared with a TrueSeq stranded total RNA library kit, then sequenced on an Illumina HISeq4000 with 72M reads per sample.

Histology
Images of the hematoxylin-eosin sections were taken (20X magnification) using an Olympus BX-51 microscope using an Olympus DP71 digital camera, and DP Controller software. Tumor purity was estimated based on morphologic review of the entire hematoxylin-eosin stained section estimating the number of tumor cells, stromal cells, lymphocytes, and extravasated red blood cells. Two pathologists reviewed these slides independently providing an estimated percentage of total tumors cells per slide.

Sequencing and Bioinformatics Analysis
Whole exome sequencing (WES), RNA sequencing (RNA-Seq), and copy number analysis (CNVkit) [26] were performed on each sample and compared to a blood sample as a germline DNA control. Both Illumina Whole Genome Sequencing (eWGS) of 3 tumor samples and 1 PBMC normal sample, and Illumina RNA Sequencing of the 3 tumor samples were generated from the sampled areas.

Library Construction and Sequencing
Each tumor had 2 enriched libraries constructed (n = 6), and the PBMCs had a single enriched library constructed (n = 1). Exome libraries were captured with an IDT exome reagent, then pooled with a WGS library for sequencing on an Illumina HiSeq4000 with at least 1000x coverage. RNA was prepared with a TrueSeq stranded total RNA library kit, then sequenced on an Illumina HISeq4000 with 72M reads per sample.
Manual review was used to remove additional sequencing artifacts. Germline variants and somatic variants reported on variant detecting pipeline were compared to see any intersection of variants. Any intersecting variants were removed from the somatic variant gene list, thus filtering out the germline variants. Common variants with 1000 genome MAF (minor allele frequency) > 0.05 were filtered out. Waterfall somatic variant plots were created with GenVisR [34] by including somatic variants that occurred in each area. Variants reported on the waterfall plot are most likely to be pathogenic, which is reported via VEP. These variants were not reported as a somatic variant in COSMIC (Catalogue Of Somatic Mutations In Cancer) [35] and ClinVar [36] archive, thus these variants are best classified as variants with unknown significance. In order to predict clinical significance and predictions of the functional effects of these variants, each variant was reviewed on SIFT [37] and Polyphen [38]. IMPACT rating was determined by VEP for each non-coding variant.

Copy Number Analysis
CNVkit was used to infer and visualize copy number from high-throughput DNA sequencing data. Coverage for each bait position in the exome reagent was calculated, then segments of constant copy number were identified using circular binary segmentation. Data were plotted to provide visualization of CNVs.

Inference of Clonal Phylogeny
SciClone [39] and ClonEvol [40] were utilized to attempt to perform a phylogeny inference. However, the analysis was complicated by the abundance of copy number-altered regions in these tumors, and these standard algorithms were unable to automatically perform that inference. Manual review of the shared and private single nucleotide variants and large copy number altered areas, though, revealed only one possible phylogeny for this tumor.

RNA Sequence Preprocessing
RNA-Sequence (RNA-seq) was trimmed from 3 -end with a minimum quality Phred score of 20 and aligned against hg38-Ensembl Transcripts release 99 via BWA-MEM. Pre/post quality control and full expectation-maximization (EM) quantification were run via Partek ® Flow ® [41]. Gene counts and transcript counts were normalized by CPM (counts per million) by using edgeR [42] package. Heatmap visualizations were created using gplots [43] R package (Warnes, G.R. Seattle, WA, USA).

Gene Differential Expression Analysis
The gene-specific analysis (GSA) method was used to test for differential expression of genes or transcript between sample regions in Partek ® Flow ® [44]. Differential expressed genes were defined as the following statistic parameters: p-value <= 0.05; FDR step up <= 0.05; Fold Change < −2 or >2. From differentially expressed genes, a GO enrichment test was used to functionally profile this set of genes, to determine which GO terms appear more frequently than would be expected by chance when examining the set of terms annotated to the input genes, each associated with a p-value.

Pathway Analysis
A list of genes in copy number aberrant (CNA) regions was extracted. CNA regions were defined as copy number regions greater than 3 or copy number regions less than 1. For each area, we intersected the list of genes that are located in the CNA regions with the differentially expressed gene list reported in the RNA differential expression analysis (p-value <= 0.05). PantherDB [45] was utilized to discover GO terms and pathways that may be affected by these genes.

Patient Information
Patient characteristics can be seen in Table 1. The patient was a male with a history significant for a clinical diagnosis of neurofibromatosis type 1-patient had a plexiform neurofibroma, spinal neurofibromas, café au lait macules, and multiple first-degree relatives with neurofibromatosis type 1-and was 40 years old at the time of diagnosis of MPNST. He presented with a large tumor located in the left neck. Resection showed a high-grade malignant peripheral nerve sheath tumor, 10.2 cm in the largest dimension, with negative margins. The patient did not receive any adjuvant therapy for his MPNST following initial resection due to poor performance status. He recurred 21 months after the initial diagnosis and ultimately died secondary to complications from metastatic disease (33 months after initial diagnosis). Samples were taken in three different locations within the primary tumor immediately following the inititial resection for the purpose of this study.

Histology of Biopsy Sites
We first reviewed the H&E images of the tumor to correlate histology to the gross images of the tumor. H&E stained sections in Figure 2 show representative images of the three sampled areas. Area #1 demonstrates tissue of a spindle cell neoplasm of neural differentiation arranged in fascicles with elongated hyperchromatic nuclei and a mild to moderate amount of cytoplasm. The tumor purity of this sample was >95%. Area #2 shows spindled cells in a background of hemorrhage, a finding commonly seen in these high-grade tumors with a tumor purity of >95%. Area #3 represents an area of necrosis, another characteristic finding for MPNST. This sample showed >95% tumor purity.

Patient Information
Patient characteristics can be seen in Table 1. The patient was a male with a history significant for a clinical diagnosis of neurofibromatosis type 1--patient had a plexiform neurofibroma, spinal neurofibromas, café au lait macules, and multiple first-degree relatives with neurofibromatosis type 1--and was 40 years old at the time of diagnosis of MPNST. He presented with a large tumor located in the left neck. Resection showed a high-grade malignant peripheral nerve sheath tumor, 10.2 cm in the largest dimension, with negative margins. The patient did not receive any adjuvant therapy for his MPNST following initial resection due to poor performance status. He recurred 21 months after the initial diagnosis and ultimately died secondary to complications from metastatic disease (33 months after initial diagnosis). Samples were taken in three different locations within the primary tumor immediately following the inititial resection for the purpose of this study.

Histology of Biopsy Sites
We first reviewed the H&E images of the tumor to correlate histology to the gross images of the tumor. H&E stained sections in Figure 2 show representative images of the three sampled areas. Area #1 demonstrates tissue of a spindle cell neoplasm of neural differentiation arranged in fascicles with elongated hyperchromatic nuclei and a mild to moderate amount of cytoplasm. The tumor purity of this sample was >95%. Area #2 shows spindled cells in a background of hemorrhage, a finding commonly seen in these high-grade tumors with a tumor purity of >95%. Area #3 represents an area of necrosis, another characteristic finding for MPNST. This sample showed >95% tumor purity.

Whole Exome Sequencing (WES), RNA Sequencing (RNA-Seq), and Copy Number Analysis
We first interrogated the sequencing data to identify the germline NF1 variant within this tumor. Figure 3 shows a lollipop plot identifying the patient's likely NF1 germline variant based on exclusion of any variants with minor allele frequency >0.05 in the 1000 genomes database. Next, to investigate

Whole Exome Sequencing (WES), RNA Sequencing (RNA-Seq), and Copy Number Analysis
We first interrogated the sequencing data to identify the germline NF1 variant within this tumor. Figure 3 shows a lollipop plot identifying the patient's likely NF1 germline variant based on exclusion of any variants with minor allele frequency >0.05 in the 1000 genomes database. Next, to Genes 2020, 11, 499 6 of 20 investigate intra-tumoral heterogeneity within the sample, RNA sequencing of the three sample sites was performed and is shown in Figure 4.
Genes 2020, 11, 499 6 of 18 intra-tumoral heterogeneity within the sample, RNA sequencing of the three sample sites was performed and is shown in Figure 4.  Distinct gene expression profiles were observed in each of the areas sampled. The top 16 differentially expressed genes are listed in Table 2 and include a number of genes involved in transcription and translation. We next performed a copy number analysis of the three biopsy sites to determine whether or not different copy number alterations were observed in each area ( Figure 5). Distinct copy number signatures can be appreciated in each of the three samples further illustrating intra-tumoral heterogeneity. Additionally, we evaluated the single nucleotide variants found in each of the samples. This broad overview of all somatic variants is depicted in the waterfall plot in Figure  6. Again, distinct somatic variants can be appreciated across different areas. We next explored the potential significance of these variants through further bioinformatics analysis. While the biological significance of each of these variants is uncertain, there is evidence that some of these variants may play a role in the pathogenesis. For each variant in a coding region, CBioPortal [47] was queried for each gene to determine if the somatic variant was in a functional domain. Additionally, the RNAseq data was queried to determine if the variant in a specific area of the tumor influenced the gene expression of that gene in a specific area. Finally, SIFT and Polyphen were used to predict pathogenicity. Table 3a,b list the somatic variants in the coding region that may play a role in the Location of NF1 germline variant. One intronic germline variant, NC_000017.11:g.31296270C>T (rs11080149). was identified and is depicted in this figure.
Genes 2020, 11, 499 6 of 18 intra-tumoral heterogeneity within the sample, RNA sequencing of the three sample sites was performed and is shown in Figure 4.  Distinct gene expression profiles were observed in each of the areas sampled. The top 16 differentially expressed genes are listed in Table 2 and include a number of genes involved in transcription and translation. We next performed a copy number analysis of the three biopsy sites to determine whether or not different copy number alterations were observed in each area ( Figure 5). Distinct copy number signatures can be appreciated in each of the three samples further illustrating intra-tumoral heterogeneity. Additionally, we evaluated the single nucleotide variants found in each of the samples. This broad overview of all somatic variants is depicted in the waterfall plot in Figure  6. Again, distinct somatic variants can be appreciated across different areas. We next explored the potential significance of these variants through further bioinformatics analysis. While the biological significance of each of these variants is uncertain, there is evidence that some of these variants may play a role in the pathogenesis. For each variant in a coding region, CBioPortal [47] was queried for each gene to determine if the somatic variant was in a functional domain. Additionally, the RNAseq data was queried to determine if the variant in a specific area of the tumor influenced the gene expression of that gene in a specific area. Finally, SIFT and Polyphen were used to predict pathogenicity. Table 3a,b list the somatic variants in the coding region that may play a role in the  Table 2 and include a number of genes involved in transcription and translation. We next performed a copy number analysis of the three biopsy sites to determine whether or not different copy number alterations were observed in each area ( Figure 5). Distinct copy number signatures can be appreciated in each of the three samples further illustrating intra-tumoral heterogeneity. Additionally, we evaluated the single nucleotide variants found in each of the samples. This broad overview of all somatic variants is depicted in the waterfall plot in Figure 6. Again, distinct somatic variants can be appreciated across different areas. We next explored the potential significance of these variants through further bioinformatics analysis. While the biological significance of each of these variants is uncertain, there is evidence that some of these variants may play a role in the pathogenesis. For each variant in a coding region, CBioPortal [47] was queried for each gene to determine if the somatic variant was in a functional domain. Additionally, the RNAseq data was queried to determine if the variant in a specific area of the tumor influenced the gene expression of that gene in a specific area. Finally, SIFT and Polyphen were used to predict pathogenicity. Table 3a,b list the somatic variants in the coding region that may play a role in the pathogenesis of this tumor based on the above criteria. For those mutaions in non-coding regions, the Ensembl Variant Effect Predictor [33] was used to determine whether or not the variant would be predicted to affect gene expression. All of the identified variants were classified as modifiers, indicating that pathogenicity prediction is difficult, thus the effects of these variants are unclear. (Table 3c). Further details of the somatic variants can be found in Supplemental Table S1. Next, a gene ontology analysis was performed. To do this, a list of genes in copy number aberrant (CNA) regions was extracted. For each area, the list of genes located in the CNA regions intersected with the differentially expressed gene list reported in the RNA differential expression analysis, and PantherDB [45] was utilized to identify pathways that may be affected by these genes. Table 4 displays the unique genes in each area with copy number aberrations and alterations in gene expression. Genes depicted in Area 1 have been reported in the literature to serve a myriad of functions in tumorigenesis, including base excision repair, nucleotide excision repair, and alternative splicing [48][49][50][51][52][53][54][55]. Those in Area 2 are involved in several different pathways, including transcriptional regulation in addition to ribosomal and proteasomal function [56][57][58][59][60]. Finally, the genes in Area 3 consist of several ribosomal subunits and small nucleolar RNAs, suggesting that both translation and transcription are uniquely affected compared to other areas [61][62][63]. This analysis suggests that there may be different functional programs at play across the three areas. Next, we manually reviewed the data to look for changes in other known drivers of MPNST including TP53, ATRX, EED, SUZ12, and CDKN2A. There were no copy number changes or somatic mutions in any of these genes. Finally, we performed a careful manual review of all of the shared and unique somatic variants and copy number alterations in each area in order to develop a predicted clonal evolution. Figure 7 depicts the predicted phylogenetic tree of the subclones from each area, representing the likely clonal evolution of the tumor.          Table 4. Differentially Expressed Gene Pathway Analysis. These genes were located in copy number aberrant regions defined as copy number more than 3 or lower 1 and also demonstrated differential expression by RNA seq. Different pathways are implicated in the distinct sections.

Discussion
Despite advances in our understanding of the pathobiology of MPNST and the identification of seemingly promising therapeutic targets using a single model system in preclinical studies, no investigational agents have demonstrated efficacy following translation to human clinical trials. One element that has largely been ignored in the study of MPNST has been the possible existence of intratumoral heterogeneity. No single study in MPNST has focused on intra-tumoral heterogeneity. However, spatial intra-tumoral heterogeneity has become an area of interest in the study of other solid malignancies to begin to understand clonal evolution [91][92][93][94][95]. Within the NF1 field, researchers are beginning to appreciate the importance of understanding spatial and temporal heterogeneity.

Discussion
Despite advances in our understanding of the pathobiology of MPNST and the identification of seemingly promising therapeutic targets using a single model system in preclinical studies, no investigational agents have demonstrated efficacy following translation to human clinical trials. One element that has largely been ignored in the study of MPNST has been the possible existence of intra-tumoral heterogeneity. No single study in MPNST has focused on intra-tumoral heterogeneity. However, spatial intra-tumoral heterogeneity has become an area of interest in the study of other solid malignancies to begin to understand clonal evolution [91][92][93][94][95]. Within the NF1 field, researchers are beginning to appreciate the importance of understanding spatial and temporal heterogeneity. For example, Peacock et al. performed a genomic analysis of serial samples from one patient who developed an MPNST. Samples were taken at four timepoints (benign plexiform neurofibroma, MPNST pre-treatment, MPNST post-treatment, and MPNST at time of metastasis) [96]. They observed early hemizygous microdeletions in NF1 and TP53 with progressive amplifications of MET, HGF, and EGFR, highlighting the potential role of these pathways in progression. Additionally, Carrió et al. have started to examine intra-tumoral heterogeneity in PNF (plexiform neurofibromas), ANF (atypical neurofibroma) and ANNUBP (atypical neurofibromatous neoplasms with uncertain biological potential), the precursors to MPNST. They performed SNP-array analysis and exome sequencing on multiple biopsies of eight PNF, of which some had areas consistent with ANF or ANNUBP. Their data suggested that loss of a single copy of CDKN2A/B in NF1 null cells is sufficient to start ANF development and that total inactivation of both copies is necessary to form ANNUBP [97]. Our study represents the first look at spatial intra-tumoral heterogeneity within an MPNST. We have demonstrated differing mutational profiles, copy number alteration signatures, and gene expression profiles within the three areas sampled. The differing mutation profile includes a variety of single nucleotide variants, including missense, frameshift, and synonymous variants. The role of synonymous variants in the tumorigenesis of MPNST is uncertain. However, there is increasing evidence that synonymous variants can alter gene expression and protein function and thus cannot be simply disregarded [98][99][100][101]. Additionally, several of the genes in Table 3a,b have previously been implicated in cancer [102][103][104][105][106][107][108][109][110][111][112][113][114][115]. For example, in Area 2, CSK was found to have a frameshift variant in its functional domain. CSK encodes a C-terminal Src kinase that has previously been found to act as a tumor suppressor in both breast cancer and prostate cancer [112][113][114]. Interestingly, in the context of breast cancer, Smith et al. showed that C-terminal Src kinase loss facilitated tumorigenesis by altering expression of the PRC2 complex subunits, EZH2 and SUZ12 [113]. Based on these data, it is possible that alterations in CSK could be another way in which the PRC2 complex is affected in MPNST. Another gene, CCL16, is involved in chemotaxis of human monocytes and lymphocytes. This chemokine was shown to delay mammary tumor growth and reduce rates of metastasis in mouse models [115], raising the possibility of decreased immune surveillance of our patient's MPNST secondary to a non-functional CCL16. In addition to the differences in single nucleotide variants, there were differences in copy number alterations across the three areas with Area 2 showing the most distinct signature in terms of copy number gains and losses. The degree to which each somatic variant, differentially expressed gene, and copy number aberration contributes to the biologic heterogeneity of the tumor remains uncertain. However, future work in our lab will be geared at elucidating this information. Finally, there was a distinct difference in gene expression among the three areas with gene ontology studies pointing toward differences in translation and protein targeting.
Taken together, these data point toward the existence of intra-tumoral heterogeneity and suggest that further investigation into this phenomenon is warranted. Additionally, these data suggest that there should be some caution taken in interpreting sequencing that comes from a single biopsy site. The advent of single cell sequencing has allowed for more rigorous evaluation of intra-tumoral heterogeneity in other cancers including acute leukemias [116,117], as well as in some solid malignancies [118,119]. Future work will be geared at using this data as the foundation to better understand clonal heterogeneity along with single cell sequencing to comprehensively evaluate intra-tumoral heterogeneity and clonal evolution of MPNST.

Conclusions
Significant intra-tumoral heterogeneity exists and may be a barrier to our ability to improve outcomes in patients with NF1-MPNST. These data suggest that multi-regional sampling may be necessary to understand clonal evolution, and for driver gene identification and biomarker development in the future.