Proteomic profiling of soft tissue sarcomas with SWATH mass spectrometry

Soft tissue sarcomas (STS) are a group of rare and heterogeneous cancers. While large-scale genomic and epigenomic profiling of STS have been undertaken, proteomic analysis has thus far been limited. Here we utilise sequential window acquisition of all theoretical fragment ion spectra mass spectrometry (SWATH-MS) for proteomic profiling of formalin fixed paraffin embedded (FFPE) specimens from a cohort of STS patients (n=36) across four histological subtypes (leiomyosarcoma, synovial sarcoma, undifferentiated pleomorphic sarcoma and dedifferentiated liposarcoma). We quantified 2951 proteins across all cases and show that there is a significant enrichment of gene sets associated with smooth muscle contraction in leiomyosarcoma, RNA splicing regulation in synovial sarcoma and leukocyte activation in undifferentiated pleomorphic sarcoma. We further identified a subgroup of STS cases (independent of histological subtype) that have a distinct expression profile in a panel of 133 proteins, with worse survival outcomes when compared to the rest of the cohort. Our study highlights the value of comprehensive proteomic characterisation as a means to identify histotype-specific STS profiles that describe key biological pathways of clinical and therapeutic relevance; as well as for discovering new prognostic biomarkers in this group of rare and difficult-to-treat diseases.


Abstract
Soft tissue sarcomas (STS) are a group of rare and heterogeneous cancers. While largescale genomic and epigenomic profiling of STS have been undertaken, proteomic analysis has thus far been limited. Here we utilise sequential window acquisition of all theoretical fragment ion spectra mass spectrometry (SWATH-MS) for proteomic profiling of formalin fixed paraffin embedded (FFPE) specimens from a cohort of STS patients (n=36) across four histological subtypes (leiomyosarcoma, synovial sarcoma, undifferentiated pleomorphic sarcoma and dedifferentiated liposarcoma). We quantified 2951 proteins across all cases and show that there is a significant enrichment of gene sets associated with smooth muscle contraction in leiomyosarcoma, RNA splicing regulation in synovial sarcoma and leukocyte activation in undifferentiated pleomorphic sarcoma. We further identified a subgroup of STS cases (independent of histological subtype) that have a distinct expression profile in a panel of 133 proteins, with worse survival outcomes when compared to the rest of the cohort. Our study highlights the value of comprehensive proteomic characterisation as a means to identify histotype-specific STS profiles that describe key biological pathways of clinical and therapeutic relevance; as well as for discovering new prognostic biomarkers in this group of rare and difficult-to-treat diseases.

Introduction
Soft tissue sarcomas (STS) are a group of rare and heterogeneous malignancies of mesenchymal origin, comprising more than 100 distinct diagnostic subtypes that are primarily defined by histological characteristics (1). Multiple gene expression-based studies across different histological subtypes have led to a deeper understanding of the oncogenic processes driving STS development and progression as well as the identification of prognostic factors for these cancers (2)(3)(4)(5). Whilst current diagnostic evaluation of STS is reliant on histological assessment by specialist sarcoma histopathologists, supplemented by specific molecular tests in selected subtypes, recent large-scale genomic and epigenetic analyses have demonstrated the feasibility of refining the classification system of STS based on intrinsic underlying biology (3,4). Collectively, these studies highlight the promise of molecular profiling strategies in improving our knowledge of drivers of sarcoma pathogenesis, providing complementary information to aid in the molecular classification of these heterogeneous tumours, identify new therapeutic targets and develop clinically relevant prognostic biomarkers.
While the advent of next generation sequencing (NGS) has accelerated the use of genomic and epigenetic profiling in STS, proteomic analysis in this disease has been limited (6).
Comprehensive analysis of the tumour proteome is highly informative as proteins represent the largest class of druggable targets and directly reflect the functional state of biological pathways (7) . Proteomic analysis of STS has been performed by The Cancer Genome Atlas (TCGA) consortium in a cohort of 173 flash-frozen sarcoma specimens across six histological subtypes (3). However, this study utilised the reverse phase protein array (RPPA) platform which is limited to the analysis of 192 proteins/phosphoproteins. Other studies that utilise mass spectrometry (MS) have largely been limited to 1-2 subtypes including a recent proteomic analysis of 13 gastrointestinal stromal tumours (GIST) which utilised mass spectrometry (MS) to identify three clinical subgroups (6,8). To date no MSbased analysis has been conducted across multiple histological subtypes in STS.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06. 11.20128355 doi: medRxiv preprint In this study, we utilise sequential window acquisition of all theoretical fragment ion spectra (SWATH)-MS to undertake proteomic profiling of FFPE tissue specimens in a cohort of STS patients across four histological subtypes (n=36). We characterise the intrinsic biological pathways associated with specific histotypes and explore the association of proteomic data with patient outcome. This study represents, to our knowledge, the most comprehensive analysis of the proteome in multiple STS subtypes to date and highlights the potential of proteomic profiles as predictors for patient outcome in these rare cancers.

Patient characteristics
The cohort is comprised of FFPE tumour material from 36 patients treated at The Royal Marsden Hospital. These specimens were obtained from surgical resections of primary tumours from four of the more common STS subtypes: leiomyosarcoma (LMS) (n=12), synovial sarcoma (SS) (n=7), undifferentiated pleomorphic sarcoma (UPS) (n=10) and dedifferentiated liposarcoma (DDLPS) (n=7). Baseline clinico-pathological characteristics of these patients are summarised in Table 1.

Quantitative analysis of the STS proteome
STS specimens were subjected to protein extraction and SWATH-MS analysis in technical duplicates as outlined in Figure 1. This analysis led to the identification and quantification of 2951 proteins across all samples ( Figure 2A and Table S1). Applying t-Distributed Stochastic Neighbour Embedding (t-SNE) on the full dataset, four major groups can be visualised on the 3D-tSNE plot ( Figure 2B), which correspond to the four distinct histological subtypes.
These results demonstrate that SWATH-MS profiling can reveal intrinsic biology associated with distinct histological subtypes which are characterised by unique proteomic signatures.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06.11.20128355 doi: medRxiv preprint

Defining biological processes that are enriched in STS histological subtypes
To gain an understanding of the underlying biological pathways in each histological subtype, we undertook gene set enrichment analysis (GSEA) of the full proteomic dataset (9). The top 20 ranked enriched biological processes are shown in Figure 3. Gene sets significantly enriched in LMS versus the rest of the cohort comprise of those involved in muscle development and contraction. This finding is consistent with the smooth muscle lineage of this histological subtype which has also been reported in published gene expression studies of LMS (2,10). The utility of SWATH-MS data to rediscover known molecular processes highlights the validity of this approach for pathway discovery. In the case of SS, GSEA identifies RNA splicing and processing to be key gene sets significantly enriched in this histological subtype. This result is in line with a previous study which showed that the SS18-SSX fusion in synovial sarcomas binds to the ribonucleoprotein SYT-interacting protein/coactivator activator (SIP/CoAA) which is a key modulator of RNA splicing (11). Meanwhile, gene sets significantly enriched in UPS finds a number of biological processes linked to leukocyte activation and metal ion transport. Notably, Phase II STS clinical trials of immune checkpoint inhibitors have reported evidence of clinical activity in UPS (12,13), underscoring the value of SWATH-MS in identifying biologically meaningful processes with potential clinical utility. No ontologies were found to be significantly enriched in DDLPS.

Identification of key protein complexes and signalling networks operating in STS histotypes
Using the multiclass Significance Analysis of Microarray (SAM) method, 277 proteins (FDR <0.1%) were found to be significantly differentially expressed across these four histotypes ( Figure S1 and Table S2). Of these proteins, 103 were found to be significantly upregulated only in LMS, and 23 seed proteins were significantly mapped to directly interact with each other to form a zero-order network according to the STRING interactome database ( Figure   4A). These 23 seed proteins formed a subnetwork including the core machinery required for the regulation of smooth muscle contractile activity such as the myosin light chains (MYL1, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 14, 2020. In SS, 103 proteins were found to be uniquely significantly upregulated; 28 of which were identified as seed proteins that significantly mapped to directly interact with each other to form a zero-order network. Twenty-four of these proteins are key components of mRNA splicing regulation. Cross-referencing these proteins to the SpliceosomeDB of spliceosome components (16) showed that the proteins enriched in SS comprise of the SR protein class (SRSF1, SRSF3, SRSF7, SRSF9, SRSF11) involved in constitutive and alternative pre-mRNA splicing, the heterogeneous ribonucleoprotein particle (hnRNPs) class and proteins involved in the formation of the spliceosome A Complex (HNRNPA1, U2AF2, SNRNP70) ( Figure 4B). Twenty-nine proteins were found to be upregulated only in UPS, and 3 seed proteins directly interact with each other, all components of the MHC class 1 complex (HLA-A, HLA-B and B2M). In DDLPS, 13 proteins were uniquely upregulated with no zero-order network identified. Notably, CDK4 is one of these 13 upregulated proteins which is in agreement with the molecular pathology of this disease where CDK4 is amplified in ~90% of DDLPS; and CDK4/6 inhibitors (palbociclib, ribociclib and abemaciclib) are currently being evaluated in the treatment of this histological subtype (17)(18)(19).
Identification of proteomic profiles that are predictive of STS patient outcome . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06. 11.20128355 doi: medRxiv preprint In order to evaluate if there are proteins within our dataset that are associated with overall survival (OS) in our cohort of patients, univariable Cox regression analysis was performed for each of the 2951 proteins as exploratory analyses, and we selected a total of 133 proteins with p<0.05 ( Figure 5A and Table S3). Subsequently, by hierarchical clustering of the 36 cases based on these 133 protein expression values, three subgroups of mixed histological subtypes were identified ( Figure 5B). These three groups were associated with significantly differential OS, with Group 2 comprising cases demonstrating the worst survival estimate ( Figure 5C, Log-rank p<0.00001). Adjusting for other clinicopathological factors including age, tumour size, grade and histological subtypes, the molecular subgroups remained an independent prognostic factor. After adjustment for other prognostic factors, patients in Group 2 were 81 times more likely to die (Hazard ratio 81; 95% Confidence Interval 6-1085; p=0.0009) compared to patients in Group 1. No statistically significant biological pathways were identified to be enriched in the 133 proteins based on overrepresentation analysis (20).

Discussion
In this study we have undertaken the most comprehensive proteomic analysis of multiple STS subtypes to date and have demonstrated the utility of SWATH-MS in the following applications: 1. Defining unique proteomics signatures associated with distinct histological subtypes, 2. Identifying, within histological subtypes, biological processes and key protein networks and 3. Defining histotype-independent candidate biomarkers for predicting patient outcomes. Expanded proteomic analysis incorporating larger STS cohorts with an increased number of histological subtypes will enable further refinement of this approach as a means of gaining new insights into the biology and molecular drivers of this rare group of cancers.
In particular, the enrichment of an immune response signature in UPS, where immune checkpoint inhibitors have showed some clinical activity (12,13), suggests that future inclusion of proteomic profiling in cancer immunotherapy trials may lead to new predictive biomarkers that complement currently available approaches (21).
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. In conclusion, we have employed SWATH MS to profile FFPE tissue specimens across four sarcoma subtypes and have identified histotype-specific proteomic profiles that describe key . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06.11.20128355 doi: medRxiv preprint biological pathways as well as discovered a new candidate prognostic biomarker protein panel in STS. We anticipate the future application of this strategy to additional STS subtypes and clinical trial cohorts has the potential to deliver new therapeutic targets and define predictive and prognostic biomarkers in these rare and difficult-to-treat diseases. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06.11.20128355 doi: medRxiv preprint 8.8, 0.50% (w/v) sodium deoxycholate, 0.35% (w/v) sodium lauryl sulphate) was added at a ratio of 200ul/mg of dry tissue. The sample was homogenised using a LabGen700 blender (ColeParmer) with 3x 30s pulses and sonicated on ice for 10 min, then heated at 95⁰C for 1 h to reverse formalin crosslinks. Lysis was carried out by shaking at 750rpm at 80⁰C for 2 h. The sample was then centrifuged for 15min at 4°C at 14,000rpm and the supernatant collected. Protein concentration in the homogenate was measured by bicinchoninic acid (BCA) assay (Pierce) The extracted protein sample was digested using the Filter-Aided Sample Preparation (FASP) protocol as previously described in (27). Briefly, each sample was placed into an Amicon-Ultra 4 (Merck) centrifugal filter unit and detergents were removed by several washes with 8M urea. The concentrated sample was then transferred to Amicon-Ultra 0.5 (Merck) filters to be reduced with 10mM dithiothreitol (DTT) and alkylated with 55mM iodoacetamide (IAA). The sample was washed with 100mM ammonium bicarbonate (ABC) and digested with trypsin overnight (Promega, trypsin to starting protein ratio 1:100 µg). Peptides were collected by two successive centrifugations with 100mM ABC and desalted on C18 SepPak columns (Waters). The desalted peptide samples were then dried in a SpeedVac concentrator and stored at -80°C.

SWATH-MS data acquisition and processing
Samples were resuspended in a buffer of 2% ACN/ 0.1% formic acid, spiked with iRT calibration mix (Biognosys AG) and analysed on an Agilent 1260 HPLC system (Agilent Technologies) coupled to a TripleTOF 5600+ mass spectrometer with NanoSource III (AB SCIEX). is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 14, 2020. .

1
isolation windows with a fixed size of 13 Da across the mass range of m/z 380-1100 with 1 Da overlap. MS/MS scans were acquired in the mass range of m/z 100-1500. Maximum filling time for MS scans was 250 ms and for MS/MS scans 100 ms, resulting in a cycle time of 3.1 s. SWATH spectra were analysed using Spectronaut 11 (Biognosys AG) against a published human library (28). FDR was restricted to 1% and only the top 6 peptides were used for quantification of a protein. Peak area of 2 to 6 fragment ions was used for peptide quantification, and the mean value of the peptides was used to quantify proteins. 2 peptides were set as minimum requirement for inclusion of a protein in the analysis.

Data processing and statistical methods
Data were log2 transformed, quantile normalised at sample level, followed by feature level (protein) centering across the samples to remove technical bias such as batch effect. The proteomics data was visualised using 3D-t-Distributed Stochastic Neighbour Embedding. To assess similarity of protein expression profiles, pairwise Pearson correlations between all samples were visualized using the 'ComplexHeatmap' R package (29). Dendrograms were created using complete linkage clustering and the Euclidean distances between the resulting correlations. Differential expression analysis was performed using Significant Analysis of Microarray upon a false discovery rate (FDR) less than 0.1% (30). Gene Set Enrichment Analysis (GSEA) version 4.0.1 was used to identify gene sets from the Hallmark database (v7.0, 'c5.bp.v7.0.symbols.gmt') that were significantly enriched between each subtype and rest of the groups respectively. Protein-Protein Interaction (PPI) network analysis and visualisation was performed using NetworkAnalyst 3.0 pipeline against the STRING interactome database, with settings to confidence score >900 and experimental evidence required (31)(32)(33). Univariable survival analysis of overall survival was estimated by Kaplan-Meier curve and Log-rank test for significance. Event was defined as death. Time was defined as surgery to death or last follow-up. Univariable and multivariable Cox proportional hazards analyses were used to assess the prognostic significance of the different variables.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Availability of data and materials
The MS proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (34) partner repository with the dataset identifier PXD019719

Competing interests
The authors declare that they have no competing interests.

Acknowledgements
We thank Chris Wilding, Nafia Guljar and Jess Burns for assistance in the curation of clinical data and specimens.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06.11.20128355 doi: medRxiv preprint     . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06.11.20128355 doi: medRxiv preprint   Table S2. LMS is leiomyosarcoma, SS is synovial sarcoma, UPS is undifferentiated pleomorphic sarcoma and DDLPS is dedifferentiated liposarcoma.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2020. . https://doi.org/10.1101/2020.06.11.20128355 doi: medRxiv preprint