Histo-molecular differentiation of renal cancer subtypes by imaging mass spectrometry and rapid proteome profiling

Summary
Background: Pathology assessment and differentiation of renal cancer types are challenging because benign and malignant tumors share overlapping histological features, so high-level expertise is required. Mass spectrometry (MS) is an emerging tool for tumor classification of clinical tissue sections, either by spatial histo-molecular imaging or by quantitative microproteomics profiling.
Results: We applied MALDI MS imaging (MSI) and LC-MS/MS-based microproteomics technologies to analyze and classify renal oncocytoma (RO, n=11), clear cell renal cell carcinoma (ccRCC, n=12) and chromophobe renal cell carcinoma (CRCC, n=5). Both methods distinguished ccRCC, RO and CRCC with high accuracy in cross-validation experiments (MSI: 93-95%, LC-MS/MS: 100%).
Significance: This integrated strategy, combining MSI and rapid proteome profiling by LC-MS/MS, reveals molecular features of tumor sections and enables cancer subtype classification. Mass spectrometry is a promising complement to current pathology technologies for precise, digitized diagnosis of disease.


Introduction
Kidney cancer (renal cell carcinoma, RCC) accounts for 2.2% of all diagnosed cancers and is the 13th most common cause of cancer death worldwide 1 . Clear cell renal cell carcinoma (ccRCC) constitutes 70% of all kidney cancers 2 and exhibits the highest rate of metastasis among renal carcinomas. Two other common but less aggressive subtypes are chromophobe renal cell carcinoma (CRCC) and the essentially benign renal oncocytoma (RO), which account for 5% and 3-7% of all cases, respectively 3,4 . The ability to distinguish the malignant ccRCC and CRCC from the benign RO is crucial for the patient in terms of prognosis, disease progression and intervention strategies, which can be as severe as total nephrectomy.
Histopathological kidney cancer diagnostics faces many challenges in daily routine. Typically, test panels combining different chemical and immunohistochemical staining methods are used to systematically reach a diagnosis 5 . Overlapping histological features can make it difficult to differentiate tumor types, and analysis, interpretation and diagnosis/prognosis rely heavily on visual inspection and the experience of the clinical pathologists involved. Complementary techniques such as MRI and electron microscopy involve costly instrumentation. Moreover, specific antibodies for staining can be expensive or unavailable for certain molecular targets. Mass spectrometry is emerging as a promising new tool in translational research, from molecular imaging of tissue sections to deep protein profiling of tissue samples 6 . The digital data readout provided by high mass accuracy mass spectrometry and the feasibility of molecular quantification make it a very attractive technology for investigating human diseases in translational research and for diagnostic and prognostic purposes in the clinic. Improvements in mass spectrometry instrument performance and computational analysis have paved the way for applications in clinical microbiology 7 and clinical genetic analysis 8 . Because mass spectrometry can be applied to a variety of biomolecules such as peptides, lipids and nucleic acids, it is extremely versatile and greatly expands the translational and diagnostic possibilities [8][9][10][11] .
Molecular imaging of tissue sections by MALDI mass spectrometry was introduced more than 20 years ago 12,13 and has since been applied in translational research and clinical applications to study injuries and diseases, or to distinguish between cancer types such as pancreatic ductal adenocarcinoma or epithelial ovarian cancer histotypes [14][15][16][17][18] .
Mass spectrometry-based proteomics relies on advanced LC-ESI-MS/MS technology, in which peptide mixtures are separated by liquid chromatography (LC) prior to analysis by electrospray ionization tandem mass spectrometry (ESI-MS/MS), followed by protein identification via protein database searching 19,20 . Current LC-MS/MS strategies enable comprehensive quantitative protein profiling of tissues and body fluids 21,22 . Although LC-MS/MS has been used to identify potential biomarkers, new candidate cancer targets and molecular signaling networks, its relatively long LC gradients (hours) and extensive sample preparation protocols make it difficult to apply in a routine clinical setting. Modern mass spectrometers are steadily increasing in sensitivity and scanning speed 23 . In addition, improved chromatographic systems that integrate rapid solid phase extraction with reproducible separations are emerging [24][25][26][27] , enabling fast (minutes) and sensitive (nanogram) analysis of complex biological samples.
We hypothesized that combining imaging MALDI MS, in situ protein digestion and LC-MS/MS for detailed characterization of 5 µm FFPE cancer tissue sections would provide both spatial molecular maps and sufficiently deep proteome profiles to characterize and classify tumor subtypes. We tested this on a series of malignant and benign renal tumors, including clear cell renal cell carcinoma (ccRCC), chromophobe renal cell carcinoma (CRCC) and renal oncocytoma (RO). We obtained molecular images at a spatial resolution of 150 µm × 150 µm, which sufficed to resolve features that distinguish the tumor subtypes. Miniaturized sample preparation by in situ protein digestion was used to recover peptides from distinct areas of the tumor sections for rapid proteome profiling by LC-MS/MS.

Formalin fixed paraffin embedded samples:
Patient samples were collected at Odense University Hospital, Denmark. All samples were obtained with patient consent. Formalin-fixed paraffin-embedded (FFPE) tissues from 11 RO patients, 12 ccRCC patients and 5 CRCC patients were used for LC-MS/MS analysis (because fewer CRCC patients were available, two consecutive sections were used from each of 2 patients, giving a total of 7 CRCC sections). Of this cohort, 9 RO, 9 ccRCC and 5 CRCC samples were used for imaging mass spectrometry analysis.

Tissue preparation:
FFPE blocks were cut into 5 µm thick sections and mounted onto indium tin oxide (ITO)-coated glass slides (for MSI) or regular microscopy glass slides (for LC-MS/MS). Before deparaffination, slides were placed on a heated block at 65 °C for 1 hour to improve adhesion.

Deparaffination
FFPE section slides were incubated in xylene, first for 10 min and then for another 5 min, using fresh solution each time. Slides were briefly dipped into 96% EtOH before being washed for 2 min in a mixture of chloroform/ethanol/acetic acid (3:6:1, v/v/v). The slides were then washed in 96% EtOH, 70% EtOH, 50% EtOH and water for 30 s each.

Antigen retrieval
Tissue slides were heated in 10 mM citric acid buffer (pH 6) for 10 min in a microwave oven at 400 W (just below the boiling point) and then incubated for a further 60 min at 98 °C. Slides were cooled to room temperature and incubated for 5 min in 25 mM ammonium bicarbonate (ABC) buffer. Slides were allowed to dry before application of trypsin.

Tryptic digest
For MALDI imaging, 20 µg of trypsin (Promega) was used per slide. It was dissolved at a concentration of 100 ng/µL in 25 mM ABC/10% ACN and deposited on the tissue using the iMatrixSpray device equipped with a heating bed (Tardo GmbH, Subingen, Switzerland 28 ) with the following settings: sprayer height = 70 mm, speed = 70 mm/s, density = 1 µL/cm², line distance = 1 mm, gas pressure = 2.5 bar, heat bed temperature = 25 °C. After trypsin deposition, the slides were incubated overnight at 37 °C in a humid chamber containing 10 mM ABC/50% MeOH.
For on-tissue digestion intended for LC-MS/MS proteome profiling, droplets of 2 µL trypsin solution (50 ng/µL in 25 mM ABC/10% ACN, 0.02% SDS) were deposited using a gel loading pipette tip. Droplets were placed on 3-4 different tumor areas of each FFPE tissue section and briefly allowed to dry to prevent spreading across the tissue. Slides were transferred to a humid chamber (10 mM ABC/50% MeOH) for overnight digestion at 37 °C. After digestion, each spot was extracted twice with 2 µL of 0.1% FA and twice with 1.5 µL of 30% ACN.
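The reagent amounts implied by the two digestion protocols follow directly from the stated concentrations and volumes; the worked arithmetic below adds no assumptions beyond the numbers given above.

```latex
\begin{align*}
V_{\text{trypsin, MSI}} &= \frac{20\,\mu\text{g}}{100\,\text{ng}/\mu\text{L}}
  = \frac{20\,000\,\text{ng}}{100\,\text{ng}/\mu\text{L}} = 200\,\mu\text{L per slide} \\
m_{\text{trypsin, spot}} &= 2\,\mu\text{L} \times 50\,\text{ng}/\mu\text{L} = 100\,\text{ng per spot} \\
m_{\text{trypsin, section}} &= (3\text{--}4) \times 100\,\text{ng} = 300\text{--}400\,\text{ng per section} \\
V_{\text{extract, spot}} &= 2 \times 2\,\mu\text{L} + 2 \times 1.5\,\mu\text{L} = 7\,\mu\text{L per spot}
\end{align*}
```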

Matrix application
Matrix solution was freshly prepared from recrystallized α-cyano-4-hydroxycinnamic acid (CHCA; 10 mg/mL in 50% acetonitrile, 1% TFA) and sprayed using the iMatrixSpray (Tardo, Switzerland) with the heat bed set to 25 °C, a sprayer distance of 70 mm and a spray speed of 100 mm/s. Matrix was sprayed in 3 rounds: 8 cycles at 0.5 µL/cm² with a line distance of 1 mm, 8 cycles at 1 µL/cm² with a line distance of 1 mm, and 8 cycles at 1 µL/cm² with a line distance of 2 mm.
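The three spray rounds can be summed into a nominal matrix load. The sketch below is only arithmetic under an assumption the protocol does not state: that each cycle deposits the configured density (µL/cm²), with no correction for the wider line distance in the last round. The constant and function names are illustrative.

```python
"""Nominal CHCA load implied by the three iMatrixSpray rounds (sketch).

ASSUMPTION: each cycle deposits the configured density (uL/cm^2);
line distance is recorded but not used to correct coverage.
"""

CHCA_MG_PER_ML = 10.0  # 10 mg/mL CHCA in 50% ACN, 1% TFA

# (cycles, density in uL/cm^2, line distance in mm) for rounds 1-3
SPRAY_ROUNDS = [
    (8, 0.5, 1.0),
    (8, 1.0, 1.0),
    (8, 1.0, 2.0),
]


def nominal_volume_ul_per_cm2(rounds):
    """Sum cycles x density over all rounds (per-cycle model assumed)."""
    return sum(cycles * density for cycles, density, _line_mm in rounds)


def matrix_load_ug_per_cm2(volume_ul_per_cm2, conc_mg_per_ml):
    """1 mg/mL equals 1 ug/uL, so volume x concentration gives ug/cm^2."""
    return volume_ul_per_cm2 * conc_mg_per_ml


if __name__ == "__main__":
    volume = nominal_volume_ul_per_cm2(SPRAY_ROUNDS)
    load = matrix_load_ug_per_cm2(volume, CHCA_MG_PER_ML)
    print(f"nominal volume: {volume} uL/cm^2, CHCA load: {load} ug/cm^2")
```

Under this assumption the rounds deposit 4 + 8 + 8 = 20 µL/cm², or about 200 µg CHCA per cm²; if the instrument instead scales deposition with line distance, the third round would contribute less.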

MALDI MS Imaging data acquisition
Optical images of the tissue were obtained before matrix application using a flatbed scanner.

LC-MS/MS analysis
LC-MS/MS data was acquired on an Orbitrap Q Exactive HF-X (Thermo, Bremen) coupled to an UltiMate 3000 capillary-flow LC system. Peptide samples were loaded at 150 µL/min (2% ACN, 0.05% TFA) for 30 s onto a 5 µm, 0.3 × 5 mm Acclaim PepMap trapping cartridge (Thermo Scientific). Samples were then eluted onto a pulled-emitter analytical column (75 µm ID, 15 cm). The analytical column was "flash-packed" 29 with C18 ReproSil-Pur resin (3 µm) and connected by nanoViper fittings and a reducing metal union (Valco, Houston, TX). The flow rate of the 15 min gradient was 1.2 µL/min, with solvent A: 0.1% formic acid (FA) and solvent B: 0.1% FA in 80% ACN. Gradient conditions for solvent B were as follows: 8% to 25% in 10 min, then 25% to 45% in 1.7 min. The trapping cartridge and the analytical column were washed for 1 min at 99% B before returning to initial conditions, and the column was then equilibrated for 2 min.
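The gradient above is a piecewise-linear program. The sketch encodes it as breakpoints and interpolates %B at any time point, which also shows that the steps add up to roughly the stated 15 min (10 + 1.7 + 1 + 2 = 14.7 min, excluding the 30 s loading step). Ramp times to and from the 99% B wash are not given in the text, so they are modelled as instantaneous steps; that choice and all names here are assumptions.

```python
from bisect import bisect_right

# Solvent B program as (time in min, %B); %B is interpolated linearly between
# consecutive breakpoints. ASSUMPTION: the switch to the 99% B wash and the
# return to 8% B are modelled as instantaneous steps (no ramp times given).
GRADIENT = [
    (0.0, 8.0),
    (10.0, 25.0),   # 8% -> 25% B in 10 min
    (11.7, 45.0),   # 25% -> 45% B in 1.7 min
    (11.7, 99.0),   # step to wash
    (12.7, 99.0),   # 1 min wash at 99% B
    (12.7, 8.0),    # back to initial conditions
    (14.7, 8.0),    # 2 min equilibration
]
ACN_FRACTION_IN_B = 0.80  # solvent B: 0.1% FA in 80% ACN


def percent_b(t_min, program=GRADIENT):
    """Return %B at t_min (minutes after gradient start)."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min)  # first breakpoint strictly later than t_min
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_acn(t_min, program=GRADIENT):
    """Effective acetonitrile percentage delivered to the column."""
    return percent_b(t_min, program) * ACN_FRACTION_IN_B


if __name__ == "__main__":
    for t in (0.0, 5.0, 10.0, 11.0, 12.0, 13.0):
        print(f"t={t:4.1f} min  B={percent_b(t):5.1f}%  ACN={percent_acn(t):5.1f}%")
```

Because solvent B contains 80% ACN, the 45% B endpoint of the separating gradient corresponds to about 36% acetonitrile.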

Data Processing of MALDI MS imaging data
The data was baseline-subtracted, TIC-normalized and statistically recalibrated, and then exported to imzML format 30 using the export function of the FlexImaging software (Bruker). The exported mass range was m/z 600-3000, binned into 9600 data points. The imzML files were imported into R (version 3.4.1) and further processed and analyzed using the R MSI package Cardinal (version 2.0.3) 31 . To extract tumor-tissue pixels, each sample was preprocessed as follows: a peak list was generated by peak picking in every 10th spectrum, followed by peak alignment. All data were then resampled using the "height" option, with the previously created peak list as the spectrum reference.
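The preprocessing steps above (TIC normalization, binning, a reference peak list built from every 10th spectrum, and height-based resampling onto that list) can be illustrated in plain Python. This is a simplified stand-in written for illustration, not the Cardinal or FlexImaging implementation: the local-maximum peak picker, the tolerance-based alignment and the windowed-maximum "height" summary are assumptions. It also makes explicit that 9600 points over m/z 600-3000 correspond to a 0.25 Da bin width, the same step used later for resampling.

```python
def tic_normalize(intensities):
    """Divide each intensity by the total ion current (sum of the spectrum)."""
    tic = sum(intensities)
    return [v / tic for v in intensities] if tic else list(intensities)


def bin_width(mz_min, mz_max, n_points):
    """Width of each bin when a mass range is split into n_points bins."""
    return (mz_max - mz_min) / n_points


def pick_peaks(mz, intensities, min_intensity):
    """Toy peak picker: local maxima at or above min_intensity."""
    return [
        mz[i]
        for i in range(1, len(mz) - 1)
        if intensities[i] >= min_intensity
        and intensities[i] > intensities[i - 1]
        and intensities[i] >= intensities[i + 1]
    ]


def align_peaks(peak_lists, tolerance):
    """Merge sorted peaks lying within `tolerance` of the previous peak into
    one group; return the mean m/z of each group as the reference list."""
    groups = []
    for m in sorted(p for peaks in peak_lists for p in peaks):
        if groups and m - groups[-1][-1] <= tolerance:
            groups[-1].append(m)
        else:
            groups.append([m])
    return [sum(g) / len(g) for g in groups]


def resample_height(mz, intensities, reference, tolerance):
    """For each reference m/z, keep the highest intensity within +/- tolerance
    (0.0 if no data point falls in the window)."""
    out = []
    for ref in reference:
        window = [v for m, v in zip(mz, intensities) if abs(m - ref) <= tolerance]
        out.append(max(window) if window else 0.0)
    return out


def build_reference(mz, spectra, every=10, min_intensity=0.0, tolerance=0.5):
    """Reference peak list from every `every`-th spectrum (shared m/z axis)."""
    sampled = spectra[::every]
    return align_peaks([pick_peaks(mz, s, min_intensity) for s in sampled], tolerance)
```

Every spectrum would then be passed through `resample_height` with the same reference list, so that all pixels share one feature axis before clustering.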
PCA scores were plotted using the car package (version 3.0.6). Samples were clustered using spatial shrunken centroid clustering 32 . Subsequently, the pixel coordinates of clusters containing tumor areas (identified by comparison with the HE stain, Supplementary Figure S1) were extracted and, if necessary, manually trimmed so that the result files predominantly contained data from tumor areas. These coordinates were then used to extract the corresponding pixels from the unprocessed imzML file. The pixels of each tumor were assigned a diagnosis factor (ccRCC, RO or CRCC), which was later used as the y-argument in the cross-validation. All extracted imaging files were further restricted to the mass range m/z 700-2500, resampled with a step size of 0.25 Da and combined into one file for further processing. Classification and cross-validation were performed using partial least squares discriminant analysis (PLS-DA) 33 . The number of PLS components was optimized (Supplementary Figure S4), giving 30 components for the ccRCC/RO comparison and 12 components for the RO/CRCC comparison.
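Because neighbouring pixels from one section are strongly correlated, the cross-validation described here holds out whole samples rather than individual pixels. The sketch below illustrates that scheme for one pairwise comparison (e.g. ccRCC vs. RO): a two-class PLS-DA implemented as PLS1 regression (NIPALS) on a 0/1 diagnosis label, classified with a 0.5 threshold and evaluated by leave-one-sample-out with pixel-level accuracy. It is a minimal, dependency-free illustration on hypothetical toy data, not the Cardinal implementation used in the study; the function names, threshold and data are assumptions.

```python
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_components):
    """Two-class PLS-DA as PLS1 regression on a 0/1 label (NIPALS deflation)."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xr = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yr = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(d)]
        norm = _dot(w, w) ** 0.5
        if norm == 0:
            break  # label fully explained; no further components
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xr]  # scores
        tt = _dot(t, t)
        p = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(d)]  # X loadings
        q = _dot(yr, t) / tt  # y loading
        Xr = [[Xr[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def predict_pls1(model, x):
    """Predict the continuous PLS response for one pixel spectrum."""
    x_mean, y_mean, components = model
    xr = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(xr, w)
        xr = [a - t * b for a, b in zip(xr, p)]
        y_hat += q * t
    return y_hat


def leave_one_sample_out(pixels, labels, sample_ids, n_components=2):
    """Train on all other samples, predict every pixel of the held-out sample
    (class 1 if response >= 0.5); return the overall pixel accuracy."""
    correct = 0
    for held_out in sorted(set(sample_ids)):
        train = [i for i, s in enumerate(sample_ids) if s != held_out]
        model = fit_pls1([pixels[i] for i in train], [labels[i] for i in train], n_components)
        for i, s in enumerate(sample_ids):
            if s == held_out:
                predicted = 1 if predict_pls1(model, pixels[i]) >= 0.5 else 0
                correct += predicted == labels[i]
    return correct / len(labels)


def make_toy_data(seed=0):
    """Hypothetical data: 6 samples x 5 pixels x 4 m/z bins, two classes."""
    rng = random.Random(seed)
    centers = {0: [1.0, 0.0, 1.0, 0.0], 1: [0.0, 1.0, 0.0, 1.0]}
    pixels, labels, sample_ids = [], [], []
    for sample in range(6):
        label = sample % 2
        for _ in range(5):
            pixels.append([c + rng.gauss(0.0, 0.1) for c in centers[label]])
            labels.append(label)
            sample_ids.append(sample)
    return pixels, labels, sample_ids


if __name__ == "__main__":
    pixels, labels, samples = make_toy_data()
    print("pixel accuracy:", leave_one_sample_out(pixels, labels, samples))
```

The number of components could be chosen, for example, by repeating this loop over a range of `n_components` values and keeping the best-performing one; holding out whole samples keeps the correlated pixels of the test section out of training.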

LC-MS/MS data processing
The MaxQuant 34 software package (version 1.5.7.0) was used for protein identification and label-free protein quantitation. LC-MS/MS data was searched against the Swiss-Prot human proteome database using standard settings, with "match between runs" enabled.

Results:
In this study, we investigated the utility of mass spectrometry-based methods for histo-molecular profiling applications in clinical renal cancer pathology. We analyzed thin tissue/tumor sections from three different renal cancer types (ccRCC, RO, CRCC) by imaging MALDI MS and by an optimized rapid LC-MS/MS workflow adjusted to the demands of clinical settings.

Imaging MALDI mass spectrometry
All samples were prepared as 5 µm thin FFPE tissue/tumor sections. Nine ccRCC samples were compared with nine RO samples, and five CRCC samples with five RO samples. From a pathology viewpoint, RO and CRCC are more difficult to distinguish than ccRCC and CRCC. As the sample holder for the imaging experiments holds only 2 slides at a time, the two pairwise comparisons (ccRCC vs. RO and RO vs. CRCC) were performed separately.
The entire FFPE tissue section was analyzed by histo-molecular imaging MALDI MS and data subsequently processed by unsupervised clustering (spatial shrunken centroid clustering 32 ).
The clustering results (Figure 1A and 1B) illustrate the heterogeneity of the tissue sections, which contain various tissue types such as stroma, fibrotic, fatty or healthy tissue. Furthermore, comparing the tumor area in the HE-stained microscopy image with the mass spectrometry-based clustering reveals spectral differences even within the tumor tissue itself (Figure 1A and 1C).
Guided by the unsupervised clustering outcome and the corresponding image obtained by HE-staining, pixels from non-relevant surrounding tissue were discarded and only pixel clusters containing actual tumor tissue were used for subsequent comparative analyses.
Variance and similarities within the image sample set were estimated by principal component analysis (PCA). We compared scores from the imaging MALDI MS data from ccRCC and RO. Next, we assessed the ability of the MSI data to distinguish and classify renal cancer subtypes. We created a classifier based on partial least squares discriminant analysis (PLS-DA) and tested it by cross-validation using the whole sample set. In this approach, a classifier is trained with imaging data from all samples except the one being tested for the given condition; this is repeated for every tumor sample so that the complete dataset is tested. The optimized PLS-DA model predicted the cancer subtype with 94% accuracy (based on all pixels of the sample test set). These differences and similarities were also revealed by PCA of the microproteomics dataset: RO and CRCC separate clearly from ccRCC (Figure 5B). RO and CRCC data points are located close together, indicating that differences between the RO and CRCC subtypes are less dominant. When considering principal components that explain less variance (PC3 and PC4), separation of RO and CRCC sample data can be observed (Figure 5B).
We observed eight CRCC samples that separated clearly from the other CRCC samples in both hierarchical clustering (Figure 5A) and PCA (Figure 5B). The protein expression profile of these 8 samples showed similarities to both CRCC and ccRCC. Interestingly, these data originated from a single patient's tumor. Further pathology analysis revealed that these samples were sarcomatoid renal cancer originating from CRCC and thus indeed different from the other CRCC samples.

Protein differences in cancer subtypes
Hierarchical clustering revealed major differences in relative protein abundance between the three cancer types (Figure 5A). We investigated the nature of these histo-molecular differences by examining how these proteins relate to cellular structures, functions or biochemical processes. Protein groups that exhibited distinctive abundances for the respective cancer type (

Classification
Unsupervised data analysis demonstrated the presence of subtype-specific differences in the tumor protein profiles. Next, we investigated the feasibility of tumor classification by using the microproteomics data to train a prediction algorithm. We implemented the tumor classification model using a support vector machine (SVM) and chose the k-fold cross-validation strategy 40 ("n-fold" in Perseus). Here, the data are randomly distributed into k groups; the model is trained with data from k-1 groups and used to predict the samples in the remaining group, and this is repeated k times. Low k-values tend to overestimate error rates. In our study, 3-5 extraction spots (samples) were derived from each patient, so high k-values could underestimate the true error rate, because other spots from the patient being tested are then more likely to remain in the training set. We therefore tested the prediction error rate over several k-values (Figure 7A), applying radial basis function (RBF) and linear kernel functions 41 . The error rates ranged from 0% at the lowest to 3.2% (4 wrong predictions) at the highest (k=3, linear kernel). However, k=3 is a very low k-value (more than 41 samples are excluded from the training set in each round), and we argue that the error rate is most likely overestimated in this case. For more commonly used k-values (k=5-10), the error rate was between 0% and 0.8% (1 wrong prediction) for RBF and between 0% and 1.6% (2 wrong predictions) for the linear kernel. Misclassified samples included samples from one RO patient predicted as ccRCC. Generally, RBF performed slightly better than the linear kernel. Figure 7B shows the cross-validation outcome for RBF and k=6 (around 20 samples per group, equivalent to 4 patients excluded from the training set). Each sample was scored for the three tested conditions (ccRCC, RO, CRCC), and the highest-scoring condition was used to classify it.
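The concern about spots from the same patient can be made concrete by comparing how folds are formed. In the sketch below, spot-level folds (random assignment of individual spots, as described above) typically leave other spots of a test patient in the training set, whereas patient-level folds keep each patient's spots together. Fold sizes and the quoted error percentages are reproduced as a check, taking the 125 analysed samples as the denominator (4/125 = 3.2%, 1/125 = 0.8%, 2/125 = 1.6%). The SVM itself and Perseus's n-fold procedure are not reproduced; the cohort layout and function names are hypothetical.

```python
import random
from collections import defaultdict


def spot_level_folds(spot_ids, k, seed=0):
    """Randomly distribute individual extraction spots over k folds."""
    ids = list(spot_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def patient_level_folds(spot_to_patient, k, seed=0):
    """Distribute whole patients over k folds, keeping each patient's spots together."""
    spots_by_patient = defaultdict(list)
    for spot, patient in spot_to_patient.items():
        spots_by_patient[patient].append(spot)
    patients = sorted(spots_by_patient)
    random.Random(seed).shuffle(patients)
    folds = [[] for _ in range(k)]
    for i, patient in enumerate(patients):
        folds[i % k].extend(spots_by_patient[patient])
    return folds


def shared_patients(folds, spot_to_patient):
    """Count, over all folds, test patients that also have spots in the training set."""
    count = 0
    for i, test in enumerate(folds):
        test_patients = {spot_to_patient[s] for s in test}
        train_patients = {
            spot_to_patient[s] for j, fold in enumerate(folds) if j != i for s in fold
        }
        count += len(test_patients & train_patients)
    return count


def error_rate(n_wrong, n_total):
    return n_wrong / n_total


if __name__ == "__main__":
    # Hypothetical cohort: 28 patients with 4 extraction spots each.
    spot_to_patient = {f"P{p}_spot{s}": f"P{p}" for p in range(28) for s in range(4)}
    spots = sorted(spot_to_patient)
    print("shared, spot-level k=6:", shared_patients(spot_level_folds(spots, 6), spot_to_patient))
    print("shared, patient-level k=6:", shared_patients(patient_level_folds(spot_to_patient, 6), spot_to_patient))
    print("4 of 125 wrong:", error_rate(4, 125))
```

Patient-level folds by construction never share a patient between training and test sets, which is one way to bound the optimism of spot-level cross-validation.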
The results are shown in a radar plot (Figure 7B) and demonstrate 100% accuracy in the prediction of renal cancer subtypes. Interestingly, despite their similarity to ccRCC (Figure 5B), the sarcomatoid patient samples were predicted as CRCC when the classifier could choose only among the three subtypes. This is consistent with their origin as CRCC prior to sarcomatoid transformation.
We initially used all 346 differentially abundant histo-molecular features (proteins) to classify the tumor subtypes. Next, we sought to estimate the minimum number of features sufficient to correctly classify all renal tumor samples (for k=6 and RBF). We used the feature optimization function in the Perseus software to rank the features and test the error rate as the number of features decreased (Figure 7C). The minimal number of the 346 features was found to be 43 (the list of ranked proteins can be found in supplementary material 9). Further reducing the number to 21 features resulted in an error rate of 1.6%, and as few as 6 features led to an error rate of 9.6%. Thus, only a subset of the dataset, i.e. at least 43 features, would suffice to correctly classify all the kidney tumor samples.
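Once a ranking exists, finding the minimal feature count is a simple search over the error curve. The sketch assumes a ranked feature list and a hypothetical `evaluate` callback standing in for the SVM cross-validation; it does not reproduce how Perseus ranks features. The example dictionary simply holds the error rates reported above for 346, 43, 21 and 6 features.

```python
def error_curve(ranked_features, sizes, evaluate):
    """Cross-validated error for the top-n ranked features, for each n in sizes.
    `evaluate` is a hypothetical callback that runs the classifier and CV."""
    return {n: evaluate(ranked_features[:n]) for n in sizes}


def minimal_feature_count(curve, max_error=0.0):
    """Smallest tested n whose error does not exceed max_error (None if none)."""
    passing = [n for n, err in curve.items() if err <= max_error]
    return min(passing) if passing else None


if __name__ == "__main__":
    # Error rates reported in the text for each tested feature count.
    reported = {346: 0.0, 43: 0.0, 21: 0.016, 6: 0.096}
    print(minimal_feature_count(reported))        # smallest error-free size
    print(minimal_feature_count(reported, 0.02))  # allowing up to 2% error
```

Only tested sizes are considered, and an error curve need not be monotonic, so the smallest passing size does not guarantee that every larger size is also error-free; this is one reason to keep the "safety margin" of extra features.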
However, retaining an excess of quantified protein features would provide a "safety margin", ensuring that enough protein features are quantified for robust tumor classification.

Discussion
The increasing incidence of renal cancer in western countries calls for improved technologies for detection, diagnosis, treatment and prognosis. Innovative mass spectrometry-based applications are beginning to address challenges in clinics and the healthcare sector, such as the use of targeted proteomics to characterize noninvasive liquid biopsies 42 or the so-called iKnife, which enables surgeons to identify cancerous tissue in real time 43,44 . Mass spectrometry is becoming increasingly applicable in clinical settings 45,46 . FFPE sections are a valuable source for mass spectrometry-based diagnosis. Because many of the sample preparation steps for MS analysis overlap with those for (immuno)histochemical staining (deparaffination, antigen retrieval), they can be fitted seamlessly into the high-throughput FFPE sample preparation pipelines already existing in many hospitals. Distinguishing numerous cancer types or subtypes and deciding on treatment modalities are daily challenges in hospitals. Approaches that combine "digital" large-scale data acquisition with computer-based machine learning algorithms provide deep molecular insight into the respective disease and valuable information for early detection, diagnosis and prognosis.
Our proof-of-concept study demonstrates the potential and benefits of mass spectrometry techniques for detailed characterization of clinical specimens. Specifically, we demonstrate that mass spectrometry provides valuable results in the diagnosis of different renal cancer subtypes (ccRCC, RO and CRCC). The imaging mass spectrometry (MSI) approach collects spatially resolved spectra without a priori knowledge of the tissue, thereby enabling differentiation between cancerous and noncancerous tissue as well as tumor subtyping.
In our study, MALDI-MSI correctly diagnosed 93% of the tested patients (16 of 18 in the ccRCC/RO comparison and 10 of 10 in the RO/CRCC comparison), distinguishing ccRCC from RO with 93.75% accuracy and RO from CRCC with 95% accuracy. Despite the high accuracy, the classification seemed biased towards RO diagnosis. Our PCA data showed significant patient-to-patient tumor variability for ccRCC, necessitating a detailed histo-molecular profile for robust MSI performance. Including many more patients (n>100) to increase the number of renal cancer tumor samples will likely provide even higher confidence and resolve this issue. Notably, unsupervised clustering identified data inconsistencies and irregularities in the patient cohort: an unexpected feature pattern revealed a sarcomatoid transformation within the CRCC cohort without a priori knowledge (Figure 5A, 5B). This demonstrates that once the "digital" data are acquired, computational and statistical analyses can uncover relevant and important features of the patient datasets. This sensitivity, specificity and versatility will have major implications for future clinical practice, including histo-molecular pathology technologies.
The presented microproteomics method, based on optimized fast chromatographic separation and fast MS/MS sequencing of peptides, identified more than 2100 proteins in thin renal tumor FFPE sections. Using short LC runs of only 15 min, we generated a list of 346 significantly altered proteins (p=0.01). The minimum number of proteins required for 100% accurate tumor classification was 43. This low number of features will enable a targeted proteomics approach aimed at quantifying a select panel of proteins. Using fewer features would allow a further reduction of LC run time and increase overall sample throughput. Using our fast LC-MS/MS setup, we analyzed a total of 125 samples in series without experiencing blockage of the LC columns, glass capillaries or ESI needles. LC systems such as the Evosep system 25 , which are dedicated to clinical applications and designed to be used also by non-experts in LC-MS, could add further robustness to our approach. Furthermore, pipetting robots guided by image pattern recognition may enhance reproducibility and throughput, e.g. using liquid extraction surface analysis (LESA) technology 47,48 . The latter has been successfully applied in the study of traumatic brain injuries 49 and in mouse brain for the identification of proteins and peptides from MSI experiments 50 .
Functional protein analysis using bioinformatics tools revealed molecular networks and biochemical processes consistent with previously known macroscopic, morphological and histological features of the renal cancer subtypes. Cancer-type-specific proteome expression features correlated with morphological characteristics of the respective cancer type. RO and CRCC exhibited upregulation of mitochondria-associated proteins. Indeed, increased numbers of mitochondria are frequently observed in these cancer types by electron microscopy 51 and have been identified in previous proteomics studies 52 . As most cancers rely on glycolysis as their major energy source (Warburg effect), this seems rather unusual. However, these mitochondria are dysfunctional, and it has been speculated that the increase in mitochondrial number is a cellular response to the presence of dysfunctional mitochondria 53 .
In addition to mitochondria-associated proteins, increased levels of intracytoplasmic proteins were detected in CRCC, distinguishing it from the other cancer types. Microscopically, CRCC is distinguished from other renal carcinomas by its pale cytoplasm, which results from large intracytoplasmic vesicles. This is consistent with our detection of increased levels of cytoplasm-associated and vesicle proteins, distinguishing CRCC from the other two renal cancer subtypes.
Clear cell renal cell carcinoma frequently contains zones of hemorrhage, which are most likely responsible for the increased levels of complement and coagulation cascade-associated proteins determined by our microproteomics method. ccRCC is also characterized by hypervascular stroma 3 , which may account for the strong enrichment of extracellular matrix proteins. Finally, enhanced glycolysis is a hallmark of many cancer types, including ccRCC 54 , which correlates well with our detection of upregulated glycolysis-associated proteins by microproteomics.
For classification, we applied PLS-DA to the MSI data and a support vector machine to the LC-MS/MS data. These common classification methods have previously been applied to MSI for the differentiation of papillary and renal cell carcinoma based on lipidomics analysis 55 . With the enormous progress in instrument technology, machine learning 58 and the availability of new databases 59 , mass spectrometry is on its way to becoming a versatile tool in the hospitals of the future.