Emerging Proteomic Technologies Provide Enormous and Underutilized Potential for Brain Cancer Research

High-throughput technologies present immense opportunities to characterize brain cancer biology at a systems level. However, proteomic studies of brain cancers are still relatively scarce. Here we discuss the latest proteomic technologies, their application to profiling and quantitation of brain proteomes, and how we expect these technologies will be applied to study brain cancer proteomes in the future. Mass spectrometry-based proteomics with increased specificity, coverage and throughput will be pervasive in proteomics investigations of the brain. The data generated need to be captured in curated databases, and creative data analysis strategies are needed to provide meaningful insights into brain functions and associated pathologies. Overall, proteomics applications to brain cancers are in the earliest stages, and the expanded use of these technologies holds enormous potential to improve our understanding of brain functions and pathologies.

Technological breakthroughs in proteomics present important new research opportunities, including for brain cancer studies. These advances span areas such as single cell proteomics, targeted proteomics via mass spectrometry (MS) (named Method of the Year 2012 by Nature Methods (1)), profiling of post-translational modifications (PTMs), and large-scale protein screening chips based on modified aptamers and ELISA assays. Multiparameter proteomics approaches have also recently been adopted in the clinic (2). In this brief perspective, we discuss a number of the latest advances in proteomics and their implications for brain cancer research going forward. Some of these technologies have already been applied to brain cancer research and some have not; essentially none has yet been applied extensively. This brief article is not meant to be a comprehensive assessment of the excellent work across this field, but rather to indicate, from our point of view, areas of promise among the latest developments.

Coordinated Analysis of Single Cell Proteomics Data for Deep Biological Insight-Single cell proteomics technologies provide an important capability to study the high degree of cellular heterogeneity found in tumors (3,4) at a higher level of resolution, which is critical to understanding their resilience to therapy and to designing treatment strategies that address the problem of recurrence (5). A microfluidics-based single-cell proteomic chip, the single-cell barcode chip, has demonstrated its usefulness in studying brain tumor cell models (6). The prototype single-cell barcode chip consisted of a two-layer microfluidic network with 120 microchambers of 2 nL volume. The assembly process started with a duplicate set of DNA barcodes (7), which were later converted into an antibody array using DNA-encoded antibody libraries (8) prior to cell loading. Protein assays were originally developed using antibody pairs recognizing 11 signaling molecules, mostly phosphorylated proteins, from the receptor tyrosine kinase (RTK), PI3K and MAPK pathways. Single cells (or a defined small number of cells) were captured by the antibodies, lysed, and then detected by applying biotinylated pairing detection antibodies and fluorophore-labeled streptavidin.
The single-cell barcode chip has been applied to study signal transduction in brain tumor glioblastoma using three isogenic GBM cell lines (U87 and its genetically engineered EGFRvIII and PTEN variants) at the basal level, stimulated by EGF and/or inhibited by erlotinib (6). By assessing the level of relevant pathway proteins under both genetic and molecular perturbation, protein correlation networks were established under the conditions of both ligand stimulation and receptor inhibition and revealed previously unappreciated network interconnectivity in the distinct genetic context, shedding new light on potential therapeutic resistance.
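As a rough illustration of how a protein correlation network can be derived from single-cell measurements, the following minimal Python sketch computes pairwise Pearson correlations across cells and keeps strongly correlated pairs as network edges. The protein names, the intensity values, and the 0.8 threshold are hypothetical placeholders chosen for illustration; they are not data or parameters from the cited study (6), whose actual analysis may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (sd_x * sd_y)


def correlation_network(levels, threshold=0.8):
    """Return {(protein_a, protein_b): r} for pairs with |r| >= threshold.

    `levels` maps each protein to its measured level in every cell,
    with cells listed in the same order for all proteins.
    """
    edges = {}
    for a, b in combinations(sorted(levels), 2):
        r = pearson(levels[a], levels[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical per-cell intensities for three assayed proteins (five cells).
cells = {
    "p-EGFR": [1.0, 2.0, 3.0, 4.0, 5.0],
    "p-ERK": [1.1, 2.1, 2.9, 4.2, 5.0],
    "p-AKT": [5.0, 1.0, 4.0, 2.0, 3.0],
}
network = correlation_network(cells)
for (a, b), r in network.items():
    print(f"{a} -- {b}: r = {r:.3f}")
```

Running the same function on measurements taken under different perturbations (for example, ligand stimulation versus receptor inhibition) and comparing the resulting edge sets is one simple way to see how network connectivity shifts between conditions.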
A single-cell barcode chip can be easily expanded to include 35-40 proteins, depending upon the availability of antibody pairs. The primary constraints on wide adoption of this approach thus far include a need for improved control of the number of cells going into each microchamber, the lack of a sophisticated commercial platform, and variations in handling complex primary tumor specimens in clinical settings. Another major caveat for single cell analysis is that during the process of generating a single cell suspension, the 3-D tumor structure is inevitably destroyed, and thus spatial information that is often critical for understanding cancer pathogenesis is lost. These challenges are all surmountable, however. Combining single cell proteomics with imaging analysis such as in situ hybridization (ISH) of tissue sections, as demonstrated by the Ivy Glioblastoma Atlas Project (http://glioblastoma.alleninstitute.org/), will provide much needed anatomic information and make relevant single cell analysis more feasible.
Single Cell Proteomic Analyses Using CyTOF-Another promising technology for single cell analysis to highlight protein diversity across tumor cells is mass cytometry, or CyTOF (Cytometry by Time-Of-Flight). CyTOF technology marries the high-throughput cell-handling capability of conventional flow cytometry with the analytical capacity of atomic mass spectrometry (9). Its major advantage is that many protein targets (normally in the range of 30-40, as compared with the single-digit capabilities of conventional flow cytometry) can be detected simultaneously in a given cell, without compromising the number of cells being analyzed in a single experiment (10).
The working principle of the CyTOF technology is the isotopic tagging of antibodies with rare metallic elements normally undetectable in cells, coupled with ionization and detection by time-of-flight (TOF) mass spectrometry. Because several dozen isotopically pure metal-tagged antibodies have been generated so far, both cell surface protein markers (for cellular phenotypes) and intracellular signaling proteins (for functional or pathologically perturbed networks) can be designed into a single experiment, allowing stratification of heterogeneous cell populations and concurrent dissection of the underlying functional pathways (e.g. phosphorylation status) for each cell type. Although (to our knowledge) not yet applied to brain cancers, CyTOF technology has been applied in studies of hematopoietic differentiation (11), immune response to surgical trauma (12), T-cell receptor signaling (13), B-cell development (14), and cellular responses to various perturbations (11). This technology has wide applicability and provides a means to start characterizing cell-to-cell variability in coordinated protein concentrations at a larger scale than previously possible.
Using CyTOF, the Nolan group published a time-resolved molecular roadmap of iPSC reprogramming (15) and identified previously unappreciated intermediate cellular states with distinct subsequent differentiation fates, suggesting that such a system can be applied to study cancer progression. Although analyzing solid tumors is more challenging in practice, additional applications of the CyTOF strategy can be envisioned for solid tumor research (including brain cancers), in a way similar to recently reported single cell transcriptomics studies of glioblastoma (16). Limitations to wide adoption of the CyTOF technology are the number of available metal isotopes (which caps the number of proteins that can be measured simultaneously), the current cost of instrumentation and reagents, and the limited availability of high quality antibodies against proteins of interest.
Targeted Mass Spectrometry via Selected Reaction Monitoring (SRM)-Targeted proteomics via mass spectrometry allows for consistent and reproducible measurement of the same set of proteins at high throughput, and thus provides an excellent platform for brain cancer biomarker discovery, sample characterization, cell perturbation studies, and so forth. Using targeted techniques such as selected reaction monitoring (SRM) (17) (also known in the literature as multiple reaction monitoring, MRM), proteins can be measured with high selectivity, precision, and reproducibility. For each peptide ion to be analyzed in SRM, the first quadrupole in the mass spectrometer filters the selected peptide ion, the second quadrupole (collision cell) fragments the ion, and the third quadrupole measures the intensity of specific fragment ions of the selected peptide. For each peptide of interest, an equal amount of the corresponding isotopically labeled (heavy) peptide is added to each sample proteomic mixture (light), and the ratio of light to heavy is used to compare protein levels between two conditions. The SRMAtlas (www.srmatlas.org), a resource developed by Robert Moritz and colleagues, provides peptides and SRM assay parameters for different organisms (including human) that can be downloaded as transition lists for direct deployment. With these resources, performing targeted proteomics for most human proteins is now possible and can be brought to bear on studying brain cancer proteomes. For example, Sangar et al. used SRM proteomics to quantify the effect of EGFR mutations on a prioritized list of invasion-promoting proteins in glioblastoma (GBM) (18). In this study, the authors found qualitative as well as quantitative differences in secreted invasion-promoting proteins as a result of perturbations in the EGFR signaling pathway in GBM cell lines.
This study revealed that the EGFRvIII-containing U87 cell line was under higher oxidative stress than the cell lines without the EGFRvIII variant, and that this pathogenic variant might be contributing to higher levels of invasion promoting proteins in the secretome (18).
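The light-to-heavy comparison used in SRM quantification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: peak areas are summed over the monitored fragment ions (transitions) of one peptide, the same amount of heavy standard is spiked into both samples, and all numbers are made up. The function names are hypothetical, and real workflows typically add steps such as transition quality filtering and replicate handling.

```python
def light_to_heavy_ratio(light_areas, heavy_areas):
    """Ratio of summed light (endogenous) to summed heavy (spiked-in
    standard) peak areas over the monitored transitions of one peptide."""
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("heavy standard signal must be positive")
    return sum(light_areas) / heavy_total


def fold_change(condition_a, condition_b):
    """Relative abundance of a peptide in condition A versus condition B.

    Each argument is a (light_areas, heavy_areas) pair. Because the same
    amount of heavy peptide is assumed in both samples, the heavy signal
    acts as a common reference and cancels out run-to-run variation.
    """
    return light_to_heavy_ratio(*condition_a) / light_to_heavy_ratio(*condition_b)


# Hypothetical peak areas for three transitions of one peptide.
control = ([1000.0, 500.0, 500.0], [2000.0, 1000.0, 1000.0])
mutant = ([3000.0, 1500.0, 1500.0], [2000.0, 1000.0, 1000.0])

fc = fold_change(mutant, control)
print(f"Peptide abundance, mutant relative to control: {fc:.2f}-fold")
```

Normalizing each sample to its own heavy standard before comparing conditions is what lets the ratio tolerate differences in injection amount or instrument response between runs.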
Other targeted proteomics studies in brain cancers using SRM are scarce to date; however, SRM has recently been applied to other brain-related conditions such as autism (19) and brain damage (20). These studies highlight the utility of SRM in quantifying proteins (1) in complex proteomes extracted from surface membranes, brain tissues, plasma and urine, as well as in relatively simple proteomes such as secretomes and cell lysates; and (2) reliably and with small variance across large numbers of samples, including low abundance proteins. There are significant challenges for SRM as well. In SRM targeted proteomics, the signal for a peptide can be confounded by other peptides that have fragmentation patterns and elution times identical to those of the peptide of interest. In this case, enriching for the protein of interest or using an alternate fractionation method to separate peptides before injection into the mass spectrometer for quantification by SRM can be very helpful strategies for improving signal.
Strengths of SRM make targeted proteomics a potent strategy for seeking protein biomarker panels in peripheral fluids such as cerebrospinal fluid or blood for brain cancers, where tissue samples are hard to access. Other, less precise high-throughput quantitative methods such as iTRAQ (Isobaric Tag for Relative and Absolute Quantitation) and SILAC (Stable Isotope Labeling by Amino acids in Cell culture) can also be used to screen for targets to then validate by SRM. This SRM approach has already been successful in generating a clinically used test for a different solid tumor (lung cancer), as will be discussed below.
Promise of Global Proteomics via SWATH-SWATH-MS (Sequential Window Acquisition of all THeoretical Mass Spectra) (21) promises to be the long-awaited first truly global approach in proteomics, with the ability to analyze and quantify 5000 or more proteins from complex mixtures. SWATH is currently under development and is enabled by the increasing ability to extract signal from highly complex data. Experimentally, SWATH is made possible by the fast acquisition speed of the latest generation of quadrupole time-of-flight mass spectrometers. Equally important is the growth of extensive peptide spectral libraries that allow for more detailed extraction of peptide signals from the complex raw data. SWATH builds a complete ion map from a series of highly complex MS/MS spectra, which is mined using a novel targeted data extraction strategy that leverages the public peptide spectra repository of the aforementioned SRMAtlas. SWATH libraries need to be generated only once (and then validated), without the need to collect the data repeatedly, and can be mined iteratively by computation as knowledge of protein identifiers improves or when new hypotheses raise interest in pulling out additional classes of proteins. Essentially, the comprehensive ion maps provided by SWATH only miss peptides that do not have ionizable fragments, are outside of the m/z range of the instrument, or are not resolved by the LC. Thus, SWATH promises to do for proteomics what microarrays (and now RNAseq) did for transcriptomics, namely provide a way to globally measure the abundance of almost all proteins across a sample. Although (to our knowledge) this approach has yet to be applied to brain cancers in published work, it has already been applied to the brain, for example to characterize the synaptic proteome in Alzheimer's Disease (22).
SWATH clearly holds tremendous potential in the near future for brain cancer research, as this technology will offer us the first global views of the full cancer-perturbed proteomes of tumors, peripheral tissues, and derived cell models, including the ability to monitor these global proteomes over time.
Capture Agent-based Panels for Monitoring Proteins at Large Scale and Frequency-It is often advantageous to deploy proteomics tests in the form of targeted panels based on capture agents. For fast simultaneous measurement of a relatively large number of proteins (e.g. up to 96 targets), technologies have been developed that convert protein abundance information into nucleic acid readouts, taking advantage of the relatively mature analytical capability for detecting multiplexed nucleic acids (e.g. qPCR, sequencing). One such technology is based upon the proximity extension assay (PEA), which employs oligonucleotide-labeled antibody pairs recognizing the same target protein. Binding of the two antibodies to the same target brings the two oligo probes into close proximity, allowing proximity-dependent DNA polymerization and the formation of a unique DNA reporter sequence. When coupled with a multiplex qPCR platform (e.g. the Fluidigm BioMark), up to 96 protein targets can be quantified in each sample, with 96 samples processed in a single run. Olink Bioscience is leading the effort to develop PEA technology, and has launched three protein panels (cardiovascular diseases, oncology, and inflammation), each with 92 protein targets (23). PEA technology and related reagents have been applied in serological biomarker discovery efforts in patients with neurological diseases (24). A unique feature of PEA is that the oligo tags are specifically designed so that only those on the paired antibodies will form a PCR amplicon, effectively restricting cross-reactivity from other antibodies and thus making multiplexing feasible. As with any antibody-based detection, the availability of good antibody pairs will be the main bottleneck for PEA.
Although there are some overlapping antibody selections between the oncology and inflammation panels, both can be readily applied to profiling blood proteins from cancer patient samples, without even the need to deplete the most abundant plasma proteins.
Another promising type of protein-capture reagent is the nucleic acid aptamer, in particular the Slow Off-rate Modified Aptamers (SOMAmers) developed by SomaLogic, which incorporate several bases that have been modified to include "protein-like" side chains, and a 5'-linker. These SOMAmers combine some of the best properties of antibodies and traditional aptamers while retaining high specificity for their respective cognate proteins. SOMAmer reagents targeting more than 1000 proteins have been developed (25), constituting perhaps the largest set of non-antibody protein assays to date. These SOMAmer panels provide a powerful platform for rapid deployment of multi-parameter proteomics diagnostics and other tests. Applications in both cancer and neurological disease diagnostics have been reported (26,27).
Phosphoproteomics and associated technologies for post-translational modifications-Phosphorylation, glycosylation, acetylation, ubiquitylation and other post-translational modifications (PTMs) play critical roles in brain cancers (28-31), and thus accruing more knowledge about these processes in tumorigenesis and in the therapeutic response of cancer cells is critical. Semi-quantitative techniques such as SILAC have been used to quantify the phosphorylation status of 2282 proteins in GBM-initiating cells (32). Similarly, Huang et al. (33) mutated multiple phosphorylation sites in the EGFRvIII signaling protein and used iTRAQ proteomics to quantitatively analyze the effect of these mutations on downstream signaling proteins as well as on proliferation in the U87MG cell line.
Studying post-translational modifications at scale is at a nascent stage, with several research groups developing and publishing technologies to reliably detect and quantify subproteomes defined by acetylation (34,35), SUMOylation (36), ubiquitylation (37), glycosylation (38) and other modifications. These PTMs, to our knowledge, have not yet been measured at scale in brain research, which marks them as areas in significant need of exploration. We expect these technologies to mature further, resulting in (1) an increase in the number of identified PTM sites and of known instances of crosstalk between PTMs, (2) curation of PTM status databases for various conditions and cancers, and (3) more impetus for longitudinal PTM data acquisition and analysis.
FIG. 1. A schema for proteomics studies. Proteomics studies can be designed for a number of research objectives, including biomarker discovery, disease stratification, network elucidation, and perturbation studies, and can include time-course analysis (especially for studies using peripheral fluids and cells). In these studies, it is typical to compare cancerous and normal specimens from the selected tissue source(s) of interest, which could include, e.g., brain tumor tissue, post-mortem brain tissue, cell models, primary cell cultures, neuroblasts, iPSC-derived neurons/glia, cerebrospinal fluid, and/or blood samples. Selected (sub)proteomes, including the non-PTM proteome, phosphorylated proteome, glycosylated proteome, acetylated proteome, etc., can then be assessed using a variety of technologies such as capture-agent based approaches, single cell proteomics, targeted mass spectrometry, discovery proteomics and so forth. All the collected data should be curated and stored in relational databases and analyzed to gain meaningful insights into brain cancer biology.
Clinical Implementation of Multiparameter Targeted Mass Spectrometry-The ultimate goal of applying these proteomics technologies to brain cancers is clinical use in diagnostics and therapeutics. The clinically available lung cancer marker panel based on mass spectrometry proteomics illustrates how targeted proteomics could be used to develop blood-based biomarkers for brain cancers as well (2). The strategy for developing this panel began with a set of about 400 candidates selected based on their biological relevance to the cancer and their likelihood of being found in the blood. This was the beginning of a winnowing strategy to identify first a set of about 200 proteins that could be reliably detected in the blood.
These proteins were then tested in a training set and scored for their ability to distinguish blood samples from 144 patients evenly split between having benign and malignant nodules in their lungs. From the top-scoring 32 proteins, about 1 million panels of 10 biomarkers were evaluated and ranked according to their ability to call the 144 nodules correctly. One then asked which proteins were "cooperative" in that they were most frequently found in the higher scoring panels. Selecting the best performing proteins from this highly "cooperative" set resulted in a final 13-marker panel (2). A classifier based on this panel was trained on data obtained at three independent sites, and the validation stage proceeded not only at these three sites with new independent samples but, importantly, also at a fourth site that had no involvement in the discovery phase. The 13-protein biomarker panel identified the benign nodules with a 90% negative predictive value and a reported specificity of 44%, thus giving over a third of these patients the opportunity to avoid costly surgical procedures and potential morbidity. Twelve of the 13 proteins map into three lung-cancer disease-perturbed networks, raising the possibility that they could be used to identify transitions from benign to malignant, follow progression of the nodules should they become cancerous, and monitor response to therapy. This SRM-based multiparameter proteomic panel was commercialized and brought to clinical use in 2013, demonstrating that similar approaches can now be brought to bear in a number of cases, including brain cancers.
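The "cooperative" protein idea described above (enumerate many candidate panels, rank them, and count how often each protein appears among the top-ranked panels) can be sketched as follows. This is a toy illustration only: the candidate names, the panel size, and above all the scoring function are hypothetical stand-ins. In the study (2), a panel's score reflected how well it classified the training nodules, which is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooperative_proteins(candidates, panel_size, score_panel, top_n):
    """Rank all panels of `panel_size` proteins by `score_panel`, then
    count how often each protein occurs in the `top_n` best panels.

    Returns a list of (protein, count) pairs, most frequent first.
    """
    panels = combinations(sorted(candidates), panel_size)
    ranked = sorted(panels, key=score_panel, reverse=True)
    counts = Counter(protein for panel in ranked[:top_n] for protein in panel)
    return counts.most_common()


# Hypothetical per-protein contributions; a real score would come from
# classifier performance on labeled samples, not from a lookup table.
toy_weight = {"A": 8.0, "B": 4.0, "C": 2.0, "D": 1.0, "E": 0.5}


def toy_score(panel):
    """Stand-in panel score: the sum of the toy per-protein weights."""
    return sum(toy_weight[protein] for protein in panel)


ranking = cooperative_proteins(toy_weight, panel_size=3,
                               score_panel=toy_score, top_n=4)
for protein, count in ranking:
    print(f"{protein}: appears in {count} of the top 4 panels")
```

Exhaustive enumeration is only practical for small candidate sets; the number of possible 10-protein panels from 32 candidates is far larger than the roughly 1 million panels evaluated in the study, so a sampling scheme would be needed at that scale.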
Conclusion & Future Directions-Emerging proteomics technologies offer enormous and as yet largely untapped potential in the study of brain cancers to identify biomarkers for early detection, diagnosis, stratification and progression monitoring, as well as to discover candidate targets for therapy. Discovery and targeted mass spectrometry with increased throughput and sensitivity will increasingly enable discovery of candidate disease biomarkers, whereas antibodies and/or novel protein-capture agents will provide excellent platforms for subsequent validation and implementation, and we expect to see increasing applications of these approaches in clinical and consumer-oriented settings.
In this context, we see a number of pressing needs (Fig. 1). Accelerated growth in more comprehensive, longitudinal and multi-layered proteomics data generation (phosphoproteomic, glycoproteomic, acetyloproteomic, and others) is critical to more fully represent the highly complex human proteome. This growth in multi-layered proteomics data needs to be followed closely by growth in curated databases to capture this information (such as the SRMAtlas), coupled with creative data analysis strategies to integrate these disparate data. Longitudinal studies and large sample sets generated across multiple sites and contexts will require rigorous sample collection, specimen conservation and sample preparation protocols, as well as portable data generation methods, uniform data storage, version control and curation protocols. Such systematic approaches across all facets of this process are critical for making omics-based tests, such as those that can be derived from large-scale proteomics, reproducible and robust across diverse contexts (39,40).
These needs are even more pronounced for brain cancers because these have not yet been extensively studied through proteomics. Increased application of the latest proteomics technologies will contribute significantly to identifying and validating proteins as biomarkers for brain cancers, including markers of therapeutic response measured via proxy in peripheral tissues, which is especially important given the inaccessibility of brain tissue. High-pressure nanoLC-MS/MS is likely to play an outsized role in identifying brain tumor specific antigens for applications such as immunotherapy in personalized therapeutics. We also envision proteomics technologies complementing an increasing abundance of brain imaging data to relate the effects of cancerous growth to brain function. Taken together, next-generation proteomics holds tremendous potential to deliver comprehensive and meaningful insights into brain cancer proteomes, with the hope of ultimately presenting candidates for effective therapy.