Regulated Phosphosignaling Associated with Breast Cancer Subtypes and Druggability*

Phosphorylation of substrate proteins by kinases regulates signaling pathways and cellular mechanisms. Aberrant signaling is a hallmark of cancer. We investigated quantitative associations between kinase-substrate pairs of more than 30 thousand phosphorylation sites in breast tumors and xenografts, finding auto-phosphorylating kinases and kinase-substrate pairs associated with specific cancer subtypes, druggable targets, and ones that show clinically-related or immune signatures. Our study connects driver kinases to their signaling effects toward informing rational targeted treatments of individual breast tumors. Graphical Abstract Highlights Our search identifies 2,134 kinase-substrate phosphosite pairs in breast cancer. CDKs and MAPKs are dominant regulators of trans substrate-phosphorylation. Druggability, outcomes, and immune signatures related to kinase-substrates. Experimentally validated activated phosphosites of ERBB2, EIF4EBP1, and EGFR. Aberrant phospho-signaling is a hallmark of cancer. We investigated kinase-substrate regulation of 33,239 phosphorylation sites (phosphosites) in 77 breast tumors and 24 breast cancer xenografts. Our search discovered 2134 quantitatively correlated kinase-phosphosite pairs, enriching for and extending experimental or binding-motif predictions. Among the 91 kinases with auto-phosphorylation, elevated EGFR, ERBB2, PRKG1, and WNK1 phosphosignaling were enriched in basal, HER2-E, Luminal A, and Luminal B breast cancers, respectively, revealing subtype-specific regulation. CDKs, MAPKs, and ataxia-telangiectasia proteins were dominant, master regulators of substrate-phosphorylation, whose activities are not captured by genomic evidence. We unveiled phospho-signaling and druggable targets from 113 kinase-substrate pairs and cascades downstream of kinases, including AKT1, BRAF and EGFR. We further identified kinase-substrate-pairs associated with clinical or immune signatures and experimentally validated activated phosphosites of ERBB2, EIF4EBP1, and EGFR. Overall, kinase-substrate regulation revealed by the largest unbiased global phosphorylation data to date connects driver events to their signaling effects.


In Brief
Phosphorylation of substrate proteins by kinases regulates signaling pathways and cellular mechanisms. Aberrant signaling is a hallmark of cancer. We investigated quantitative associations between kinase-substrate pairs of more than 30 thousand phosphorylation sites in breast tumors and xenografts, finding auto-phosphorylating kinases and kinase-substrate pairs associated with specific cancer subtypes, druggable targets, and ones that show clinically-related or immune signatures. Our study connects driver kinases to their signaling effects toward informing rational targeted treatments of individual breast tumors.
Mutations and alterations in cancer dysregulate kinases and signaling cascades. Large-scale studies of breast cancer have discovered drivers, with genomic and expression changes, in kinases of the PI3K/Akt signaling and TP53/RB signaling/cell-cycle checkpoint pathways (1,2). However, genomic findings provide only indirect inference of phosphorylation activity. Further, large-scale proteomic studies using reverse phase protein array (RPPA) 1 are limited to coverages of ϳ200 proteins and phosphoproteins with available antibodies (3)(4)(5). The impact of candidate driver events on direct signaling has therefore been seldomly explored in the corresponding tumors. Although functional experiments in in vivo systems or model organisms have enabled controlled assessment of downstream effects, they must be complemented by in vivo observations that account for the molecular complexity of each tumor.
Mass spectrometry (MS) is evolving rapidly and has cataloged tens of thousands of phosphorylation sites (phosphosites). Recent studies by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) using liquid chromatography (LC) MS/MS have generated proteomic/phosphoproteomic data sets that have more deeply profiled the cancer proteome (2,6,7), providing an opportunity to evaluate regulation of phospho-signaling in cancer. Other MS-based studies focusing on the kinome have highlighted kinases that are disrupted in cancer (8) and identified complementary genomic and proteomic alterations in a handful of signaling pathways (9). However, characterization of kinase-substrate interaction at a single-residue level has been largely limited to in vitro and in silico predictions using sequence-based analysis of kinase binding motifs (10 -14). Direct observation of kinase-substrate associations in patient tumors can complement these studies to help reveal their regulation mechanisms.
In this study, we characterized phospho-signaling in cancer by analyzing the phosphoproteomic profiles of 77 human breast tumors characterized in TCGA and CPTAC projects (1,2) and validating predictions using an independent cohort of 24 breast cancer patient-derived xenografts (PDXs). Approximately 60% (387/630) of phosphosites are significantly associated with expression of their respective proteins across 91 unique kinases in EGFR, PKC, STE20, and other families. MAP kinases and CDKs are involved in wide-spread transcorrelations with downstream substrates. Only selected substrate phosphosites (ϳ5%, 1747/38,710) are associated by kinases in this breast cancer cohort, suggesting cancer-specific usage of signaling modules. Notably, our analysis revealed 806 potentially associated phosphosites by protein kinases not captured by current experimental or motif-based predictions, including ERBB2, EGFR, MAPKs, and CDKs. Landscapes of associations revealed adjacent phosphosites showing similar or distinct regulations. Finally, druggable kinase-substrate pair analysis validated singleton driver events of AKT1, ERBB2, and RAF1, while exposing potentially druggable pairs and cascades downstream of several MAP kinases not apparent at the genomic level. Overall, our findings emphasize that evaluating signaling effects in patients' tumor through a systematic, high-throughput manner could ultimately prove clinically useful.

Sample Description
Samples of human breast cancer have been described in CPTAC marker papers (2,6). The corpus is comprised of 77 breast cancer samples showing unimodal distribution in proteomes, their 3 technical replicates, and 3 normal breast samples. Samples of the 24 PDX breast cancer were as described previously (15).

Data Generation
TCGA Genomics Data-The TCGA somatic mutation data, level-3 segment-based copy number data, and level-3 normalized RNA expression data were downloaded from firehose (archive date October 17, 2014). We then converted the segment-based copy number data to the gene-based copy number data by using the RefSeq database (version 20130727). The CNV ploidy number is divided by 2 and then log2-transformed to obtain the final CNV levels for analysis. We also log2-transformed the RSEM values of RNA expression data.
Global Proteomics Data-Global proteomics data for the human samples were downloaded from the Mertins et al. breast cancer study (2). Global proteomics data for the PDX samples were downloaded from the Huang et al. breast cancer PDX study (15). As previously described, 2-component Gaussian mixture model-based normalization algorithm was used to normalize the data and accomandate both consistently and differentially-expressed proteins and phosphosites within each sample. Further, proteins and phosphosites used in this study were required to have observed (nonmissing) iTRAQ ratios in at least 30 samples and an overall standard deviation larger than 0.5 (across all samples where they were observed).
To obtain phosphoprotein level used in our kinase-substrate analysis, we used the WGCNA package (17) to collapse phosphosite expression level within each protein. For proteins with three or more phosphosites, we used connectivity based collapsing using the phosphosite with the highest connectivity according to a signed weighted correlation network adjacency matrix. For proteins with two phosphosites, we select the phosphosites with the highest mean value.
PhosphositePlus (20,21)-For phosphosite comparison, we downloaded the data from file Phosphorylation_site_dataset.gz (Downloaded: 2/13/2019) to extract phosphosite mapping to human proteins and reverse-translated to their unique genomic positions using transvar (22). For kinase-substrate pair analysis, we downloaded the file Kinase_Substrate_Dataset.gz (Downloaded: 2/11/2016) and extracted the kinase-substrate proteins pairs. To identify phosphosites known in cancer, we filtered the disease-associated sites database for cancer terms (ex. cancer, *oma, and leukemia) using the file Disease-associated_sites.gz (Downloaded: 2/11/2016). The sites that were not matched to a valid genomic coordinate by TransVar (22) were excluded and the remainder sites were further reviewed. We ultimately retained 261 sites, where 84 unique sites were quantified in our dataset.
Phospho.ELM-We downloaded the phospho.ELM database from PhosphositePlus (2/11/2016). We then extracted the phosphosites mapping to human proteins and reverse-translated to their unique genomic positions using transvar (22).
PhosphoNetwork-We downloaded the Supplementary Tables from Newman et al. (10) and derived the predicted kinase-substrate pairs from the file comKSI.csv, this file combined refined kinasesubstrate pairs to 719 kinase-substrate pairs as described. We then further filtered out the pairs already observed in PhosphositePlus and combined the remaining pairs with pairs from PhosphositePlus for analysis.
NetworKIN-We downloaded the NetworKIN output (networkin_ human_predictions_3.1.tsv.gz, download date: 3/20/2018) from Ki-nomeXplorer (12) (http://networkin.info/). We then standardized the kinase names to HUGO symbols using mapping from the human kinome (23) and custom scripts. The processing resulted in 423,574 predicted kinase-substrate phosphosite pairs with mapped gene names, of which 412,696 pairs were within the kinase/substrate genes investigated in our quantitative analyses and were thus used for further analyses.

Cross Data Type and Database Integration
All gene names were converted to HUGO Gene Nomenclature Committee's approved gene names for comparison across levels and datasets. To match the exact phosphosite (ex. PIK3CA:NP_006209.2: s312) across databases, all phosphosites were reverse-mapped to their respective genomic positions (ex. chr3:g.178921452_ 178921454) using transvar (22).

Kinase-Substrate Pairs Regression Analysis-We obtained 3245
unique kinase-substrate pairs in the PhosphositePlus database (20,21) and an additional 1752 kinase-substrate pairs from the Phospho-Network database (24). We then screened all kinase-substrate-phosphosite pairs within these known kinase-substrate protein pairs. For each kinase-substrate-phosphosite pair to be tested, we required both kinase protein/phosphoprotein expression and phosphosite phosphorylation to be observed in at least 10 samples in the respective datasets and the overlapped dataset. We then applied the linear regression model as implemented in glm function in R to test for the relation between kinase and substrate phosphosite. The tests are independently conducted for cis and trans interactions in the cohorts of 77 human and 24 PDX breast cancer samples using linear regression models. Linear regression modeling is generally based on estimating coefficients of independent variables that give the best fit to a dependent variable within the context of a random error. For the i th trial for kinase phosphosite abundance in the cis associations, kinase phosphosite abundance, A i , depends on kinase protein expression, S i , and error, E i , For the i th trial for kinase phosphosite abundance in the trans associations, substrate phosphosite abundance, A i , depends on kinase phosphoprotein expression, K i , substrate protein expression, S i , and error, E i , where the beta coefficients are determined by least-squares calculation. The resulting p values were adjusted using the Benjamini-Hochberg procedure to FDR.
We determined kinase-phosphosite pairs as validated if they showed p value under 0.05 and positive regression coefficients in the PDX cohort. For the kinase-phosphosite pairs showing top significant associations in the regression analysis, we calculated the average of phosphorylation level for each of the substrate phosphosite and protein expression of the kinase within each of 4 breast cancer subtypes (Basal, Her2, LumA and LumB) for display in Fig. 2 and 3.
Finally, we compared our results of trans association to 10 simulated kinase-substrate datasets by distributing the same kinases in even proportions to all incidences of current substrate phosphosites and conducted the kinase-substrate association testing in 574,356 randomized pairs with sufficient quantifications. Only 2.82% (16,194) showed positive association using the same criteria (regression coefficient ␤ Ͼ 0 and FDR Ͻ 0.05). We then determined the significance of our observed proportion to the simulated results using the Fisher's exact test.
Kinase-Substrate Pair Validation Using Thresholds Defined in PDX Data-We used sensitivity/specificity analyses to determine the thresholds derived from the PDX cohort used for validating kinasesubstrate pairs found in the human cohort. In this analysis, the correlated significance (with regression coefficient ␤ Ͼ 0 and FDR Ͻ 0.05) of tested kinase-substrate pairs with sufficient PDX data (cis n ϭ 396; trans n ϭ 27,585) is considered as the response variable and the kinase regression coefficients are considered as the predictor variables (supplemental Fig. S2). For cis analysis, the area under curve (AUC) for the regression coefficient of the kinase protein in the PDX cohort is moderate at 0.5464; but at our threshold coefficient of 0.1, we could achieve sensitivity of 0.98 and specificity of 0.09. For trans analysis, the AUC for the regression coefficient of the kinase phosphoprotein in the PDX cohort is better at 0.612; and at our threshold coefficient of 0.1, we could achieve sensitivity of 0.46 and specificity of 0.71.
Kinase Group and Family Enrichment Analysis-To test whether kinases showing significant cis or trans correlations are enriched in kinase groups and families, we applied the one tailed Fisher's exact test under a null hypothesis that the odds ratio of associated kinases in the family are not greater. The universe of kinases for each of the cis and trans tests was defined as the total tested kinase, and the 2-by-2 table was constructed by (1) whether the kinase belonged to the kinase group/family, and (2) whether the kinase had any significant correlations.

Structural and Cophosphorylation Analysis
We used HotSpot3D (18) to generate pairwise linear and 3D distances between residues within 1,288 proteins having available PDB structures. The active sites were mapped based on data from the RSCB PDB, as of January 2017 (http://www.rcsb.org/pdb/home/ home.do). Pearson's correlation coefficients and adjusted p values were then calculated for each pair of two phosphosites within these proteins. We limited this correlation analysis to pairs of phosphosites jointly observed in at least 5 samples in the breast cancer cohort. For linear examinations of association landscapes, the lolliplots were generated and modified using the PCGP protein painter (http:// explore.pediatriccancergenomeproject.org/proteinPainter).
Druggable Kinase-Substrate Pairs and Cascades Analysis-We compiled a list 76 druggable genes along with their respective drugs, from established public databases as previously described (15). We then limited the analysis to the 68 genes with per-sample average RSEM value greater than 100. We searched for significantly associated kinase-substrate pairs where either the kinase or substrate belonged to the set of druggable genes. The score of each pair was calculated as the sum of standardized kinase and substrate phosphosite levels: where is the mean and is the standard deviation.
To identify druggable outliers in the conventional method, we scanned for events at the somatic mutation, CNV, RNA, and protein expression levels for each gene. CNV, RNA, and protein expression outliers were identified as the ones greater than 2 interquartile ranges (IQR) above median, as previously described (2,15). We then complemented this conventional single-event analysis by identifying outliers using the kinase-substrate score.
To define the druggable cascade of each of the druggable kinase, we extracted all significantly associated cis phosphosites and their first-degree correlated trans substrate phosphosites. We then expanded one level beyond, extracting additional trans phosphosites associated with the phosphoprotein level of the first-degree substrates. The resulting cascades were visualized using heatmaps showing levels of each protein/phosphoprotein/phosphosite and a network diagram using Cytoscape (25).
Clinical and Immune Correlation Analysis-For each kinase-substrate pair, We conducted association analysis between the kinasesubstrate score (defined above) of this pair for each patient and pathological stage, survival, radiation therapy, and immune signatures adjusted for their PAM50 subtypes. For continuous variables, including pathological stage and immune scores, we used a Gaussian linear regression. For whether the sample has gone through radition therapy, we used a logistic regression model. We used the Cox proportional Hazards model for survival analysis. The resulting p values were adjusted to FDR using the Benjamini-Hochberg procedure.
Statistical testing of the results was conducted using linear regression models, where the quantity measured through Western blotting is the dependent variable and the quantification obtained through high throughput proteomics is the independent variable, with the date/replicates of the blotting experiment as the covariate.

SRM Validation Stable Isotope-Labeled Peptides
Crude stable isotope-labeled peptides (SI peptides) were synthesized with 13 C/ 15 N on C-terminal lysine or arginine (New England Peptide, Gardner, MA). The SI peptides were dissolved individually in 15% acetonitrile (ACN) and 0.1% formic acid (FA) at a concentration of 1.5 mM (according to the nominal peptide quantity provided by the vendor) and stored at Ϫ80°C. A mixture of these SI peptides was made with a final concentration of 10 pmol/l for each peptide.
SRM Assay Development-To evaluate the peptide quality and select the best responsive transitions for each peptide, 500 fmol/l of heavy peptide mixtures were subjected to high-resolution mass spectrometry (MS) analysis using the Orbitrap Fusion Lumos instrument (Thermo Fisher Scientific, San Jose, CA) running in the HCD mode (26). The resulting RAW files were processed with DTARefinery (27) and MS-GFϩ (28,29) to match against the RefSeq human protein sequence database, released on April 18, 2017 (46,174 proteins) for generating a list of MS/MS fragment ions derived from SI peptides. The 6 most intensive fragment ions for each peptide were initially selected based on corresponding MS/MS spectra. The collision energies for individual transitions were obtained by using empirical equations from the Skyline software (30). Second, LC-SRM was used to further evaluate all heavy peptides for the LC performance (e.g. the stability of peptide retention time), MS response (e.g. reliable heavy peptides identification and S/NϾ3), transition interferences, and endogenous peptide detectability by spiking them into water and the pooled wild type samples. In the end, 3 or more transitions per peptide were selected for configuration of the final panel of assays for reproducible targeted quantification. The transitions for each precursor were provided in supplemental Table S23.
The intended use of these assays is the Tier 2 measurements, with the ability to measure repeatedly sets of analytes of interest within and across samples by spiked in internal standards for each analyte, for confident detection and precise quantification (31). Response curves were generated for the peptides using triplicate measurements of varying concentrations of heavy peptides and a constant concentration of light peptides spiked into a tryptic digest of a previously characterized human breast cancer xenograft (32) (supplemental Fig.  S13). For peptide IPLENLQIIR, the concentrations of heavy peptide were 0, 20, 50, 100, 200, 800, and 1600 amol/l and the concentration of light peptide was 125 amol/l; 5 l of the sample at each data point was injected for analysis by LC-SRM. Its limit of detection (LOD) and lower limit of quantification (LLOQ) were both determined at 20 amol/l level. For the two phosphopeptides NGLQSCPIKED(pS)-FLQR and ELVEPL(pT)PSGEAPNQALLR, the concentrations of heavy peptide were 0, 0.4, 0.8, 1.6, 3.125, 6.25, 12.5, 25, 50, 100, and 200 fmol/l and the concentration of light peptide was 200 fmol/l; they were spiked into the same breast tumor xenograft sample (100 g peptide), followed by phosphopeptide enrichement (see below) and LC-SRM analysis. The LOD and LLOQ for both phosphopeptides were 0.4 fmol/l and 1.6 fmol/l, respectively. The LOD was determined by S/NϾ3. The LLOQ was determined by both S/NϾ10 and CVϽ20%. The inter-day CVs were also determined (supplemental Fig. S14).
Phosphopeptides Enrichment by IMAC-Two hundred micrograms of tryptic peptides from each sample spiked with 1000 fmol/l of each of the crude SIL peptides were subjected to phosphopeptides enrichment via immobilized metal affinity chromatography (IMAC) (33,34). The in-house-made IMAC tip was capped in a tip-end with a 20-m polypropylene frits disk followed by packing with Ni-NTA silica resin (Qiagen, Hilden, Germany). First, Ni 2ϩ ions were removed by adding 50 mM EDTA in 1 M NaCl. The tip was then activated with 100 mM FeCl 3 and equilibrated with 6% (v/v) acetic acid at pH 3.0 before sample loading. Tryptic peptides were dissolved in 6% (v/v) acetic acid and loaded onto the IMAC tip. Followed by 1% (v/v) trifluoroacetic acid, 80% ACN, and 6% (v/v) acetic acid washing steps, the bound phosphopeptides were eluted by 200 mM NH 4 H 2 PO 4 and then desalted by SDB-XC StageTips (35) and dried under vacuum.
LC-SRM-The enriched phosphopeptides or the unmodified peptide samples with spiked SIL peptides were dissolved in 2% ACN/ 0.1% FA and analyzed using a TSQ Vantage triple quadruple mass spectrometer (Thermo Fisher Scientific) equipped with a nano-ACQUITY UPLC system (Waters, Milford, MA). Peptide samples were loaded onto an ACQUITY UPLC BEH 1.7-m C18 column (100 m i.d. ϫ 10 cm). The mobile phases were (A) 0.1% FA in water and (B) 0.1% FA in ACN. Two microliter of the unmodified peptide sample (0.3 g) and 5 fmol/l SIL peptides were loaded onto the column and separated at a flow rate of 400 nL/min using a 100-min gradient profile as follows (min:%B): 11:0.5, 13.5:0.5, 17:8, 25:13, 55:20, 80: 38.5, 85:95, 89:50, 90:95, 91:0.5. For enriched phosphopeptide samples, all of the eluent from IMAC was dissolved in 10 l and 4.5 l were loaded onto the column and separated with the same LC gradient. The LC column is operated at a temperature of 44°C. The parameters of the triple quadruple instrument were set as follows: 0.7 fwhm Q1 and Q3 resolution, and 1 s cycle time. Data were acquired in time-scheduled SRM mode (retention time window: 10 min) without blinding the samples.

Data Analysis
SRM data were analyzed using Skyline software (version 4.2). The total peak area ratios of endogenous light peptides and their heavy isotope-labeled internal standards (i.e. L/H peak area ratios) were calculated for quantification. Peak detection and integration were carried out based on two criteria: (1) same retention time and (2) similar relative SRM peak intensity ratios across multiple transitions between light (endogenous) peptides and the heavy SIL peptide standards. All data were manually inspected to ensure correct peak detection and accurate integration.

Detect and Characterize Phosphosites in Breast
Cancer-We analyzed genomics and proteomics data from 77 breast cancer and 3 normal breast samples (supplemental Table S1) characterized by TCGA(1) and CPTAC(2) (Fig. 1, Experimental Procedures). Genomic analysis of TCGA DNA-Seq, SNP-array, and RNA-Seq data provided comprehensive assessment of somatic mutations, copy number variations (CNV), and mRNA expression, respectively. The CPTAC breast cancer project quantified global protein and phosphosite expression levels using the Isobaric Tag for Relative and Absolute Quantitation (iTRAQ) technology. We further used proteogenomic data from 24 independent breast cancer patient-derived xenografts (PDXs, supplemental Table S1) (15) generated using the same technologies and bioinformatic pipeline for validation.
In this study, we aim to identify quantitatively correlated kinase-substrate pairs that are concordant in their relative abundance across samples (Fig. 1D). We hypothesize these phosphosites are regulated in a patient-specific manner, contributing to cross-sample variation in wiring of signaling networks. We examined two classes of kinase-substrate relations: (1) cis associations whereby a kinase protein expression is correlated with its own phosphosites and (2) trans associations whereby a kinase phosphoprotein expression is correlated with a substrate phosphosite level (Experimental Proce-dures). We specifically analyzed sites showing substantial variations (Observed across 30 samples and standard deviation Ͼ 0.5) across breast tumors (Experimental Procedures), noting that there may be other kinase-substrate pairs that remain in steady states across breast tumors (i.e. consistently high or low) not addressed by our current approach and dataset (Fig. 1D).
To achieve this goal, we curated and screened 4997 pairs of human kinase and substrate proteins based on the Phos-phositePlus and PhosphoNetwork databases (20,24). We then compared the quantitatively correlated pairs we identified at the phosphosite level with experimentally observed kinase-substrate phosphosite pairs in PhosphositePlus (phosphosites observed in vivo or in vitro) and Phospho-Network (CEASAR: kinase-substrate map generated through in vivo sites and bioinformatics algorithm), as well as pairs based on kinase binding-motif prediction using NetworKIN3.0 (11,12). Our correlation approach overlapped previously identified pairs and added new observations at the phosphosite level (Fig. 1E). For cis associations, 40 experimentally observed cis-pairs showed significant correlations. For trans associations, we observed a notable enrichment of both experimentally observed pairs (Fisher's exact test, 2.2-fold enrichment, p ϭ 5.5e-22) and kinase binding-motif predicted pairs (Fisher's exact test, 1.7-fold enrichment, p ϭ 8.3e-07), suggesting that our quantitative-correlation approach affords an independent and complementary method that can identify potential kinase-substrate phosphosite pairs. Notably, our approach identified a substantial amount of 1440 trans associated pairs in addition to those found by in vitro experiments or binding-motif analysis.
Phosphosites in Auto-phosphorylated Kinases Across Breast Cancer Subtypes-Theoretically, a phosphosite's abundance will correlate with its protein level as determined by stoichiometry. However, among the wide-spread molecular changes in tumors, many are presumed to be "passengers" without clear functional effects and validated functional events can often be prioritized by those affecting downstream changes, such as genomic truncation/CNV/methylation associated with mRNA gene expression (40 -42) or mRNA changes associated with protein expression (15,43). Thus, we hypothesize cis-association will enable us to select for meaningful protein level changes and their functional phosphosites. Further, those phosphosites with increased abundance above expectations might suggest autophosphorylation, a hallmark functionality of many kinases involved in oncogenic signaling.
For cis analysis, protein expression of 120 kinases known to exhibit auto-phosphorylation were evaluated with relation to peptide abundance of their 630 phosphosites using a linear regression model (Experimental Procedures). Protein abundance measures do not always guarantee the activation of the phosphosites of the same protein: 61.4% (387/630) of the tested kinase-substrate relations showed significant positive associations (regression coefficient ␤ Ͼ 0 and FDR Ͻ 0.05) in breast cancer ( Fig. 2A, Fig. S2) in the cohort of 24 breast cancer PDXs (15) and considered validated (supplemental Table S5).
At the kinase group or family level, enrichments of cisassociated kinase genes within the group or family imply their diverse regulation and possible functional quantity changes across breast tumors. Using the background of all tested kinases, the AGC kinase group was most significantly enriched with cis associations (Fisher's Exact Test, p ϭ 0.024, Experimental Procedures), with all 17 tested genes showing significant cis correlations with their respective phosphosites, followed by the STE group (14/14 genes, Fisher's Exact Test, p ϭ 0.049, supplemental Table S6). Notably, at the kinase family level, we observed significant cis-association in all 8 of the STE20 kinases in the analysis, including PAK1/2/4, STK3/ 4/24, and MAP4K1/3 (supplemental Table S7).
We hypothesized that the observed cis associations may indicate kinase autophosphorylation if each unit change in kinase protein resulted in an even greater change in phosphosite level (␤ Ͼ 1). Using this approach, we identified 41 cisregulated sites that showed the steep increase, including CDK9 pT186, RAF1 pS43, BRAF pS151/S729, and PRS6KA4 pS737 (supplemental Fig. S3), and thus potentially were regulated in part by the autophosphorylation. Among the 41 pairs, only CDK9 pT186, PRKCB pS660, and PRKCD pS645 were validated as auto-phosphorylated in PhosphositePlus (20,21), warranting further validation to phosphosites discovered using the quantitative-correlation approach.
Correlated Kinase-Substrate Pairs in Breast Cancer-For trans analysis, we surveyed 7404 kinase-substrate protein pairs and a total of 38,710 kinase-substrate phosphosite relations with sufficient observations in our data set (Experimental Procedures). We applied a linear model using phosphosite abundance of the substrate as the dependent variable, phosphoprotein expression of the kinase as the independent variable. To avoid associations because of co-expression, we correct for the protein expression of the substrate using it as a covariate. Although only 4.51% (1,747/38,710) of the tested relations showed significant positive associations (Fig. 3A, supplemental Table S8), this correlated proportion is still significantly higher than that found using simulated pairs (Experimental Procedures, Fisher's Exact Test, p Ͻ 2.2e-16). Further, Only 143 and 81 of the 1,747 trans pairs were identified when we modeled the kinase-substrate relations using protein and mRNA levels of the kinase, respectively, suggesting that a limited fraction of the identified kinase-substrate pairs are likely because of co-expression of the kinase and substrate (supplemental Fig. S4). Phosphoprotein levels of the kinase are required to identify 1576 of the trans associations, consistent with a mathematical model of phosphosignaling reactions (44) and suggesting the identification of signaling pairs.
Of the identified trans-regulatory pairs, 45.5% with sufficient PDX data (407/894) were confirmed (␤ Ͼ 0.1, corresponding to sensitivity of 0.46 and specificity of 0.71 (Experimental Procedures), supplemental Fig. S2) in breast cancer PDXs (15) (supplemental Table S9). The moderate validation rate may be because of increased sensitivity to micro-environment in trans pairs and larger sample sizes potentially required to firmly establish trans associations (supplemental Fig. S3). To reveal global patterns of trans associations less affected by potential noise and lack of power, we sought to identify kinase families enriching for trans-associated kinase genes among the background of all tested kinases. The identified trans-regulations are congregated in 165 kinases, which showed the most significant enrichment in MAPK (Fisher's Exact Test, p ϭ 0.00045), CDK (Fisher's Exact Test, p ϭ 0.0026), and PKC (Fisher's Exact Test, p ϭ 0.0061) families (supplemental Table S7).
Sequence and Structural Patterns Correlated with Phosphosite Expression-Our cis and trans analyses identified pairs of phosphosites on the same protein showing concordant regulation patterns, including those in ERBB2, PRKCB, and WNK1. We hypothesized that phosphosites in spatial proximity would be affected by the same regulators and would show similar phosphorylation patterns. For each pair of phosphosites, we calculated the correlation coefficient of their levels and compared that to their proximity on PDB structures, as determined by HotSpot3D (18,45) (supplemental  Table S10, Methods). We found a significant correlation be-tween phosphosite levels with both 3D distances (measured in ångströms) of the phosphosite pairs (Spearman correlation, p ϭ 3.5e-08, rho ϭ Ϫ0.40) and linear distances (measured in amino acid residue counts, Spearman correlation, p ϭ 7.9e-04, rho ϭ Ϫ0.25, supplemental Fig. S6), confirming coregulation yet potentially divergent abundance in adjacent phosphosites.
Phosphosites with strong cis-associations may help detect activating events correlated with high kinase expression. We identified ERBB2 pS1151/S998/T1240 as showing the most significant cis-associations with particularly high levels in HER2-E tumors (linear regression, ␤ Ͼ 0.65; FDR Ͻ 6.2e-16) and may serve as complementary HER2 biomarkers to the tyrosine residues targeted by available antibodies, such as pY1221 and pY1248 (Fig. 4A). Notably, pT1240 has been shown to drastically reduce in phosphorylation level upon either trastuzumab and Bis-Fab 1235 (antagonists of HER2) treatments whereas pY1248 shows an increase (46), demonstrating their differential regulation upon inhibition. On the other hand, poorly correlated phosphosites in kinases with strong cis-effects may suggest additional post-translational modification mechanisms. For example, we observed strong association between AKT1 protein and phosphorylation levels for sites pS122/S126/S129 (linear regression, FDR Ͻ 8.2E-08, notation such as S126129/S124S126S129 in Figs., Supplementary Tables, and texts indicates dual/triple phosphorylation of the phosphosites in the same detected peptide). However, the associations for pT308/S475 (linear regression, FDR Ͼ 0.38) are considerably weaker, even with similar observed sample sizes (Fig. 4B). Other discordant sites (supplemental Fig. S7) include the strongly associated ABL1 pS637/ S737/T800/T871 versus the nonassociated pS16/S588/S828; the strongly associated PTK2 pS390/S570/S708/S910 versus the nonassociated pY576/S840; and the associated RIPK1 pS320/S610 versus the nonassociated pS330/S416.
Intriguingly, several regulated phosphosites reside in structural proximity to active sites of kinases (supplemental Table  S11, S12). MAPK3 (ERK1) pT202/Y204, which are known to be phosphorylated by MAP2K1 and MAP2K2 (MEK1 and MEK2) to trigger its activation (47,48), are adjacent to its active site pD166 (Fig. 4D). These two sites showed significant cis-regulation by MAPK3 protein, albeit not trans-association by MAP2K1/2 phosphoprotein. In MAP2K6, MAP3K5regulated pS207 and an adjacent site pS201 are in its catalytic domain near active sites pD179 and pT211 (Fig. 4E). These phosphosites may alter the biochemical properties of the active site and affect the activity level of the corresponding kinase.
Kinase-Substrate Pairs Associated with Treatment Options-To inform targeted treatment, druggability analysis has traditionally focused on detecting single activating events, for example EGFR mutation, PIK3CA mutation, and ERBB2 amplification (49,50). Although these events can predict treatment response, we reason that co-occurrence of elevated kinase-substrate phosphorylation can confirm aberrant activation and reveal new treatment options. We conducted an in silico druggability analysis for the significant cis and trans kinase-substrate pairs by screening 68 druggable genes with potential inhibitors (15) that are abundantly expressed in our breast cancer samples (Experimental Procedures). We used abundance levels of both the kinase and substrate phosphosite to construct a kinase-substrate pair score and identified samples showing 2 standard deviations above cohort medians in their scores (Experimental Procedures).
Among the 286 associated kinase-substrate pairs, we identified 164 overexpressed events among 113 unique pairs (Fig.  5A, supplemental Table S13). Overexpression of the cis pair ERBB2:ERBB2 pS1151 were found in 5 HER2-positive samples. The cis BRAF:BRAF pS364 pair and the trans MAPK14: FOXO3 pS413 pairs were found exclusively in 4 and 3 luminal B breast cancers, respectively. SRC and SYK-regulated trans pairs also showed higher levels in luminal B breast cancers. Pairs associated with MAP kinases, such as MAPK3:PALLD pS55, MAPK3:IL16 pS584 and MAPK8:JUN pT62, showed higher levels and several pair overexpression in luminal A breast cancers. In samples without prominent ERBB2 signaling, we observed other alternative overexpression pairs triggered by other receptor tyrosine kinases, including EGFR regulated trans pairs and IGF1R-regulated cis pair IGF1R: IGF1R pS1365, which is a conserved phosphosite found in 10 species (51). In the 24 breast cancer PDXs, we also discovered overexpression of many regulated pairs, including ERBB2:ERBB2 pS1151, BRAF:BRAF pS364, and EGFR: PLCG1 pS1222 (supplemental Table S14, supplemental Fig.  S8), underscoring their prevalence in breast cancer.
We compared the candidate targets identified through paired druggability analysis and conventional single driver analysis (Fig. 5B, supplemental Table S15), where we compiled mutations, CNV, RNA, and protein levels of the same 68 expressed, potentially druggable genes (Experimental Procedures). We observed concordant single driver events with overexpression pair events for kinases, including AKT1, BRAF, ERBB2, IGF1R, and RAF1. Samples with ERBB2, IGF1R and RAF1 copy number amplified outliers often show high expression of the respective kinases-substrate pairs (Fig.  5B). However, single driver events do not guarantee activated signaling of their associated kinase-substrate pairs. For example, some samples with PIK3CA mutation, ERBB2 mutation, or RAF1 copy number amplification did not show overexpressed kinase-substrate pairs associated with the respective kinases (Fig. 5B). Observing concurrent activation of downstream targets using phosphoproteomics data is needed to further confirm these effects and treatment options for each patient.
On the other hand, active signaling events may occur in samples without mutations or expression aberrations of the kinase. This is particularly evident in both human and PDX samples with outlier MAPK3 and MAPK14 trans pairs (Fig. 5B, supplemental Fig. S9, supplemental Table S16). Only 2 out of 7 samples with outlier MAPK3 trans pairs showed MAP3K1 mutations and none of the 5 samples with outlier MAPK14 trans pairs carried MAP3K1 or MAP2K4 mutations (supplemental Fig. S10), suggesting MAP kinases may be activated by other upstream signaling mechanisms rather than directly being altered at the sequence or expression levels. We also observed some of these overexpression pair events in absence of single driver events for AKT1, BRAF, and EGFR ( Fig.  5B) that require further investigation.
Activated Phosphosignaling Cascades Derived from Connected Pairs-We further extended the analysis from pairs to two-level signaling cascades that include the first and second degree substrates of the druggable kinase (Methods). We hypothesize in these signaling cascades, the activated kinases could trigger phosphorylation of multiple downstream targets at multiple steps, thereby presenting heightened opportunities for targeted inhibition treatment.
Out of 28 kinases associated with at least one cis or trans-associated phosphosite, 16 also had second-degree substrates (Supplemental Table 17). AKT1, BRAF, EGFR, JAK2, PRKCE, and PLK1 showed cis and trans interactions that may help confirm activity levels (Fig. 6, supplemental  Fig. S11). In addition to its cis associations, AKT1 also associated with phosphosites of ILF3, ETS1, and GSK3A. In turn, GSK3A is associated with 3 phosphosites of NDRG1, 2 sites of RICTOR, and 1 site each in LARP1 and ANKS1A (Fig. 6A). BRAF also has multiple cis-associated phosphosites and is associated with phospho-MAP2K1/2 (MEK1/2), which are associated with downstream phosphosites at MTA1, KRT8, and PRKCD (Fig. 6B). Finally, high levels of EGFR phosphorylation were mostly observed in basal subtype breast cancers and these tumors also exhibited high phosphosite levels in GAB1, PLCG1, FAM129B, and PTK2; PTK2 phosphoprotein is further associated with phosphosites of PPP1R13L and PXN (Fig. 6C). The association of the primary druggable kinase with its signaling cascade could strengthen the rationale for targeted inhibition in tumors showing co-activation.
Kinase-Substrate Pairs Correlated with Clinical and Immune Features-Last, we sought to identify kinase-substrate pairs associated with clinical characteristics of breast cancer. For each kinase-substrate pair, we conducted a regression analysis using the aforementioned kinase-substrate score as the independent variable versus pathological stage or immune scores as the dependent variable, adjusting for their PAM50 subtypes as a covariate (Experimental Procedures). One hundred seventy-five pairs showed potential association (p Ͻ 0.05) with pathogical stage (Fig. 7A, supplemental Table S18). Kinase-substrate pairs stemmed from MAP kinases, including MAPK11:PPP1R13L pS113, MAPK9:STAT3 pS727, and MAPK8:NFATC1 pS359 showed top correlations with earlier clinical stages whereas MARK3:MARK3 pS377 and RPS6KB1:PHB2 pS293 are associated with later stages. We also conducted a survival analysis using the kinase-substrate score as the predictor and the PAM50 subtypes as a covariate in a Cox proportional hazards model, finding 83 pairs are potentially associated with survival (Fig. 7B, supplemental Table S19). Notably, these pairs are dominated by 32 CDK1 and 29 CDK2 trans-regulated pairs correlated with longer overall survival (Hazard Rate Ratio Ͻ 1) in the breast cancer cohort. Signaling pairs associated with poor survival include MAP3K11:MAP3K11 pS789S793, PRKAA2: PPP1R13L pS187, and NEK9:NEK9 pS868, supporting previous results showing the functionality of NEK9 in MAPK/ MEK signaling and resistance to PI3K inhibitor treatment (52).
Experimental Validation of Regulated Pairs from Global Proteomics-We applied two independent technologies, Western blotting and selected reaction monitoring (SRM), to validate multiple kinase-substrate pairs discovered in our global proteomics dataset (Methods). We first conducted Western blotting to validate our observations of up-regulated proteins and phosphosites found in the high throughput proteomics approach using PDX tumors from the Washington University Human in Mouse (WHIM) collection (15,59). Specifically, we measured expression of ERBB2, ERBB2 pS1151, EIF4EBP1, and EIF4EBP1 pS65 in 12 PDX breast tumors where these markers showed expressed variability in mass spectrometry data (Fig. 7E). For all 4 markers, we found significant associations between expression quantified through Western blotting and mass spectrometry (Fig. 7F, linear regression, p Ͻ 0.0005).
For ERBB2, the HER2-positive WHIM8 and WHIM35 showed notably high ERBB2 and ERBB2 pS1151 in both quantifications (Fig. 7E, 7F). For EIF4EBP1, Western blotting showed substantial increase of EIF4EBP1 and EIF4EBP1 pS65 for WHIM16, the top EIF4EBP1-expresser in MS data. However, the expression levels in other tumors were lower, not fully recapitulating the dynamic range in MS quantification (Fig. 7E, 7F). Of note, the samples were ordered from left to right (WHIM9 to WHIM2) by their pair scores of MAPK3: EIF4EBP1 pS65, an experimentally validated regulated pairs (60) also identified in our study. However, although their phosphosite levels do distinctly segregate into the highly expressed and poorly expressed groups, our results also demonstrate that EIF4EBP1 pS65 is likely regulated by other contextual factors in vivo.
We then applied a targeted proteomics approach SRM to validate cis-regulated pairs found in EGFR in 12 independent primary human breast tumor samples (Methods, Fig. 7G, 7H). For both phosphosites EGFR pT693 and pS1064, we observed positive associations. EGFR pT693 is significantly associated with EGFR proteins (p ϭ 5.74E-04); EGFR pS1064 was not associated but we only obtained EGFR pS1064 quantifications in 5 samples. EGFR pT693 maps to the juxtamembrane region of the protein (61) and was found to be upregulated in mutant PIK3CA knockin cells (62). Overall, the validation conducted in separate sample sets using different technologies validate the association we identified for ERBB2, EIF4EBP1, and EGFR. DISCUSSION We present a systematic discovery of quantitatively correlated kinase-substrate pairs in breast cancer (Fig. 1D). The high-throughput dataset generated by LC-MS/MS enabled global assessment of 33,239 phosphosites, 19,521 of which were not observed in two of the most comprehensive phospho- site databases, UniProt and Phospho.ELM (Fig. 1A, 1B). Our analysis allowed us to identify 2134 (387 cis and 1,747 trans) kinase/substrate correlation relationships; using the same bioinformatics pipeline, analysis based on RPPA-detected phosphosites only found 4 (supplemental Table S21) and seldom interrogated more than one phosphosite on a protein, further stressing the need of evaluating interactions through global phosphoproteomics in patient samples. This result clearly suggests that more regulated phosphosites in cancer will emerge with additional, even larger mass spectrometry-based proteome studies. Although our analyses have enlarged the catalogue of phospho-regulations, larger sample sizes will be required to fully establish findings and to discover weaker associations. Specifically, we identified many of cis/trans correlations in larger sets above 70 samples with concurrently observed kinase-substrate data (supplemental Fig. S2). Further, our current datasets have limited observations in multiple kinases and substrates and even more sensitive approaches will be required to assess their phosphorylation states. For example, this serine-rich dataset may also be complemented by observations using other techniques enriching for tyrosine residues (63)(64)(65), which may enhance findings of auto-phosphorylated tyrosines in receptor tyrosine kinases, including EGFR and HER2. Increased sample sizes will also enable more complex modeling of the multi-to-multi kinase/substrate relationships not fully accounted for in our current work.
We identified 61.4% (387/630) phosphosites of kinases showing significant cis-association, many of which were concentrated in known or in nominated breast cancer proteins, such as ERBB2, RRPS6KA4, NEK9, RIPK2, and PAK1 (Fig. 2). In contrast, only 4.51% (1747/38,710) trans kinase-substrate pairs showed significant association (Fig. 3). For example, PRKACA has only 23 out of 1,652 corresponding substrate sites being regulated and PRKCZ only has 1 out of 542. It is possible that trans substrate usage may be highly tissue-specific and that some of the previously curated pairs do not interact in breast cancer. Another possibility is that some kinase-substrate pairs established through in vitro evidence are not relevant in physiological environments, although the validation rate for in vivo and in vitro pairs do not differ significantly (supplemental Table S22). Future investigation using data across tissue and cancer types would be pivotal in addressing whether we ob-serve tissue-specific usage of kinase-substrate pairs. Such studies could also reveal the consistently high/low pairs (Fig.  1D) in each cancer type and highlight cancer-specific signaling. In addition to analyses of kinase-substrate pairs activated in specific breast cancer subtypes (Fig. 2, 3,6), expanded cohorts should also enable discovery of pairs associated with specific drivers or histological features.
Our approach validates and complements experimentally or binding motif-predicted phosphosites regulated by kinase (Fig. 1E). Although the regulated sites we discovered are based on observation using primary tumors, their casual relationships can be further validated by experimental perturbation or time-series analyses. Methodologies unifying diverse predictions and observations will be required to understand how each phosphosite is regulated.
The kinase-substrate pairs we discovered also complement previous studies focusing on singleton drivers of cancer. Conventionally, pathways were mostly constructed by linking single candidate driver genes (such as significantly mutated genes or focally amplified genes) through known interactions. Our approach detects the association of the signaling pairs in vivo, and thus directly validates the signaling impact of driver events. This approach also enabled us to build relevant subcascades stemming from potentially druggable kinases AKT1, BRAF, and EGFR (Fig. 6). To compare with other network-generating studies, we also constructed a network of all observed regulations (supplemental Fig. S12). However, such approaches may obfuscate activated subnetworks, as downstream phosphorylation targets could be mediated by multiple kinases.
This first large-scale examination of over 33,000 phosphosites in breast cancers creates a foundation for druggable analysis of kinase-substrate pairs beyond singleton druggable events (Fig. 5). Predictive value of response to targeted treatment has been limited in samples for some clearly-defined driver events in cancer. For example, PIK3CA mutation status cannot conclusively predict treatment response to PI3K inhibitor in clinical studies (66). Co-occurrence of downstream activating events, as we have observed for AKT1, BRAF, ERBB2, IGF1R, and RAF1 (Fig. 5B), may further support targeted inhibition. In both breast cancer PDXs showing ERBB2: ERBB2 pS1151 cis outlier pairs identified using both global proteomics and Western blotting (supplemental Fig. S9), lapa- FIG. 7. Clinical association of kinase-substrate pairs. A, Volcano plot showing association of kinase-substrate pairs with pathological stage. Positive coefficient denotes higher kinase-substrate scores associating with more advanced pathological stage. B, Volcano plot showing association of kinase-substrate pairs with survival. Hazard rate ratios greater than 1 denote higher kinase-substrate scores associating with poor survival. C, Volcano plot showing association of kinase-substrate pairs with transcriptome-based immune signature score, as calculated by the ESTIMATE algorithm. Positive coefficients denote higher kinase-substrate scores associating with higher immune scores. The color of each pair indicates whether its kinase or substrate belongs to the immune gene list used by ESTIMATE. D, Top kinase-substrate pairs (p Ͻ 1e-6) associated with immune scores where each dot indicates one breast cancer sample. E, Western blotting of ERBB2 (HER2), ERBB2 s1151 (p.HER2 S1151), EIF4EBP1 (4E-BP1), and EIF4EBP1 s65 (pEIF4E-BP1 s65) and actin control in 12 breast cancer PDX samples. F, Correlation between protein/ phosphoprotein expression measured through Western blotting versus high-throughput proteomics. Each dot indicates one sample in one experimental replicate. The dots are colored by the samples listed in (E). Four replicates were quantified through Western blotting for statistical analyses. G, Correlation between EGFR protein levels and p.T693 phosphosite level as measured through selected reaction monitoring (SRM). H, Correlation between EGFR protein levels and p.S1064 phosphosite level as measured through SRM. For (G) and (H), each data point represent samples with quantifications in a cohort of 12 independent human breast cancer samples. tinib treatment significantly reduced tumor growth (15). Further, we previously observed varied response to PI3K inhibition therapy in 3 of the breast cancer PDX models, where both WHIM18 and WHIM20 responded to PI3K/mTOR inhibitors and WHIM16 was resistant (15). WHIM18 and WHIM20 also showed notably high pair scores of the cis pair AKT1:AKT1 pS126S129 than WHIM16 (supplemental Fig. S9). Larger data sets using tumors undergoing treatments, such as the ongoing PDXNet projects establishing and investigating thousands of PDXs, will be instrumental for identifying treatment responses and formally testing the hypothesis that activated signaling pairs can accurately predict response to inhibition.
Further, we identified outlier kinase-substrate pairs in samples without singleton events for kinases including EGFR, MAPK3, and MAPK14 (Fig. 5B). Inhibition of the MAP2K1/2 (MEK1/2) upstream of MAP kinases surpressed the MAPK signaling pathway and its combinatory treatment with RTK inhibitors have resulted in tumor regression of triple-negative breast tumors (8). Our discovery of MAPK mediated pairs reveals therapeutic opportunities. Finally, resistance mechanisms often consist of rewiring of signaling pathways and could be further explored through high-throughput proteomics and approaches developed in this study.
Signaling networks are crucially important in cancer. However, large-scale omic studies to date have mainly focused on singling-out individual driver events and rarely investigated their signaling impact. Studying kinase-substrate relations in vivo, and most particularly in tumor samples from patients undergoing therapy, will uncover the wiring of signaling networks in each tumor and likely improve treatment design.