Phosphotyrosine Signaling Proteins that Drive Oncogenesis Tend to be Highly Interconnected*

Mutation and overexpression of receptor tyrosine kinases or the proteins they regulate serve as oncogenic drivers in diverse cancers. To better understand receptor tyrosine kinase signaling and its link to oncogenesis, we used protein microarrays to systematically and quantitatively measure interactions between virtually every SH2 or PTB domain encoded in the human genome and all known sites of tyrosine phosphorylation on 40 receptor tyrosine kinases and on most of the SH2 and PTB domain-containing adaptor proteins. We found that adaptor proteins, like RTKs, have many high affinity bindings sites for other adaptor proteins. In addition, proteins that drive cancer, including both receptors and adaptor proteins, tend to be much more highly interconnected via networks of SH2 and PTB domain-mediated interactions than nononcogenic proteins. Our results suggest that network topological properties such as connectivity can be used to prioritize new drug targets in this well-studied family of signaling proteins.

Receptor Tyrosine Kinase (RTK) 1 signaling networks evolved in Metazoans to process extracellular cues and elicit cellular responses such as differentiation, proliferation or migration. Canonical RTK signaling is initiated when a ligand binds to the extracellular domain of its cognate receptor, inducing receptor dimerization, activation of the intracellular tyrosine kinase domain, and auto-phosphorylation of intracellular tyrosine residues (1). These phosphotyrosine (pTyr) residues then serve as recruitment sites for one or more of the ϳ120 SH2 domains (2) and 44 PTB domains (3) encoded in the human genome. Upon recruitment, many adaptor proteins bearing SH2 and PTB domains themselves become phospho-rylated on tyrosine residues by active receptor or cytosolic tyrosine kinases. This second set of phosphorylation events leads to the recruitment of additional SH2 and PTB domaincontaining proteins (1). For example, the binding of GRB2 to pTyr427 of SHC1 (or pTyr317 in the p52 SHC1 isoform) induces activation of the RAS/MAPK kinase cascade (4). Similarly, binding of phosphatidylinositol 3-kinase (PI3KR1, p85) to pTyr612 of IRS1 initiates signaling through the PI3K/AKT cascade in cells exposed to insulin or insulin-like growth factors (5).
RTKs share a similar set of interactors and activate many of the same response pathways, including RAS/MAPK and PI3K/AKT cascades (1). They do not, however, necessarily elicit the same phenotypes. For example, both EGFR and NTRK1 induce MAPK signaling in PC12 cells, but EGFR triggers proliferation whereas NTRK1 promotes differentiation. Both of these phenotypes are dependent on ERK activity (6). This phenomenon is also observed clinically, where only a subset of RTKs have been shown to drive cancer despite sharing many downstream pathways (7). Current qualitative representations of signaling networks as linear cascades are inadequate to explain the diverse phenotypes that arise downstream of different RTKs.
Aberrant signal processing by RTK networks has been causally linked to cancer development, maintenance, and progression in many human tissues. Well-studied examples include overexpression of ERBB2 in breast cancer (8), KIT in testicular germ cell tumors (9), and MET in gastric cancer (10). Constitutive activating mutations of RTKs, such as those observed in the RET kinase (11,12) in multiple endocrine neoplasia type 2 or KIT in gastrointestinal stromal tumors (13) are also capable of driving oncogenesis. Similarly, mutation or overexpression of SH2 domain-containing cytosolic proteins such as ABL kinase (14) or the PI3K p85 regulatory subunit (15) can also drive cancer, in these cases by inducing constitutive enzymatic activity that is decoupled from upstream signaling events.
Recently, Barabá si and colleagues advanced a mathematical argument that network driver nodes, the nodes that control information flow in a network, should not be highly interconnected (16). If this notion is extended to cancer, in which signaling networks are substantially altered or rewired, we would expect that proteins driving oncogenesis would not be highly interconnected. We sought to determine experimentally if there is indeed a link between network connectivity and the propensity of a protein to drive cancer. In making this determination, we cannot rely solely on literature-derived interaction networks (17,18) as they are confounded by study bias (19,20). Specifically, oncogenic proteins are more intensively studied than nononcogenic proteins, potentially resulting in a bias in terms of number of binding partners. As a means to collect systematic pTyr-mediated interaction data, in vivo methods like the yeast two-hybrid system are not suitable as they do not allow for control over post-translational modification events (21). Systematic co-immunoprecipitation coupled with mass spectrometry is also problematic, as many interactions mediated by tyrosine phosphorylation are transient, with half lives on the order of seconds, and any particular cell type expresses only a subset of the proteome (22). Previous systematic research on the binding specificity of SH2/PTB domains has therefore been performed using phosphorylated peptides and in vitro binding assays. For example, pioneering studies of this type involving peptide libraries uncovered consensus binding motifs for a variety of SH2 and PTB domains (23)(24)(25). It is now apparent, however, that these motifs are simplified views of in vivo selectivity and that specificity is also defined by "anti-motifs" representing excluded contacts for SH2 domains (26), SH3 domains (27), and probably other modular interaction domains as well. As a consequence, it is very difficult to predict accurately whether an SH2 or PTB domain will bind to a particular sequence known to be phosphorylated in vivo.
To overcome these limitations, it is necessary to test interactions between binding domains and peptides bearing physiological sequences one at a time and in a noncompetitive setting. To perform such assays in an unbiased, high-throughput, and quantitative manner, we used protein domain microarrays (28). In brief, 134 purified recombinant SH2 and PTB domains were printed as microarrays in individual wells of 96-well microtiter plates. The arrays were then probed with fluorescently labeled phosphopeptides derived from known sites of tyrosine phosphorylation on human proteins. By probing the arrays with eight different concentrations of each peptide, full saturation binding curves were obtained, providing an estimate of the equilibrium dissociation constant for each biochemical interaction. Our previous work using this assay focused on nine RTKs: FGFR1, IGF1R, MET, NTRK2, PDGFR␤ (29) and the four ErbB receptors (3,30). In these studies, we found that the arrays correctly identify most previously reported interactions (3). In addition, they invariably highlight new interactions, many of which we validated biologically: they occur in nonengineered cells and play important roles in signal transduction (3,30), (31)(32)(33). We now use this approach to systematically quantify, on a nearly proteome-wide level, interactions between SH2 or PTB domains and known sites of tyrosine phosphorylation on both human RTKs and the adaptor proteins themselves. We find a very high degree of connectivity that challenges conventional, linear views of receptor-adapter interaction. Moreover, RTKs and adaptor proteins that have been shown to play a causal role in cancer tend to mediate substantially more interactions than those that do not. This suggests that these connectivity profiles may provide insight into how networks are rewired and may help prioritize new targets for anti-cancer drug discovery.

EXPERIMENTAL PROCEDURES
A detailed protocol for preparing protein domain microarrays has been described previously (28).
Determining Interaction Affinity-Following peptide binding, arrays were scanned at multiple PMT voltages on a Tecan LS400 microarray scanner on both Cy5 and Cy3 channels. Spots with saturated pixels were eliminated. The remaining spots were fit to a line that allows for the conversion of all the Cy3 values measured at different PMT voltages to the same scale. The fold-over-background value of a titration was determined by taking the trimmed mean of the Cy3 values for each domain-peptide titration divided by the Cy3 values of the Thioredoxin control spots. The mean Cy3 value of the Thioredoxin control spots was then subtracted from the Cy3 value of the domain spot and the spots were normalized by the Cy5 value.
Each domain was printed in quadruplicate in each well and the arrays were probed using eight different concentrations of each peptide, ranging from 5 M down to 10 nM. These 32 data points were then fit to the following equation using MATLAB ® 's robust fit function with bisquare weights: where F obs is the mean fluorescence of replicate spots, F max is the fluorescence at saturation, [Peptide] is the total concentration of phosphopeptide, and K D is the equilibrium dissociation constant. Robust fitting procedures are more resistant to the presence of outlier data resulting from missed spots, fluorescent debris or other aberrations (34). Only titrations with fold-over-background values in the top 10% of the data and R 2 values over 0.9 were kept. Replicate affinities were then averaged in log space. The MATLAB ® code used to perform the analysis is supplied in supplementary information.
Determining Oncogene Status-Oncogene status for RTKs was determined using the Sanger Institute Cancer Gene Census (7,35). The Sanger list contains only those genes for which a strong causal link to cancer has been established. Because many SH2/PTB domain-containing proteins have no catalytic activity of their own, they serve principally to amplify and propagate signals from upstream proteins. For this reason, adaptors were assigned as having a role in cancer by their presence in a union of the Sanger list and other cancer gene lists (36 -38) compiled by the Bushman lab (http://microb230.med.upenn.edu/links/genelist).
Statistical Tests for Enrichment of Binding-To test against the null hypothesis that the distribution of connectivity is equivalent for both Oncogene and non-Oncogene classes, we performed the nonparametric Mann-Whitney U test in MATLAB ® R2011a (The MathWorks, Inc., Natick, MA). Only those interactions with dissociation constants below 1 M were considered.
RTK and Adaptor Expression in Tumor Samples-cBio Cancer Genomics Data Server Matlab Toolbox v1.04 was used to retrieve mRNA sequencing data corresponding to 1437 tumors from breast cancer, colon, and rectal adenocarcinoma (39) and clear cell kidney cancer collected by The Cancer Genome Atlas Network. We used a 1 read per kilobase of exon model per million mapped reads (RPKM) threshold to separate genes which are expressed. This threshold has been shown to correspond to 1 transcript per C2C12 cell (40) and optimizes overlap with mass spectrometry results in HeLa cells (41). We consider this to be an upper bound estimate to the total number of potential interaction partners.

RESULTS AND DISCUSSION
Data Collection-We started by compiling a list of known sites of tyrosine phosphorylation on human RTKs and on all SH2 and PTB domain-containing proteins listed in the Phos-phoSitePlus database (42). We restricted our studies to experimentally verified sites of tyrosine phosphorylation as nonphysiologically relevant sites, when artificially phosphorylated, may also bind SH2/PTB domains (43) and thereby confound any systems level conclusions drawn from the resulting data. We focused on 40 of the 53 Uniprot-annotated RTKs that had more than three phosphorylation sites to include only those receptors whose biology is sufficiently understood to enable systematic comparison. This resulted in a total of 729 unique phosphopeptides that we successfully synthesized, fluorescently labeled, and purified using high performance liquid chromatography (supplemental Table S1). We then used each of the peptides to probe microarrays comprising virtually every human SH2 and PTB domain (supplemental Table S2). The arrays were probed at eight concentrations of each peptide, ranging from 10 nM to 5 M (see, for example, Fig. 1A). By repeating this process for each phosphopeptide we were able to generate a quantitative interaction map for the receptor (Fig. 1B). This process was repeated for all of the remaining RTKs to generate, for the first time, a global, systematic, and unbiased view of RTK recruitment (Fig. 1C). Larger versions of the connectivity diagrams for each RTK are available in supplemental Fig. S1. Only interactions with K D Ͻ 1 M are included in these diagrams. SH2 domainmediated interactions of lower affinity have been shown to be physiologically relevant intramolecularly, but it is generally accepted that most bona fide intermolecular interactions exhibit dissociation constants below 1 M (44 -46).
As with any assay, one invariably encounters false positives and false negatives. With protein domain microarrays, the most frequent source of false positives is nonspecific binding between peptide and domain, whereas false negatives likely arise from low surface activity of the domains, presumably because some domains denature on the slide surface or are preferentially immobilized in a way that blocks access to the binding site. To estimate the stochastic false positive rate, we conservatively assumed that domains having the lowest frequency of binding were actually inactive in our assay and exhibited no true-positive interactions (any interactions identified for these domains are by definition false positives). We observed that domains in the bottom 20% of the binding frequency spectrum accounted for 20 interactions out of the 37,490 titrations performed on the domains, corresponding to a stochastic false positive rate of 5.33 ϫ 10 Ϫ4 . If we extend this rate to all the titrations performed (195,546), we estimate that 104 of the 2808 interactions we identified are liable to be stochastic false positives (false discovery rate of 3.72%). As there is no reason to believe that false positive errors are observed preferentially with phosphopeptides derived from cancer-causing proteins, the primary conclusions of this study are not affected.
Another potential limitation of our approach is that the current list of physiological sites of tyrosine phosphorylation in PhosphoSitePlus may be incomplete. This would be particularly problematic if nononcogenic RTKs were less well annotated than oncogenic ones. To investigate this possibility, we used phosphotyrosine-directed mass spectrometry (30) to study six nononcogenic members of the Ephrin class of RTKs -EphA2, EphA3, EphA4, EphB2, EphB3, and EphB4. We overexpressed each receptor in HEK293T cells, a procedure that induced receptor auto-activation and phosphorylation in all six cases. Receptors were then immunoprecipitated using an anti-pTyr antibody and subjected to targeted and untargeted LC-MS/MS. Using this approach, we were able to identify 32 out of the 38 known sites of intracellular tyrosine phosphorylation (84% sensitivity). Remarkably, we did not identify any sites of tyrosine phosphorylation beyond those already reported in the PhosphoSitePlus database (supplemental Table S4). This suggests that the many high throughput, pTyr-directed mass spectrometric studies that have been used to populate PhosphoSitePlus are not biased against nononcogenic receptors and that the existing list of tyrosine phosphorylation events, at least on RTKs, is nearly complete.
Oncogenic RTKs are Highly Connected-By examining the connectivity profile of the 40 RTKs at various affinity thresholds ( Fig. 2A), we sought to identify whether a link exists between connectivity and oncogenicity. We determined whether or not an RTK is an oncogene based on its inclusion in the Sanger Institute's Cancer Gene Census (7) (http://www. sanger.ac.uk/genetics/CGP/Census/), which seeks to determine a strict causal (and not merely correlative) relationship between cancer development and mutation and/or overexpression of a given gene. Based on these assignments, we found that oncogenic RTKs have a significantly higher median connectivity in our interaction data set than nononcogenic RTKs. At an affinity threshold of 1 M, for example, the median number of binding partners is 21 for nononcogenic RTKs and 56 for oncogenic receptors, corresponding to a ϳ2.5-fold difference in the number of interactions (Mann-Whitney U test p ϭ 2.77 ϫ 10 Ϫ5 ). Nononcogenic RTKs also have a median of six phosphorylation sites, whereas oncogenic RTKs have nine phosphorylation sites (Mann-Whitney U test p ϭ 0.007), corresponding to a 50% increase. This indicates that the primary reason for the increased connectivity of oncogenic RTKs is the presence of more promiscuous pTyr docking sites and only secondarily an increase in the number of docking sites. For instance, the nononcogenic receptor CSF1R, although it has twice as many phosphorylation sites as NTRK3 (8 versus 4), has far fewer high-affinity (K D Ͻ1 M) binding partners (6 versus 58). In addition, the RTK with the greatest number of phosphorylation sites in our study is the nononcogenic ERBB4 with 16. ERBB4 has only 18 binding partners, however, which places it in the bottom 25% of the RTKs we studied.
Many interactions annotated here may be of too low affinity to bind at appreciable levels to a receptor that is present at low surface density or at a low level of activation. When a cell becomes malignant, however, gene amplification or overexpression may make phosphorylated receptors (or adaptors) sufficiently abundant that low affinity interactions are enabled and downstream signaling activated. This notion is supported by the observation, in PC12 cells, that EGFR mediates proliferation when present at normal levels but differentiation when overexpressed (6). These phenomena cannot be captured in qualitative diagrams of signaling and highlight the need to think of these interactions as being contextually conditional and existing on a quantitative spectrum without a single fixed threshold. Adaptor-Adaptor Binding-Focusing only on the initial recruitment of adaptors to RTKs provides an incomplete view of the full complexity of early RTK signaling. Adaptors can themselves become tyrosine phosphorylated and interact with each other, forming multi-protein complexes at the receptor. By systematically probing our SH2/PTB domain microarrays with phosphopeptides derived from 334 known pTyr sites on adaptor proteins, we identified a dense network of interactions among the adaptor proteins themselves (Fig. 3A). For example, the well-studied tyrosine kinase ABL1, which is responsible for driving the development of chronic myeloid leukemia when fused with the protein BCR, is currently annotated to mediate 15 interactions through its SH2 domain (47). In our study, we identified 53 biochemical binding partners, including known interactions with DAB1, ERBB2, PTK2, and SHC1 (47). Our data may also resolve some outstanding questions in the field. For example, it has previously been reported that the Vav1 SH2 domain binds to a phosphorylation site on BCR/ABL and that this interaction is critical for activation of Rac-1 and BCR/ABL-mediated leukemogenesis. Previous attempts to identify the site of binding by constructing a series of Tyr to Phe mutants, focusing in particular on the Vav1 consensus binding motif, were unsuccessful (48). Our data reveal an interaction between Vav1 and pTyr917 of BCR/ ABL (Uniprot id: A9UF07), which was not included in the earlier mutagenesis experiments.
RTK-mediated activation of the MAPK pathway is well studied (49,50), but we nevertheless find many new interactions that impinge on this pathway. RTKs vary in their ability to recruit early signaling proteins in this pathway, such as SHC1, GRB2, and PTPN11. Here, we find that these adaptors have the potential to interact with many other signaling proteins as FIG. 3. Adaptors are densely interconnected. A, Adaptor proteins displayed in a circle with the size of the band corresponding to protein length (64). Lines begin at sites of tyrosine phosphorylation and terminate in either a yellow triangle corresponding to a PTB domain or a blue triangle corresponding to an SH2 domain and are color coded according to affinity. B, Visualization of the binding partners of ABL1 SH2 domain (in-links) and phosphorylation sites of ABL1 (out-links). C, RTK activation of the MAPK cascade is a varied process that integrates binding of GRB2 to SHC1 and SHP2 as well as to the RTK. D,The combinatorial complexity of RTK activation of MAPK is liable to be even greater as GRB2, SHC1 and PTPN11 (SHP2) are also capable of binding other adaptors, as can be seen by the diagrams, as well as other RTKs (not visualized).
well, increasing the complexity of MAPK pathway activation (Fig. 3D). This view aligns well with the idea that RTKs function not as discrete molecular machines, as typically depicted, but as pleomorphic ensembles that are highly complex and dynamic with many rapidly interchanging states (22,51).
Oncogenic Adaptors are Highly Connected-To probe the relationship between connectivity and oncogenicity in the adaptor-adaptor layer of binding, we organized domains by sequence similarity (Fig. 4A) (52). The blue bars indicate how many peptides bind to each domain (in-links) and the green bars show how many binding partners interact with phosphorylation sites on the corresponding protein (out-links). Simple inspection reveals a wide range of binding promiscuity within the SH2 and PTB domain families. For example, proteins in the SRC, HCK, and PI3K N-terminal families of domains have relatively high numbers of binding partners and are frequently annotated as oncogenes. When we sort the domains by the number of in-links and out-links, from least to most connected, it is clear that oncogenic proteins (red) are preferentially enriched among proteins with high in-connectivity (median 20 versus 8 in-links, Mann-Whitney U test p ϭ 3.4 ϫ 10 Ϫ3 ), high out-connectivity (median 9 versus 1 out-links, Mann-Whitney U test p ϭ 1.0 ϫ 10 Ϫ4 ), or both (Fig. 4B). Because the adaptors are likely to amplify signals downstream of RTKs, especially those without intrinsic catalytic activity themselves, we used an expanded definition of oncogene that incorporates additional cancer gene lists as well as the Sanger list (see Methods).
The quantitative interaction data generated in this study (supplemental Table S3) should prove generally useful in the fields of network modeling (53) and interaction prediction (54). With the caveat that our data necessarily include false positives and false negatives, we also hope that the novel interactions identified in these studies will aid efforts to uncover signaling pathways downstream of RTKs. To facilitate the use of our data, the supplementary information for this paper contains the annotated interaction list (supplemental Table  S3) and raw microarray image intensities as well as the MATLABா scripts used to fit the titration curves, perform statistical analyses, and generate figures (Supplementary Code).
Our work shows that the genes whose mutation or overexpression drives cancer tend to be highly connected hubs. Moreover, the most highly connected proteins in our networks are the primary targets of new anticancer drugs (e.g. erlotonib for EGFR (55), lapatanib for ERBB2 (56), vandetanib for RET (57), sunitinib for KIT (58), and imatanib for ABL1 (14)). Analogously, viral proteins preferentially target hub proteins during infection (59,60). This combined evidence suggests that disease states arise preferentially from the perturbation of network hubs and drugs should target these same hub proteins. FIG. 4. Oncogenic adaptors are more highly promiscuous. A, Phylogeny of adaptors shows that genetically related domains share similar binding profiles. For example the SRC class (SRC, YES1, FGR) of tyrosine kinases all show a large number of in-links. This can be because of sequence determinants of selectivity as well as the bias in the phosphoproteome to sequences that interact with the domain family. B, Adaptors sorted according to the number of peptides with which they interact (in-links; the x axis) and the number of adaptors that interact with them (out-links; y axis). Oncogenic adaptors have both higher median in-links (20 versus 8, Mann-Whitney U test p ϭ 3.4 ϫ 10 Ϫ3 ) and out-links (9 versus 1, Mann-Whitney U test p ϭ 1.0 ϫ 10 Ϫ4 ) than nononcogenic adaptors. SH 2 domains bind a higher median number of peptides (15.5 versus 2, Mann-Whitney U test p ϭ 1.7 ϫ 10 Ϫ6 ) but share a similar number of out-links (Mann-Whitney U Test p Ͼ 0.01).
In conclusion, we have generated a systematic map covering a substantial fraction of the potential interactions between SH2 or PTB domains and sites of tyrosine phosphorylation on RTKs and adaptor proteins. These interactions are very poorly represented in existing unbiased human interactomes (61,62) despite extensive evidence they play essential roles in signal transduction. We observe a high degree of connectivity among RTKs and adaptor proteins, and among adaptor proteins themselves. This is in contrast to the usual depiction of receptor-proximal signaling as a series of linear pathways connecting sites of tyrosine phosphorylation on RTKs to a few adaptor proteins and then to a few core signaling molecules such as ERK and AKT. Of course, the actual complexity of the resulting network in any particular cell type depends not only on the affinities of the interactions (as determined here) but also on the relative abundance of RTKs and SH2/PTB-containing proteins. mRNA sequencing data of 1430 diverse tumor types collected by the TCGA consortium shows that they express a median of 78% of the phosphotyrosine signaling proteins analyzed in this study. In fact, over half of these proteins are expressed ubiquitously (in Ͼ95% of the tumor samples; supplemental Fig. S2). Thus, it is highly likely that receptor and adaptor proteins combine to form a highly interconnected mesh with the potential to perform complex signal processing functions.