Receptor Tyrosine Kinase (RTK) Mediated Tyrosine Phosphor-Proteome from Drosophila S2 (ErbB1) Cells Reveals Novel Signaling Networks

Protein phosphorylation mediates many critical cellular responses and is essential for many biological functions during development. About one-third of cellular proteins are phosphorylated, representing the phosphor-proteome, and phosphorylation can alter a protein's function, activity, localization and stability. Tyrosine phosphorylation events mediated by aberrant activation of Receptor Tyrosine Kinase (RTK) pathways have been proven to be involved in the development of several diseases including cancer. To understand the systems biology of RTK activation, we have developed a phosphor-proteome focused on tyrosine phosphorylation events under insulin and EGF signaling pathways using the PhosphoScan® technique coupled with high-throughput mass spectrometry analysis. Comparative proteomic analyses of all these tyrosine phosphorylation events revealed that around 70% of these pY events are conserved in human orthologs and paralogs. A careful analysis of published in vivo tyrosine phosphorylation events from literature and patents revealed that around 38% of pY events from Drosophila proteins conserved on 185 human proteins are confirmed in vivo tyrosine phosphorylation events. Hence the data are validated partially based on available reports, and the credibility of the remaining 62% of novel conserved sites that are unpublished so far is very high but requires further follow-up studies. The novel pY events found in this study that are conserved on human proteins could potentially lead to the discovery of drug targets and biomarkers for the detection of various cancers and neurodegenerative diseases.


Introduction
Unraveling signaling networks from the perspective of understanding systems biology has been the most popular approach to set up an effective platform to identify sensitive cell signaling nodes leading to novel drug targets [1]. High-throughput mass spectrometry approaches along with improved techniques such as SILAC for quantitative proteomics have provided the building blocks of the current knowledge base for this new grammar of drug discovery [2]. About 60% of Drosophila proteins have human homologues with well-conserved canonical signaling cascades. Because Drosophila is a less complex model system than a vertebrate, it gives an opportunity to analyze complex signaling networks and translate the findings to identify novel drug targets for human diseases. Datasets from model systems with conserved canonical signaling pathways (such as Drosophila) play an important part in rapidly generating a knowledge base.
Aberrant activation of RTK pathways has been shown to be involved in the development of various types of cancers [3][4][5][6]. Recent therapeutic approaches have involved the development of drugs in the form of small molecules or monoclonal antibodies that block or control activation of tyrosine phosphorylation events on specific proteins to control the progression of cancer; some of these are available currently in the market [7], [8].
The technically challenging nature of tyrosine phosphorylation modifications is mainly attributed to: 1) occurrence of tyrosine phosphorylation modifications on very low-abundance proteins, 2) lower relative abundance of tyrosine phosphorylations compared to serine and threonine phosphorylations, 3) very low stoichiometry and 4) labile nature of pY events during various chemical manipulations as required for mass spectrometry analysis [2]. Unlike serine and threonine phosphorylation modifications, the rules of consensus do not work well with tyrosine phosphorylation, and programs based on algorithms to predict tyrosine phosphorylation have not matched experimental outcomes. Hence a comprehensive high-throughput effort focused on generating tyrosine phosphorylation profiles will add to the knowledge base used to construct robust algorithms based on large datasets.
Here we report a phosphor-proteome from Drosophila exclusively focused on tyrosine phosphorylation events under insulin and EGF signaling pathways. We also present the salient features of the Drosophila proteome architecture and the comparative proteomic analysis for conserved tyrosine phosphorylation events on human proteins.

Phosphopeptide profiles from Drosophila S2 (EGFR) cells
High-throughput mass spectrometric analysis using the PhosphoScan® technique [9] was carried out on lysates after independent stimulation of the cloned human EGF receptor (ErbB1) and the endogenous insulin-like receptor (InR) in S2 cells. It yielded a spectrum of 658 tyrosine-phosphorylated peptides. Of these 658 phosphopeptides, 511 were nonredundant and comprised 543 individual tyrosine phosphorylation (pY) events on 290 different proteins spanning the entire Drosophila proteome, with 1-14 sites per protein. The time points of RTK activation (0, 2, 8 and 12 minutes) used for mass spectrometry analysis represent the dynamics of D-ERK activation upon RTK stimulation. RTK-mediated activation of D-ERK in these S2 (EGFR) cells is initiated as early as 2 minutes and decreases after 12 minutes.
Activation of the endogenous insulin receptor yielded 63 peptides with 70 pY events on 38 different proteins, whereas activation of over-expressed ErbB1 in S2 cells yielded 283 phosphopeptides containing 325 pY events on 177 proteins. About 20% (116/511) of phosphopeptides, containing 146 pY events on 79 proteins, were also found in serum-starved samples that received no growth factor (GF) treatment (Figure 1).
We found 6 pY events on Drosophila Insulin-like receptor (InR) out of which three are conserved in the human insulin receptor and IGF-1R. We identified three phosphorylation sites (Y1110, Y1172 and Y1192) on the cloned ErbB1 receptor in the S2 cell line used in this study.

Architecture of RTK-mediated Tyrosine Phosphor-proteome
Classification of phosphoproteins revealed various functional categories such as kinases, signaling adaptors, actin binders, microtubule regulators, cell-adhesion molecules, GEFs/GAPs, ubiquitin modifiers, transcriptional and translational regulators, intracellular transport factors, endocytotic epsins, molecular chaperones and proteins involved in various biosynthetic pathways including carbohydrate metabolism. A detailed classification of Drosophila tyrosine phosphopeptides into 16 different categories is given in Table S1 along with information about activated RTK (InR or ErbB1) corresponding to each peptide.
The phosphopeptide profiles obtained upon activation of insulin and EGF RTKs gave the largest spectrum of pY events, representing all major signaling pathways (MAPK pathway, PI3Kinase pathway, STAT pathway and PLCγ pathway) under RTK activation responsible for cell growth, proliferation, survival and differentiation.
About one-fourth of the phosphoproteins (74/290) found in this study have no known molecular or biological function, but about half of these have motifs suggestive of their molecular function.
In the context of Drosophila development, phosphoproteins found in this study are involved in early embryogenesis, cellularization, early and late blastulation, gastrulation, patterning and cell migration. Proteins involved in major organogenesis pathways such as heart and muscle development, tubulogenesis (tracheal and myotube development), dorsal vessel, CNS development and reproductive system, and in down-regulation of RTKs (receptor endocytosis pathway), are also phosphorylated. The organ systems, cellular processes and signaling pathways represented by the tyrosine phosphor-proteome are illustrated in detail in Figure 2.
In the category of RTKs other than insulin and EGFR, the product of the pvr gene (PDGF and VEGF receptor homologue) is a major protein, with 13 pY events. Classification of pY events based on cellular biological processes yielded four major categories: 1) signaling proteins in CNS development (101 pY events on 44 proteins), 2) dorsal closure (65 pY events on 11 proteins), 3) cell polarity (29 pY events on 19 proteins) and 4) Ras-interacting proteins that included REF and GEFs (22 pY events on 18 proteins).

Comparative analysis of Drosophila phosphoproteome against human protein database
Using InParanoid (SBC) and HomoloGene (NCBI) browsers and the eukaryotic ortholog list from Flybase, a search for human orthologs of Drosophila proteins yielded 216 protein matches. For those cases in which we could not identify any human ortholog, we conducted two different BLAST searches (BLAST2 and BLASTP) to find out the nearest human protein match. ClustalW2 (EMBL-EBI) analysis of 511 tyrosine phosphopeptides on 290 proteins was done using human protein databases (Swiss-Prot and NCBI) to uncover all the pY events conserved in respective human homologous proteins. The standard parameters (Protein Gap open penalty: 10.0; Gap Extension penalty: 0.2; Protein matrix: Gonnet; Protein ENDGAP: -1 and GAPDIST: 4) were used for the entire analysis. Out of these, 175 human protein matches had conserved phosphorylated tyrosines. Often a single Drosophila protein match contained more than one human protein with conserved tyrosine.
About 75.1% (244/325) of pY events upon ErbB1 stimulation are conserved in the human proteome compared to 65.7% (46/70) from activation of insulin receptor and 56.8% (83/146) of pY events common to both EGF and insulin stimulation. A total of 373 pY events (68.9%) in this dataset are conserved in the human proteome (Figure 1).
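The conservation percentages quoted above can be re-derived from the reported counts. The following is a minimal sketch, not the authors' analysis code; the dictionary keys and the `percent` helper are names introduced here for illustration, and pooling the three per-condition totals as the overall denominator is an assumption about how the overall figure was obtained.

```python
# Counts of pY events as reported in the text: (conserved in human, total).
# The keys are illustrative labels, not identifiers from the study.
py_counts = {
    "ErbB1": (244, 325),   # EGF receptor stimulation
    "InR": (46, 70),       # insulin receptor stimulation
    "shared": (83, 146),   # events common to both stimulations
}


def percent(conserved, total):
    """Return the conserved fraction as a percentage, rounded to one decimal."""
    return round(100.0 * conserved / total, 1)


# Per-condition conservation percentages.
per_condition = {name: percent(c, t) for name, (c, t) in py_counts.items()}

# Pooled figure: assumes the denominator is the sum of the three totals.
total_conserved = sum(c for c, _ in py_counts.values())
total_events = sum(t for _, t in py_counts.values())
overall = percent(total_conserved, total_events)

if __name__ == "__main__":
    for name, value in per_condition.items():
        print(f"{name}: {value}%")
    print(f"overall: {total_conserved}/{total_events} = {overall}%")
```

Under that pooling assumption the three conserved counts sum to the 373 events given in the text.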
The human proteins with conserved tyrosines had protein homology ranging from as high as 95% to as low as 20%. We found many tyrosines conserved on human tyrosine kinases, receptors and a variety of cytoplasmic and membrane-bound proteins involved in CNS development (especially the process of axon guidance and retinal degeneration), cell polarity and carbohydrate metabolism. We checked the literature (PubMed) and the databases PhosphoSite® (hosted by CST Inc.), UniProtKB/Swiss-Prot (http://www.expasy.org/uniprot) and PhosphoPep (http://www.phosphopep.org/) to see whether the conserved tyrosines are actual in vivo sites on human proteins. Based on the available literature, 38% of the hits found in this study have already been reported as confirmed in vivo tyrosine phosphorylation sites in human proteins. Table 1 contains a category-wise count of conserved pY events that are confirmed in vivo sites in human proteins as well as the novel sites that have not been reported so far. Phosphorylated tyrosines on Drosophila proteins can be conserved on more than one human protein with equivalent molecular function, even when those proteins are distantly related and irrespective of the extent of protein homology. For example, the tyrosines Y1545 and Y1550 on Drosophila insulin receptor (InR) are conserved in human INR, IGF-1R, RET oncogene, LTK, MUSK, ALK (anaplastic lymphoma kinase), ALK Ki-1 variant, TFG/ALK fusion kinase, NTRK2, NTRK2 and MERTK. Similarly, we find a pY event on the Ras-interacting protein encoded by roughened (R) that is conserved in H-Ras, K-Ras, R-Ras, N-Ras, RAP1A and RAP1B. A Clustal alignment of gene R with human proteins showing the conserved tyrosines is given in Figure 3.

Many signaling proteins with conserved tyrosines are involved in disease phenotypes
Tyrosine phosphorylation events conserved on proteins involved in muscular dystrophy, retinal degeneration, Alzheimer's, breast cancers, acute myelogenous leukemia (AML), aggressive melanomas and several other diseases are very interesting molecular targets warranting further validation in transgenic mouse models and tumor cell lines. In total we find 30 pY events on 25 proteins involved in various types of cancers, 8 pY events on 8 leukemia-related proteins, 34 pY events on 30 proteins involved in various genetic disorders, and 18 pY events on 14 proteins involved in various neurodegenerative diseases. A list of important proteins involved in human disease development with the location of conserved pY residues is given in Table 2, and a detailed list of all the human proteins with conserved tyrosines that are phosphorylated in the respective Drosophila proteins is given in Tables S2 and S3. A few of these conserved pY events are specific to a particular isoform of the respective protein orthologs. A small number of pY events on fly proteins reveal natural (Y-F, Y-H, Y-W and Y-D) substitutions in the respective human orthologs (Table S2).

Discussion
The tyrosine phosphopeptide profiles presented here represent the largest dataset reported in Drosophila to date. This dataset is unique because it highlights proteins activated upon activation of growth factor RTKs (endogenous insulin RTK and human EGFR). Many of the novel phosphorylation events found in this study, on proteins not previously known to be involved in RTK pathways, represent new signaling nodes that merit further validation. One of the major issues in the conventional SDS-PAGE method to identify pY modifications is the significant loss during recovery of peptides after in-gel digestion of total protein entrapped in the PAGE gel matrix. Even though this is not an issue for protein identification as such, it may pose many technical impediments to identifying post-translational modifications, especially on tyrosine (pY), which are labile during chemical processing of peptides and recovery from the gel matrix. The PhosphoScan® technique is a non-gel-based method involving direct immunoaffinity precipitation of pY peptides concentrated from whole-cell-lysate digests; it facilitates identification of pY sites on both less-abundant proteins and proteins with low stoichiometry of phosphorylation and, importantly, avoids the technical complications of peptide recovery from the gel matrix. The quality and quantity of phosphopeptides obtained in this study are far better than those obtained by methods using various IMAC columns. Evaluating various enrichment techniques for the identification of tyrosine phosphorylations, Schumacher et al [10] found that immunoaffinity precipitation is superior to the immunoaffinity chromatography method.
Peptides containing multiple acidic residues (D and E) around phosphorylated tyrosines are technically difficult for mass spectrometric analysis. Very low stoichiometry of tyrosine phosphorylation makes it almost impossible to detect many tyrosine modifications on low-abundance proteins. A typical example in our case was the identification of tyrosine phosphorylation sites on a nuclear import protein (Dim7) with a very low stoichiometry (0.004 moles/mole). High cell quantities (

Overlap of Drosophila phosphoproteome data with other HT screens
Our data show only minimal overlap with the pY sites reported in Kc cell proteome data [11]. A considerable overlap of proteins representing various phosphopeptides from our HT mass spectrometry data from insulin stimulation was seen with the candidate gene list from RNAi screen data for insulin RTK-mediated ERK activation [12]. In contrast, the EGF RTK-mediated HeLa cell phosphor-proteome data overlapped with ours at only one pY site, even though many candidate proteins with conserved tyrosines from our data are either serine or threonine phosphorylated upon EGFR activation [2]. Data from a recent report [13] on signaling networks assembled by oncogenic EGFR and c-Met pathways and from an HT mass spectrometry analysis of pervanadate-stimulated S2 cells [14] overlap considerably with our data. The RTK-mediated tyrosine phosphor-proteome network under insulin and EGF RTKs has not only validated existing tyrosine-phosphorylated signaling nodes but also reveals several novel insights into the regulation of RTK signaling and crosstalk with various other pathways (Figures 4a and 4b).

Translational value of conserved pY events in Drosophila proteome and future directions
The novel tyrosine phosphorylation events from the Drosophila proteome conserved in 100 human orthologs and paralogs constitute a valuable resource to translate the missing signaling connections and nodal points in the human proteome from the perspective of disease development. These data warrant further validation in human tumor cell lines and tissue samples to see if these pY events are up-regulated or down-regulated in GF signaling with respect to human disease phenotypes.
Recent reports indicate the importance of genes in glycolytic pathways in cancer progression [15]. Mechanisms of tumor growth based on the selective switching of cellular processes towards anabolic pathways rather than oxidative phosphorylation also stress the importance of glycolytic proteins in cancer development [16]. Our study in the fly proteome reveals that several proteins involved in glycolytic pathways are tyrosine phosphorylated and that the corresponding candidate human proteins with conserved tyrosine residues merit further study. Upregulation of monocarboxylic acid (MCA) transporters in Type-1 diabetic patients indicates the possibility of an increased capacity of the brain to use non-glucose substrates to meet energy requirements during hypoglycemia [17]. Our study reveals that several MCA transporters are tyrosine phosphorylated upon RTK activation. It will be interesting to see if the human orthologs with conserved phosphorylated tyrosines found in this study are involved in similar mechanisms in brain-cell energetics. Based on the available protein expression data, interesting candidate proteins could be selected for analysis of the dynamics of tyrosine phosphorylation with respect to disease development. Reverse-phase protein microarrays could be a very useful tool in this direction [18]. Peptide arrays containing novel conserved pY modifications could be used to probe SH2-domain-containing protein arrays to expand the signaling network with respect to a particular pathway [19].

Materials and Methods
Cell culture and growth factor stimulation
S2 (EGFR) cells were grown in 15-cm culture dishes to about 80% confluency. The cells were starved by replacing complete medium with serum-free minimal medium (15 mL per plate) for about 18-20 hours. The next day, cells were stimulated with EGF (100 ng/mL of medium) or insulin (5 µg/mL of medium) for 2, 8 and 12 minutes. A total of 10 plates, yielding approximately 2×10^8 cells, were used per stimulation. A set of 10 plates of overnight-starved cells was used as the control. After the stipulated time of growth factor treatment, the cells were lysed using lysis buffer containing 6 M urea. As per the manufacturer's directions, 10 mL of lysis buffer was used to lyse cells from a batch of ten 15-cm culture plates. The lysates from insulin- and EGF-treated samples were subjected to further processing as per the manufacturer's directions (PhosphoScan® Kit (P-Tyr-100) #7900, Cell Signaling Technology Inc., Danvers MA USA).

Peptide extraction, LC-MS/MS analysis and assignment of tyrosine phosphorylation
The following steps were carried out at the Cell Signaling Technology facility following the methods described by Rush et al [9]: digestion of total lysate with trypsin, reverse-phase solid-phase extraction of digests, immunoaffinity purification of phosphopeptides, LC-MS/MS analysis of phosphopeptides, evaluation of MS/MS spectra using the Sequest browser, assignment of phosphopeptide sequences, and review of the assigned peptide sequences using a two-step process.

Identification of human homologues and conserved tyrosines on human proteins
A combined list of human orthologs and paralogs for the respective Drosophila proteins was obtained using InParanoid (Version 6.0) hosted by SBC, HomoloGene hosted by NCBI and also using the eukaryotic ortholog list from Flybase.
Two different BLAST searches (BLASTP hosted by NCBI and BLAST2 hosted by the SIB BLAST network) were also run with the whole protein sequence of each tyrosine-phosphorylated Drosophila protein against the human protein database.
ClustalW2, a multiple sequence alignment program, was used to align each Drosophila protein sequence against all the human protein matches from the BLAST searches to identify all the conserved tyrosines that are tyrosine phosphorylated in each of the 290 Drosophila proteins. The parameters used were: Protein Gap open penalty: 10.0; Gap Extension penalty: 0.2; Protein matrix: Gonnet; Protein ENDGAP: -1 and GAPDIST: 4.
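The step of reading a conserved tyrosine off an alignment can be sketched as follows. This is a hypothetical illustration, not the procedure actually used in the study: it assumes two rows of a gapped alignment are already available as strings (gaps written as "-"), and the function names and toy sequences are invented for the example. A phosphorylated residue position in the fly protein is mapped to its alignment column, and the human residue in the same column is then inspected.

```python
def aligned_column(aligned_seq, residue_pos):
    """Return the 0-based alignment column holding the 1-based residue position."""
    count = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            count += 1
            if count == residue_pos:
                return col
    raise ValueError("residue position lies beyond the end of the sequence")


def ungapped_position(aligned_seq, col):
    """Return the 1-based residue position at a column, or None for a gap."""
    if aligned_seq[col] == "-":
        return None
    return sum(1 for ch in aligned_seq[: col + 1] if ch != "-")


def conserved_tyrosine(fly_aln, human_aln, fly_site):
    """Report the human residue aligned to a phosphorylated fly tyrosine."""
    if len(fly_aln) != len(human_aln):
        raise ValueError("aligned rows must have equal length")
    col = aligned_column(fly_aln, fly_site)
    if fly_aln[col] != "Y":
        raise ValueError("fly residue at this position is not a tyrosine")
    human_res = human_aln[col]
    return {
        "human_residue": human_res,
        "human_position": ungapped_position(human_aln, col),
        "conserved": human_res == "Y",
    }


# Toy alignment rows (invented, not real protein sequences).
FLY = "MK-TAYVDLG"
HUMAN_CONSERVED = "MKQSAYIE-G"
HUMAN_SUBSTITUTED = "MKQSAFIE-G"   # a Y-F substitution at the aligned column

if __name__ == "__main__":
    print(conserved_tyrosine(FLY, HUMAN_CONSERVED, 5))
    print(conserved_tyrosine(FLY, HUMAN_SUBSTITUTED, 5))
```

In the toy rows, fly Y5 aligns with human Y6 in the first case and with F6 in the second, which is how a natural substitution would appear in such a check.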

Supporting Information
Table S1 Overall classification of tyrosine phosphopeptides based on functional category of proteins. Summary of all the tyrosine phosphorylation sites that are specific to insulin RTK, EGF RTK and sites common to starved control cells/insulin/EGF treated cells. The conserved position of tyrosine on respective human ortholog and paralog are given for each phosphorylated site on Drosophila proteins. The Swiss-Prot protein ID for each human ortholog and paralog is also provided.