Systems-level analysis reveals multiple modulators of epithelial-mesenchymal transition and identifies DNAJB4 and CD81 as novel metastasis inducers in breast cancer

Epithelial-mesenchymal transition (EMT) is driven by complex signaling events that induce dramatic biochemical and morphological changes whereby epithelial cells are converted into cancer cells. However, the underlying molecular mechanisms remain elusive. Here, we used mass spectrometry based quantitative proteomics approach to systematically analyze the post-translational biochemical changes that drive differentiation of human mammary epithelial (HMLE) cells into mesenchymal. We identified 314 proteins out of more than 6,000 unique proteins and 871 phosphopeptides out of more than 7,000 unique phosphopeptides as differentially regulated. We found that phosphoproteome is more unstable and prone to changes during EMT compared to the proteome and multiple alterations at proteome level are not thoroughly represented by transcriptional data highlighting the necessity of proteome level analysis. We discovered cell state specific signaling pathways, such as Hippo, sphingolipid signaling, and unfolded protein response (UPR) by modeling the networks of regulated proteins and potential kinase-substrate groups. We identified two novel factors for EMT whose expression increased upon EMT induction: DnaJ heat shock protein family (Hsp40) member B4 (DNAJB4) and cluster of differentiation 81 (CD81). Suppression of DNAJB4 or CD81 in mesenchymal breast cancer cells resulted in decreased cell migration in vitro and led to reduced primary tumor growth, extravasation, and lung metastasis in vivo . Overall, we performed the global proteomic and phosphoproteomic analyses of EMT, identified and validated new mRNA and/or protein level modulators of EMT. This work also provides a unique platform and resource for future studies focusing on metastasis and drug resistance. state involves a major reorganization of multiple cellular components triggered by EMT-TF. This study used quantitative proteomics to identify biochemical we found that DNAJB4 mRNA expression levels are significantly increased in mesenchymal cells. Our EMT induction experiment shows that DNAJB4 protein level gradually increases as EMT progresses. Importantly, knockdown of DNAJB4 in MDA-MB-231 cells leads to a decrease in migration, impairs lung metastasis of invasive breast cancer cells, and decreases primary tumor growth rate. In parallel with this, metastasis-free survival analysis of primary breast cancer patients reveals that higher levels of DNAJB4 significantly associates with a worse metastasis-free survival. Recent research demonstrated that inducing EMT causes hyper-activation of unfolded protein response (UPR) signaling pathways (62-65). It has been shown that cells in their mesenchymal state have increased synthesis and secretion of large quantities of extracellular matrix (ECM) proteins which then triggers ER stress and activation of unfolded protein response (63). It is plausible as a chaperone protein, DNAJB4 may be induced to counteract this stress induction upon EMT and its expression gradually increases due to the activation of unfolded protein response during EMT. Deciphering the molecular mechanism of DNAJB4 overexpression during EMT would bring new perspectives into the relationship between EMT and unfolded protein response.


Introduction
Carcinoma cells utilize epithelial-mesenchymal transition (EMT), a normally latent embryonic program, to facilitate the initiation and progression of metastasis. EMT can be induced via cell intrinsic and extrinsic factors upon the communication between epithelial cells and the nearby stromal cells, and plays a critical role in metastasis of cancer cells (1)(2)(3)(4). In 1982, Greenburg and Hay first reported that epithelial cells can be converted into mesenchymal cells (5). Subsequently, numerous studies described the molecular processes during EMT and suggested alternative mechanistic models that lead to EMT (6,7). EMT does not only trigger metastatic and invasive capabilities, but can also trigger cancer stem cell properties (5,6,8) and drug resistance (9)(10)(11)(12).
Throughout EMT, epithelial cells lose cell-cell contacts and cell polarity (13). Several signaling pathways that facilitate the activation of EMT have been discovered (6,14,15) and includes TGF-β, Wnt proteins, platelet-derived growth factors, Interleukin-6 and various receptor tyrosine kinases. All of these are involved in the activation of EMT inducing transcription factors (EMT-TF). These include Snail, Slug, Zeb1, Zeb2, and Twist (13). Ectopic expression of Twist in epithelial cells leads to the loss of adherens junction protein E-cadherin and expression of mesenchymal markers such as N-cadherin and Vimentin, suggesting the induction of EMT (3,4,16). Gene expression profiling of immortalized human mammary epithelial (HMLE) cells that underwent EMT by expressing Goosecoid, Snail, Twist, or TGF-β, or by 4 knocking down the expression of E-cadherin identified a 'EMT core signature' which consists of 159 down-regulated genes and 87 up-regulated genes (4). Although transcriptomic data represents gene expression during EMT process, it may not thoroughly predict the regulation at the protein level; i.e., the mRNA levels may not be perfectly correlated with the translated proteins due to the posttranscriptional modifications and miRNA-mediated regulation (17-19).
As a comprehensive understanding of EMT requires combination of both transcriptomic and proteomic analyses, we systematically compared biochemical changes during EMT of human mammary epithelial cells that was induced by Twist and we performed detailed comparison of previous transcriptome profile of the same EMT model system with our proteome data. Our analysis revealed 314 differentially regulated proteins during EMT which was followed by the 'omic' data integration to reconstruct EMT specific networks. In parallel, to profile the changes in signaling pathways during EMT, we systematically monitored phosphoproteome changes and reconstructed the network of the interactions between kinases and phosphoproteins. Finally, we validated several novel biochemical changes identified in this study by in vitro and in vivo experiments. Our data provide a framework for understanding the global regulation of EMT at phospho-/protein level and reveal a comprehensive dataset to explore main molecular pathways that drive EMT.

Western blotting
Approximately 3.2x10 6 cells were lysed in 1X PBS buffer containing 0.1% TritonX-100, cOmplete EDTAfree protease inhibitor cocktail (Roche) and PhosSTOP phosphatase inhibitor mixture (Roche). Inhibitor cocktail tablets were used as 1 tablet for 10 ml of lysis buffer. Cells were disrupted by pulling the cell suspension through a thin needle. Cell debris was removed by centrifugation at 14,000 rpm for 5 min at 4°C. Protein concentration was measured using BCA protein assay kit (23227, Pierce). Protein samples were prepared in 2X Laemmli sample buffer containing 4% (w/v) SDS, 20% glycerol, 120 mM Tris-Cl (pH 6.8), 0.02% (w/v) bromophenol blue and 100 mM dithiothreitol (DTT) as protein sample to 2X Laemmli sample buffer ratio of 1:1 (v/v). Samples were heated at 95°C for 5 min, cooled prior to loading onto SDS-PAGE gel. 10% and 12% SDS-PAGE gels were used for the separation of proteins.

Experimental Design and Statistical Rationale
Trypsin digested lysates of HMLE and HMLE-Twist cells were dimethyl labeled as light and heavy, respectively and mixed in equal amounts. Mixed lysates were separated into 40 fractions via SCX.
Biological replicate 2 (BR2) was prepared by label swapping to remove labeling bias and separated into 40 fractions via SCX as in BR1. From each biological replicate, 2 technical replicates were prepared and analyzed in MS. Phosphoproteome quantification of the samples was designed similarly, 2 biological replicates with 2 technical replicates each, except the fractionation step. To enrich phosphopeptides, each SCX fraction was enriched using either in-house packed TiO2 (5 µm, Sachtleben, CAS no. 13463-67-7) or Ti 4+ -IMAC micro columns in which the beads were loaded onto GELoader tips using a C8 plug.
For Ti 4+ -IMAC columns, IMAC material was prepared and used as previously described (21). The enriched samples were combined into 10-12 fractions in both biological replicates with two technical replicates for each. Ratios of proteins and peptides quantified in the replicates were combined by taking the median.
To determine the regulated proteins between epithelial and mesenchymal states, we defined epithelial, mesenchymal specific and unchanged protein ratio distributions using known epithelial and mesenchymal markers. Epithelial and mesenchymal specific protein marker groups (Supplemental Table 1) were used to determine which proteins fall into the corresponding groups. By fitting normal distributions to E/M ratios of these three groups, epithelial, mesenchymal and unchanged, the likelihood of each protein belonging to the respective groups was calculated. Using the likelihoods, the relative likelihood of each protein was calculated, Prelative=Punchanged/(Pchanged-epithelial or Pchanged-mesenchymal), for epithelial or mesenchymal group memberships. Proteins with Prelative <0.05 were chosen as significantly changed proteins. Unmodified peptides of the unchanged proteins were chosen as unchanged peptides. Using the median normalized empirical distribution of the unchanged peptide ratio, we identified the respective lower 0.025 and 0.975 quantiles. These correspond to a lower and upper cutoff for the peptide ratios of -1.33404 and 1.555532, respectively. Aiming to identify those peptide ratios differing from the ones observed in the unchanged peptide ratios, phosphopeptide ratios below the lower cutoff value of the unchanged peptide distribution were identified as mesenchymal specific phosphorylations and phosphopeptide ratios above the upper cutoff of the unchanged distribution were identified as epithelial specific phosphorylations.

In-solution digestion and stable isotope dimethyl labeling
In-solution digestion was performed as described previously (22). Briefly, cell lysate was incubated for 4 hours using Lys-C (enzyme:protein ratio of 1:100) in 4 M urea, followed by overnight trypsin mM NaH2PO4 (Fisher) and 50 mM Na2HPO4 (Fisher). The labeled peptides were combined in equal ratios based on their average peptide intensities. The epithelial sample was labeled with the light dimethyl label and the mesenchymal sample with the heavy label. We swapped the labels for the second biological replicate.

Mass spectrometric analysis
The peptides were subjected to a reversed phase nanoLC-MS/MS (EASY-nLC, Thermo) connected to a Q Exactive quadrupole Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen) for a 120 min analysis run. The peptides were directly loaded onto an in-house packed 100 μm i.d. × 17 cm C18 column (Reprosil-Gold C18, 5 μm, 200Å, Dr. Maisch) and separated at 300 nL/min with 120 min linear gradients, starting from 5 to 30% acetonitrile in 0.1% formic acid. Mass spectrometric analysis and spectra acquisition were performed with the following settings: Orbitrap analysis: Polarity at positive mode; resolution: 35,000; mass range: 350−1,500 m/z; automatic gain control (AGC) target: 3e6;

Data processing and analysis
All raw data files were processed and quantified with Proteome Discoverer (version 1.4, Thermo Scientific). Peak lists containing HCD fragmentation were generated with Proteome Discoverer (PD) with a signal-to-noise threshold of 1.0. Produced data was searched against a Swissprot database version 2014_08 with taxonomy Homo sapiens (20,193 sequences) using the Mascot software (version 2.5.1, Matrix Science, London, UK). Settings for PD search were as following: The enzyme was specified as Trypsin and the fragment ion type was specified as electrospray ionization quadrupole time-offlight. A mass tolerance of ±20 ppm for precursor masses and ±0.05 Da for fragment ions was used; two missed cleavages were allowed; and cysteine carbamidomethylation as a fixed modification. Light and heavy dimethylation of peptide N termini and lysine residues; methionine oxidation; phosphorylation on serine, threonine and tyrosine were set as variable modifications. Each peptide spectral match (PSM) (Mascot peptide score >25) of a phosphorylated peptide was isolated from the data in PD. The identification of the phosphopeptide site localization was done by phosphoRS algorithm 3.0 (26), implemented in PD. A threshold of at least 0.70 probability was used for the phospho-residue localization. The 2plex dimethyl-based quantitation method was chosen in PD, with a mass precision requirement of 2 ppm for consecutive precursor mass measurements. The filter in PD was set to select only the peptides with "high confidence". We applied 0.2 min of retention time tolerance for isotope pattern multiplets and allowed spectra with a maximum of 1 missing channel to be quantified with the integrated Percolator-based filter using a false discovery rate of 1%.

Motif analysis of phosphopeptides
Phosphorylation sites with phosphorylation probabilities higher than 0.75 were chosen. ±7 amino acid sequence windows of the phosphorylation sites were used to extract motifs for epithelial and mesenchymal groups. Motif-x (27) implemented in rmotifx package (28) was used to identify enriched motifs in epithelial and mesenchymal specific phosphorylations. The significance threshold was set to 0.00018 (search length is 15 amino acids) in motif-x to make p-value of each motif match to be at least 0.05 by the Bonferroni method. The motif occurrence limit is set to 5 for epithelial S phosphorylations (n=292) and mesenchymal S phosphorylations (n=415). The motif occurrence score is set to 2 for epithelial T phosphorylations (n=27) and mesenchymal T phosphorylations (n=50). The background was set to Homo sapiens proteins from Swissprot database version 2014_08 and the enriched motifs were converted to sequence logos via Weblogo (29). NetworKIN 3.0 (30) was used to generate kinasesubstrate networks. We only used 4,060 phosphosites identified with more than 50% phosphorylation probabilities from epithelial and mesenchymal specific phosphosites and predictions scoring ≥5. For metastasis-free survival analysis of patients, GSE2603 microarray data (42) for primary breast cancer patients (n=82) was utilized. Patients were separated into two groups at the median expression.
Kaplan-Meier analyses were used to determine metastasis-free survival and statistical differences were determined by Log-rank (Mantel-Cox) test.

Knockdown experiments
shRNA sequences against targets were designed using RNAi Codex database (43). pSMP-Luc plasmid (Addgene plasmid # 36394) was used for the expression of shRNAs as described previously (44). Each shRNA sequence was cloned into pSMP-Luc plasmid after the removal of Firefly luciferase targeting shRNA sequence. DNAJB4 and CD81 targeting sequences are CTGCAACTACTTGTTATGCATA and CACACGTCGCCTTCAACTGTAA, respectively. All cloning and packaging steps of viral plasmids were carried out as described previously (44). MCF-7 and MDA-MB-231 cells were used for the virusmediated shRNA knockdown of epithelial and mesenchymal targets, respectively.

Cell viability and in vitro migration assays
To determine cell viability, MDA-MB-231 cells were incubated with Cell Titer Glo reagent and analyzed with a luminometer according to the manufacturer's protocol (Cell Titer Glo Luminescent Cell Viability Assay kit, G7571, Promega). Real-time cell migration assays were done on xCELLigence RTCA-DP (ACEA Biosciences Inc., CA, USA) according to manufacturer's protocol. RTCA (Real Time Cell Analyzer) is a system that quantifies the number of cells that are migrated towards a stimulus (in this case, serum).
In this assay, cells were seeded in low serum (1%) media in small inserts coated with gold particles at their bottom. The inserts were placed in chambers containing high serum media (10%), and cells were expected to migrate towards the lower chamber with a pace that positively correlates with their invasiveness. As the cells move towards the high serum media, the movement causes a change in the current passing through the gold particles which is then quantified and represented as the migration cell index. For wound healing scratch assay, 0.5×10 6 cells per well were seeded in 6-well plates and grown to sub-confluence. Proliferation of cells were blocked using serum-free medium and the wound area was measured at the beginning and after 24 hours following scratch formation using ImageJ software (Wayne Rasband, NIH). Three independent experiments were performed for each condition.
Images were taken with 4x magnification using Nikon Eclipse inverted microscope (Nikon, Japan). The statistical analysis of results was performed using two-tailed Student's t-test.

Flow cytometry analysis
Cells were stained with PE-conjugated anti-CD81 antibody from BD Biosciences (JS-81) (gift of Dr. Batu Erman) as described previously (45). Samples were analyzed using SONY LE-SH800 flow cytometer. All flow data were analyzed using FlowJo_V10 software.

Animal experiments
All animal experiments were approved by the animal ethics committee of Bilkent University. 6-8 weeks old female athymic nu/nu mice were subcutaneously injected without incision, with 1×10 6 cells into mammary fat pad (MFP). Primary tumor growth was monitored by measuring the tumor volume twice a week with a caliper after tumors became palpable. Tumor volumes were calculated as length × width 2 /2. Spontaneous metastasis was evaluated with a luciferase assay. Due to heterogeneous distribution of metastasis in organs, three different parts were randomly collected from each organ to make a tissue pool and weighed for normalization. Both lung and liver tissues were ground in cold PBS by tissue homogenizer and treated with cell culture lysis 5X reagent (E153A, Promega). 200 ul lysis buffer was added to 50 mg weight. After 15 min incubation, 100 ul of luciferase substrate (Promega) was added to 50 ul of the lysed sample, and luminescence was measured with a luminometer. For tail vein metastasis assay, 1.5×10 6 cells were injected into tail-vein of each 6-8 weeks old female athymic nu/nu mice, with 3-5 mice per group. Lung metastasis was monitored by bioluminescence imaging (BLI). Anesthetized mice were intraperitoneally injected with 150 mg/kg D-luciferin (Perkin Elmer).
Bioluminescence images were acquired with Lumina III In Vivo Imaging System (Perkin Elmer). Analysis was performed with live imaging software by measuring total photon flux. For Bouin's fixation, harvested lungs were cleaned with PBS and placed into Bouin's solution on a shaker for overnight.
After fixation, they were kept in 70% alcohol.

Proteomic profiling identifies differentially regulated proteins during EMT
To identify differentially expressed or phosphorylated proteins during EMT in epithelial HMLE and its mesenchymal derivative HMLE-Twist cells, we took the advantage of quantitative proteomics and phosphoproteomics approaches. Overall study design and experimental workflow is presented in supplemental Fig. 1. HMLE cells exhibiting characteristics of epithelial cells were systematically compared with HMLE cells that stably express Twist (HMLE-Twist) using dimethyl labeling (Fig. 1A).
Twist ectopic expression has been known to induce epithelial cells to undergo EMT, thereby promoting mesenchymal cell characteristics (16). We validated these results by showing the protein level down-regulation of epithelial markers, such as E-cadherin, and protein level up-regulation of mesenchymal markers, such as N-cadherin, Vimentin, and Snail upon Twist overexpression (supplemental Fig. 2).
To perform a systematic quantitative profiling of EMT, the equal amount of cell lysates from HMLE and HMLE-Twist cells were digested using Trypsin. The resulting peptides were differentially labelled with light and heavy isotopes using stable isotope dimethyl labeling (23) with the labeling efficiency of ~99%. The labels were swapped to eliminate any labeling bias in biological repeats. Two technical replicates were performed for each biological replicate. After combining differentially labelled peptides in equal amounts, the pooled peptides were separated by column-based SCX chromatography into 40 fractions, and each fraction was analyzed by LC-MS/MS in two replicates. All raw MS data were processed using the Proteome Discoverer software to identify and quantify peptides in two conditions (supplemental Fig. 1). As a result, 4,829 and 5,505 unique proteins (supplemental Table 1), as well as 29,986 and 35,591 unique peptides (supplemental Table 2A) were identified in biological replicate 1 (BR1) and biological replicate 2 (BR2), respectively with <1% FDR (supplemental Table 3 Table 1). As expected, many known epithelial markers, such as Desmoplakin, Occludin, and E-cadherin were significantly up-regulated in epithelial cells (Fig. 1E, green) and many known mesenchymal markers, such as Twist1-2, Vimentin, and N-cadherin were significantly up-regulated in mesenchymal cells (Fig. 1E, yellow). We compared our proteomics data to 14 previously published microarray data obtained from immortalized HMLE cells underwent EMT by expressing Twist (4). Although proteomic data is significantly correlated with the transcriptomic data (r=0.53, p-value <0.005), alterations at proteome level are not thoroughly represented with transcriptional data in EMT (supplemental Table 1). Proteins only significantly regulated in proteome data (Fig. 1F, blue) comprise 67% of all regulated proteins (Fig. 1F, blue, green and red).

Phosphoproteome is unstable and prone to changes during EMT
To identify protein phosphorylation events of EMT, phosphopeptide enrichment was performed in all SCX fractions using TiO2 or Ti 4+ -IMAC columns (24, 25) (Fig. 1A). 7,735 and 5,685 phosphopeptides were identified with 1% FDR in BR1 and BR2, respectively (supplemental Tables 2C and 2D). replicates with an r=0.7 (p-value <0.005, n=4,338) indicates the high reproducibility (Fig. 2C). For both epithelial and mesenchymal specific phosphosites, we searched for dominant consensus motifs (supplemental Fig. 1). In epithelial cells, SP and TP motifs dominate and in mesenchymal cells, R/KxxSxD and RxxT are the enriched motifs (Fig. 2D). Next, we used NetworKIN 3.0 (30) in KinomeXplorer (46) to further predict potential kinase-substrate relationships of EMT based on known kinase motifs and protein-protein interactions in the STRINGv10 database (47) (supplemental Fig. 1).
Accordingly, a kinase-substrate network was reconstructed that covers 73 kinase groups and their potential epithelial and mesenchymal specific substrates (Fig. 2E). Notably, more kinase activity and phosphorylation events take place in mesenchymal cells than in epithelial cells. ROCK, Aurora, CAMKII, and MAP kinases are among the predicted kinases for mesenchymal specific phosphorylations (Fig.   2E).
Reconstruction of epithelial and mesenchymal specific protein networks reveals functional subnetworks GO enrichment analysis (48) of epithelial and mesenchymal specific proteins revealed distinct annotations in epithelial and mesenchymal cells. Based on the top 10 scoring GO biological process (BP) terms (Fig. 3A), cell-cell adhesion and cell junction assembly proteins are the most significant terms in epithelial state whereas regulation of fatty acid metabolism and cellular response to growth factor stimulus are uniquely significant in mesenchymal cells (Fig. 3A).
Next, we sought to identify signaling pathways that represent epithelial and mesenchymal states (supplemental Figs. 3 and 4). For this purpose, we took the advantage of Omics Integrator software which optimally reconstructs a network by considering the significance of the proteomic hits and the reliability of protein-protein interactions (32). Beyond the list of proteins, it searches for the either direct interactions of the given proteins or the interactions through intermediate proteins. For epithelial state, cell-cell adhesion, cell junction assembly, and epithelial cell differentiation pathways are among the significantly enriched biological proteins in the reconstructed network (Fig. 3B). EGFR, one of the central nodes, interacts with JUP, Col17A, and numerous Keratins. Another significant pathway is Hippo signaling. Multiple components of Hippo signaling pathway SAV1, TJP1, TJP2, PARD6B, PARD6G, PRKCI, PRKCZ, LLGL2, and SFN (14-3-3 sigma, which is also known as epithelial cell marker protein) (49) are part of the epithelial specific network, which indicates their epithelial cell specific roles. Additionally, our network analysis revealed exocytosis, cell cycle, and regulation of proteolysis as notable cellular functions of epithelial specific proteins. Remarkably, seven members of S100 protein family, which is one of the largest calcium binding protein sub-family, are in multiple subnetworks of epithelial specific proteins including epithelial cell differentiation (Fig. 3B).
Multiple signaling pathways are identified in the mesenchymal specific protein network including sphingolipid signaling, regulation of antigen receptor-mediated signaling, transforming growth factor beta receptor signaling, and regulation of TOR signaling (Fig. 3C). Our network identified G-coupled receptors S1PR1, S1PR3, and HTR1A that are implicated in sphingolipid signaling as mesenchymal specific. Activation of those may regulate EMT. Interestingly, multiple mesenchymal specific proteins are linked to fatty acid metabolic processes such as TWIST1, DECR2, CPT1A, CYP4F12, FABP3, IRS2, LPIN3, PLA2G15, and MGLL. This suggests a link between fatty acid metabolism and induction of EMT ( Fig. 3A, C). Another remarkable mesenchymal specific sub-network is protein folding-response to unfolded protein (Fig. 3C). Within this sub-network, two human heat shock proteins 40 family members; DNAJB4 and DNAJB5 are identified (Fig. 3C).

Novel epithelial and mesenchymal specific proteins affect the migration of breast cancer cells
Next, we further studied a sub-class of novel epithelial and mesenchymal specific proteins using realtime cell migration analysis (RTCA) (supplemental Fig. 5A). This sub-class of proteins was determined in comparison with the transcriptome data obtained from Taube et al. study (4). We selected the proteins that are either regulated at proteome level or at both transcriptome and proteome levels. In addition to transcriptome data, we analyzed the mRNA expression levels of selected proteins in epithelial and mesenchymal breast cancer cell lines using the study by Luo et al.   . 5B, C). When epithelial specific proteins CASP14, PI3, TLR2, and TFAP2A were depleted, we observed that the migration of MCF7 cells increased (supplemental Fig.   5B). On the other hand, depletion of mesenchymal specific proteins BIN1, PDGFRB, CADM3, ANPEP, DNAJB4, and CD81 decreased the migration of MDA-MB-231 cells (supplemental Fig. 5C, Figs 4F and 5F, respectively). We focused on two proteins, DNAJB4 and CD81, for the reasons explained below, to address their effects on EMT both in vitro and in vivo.

DNAJB4 knockdown decreases migration of mesenchymal breast cancer cells and its mRNA level predicts metastasis-free survival
DNAJB4 is regulated at both transcriptome and proteome level during EMT and shows increased transcription in mesenchymal breast cancer cell lines (supplemental Fig. 5A and Fig. 4A). In agreement with our quantitative proteomics data, Western blotting analysis showed dramatically increased DNAJB4 protein expression in HMLE-Twist cells compared to HMLE cells (Fig. 4B). To analyze temporal expression of DNAJB4 during EMT, we used EMT inducible HMLE-Twist-ER cells (50). The cells were treated with 20 nM 4-OHT for 16 days to induce EMT and collected at different days to analyze their expression pattern by Western blotting. We found that DNAJB4 expression increases gradually during EMT induction (Fig. 4C).
To examine the roles of DNAJB4 expression on mesenchymal cells, we performed shRNA-mediated knockdown of DNAJB4 in MDA-MB-231 invasive breast cancer cells (Fig. 4D). After the suppression of DNAJB4, we observed no change in cell viability (Fig. 4E), but a significant inhibition of migration in mesenchymal cells by real-time migration assay (Fig. 4F). These results were further validated by a wound healing assay (Fig. 4G). Moreover, the correlation analysis of DNAJB4 with EMT scores revealed that DNAJB4 mRNA expression levels positively correlate with the EMT scores of breast cancer patients (Fig. 4H). Similarly, the metastasis-free survival analysis of patients revealed that high level of DNAJB4 mRNA expression significantly associates with worse metastasis-free survival of breast cancer patients (Fig. 4I). In addition to DNAJB4, the mRNA expression level of DNAJB5, which was also identified as a 18 member of mesenchymal specific network (supplemental Fig. 4 and Fig. 3C), is higher in mesenchymal breast cancer cell lines (supplemental Fig. 6A) and positively correlates with EMT scores of breast cancer patients (supplemental Fig. 6B).

CD81 knockdown decreases migration of mesenchymal breast cancer cells
Another prominent hit in our migration assay was a member of tetraspanins family, CD81 (supplemental Fig. 5A). Although most of the members of tetraspanins were present in the epithelial specific network (supplemental Fig. 3), CD81 is in the mesenchymal specific protein network (supplemental Fig. 4 and Fig. 3C). CD81 is regulated only at proteome level, and its transcription level does not significantly change during EMT and in breast cancer cell lines with epithelial or mesenchymal phenotype (supplemental Fig. 5A and Fig. 5A). Our Western blotting analysis confirmed that CD81 protein expression dramatically increases in HMLE-Twist cells as expected from our quantitative proteomics data (Fig. 5B). Similar to DNAJB4, CD81 expression also increased gradually in a time dependent manner in inducible HMLE-Twist-ER cells as EMT progresses by 4-OHT treatment (Fig. 4C).
As a potential cell surface marker, we tested whether CD81 expression increases on the mesenchymal cell surface using unpermeabilized HMLE and HMLE-Twist cells in flow cytometry. Our analyses revealed that surface CD81 expression increased 79% in HMLE-Twist cells (Fig. 5C).
Next, we performed shRNA-mediated knockdown of CD81 in MDA-MB-231 invasive breast cancer cells.
As a result, CD81 was effectively depleted (Fig. 5D) and viability of cells was not affected (Fig. 5E). In agreement with the real-time migration assay (Fig. 5F), the in vitro wound healing assay also revealed the impairment of migration ability of MDA-MB-231 invasive breast cancer cells upon CD81 knockdown (Fig. 5G).

Individual knockdown of DNAJB4 and CD81 results in decreased primary tumor growth and metastasis
We then examined the roles of DNAJB4 and CD81 in primary tumor growth and metastasis in vivo. We performed retrovirus-mediated shRNA knockdown of DNAJB4 and CD81 in MDA-MB-231-Luc2-GFP cells. To observe the effects of DNAJB4 and CD81 knockdown on the primary tumor growth, we developed primary tumor xenografts with control and knockdown cells in nude mice. Knockdown of CD81 and DNAJB4 significantly impaired the tumor growth (Fig. 6A, B, C). To test the effect of two candidates on the spontaneous metastasis, tumor growth has been monitored until tumors in each group reached to 1000 mm 3 (supplemental Fig. 7). One month after surgical removal of primary tumors, an ex vivo luciferase assay was performed. Knockdown of CD81 and DNAJB4 significantly reduced both lung and liver metastases (Fig. 6D).
Next, we tested whether DNAJB4 and CD81 play a role in extravasation/colonization of MDA-MB-231 invasive breast cancer cells in lungs by tail-vein metastasis assay. After 6 weeks from intravenous injection of DNAJB4 and CD81 depleted cells into nude mice, we compared their lung metastasis with control cells (Fig. 6E). Mouse groups injected with either CD81 or DNAJB4 knockdown cells showed a low level of lung metastasis compared to control (Fig. 6F). Similarly, we observed less nodules in lungs in the knockdown group mice compared to control mice after fixation with Bouin's solution (Fig. 6G).
Hematoxylin and eosin (H&E) staining confirmed the differences in lung metastases in control, DNAJB4 knockdown and CD81 knockdown groups (Fig. 6H). These results demonstrate that both DNAJB4 and CD81 contribute to the mesenchymal characteristics of cells undergoing EMT and have roles in primary tumor growth and metastasis of breast cancer cells in vivo.

Discussion
The transition from epithelial to mesenchymal state involves a major reorganization of multiple cellular components triggered by EMT-TF. This study used quantitative proteomics to identify biochemical 20 changes upon activation of EMT-TF. Previously, Taube et al. obtained the transcriptome profiles of HMLE cells in epithelial and mesenchymal states as 'EMT core signature' (4). By using HMLE-Twist cells, we profiled the biochemical changes at the protein and post-translational level. Strikingly, around 65% of significantly regulated proteins did not exhibit any detectable regulation at the mRNA level (supplemental Table 1). Similarly, a previous study analyzed the cell surface protein changes during EMT using HMLE-Twist cells by performing membrane-associated protein fractionation followed by mass spectrometry analysis (14). Based on the commonly identified proteins, our whole proteome profile largely correlated with the published cell surface proteome (76% for mesenchymal and 83% for epithelial) (supplemental Tables 1 and 4). Interestingly, when significantly changed proteins were normalized to all identified ones, Lu et al. reported more than two-fold significantly differentially expressed proteins compared to our whole proteome data. One likely explanation for this difference is the contribution of cellular spatial organization during EMT. Altogether, these findings suggest that besides the gene expression level regulation, EMT is regulated at multiple post-transcriptional levels including protein sub-cellular localization and posttranslational modification.
To reveal phosphorylation-driven signaling pathways during EMT, we performed a quantitative phosphoproteomic analysis. Our analysis reveals that there are more alterations in phosphoproteome than proteome. Having more modulation at the phosphorylation level suggest the importance of phosphorylation-based signal transduction pathways during EMT. Many kinases are predicted to phosphorylate either epithelial or mesenchymal specific phosphopeptides. This may suggest that phosphorylation events in EMT are important for cells to preserve their state as epithelial or mesenchymal and transition between two states must be regulated by altered phosphorylation landscape of the cells. Overall, our data point out more global phosphorylation in mesenchymal cells (514) than epithelial cells (357) suggesting an important role of kinases during EMT. We identified 73 kinase groups and their potential epithelial and mesenchymal specific substrates. Some of these roles in EMT are previously reported: CDK5 is required for TGF-β1-induced EMT (51). CK2A2 expression levels are found to be upregulated in human colorectal cancer and downregulated through miR-1228, decreasing EMT capability (52, 53). Exosomes from transformed epithelial cells that contain PAK2 are reported to induce metastatic features in endothelial cells (54). Interestingly, GSK3β's decreased activity is found to be promoting invasive abilities of epithelial cancers via Snail and Slug activation (55-57). We also identified kinases such as PAK2, ROCK2, and NEK1 are auto-phosphorylated in a mesenchymal specific manner. These auto-phosphorylated kinases might be members of the feedback mechanisms in EMT.
Besides the kinase-substrate network, reconstructing epithelial and mesenchymal specific proteinprotein interaction networks revealed key signaling pathways of cells undergoing EMT including Hippo, TGF-β, mTOR, PDGF, and cAMP. In addition to expected cellular machineries that are remodeled during EMT, such as cell adhesion, cell-substrate junction assembly etc., our analysis identified understudied ones, such as lipid metabolism and sphingolipid signaling (58). We found that expression of Sphingosine-1-phosphate receptors 1 and 3 increases in mesenchymal cells. In addition, the expression of fatty acid metabolic process related proteins DECR2, CPT1A, CYP4F12, FABP3, IRS2, LPIN3, PLA2G15, and MGLL were increased in mesenchymal cells. Deciphering the molecular mechanism behind the overexpression of these proteins will provide further insights into the biology of EMT. One motivation for this study was to find biomarkers for EMT. Our network analysis identified multiple cell surface proteins as epithelial and mesenchymal specific. Some of them may serve as potential biomarkers, such as TSPAN1 and FXYD3 as epithelial and CD10 (MME), CD90 (THY1) (66), Fibulin-4 (EFEMP2) and CD81 as mesenchymal markers. Interestingly, the CD81 mRNA level does not significantly alter during EMT (4) and between epithelial and mesenchymal breast cell lines. But we identified CD81 as one of the significantly up-regulated proteins during the EMT process. CD81 protein levels increase gradually as cells undergo EMT. In parallel, CD81 level increased 79% on the surface of HMLE-Twist cells. Functional assays supported its potential of being a marker for EMT process. Upon