Quantitative acetylome and phosphorylome analysis reveals Girdin affects pancreatic cancer progression through regulating Cortactin.

The actin-binding protein Girdin is involved in a variety of cellular processes and in pancreatic cancer. The objective of this study was to explore the role and mechanism of Girdin in pancreatic cancer by quantitative acetylome and phosphorylome analysis. We first found that Girdin was overexpressed in pancreatic cancer tissue and that increased Girdin expression was associated with tumor size and stage in patients with pancreatic cancer. We established shRNA knockdown of Girdin in PANC-1 and AsPC-1 cells and found that shGirdin inhibited proliferation, migration and invasion, and promoted apoptosis. Subsequently, we identified and quantified 5,338 phosphorylated sites in 2,263 proteins that changed in response to Girdin knockdown, and identified a similar set of Girdin-responsive acetylome data as well. Additional data revealed that down-regulation of Girdin affected Cortactin phosphorylation and acetylation, suggesting Cortactin as an important regulatory target of Girdin. Moreover, we found that overexpression of Cortactin could rescue the effect of shGirdin on proliferation, apoptosis, migration and invasion of pancreatic cancer cells. In general, our results provide new insights into the mechanisms of Girdin function, including cell proliferation, migration and invasion, and offer biomarker candidates for clinical evaluation of Girdin.


Technical Route
The aim of this project is to use an integrated approach involving SILAC labeling, HPLC fractionation, affinity enrichment and mass spectrometry-based quantitative proteomics to quantify dynamic changes of the whole phosphorylome of human cell lines. The general technical route is indicated below:

Protein Extraction
Trypsin Digestion

Database Search
Bioinformatic Analysis

Results

Quantitative Overview
Altogether, 5,468 phosphorylation sites in 2,317 proteins were identified, among which 5,338 phosphorylation sites in 2,263 proteins were quantified (Table 1). All the data are presented in the file "5144SP_Pho/2-Basic_analysis/SILAC_modified_quantification.xlsx". In this report, a quantitative ratio above 1.5 was considered up-regulation, while a quantitative ratio below 1/1.5 (0.67) was considered down-regulation (t-test p-value < 0.05). The numbers of differentially quantified sites and proteins are summarized in Table 2.
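The cutoffs stated above can be written as a small classification rule. The following is a minimal sketch only: the function name `classify_site`, the site labels and the example ratios and p-values are hypothetical and are not taken from the project data, and the direction of the SILAC ratio is not assumed here.

```python
# Thresholds as stated in the report.
UP_THRESHOLD = 1.5        # ratio above this counts as up-regulation
DOWN_THRESHOLD = 1 / 1.5  # ratio below this (about 0.67) counts as down-regulation
P_CUTOFF = 0.05           # t-test p-value must be below this

def classify_site(ratio, p_value):
    """Return 'up', 'down' or 'unchanged' for one quantified site."""
    if p_value >= P_CUTOFF:
        return "unchanged"
    if ratio > UP_THRESHOLD:
        return "up"
    if ratio < DOWN_THRESHOLD:
        return "down"
    return "unchanged"

# Hypothetical example sites: (label, ratio, p-value).
sites = [
    ("SITE_A", 2.10, 0.010),
    ("SITE_B", 0.50, 0.003),
    ("SITE_C", 1.80, 0.200),  # large ratio, but not significant
    ("SITE_D", 1.05, 0.001),  # significant, but ratio within cutoffs
]
calls = {label: classify_site(ratio, p) for label, ratio, p in sites}
```

Both conditions must hold for a site to be called regulated, so a large fold change with a non-significant p-value stays "unchanged" in this sketch.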

Protein Annotation
To further understand the functions and features of the identified and quantified proteins, we annotated them in several categories, including Gene Ontology, Protein Domain, Protein Complex, KEGG Pathway and Subcellular Localization. First, all the identified proteins were annotated. Then, the quantifiable proteins were also annotated.
The results were presented in the folder: 5144SP_Pho/4-Protein_annotation
The results were presented in the folder: 5144SP_Pho/9-Motif_analysis

GO Classification of Terms Level 2
According to the GO annotation of the identified phosphorylated proteins, we calculated the number of quantifiable proteins in each level 2 GO term. Note: For detailed information, please find the corresponding Excel files in the supplementary folder of "5144SP_Pho/5-Functional_classification".

Subcellular Location Classification
According to the subcellular location annotation of the identified phosphorylated proteins, we calculated the number of quantifiable proteins in each subcellular location. For detailed information on the subcellular location of up- and down-regulated proteins, please find the corresponding Excel files in the supplementary folder of "5144SP_Pho/5-".
The results were presented in the folder: 5144SP_Pho/5-Functional_classification

[Figure: functional enrichment of GO terms. First group of terms: cell-cell junction; SWI/SNF superfamily-type complex; apicolateral plasma membrane; lamellipodium; enzyme binding; histone deacetylase activity (H3-K9 specific); NAD-dependent histone deacetylase activity (H3-K9 specific); protein kinase C activity; histone deacetylase activity (H4-K16 specific); NAD-dependent histone deacetylase activity (H4-K16 specific); histone deacetylase activity (H3-K14 specific); NAD-dependent histone deacetylase activity (H3-K18 specific); transmembrane receptor protein tyrosine kinase signaling pathway; regulation of cell adhesion; negative regulation of intracellular protein kinase cascade; defense response to virus; muscle organ development; negative regulation of MAPK cascade; response to biotic stimulus; muscle structure development; negative regulation of protein serine/threonine kinase activity; response to other organism; negative regulation of transferase activity; response to virus; regulation of MAP kinase activity; negative regulation of protein kinase activity. Second group of terms: cell periphery; cell cortex; cell cortex part; cortical actin cytoskeleton; microtubule; actin cytoskeleton; spindle; nuclear pore; nuclear basket; cytoskeletal protein binding; lipid binding; actin binding; microtubule binding; structural molecule activity; tubulin binding; cytoskeleton organization; cell-matrix adhesion; regulation of cell shape; regulation of cytoskeleton organization; regulation of protein depolymerization; actin filament-based process; regulation of organelle organization; regulation of protein complex disassembly; negative regulation of protein complex disassembly; negative regulation of protein depolymerization; cell junction assembly; microtubule-based process; cell-substrate adhesion; microtubule cytoskeleton organization.]

The results were presented in the folder: 5144SP_Pho/6-Functional_enrichment

Cluster Analysis

Quantiles-based Clustering for Protein Groups
First, the quantified proteins in this study were divided into four quantiles according to the quantification ratio: Q1 (0~1/1.5), Q2 (1/1.5~1/1.3), Q3 (1.3~1.5) and Q4 (>1.5). Then, quantile-based clustering was performed.
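The four ratio intervals can be illustrated with a short binning function. This is a sketch under stated assumptions: the function name `assign_quantile` is hypothetical, the handling of values exactly on an interval boundary is an assumption (the report does not state which side is inclusive), and ratios between 1/1.3 and 1.3 are returned as `None` because the report assigns them to no quantile.

```python
def assign_quantile(ratio):
    """Map a quantification ratio to Q1..Q4, or None if it falls in no interval."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio < 1 / 1.5:
        return "Q1"  # 0 ~ 1/1.5
    if ratio <= 1 / 1.3:
        return "Q2"  # 1/1.5 ~ 1/1.3 (boundary inclusion assumed)
    if ratio < 1.3:
        return None  # between 1/1.3 and 1.3: not covered by any quantile
    if ratio <= 1.5:
        return "Q3"  # 1.3 ~ 1.5 (boundary inclusion assumed)
    return "Q4"      # > 1.5

# Hypothetical example ratios.
example_ratios = [0.50, 0.70, 1.00, 1.40, 2.00]
example_bins = [assign_quantile(r) for r in example_ratios]
```

Q1 and Q4 coincide with the down- and up-regulation cutoffs used elsewhere in the report, while Q2 and Q3 capture more moderate changes.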

SILAC Labeling
The cells were grown to 80% confluence in high glucose (4.5 g/liter) Dulbecco's modified Eagle's medium (with glutamine and sodium pyruvate) containing 10% fetal bovine serum and 1% penicillin-streptomycin at 37 °C with 95% air and 5% CO2. The cells were labeled with either "heavy isotopic lysine" (13C-lysine) or "light isotopic lysine" (12C-lysine) using a SILAC Protein Quantitation Kit (Pierce, Thermo) according to the manufacturer's instructions. Briefly, the cell line was grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum and either the "heavy" form of [U-13C6]-L-lysine or "light" [U-12C6]-L-lysine for more than six generations before being harvested, to achieve more than 97% labeling efficiency.
After that, the cells were further expanded in SILAC media to the desired cell number (~5×10⁸) in fifteen 150 cm² flasks.
The "light" labeled cells were then treated with 5×10¹⁰ PFU/mL ad-girdin shRNA and the "heavy" labeled cells were treated with the same amount of ad-GFP. After treatment, the cells were maintained in SILAC media for another 48 hours. The cells were then harvested and washed twice with ice-cold PBS. After snap freezing in liquid nitrogen, cell pellets were stored in a -80 °C freezer for future use.

Protein Extraction
The harvested "heavy" and "light" labeled cells were each lysed on ice with lysis buffer (100 mM Tris-Cl, 2 mM EDTA, pH 7.2) supplemented with Phosphatase Inhibitor Cocktail Set III and Protease Inhibitor Cocktail Set V, using a high intensity ultrasonic processor (Scientz) for 30 min. The supernatants were saved after centrifugation at 20,000 g for 10 min at 4 °C. The protein concentration was determined with a 2-D Quant kit according to the manufacturer's instructions.

Trypsin Digestion
For digestion, the protein solution was reduced with 10 mM DTT for 1 h at 37 °C and alkylated with 20 mM IAA for 45 min at room temperature in darkness. Finally, trypsin was added at a 1:50 trypsin-to-protein mass ratio for the first digestion overnight and at a 1:100 trypsin-to-protein mass ratio for a second 4 h digestion.

HPLC Fractionation
The sample was then fractionated by high pH reverse-phase HPLC using an Agilent 300Extend C18 column (5 μm particles, 4.6 mm ID, 250 mm length). Briefly, peptides were first separated with a gradient of 2% to 60% acetonitrile in 10 mM ammonium bicarbonate pH 10 over 80 min into 80 fractions. Then, the peptides were combined into 8 fractions and dried by vacuum centrifuging.

… was added and the enriched phosphopeptides were eluted with vibration. The supernatant containing phosphopeptides was collected and lyophilized for LC-MS/MS analysis.

Mass Spectrometer
Thermo Scientific™ Q Exactive™ Plus

LC-MS/MS Analysis
Peptides were dissolved in solvent A (0.1% FA in 2% ACN) and directly loaded onto a reversed-phase pre-column (Acclaim PepMap 100, Thermo Scientific). Peptide separation was performed using a reversed-phase analytical column (Acclaim PepMap RSLC, Thermo Scientific) with a linear gradient of 2-24% solvent B (0.1% FA in 98% ACN) for 50 min, 24-36% solvent B for 12 min and 35-80% solvent B for 4 min, then holding at 80% for the last 4 min, all at a constant flow rate of 300 nl/min on an EASY-nLC 1000 UPLC system. The resulting peptides were analyzed by a Q Exactive™ Plus hybrid quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific).
The peptides were subjected to an NSI source followed by tandem mass spectrometry (MS/MS) in the Q Exactive™ Plus (Thermo) coupled online to the UPLC. Intact peptides were detected in the Orbitrap at a resolution of 70,000. Peptides were selected for MS/MS using an NCE setting of 28; ion fragments were detected in the Orbitrap at a resolution of 17,500. A data-dependent procedure that alternated between one MS scan and 20 MS/MS scans was applied for the top 20 precursor ions above a threshold ion count of 5E3 in the MS survey scan, with 15.0 s dynamic exclusion. The electrospray voltage applied was 2.0 kV. Automatic gain control (AGC) …
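The data-dependent selection described above (top 20 precursors above an ion count of 5E3, with 15.0 s dynamic exclusion) can be pictured with the simplified sketch below. It is not the instrument's acquisition logic: the function name `select_precursors`, the matching of excluded precursors by m/z rounded to two decimals, and the example peaks are all assumptions made for illustration.

```python
TOP_N = 20                 # MS/MS scans per survey scan
INTENSITY_THRESHOLD = 5e3  # minimum ion count in the MS survey scan
EXCLUSION_SECONDS = 15.0   # dynamic exclusion duration

def select_precursors(survey_peaks, scan_time, excluded_until):
    """Pick up to TOP_N precursor m/z values from one survey scan.

    survey_peaks: list of (mz, intensity) tuples.
    scan_time: acquisition time of the survey scan, in seconds.
    excluded_until: dict mapping rounded m/z to the time its exclusion ends;
        updated in place. Rounding to 0.01 is an assumed matching rule.
    """
    selected = []
    for mz, intensity in sorted(survey_peaks, key=lambda p: p[1], reverse=True):
        if len(selected) == TOP_N or intensity <= INTENSITY_THRESHOLD:
            break  # peaks are sorted by intensity, so nothing further qualifies
        key = round(mz, 2)
        if excluded_until.get(key, float("-inf")) > scan_time:
            continue  # still under dynamic exclusion
        selected.append(mz)
        excluded_until[key] = scan_time + EXCLUSION_SECONDS
    return selected

# Hypothetical survey scan reused at three time points.
exclusion = {}
peaks = [(500.25, 1e6), (612.30, 8e4), (433.10, 3e3)]
first = select_precursors(peaks, scan_time=0.0, excluded_until=exclusion)
second = select_precursors(peaks, scan_time=5.0, excluded_until=exclusion)
third = select_precursors(peaks, scan_time=16.0, excluded_until=exclusion)
```

In this toy run the two intense peaks are fragmented at 0 s, skipped at 5 s because they are still excluded, and become eligible again at 16 s; the peak at 3E3 never passes the threshold.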

Database Search
The resulting MS/MS data were processed using MaxQuant with the integrated Andromeda search engine (v.1.4.1.2). Tandem mass spectra were searched against the SwissProt_Human database (20,203 sequences) concatenated with a reverse decoy database. Trypsin/P was specified as the cleavage enzyme, allowing up to 2 missed cleavages, 5 modifications per peptide and 5 charges. Mass error was set to 10 ppm for precursor ions and 0.02 Da for fragment ions. Carbamidomethylation on Cys was specified as a fixed modification, and oxidation on Met, phosphorylation on Ser, Thr and Tyr, and acetylation on the protein N-terminus were specified as variable modifications. False discovery rate (FDR) thresholds for protein, peptide and modification site were specified at 1%. Minimum peptide length was set at 7. All the other parameters in MaxQuant were set to default values. The site localization probability was set as > 0.75.
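The two mass tolerances above are of different kinds: the 10 ppm precursor tolerance scales with m/z, while the 0.02 Da fragment tolerance is absolute. The sketch below shows the arithmetic only; the helper names and the example m/z values are hypothetical, and this is not how MaxQuant implements its matching.

```python
PRECURSOR_PPM = 10.0  # relative tolerance for precursor ions
FRAGMENT_DA = 0.02    # absolute tolerance for fragment ions

def ppm_window(mz, ppm=PRECURSOR_PPM):
    """Return the (low, high) m/z window for a relative tolerance in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

def within_ppm(observed_mz, theoretical_mz, ppm=PRECURSOR_PPM):
    """True if the relative mass error is within the ppm tolerance."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= ppm

def within_da(observed_mz, theoretical_mz, tolerance=FRAGMENT_DA):
    """True if the absolute mass error is within the Da tolerance."""
    return abs(observed_mz - theoretical_mz) <= tolerance

# At m/z 1000, 10 ppm corresponds to a window of about +/- 0.01.
low, high = ppm_window(1000.0)
```

Because the ppm window widens with m/z, the same 10 ppm setting allows about ±0.005 at m/z 500 and about ±0.01 at m/z 1000.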

QC Validation of MS Data
The MS data validation is shown in Figure 11. First, we checked the mass error of all the identified peptides. The mass errors were distributed around zero and most were less than 0.02 Da, which means that the mass accuracy of the MS data met the requirement (Figure 11A). Second, the lengths of most peptides were between 8 and 20 amino acids, which agrees with the properties of tryptic peptides (Figure 11B) and indicates that sample preparation met the standard.

… can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.
Functional descriptions of the identified protein domains were annotated by InterProScan (a sequence analysis application) based on the protein sequence alignment method, using the InterPro domain database. InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and metagenomes, as well as in characterizing individual protein sequences.

KEGG Pathway Annotation
KEGG connects known information on molecular interaction networks, such as pathways and complexes (the "Pathway" database), information about genes and proteins generated by genome projects (including the gene database) and information about biochemical compounds and reactions (including compound and reaction databases). These databases are different networks, known as the "protein network" and the "chemical universe", respectively.

Subcellular Localization
The cells of eukaryotic organisms are elaborately subdivided into functionally distinct membrane-bound compartments. Some major constituents of eukaryotic cells are: extracellular space, cytoplasm, nucleus, mitochondria, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, vacuoles, cytoskeleton, nucleoplasm, nucleolus, nuclear matrix and ribosomes.
Bacteria also have subcellular localizations that can be separated when the cell is fractionated. The most common localizations referred to include the cytoplasm, the cytoplasmic membrane (also referred to as the inner membrane in Gram-negative bacteria), the cell wall (which is usually thicker in Gram-positive bacteria) and the extracellular environment. Most Gram-negative bacteria also contain an outer membrane and periplasmic space. Unlike eukaryotes, most bacteria contain no membrane-bound organelles, although there are some exceptions.
Here, we used WoLF PSORT, a subcellular localization prediction program, to predict subcellular localization. WoLF PSORT is an updated version of PSORT/PSORT II for the prediction of eukaryotic sequences.

Protein Complex
Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). The new CORUM 2.0 release encompasses 2837 protein complexes, offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing approximately 16% of the protein coding genes in humans. Each protein complex is described by a protein complex name, subunit composition and function, as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers.

Motif Analysis
The motif-x software was used to analyze the sequence models formed by amino acids at specific positions of modified 21-mers (10 amino acids upstream and downstream of the site) in all protein sequences. All the database protein sequences were used as the background database parameter, and the other parameters were left at their default values.
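The 21-mer windows used as motif input (the modified residue plus 10 residues on each side) can be extracted as in the sketch below. The function name `site_window`, the use of "_" to pad windows that run past a protein terminus, and the example sequence are assumptions for illustration; the report does not say how sites near the termini were handled.

```python
def site_window(sequence, position, flank=10, pad="_"):
    """Return the (2 * flank + 1)-mer centred on a 1-based site position.

    Positions closer than `flank` residues to either terminus are padded
    with `pad` so that every window has the same length (an assumption).
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    idx = position - 1
    left = sequence[max(0, idx - flank):idx]
    right = sequence[idx + 1:idx + 1 + flank]
    return left.rjust(flank, pad) + sequence[idx] + right.ljust(flank, pad)

# Hypothetical 40-residue sequence, not a real protein.
toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY"
central = site_window(toy_sequence, 16)   # serine at position 16
near_start = site_window(toy_sequence, 3)  # site close to the N-terminus
```

Keeping every window at a fixed length of 21 means the modified residue always sits at the eleventh position, which is what position-specific motif tools expect.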

Enrichment of Gene Ontology analysis
Proteins were classified by GO annotation into three categories: biological process, cellular component and molecular function. For each category, we used the Functional Annotation Tool of DAVID Bioinformatics Resources 6.7 to identify enriched GO terms against the background of Homo sapiens. A two-tailed Fisher's exact test was employed to test the enrichment of the protein-containing IPI entries against all IPI proteins. Correction for multiple hypothesis testing was carried out using standard false discovery rate control methods. GO terms with a corrected p-value < 0.05 were considered significant.
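The enrichment statistics described above can be illustrated with a standard-library sketch of a two-tailed Fisher's exact test on a 2×2 table, followed by a multiple-testing correction. This is not DAVID's implementation: the function names are hypothetical, and the Benjamini-Hochberg procedure is assumed here as one common "standard false discovery rate control method", since the report does not name the method used.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: proteins in the list annotated with the term
    b: proteins in the list without the term
    c: background proteins (outside the list) with the term
    d: background proteins (outside the list) without the term
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of a table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four terms, not project results.
raw_p = [0.01, 0.04, 0.03, 0.005]
corrected_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in corrected_p]
```

A term would then be reported as enriched when its corrected p-value is below 0.05, matching the cutoff stated in the text.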

Enrichment of pathway analysis
The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used to identify enriched pathways with the Functional Annotation Tool of DAVID against the background of Homo sapiens. A two-tailed Fisher's exact test was employed to test the enrichment of the protein-containing IPI entries against all IPI proteins. Correction for multiple hypothesis testing was carried out using standard false discovery rate control methods. Pathways with a corrected p-value < 0.05 were considered significant. These pathways were classified into hierarchical categories according to the KEGG website.

Enrichment of protein domain analysis
For each category of proteins, the InterPro database (a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites) was searched using the Functional Annotation Tool of DAVID against the background of Homo sapiens. A two-tailed Fisher's exact test was employed to test the enrichment of the protein-containing IPI entries against all IPI proteins. Correction for multiple