Identification of Drosophila centromere associated proteins by quantitative affinity purification-mass spectrometry

Centromeres of higher eukaryotes are epigenetically defined by the centromere-specific histone H3 variant CENP-A^CID. CENP-A^CID forms the foundation for the assembly of a large protein network. In contrast to mammalian systems, the protein composition of Drosophila centromeres has not been comprehensively investigated. Here we describe the proteome of Drosophila melanogaster centromeres as analyzed by quantitative affinity purification-mass spectrometry (AP-MS). Input chromatin for AP-MS was prepared from D. melanogaster cell lines expressing CENP-A^CID or H3.3 fused to EGFP as baits. Proteins enriched in centromeric chromatin were identified based on their relative abundance in CENP-A^CID-GFP purifications compared with H3.3-GFP or mock affinity purifications. The analysis yielded 86 proteins specifically enriched in centromeric chromatin preparations. The data accompanying the research article describing this approach (Barth et al., 2014, Proteomics 14:2167-78, DOI: 10.1002/pmic.201400052) have been deposited to the ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD000758.

In order to discriminate centromere-specific proteins from proteins that are abundant throughout chromatin, we quantitatively compared proteomes isolated from different chromatin regions. To isolate these regions biochemically, we made use of Drosophila cell lines expressing GFP-tagged histone H3 variants as baits: either the replacement variant H3.3, which is enriched in euchromatin, or the centromere-specific H3 variant CENP-A^CID. Chromatin from these cells and from the parental cell line was isolated and solubilized by micrococcal nuclease digestion. This soluble extract served as input material for anti-GFP affinity purification, which enriches chromatin fragments together with their associated proteins. Repeated washes were performed to remove nonspecifically bound contaminants. The associated proteins were eluted under denaturing conditions and fractionated by SDS-PAGE. After in-gel tryptic digestion, peptides were extracted, concentrated and analyzed by LC-MS/MS. Three independent AP-MS experiments were performed for each cell line. Intensity-based absolute quantification (iBAQ) values from the output of the MaxQuant software package were used as a measure of the abundance of identified proteins. Average iBAQ values were calculated for each sample; if a protein was not detected, its iBAQ values were imputed from a random distribution (see Section 2.5). Centromere enrichment was calculated by dividing the average iBAQ value of each protein in the CENP-A^CID-GFP purification by the corresponding value in the purification from either the untransfected parental cell line or the H3.3-GFP-expressing cell line. A factor was considered centromere-specific if its log2-fold enrichment over both controls was greater than four, i.e., more than 16-fold. Using these criteria, we identified 86 proteins that were specifically enriched in CENP-A^CID-GFP-containing chromatin (Table 1 and [1]).
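The enrichment filter described above can be sketched in a few lines of Python. This is a minimal illustration, assuming three replicate iBAQ values per purification with missing values already imputed; the function names, protein names and intensities are hypothetical and are not data from this study.

```python
import math
from statistics import mean

# Threshold from the text: log2-fold enrichment must exceed 4 (i.e. > 16-fold)
# over BOTH controls (parental/mock and H3.3-GFP).
LOG2_THRESHOLD = 4.0


def log2_enrichment(bait_ibaq, control_ibaq):
    """log2 of (mean bait iBAQ / mean control iBAQ).

    Assumes missing values were already imputed, so every list holds
    positive numbers (one per replicate).
    """
    return math.log2(mean(bait_ibaq) / mean(control_ibaq))


def is_centromere_specific(cid, h33, mock, threshold=LOG2_THRESHOLD):
    """True if the CENP-A^CID-GFP signal exceeds both controls by > threshold."""
    return (log2_enrichment(cid, h33) > threshold
            and log2_enrichment(cid, mock) > threshold)


def screen(table, threshold=LOG2_THRESHOLD):
    """Return the sorted names of proteins passing the filter.

    table maps a protein name to a (cid, h33, mock) tuple of replicate iBAQ lists.
    """
    return sorted(name for name, (cid, h33, mock) in table.items()
                  if is_centromere_specific(cid, h33, mock, threshold))


# Hypothetical values for two made-up proteins (not data from this study).
example = {
    "protein_A": ([2e6, 3e6, 2.5e6], [1e5, 1.2e5, 0.8e5], [5e4, 5e4, 5e4]),
    "protein_B": ([2e6, 3e6, 2.5e6], [5e5, 5e5, 5e5], [5e4, 5e4, 5e4]),
}
print(screen(example))  # ['protein_A']
```

Here protein_B fails because its enrichment over the H3.3-GFP control is only 5-fold (log2 of about 2.3), even though it is strongly enriched over the mock control; requiring both comparisons is what separates centromere-specific factors from general chromatin proteins.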
Known CENP-A^CID-binding proteins such as Cal1, the centromeric protein CENP-C, and CAF-1 were also enriched in CENP-A^CID chromatin, demonstrating that the technique can detect proteins enriched in centromeric chromatin [2]. Although centromere association has not been reported so far for most of the 86 identified proteins, the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) predicts several associations among them (Fig. 2) [3]. This suggests that a complex network of interactions contributes to centromere function or maintenance.

Cell culture
The Drosophila Schneider line 2 (S2)-derived L2-4 cell line was used for all experiments. Cells were maintained at 25 °C in Schneider's Drosophila medium supplemented with 10% fetal calf serum and penicillin/streptomycin. Stable cell lines were established by X-tremeGENE HP-mediated transfection of GFP-fusion expression constructs, followed by four weeks of selection with Hygromycin B (100 µg/mL).

Chromatin preparation for AP-MS
Asynchronously growing cells were harvested by centrifugation and washed in PBS. Cells were resuspended in ice-cold hypotonic buffer (10 mM HEPES, pH 7.6; 15 mM NaCl; 1.5 mM MgCl2; 0.1 mM DTT; freshly added protease inhibitors: PMSF, aprotinin, leupeptin, pepstatin) and lysed for 10 min on ice by adding Triton X-100 to a final concentration of 0.1%. Nuclei were pelleted by centrifugation and washed with PBS, and chromatin was solubilized for 10 min at 26 °C by micrococcal nuclease (MNase) digestion in EX100 buffer (10 mM HEPES, pH 7.6; 100 mM NaCl; 1.5 mM MgCl2; 0.5 mM EGTA; 2 mM CaCl2; 10% (v/v) glycerol; freshly added protease inhibitors) containing 2000 U MNase per one billion nuclei. Chromatin was released by increasing the NaCl concentration to 300 mM and applying ten strokes in a Dounce homogenizer with a tight-fitting pestle. After a 1 h incubation at 4 °C, insoluble material was pelleted for 20 min at 20,000 × g, and the supernatant was precleared with Protein A Sepharose beads, yielding the AP-MS input extract.
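The two scaling steps in this protocol, MNase dosing per nuclei count and raising NaCl from 100 mM to 300 mM, are simple arithmetic. The Python sketch below assumes a hypothetical batch of 2.5 × 10^8 nuclei in 1.0 mL of EX100 and a 5 M NaCl stock; the text does not state the extract volume or how the salt was added, so both are assumptions, as are the helper names.

```python
def mnase_units(n_nuclei, units_per_billion=2000.0):
    """MNase units for a given number of nuclei (2000 U per 10^9 nuclei, from the text)."""
    return units_per_billion * n_nuclei / 1e9


def stock_volume_to_add(volume, c_initial, c_final, c_stock):
    """Volume of a concentrated stock that raises a solute from c_initial to c_final.

    Solves (volume * c_initial + v * c_stock) / (volume + v) = c_final for v.
    Concentrations must share one unit (e.g. mM); the result has the unit of volume.
    """
    if not c_initial < c_final < c_stock:
        raise ValueError("need c_initial < c_final < c_stock")
    return volume * (c_final - c_initial) / (c_stock - c_final)


# Hypothetical batch: 2.5 x 10^8 nuclei digested in 1.0 mL EX100 (100 mM NaCl),
# then raised to 300 mM NaCl with an assumed 5 M (5000 mM) NaCl stock.
units = mnase_units(2.5e8)                        # 500.0 U
v_ml = stock_volume_to_add(1.0, 100, 300, 5000)   # ~0.0426 mL, i.e. ~42.6 uL
print(f"{units:.0f} U MNase, add {v_ml * 1000:.1f} uL of 5 M NaCl")
```

Solving the mass balance rather than using c1V1 = c2V2 accounts for the extract already containing 100 mM NaCl; the added stock also dilutes the other buffer components by roughly 4% in this example.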

Affinity purification and sample preparation for mass spectrometry
GFP-Trap agarose beads (ChromoTek) were used as the affinity resin. The beads were preblocked with 0.5% BSA and 0.5% polyvinylpyrrolidone in EX100 buffer by overhead rotation for 30 min. The input extract was added to the blocked beads, and affinity purification was performed for 2 h at 4 °C on an overhead rotator. The resin with bound complexes was washed three times for 5 min at 4 °C with EX300 buffer, and bound proteins were eluted by boiling the beads in Laemmli buffer for 10 min at 95 °C. Eluted proteins were separated by SDS-PAGE on a 15% polyacrylamide gel, which was stained with Coomassie Brilliant Blue G-250. Whole lanes were excised from the gel with a disposable grid cutter (Gel Company) and divided into eight fractions in separate vials. Following destaining, reduction of disulfide bonds with dithiothreitol and alkylation with iodoacetamide, in-gel tryptic digestion was performed. The resulting peptides were collected by acid extraction of the gel pieces, concentrated by evaporation and resuspended in 0.1% TFA.

LC-MS/MS
Peptides were injected into an UltiMate 3000 HPLC system (Thermo Fisher Scientific). Samples were desalted online on a C18 micro-column (5 mm × 300 µm i.d., packed with C18 PepMap™, 5 µm, 100 Å, Thermo Fisher Scientific), and peptides were separated with a gradient from 5% to 60% acetonitrile in 0.1% formic acid over 40 min at 300 nL/min on a C18 analytical column (10 cm × 75 µm, packed in house with C18 PepMap™, 3 µm, 100 Å, Thermo Fisher Scientific). The HPLC effluent was infused directly into an LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific) via a nano-electrospray ion source. The instrument was operated in data-dependent mode to switch automatically between full-scan MS and MS/MS acquisition. Survey full-scan MS spectra (m/z 350-2000) were acquired in the Orbitrap at a resolution of 60,000 at m/z 400. For all measurements with the Orbitrap detector, three lock-mass ions from ambient air (m/z 371.10123, 445.12002 and 519.13882) were used for internal calibration as described [4]. The six most intense peptide signals with charge states between 2+ and 5+ were sequentially isolated using a 1 Da window centered on the most abundant isotope, to a target value of 10,000, and fragmented in the linear ion trap by collision-induced dissociation. Fragment ion spectra were recorded in the linear ion trap. Typical mass spectrometric conditions were as follows: spray voltage, 1.4 kV; no sheath or auxiliary gas flow; heated capillary temperature, 200 °C; activation time, 30 ms; normalized collision energy for collision-induced dissociation in the linear ion trap, 35%.
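The arithmetic behind lock-mass calibration, and behind the ppm tolerances used in the database search, can be sketched in Python. This assumes a single constant relative (ppm) offset per spectrum, which is a simplification: it does not reproduce the instrument software's actual lock-mass correction. The lock-mass values come from the text; the peptide m/z, the +8 ppm drift and the function names are invented for illustration.

```python
# Lock-mass ions listed in the text (m/z values).
LOCK_MASSES = (371.10123, 445.12002, 519.13882)


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def recalibrate(mz_values, observed_lock_masses, reference=LOCK_MASSES):
    """Remove the mean relative (ppm) shift of the lock-mass ions from all m/z values.

    Simplified model: one constant ppm offset per spectrum. Returns
    (corrected m/z list, mean shift in ppm).
    """
    shifts = [ppm_error(o, r) for o, r in zip(observed_lock_masses, reference)]
    mean_shift = sum(shifts) / len(shifts)
    factor = 1.0 + mean_shift * 1e-6
    return [mz / factor for mz in mz_values], mean_shift


def within_tolerance(observed, theoretical, tol_ppm):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Hypothetical spectrum drifted by +8 ppm: outside a 6 ppm window before correction.
drift = 1 + 8e-6
observed_locks = [m * drift for m in LOCK_MASSES]
peptide_true = 722.3600              # made-up precursor m/z
peptide_obs = peptide_true * drift
corrected, shift = recalibrate([peptide_obs], observed_locks)
print(round(shift, 3), within_tolerance(peptide_obs, peptide_true, 6),
      within_tolerance(corrected[0], peptide_true, 6))  # 8.0 False True
```

Because the error is expressed relative to m/z, a constant ppm shift measured at three known ions can be removed from every other peak in the spectrum; this is why a tight 6 ppm window becomes usable after recalibration.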

Protein identification and statistical analysis
For protein identification, the raw data were analyzed with the Andromeda search engine of the MaxQuant software package (version 1.2.2.5) against the FlyBase dmel-all-translation-r5.24.fasta database, including reversed sequences and contaminants. Trypsin/P was selected as the enzyme, allowing a maximum of two missed cleavages. Carbamidomethylation of cysteine was set as a fixed modification; methionine oxidation and protein N-terminal acetylation were included as variable modifications. The precursor mass tolerance was 20 ppm for the initial search and 6 ppm for the main search after recalibration. Fragment ions were searched with a mass tolerance of 0.5 Da, using the six most intense signals per 100 Da. Searching for secondary peptide hits within already assigned MS/MS spectra was enabled. The search results were filtered at a peptide and protein false discovery rate of 0.01, with a minimum peptide length of six amino acids. Protein identifications with at least one unique peptide were accepted. For quantification, iBAQ values were calculated from the peptide intensities and the protein sequence information [5], using unmodified, methionine-oxidized and N-terminally acetylated peptide species and requiring a minimum of two peptides per protein.
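iBAQ divides the summed intensity of a protein's peptides by the number of theoretically observable peptides of that protein. The Python sketch below illustrates the idea, assuming that fully cleaved peptides of 6-30 residues count as observable (the 30-residue upper bound is our assumption, not a parameter stated here) and ignoring MaxQuant's peptide-to-protein assignment and any normalization. The protein sequence, intensities and helper names are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """Fully cleaved peptides, cutting C-terminal to every K or R.

    Mirrors the Trypsin/P rule (cleavage also before proline);
    needs Python >= 3.7 for splitting on a zero-width lookbehind.
    """
    return [p for p in re.split(r"(?<=[KR])", sequence) if p]


def observable_count(sequence, min_len=6, max_len=30):
    """Number of theoretically observable peptides (length window is an assumption)."""
    return sum(1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len)


def ibaq(peptide_intensities, sequence, min_len=6, max_len=30):
    """Summed peptide intensity divided by the number of observable peptides."""
    n = observable_count(sequence, min_len, max_len)
    if n == 0:
        raise ValueError("protein has no theoretically observable peptides")
    return sum(peptide_intensities) / n


# Made-up protein: 'AK' is too short, the three 7-mers count as observable.
seq = "AKMAAAAAKGGGGGGRPPPPPPK"
print(tryptic_peptides(seq))       # ['AK', 'MAAAAAK', 'GGGGGGR', 'PPPPPPK']
print(ibaq([3e6, 1e6, 2e6], seq))  # 2000000.0
```

Normalizing by the number of observable peptides corrects for the fact that larger proteins yield more peptides and hence more summed signal, which makes iBAQ values roughly comparable between proteins within a sample.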
In preparation for statistical analysis, protein hits representing reversed sequences or contaminants, as well as protein hits without quantification values, were removed from the lists of proteins identified in the three biological replicates. iBAQ values were log2-transformed, and missing values were subsequently imputed from a random distribution centered at 1/3 × log2 of the obtained experimental data. The imputation was repeated three times to reduce the effects of the random value distribution. ANOVA was applied in DanteR (version 0.2, PNNL, Richland, WA, USA) [6] to calculate protein enrichment factors and p-values, and the obtained p-values were corrected for multiple hypothesis testing by the Benjamini-Hochberg method [7].
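The study performed the multiple-testing correction in DanteR. For readers who want to check adjusted values independently, the Python function below is a sketch of the standard Benjamini-Hochberg step-up adjustment (the same procedure as R's `p.adjust(method = "BH")`); it is not DanteR's code, and the example p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Step-up procedure: rank p-values ascending, scale each by m / rank, then
    enforce monotonicity with a running minimum from the largest rank down,
    capping at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical ANOVA p-values for four proteins.
p = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(p)])  # [0.02, 0.04, 0.04, 0.02]
```

The running minimum guarantees that a protein with a smaller raw p-value never receives a larger adjusted value than one ranked above it, which is why 0.03 and 0.04 both end up at 0.04 here.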

STRING analysis
Protein names from Table 1 were subjected to STRING analysis using the web tool available at http://string-db.org/ [3]. Fig. 2 shows the confidence view with the prediction methods "Experiments", "Databases" and "Textmining" active and a medium confidence score threshold (0.4).
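When evidence channels are switched off in STRING, the displayed score is recomputed from the remaining channels. The Python sketch below illustrates that idea, assuming the combination scheme described in STRING's documentation as we understand it: a prior of 0.041 is removed from each channel score, the channels are combined as independent evidence, and the prior is added back. Real STRING scores include further corrections (e.g. for homology), so this is not the web tool's code; the protein names, channel keys and scores are hypothetical.

```python
PRIOR = 0.041  # assumed STRING prior probability of a random interaction

ACTIVE = ("experiments", "databases", "textmining")  # channels active in Fig. 2
MEDIUM_CONFIDENCE = 0.4


def remove_prior(score, prior=PRIOR):
    """Strip the prior from one channel score (scores below the prior count as 0)."""
    score = max(score, prior)
    return (score - prior) / (1.0 - prior)


def combine_channels(scores, prior=PRIOR):
    """Combine channel scores as independent evidence, then re-add the prior."""
    miss = 1.0
    for s in scores:
        miss *= 1.0 - remove_prior(s, prior)
    return (1.0 - miss) * (1.0 - prior) + prior


def filter_edges(edges, active=ACTIVE, threshold=MEDIUM_CONFIDENCE):
    """Keep edges whose score over the active channels reaches the threshold.

    edges: iterable of (protein_a, protein_b, {channel: score in [0, 1]}).
    """
    kept = []
    for a, b, channels in edges:
        score = combine_channels(channels.get(c, 0.0) for c in active)
        if score >= threshold:
            kept.append((a, b, round(score, 3)))
    return kept


# Hypothetical edges; coexpression is ignored because it is not an active channel.
edges = [
    ("P1", "P2", {"experiments": 0.5, "textmining": 0.3}),
    ("P3", "P4", {"experiments": 0.15, "textmining": 0.1, "coexpression": 0.95}),
]
print(filter_edges(edges))
```

In this toy example the P3-P4 edge would pass the 0.4 cutoff if coexpression were active, but drops out once only the three channels used for Fig. 2 contribute; the choice of active channels therefore directly shapes which predicted associations appear in the network.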