Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes

Our analysis examines the conservation of multiprotein complexes among metazoa through use of high-resolution biochemical fractionation and precision mass spectrometry applied to soluble cell extracts from 5 representative model organisms: Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Strongylocentrotus purpuratus, and Homo sapiens. The interaction network obtained from the data was validated globally in 4 distant species (Xenopus laevis, Nematostella vectensis, Dictyostelium discoideum, Saccharomyces cerevisiae) and locally by targeted affinity-purification experiments. Here we provide details of our massive set of supporting data: biochemical fractionation data available via ProteomeXchange (PXD002319-PXD002328), PPIs available via BioGRID (185267), and interaction network projections available at http://metazoa.med.utoronto.ca, all made fully accessible to allow further exploration. The datasets here are related to the research article on metazoan macromolecular complexes in Nature [1].



Value of the data
Macromolecular complexes drive essential biological processes, yet their ubiquity across phyla is unclear. By applying a human-centric approach to the merged data for 5 species obtained through fractionation and mass spectrometry, followed by computational analysis, we identified 16,655 high-confidence protein-protein interactions and 981 putative functional modules encompassing 2153 broadly conserved proteins found in virtually all multicellular eukaryotes.
We further projected a draft conservation map of 41 million putative high-confidence co-complex interactions for 122 species with fully sequenced genomes, encompassing functional modules present broadly across all extant animals.
Functional analysis subsequently revealed metazoan-specific complexes responsible for cell-cell communication, development and disease, and ancient complexes extant for ~1 billion years with central housekeeping roles. This reconstructed physical interaction network provides mechanistic insights into the unique organization and evolution of animal cells.
Despite the vast array of information available for many multi-cellular organisms, our data reveals fundamental attributes of the macromolecular machinery of animal cells with clear ubiquitous relevance to metazoan biology, development and evolution.
Although the research article focused on global conservation properties, these datasets can be analyzed at the level of individual animal species or individual complexes by researchers in the community to assess the variety and functional adaptations of particular protein assemblies across phyla.

Data
We collected 6387 biochemical fractions from 69 different experiments and analyzed them by quantitative mass spectrometry to derive soluble multiprotein complexes from 5 representative model organisms, with 4 other organisms used for validation. Altogether, the selected organisms cover a span of over half a billion years of evolutionary divergence from human, and also play a vital role as model organisms in biology and disease research. We selected 5 human cell lines, 2 cell types in fly, and cells from 5 different developmental stages in sea urchin, along with whole-body lysate from worm and embryonic stem cells from mouse. These 5 species were used in deriving interactions and complexes, while 4 additional species (frog, sea anemone, yeast and amoeba) were subjected to the same preparation and quantitation methods but used only in validation. The species, cell types, and fractionation methods used are provided as Supplementary information with the research article.

D. discoideum
One liter of AX4 cells was grown in HL5 medium, harvested at a cell density of 4-5 × 10^6 cells/ml, transferred to 17 mM phosphate buffer for 2 h, pelleted and frozen in aliquots of 5 × 10^8 cells each. Cell pellets were resuspended in lysis buffer and lysed by sonication as mentioned above. Prior to biochemical fractionation, nucleic acids were removed by treating soluble protein extracts with benzonase nuclease (100 m/ml; Millipore, USA) on ice for 30 min.

D. melanogaster
Nuclear extracts of SL2 cells [5,6] and whole cell extracts were prepared by harvesting cells using centrifugation at 1200 rpm for 10 min at 4°C, removing medium by aspiration, and washing cells twice with cold PBS. Cell pellets were resuspended in lysis buffer (20 mM Hepes/KOH pH 7.6, 200 mM KCl, 10% glycerol, 0.1% NP-40, 1 mM DTT, PMSF, aprotinin, leupeptin and pepstatin) and incubated on ice for 5 min. Cells were lysed by three cycles of freezing in liquid nitrogen and thawing in a 26°C water bath, and the extract was clarified at 13,000 rpm for 30 min at 4°C before being aliquoted and snap-frozen at −80°C for subsequent analysis.

H. sapiens
Human neural stem cell line CB660 and human brain tumor stem cell line G166 were obtained from Patrick Paddison (Fred Hutchinson Cancer Research Center, Seattle). The cell pellet was resuspended in lysis buffer [10 mM Tris-HCl (pH 8.0), 10 mM KCl, 1.5 mM MgCl2, 0.5 mM DTT, and 1x Protease Inhibitor Cocktail Set I (Calbiochem)] and centrifuged at 1000g for 5 min (4°C). The supernatant was saved as the cytosolic fraction. The pellet was resuspended in 250 mM sucrose/10 mM MgCl2/1x Protease Inhibitor Cocktail, layered over a sucrose cushion of 880 mM sucrose/0.5 mM MgCl2/1x Protease Inhibitor Cocktail, and centrifuged at 3000g for 10 min (4°C). The resulting pellet was resuspended in lysis buffer with 5% NP-40 in a sonicating water bath (15 min), followed by centrifugation at 3500g for 10 min, and the supernatant was saved as the nuclear fraction [2].

N. vectensis
Unfertilized sea anemone egg samples were collected by centrifugation, and seawater was aspirated away by vacuum. Samples were resuspended in 5 volumes of ice-cold lysis buffer and washed three times. The lysis buffer (40 mM NaCl, 2.5 mM MgCl2, 300 mM glycine, 100 mM potassium gluconate, 2% glycerol, 50 mM HEPES pH 6.9, 4.19 mM CaCl2, 10 mM EGTA) was supplemented with fresh protease and phosphatase inhibitors (1 μM PEFABLOC, 10 μM Protease Inhibitor Cocktail 3 (Calbiochem), 1 mM Na orthovanadate, 100 μM NaF). Suspensions were transferred to a chilled glass homogenizer on ice and allowed to settle, the buffer was removed, and the samples were disrupted by hand using 5-10 strokes of a loose-fitting pestle on ice until 100% lysis was obtained. Lysates were centrifuged at 10,000g for 15 min at 4°C, and the clarified supernatants were removed for analysis.

S. purpuratus
Four stages of sea urchin early embryonic development were analyzed: unfertilized eggs, 5 min post fertilization, 2-cell and hatched blastula. Samples were collected by centrifugation, and seawater was aspirated away by vacuum. Samples were resuspended in 5 volumes of ice-cold lysis buffer and washed three times. The lysis buffer contained 40 mM NaCl, 2.5 mM MgCl2, 300 mM glycine, 100 mM potassium gluconate, 2% glycerol and 50 mM HEPES, adjusted with KOH to pH 6.9 (with 4.19 mM CaCl2, 10 mM EGTA) for unfertilized eggs and to pH 7.4 (with 8.56 mM CaCl2, 10 mM EGTA) for fertilized embryos, plus freshly added protease and phosphatase inhibitors (1 mM PEFABLOC, 10 mM Protease Inhibitor Cocktail 3 (Calbiochem), 1 mM Na orthovanadate, 100 mM NaF). Suspensions were transferred to a chilled glass homogenizer on ice and allowed to settle, the buffer was removed, and the samples were disrupted by hand using 5-10 strokes of a loose-fitting pestle on ice until 100% lysis was obtained. Lysates were centrifuged at 10,000g for 15 min at 4°C, and the clarified supernatants were removed for analysis. Protein concentrations were measured by Bradford assay. Affinity bead (SeraFILE PROspector) based sample pre-separations were performed as per the manufacturer's instructions.
The above lysates, or fractions from beads, for all species except frog were subjected to ion exchange fractionation on an Agilent 1100 HPLC system. Proteins from each HPLC fraction were precipitated, resuspended, digested in solution with trypsin, dried and resolubilised [2] before being analyzed by LC-MS/MS using a nanoflow HPLC system (EASY-nLC; Proxeon) coupled to an LTQ Orbitrap Velos (Thermo Fisher).

X. laevis
Extracts were prepared from 750 stage 15 embryos, from 1000 dissected animal caps allowed to develop to stage 19-20, or from adult male heart and liver. All steps were performed on ice or at 4°C unless otherwise noted. Embryos were washed in X Buffer (10 mM Tris pH 7.5, 20 mM KCl, 5 mM MgCl2, with 50 mg/ml cycloheximide and a 1:100 volume of Protease Inhibitor Cocktail Set I (Calbiochem) added freshly), and glass dounce homogenized in an equal volume of X Buffer using 20 strokes of loose and tight pestles. After an initial centrifugation at 1000g for 10 min, the supernatant was further clarified by recentrifugation at 15,000g for 10 min. Animal caps were washed in Steiner's medium, liquid was removed, and the tissue was stored at −80°C until use. After dounce homogenization, the sample was probe-tip sonicated with 2 pulses (30 s, 30% power) prior to clarification by centrifugation as above. Heart and liver were dissected from one frog, minced with a razor blade, disrupted with a Tissue Tearor (BioSpec Products) in an equal volume of X Buffer, and glass dounce homogenized with a loose pestle, and large debris was pelleted at 1000g for 1 min. The supernatant was dounced with a tight pestle and then clarified as for the embryo extract. In a replicate experiment, clarified heart and liver homogenates were frozen, subsequently pooled, depleted of hemoglobin using HemogloBind (Biotech Support Group) according to the manufacturer's instructions, and clarified at 15,000g for 10 min prior to sucrose gradient fractionation. Protein concentration was determined with the BioRad Protein Assay, using BSA as a standard. Gradients were formed by layering 3.5 ml/3.9 ml/3.9 ml, respectively, of 47/26.5/7% sucrose in X Buffer without protease inhibitors and were allowed to equilibrate during horizontal storage at 4°C for 1.5-3 h prior to loading. [Gradients used in fractionating Xenopus laevis heart/liver homogenate containing hemoglobin underwent minor mixing.]
After loading 200-500 μl of extract (containing 1.8 mg embryo, 0.6 mg animal cap, 2.5 mg each pooled heart/liver extract, or 2 mg hemoglobin-depleted pooled heart/liver), the gradients were centrifuged at 35,000 rpm for 1.5 h at 4°C in an SW41 rotor with braking to 800 rpm. Fractions were collected by volume displacement through a UV flow cell monitoring absorbance at 254 nm. Proteins were precipitated with trichloroacetic acid or, for the more dilute animal cap gradient fractions, UPPA-Protein-Concentrate (G Biosciences), and washed with ice-cold acetone. The air-dried pellets were resuspended in 0.1 M Tris pH 8.1 for in-solution digestion with trypsin, followed by analysis on an LTQ Orbitrap Velos (Thermo Fisher).

Data processing protocol
Target-decoy databases were constructed from protein sequences downloaded from ENSEMBL when available (Homo sapiens, Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae), and otherwise from databases of the main species-specific genomics community (Strongylocentrotus purpuratus: spbase.org, Dictyostelium discoideum: dictybase.org, Nematostella vectensis: genome.jgi-psf.org and X. laevis: http://www.marcottelab.org/index.php/Xenopus_Genome_Project). The sequences were processed to retain only the longest sequence for each gene. This simplifies orthology mapping between species, which mattered because determination of conserved complexes was the primary focus of the project.
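The longest-sequence-per-gene filter can be illustrated with a minimal sketch. It is not the script used for this dataset: the function names are hypothetical, and it assumes each FASTA header carries a `gene:<ID>` field, as Ensembl peptide files do; other sources would need a different pattern.

```python
import re

# Assumption: headers contain "gene:<ID>" (Ensembl-style). Adjust for other sources.
GENE_RE = re.compile(r"gene:(\S+)")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(text):
    """Return {gene_id: (header, sequence)} keeping the longest sequence per gene."""
    best = {}
    for header, seq in parse_fasta(text):
        match = GENE_RE.search(header)
        if match is None:
            continue  # records without a gene field are skipped in this sketch
        gene = match.group(1)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Toy input: two isoforms of gene G1 and one sequence for gene G2.
example = """>P1 pep gene:G1
MKT
>P2 pep gene:G1
MKTAYIAK
>P3 pep gene:G2
MSS
"""
kept = longest_per_gene(example)
```

With one representative per gene, each ortholog group maps to a single sequence per species, which keeps cross-species profile comparison unambiguous.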
To improve the sensitivity and accuracy of peptide-spectral matching for MS2 peptide identification and spectral count quantitation, mass spectra were searched using 3 search engines (Tide, INSPECT and MSGFDB), each employing a different search methodology, and the spectral counts were integrated probabilistically using MSblender [3]. Compared to using Sequest alone, this increased the total peptide-spectral matches and proteins identified by 20-60%, depending on the sample, with a false discovery rate of <1% for each sample. To eliminate spurious associations between proteins with high sequence similarity, such as close homologs, only unique peptides were retained. Tide search output is in a format that was processed for best hits with MSblender; MSGFDB and INSPECT search output is provided directly, when available. The result was a total of 10.2 million peptide-spectral matches from the 5 species integrated into the metazoan complex map. The mass spectrometry search output files are available for download from ProteomeXchange.
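As a rough illustration of how a target-decoy search yields a false discovery rate, the sketch below picks a score cutoff at which the ratio of decoy to target matches stays at or below 1%. It is a generic textbook estimate, not the probabilistic integration performed by MSblender; the function name and the `(score, is_decoy)` input format are assumptions made for this example.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR is estimated as (#decoy hits) / (#target hits) above the cutoff.
    Returns None if no cutoff satisfies the bound.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy data: 200 target matches (scores 101..300) and 3 decoy matches.
toy_psms = [(float(s), False) for s in range(101, 301)]
toy_psms += [(150.5, True), (120.5, True), (110.5, True)]
toy_cutoff = score_cutoff_at_fdr(toy_psms)
```

In this toy set, accepting matches scoring below the returned cutoff would admit a second decoy before enough additional targets accumulate, pushing the estimate above 1%.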

MS1 and MS2 protein identification and quantitation
We further used MS1 intensities to improve the accuracy of protein quantitation with PepQuant [4]. To prepare cleaner protein count profiles, we filtered the protein quantitation to retain only proteins identified previously in a given sample using the MS2 spectral count methods described in the preceding section.
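The filtering step amounts to intersecting the MS1-quantified proteins with those identified by MS2 in the same sample. A minimal sketch follows, assuming profiles are held as dictionaries keyed by protein identifier; the function name, data layout and `min_count` threshold are illustrative and not taken from the deposited files.

```python
def filter_ms1_by_ms2(ms1_profiles, ms2_counts, min_count=1):
    """Keep MS1 profiles only for proteins with MS2 spectral-count support.

    ms1_profiles: {protein_id: [intensity per fraction]}
    ms2_counts:   {protein_id: [spectral count per fraction]}
    min_count:    hypothetical threshold on total spectral counts in the sample
    """
    identified = {
        protein for protein, counts in ms2_counts.items()
        if sum(counts) >= min_count
    }
    return {
        protein: profile for protein, profile in ms1_profiles.items()
        if protein in identified
    }


# Toy sample with three MS1-quantified proteins, one lacking MS2 evidence.
ms1 = {"A": [0.0, 5e6, 1e6], "B": [2e5, 0.0, 0.0], "C": [0.0, 0.0, 3e5]}
ms2 = {"A": [0, 7, 2], "B": [0, 0, 0]}
filtered = filter_ms1_by_ms2(ms1, ms2)
```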
The MS1 intensity and MS2 spectral count elution profiles were used to derive four different correlation scores for a given protein pair: (1) Pearson correlation with added Poisson noise, (2) weighted cross-correlation, and (3) co-apex score, all from MS2, and (4) Euclidean distance from MS1, using the open-source SciPy Python library [8]. The correlation profiles for the four test species were mapped back to their human orthologs. These scores, along with external biochemical [9,10] and functional evidence [11], were used as input for machine learning to predict conserved protein-protein interactions and complex co-memberships.
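The four pairwise scores can be sketched with NumPy and SciPy as below. This is an illustration under stated assumptions, not the code used to produce the deposited scores: the noise rate `lam`, the number of repeats, the triangular lag weighting and its `width` are all choices made for this example, and the profiles are toy data.

```python
import numpy as np
from scipy.spatial.distance import euclidean
from scipy.stats import pearsonr


def pearson_poisson(a, b, n_repeats=100, lam=1.0, seed=0):
    """Mean Pearson r after adding Poisson noise to both count profiles."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    rs = []
    for _ in range(n_repeats):
        noisy_a = a + rng.poisson(lam, size=a.shape)
        noisy_b = b + rng.poisson(lam, size=b.shape)
        rs.append(pearsonr(noisy_a, noisy_b)[0])
    return float(np.mean(rs))


def _lagged_sum(a, b, width):
    """Cross-products summed over lags |r| < width with weights 1 - |r|/width."""
    total = float(np.dot(a, b))
    for lag in range(1, width):
        weight = 1.0 - lag / width
        total += weight * float(np.dot(a[:-lag], b[lag:]) + np.dot(a[lag:], b[:-lag]))
    return total


def weighted_cross_correlation(a, b, width=2):
    """Lag-tolerant similarity, normalised so identical profiles score 1."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.sqrt(_lagged_sum(a, a, width) * _lagged_sum(b, b, width))
    return _lagged_sum(a, b, width) / denom if denom > 0 else 0.0


def co_apex(a, b):
    """1.0 if both profiles peak in the same fraction, else 0.0."""
    return 1.0 if int(np.argmax(a)) == int(np.argmax(b)) else 0.0


# Toy elution profiles over six fractions: x and y co-elute, z elutes later.
x = [0, 2, 10, 3, 0, 0]
y = [0, 1, 8, 4, 0, 0]
z = [0, 0, 0, 1, 9, 2]

scores_xy = (pearson_poisson(x, y), weighted_cross_correlation(x, y),
             co_apex(x, y), euclidean(x, y))
scores_xz = (pearson_poisson(x, z), weighted_cross_correlation(x, z),
             co_apex(x, z), euclidean(x, z))
```

The added noise damps spuriously high correlations between proteins observed with only a handful of spectra, while the lag-weighted score tolerates peaks offset by one fraction. Note that Euclidean distance is a dissimilarity, so smaller values indicate closer co-elution.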
The MS1 and MS2 elution profiles, correlation scores and orthology mapping files used are available for download from the supporting website http://metazoa.med.utoronto.ca.