Hepatic proteome network data in zebrafish (Danio rerio) liver following dieldrin exposure

Dieldrin is an environmental contaminant that adversely affects aquatic organisms. The data presented in this study are proteomic data collected from the liver of zebrafish exposed to the pesticide through the diet. For label-free proteomics, data were collected with a quadrupole Time-of-Flight mass spectrometer, and for iTRAQ proteomics, data were acquired using a hybrid quadrupole Orbitrap (Q Exactive) MS system. Using formic acid digestion and label-free proteomics, 2,061 proteins were identified, and among those, 103 were differentially abundant (p < 0.05 in at least one dose). In addition, iTRAQ proteomics identified 722 differentially abundant proteins in the liver of zebrafish following dieldrin treatment. The label-free approach identified 21 proteins that followed a dose-dependent response. The differentially abundant proteins identified by iTRAQ showed 26 unique expression patterns across the three doses of dieldrin. Proteins were queried for disease networks to learn more about adverse effects in the liver following dieldrin exposure. Differentially abundant proteins were related to metabolic disease, steatohepatitis and lipid metabolism disorders, drug-induced liver injury, neoplasms, tissue degeneration, and liver metastasis. The proteomics data described here are associated with a research article, “Label-free and iTRAQ proteomics analysis in the liver of zebrafish (Danio rerio) following a dietary exposure to the organochlorine pesticide dieldrin” (Simmons et al. 2019). This investigation reveals new biomarkers of toxicity and will be of interest to those studying aquatic toxicology and pesticides.

© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Data
Proteomics is used as a tool to identify biomarkers of exposure to environmental contaminants. Dieldrin is an organochlorine pesticide that can bioaccumulate in fish tissues, resulting in adverse effects within the tissue. The data presented here are proteomics data collected from the liver of zebrafish fed the organochlorine pesticide dieldrin. They were collected using two different but complementary methods (Simmons et al., 2019) [1].
Label-free relative quantification: Label-free data were acquired using a quadrupole Time-of-Flight mass spectrometer. Of 2,061 proteins identified by the database search, 1,563 proteins remained after filtering with the interquantile range estimate (Supplemental Data 1). Among those, 103 proteins (approximately 6.6%) were significantly different in abundance in at least one treatment group.
Isobaric tagging for relative and absolute quantitation (iTRAQ): iTRAQ data were acquired using a hybrid quadrupole Orbitrap (Q Exactive) MS system (Thermo Fisher Scientific, Bremen, Germany). There were 722 proteins that were regulated in one or more doses of dieldrin (DLD) (Supplemental Data 2). These proteins comprised 26 unique expression patterns based on the three doses of DLD (Table 1). The remaining 3219 proteins that were not changed are indicated in group X1. The numbers of differentially expressed proteins affected by DLD at each dose were as follows: the LOW DLD treatment resulted in 61 proteins down-regulated and 288 proteins up-regulated, the MED DLD treatment resulted in 99 proteins down-regulated and 185 up-regulated, and the HIGH DLD treatment resulted in 363 proteins down-regulated and 196 proteins up-regulated. The group that contained the highest number of proteins (n = 178) was Expression Pattern XII; these proteins were down-regulated in abundance at the highest dose of DLD. Approximately 18.3% of the proteins detected by iTRAQ were responsive to the dietary exposure (fold change cut-off of 1.2, p < 0.05).
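As a quick arithmetic check, the proportions quoted above can be recomputed from the stated counts. The sketch below assumes that the proteins detected by iTRAQ are the sum of the responsive proteins (722, the count given in the abstract) and the unchanged proteins in group X1; the variable names are illustrative only.

```python
# Counts taken from the text; names are illustrative, not from the original analysis.
label_free_filtered = 1563     # proteins remaining after filtering
label_free_significant = 103   # p < 0.05 in at least one dose

itraq_responsive = 722         # responsive proteins, as stated in the abstract
itraq_unchanged = 3219         # group X1 (assumed to be all other detected proteins)


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


label_free_pct = percent(label_free_significant, label_free_filtered)
itraq_pct = percent(itraq_responsive, itraq_responsive + itraq_unchanged)

print(label_free_pct)  # 6.6
print(itraq_pct)       # 18.3
```

Both values agree with the approximate percentages reported in the text.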
Examples of diseases associated with proteins quantified in the label-free and iTRAQ experiments are presented in Table 2, and all enrichment data are presented in Supplemental Data 3. For label-free proteomics, the numbers of disease networks identified as differentially expressed based on protein abundance changes in the LOW, MED, and HIGH dose treatments of DLD were 35, 23, and 30, respectively (Supplemental Data 3). Based on the label-free proteomics, no disease elements were common to all three groups ("LOW DLD", "MED DLD", and "HIGH DLD"), but some diseases overlapped between two of the three doses (Fig. 1). In the LOW DLD group, proteins related to lymphoma (T-cell), growth retardation, severe combined immunodeficiency, metabolic diseases, neoplasms, and pathologic processes were decreased in abundance in the liver. In the MED DLD group, edema and calcification were disease networks that were decreased in abundance in fish liver, while upregulated networks included wounds, multiple organ failure, hyperglycemia, liver neoplasms, and iron overload. In the HIGH DLD group, lipid metabolism disorders, neoplastic processes, and proteins involved in healing impairment were down-regulated as a network, while networks associated with growth retardation, tumor microenvironment, and liver diseases were increased in the liver (Table 2, Supplemental Data 3). One theme that emerged from all three treatment groups was the dysregulation of proteins related to tumor and liver disease (Fig. 2).
For iTRAQ proteomics, the numbers of disease networks identified as differentially expressed based on protein abundance changes in the LOW, MED, and HIGH dose treatments of DLD were 58, 48, and 49, respectively (Supplemental Data 3). Based on the iTRAQ proteomics, six disease elements were common to all three groups ("LOW DLD", "MED DLD", and "HIGH DLD") (Fig. 1). These six common disease networks, affected independently of dose, were seizure susceptibility, reflex epilepsy, mucosal damage, cerebellar ataxia, cartilage loss, and erythema. In the LOW DLD group, proteins related to Ehlers-Danlos Syndrome were decreased, while proteins related to neoplasm micrometastasis, carcinoma (ductal), and residual neoplasm were increased in abundance in the liver. In the MED DLD group, weight loss, acute liver failure, recurrent infection, and drug-induced liver injury were disease networks that were decreased in abundance in fish liver, while upregulated networks included acquired immunodeficiency syndrome and anxiety disorders. In the HIGH DLD group, residual neoplasms, epithelial damage, liver metastasis, and tissue degeneration were downregulated as a network, while a network associated with invasive breast cancer was increased in the liver (Supplemental Data 3). The large majority of the protein networks in the liver were suppressed at the highest dose of DLD. Example networks for disease processes are shown in Figs. 3 and 4. Lastly, five disease networks (in one or more doses) were common to the label-free and iTRAQ proteomics experiments: endometrial carcinoma, bronchopulmonary dysplasia, stomach ulcer, cartilage degeneration, and brain metastasis.

Protein extraction, digestion, and iTRAQ labeling
For label-free proteomics, peptides were prepared from the protein extract of each individual liver (n = 7). Proteins were re-suspended in 100 mM TEAB (final concentration 20 mg protein/ml). The proteins were then reduced and alkylated using 100 mM tris(2-carboxyethyl)phosphine and 200 mM 2-iodoacetamide. Proteins were digested with 10% v/v formic acid for 30 min at 115 °C. Peptide digests were then evaporated to near dryness and re-suspended in 0.1% formic acid and 5% acetonitrile.
For iTRAQ, proteins were dissolved in the denaturant buffer (0.1% SDS (w/v)) and dissolution buffer (0.5 M triethylammonium bicarbonate, pH 8.5) supplied in the iTRAQ Reagents 8-plex kit (AB Sciex Inc., Foster City, CA, USA). For each sample, 60 µg of protein was reduced, alkylated, trypsin-digested, and labeled according to the manufacturer's instructions (AB Sciex Inc.). Liver samples were labeled as follows: control = 117, LOW = 118, MED = 119, and HIGH = 121. This was done for three independent biological replicates per group; thus, three iTRAQ experiments were conducted.

LC-MS/MS analysis
For label-free proteome analysis, the peptides were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Peptides were separated on an Agilent 1260 Infinity nano-HPLC-Chip cube system, using a ProtID chip 150 II (300 Å, C18, 150 mm) with a thermostat-controlled column temperature of 40 °C and an autosampler chilling temperature of 8 °C. Both the capillary and the nano-pump timetables were: 0–45 min, 0–60% solvent B; 45–50 min, 90% solvent B; 50–60 min, 0% solvent B. The inner valve was switched to the capillary pump at 45 min. The Agilent 6520 Accurate-Mass Quadrupole Time-of-Flight (Q-TOF) was used as the detector in tandem with the Agilent 1260 system. Samples were ionized by z-spray with positive polarity, gas temperature 325 °C, gas flow 5 L/min, capillary 1950 V, fragmentor 180 V, skimmer 65 V, and octopole RF of 750. Scans were performed in Auto MS/MS mode with an MS range of 300–3200 Da at a scan rate of 1 cycle/s and an MS/MS range of 50–3200 Da at a scan rate of 1 cycle/s, using a collision energy ramp with slope 3.7 and offset 2.5. A maximum of ten precursors was selected per cycle. The threshold to trigger an MS/MS scan was an absolute intensity of 1000 counts or a relative intensity greater than 0.05% of total intensity counts. Active exclusion was enabled after 2 spectra were collected for 6 seconds. Reference ion masses of 322.048121, 1221.990637, and 2421.91399 were used for simultaneous mass axis calibration throughout the analysis. Each analytical run included a blank, a peptide standard, and a BSA digest standard, which were injected every 10 samples for quality assurance. Samples were injected once per individual.
Fig. 2. Protein network for lipid metabolism disorders, tumor microenvironment and liver disease following high dose treatment of dieldrin in the zebrafish liver. Protein data were generated using label-free proteomics. Red indicates that protein levels are increased for that protein and green indicates that the protein levels are decreased relative to the control group. The more intense the color, the larger the relative fold change of the protein compared to the control group. Circles indicate proteins, mushroom-shaped entities with notches along the side represent receptors, while mushroom-shaped entities with notches on the bottom refer to transcription factors. Abbreviations are provided in Supplemental Data 4.
For iTRAQ proteomes, labeled peptides were desalted with C18 solid phase extraction and dissolved in strong cation exchange (SCX) solvent A (25% (v/v) acetonitrile, 10 mM ammonium formate, and 0.1% (v/v) formic acid, pH 2.8). The peptides were fractionated using an Agilent HPLC 1260 with a polysulfoethyl A column (2.1 × 100 mm, 5 µm, 300 Å; PolyLC, Columbia, MD, USA). Peptides were eluted with a linear gradient of 0–20% solvent B (25% (v/v) acetonitrile and 500 mM ammonium formate, pH 6.8) over 50 min, followed by ramping up to 100% solvent B in 5 min. The absorbance at 280 nm was monitored and a total of 14 fractions were collected. The fractions were lyophilized and resuspended in LC solvent A (0.1% formic acid in 97% water (v/v), 3% acetonitrile (v/v)). A hybrid quadrupole Orbitrap (Q Exactive) MS system (Thermo Fisher Scientific, Bremen, Germany) was used with high energy collision dissociation (HCD) in each MS and MS/MS cycle. The MS system was interfaced with an automated Easy-nLC 1000 system (Thermo Fisher Scientific, Bremen, Germany). Each sample fraction was loaded onto an Acclaim PepMap 100 pre-column (20 mm × 75 µm; 3 µm C18) and separated on a PepMap RSLC analytical column (250 mm × 75 µm; 2 µm C18) at a flow rate of 350 nl/min during a linear gradient from solvent A (0.1% formic acid (v/v)) to 25% solvent B (0.1% formic acid (v/v) and 99.9% acetonitrile (v/v)) for 80 min, and to 100% solvent B for an additional 15 min.
Fig. 3. Protein network for tissue regeneration, neoplasms, and liver metastasis. Protein data were generated using iTRAQ proteomics. Red indicates that protein levels are increased for that protein and green indicates that the protein levels are decreased relative to control. The more intense the color, the larger the relative fold change of the protein compared to the control group. Circles indicate proteins, mushroom-shaped entities with notches along the side represent receptors, while mushroom-shaped entities with notches on the bottom refer to transcription factors. Abbreviations are provided in Supplemental Data 4.

Spectral processing and protein identification
Label-free proteins were identified by searching against both the UniProt and NCBI Teleostei subset protein databases (downloaded on Feb 13, 2014; 729,330 entries). Spectral files were grouped into folders by treatment and each folder was searched separately using Spectrum Mill Software (Version A.03.03 SR4). Peptides were validated manually and accepted when at least one peptide had a peptide score (quality of the raw match between the observed spectrum and the theoretical spectrum) greater than 5 and a %SPI (percent of the observed spectral intensities that are accounted for by theoretical fragment peaks) greater than 60% (recommended criteria for data obtained by an Agilent Q-TOF mass spectrometer). Data were statistically analyzed using the open source online software MetaboAnalyst 4.0 [2]. Data were treated as follows: missing values were replaced with a small number using the default setting, data were filtered using the interquantile range estimate, and then transformed using median normalization and Pareto scaling. Fold change was determined for each treatment using a volcano plot analysis. ANOVA with Fisher's LSD was conducted to determine whether treatments differed in protein abundance. Adjusted p-values (q-values) were calculated but not used for filtering.
Pattern searching correlation analysis was used to determine which molecules demonstrated a significant positive or negative dose-response relationship.
Fig. 4. Protein network for drug induced liver injury, acute liver failure and steatohepatitis. Protein data were generated using iTRAQ proteomics. Red indicates that protein levels are increased for that protein and green indicates that the protein levels are decreased relative to control. The more intense the color, the larger the relative fold change of the protein compared to the control group. Circles indicate proteins, mushroom-shaped entities with notches along the side represent receptors, while mushroom-shaped entities with notches on the bottom refer to transcription factors. Abbreviations are provided in Supplemental Data 4.
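The label-free preprocessing steps described above (missing-value replacement, range-based filtering, median normalization, and Pareto scaling) can be illustrated with a minimal standard-library sketch. This is not the MetaboAnalyst implementation. It assumes that missing values are replaced with half of the smallest positive value in the matrix, that a user-chosen fraction of the least variable features is dropped, and that rows are samples and columns are proteins. All function names and the toy data are hypothetical.

```python
import math
import statistics


def impute_missing(matrix):
    """Replace None with half the smallest positive value (assumed default)."""
    positives = [v for row in matrix for v in row if v is not None and v > 0]
    fill = min(positives) / 2.0
    return [[fill if v is None else v for v in row] for row in matrix]


def columns_of(matrix):
    """Return the matrix as a list of feature columns."""
    return [list(col) for col in zip(*matrix)]


def filter_by_iqr(matrix, drop_fraction):
    """Drop the given fraction of features with the smallest interquartile range."""
    cols = columns_of(matrix)

    def iqr(values):
        q1, _, q3 = statistics.quantiles(values, n=4)
        return q3 - q1

    ranked = sorted(range(len(cols)), key=lambda j: iqr(cols[j]))
    kept = sorted(ranked[int(len(cols) * drop_fraction):])
    return [[row[j] for j in kept] for row in matrix], kept


def median_normalize(matrix):
    """Divide every sample (row) by its own median."""
    normalized = []
    for row in matrix:
        med = statistics.median(row)
        normalized.append([v / med for v in row])
    return normalized


def pareto_scale(matrix):
    """Mean-center each feature and divide by the square root of its SD."""
    scaled_cols = []
    for col in columns_of(matrix):
        mean = statistics.mean(col)
        sd = statistics.stdev(col)
        denom = math.sqrt(sd) if sd > 0 else 1.0
        scaled_cols.append([(v - mean) / denom for v in col])
    return [list(row) for row in zip(*scaled_cols)]


# Toy abundance matrix: 4 samples x 4 proteins, one missing value.
raw = [
    [10.0, 200.0, 5.0, 50.0],
    [12.0, None, 5.0, 55.0],
    [11.0, 400.0, 5.0, 45.0],
    [13.0, 100.0, 5.0, 60.0],
]
filtered, kept = filter_by_iqr(impute_missing(raw), drop_fraction=0.25)
processed = pareto_scale(median_normalize(filtered))
print(kept)  # indices of the retained proteins
```

In the toy data the third protein is constant across samples, so it has the smallest spread and is the one removed by the filter.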
For iTRAQ, the raw MS/MS data files were processed by a thorough database searching approach considering biological modification and amino acid substitution against the National Center for Biotechnology Information (NCBI) Teleostei database (downloaded on Feb 13, 2014; 729,330 entries) using ProteinPilot v4.5 with the Fraglet and Taglet searches under the Paragon™ algorithm [3]. The following parameters were considered for all searches: fixed modification of methylmethane thiosulfonate-labeled cysteine, fixed iTRAQ modification of amine groups at the N-terminus and lysine, and variable iTRAQ modification of tyrosine. For protein quantification, only MS/MS spectra that were unique to a particular protein and for which the sum of the signal-to-noise ratios for all the peak pairs was >9 were used. The accuracy of each protein ratio is given by a calculated error factor from the ProGroup analysis in the software, and a P value is given to assess whether the protein is significantly differentially expressed. The error factor is calculated with 95% confidence error, where it is the weighted standard deviation of the weighted average of log ratios multiplied by Student's t factor. The P value is determined by calculating Student's t factor by dividing the (weighted average of log ratios − log bias) by the weighted standard deviation, allowing the determination of the P value with n − 1 degrees of freedom, where n is the number of peptides contributing to the protein relative quantification (software default settings, AB Sciex, Inc.). To be identified as being significantly differentially expressed, a protein had to contain at least three spectra (allowing the generation of a P value), with P < 0.05. Additivity of protein expression was assessed quantitatively; we calculated additive expression based on mid-parent values (MPVs; averaged values from two biological replicates of each parent).
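The Student's t factor and 95% confidence error described above can be restated in symbols. This only rewrites the verbal description; the notation is introduced here for illustration and is not taken from the software documentation: \bar{x}_w is the weighted average of log ratios, b is the log bias, s_w is the weighted standard deviation, n is the number of contributing peptides, and \nu is the degrees of freedom. The use of a two-sided critical value for the 95% confidence error is an assumption.

```latex
t = \frac{\bar{x}_w - b}{s_w}, \qquad \nu = n - 1, \qquad
\mathrm{CE}_{95\%} = s_w \cdot t_{0.975,\,\nu}
```

The P value is then read from Student's t distribution with \nu degrees of freedom.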
To be identified as being significantly differentially expressed, a protein was quantified with at least three unique spectra in at least two of the biological replicates, along with a Fisher's combined probability of <0.05 and a fold change of ±1.2.
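The combined criteria above (spectrum count, Fisher's combined probability, and fold change) can be sketched as follows. This is an illustrative sketch, not the pipeline used for the deposited data. It assumes that the per-replicate P values are combined with Fisher's method, that the fold change is summarized as the geometric mean of replicate ratios, and that "±1.2" means a ratio of at least 1.2 or at most 1/1.2. The record layout and function names are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k degrees of freedom.

    Uses the closed-form chi-square survival function for even degrees of
    freedom. Every p value must lie in (0, 1].
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


def is_differentially_abundant(replicates, min_spectra=3, min_replicates=2,
                               alpha=0.05, fold_cutoff=1.2):
    """replicates: list of dicts with keys 'ratio', 'p', 'spectra' (hypothetical)."""
    usable = [r for r in replicates if r["spectra"] >= min_spectra]
    if len(usable) < min_replicates:
        return False
    combined_p = fisher_combined_p([r["p"] for r in usable])
    # Geometric mean of treatment/control ratios (an assumed summary statistic).
    fold = math.exp(sum(math.log(r["ratio"]) for r in usable) / len(usable))
    passes_fold = fold >= fold_cutoff or fold <= 1.0 / fold_cutoff
    return combined_p < alpha and passes_fold


# Hypothetical protein quantified in three iTRAQ experiments.
example = [
    {"ratio": 1.35, "p": 0.04, "spectra": 5},
    {"ratio": 1.28, "p": 0.10, "spectra": 4},
    {"ratio": 1.10, "p": 0.60, "spectra": 2},  # too few spectra, ignored
]
print(is_differentially_abundant(example))
```

With a single P value the combined probability reduces to that P value, which is a convenient sanity check on the closed-form expression.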
The MS proteomics data have been deposited in the ProteomeXchange Consortium [4] via the MassIVE partner repository with the data set identifiers PXD014622, MSV000084091, and PXD014480.

Protein network analysis
For proteomics network analysis with Pathway Studio (v11) (Elsevier), all proteins quantified in the liver with label-free and iTRAQ proteomics were imported into the program separately using Name + Alias (i.e. mammalian homologs were identified for the zebrafish proteins). Each dose was analyzed separately for disease networks, and subnetwork enrichment analysis was based on 1000 permutations of fold change data using a Kolmogorov–Smirnov test. The default option of "best pvalue, highest magnitude of response" was used to map protein networks. The enrichment p-value for all queries was set at p < 0.05. Subnetwork enrichment analysis was conducted in Pathway Studio for the list of proteins, and diseases were queried. These disease networks represent those that are significantly represented by proteins that are differentially regulated by dieldrin. All abbreviations for the network are provided in Supplemental Data 4.
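The general idea of a permutation-based enrichment test with a Kolmogorov–Smirnov statistic can be sketched as below. Pathway Studio's implementation is proprietary and is not reproduced here. The sketch assumes a two-sample comparison of fold changes inside versus outside a disease subnetwork, with significance estimated from randomly drawn protein sets of the same size. The protein names, values, and function names are hypothetical.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_enrichment_p(fold_changes, members, n_permutations=1000, seed=0):
    """Estimate an enrichment p-value for a subnetwork by permutation.

    fold_changes: dict mapping protein name to log fold change.
    members: set of protein names belonging to the subnetwork.
    """
    rng = random.Random(seed)
    names = list(fold_changes)
    inside = [fold_changes[p] for p in names if p in members]
    outside = [fold_changes[p] for p in names if p not in members]
    observed = ks_statistic(inside, outside)
    hits = 0
    for _ in range(n_permutations):
        picked = set(rng.sample(names, len(inside)))
        perm_in = [fold_changes[p] for p in names if p in picked]
        perm_out = [fold_changes[p] for p in names if p not in picked]
        if ks_statistic(perm_in, perm_out) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: 40 proteins, 8 of which form a strongly shifted subnetwork.
fold_changes = {f"P{i}": (i % 7 - 3) * 0.1 for i in range(40)}
members = {f"P{i}" for i in range(8)}
for i in range(8):
    fold_changes[f"P{i}"] = 1.0 + 0.05 * i

p_value = permutation_enrichment_p(fold_changes, members)
print(p_value < 0.05)
```

A fixed seed keeps the permutation result reproducible; in practice the number of permutations bounds the smallest p-value that can be reported.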