Data from identification of diagnostic biomarkers and metabolic pathway shifts of heat-stressed lactating dairy cows

Controlling heat stress (HS) is a global challenge for the dairy industry. In this work, an integrated metabolomics and lipidomics approach using 1H nuclear magnetic resonance (NMR) and ultra-fast LC–MS in combination with multivariate analyses was employed to investigate the discrimination of plasma metabolic profiles between HS-free and HS lactating dairy cows. Here we provide the information about the acquiring and processing of raw data obtained by 1H NMR and LC–MS techniques. The data of present study are related to the research article “Identification of diagnostic biomarkers and metabolic pathway shifts of heat-stressed lactating dairy cows” in the Journal of Proteomics (Tian et al., J. Proteomics, (2015), doi:10.1016/j.jprot.2015.04.014).


Type of data
Value of the data The data of multivariate analysis highlight the significant differences of plasma metabolic profiling between HS and HS-free dairy cows.
The data point out the potential biomarkers of HS dairy cows. The data provide insights into shifted pathways of dairy cows induced by HS.
1. Data, experimental design, materials and methods

Sample collection and experimental design
All experiments involving animals were conducted according to the principles of the Chinese Academy of Agricultural Sciences Animal Care and Use Committee (Beijing, China). Fasting blood samples were collected before morning feeding from the caudal veins of Holstein dairy cows, put into K2 EDTA anti-coagulation vacuum tubes, and centrifuged at 1600g for 10 min at 4 1C. The supernatants were transferred to tubes, frozen quickly, and stored at À 80 1C until use. An overview of the experimental design to acquire the data in present article is shown Fig. 1.

Sample preparation
For LC-MS metabolomics analysis, 150 μL aliquots of each thawed plasma sample were mixed with 600 μL ice-cold acetonitrile, vortexed, and centrifuged for 10 min at 10,000 rpm and 4 1C. The 650 μL supernatant of each sample was transferred to another tube, and concentrated to dryness with a SpeedVac Concentrator (SPD121P, Thermo Savant, Waltham, MA, USA). Each dried sample was reconstituted in 100 μL ACN/H 2 O (1:99 v/v), and filtered through a Captiva 96-well filter plate (Agilent, Santa Clara, CA, USA) for analysis.
For analyses of the plasma lipidome, a modified preparation method was used [2]. Briefly, a 30 mL aliquot of each plasma sample was mixed with 200 mL methanol, followed by the addition of 660 mL methyl tert-butyl ether, and vortexing of the sample. Subsequently, 150 mL water were added to each tube, the samples were vortexed for 5 min, incubated for 5 min, and centrifuged for 5 min at 10,000 rpm and 4 1C. The upper layers (500 mL) were collected, evaporated to dryness using a SpeedVac SPD121P concentrator, and re-dissolved in 500 mL of ACN/isopropanol/H 2 O (65:30 The corresponding solvents were filtered through a Captiva 96-well filter plate for LC-MS analysis. For NMR analysis, the plasma samples were centrifuged for 10 min at 1600g and 4 1C. The supernatants (20 μL) were collected, mixed with 40 μL of deuterium oxide containing 1 mM 3-(trimethylsilyl) propionic-2,2,3,3,d4 acid sodium salt, and transferred to 1.7 mm NMR tubes for analysis.

NMR spectroscopic analysis
1 H NMR spectra were acquired using an automated NMR Case sample changer on a Bruker AvanceIII 500 spectrometer (Bruker Biospin Billerica, MA, USA) operating at 500.13 MHz and equipped with a 1.7-mm triple resonance inverse probe at 298 K. A Carr-Purcell-Meiboom-Gill pulse sequence was used to enhance the contribution of low-molecular weight metabolites, and a diffusion-edited experiment with bipolar pulse pair-longitudinal eddy current delay pulse sequence was used to measure the lipid content of plasma lipoproteins [3]. The Carr-Purcell-Meiboom-Gill method utilized a spin-spin relaxation delay (2nτ) of 320 ms for each sample, with water signal irradiation applied during the recycle delay. For bipolar pulse pair-longitudinal eddy current delay, a sine shaped gradient with a strength of 32 G/cm and a duration of 2.5 ms was followed by a delay of 200 μs to allow for the decay of eddy currents. The diffusion delay was 120 ms and the time delay (T e ) was 5 ms. A linebroadening factor of 1 Hz was applied to free induction decays before Fourier transformation. Spectra were acquired with 128 scans and then zero-filled and Fourier-transformed to 128k data points. Representative NMR spectra plasma samples are shown in Supplementary Fig. 1A.
Metabolites were identified by inserting the experimental spectra into the Chenomx spectral database (Edmonton, AB, Canada), and comparing them with the spectra of standard compounds.
The significance of variation in the metabolic profiling data between the HS-free group and corresponding HS group was determined using SIMCA-P to generate metabolite correlations between groups (see correlation results in Supplementary Excel); then, MATLAB R2012a software was applied for the color map visualization (see Supplementary running program 1). Information-dependent acquisition mode was used for MS/MS analyses of the metabolome and lipidome. The collision energy was 35 eV.

LC-MS/MS analyses
High-resolution MS, isotope abundance ratios, MS/MS, the Human Metabolome database, the METLIN database, a literature search, and standard comparisons were employed to identify ion structures.
Quality control (QC) samples were detected once every 10 LC-MS runs to monitor the reproducibility of the instrument. Plasma samples from different groups were randomly alternated during the analysis. Autocalibration was performed using the Calibrant Delivery System in both positive and negative ion modes.

Data handling
Raw LC-MS data files (.wiff) were converted into mzXML format using ProteoWizard (http://metlin. scripps.edu/xcms/download/pwiz/pwiz.zip). The files were processed using an open-source XCMS package (version 1.20.1) in R statistical software (version 2.10.0) for peak discrimination, filtering and alignment [4], see the Supplementary running program 2. During XCMS implementation, the R-package CAMERA (Collection of Algorithms for Metabolite Profile Annotation) was used to annotate the isotope, adduct, and product ion peaks. In combination with the corresponding extracted ion chromatograms, these ions were manually excluded from the acquired peaks. The resulting two-dimensional matrices, including the observations (sample names) in columns, the variables (m/z-retention time pairs) in rows, and the peak areas, were imported into the SIMCA-P 12.0 software package (Umetrics AB, Umeå, Sweden) for multivariate analysis.
The raw 1 H NMR spectra were manually corrected for phase and baseline distortions using TopSpin software (version 3.0; Bruker Biospin) and were referenced to the signal of 3-(trimethylsilyl) propionic-2,2,3,3,d4 acid sodium salt (δ 0.0 ppm). The 1 H NMR spectra of plasma specimens were binned into 0.01 ppm integral regions and integrated in the 0.5-6.0 ppm region using the AMIX software package (version 3.8.3, Bruker Biospin). The regions containing the water resonance (δ 5.12-4.7) were removed. The spectra were normalized to the total sum of the spectral integrals to compensate for differences in sample concentration. Multivariate analyses of the normalized NMR data sets were performed using the SIMCA-P 12.0 software package.
Prior to multivariate analysis, pareto and center scaling were respectively applied to LC-MS and 1 H NMR data to reduce noise and artifacts in the models. Data from Quality Control samples were analyzed by Principal component analysis (PCA) to monitor the reproducibility of the instrument (see Supplementary  Fig. 2), based on the judgment of whether peak areas deviation are less than two standard deviations [5,6]. The retention time deviation profiles of LC-MS data that were generated using an open-source XCMS package were analyzed to evaluate the reproducibility of the chromatography (see Supplementary Fig. 3), based on the judgment of whether deviations are less than 20 s for most analyses [6]. PCA was performed to visualize global clustering and separation trends, or outliers (see Supplementary Fig. 4). Partial least squares dicriminant analysis (PLS-DA) models were applied to validate the model against overfitting through 999 random permutation tests (see Supplementary Fig. 5). The models of orthogonal partial least squares discriminate analysis (OPLS-DA) on the metabolic profiles of HS-free and HS groups were constructed (see Supplementary Fig. 6). Discriminating variables were selected according to the S-plots (see Supplementary  Fig. 7), jack-knifed-based confidence intervals (see Supplementary Fig. 8), variable importance in projection values (VIP41), and raw data plots in the OPLS-DA models (examples of Supplementary Fig. 9) [6][7][8]. Furthermore, an independent t-tests (Po0.05) (SPSS version 13.0) were used to determine if the differences between the concentrations of candidate biomarkers obtained from OPLS-DA of the HS-free and HS groups were statistically significant.

Ultra-fast LC-MS/MS-based verification test
The conditions of chromatographic gradient elution were unchanged. The eluent was injected into a triple quadrupole-trap mass spectrometer equipped with an ESI source (QTRAP 5500, Applied Biosystems/ MDS Sciex). All of the LC-MS-discovered potential candidates were detected by ultra-fast LC-MS/MS in multiple reaction monitoring (MRM) mode. For each analyte of interest, the collision energies and precursor/fragment ion pairs were pre-optimized to generate an optimal signal to noise ratio. The MRM transitions were listed in Supplementary Tables 1 and 2. The internal standards levofloxacin (362-261) and hesperidin (611-303) were used for detecting the metabolome and lipidome candidate ions in positive ion mode; and the internal standards rhein (293-221) and hesperidin (609-301) were included for detecting the metabolome candidate ions in negative ion mode.

Characterization of potential diagnostic biomarkers
The sensitivity and specificity of all candidates were evaluated by plotting ROC curves using SPSS (version 13.0), and calculating the area under the curves (AUC) [6]. The discriminatory power of biomarker candidates was ranked and visualized by heat maps (see Supplementary Fig. 10).

Author declaration
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.
We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.