Dataset of the rumen microbiota and epithelial transcriptomics and proteomics in goat affected by solid diets

Although early solid diet supplementation is a common practice to improve growth and development in goat kids, the biological mechanism by which solid diet induces rumen microbiota and epithelial development is still unknown. In this study, rumen fermentation parameters, 16S rRNA sequencing of the rumen content and epithelial microbiota, and transcriptomics and proteomics of the epithelium were determined to clarify the effects of solid diet supplementation. Here, we characterized the changes in goat phenotypes (i.e., growth performance, rumen fermentation and development) and linked them to changes in the rumen microbiota, transcriptome and expressed proteins. The mechanism by which solid diet improves rumen development was preliminarily elucidated. Moreover, different roles of the rumen content and epithelial microbiota were identified. These datasets expand our understanding of the association between early diet intervention and rumen development, providing useful information on how nutritional strategy affects rumen function and subsequently improves host growth. The generated data provide insights into the importance of rumen niche microbiota and microbe-host interactions, which will benefit future studies.


Background & Summary
As one of the most important livestock species in the world, the goat (Capra hircus) provides various products for human consumption, such as meat, milk, pelts and fiber 1,2. There are now over 1,000 goat breeds and more than 1 billion goats globally 3. The rumen, the most critical digestive organ of ruminants, degrades fiber-rich plant components into volatile fatty acids (VFA) and microbial protein through microbial fermentation, supporting rumen development and the nutrient requirements of the body 4. The ability to digest fiber is evidence of a well-developed and functional rumen in young ruminants 5. The rumen of goat kids develops rapidly after birth, and manipulating its early development has become the most effective way to improve life-long rumen function and animal growth 6,7. Early supplementation of a concentrate diet is already widely used in ruminant production to improve rumen and body development, because it stimulates microbial proliferation and VFA production, which initiate epithelial development [8][9][10][11]. However, the biological mechanism of this practice is still unclear.
Previous studies have demonstrated that early feeding of a starter high in grains, or even alfalfa, changed the rumen microbiota and improved animal growth [12][13][14]. However, most studies to date have focused on the rumen content microbiome, and less research has been conducted on the microbiota attached to the rumen epithelium or on rumen epithelial transcriptomics and proteomics. The rumen epithelial microbiota, which is associated with the content microbial community, might be critical for nutrient absorption because it attaches tightly to the luminal side of the rumen [15][16][17]. A previous study found differing community structures between the rumen content and the epithelial microbiome in cattle 18, and the epithelial microbiota was also affected by dietary carbohydrate 17. Moreover, important roles of the epithelial microbiota in maintaining host gene expression and development have been reported 19. Beyond the microbiota, changes in rumen fermentation and epithelial genes were reported in previous studies 4,17,20,21. The rumen epithelium is known to play a key role in the digestion and absorption of nutrients such as VFAs and ammonia 22. Thus, understanding how early diet intervention regulates rumen epithelial gene and protein expression is necessary and urgent. Additionally, limited knowledge of microbe-host interactions leaves a gap in understanding the connection between the microbiota, rumen development and goat growth. Therefore, this study was conducted to investigate the 16S rRNA gene sequences of the rumen microbiota (both content and epithelium) and the host transcriptomics and proteomics in goat kids consuming three diet regimes: milk replacer only (MRO), milk replacer supplemented with concentrate solid diet (MRC), and milk replacer supplemented with concentrate diet and alfalfa (MCA). This dataset, including the goat kids' phenotypes, rumen content and epithelial microbiota, and epithelial omics (both transcriptome and proteome), is described to illustrate the effects of early supplementation of high-carbohydrate diets on goat kids and to explore the diet-microbiota-host axis. As foundation data, these omics datasets allow further exploration of the relationships between the rumen microbiome and epithelial genes. A schematic overview of the study workflow is shown in Fig. 1.

Methods
Ethical statement.
All experimental animal procedures in this study were approved by the Chinese Academy of Agricultural Sciences Animal Ethics Committee.

Experimental design, animal management and sampling.
Based on the experimental design, 72 healthy Haimen goat kids (4.53 ± 0.52 kg body weight (BW)) were assigned to three treatment groups: milk replacer only (MRO), milk replacer supplemented with concentrate (MRC), and milk replacer supplemented with concentrate and alfalfa pellets (MCA). Six animal replicates were included in each treatment group. Goat kids consumed these diet regimes from 20 to 60 days of age (d) and were slaughtered on d 60 to collect samples.
The animal trial was conducted at a commercial farm (the Green Sheep Valley Farm, Haimen City, China). During the trial, all goat kids had free access to water. Milk replacer, a patented product, was obtained from Beijing Precision Animal Nutrition Research Center, China. The solid diets, both concentrate and alfalfa pellets, were freely provided to the MRC and MCA groups.
The feed intake was recorded daily and is shown in Table 1. At d 60, six goat kids from each treatment were weighed and slaughtered to collect rumen samples. Approximately 10 mL of rumen content was sampled from the mixed digesta and stored at −80 °C for next-generation sequencing. Approximately 10 mL of the rumen fluid phase was filtered through four layers of gauze and stored in a 15 mL tube at −20 °C for the analysis of rumen fermentation parameters. Next, rumen tissue from the bottom of the ventral sac was washed with sterilized PBS (pH = 7) to rinse away residual rumen content or fluid filling the gaps between papillae. The remaining residues tightly attached to the epithelium were scraped off for analysis of the epithelial microbiota. Concurrently, tissue sections (~4 cm²) from the ventral sac were fixed in 10% formalin for epithelial morphology measurement. Samples of the remaining tissue and of the epithelium-associated microbiota were snap-frozen in liquid nitrogen and stored at −80 °C for subsequent analyses, including host transcriptomics and proteomics.
Rumen fermentation parameters measurement.
NH3-N was determined using a phenol-sodium hypochlorite colorimetric method after the rumen fluid was thawed at 4 °C. Rumen microbial proteins were analyzed according to the method described by Makkar et al. 23. VFA concentration was quantified by gas chromatography (GC) 24 using methyl valerate as the internal standard in an Agilent 6890 series GC equipped with a capillary column (HP-FFAP19095F-123, 30 m, 0.53 mm diameter and 1 mm thickness). The rumen fermentation parameters are shown in Table 2.

Fig. 1 Overview of the experimental workflows. The goat kids were assigned to three treatments (milk replacer only (MRO), milk replacer supplemented with concentrate (MRC), and milk replacer supplemented with concentrate plus alfalfa pellets (MCA)) at 20 days of age. At the end of the animal feeding trial (60 days of age), goat kids were slaughtered for rumen sample collection. After the rumen was weighed, rumen content and epithelial microbial samples were collected for 16S rRNA sequencing. The rumen epithelia were collected for transcriptomics, proteomics, and morphology measurements.

Measurement of rumen epithelial morphology.
After 24 h of fixation in formalin, rumen tissue sections were kept in 70% ethanol until measurement. All samples were stained with hematoxylin and eosin (H&E) at the China Agricultural University (Beijing, China). The length and width of the rumen papillae and the stratum corneum thickness were measured using the Axiovision software (Zeiss, Oberkochen, Germany) and the Image-pro express image analysis processing system. The results for rumen papilla length, papilla width, lamina propria thickness and epithelial thickness are displayed in Table 3.

Next-generation sequencing of rumen content and epithelial microbiota and analysis.
Total rumen content and epithelial microbial DNA were extracted using the Magnetic Universal Genomic DNA Kit (QIAGEN Inc., Beijing, China) according to the manufacturer's protocol, and the V3-V4 region of the bacterial 16S ribosomal RNA genes was amplified using adaptor-linked universal primers (341F and 806R). The concentration of DNA was determined using the Qubit® DNA Assay Kit with a Qubit® 3.0 Fluorometer (Invitrogen, China). Amplicon libraries were built from all qualified products and sequenced on an Illumina HiSeq PE250 platform at the Realbio Technology Genomics Institute (Shanghai, China). More details on the sequencing process can be found in our previous study 21.
Raw sequencing files of the rumen content and epithelial microbiota were processed using the mothur program (v1.39.1) 25. Forward and reverse reads were merged first, and low-quality reads were removed. The high-quality sequences were then aligned against the SILVA reference database (full-length sequences and taxonomy references release 132, http://www.arb-silva.de/) 26. Moreover, the VSEARCH algorithm was employed to remove chimeras from the filtered sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) at the 97% similarity level using the Ribosomal Database Project (RDP) database 27. Alpha (Shannon index and observed OTUs) and beta diversities (Bray-Curtis and Jaccard distances) were calculated using mothur. The boxplots of alpha diversity and the PCoA plot of beta diversity were visualized using the 'ggplot2' package in R (v3.6.0). The analysis of similarity (ANOSIM) test was performed to test the statistical significance of beta diversity.

Transcriptomic profile of the rumen epithelial tissue.
Total RNA of the rumen epithelial samples was extracted using the TRIzol reagent (Invitrogen, CA, USA). RNA integrity was measured using an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA). Samples with an RNA integrity number of 7 or higher were used for sequencing. The library was then prepared and sequenced at the Beijing Genomics Institute (Shenzhen, China) using the HiSeq2000 system (Illumina, CA, USA) to obtain 100-bp paired-end reads according to the manufacturer's instructions. Raw reads were filtered to obtain clean reads using the trimmomatic module in SOAPnuke (v1.4.0) by removing adaptors and low-quality reads. Low-quality reads were defined as reads in which more than 20% of bases had a quality score below 10 or more than 5% of bases were ambiguous (labeled as "N"). Then, high-quality RNA reads were mapped and assembled to reference genomes (AnimalTFDB v2.0) using HISAT (v2.1.0) 28. Transcript expression levels were measured as the number of fragments per kilobase of exon per million fragments mapped (FPKM). Differentially expressed genes (DEGs) were detected based on the methods reported by Wang et al. 29, and the false discovery rate (FDR) was calculated using the Benjamini and Hochberg multiple testing correction 30. Significant DEGs were confirmed at a fold change ≥ 2 and an FDR < 0.001. Using this method, DEGs were identified through pairwise comparisons (MRO-vs-MRC, MRO-vs-MCA and MRC-vs-MCA). After expression pattern clustering, the DEGs from the pairwise comparisons were subjected to functional annotation, including GO (Gene Ontology) functional annotation and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway annotation. GO term and KEGG pathway enrichment analyses were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.8, http://david.abcc.ncifcrf.gov) 31.
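The expression measure and the DEG thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names and the `records` layout are hypothetical, the FPKM formula is the standard textbook definition, and treating a fold change of 0.5 or less as a two-fold decrease is an assumption.

```python
from typing import List, Sequence, Tuple


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(records: Sequence[Tuple[str, float, float]],
              min_fold: float = 2.0, max_fdr: float = 0.001) -> List[str]:
    """records holds (gene, fold_change, p_value); returns genes passing both cutoffs."""
    fdr = benjamini_hochberg([p for _, _, p in records])
    selected = []
    for (gene, fold_change, _), q in zip(records, fdr):
        # Assumption: a two-fold change in either direction counts.
        changed = fold_change >= min_fold or fold_change <= 1.0 / min_fold
        if changed and q < max_fdr:
            selected.append(gene)
    return selected
```

For example, `call_degs([("g1", 4.0, 1e-6), ("g2", 1.5, 1e-6)])` keeps only the hypothetical gene `g1`, because `g2` fails the fold-change cutoff even though its adjusted p-value is small.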
Proteomics of the rumen epithelium.
Proteins from epithelial samples were extracted using Lysis buffer 3 (8 M urea, 40 mM Tris-HCl or TEAB with 1 mM PMSF, 2 mM EDTA and 10 mM DTT, pH 8.5) and two magnetic beads (diameter 5 mm). The mixtures were then placed into a TissueLyser to release proteins. After centrifugation, the supernatant was transferred into a new tube, reduced with 10 mM dithiothreitol (DTT) at 56 °C for 1 h and alkylated with 55 mM iodoacetamide (IAM) in the dark at room temperature for 45 min. After a further round of centrifugation (25,000 g, 4 °C, 20 min), the supernatant was quantified by the Bradford method.

For quality control of the protein extraction, 15-30 μg of protein was mixed with loading buffer in a centrifuge tube and heated at 95 °C for 5 min. The mixture was then centrifuged at 25,000 g for 5 min and the supernatant was loaded into the wells of a 12% polyacrylamide gel. SDS-PAGE was performed at a constant voltage of 120 V for 120 min to assess protein quality. Once finished, the gel was stained with Coomassie Blue for 2 h and then destained with destaining solution (40% ethanol and 10% acetic acid) on a shaker, exchanging the destaining solution 3 to 5 times for 30 min each time.

For protein digestion, the protein solution (100 μg) in 8 M urea was diluted 4 times with 100 mM TEAB. The proteins were then digested at 37 °C overnight with Trypsin Gold (Promega, Madison, WI, USA) at a protein:trypsin ratio of 40:1. After trypsin digestion, the peptides were desalted using a Strata X C18 column (Phenomenex) and vacuum-dried according to the manufacturer's protocol.

For peptide labeling, the peptides were dissolved in 30 μL of 0.5 M TEAB with vortexing. After the iTRAQ labeling reagents had returned to ambient temperature, they were transferred to and combined with the appropriate samples. Peptide labeling was performed with the iTRAQ Reagent 8-plex Kit according to the manufacturer's protocol. The peptides labeled with different reagents were combined, desalted with a Strata X C18 column (Phenomenex), and vacuum-dried according to the manufacturer's protocol.

Subsequently, peptide fractionation was carried out. A Shimadzu LC-20AB HPLC pump system coupled with a high-pH RP column was employed to separate the peptides. The peptides were reconstituted with buffer A (5% ACN, 95% H2O, adjusted to pH 9.8 with ammonia) to 2 mL and loaded onto a column containing 5 μm particles (Phenomenex). The peptides were separated at a flow rate of 1 mL/min with a gradient of 5% buffer B (5% H2O, 95% ACN, adjusted to pH 9.8 with ammonia) for 10 min, 5-35% buffer B for 40 min, and 35-95% buffer B for 1 min. The system was then maintained in 95% buffer B for 3 min and decreased to 5% within 1 min before equilibrating with 5% buffer B for 10 min. Elution was monitored by measuring absorbance at 214 nm, and fractions were collected every 1 min. The eluted peptides were combined into 20 fractions and vacuum-dried.

Next, each fraction was resuspended in buffer A (2% ACN, 0.1% FA) and centrifuged at 20,000 g for 10 min. The supernatant was loaded onto a Thermo Scientific™ UltiMate™ 3000 UHPLC system equipped with a trap column and an analytical column. The samples were loaded on the trap column at 5 μL/min for 8 min and then eluted into the homemade nanocapillary C18 column (ID 75 μm × 25 cm, 3 μm particles) at a flow rate of 300 nL/min. The gradient of buffer B (98% ACN, 0.1% FA) was increased from 5% to 25% in 40 min and then to 35% in 5 min, followed by a 2 min linear gradient to 80%, maintained at 80% B for 2 min, and finally returned to 5% in 1 min and equilibrated for 6 min.

Finally, the proteins were detected by mass spectrometry. The peptides separated by nanoHPLC were subjected to tandem mass spectrometry on a Q Exactive HF-X (Thermo Fisher Scientific, San Jose, CA) for data-dependent acquisition (DDA) detection with nano-electrospray ionization. The parameters for mass spectrometer (MS) analysis were as follows: electrospray voltage, 2.0 kV; precursor scan range, 350-1500 m/z at a resolution of 60,000 in the Orbitrap; MS/MS fragment scan range, > 100 m/z at a resolution of 15,000 in HCD mode; normalized collision energy setting, 30%; dynamic exclusion time, 30 s; automatic gain control (AGC) for the full MS target and MS2 target, 3e6 and 1e5, respectively. Each MS scan was followed by MS/MS scans of the 20 most abundant precursor ions above a threshold ion count of 10,000.
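As an illustration, the nanoLC gradient described above can be written as a small lookup table, which makes the total run time and the buffer B percentage at any time point easy to check. This is a sketch only: the segment values are transcribed from the text, while the names `NANO_LC_GRADIENT`, `total_run_time` and `percent_b_at` are hypothetical, and linear interpolation within every segment is an assumption (the text states "linear" for one step only).

```python
from typing import List, Tuple

# (duration in min, %B at segment start, %B at segment end),
# transcribed from the nanoLC gradient described in the text.
Segment = Tuple[float, float, float]
NANO_LC_GRADIENT: List[Segment] = [
    (40.0, 5.0, 25.0),   # 5% to 25% B in 40 min
    (5.0, 25.0, 35.0),   # to 35% B in 5 min
    (2.0, 35.0, 80.0),   # linear gradient to 80% B in 2 min
    (2.0, 80.0, 80.0),   # hold at 80% B for 2 min
    (1.0, 80.0, 5.0),    # return to 5% B in 1 min
    (6.0, 5.0, 5.0),     # equilibrate for 6 min
]


def total_run_time(gradient: List[Segment]) -> float:
    """Sum of all segment durations, in minutes."""
    return sum(duration for duration, _, _ in gradient)


def percent_b_at(gradient: List[Segment], t_min: float) -> float:
    """%B at time t_min, assuming linear change within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start_b, end_b in gradient:
        if t_min <= elapsed + duration:
            return start_b + (end_b - start_b) * (t_min - elapsed) / duration
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")
```

Under these assumptions the segments add up to a 56 min method per fraction.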
The raw MS/MS data were converted into Mascot Generic File (MGF) format, and the MGF files were searched against the database using a local Mascot server. In addition, quality control was performed to determine whether a reanalysis step was needed. The automated software IQuant was applied to analyze the peptides labeled with isobaric tags, with steps of protein identification, tag impurity correction, data normalization, missing value imputation, protein ratio calculation, statistical analysis and results presentation. All proteins with a false discovery rate (FDR) of less than 1% proceeded to downstream analysis.

Data Records
The raw read files for each rumen content sample of 16S rRNA sequencing have been uploaded to the NCBI Sequence Read Archive (SRA) with accession number SRP199804 32, and the raw data of the rumen epithelial samples for 16S rRNA sequencing and transcriptomics have been deposited in the NCBI SRA with accession number SRP236061 33. The raw proteomics data were uploaded to the ProteomeXchange Consortium via the iProX partner repository with the dataset identifier PXD047843 34. All these data can be used freely.

Technical Validation
Benefits of early supplementation with a high-carbohydrate diet were found in this dataset. As shown in Tables 1 and 2, significant increases in nutrient intake, average daily gain and body weight were observed in the MRC and MCA groups. Moreover, compared to MRC, MCA had a higher intake of protein, neutral detergent fibre (NDF), and non-fibrous carbohydrates (NFC). A better-developed rumen was also found in the solid diet groups, as rumen weight and papilla length and width were significantly increased in the MRC and MCA groups. The parameters of rumen fermentation were also affected by solid diet supplementation (Table 3). Compared to MRO, a lower NH3-N concentration and higher concentrations of total VFA, acetate, propionate, butyrate and valerate were observed in MRC and MCA.
For next-generation sequencing, the quality of the DNA used for 16S rRNA sequencing was determined; a total DNA amount ≥ 1 μg and a concentration of 30 ng/μL indicated that the DNA was qualified. The concentration of the metagenome libraries was assessed using an Agilent 2100 Bioanalyzer instrument (Agilent DNA 1000 Reagents) and a Genomic DNA Sample Prep Kit for the Illumina NovaSeq 6000 platform, and libraries with qualified concentration (≥10 nM) and volume (15-100 μL) were submitted for sequencing. Quality control of the 16S rRNA sequencing reads was performed using the mothur MiSeq SOP (https://mothur.org/wiki/miseq_sop/). The quality assessment of the 16S rRNA sequencing reads of both rumen content and epithelial samples is shown in Supplementary Table 1. As shown in Fig. 2A, the samples of rumen content and epithelium tended to cluster based on the organism (mainly along the first axis), with the second factor of variation being the individual intra-species variability (y-axis). Thus, PCoA separated the samples according to their origin. Bacteria including Prevotella and Bacteroidetes dominated the rumen content communities, while epithelial samples had higher abundances of Prevotella, unclassified Lachnospiraceae, Campylobacter, and Desulfobulbus (Fig. 2B).
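For readers who want to check the diversity measures behind Fig. 2A on their own OTU tables, the following Python sketch gives the standard textbook definitions of the Shannon index, observed OTUs, Bray-Curtis dissimilarity and Jaccard distance. The values in this dataset were computed with mothur; this code is only an illustration, the function names are hypothetical, and the natural logarithm in the Shannon index is an assumption.

```python
import math
from typing import Sequence


def observed_otus(counts: Sequence[int]) -> int:
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity between two samples on the same OTU list."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same OTUs")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance on presence/absence: 1 - shared OTUs / OTUs in either sample."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same OTUs")
    union = sum(1 for x, y in zip(a, b) if x > 0 or y > 0)
    if union == 0:
        return 0.0
    shared = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    return 1.0 - shared / union
```

Bray-Curtis uses abundances while Jaccard uses presence/absence only, so a content sample and an epithelial sample that share most taxa but differ in their dominant genera would typically sit closer together under Jaccard than under Bray-Curtis.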
To ensure the quality of the transcriptomic sequencing data, the purity, concentration, and integrity of the RNA were determined using dedicated molecular biology equipment. Subsequently, the quality of each library was assessed, and sequencing was carried out once the requirements were met. A total of 109 Gb of clean data were generated, with an average of 6.44 Gb per subject. After filtration using trimmomatic, the proportion of clean reads with a quality score over 30 was 96.79% (Supplementary Table 2). When the high-quality reads were mapped to the reference genome, the average mapping ratio across all samples was 76.75% (Supplementary Table 2). The gene expression of each epithelial sample is shown in Fig. 3.
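The read-filtering rule stated in the Methods (a read is low quality if more than 20% of its bases have a quality score below 10, or if more than 5% of its bases are "N") and a Q30 summary can be sketched in Python as follows. This is not SOAPnuke's implementation: the function names are hypothetical, quality scores are assumed to be already decoded to integer Phred values, and computing Q30 per base with a cutoff of 30 or above is an assumption about how the reported percentage was derived.

```python
from typing import Iterable, Sequence


def is_low_quality(sequence: str, qualities: Sequence[int],
                   min_quality: int = 10,
                   max_low_fraction: float = 0.20,
                   max_n_fraction: float = 0.05) -> bool:
    """Apply the low-quality read definition given in the Methods."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    if not sequence:
        return True  # assumption: an empty read is discarded
    length = len(sequence)
    low_bases = sum(1 for q in qualities if q < min_quality)
    n_bases = sequence.upper().count("N")
    return (low_bases / length > max_low_fraction
            or n_bases / length > max_n_fraction)


def q30_fraction(reads: Iterable[Sequence[int]]) -> float:
    """Fraction of all bases with a Phred quality of 30 or higher."""
    total = 0
    high = 0
    for qualities in reads:
        total += len(qualities)
        high += sum(1 for q in qualities if q >= 30)
    return high / total if total else 0.0
```

Both thresholds are strict inequalities here, following the wording "more than", so a read with exactly 20% low-quality bases is kept.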
The iTRAQ (isobaric tags for relative and absolute quantitation) technology has been confirmed to provide high precision in protein quantification. Three technical replicates were run for each sample. In total, 1,443,120 spectra were generated, and 26,793 peptides and 6,003 proteins were identified at 1% FDR. The coefficient of variation (CV), defined as the ratio of the standard deviation (SD) to the mean, was used to evaluate reproducibility (Fig. 4). The lower the CV, the better the reproducibility. The Gene Ontology (GO) annotation of all identified proteins is displayed in Fig. 5.
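The CV definition above (SD divided by the mean) can be made concrete with a short Python sketch. The function names and the example cutoff are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = SD / mean, using the sample standard deviation (an assumption)."""
    if len(values) < 2:
        raise ValueError("at least two replicate values are required")
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / average


def share_below_cutoff(replicates: Dict[str, Sequence[float]],
                       cutoff: float) -> float:
    """Fraction of proteins whose CV across replicates is at or below cutoff."""
    cvs = [coefficient_of_variation(v) for v in replicates.values()]
    return sum(1 for cv in cvs if cv <= cutoff) / len(cvs)
```

A protein quantified at ratios 2, 4 and 6 across three replicates has a mean of 4 and a sample SD of 2, so its CV is 0.5; a protein with identical ratios in all replicates has a CV of 0.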

Usage Notes
Our comprehensive dataset of rumen microbiota and epithelial omics in response to solid diet provides insights into the association between critical microbiota and host gene expression. Although our preliminary findings reveal how solid diet and its nutrients drive the rumen microbiome and epithelial genes and proteins, more biological pathways of interest can be uncovered by re-analysing our omics dataset, allowing a deeper understanding of the interactions between microbiome and host. Moreover, the common bioinformatic software and pipelines used in this study facilitate reuse of the data.

Fig. 4 Quantification repeat analysis of the rumen epithelial proteomics. The x-axis is the deviation between the protein ratios of the replicate samples. The y-axis is the amount of quantified proteins in the corresponding range.

Fig. 2 Next-generation sequencing of the rumen content and epithelial microbiota in goat kids. (A) Beta diversity of the rumen content and epithelial microbiota based on Bray-Curtis. One point represents one sample. (B) Rumen microbial composition at the genus level. Each column represents a sample, and each bar represents one bacterium. MROC, MRCC and MCAC represent content samples from animals that received the MRO, MRC and MCA diets, while MROE, MRCE and MCAE represent the epithelial microbiota from the three diets, respectively. The MRO treatment was fed only milk replacer, the MRC treatment was fed milk replacer with concentrate, and the MCA treatment was fed milk replacer with concentrate plus alfalfa.

Fig. 5 Bar plot of the Gene Ontology analysis using proteomics. The bar chart shows the distribution of corresponding GO terms. Different colors represent different GO categories.

Table 1. Effect of early supplementary solid diet on nutrient intake of goat kids. MRO = milk replacer, MRC = milk replacer + concentrate, MCA = milk replacer + concentrate + alfalfa; SEM = standard error of the means. In the same row, values with different lowercase superscripts differ significantly (P < 0.05). The alphabetical order of the superscripts represents a decreasing order of the means among groups.

Table 2. Effect of early feeding on rumen fermentation parameters of goat kids. VFA = volatile fatty acids. In the same row, values with different lowercase superscripts differ significantly (P < 0.05). The alphabetical order of the superscripts represents a decreasing order of the means among groups.

Table 3. Effects of early supplementary solid diet on growth performance and rumen fermentation parameters in goat kids. MRO = milk replacer, MRC = milk replacer + concentrate, MCA = milk replacer + concentrate + alfalfa; SEM = standard error of the means. BW = body weight; A:P = the ratio of acetate to propionate. In the same row, values with different lowercase superscripts differ significantly (P < 0.05). The alphabetical order of the superscripts represents a decreasing order of the means among groups.