Whole blood transcriptional profiling comparison between different milk yield of Chinese Holstein cows using RNA-seq data

Background The objective of this research was to investigate the variation of gene expression in the blood transcriptome profile of Chinese Holstein cows associated to the milk yield traits. Results We used RNA-seq to generate the bovine transcriptome from the blood of 23 lactating Chinese Holstein cows with extremely high and low milk yield. A total of 100 differentially expressed genes (DEGs) (p < 0.05, FDR < 0.05) were revealed between the high and low groups. Gene ontology (GO) analysis demonstrated that the 100 DEGs were enriched in specific biological processes with regard to defense response, immune response, inflammatory response, icosanoid metabolic process, and fatty acid metabolic process (p < 0.05). The KEGG pathway analysis with 100 DEGs revealed that the most statistically-significant metabolic pathway was related with Toll-like receptor signaling pathway (p < 0.05). The expression level of four selected DEGs was analyzed by qRT-PCR, and the results indicated that the expression patterns were consistent with the deep sequencing results by RNA-Seq. Furthermore, alternative splicing analysis of 100 DEGs demonstrated that there were different splicing pattern between high and low yielders. The alternative 3’ splicing site was the major splicing pattern detected in high yielders. However, in low yielders the major type was exon skipping. Conclusion This study provides a non-invasive method to identify the DEGs in cattle blood using RNA-seq for milk yield. The revealed 100 DEGs between Holstein cows with extremely high and low milk yield, and immunological pathway are likely involved in milk yield trait. Finally, this study allowed us to explore associations between immune traits and production traits related to milk production. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2901-1) contains supplementary material, which is available to authorized users.


Background
Milk yield and milk composition of lactating cows are the most important economic traits in dairy cattle. In the past century, genetic selection on milk yield has improved milk production in cattle. With the development of quantitative trait loci (QTLs), genome-wide association studies (GWAS) and RNA sequencing (RNA-seq) technologies, a large number of candidate genes and SNPs associated with milk performance traits have been identified, such as DGAT1 and GHR genes [1][2][3][4][5][6]. Previous studies described the associations between DGAT1 K232A polymorphism and milk production traits [7,8]. Blott et al. identified a significant SNP (BFGL-NGS-118998) inside the GHR gene that has an important role related to milk traits [9].
Mammary gland is an important organ to synthesize and secrete milk. The mammary epithelial cell has a remarkable ability to convert blood circulating nutrients into milk components [10]. Thus, almost all the studies related to milk performance traits are based on mammary gland. For example, Cui et al. collected mammary gland samples from four lactating cows after slaughter. They used RNA-seq to generate the bovine mammary transcriptome with high and low milk fat and protein percentage [11]. Finucane et al. compared bovine mammary expression profiles before and after parturition using microarray [12]. However, there are still limitations in analysis of milk performance traits using bovine mammary tissue, such as sampling difficultly, high cost of sampling and tremendous damage to the lactating cows, which resulted in small sample size.
Milk is composed of fat, protein, lactose, minerals, vitamins and water, and all these nutrients derive from blood [10,[13][14][15]. Some data shows that synthesizing 1L milk requires 400-500L blood flow though the mammary gland. So plenty of blood is essential for synthesizing milk. Previous studies associated with milk traits were only focused on blood physiological and biochemical indexes, and there are very few studies associating gene expression with milk traits in cattle blood. A gene expression profile from blood provides new opportunities to clarify the basic mechanisms of complex traits in cattle milk. Besides, whole blood is a complex mixture of cells and can accurately reflect the physiological condition and health level of cows. Many studies have used blood to diagnose disease in dairy cattle, such as mastitis [16,17]. Most importantly, blood is easier to sample in comparison with other tissues and involves limited handling of animals. Furthermore, the lactation process requires multiple tissues and organs to complete, such as mammary gland, liver and adipose tissue. Hence, blood has the potential ability to represent milk performance traits more directly and comprehensively. Sandri et al. (2015) analyzed the gene expression profile in the blood related to the gene merit for milk productive traits using microarray [18]. Marcel et al. analyzed porcine peripheral blood mononuclear cells transcription profile of pigs with divergent humoral immune response and lean growth performance [19]. To complete understand the impact of blood transcriptome profiles on milk yield, comprehensive cataloguing of gene expression change within high yielders and low yielders is required. The aim of this study is to compare gene expression profiles in bovine whole blood of high and low milk yield cows, and to investigate potential molecular biomarkers in the blood transcriptome that relate to the productive potential of lactating cows using RNA-seq techniques.

Analysis of expressed transcripts between high and low yielders
From 23 samples, we obtained total 74.6 Gb RNA-Seq data files (26,763,546 to 51,313,614 paired-end reads per sample). Nearly 68 % of the reads were mapped to the bovine genome UMD3.1.66 and approximately 62 % of the reads in every individual were uniquely mapped to the bovine genome. The alignment information for each sample is presented in Table 1. Of these, 16,314 and 16,151 expressed transcripts were revealed in high yielders and low yielders, respectively (Additional file 1: Table S1).

DEGs and splice events between high and low yielders
To provide a better understanding of the biological mechanism of milk yield, it is essential to identify the DEGs between high and low milk yield cows. Based on the Cuffdiff analysis, a total of 100 DEGs (p<0.05, FDR< 0.05) were examined between the high and low yielders (Fig. 1). All the DEGs were located in chromosomes randomly, but there were no DEGs identified in chromosome 14, 20, 27, and 28. The expression level of 100 DEGs was from 2 to 1063 FPKM in high group, and 0.4 to 1794 FPKM in low group. In addition, 43 of the 100 DEGs were highly expressed in the high yielders; whereas, the other 57 DEGs showed lower expression in low yielders. The expression level of 100 DEGs is shown in Fig. 2 and the detail information is presented in Additional file 2: Table S2.
Splice events are thought to contribute to phenotypic complexity during the mammalian evolution [20]. In total, we obtained 44,572 and 36,467 splice events in high and low yielders, respectively, compared to the annotated bovine genome UMD3.1.66. Of these, 214 (in high yielders) and 202 (in low yielders) differentially expressed splice events were identified. Further analysis showed that the 214 and 202 splice events involved 59 DEGs and 57 DEGs, respectively. Major splicing events such as, exon skipping (ES), intron retention (IR), alternative 5' splicing site (A5SS), alternative 3' splicing site (A3SS) and mutually exclusive exon (MXE) were detected in our studied bovine blood transcriptomes. The A3SS was the major splicing pattern observed in high group; but in low group, the major type was ES (Fig. 3). This suggests that high milk yield cows are more inclined to take the A3SS pattern. In addition, more splice sites were found in chromosome 26 in both groups.

Functional classification of DEGs
The DAVID tool [21] was used to annotate the function of the 100 DEGs with the particular categories focusing on the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. A total of 55 clusters (p<0.05) were significantly annotated with GO terms within three major function groups: biological process (BP), cellular component (CC), and molecular function (MF). The most significant GO categories observed were defense response, immune response, inflammatory response, icosanoid metabolic process, and fatty acid metabolic process (p<0.05) (Fig. 4). Only one KEGG pathway was enriched, which was the Toll-like receptor signaling pathway. On the other hand, based on the enrichment analysis of DEGs containing alternative splice sites revealed that the PPAR signaling pathway was only detected in the high yielded group. Detailed information of the DEG functional annotations are showed in Table 2.

qRT-PCR validation of DEGs in high and low milk yielders
Furthermore, we randomly selected 4 DEGs identified from the RNA-seq data, LGALSL, IL-8, FAM213B and CCL5, to validate their expression patterns using qRT-PCR. The results from the qRT-PCR confirmed that the DEGs had the same expression pattern observed with the RNA-seq (Fig. 5). This indicates that the gene expression observed in blood transcriptome between high yielders and low yielders was highly credible.

Discussion
The object of this study is to investigate potential molecular biomarkers in whole blood related to milk production traits in lactating dairy cows, with the aim of putting forward a non-invasive method that identifies the DEGs for milk performance traits. In this study, whole blood genome transcriptome profiles of high and low yield milk cows were investigated using RNA-seq technique. RNA-seq has many advantages over traditional cDNA microarray technologies and it can easily detect low-abundance genes [22,23]. Marioni et al. demonstrated that RNA-seq and qRT-PCR have a high correlation, and that the Pearson correlation could reach 0.929 [23], which means RNA-Seq is accurate and reproducible.
Among these 100 DEGs, many genes were also detected by other studies associated with milk yield, such as BOLA-DQA1. BOLA-DQA1 is one of the primary histocompatibility complex (MHC) class II molecules that plays an important role in the immune system. It is thought that MHC genes indirectly affect milk production traits by increasing the disease resistance of the cows. In this present study, the expression level of BOLA-DQA1 was higher in high milk yielders than in low milk yielders. Other research has reported that the BOLA-DQA1 gene was associated with resistance to mastitis progression [24].
Splicing event analysis showed that DEGs in high and low milk yield cows were different. In low yielders, cows are more inclined to take the ES pattern but in high yielders A3'SS was more likely. Oxidized low-density lipoprotein receptor (OLR1), which can degrade the oxidized form of low-density lipoprotein and plays important roles in fatty acid transport, was reported as a potential gene for milk-fat percentage and milk fat yield [25]. The splice site analysis in our study also found that AS in OLR1 was significant. There were 5 splice sites of OLR1 in high milk yield including one ES, one A5'SS and three other events. But in low yielders, there was only one splice site on OLR1, which was A5'SS. Also, the expression level of OLR1 was higher in high yielders than in low yielders. Splice events analysis of DEGs can reveal differences between high and low milk yield cows. Among those DEGs containing splice sites, gene ontology enrichment indicated that the PPAR signaling pathway was significantly different between high and low groups. Study showed that PPARA is expressed in heart, liver, adipose tissue, and muscle tissue and is involved in fatty acid catabolism [26]; PPARs play important roles in the regulation of metabolic and inflammatory signaling pathways [26,27]. It is also reported that PPAR-γ (PPARG) is over-expressed in adipose tissue and macrophages and primarily regulates adipogenesis [27,28]. PPAR-γ has been reported to significantly increase its expression during lactation in bovine mammary gland [28]. Also, anti-inflammatory properties were observed between PPARA and PPARG [29]. In our study, we found that this PPAR pathway is also in the blood transcriptome, and it was only detected in high yielders. Moreover, these three PPAR pathway DEGs: LPL, OLR1, and CPT1A, were highly expressed in high yielders. Also, the Toll-like receptor signaling pathway existed only in high yielders. The Toll-like receptor pathway is also involved in innate immunity [29]. These results indicate that the regulation of metabolic and immune function is more active in high milk yield cows. The immune system plays a key role in health maintenance, pathogenesis, diseases resistance, and production performance. Further research is required to explore the relation between immune response and milk performance traits in cattle blood transcriptomes.

Conclusion
The present study provides a non-invasive method to identify the DEGs in cattle blood using RNA-seq for milk yield traits. The study revealed 100 DEGs between high yielders and low yielders, and discovered different alternative splicing patterns between the two groups. The enrichment analysis also revealed that specific metabolic and immunological pathways are Fig. 1 Volcano plot displaying DEGs between the whole blood of 10 low yielders and 13 high yielders. The y-axis corresponds to the mean expression value of log10 (q-value), and the x-axis displays the log2 fold change value. The red dots circled in the frame represent the significantly differentially expressed transcripts (p value < 0.05 and FDR < 0.05) between high and low milk yield cows; the blue dots represent the transcripts whose expression levels did not reach statistical significance between the two groups related to cattle milk yield traits, and could be considered a signature of blood biomarkers selection in dairy cattle. These results provide the valuable resources of biological research in Chinese Holstein cows milk production, but also offer some potential guidelines to understand the relationship between milk production and the immune function.

Blood samples collection
Twenty three Chinese Holstein cows in their second or third lactation were selected based on their current lactation and previous lactation records from BingZhouHai Dairy Farm. All the cows are reared under the same standard. 13 high yielders:~28kg/ day/cow and 10 low yielders:~18kg/day/cow were selected for this study. 5ml of whole blood was drawn from the jugular vein of each cow. The blood samples were frozen in liquid nitrogen then transferred to −80°C for further RNA extraction. All experimental methods used in this study were approved by the Inner Mongolia Agricultural University (Hohhot, China) Institutional Animal Care and Use Committee.

Differentially expressed genes and splice events analysis between high and low yielders
Clean reads were acquired by removing low quality reads with adapter and poly-N from raw reads and were used for subsequent analysis. We downloaded the cattle genome (UMD3.1.66) and annotation files from the ensembl database. Bowtie v0.12.8 was used to build the index of cattle reference genome [22] and TopHat v2.1.0 [23] was employed to align clean reads for each sample against the cattle reference genome. DEGs between the high and low yielders were detected using Cuffdiff [30]. P value <0.05 and FDR<0.05 were as the threshold in this study to select differentially expressed genes between high and low yielders. The Alternative Splicing Transcriptional Landscape Visualization Tool (Astalavista) web server (http://genome. crg.es/astalavista/) extracts and displays splice events from genomic annotation of exon-intron gene coordinates. Astalavista v3.0 [20] was employed to identify the alternative splice events for all available transcripts and to study the five basic splice events of 100 DEGs for 23 samples.

Enrichment analysis of DEGs
To further investigate the function of 100 DEGs, we performed the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using DAVID bioinformatics resource tool [21]. GO and KEGG terms with p<0.05 were considered significantly enriched. LGALSL and IL-8 were highly expressed in high yielders. FAM213B and CCL5 were highly expressed in low yielders detected by RNA-seq