Single Nucleotide Polymorphisms of NUCB2 and Their Genetic Associations with Milk Production Traits in Dairy Cows

We previously used the RNA sequencing technique to detect the hepatic transcriptome of Chinese Holstein cows among the dry period, early lactation, and peak of lactation, and implied that the nucleobindin 2 (NUCB2) gene might be associated with milk production traits due to its expression being significantly increased in early lactation or peak of lactation as compared to dry period (q value < 0.05). Hence, in this study, we detected the single nucleotide polymorphisms (SNPs) of NUCB2 and analyzed their genetic associations with milk yield, fat yield, fat percentage, protein yield, and protein percentage. We re-sequenced the entire coding and 2000 bp of 5′ and 3′ flanking regions of NUCB2 by pooled sequencing, and identified ten SNPs, including one in 5′ flanking region, two in 3′ untranslated region (UTR), and seven in 3′ flanking region. The single-SNP association analysis results showed that the ten SNPs were significantly associated with milk yield, fat yield, fat percentage, protein yield, or protein percentage in the first or second lactation (p values <= 1 × 10−4 and 0.05). In addition, we estimated the linkage disequilibrium (LD) of the ten SNPs by Haploview 4.2, and found that the SNPs were highly linked in one haplotype block (D′ = 0.98–1.00), and the block was also significantly associated with at least one milk traits in the two lactations (p values: 0.0002–0.047). Further, we predicted the changes of transcription factor binding sites (TFBSs) that are caused by the SNPs in the 5′ flanking region of NUCB2, and considered that g.35735477C>T might affect the expression of NUCB2 by changing the TFBSs for ETS transcription factor 3 (ELF3), caudal type homeobox 2 (CDX2), mammalian C-type LTR TATA box (VTATA), nuclear factor of activated T-cells (NFAT), and v-ets erythroblastosis virus E26 oncogene homolog (ERG) (matrix similarity threshold, MST > 0.85). However, the further study should be performed to verify the regulatory mechanisms of NUCB2 and its polymorphisms on milk traits. Our findings first revealed the genetic effects of NUCB2 on the milk traits in dairy cows, and suggested that the significant SNPs could be used in genomic selection to improve the accuracy of selection for dairy cattle breeding.


Introduction
Milk provides rich nutrients, protein, fatty acids, and vitamins, etc., for human [1], so the milk production traits are considered as the most important economic traits in the dairy industry, including milk yield, fat yield, fat percentage, protein yield, and protein percentage [2]. Researchers have attempted to identify the functional genes that have large effects on milk production traits to improve the accuracy of selection since the implementation of whole genomic selection. At the present, the RNA sequencing has been widely applied to study the specific gene expression patterns at different developmental stages or in different tissues [3]. We previously used this technique to obtain hepatic transcriptome of Chinese Holstein cows among dry period, early lactation, and peak of lactation, and found that the expression of the nucleobindin 2 (NUCB2) gene was significantly increased in early lactation (q value = 0.00811) and the peak of lactation (q value = 0.049246) as compared to dry period, respectively [4], so we considered that the NUCB2 might be associated with milk production traits.
Nucleobindin 2 (NUCB2) is a Ca 2+ binding protein, and it processes to generate a terminal peptide, termed nesfatin-1. Nesfatin-1 regulates the appetite and glucose metabolisms in humans and domestic animals [5][6][7]. NUCB2 exhibits a high conservation among species, and its mRNA is ubiquitously expressed [8]. Sun et al. first demonstrated that NUCB2 mRNA and nesfatin-1 protein were highly expressed in the liver of mouse fetus at embryonic day (E) 10.5, and their expression were extensively decreased at E17.5, which suggests that NUCB2 might play a critical role in liver development and physiological functions in the developmental process of mouse fetus [9]. NUCB2 is involved in human mammary epithelial cell proliferation and migration [10], and its expression is down-regulated when the milk production is decreased [11], which implies that the expression of NUCB2 might be beneficial to the increase of milk production. In addition, the NUCB2 gene was located on chr.15:43.5418 and it had a distance of 0.86 and 2.54 cM to the peak of reported quantitative trait locus (QTL) regions that have large effect on protein percentage [12] and protein yield [13], respectively. The NUCB2 was also near to three significant SNPs for milk traits, BFGL-NGS-116109, BTB-00590405, and BTB-00590603, with the distance of 0.93-5.99 Mb [14]. These data indicate that NUCB2 may be involved in milk production traits.
To date, rare studies have been reported to uncover the genetic association between the single nucleotide polymorphisms (SNPs) of NUCB2 and milk production traits in dairy cattle. In addition, the studies have revealed that haplotype blocks that are formed by the SNPs have important implications for identifying associations with complex traits [15][16][17]. Hence, in this study, we detected the polymorphisms of the NUCB2 gene and estimated the haplotype block that was formed by the identified SNPs, and then analyzed their associations with five milk traits, including milk yield, fat yield, fat percentage, protein yield, and protein percentage.

Animals
We selected a total of 1067 Chinese Holstein cows of 40 sire families from 22 dairy farms of the Sanyuan Lvhe Dairy Farming Centre (Beijing, China) on the same feeding conditions. The cows in each sire family were distributed in different dairy farms. These 1067 cows have three-generation pedigree information at least and Dairy Herd Improvement (DHI) records for 305-day milk yield, protein yield, protein percentage, fat yield, and fat percentage. Ethics Approval: The Institutional Animal Care and Use Committee (IACUC) of China Agricultural University approved all of the protocols for sample collection (permit number: DK996).

Phenotypic Data Collection
The Beijing Dairy Cattle Centre provided the phenotypic data of 305-day milk yield, fat yield, fat percentage, protein yield, and protein percentage in the first and second lactations (http://www. bdcc.com.cn/), and the descriptive statistics of them are shown in Table S1.

DNA Extraction
The semen DNAs of 40 sires were extracted using the salt-out procedures, and the blood DNAs of 1067 daughters were extracted while using the TIANamp Blood DNA Kit (Tiangen, Beijing, China), according to the manufacturer's instructions. Subsequently, we measured the quantity and quality of the extracted DNAs by the NanoDrop 2000 spectrophotometer (Thermo Scientific, Hudson, NH, USA) and gel electrophoresis, respectively.

SNP Identification and Genotyping
We designed the primers (Table S2) to amplify the entire coding region and 2000 bp of the 5 and 3 flanking regions based on the bovine reference genome sequences of NUCB2 by Primer3 (version 0.4.0, Whitehead Institute for Biomedical Research, Cambridge, MA, USA) (http://bioinfo.ut.ee/primer3-0.4.0/), and then synthesized them in the Beijing Genomics Institute (BGI, Beijing, China). We randomly mixed 40 semen DNAs into two pools (20 sires per pool) with equal concentrations (50 ng/µL) for each DNA. The reaction conditions and amplification procedures for PCR are provided in the Table  S2. Afterwards, the PCR products were purified and sequenced by the ABI3730XL DNA analyser (Applied Biosystems, Foster City, CA, USA). Subsequently, we analyzed the sequencing data while using CHROMAS (version 2.23, Technelysium, Tewantin, Queensland, Australia) to discovery the potential SNPs. The identified SNPs were individually genotyped using Sequenom MassArray (Agena Bioscience, San Diego, CA, USA) for all the 1067 cows by matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS, Bioyong Technologies Inc., HK).

Estimation of Linkage Disequilibrium (LD) and Association Analyses
The extent of LD between the identified SNPs was estimated while using Haploview 4.2 (Broad Institute of MIT and Harvard, Cambridge, MA, USA). Subsequently, we used the Statistical Analysis System (SAS) 9.13 software (SAS Institute, Cary, NC, USA) to analyze the associations of the SNPs and haplotype blocks with milk production traits with the following animal model: Y = µ + hys + b × M + G + a + e, in which, Y is the phenotypic value of each trait for each cow; µ is the overall mean; hys is the fixed effect of farm, year, and season; b is the regression coefficient of covariant M; M is the fixed effect of calving month; G is the genotype or haplotype combination effect; a is the individual random additive genetic effect, being distributed as N 0, Aδ 2 a , with the additive genetic variance δ 2 a ; and, e is the random residual, which is distributed as N 0, Iδ 2 e , with identity matrix I and residual error variance δ 2 e . Bonferroni correction was performed for multiple testing, and the significant level was equal to the raw P value, divided by number of genotypes or haplotype combinations. In addition, the additive (a), dominant (d), and substitution (α) effects were calculated, as follows: a = AA−BB 2 ; d = AB − AA+BB 2 ; α = a + d (q − p), where, AA, BB, and AB are the least square means of the milk production traits in the corresponding genotypes, p is the frequency of allele A, and q is the frequency of allele B.

Transcription Factor Binding Sites (TFBSs) Prediction
We used the MatInspector (version 3.11, Genomatix, Munich, Germany) (http://www.genomatix. de/cgi-bin/welcome/welcome.pl?s=d1b5c9a9015b02bb3b1a806f9c03293f) to predict the changes of the TFBSs that are caused by the identified SNPs in the 5 flanking region of the NUCB2 (matrix similarity threshold, MST > 0.85).

NUCB2 Gene Expression Analysis
Three healthy lactating Chinese Holstein cows were selected from the Sanyuanlvhe Dairy Farming Center (Beijing, China), and the heart, liver, spleen, lung, kidney, ovary, mammary, and uterus from each cow were collected. Subsequently, the expression of NUCB2 in eight tissues was detected to further reveal its potential function. The total RNAs of the tissues were extracted while using a Trizol reagent (Invitrogen, Carlsbad, CA, USA), and the quantity and quality of the RNA were measured with a NanoDrop 2000 spectrophotometer (Thermo Scientific, Hudson, DE, USA) and gel electrophoresis, respectively. We used a PrimerScriptH RT reagent Kit (TaKaRa Biotechnology Co., Ltd., Dalian, China) for the reverse transcription. The qPCR primers of NUCB2 and two reference genes (Ribosomal Protein S9, RPS9, and Ubiquitously Expressed Prefoldin Like Chaperone, UXT) were presented in Table S2. We conducted the qPCR while using a LightCycler ® 480 II (Roche, Penzberg, Germany) with the procedures that are shown in Table S2. All of the measurements were performed in triplicate and the relative gene expression was normalized by the RPS9 and UXT with 2 -∆∆Ct method.

SNP Identification in NUCB2 Gene
By re-sequencing, we identified ten SNPs in the entire coding region and 2000 bp of 5 and 3 flanking regions of NUCB2, including one in the 5 flanking region, two in 3 UTR, and seven in the 3 flanking region (Table 1). Further, we individually genotyped these ten SNPs while using Sequenom MassArray for all the 1067 cows, and their allelic and genotypic frequencies are shown in Figure 1.

Associations between Ten SNPs and Five Milk Production Traits
We analyzed the genetic associations between the ten SNPs of NUCB2 and milk yield, fat yield, fat percentage, protein yield, and protein percentage, respectively, by SAS, and the results showed that these ten SNPs were significantly associated with at least one milk traits in the first or second lactation ( Table 2). In the first lactation, there were five SNPs that were associated with fat yield (p values: 0.0012-0.0178), nine with fat percentage (p values: 0.0035-0.0388), and eight with protein percentage (p values: 0.0075-0.0334), and no association was found between the SNPs and milk and protein yields (p values > 0.05). In the second lactation, ten SNPs had strong associations with milk yield (p values <= 1 × 10 −4 and 0.0089), five with fat yield (p values: 0.0073-0.0466), one with fat percentage (p value = 0.0257), ten with protein yield (p values <= 1 × 10 −4 and 0.0347), and six with protein percentage (p values: 0.0107-0.0467), respectively. Two SNPs, c.*1253A>G and g.35692674A>G, were strongly associated with fat yield (p values: 0.0073-0.0383) in both of the two lactations. The g.35735477C>T had significant association with fat percentage in the first (p value = 0.0122) and second lactation (p value = 0.0257). There were four SNPs, g.35735477C>T, c.*1276C>T, g.35692145C>T, and g.35691328T>C, which were associated with protein percentage ((p values: 0.0075-0.0357) in both lactations. In addition, as the results show in Table S3, the additive, dominant, and substitution effects of the ten SNPs were found to be significantly associated with milk traits (p values < 0.05).

Associations between Haplotype Combinations with Five Milk Traits
We estimated the LD among the identified SNPs of NUCB2 and found a haplotype block (D =0.98-1.00; Figure 2) that was included all ten SNPs. The block consisted of three haplotypes, H1 (TACTTAACAC), H2 (TACTTAACAT), and H3 (CGTCCGGTGT), with the frequency of 75.3%, 1.3%, and 22.6%, respectively. By haplotype-based association analyses, we found that the haplotype block was significantly associated with protein percentage (p value = 0.0393) in the first lactation, milk yield (p value = 0.0002), fat percentage (p value = 0.047), protein yield (p value = 0.0015), and protein percentage (p value = 0.0386) in the second lactation, respectively (Table 3).

TFBSs Changed by g.35735477C>T of NUCB2
For the SNP, g.35735477C>T, in the 5′ flanking region of the NUCB2 gene, we predicted the changes of TFBSs caused by them while using Matinspector. The allele C in g.

TFBSs Changed by g.35735477C>T of NUCB2
For the SNP, g.35735477C>T, in the 5 flanking region of the NUCB2 gene, we predicted the changes of TFBSs caused by them while using Matinspector. The allele C in g.

TFBSs Changed by g.35735477C>T of NUCB2
For the SNP, g.35735477C>T, in the 5′ flanking region of the NUCB2 gene, we predicted the changes of TFBSs caused by them while using Matinspector. The allele C in g.     Note: The number in the bracket represents the number of cows for the corresponding genotype; p value shows the significance for the genetic effects of SNPs; a, b within the same column with different superscripts means p value < 0.05; A, B within the same column with different superscripts means p value < 0.01. Note: H means haplotype; the number in the bracket represents the number of cows for the corresponding haplotype combination; H1 (TACTTAACAC), H2 (TACTTAACAT), H3 (CGTCCGGTGT); p value shows the significance for genetic effects among the haplotype blocks; a, b within the same column with different superscripts means p value < 0.05; A, B within the same column with different superscripts means p value < 0.01.

Expression of NUCB2 in Tissues
We found that NUCB2 gene was expressed in heart, liver, spleen, lung, kidney, ovary, mammary gland, and uterus of the lactating Chinese Holstein and highly expressed in liver tissue while using the qPCR, (Figure 4), which suggested that it might be involved with substance metabolisms for milk synthesis in the liver.

Expression of NUCB2 in Tissues
We found that NUCB2 gene was expressed in heart, liver, spleen, lung, kidney, ovary, mammary gland, and uterus of the lactating Chinese Holstein and highly expressed in liver tissue while using the qPCR, (Figure 4), which suggested that it might be involved with substance metabolisms for milk synthesis in the liver

Discussion
NUCB2 was involved with milk production traits in dairy cows n our previous hepatic transcriptome study, and this follow-up investigation underlined that the SNPs of NUCB2 have significant genetic effects on milk yield and composition.
Genomic selection has revolutionized dairy cattle breeding, and the evaluation system with DNA marker technology and genomics has doubled the rate of genetic progress for economic traits [18]. A study has revealed that the use of available loci improved the accuracy of genomic prediction in dairy cattle [19], which suggests that loci that are associated with milk production traits could be used in genomic selection programs that are aimed at accelerating genetic progress of dairy cattle. Studies have shown that the SNPs in the candidate genes significantly influenced the milk yield and composition in Holsteins [20][21][22][23][24][25]. In this study, we identified ten SNPs in NUCB2 and found that the SNPs and haplotypes of this gene were significantly associated with milk production traits. Currently, the Illumina 50K/770K chip and the GeneSeek 90K/150K chip are commonly used, and most of the SNP markers in chips were from the SNP database with even distribution across the whole genome. Thus, the certain SNPs of NUCB2 gene with significant genetic effects on milk traits could be put into the SNP chip to increase the selection efficiency for the milk production in dairy cattle.
In addition, for each identified SNP of NUCB2 in the current study, we observed that the genotypic frequency of the cows with significantly high phenotypic values of 305-day milk yield, fat yield, fat percentage, protein yield, or protein percentage were higher. This implied the effectiveness

Discussion
NUCB2 was involved with milk production traits in dairy cows n our previous hepatic transcriptome study, and this follow-up investigation underlined that the SNPs of NUCB2 have significant genetic effects on milk yield and composition.
Genomic selection has revolutionized dairy cattle breeding, and the evaluation system with DNA marker technology and genomics has doubled the rate of genetic progress for economic traits [18]. A study has revealed that the use of available loci improved the accuracy of genomic prediction in dairy cattle [19], which suggests that loci that are associated with milk production traits could be used in genomic selection programs that are aimed at accelerating genetic progress of dairy cattle. Studies have shown that the SNPs in the candidate genes significantly influenced the milk yield and composition in Holsteins [20][21][22][23][24][25]. In this study, we identified ten SNPs in NUCB2 and found that the SNPs and haplotypes of this gene were significantly associated with milk production traits. Currently, the Illumina 50K/770K chip and the GeneSeek 90K/150K chip are commonly used, and most of the SNP markers in chips were from the SNP database with even distribution across the whole genome. Thus, the certain SNPs of NUCB2 gene with significant genetic effects on milk traits could be put into the SNP chip to increase the selection efficiency for the milk production in dairy cattle.
In addition, for each identified SNP of NUCB2 in the current study, we observed that the genotypic frequency of the cows with significantly high phenotypic values of 305-day milk yield, fat yield, fat percentage, protein yield, or protein percentage were higher. This implied the effectiveness of dairy cattle breeding in recent years, and also reflected that these loci might be closely associated with the milk production traits. However, we found that the ten SNPs of NUCB2 showed different associations between the first and second lactations, and considered that the possible reason might be the different number of cows that are used for genetic association analysis, that is, we used 1067 cows in the first lactation and 740 in the second lactation (327 cows merely completed the milking of first lactation), which might impact the statistical significance. Generally, cows have higher milk production in the second lactation, and the different physiologic status of cows between the two lactations was also one possibility for the different genetic effects across lactations. In addition, the unstable association of the SNPs in two lactations was probably due to the fact that the causal polymorphisms are not very far, but not for these ones. Besides, we observed that the cows with heterozygous genotype showed the dominance for milk traits, ten SNPs for milk yield in the second lactation, for instance, which were caused by the dominance effects of the heterozygotes. We usually do not consider it in breeding, since the dominant effect is not heritable across generations. Hence, we should choose SNPs with significant additive effects on milk traits in all lactations for dairy cattle breeding.
In this study, we used the qPCR and observed that NUCB2 was expressed in heart, liver, spleen, lung, kidney, ovary, mammary, and uterus tissues of dairy cows, and relatively highly expressed in liver. In dairy cows, the liver participates in carbohydrate, fat, protein, and other substance metabolisms, through blood circulation, it provides cholesterol, triglyceride, and other ingredients to mammary gland to synthesize milk protein and fat [26][27][28]. Our data suggest that NUCB2 may play an important role in milk synthesis metabolism in dairy cows.
The specific binding of transcription factors to the regulatory regions of the DNAs is a important regulatory mechanism that affects gene expression [29,30]. In the current study, the allele C of g.35735477C>T was predicted to create the binding sites for ELF3, CDX2, and VTATA, and the allele T to invent the binding sites for NFAT and ERG. Oettgen et al. revealed that ELF3 directly drives SPRR2A mRNA expression through binding to the sites in its promoter during terminal differentiation of the epidermis [31]. The overexpression of ELF3 transactivates the promoter of KRT12 in the differentiated mouse corneal epithelium [32]. ELF3 binds to the miR-141-3p promoter and it suppresses its expression, thereby promoting the epithelial-mesenchymal transition in human hepatocellular carcinoma [33]. ELF3 directly binds Estrogen Receptor α (ERα) and it represses its transcriptional activity in ERα positive breast cancer, showing a tumour suppressive role [34]. Zhu et al. reported that CDX2 directly transactivated the oncogene CDH-17 and promoted hepatocellular carcinoma cell proliferation [35]. CDX2 transactivates the expression of GSK-3β and Axin2 to inhibit the cell proliferation and tumor formation in colon cancer [36]. CDX2 binds to the distinct target genes and it activates their expression in developing versus human and adult mouse intestinal cells [37]. Ross et al. identified a large number of potential TFBSs in the 5 flanking sequences of SRY gene in human and bovine, including the TFBS for VTATA in the TATA-binding protein factors family with the matrix similarity score of 0.923 in bovine [38]. The NFAT family consists of five members, NFAT1-NFAT5, first found in T lymphocytes as an inducible nuclear factor that could bind to the IL-2 promoter and transactivate its expression [39]. NFAT proteins frequently cooperate with other transcriptional partners due to its weak DNA-binding capacity, and reports have shown the specific molecules that are up-/down-regulated by the different NFATs [40]. ERG regulates the expression of SMAD2/3 to cause endothelial-dependent liver fibrogenesis [41], and it promotes endothelial homeostasis via the regulation of lineage-specific enhancers and super-enhancers [42]. These data indicate that these five transcription factors can regulate the expression of target genes by binding the TFBSs. Therefore, we speculated that the g.35735477C>T could affect the expression of NUCB2 by the coactions of transcription factors ELF3, CDX2, VTATA, NFAT, and ERG. However, the further functional validation should be performed to verify the regulatory roles of NUCB2 and its polymorphisms on milk traits in dairy cattle.

Conclusions
This is the first study to demonstrate the genetic associations of SNPs of NUCB2 gene with milk production traits in dairy cows, and to show that g.35735477C>T might change the TFBSs of NUCB2.
Our findings provide valuable molecular information for dairy cattle breeding, and the significant SNPs could be put into the SNP chip for genomic selection to improve the milk production in dairy cows.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/6/449/s1, Table S1: Descriptive statistics of the phenotypic values for milk production traits. Table S2: Primers and procedures for PCR and qPCR used in SNP identification. Table S3: Additive, dominant and allele substitution effects of SNPs on milk production traits in Chinese Holstein.