Promising Loci and Genes for Yolk and Ovary Weight in Chickens Revealed by a Genome-Wide Association Study

Because it serves as the cytoplasm of the oocyte and provides a large amount of reserves, the egg yolk has biological significance for developing embryos. The ovary and its hierarchy of follicles are the main reproductive organs responsible for yolk deposition in chickens. However, the genetic architecture underlying the yolk and ovarian follicle weights remains elusive. Here, we measured the yolk weight (YW) at 11 age points from onset of egg laying to 72 weeks of age and measured the follicle weight (FW) and ovary weight (OW) at 73 weeks as part of a comprehensive genome-wide association study (GWAS) in 1,534 F2 hens derived from reciprocal crosses between White Leghorn (WL) and Dongxiang chickens (DX). For all ages, YWs exhibited moderate single nucleotide polymorphism (SNP)-based heritability estimates (0.25–0.38), while the estimates for FW (0.16) and OW (0.20) were relatively low. Independent univariate genome-wide screens for each trait identified 12, 3, and 31 novel significant associations with YW, FW, and OW, respectively. A list of candidate genes such as ZAR1, STARD13, ACER1b, ACSBG2, and DHRS12 were identified for having a plausible function in yolk and follicle development. These genes are important to the initiation of embryogenesis, lipid transport, lipoprotein synthesis, lipid droplet promotion, and steroid hormone metabolism, respectively. Our study provides for the first time a genome-wide association (GWA) analysis for follicle and ovary weight. Identification of the promising loci as well as potential candidate genes will greatly advance our understanding of the genetic basis underlying dynamic yolk weight and ovarian follicle development and has practical significance in breeding programs for the alteration of yolk weight at different age points.


Introduction
Chicken egg yolk is an emulsion of water (48%), lipids (33%), and proteins (17%) [1]. Because it serves as the cytoplasm of the oocyte and provides a large amount of reserves, egg yolk functions biologically to provide the above-mentioned nutrients to the developing embryos [2],and yolk can accumulate significant amounts of IgY immunoglobulins (up to 100 mg per egg) to provide innate immunity to the embryos [3]. Egg yolk is widely used in the food industry for its high nutritional value to humans [4]. Furthermore, the bioactive substances of egg yolk are applied in the pharmaceutical and cosmetics fields for their binding properties, emulsion stability, and natural antioxidants [5][6][7].
The central area of the chicken ovary is composed of a vascularized medulla and a cortex containing the small follicles that are oocytes covered by follicular epithelium [2], and egg yolk is formed in these ovarian follicles by the consecutive deposition of lipids and proteins [8]. The sequential development of oocytes in ovaries leads to the display of a hierarchy in the follicles with four to six yolky follicles of gradually increasing size at the surface. Yolk precursors, however, are not synthesized in the ovary but are produced by the liver and then transported in the blood to the ovarian oocytes [2,9]. Vitellogenin, consisting of one phosvitin and two lipovitellins, is the main carrier for protein transportation from the liver to the ovary in the blood [10]. The lipid carrier is very low-density lipoprotein (VLDL), which has a standard structure consisting of a core of triglycerides and cholesterol esters surrounded by a surface layer composed of phospholipids, cholesterol, and apoproteins [11]. Yolk precursors (vitellogenin and VLDL) are transported in the follicular wall and are released near the basolateral membrane of the follicles. Then the penetration of these precursors is ensured through a process of endocytosis induced by the receptor LR8 for the deposition of yolk [12,13].
Due to the wide utilization of egg yolk, many efforts have been performed to alter egg yolk weight [14]. However, egg yolk weight is a complicated quantitative trait affected by many factors, such as breed and hen age [14]. The yolk weight is increased with the age of the laying birds; for eggs of the same size, old hens produce larger egg yolks than young hens, and the albumen weight is correspondingly decreased [15]. The strategy of identifying the quantitative trait loci (QTLs) or causal genes that are related to yolk formation and ovarian follicle development is a powerful tool to illustrate the genetic control for yolk weight and follicle development. A decade ago, microsatellite markers were employed to detect the causal regions associated with yolk weight, and multiple QTLs were reported [16][17][18][19]. Until now, however, only seven QTLs (distributed on chromosomes 4, 6, 9, 11, 15, 22, and 26) that relate to yolk weight had been identified. And these QTLs, which have poor repeatability and wide positional confidence intervals, were deposited in the AnimalQTL Database (http://www.animalgenome.org/cgi-bin/QTLdb/ index). In recent years, genome-wide association studies (GWASs) have been utilized to identify the associations between genomic loci and phenotypes with relatively high-density single nucleotide polymorphism (SNP) arrays in chickens [17]. Recently, with the rapid advance in nextgeneration sequencing technologies, large numbers of SNPs have been discovered in chickens [20]. The development of the Affymetrix 600K Chicken SNP array allows further efficient screening for causal loci and genes with relevance to target traits.
In the current study, we conducted GWA analysis on the yolk weight at 11 time points from the onset of egg laying to 72 weeks utilizing 600K high-density SNP arrays in an F 2 chicken population to explore the associated genomic loci and genes that contribute to the dynamic change in the yolk weight with the aging process. Furthermore, the GWA analysis was also performed on ovary and follicle weights at 73weeksof age to detect the causal genes related to follicle and ovary development. This study lays the foundation for an investigation into the genetic control of yolk and follicle development.

Ethics statement
All protocols and procedures involving animals were performed in accordance with the Guidelines for the Care and Use of Experimental Animals established by the Ministry of Agriculture of China (Beijing, China). All animal work was approved by the Animal Welfare Committee of China Agricultural University (permit number: SYXK 2007-0023).

Resource population
Reciprocal crosses between standard breed White Leghorns (WL) and Dongxiang chickens (DX), an indigenous Chinese strain, were utilized to develop an F 2 resource population. For parents, six WL and six DX males were initially mated with 133 DX and 80 WL females, generating 1,029 and 552 chicks for the F 1 generation, respectively. Then 25 males and 407 females from the WL/DX cross and 24 males and 235 females from the DX/WL cross in the F 1 generation were used to produce the F 2 population, which consisted of 3,749 chicks in a single hatch, originating from 49 half-sib and 590 full-sib families. The hens were kept in individual cages of the same environment with food and water adlibitum at the Jiangsu Institute of Poultry Sciences. Finally, 1,534 hens from 49 half-sib families and 550 full-sib families with sufficient phenotypic and pedigree information were selected for SNP genotyping. All layers were caged individually and subjected to a light/dark cycle of 16h of light and 8h of darkness (16L:8D)and raised in the same environment with feed and water adlibitum.

Phenotypic measurements
Yolk weight (YW) was measured for the first egg of each hen and again at 32,36,40,44,48,52,56, 60, 66, and 72 weeks of age, consisting of 11 age points. Except for the first egg, fresh eggs were collected for four successive days to ensure two eggs per hen at each age point. Eggs were broken to obtain the internal contents, and then the yolk was separated from the albumen and weighed by electronic scale. All hens were humanely sacrificed by 60%-70% carbon dioxide at 73weeks of age, the follicles and ovaries were then separated and weighted for follicle-active hens. FW includes the weight of all follicles that were contained in the ovary, and OW indicates the weight of the follicle-free ovary, which consisted of a highly vascularized central area (medulla) and a peripheral area (cortex). Descriptive statistics were calculated with R project (R version 3.1.2) using all available records. The function of 'rntransform' in the GenABEL package of R project was used for the rank-based inverse normal transformations [21].

Genotyping and quality control
Genomic DNA was extracted by the standard phenol/chloroform method and genotyped with the Affymetrix 600K Axiom Chicken Genotyping Array (Affymetrix, Inc. Santa Clara, CA, USA). Affymetrix Power Tools v1. 16.0 (APT) software was then used for quality control (QC) and genotype calling. Specifically, only samples with dish quality control (DQC) >0.82 and a call rate>97% were included in the subsequent analyses. An R script supplied by Affymetrix was run to compute the SNP QC metrics and filter out individual SNPs falling below given thresholds. After these QC steps, 1,512 individuals and 532,299 SNPs remained valid. Because the current statistical methods are more powerful for the detection of the associations between phenotypes and autosomal genotypes, all SNPs on sex chromosomes were excluded in the QC process. The PLINK v1.90 package [22] was then used for further quality control to improve the detection power, in which SNPs with a minor allele frequency (MAF)<5% and Hardy-Weinberg equilibrium (HWE) test P< 1 × 10 −6 were removed from the downstream analysis. Some sporadic missing genotypes were imputed using the BEAGLE v4.0 procedure [23], and only SNPs with an imputation quality score R 2 >0.5 were retained. Ultimately, up to 1,512 individuals and 435,867 SNPs were valid for the GWA analysis.
Estimation of heritability and contribution to phenotypic variance A univariate restricted maximum likelihood (REML) estimation was implemented in the GCTA v1.24 program [24] to estimate the heritability explained by the eligible SNPs (h 2 snp ). We also quantified the pair-wise genetic and phenotypic correlations for each trait at multiple time points with a bivariate mixed model [25]. A genetic relationship matrix (GRM) derived from all genotyped SNPs on autosomes and two linkage groups was created, and the top three PCs calculated by the GCTA tool were included as covariates to account for the potential population structure. We then estimated the contribution to phenotypic variance (CPV)made by these associated loci or genomic regions.

Genome-wide association analysis
The principal component analysis (PCA) implemented in the PLINK package was conducted prior to genome-wide association (GWA) analysis to eliminate spurious associations resulting from the presence of cryptic relatedness or hidden population stratification. To properly decide the thresholds for genome-wide significant/suggestive associations, we used the simpleM method [26] to correct the number of multiple tests. With this approach, an effective number of 59,308 independent tests was suggested; hence, the genome-wide significant and suggestive P values were 8.43 × 10 −7 (0.05/59,308) and 1.69 × 10 −5 (1.00/59,308), respectively.
A univariate association test equipped with an exact mixed model approach in GEMMA v0.94 software [27] was performed with the valid individuals and SNPs. A centered relatedness matrix was calculated with the independent SNPs, and then the derived Wald test P value was calculated for the significance level between the SNPs and phenotypes. The univariate linear mixed model as follows: For the equation, y is an n × 1 vector of phenotypic values for n individuals, W is an n × 5 matrix of covariates (fixed effects, i.e., top three PCs, sire and dam effects) including a column vector of 1, α is a 5 × 1 vector of corresponding coefficients including the intercept, x is an n × 1 vector of marker genotypes at the locus tested, β is the corresponding effect size of the marker and all effects are reported for the minor allele in each marker, u is an n × 1 vector of random polygenic effects with a covariance structure as u~N(0, KV g ), where K represents a known n × ngenetic relatedness matrix derived from SNP markers and V g is the polygenic additive variance, and ε is an n × 1 vector of random residuals with ε~N(0, IV e ), where I is an n × n identity matrix, and V e is the residual variance component. We used the Wald test statistic F Wald ¼b 2 =VarðbÞ for each SNP to test the null hypothesis β = 0, where the best linear unbiased estimate (BLUE) of β and the corresponding sampling variance VarðbÞ are obtained by solving the mixed model equations (MME) based on estimated V g and V e .
The Manhattan plots and quantile-quantile (QQ) plots were drawn by the "gap" packages [28] in R project. The genomic inflation factor λ, which is the judgment of the extent of false positive signals, was also calculated with the function of estlambda implemented in the GenA-BEL package [21] in R project.

Linkage disequilibrium analysis
Notably, many SNPs maybe passively significantly associated with target traits, resulting from their linkage to a strongly causal mutant. In general, GWAS does not distinguish a genuine causal locus from those that are statistically significant loci within a strong linkage disequilibrium (LD) region. Therefore, in order to characterize the potential candidate genes responsible for a trait, we conducted an LD analysis and inferred the haplotype blocks containing peak SNPs by Haploview v4.2 [29]. A block is derived using the solid spine algorithm, and defined as that the first and last SNPs in a region that is strong in LDs (D 0 0.8) with all intermediate SNPs.

Gene identification
Variant Effect Predictor (VEP) and BioMart tools based on the galGal4 assembly supported by Ensembl were used for the identification of candidate genes in which significant loci were located [30] and for the detection of the genes in a given genomic region [31].

Phenotype description and genetic parameters
Descriptive statistics for egg yolk weights at 11 age points and follicle and ovary weights at 73 weeks were presented in Table 1. We observed a curvilinear increase in yolk weights with advancing hen age from the onset of egg laying to 66 weeks and a slight decrease at 72 weeks. Follicle and ovary weights were measured for 1,429 follicle-active hens (1,508 slaughtered in total) at 73 weeks old. These two traits exhibited large coefficients of variation, which may be due to the different oviposition intervals for old hens. All phenotypic values conformed to the normal distribution after rank-based inverse normal transformation.
We quantified the additive genetic variation in liability to yolk weights at the different ages captured by eligible GWAS markers. Univariate GCTA analyses revealed that all yolk weights had moderate heritable patterns (Table 2), and the highest SNP-based heritability estimate was found in YW40 (h 2 = 0.38). Moreover, bivariate GCTA analyses indicated that yolk weights at the various ages exhibited highly and positively correlation, especially for yolk weights at neighboring time points. At the beginning of the entire laying stage, the yolk weight of the first egg (FYW) showed slightly lower genetic correlations with the yolk weights at the following ages (r g <0.60), compared with those among yolk weights from 32 to 72 weeks of age (r g >0.80). Notably, the follicle weight at 73 weeks had low phenotypic (0.02 to 0.17) and genetic (-0.003 to 0.52) correlation with yolk weights at the 11 age points. Similarly, the ovary weight also poorly correlated with yolk weights, and the phenotypic and genetic correlation coefficients ranged from 0.02 to 0.13 and -0.02 to 0.40, respectively. However, the genetic correlation between ovary weight and follicle weight was relatively high (0.86).

Identification of candidate loci by a genome-wide association study
We conducted separate association tests using a univariate model for egg weights at each age point and follicles and ovary weight at 73weeks. The Manhattan plots and the Quantile-Quantile (Q-Q) plots of the yolk weight of first eggs (FYW) are shown as an example in Fig 1, and the other traits are shown in S1-S12 Figs. The detailed messages for genome-wide significant SNPs are listed in Tables 3 and 4, and the descriptions for suggestive significant SNPs are shown in S1 Table. The CPVs of the loci were estimated by a tool of GCTA and ranged from 2.40 to 3.47% for yolk weights (Table 3) and from 2.11 to 3.76% for ovary weights (Table 4).
Follicle weight. Three SNPs located on chromosome 1 (GGA1) were significantly associated with follicle weight (FW). Detailed SNP messages are presented in Table 3. These three significant SNPs were all identified in the intron of annotated genes. Two SNPs, rs14920313 and rs14920355, were located in the doublecortin-like kinase 1 (DCLK1) gene, and rs313140585 was identified in the neurobeachin (NBEA) gene. In addition, two other genes, spastic paraplegia 20 (SPG20) and StAR-related lipid transfer domain containing 13 (STARD13), were found near the significant SNPs.
Ovary weight. Thirty-one SNPs showed significant association with ovary weight ( Table 4), seven of which were located within a 700-kb region (169.01-169.71Mb) on GGA1. Furthermore, the other 24 SNPs were located in an 800-kb region spanning from 1.6 to 2.4 Mb on GGA28. Of the loci in GGA1, three were located in annotated genes, including NIMArelated kinase 5(NEK5), NIMA-related kinase 3(NEK3), and cytoskeleton associated protein 2 (CKAP2). Twelve of the 24 SNPs in GGA28 were located in known genes, and the remainders were located 0.4 to 10 kb away from the nearest known genes, which included 19 annotated genes (Table 4).
Considering the large number of significant SNPs (24 loci) in a narrow region (800kb) in GGA28, we speculated that strong LD exists in this region. Hence, the LD analysis was performed in the genomic region from 1.60 to 2.40 Mb, which contained 744 SNPs, and 76 small scale blocks were observed in this region (S2 Table). Eight out of the 24 significant SNPs fell into six blocks, and the remaining 18 SNPs were not located in any of these blocks (Fig 2 and  S2 Table). The results indicated that the significant loci in GGA28 were in poor LD with each other and that the loci may play independent roles.

Discussion
The population used for this study was anF 2 cross population, which could maximize the differences of the traits and increase the power to identify the QTLs for traits that differed between the breeds. Compared with previous association analyses in chickens, our analysis used a higher density (600K) SNP array covering chromosomes 1-28 and two unassigned linkage groups. In addition, this F 2 population consisted of 1,512 hens, which is the largest population used for GWA analysis of egg traits so far, and therefore, the novel genomic region and loci revealed by the current study should be accurate and reliable. Several previous studies detected causal genomic regions or genes that controlled yolk weight using GWA analysis [17,18]. However, these studies utilized phenotypes from limited age points, assuming that the genetic architecture underlying this trait does not vary with the aging process. In the current study, yolk weights of 11 age points from the onset of egg laying to 72 weeks of age were   employed to detect the dynamic genetic control for yolk weights. In addition, this study was the first GWA analysis for ovary and follicle weight in chicken.

Yolk weight
By performing GWA analysis, we identified 11 associations with yolk weight at six age points at genome-wide significance levels (P<8.33 × 10 −7 ). These significant loci, which occurred on GGA 1, 3, 5, 17, and 28, had no overlap with previously reported QTLs. This might due to the specially designed F 2 crosses used in this study. In addition, the significant loci for each age point varied completely, which was notable because it indicated that the genetic control for yolk weight was age-dependent. However, the genetic correlations among yolk weights at each age points were relatively high ( Table 2), suggesting certain consistent genetic variants existed to control yolk weight along with the aging process. This contradiction might be explained by the existence of suggestive significant common genomic regions or loci that associated with yolk weight at different age points (S1 Table). In contrast, the 11 genome-wide significant SNPs were considered more powerful and specific loci for yolk weight. These loci were located near or in 16 annotated genes. One candidate gene with a biologically plausible function is zygote arrest 1(ZAR1), which is downstream (2,140bp) of rs313116098, an SNP significantly associated with YW48.ZAR1 is considered to be an oocyte-specific maternal-effect gene [32] and is thought to function in the initiation of embryogenesis in many vertebrate species including humans, pigs, cattle, sheep, mice, rats, and frogs [33]. Michailidis et al. (2010) revealed that this gene was preferentially expressed in chicken oocytes, ovaries, testes, and embryos during embryonic development [34]. This gene is related to the development of follicular oocytes; however, its involvement in yolk weight is not clear.
Three genes, stonin 2 (STON2), galactosylceramidase(GALC), and potassium channel, voltage-gated modifier subfamily G, member 3 (KCNG3) have functions related to nerve regulation. In fact, nerve networks have close relationships with ovarian follicle development. Specifically, the maturity and ovulation of the yolky follicle is profusely innervated by both adrenergic and cholinergic fibers [35]. STON2encodes a protein that participates in synaptic vesicle recycling through interaction with synaptotagmin 1, which is required for neurotransmission [36]. GALC encodes galactosylceramidase, which is related to globoid cell leukodystrophy in humans [37]. KCNG3encodes for the voltage-gated potassium (Kv) channels, which regulate neurotransmitter release, insulin secretion, and neuronal excitability [38]. Neurons are mainly present within the thecal layers of the largest follicles. These neurons provide numerous neurochemicals (catecholamines, neurotrophins, vasoactive intestinal peptide) to the follicle [39]. Therefore, the differences among yolk weights of first eggs may be partly led by the variable maturity of the sequential follicles that are regulated by the nerve networks. The remaining screened genes have no direct relevance with the yolk deposition orovarian follicle development, and this maybe due to insufficient knowledge about these genes in chickens.

Follicle weight
Follicles that connect to the ovary by pedicles are the sites for yolk formation. Hence, the mass of ovarian follicles represent the capacity of sequential yolk production of laying hens. However, the phenotypic or genetic correlation coefficients between follicle weight and yolk weight at each age point were all very low, even with the yolk weight of 72 weeks of age. This indicated that the genetic architecture might be disparate for single yolk weights and the mass of existing ovarian follicles at a certain age.
Three loci (rs14920313, rs14920355, and rs313140585) in GGA1 were significantly associated with follicle weight, and two candidate genes, SPG20 and STARD13, harbored or were near these loci. SPG20 encodes a protein containing an MIT (microtubule interacting and trafficking molecule) domain and can cause protein translocation to the plasma membrane when stimulated with epidermal growth factor (EGF) [40], whereas EGF, as one of the main intra-ovarian hormones produced by the germinal disc and granulosa cells, can stimulate growth and reduce atresia in follicles [41,42]. STARD13 encodes a protein possessing C-terminal STAR-related lipid transfer domain, which is a classical type of lipid transporter [43,44]. In mammals,15proteins, STARD1-STARD15, possess a START domain, and cholesterol, 25-hydroxycholesterol, phosphatidylcholine, phosphatidylethanolamine, and ceramides are ligands for STARD1/ STARD3/STARD5, STARD5, STARD2/STARD10, STARD10, and STARD11, respectively [43]. In the current study, the STARD13 gene was identified with follicle weight, indicating its crucial role in lipid metabolism in the process of yolk deposition, an association that needs further investigation.

Ovary weight
The SNPs that were significantly associated with the follicle-free ovary weight were distributed on two chromosomes, GGA1 and GGA28. The loci on GGA1 (169.01-169.70Mb) were upstream from and only 3 Mb away from the loci (172.06-172.45Mb), which were related to follicle weight, whereas the candidate genes in these two regions were completely different. Six genes containing these loci of interest or located near these loci in GGA1 were identified. However, only one gene hasa function plausible for ovarian development. This gene is DHRS12, a gene that encodes the enzyme of the short-chain dehydrogenases/reductases (SDR) family, which metabolizes many different compounds, including steroid hormones [45]. The ovarian steroid hormones include estrogen, testosterone, and progesterone; these are synthesized by follicular interstitial and granulosa cells, and estrogen can stimulate the liver for the synthesis of the yolky lipid and protein precursors [2]. Hence, DHRS12is closely related to ovarian development.
For the loci in GGA 28, the most significant SNP, rs316105069(P = 1.7 × s3 -10 ), was located in two genes, including alkaline ceramidase 1 (ACER1b) andacyl-CoA synthetase bubblegum family member 2 (ACSBG2). These two genes are closely related to the biological process of yolk formation, such as lipoprotein synthesis, lipid droplet promotion, and lipid carrier. ACER1can catalyze the hydrolysis of very long chain ceramides to sphingosine, while sphingosine is absorbed and converted to palmitic acid and then acylated into chylomicron triglycerides (TGs), the materials needed for the synthesis of lipoproteins [46]. Another gene, ACSBG2, encodes proteins belonging to the acyl-CoA synthetase family, which can channel lipids into nascent lipid droplets in mammals [47]. Claire et al. (2013) demonstrated that the ACSBG2 gene is significantly associated with abdominal fat deposition and exhibited functions related to lipid metabolism in chickens [48].

Conclusions
In summary, the current study reports for the first time a GWA analysis on ovary and follicle weights in chickens. Our results revealed 12, 3, and 31 genome-based significant SNPs for yolk, follicle, and ovary weight, respectively, by GWA analysis with a 600K high-density SNP array. The GWA analysis for yolk weights at multiple age points suggested that the genetic control for yolk weight is age-dependent. A list of candidate genes such as ZAR1, STARD13, ACER1b, ACSBG2, and DHRS12 were identified for their plausible function in yolk and follicle developments. Our findings establish a foundation for follow-up studies and create a better understanding of the molecular controls involved in the development of the ovary and its hierarchy of follicles.