Genome-Wide Association Analysis Reveals Key Genes Responsible for Egg Production of Lion Head Goose

The lion head goose is one of the most important agricultural resources in China; however, its breeding process is relatively slow. In the present study, a genome-wide association study was performed for the genetic selection of egg production characters in lion head geese. We detected 30 single-nucleotide polymorphisms located in or near 30 genes that might be associated with egg production character, and quantitative real-time polymerase chain reaction was used to verify their expression level in lion head geese. The results showed that the expression levels of CRTC1 (encoding CREB-regulated transcription coactivator 1), FAAH2 (encoding fatty acid amide hydrolase 2), GPC3 (encoding glypican 3), and SERPINC1 (encoding serpin family C member 1) in high egg production population were significantly lower than those in the low egg production populations (*P < 0.05). The expression levels of CLPB (encoding caseinolytic peptidase B protein homolog), GNA12 (encoding guanine nucleotide-binding protein subunit alpha-12), and ZMAT5 (encoding zinc finger, matrin type 5) in the high egg production population were significantly higher than those in the low egg production populations (*P < 0.05). The expression of BMP4 (encoding bone morphogenetic protein 4), FRMPD3 (encoding FERM and PDZ domain containing 3), LIF (encoding leukemia inhibitory factor), and NFYC (encoding nuclear transcription factor Y subunit gamma) in the high egg production population were very significantly lower than those in the low egg production population (**P < 0.01). Our findings provide an insight into the economic traits of lion head goose. These candidate genes might be valuable for future breeding improvement.


INTRODUCTION
The lion head goose is named for the sarcoma that makes it resemble a lion's head from the front. Lion head geese provide great economic benefits via the widespread consumption of their meat as stewed products (Zhuang and Lin, 2006). Lion head geese, originating from Shantou Raoping in Guangdong province, are the only large and major goose species in China, and are the germplasm resources under special state protection (Chen et al., 2011). Lion head geese, whose ancestors are anser cygnoides, are herbivorous animals showing fast growth and large body size; as such, their feeding is relatively environment-friendly (Huang, 2009;He et al., 2012). However, they show low fertility with an average of 20-25 eggs per year (Zhang, 1991). The egg-laying period is not continuous but is divided into three parts caused by its strong broodiness (Wang, 2007). A lower laying rate may hinder the development of the lion head goose industry. Therefore, based on maintaining the characteristics of its original breeds, improving the low fecundity has become an important breeding objective of lion head goose, among which the egg laying characteristics are one of the most significant aspects (Wang et al., 2008). It is believed that egg production could be improved by adopting a modern genome-enhanced breeding scheme.
Genome-wide association studies (GWASs), which were proposed first by Risch in 1996, are powerful and effective tools to identify genetic markers associated with the trait of interest (Risch and Merikangas, 1996). In recent years, a large number of GWASs on human diseases have been published, such as for vitiligo (Shen et al., 2016) and for livestock animals such as pigs (Luo et al., 2012). Since the development of the HapMap Project, a number of high-density single-nucleotide polymorphism (SNP) chips for plant and animal species like chicken, swine, cattle, sheep, and the like have been developed as well (Gibbs et al., 2003). Hoglund et al. found that a total of 17,388 significant SNP markers and candidate genes associated with female fertility were distributed on 25 chromosomes in the Nordic Red cattle group (Hoglund et al., 2015). Shen et al. carried out GWAS on Ningdu Sanhuang chickens with Chicken chip and found the candidate gene, GARNL1, which was related to reproductive traits (Shen et al., 2012). Xie used the Illumina Porcine SNP60K chip to screen the potential candidate genes that may be associated with the litter size of the Xiang Pig (Xie, 2016). The production of SNP chips and the appearance of highthroughput sequencing technology have made GWAS an important research strategy in some fields. GWAS is widely accepted as a primary method for gene detection (Jiang et al., 2010).
In the present study, we performed GWAS to identify SNPs and potential genetic variants that may be associated with egg laying character of the lion head geese. Then we attempted to verify their functionality. As a result, we have identified certain genes that might play important roles in the egg laying process.

Animals Resources and Sample Collection
Lion head geese are the largest goose breed in China and are the one of the world's big goose species. In the past 2 years, we have bred a batch of lion head geese with high and low egg production in the Shantou Baisha Research Institute of Original Species of Poultry and Stock, Guangdong Province. These geese have the same growth environment and nutritional supplements, and they have free access to food and water.
A total of 217 geese blood samples were collected at the Shantou Baisha Research Institute of Original Species of Poultry and Stock, including 136 high egg-production geese (more than 35 eggs per year) and 81 low egg-production geese (less than 25 eggs per year). Blood samples were stored at an ACD anticoagulant tube at −80°C cryogenic refrigerator for further experiments.

DNA Extraction and Whole Genome Sequencing
Genomic DNA was extracted from peripheral blood cells of the high and low egg production groups using a HiPure Blood DNA Mini Kit (Magenbio, Guangzhou, China). After passing the quality inspection of NanoDrop 2000 Spectrophotometer (Thermo, America), the DNA samples were sent to Beijing Genomics Institute (Shenzhen, China) for whole genome resequencing. An Easy DNA Library Prep Kit (MGI, Shenzhen, China) was used to carry out the double-enzyme digestion to construct six libraries, re-sequenced using the BGISEQ-500RS platform with an average 12× sequencing depth and coverage of 8%.
Given the large number of scaffolds, scaffolds were combined into 21 chromosomes. The ordered SNP loci were separated into the 21 artificial chromosomes per 50 million base pairs (i.e. 1-50 Mbps, 51-100 Mbps etc.). Principal component analysis (PCA) was performed to identify genetic variation and the population structure.

Phenotypic Data
Descriptive statistics of phenotypic data were carried out by SPSS 22 software (IBM Corp., Armonk, NY, USA), and the sample size, maximum, minimum, average and standard deviation of high and low egg production samples were calculated.

Statistical Analysis
Genome wide association studies was performed using the EMMAX software with egg production character classified by dichotomies, i.e., the data was divided into high and low egg production (Kang et al., 2010). The analysis model was as follows: where P is the vector of phenotypes of the individuals, m is the intercept of a straight line, Z is the incidence matrix of random polygenic effects, a is the random polygenic effects, SNP is the effect of a single nucleotide polymorphism, and e is the vector of residual errors with e~N (0, Is e ), where I is the identity of matrix and s e is the residual variance.

Multiple Hypothesis Testing and Correction
The tested SNP markers could be used to make a Bonferroni adjustment with the 5% GWAS-wide significance level: where a is the GWAS-wide significance level, n is the number of all tested SNP sites. In order to reduce false negative, and then extending the threshold 20 times as the suggestive value.

Population Stratification
Population stratification refers to the existence of subpopulations with different allele frequencies, which may pose a great threat to the validity of GWAS results, and even leads to false-positive results. The quantile-quantile plot (Q-Q plot) was used to assess the GWAS results, to judge whether the P-value calculated by SNP correlation analysis deviated from the hypothesis test on the whole overall.

Quantification of Candidate Genes
To observe whether the candidate genes were differentially expressed in the high egg production group compared with the low group, we performed quantitative real-time polymerase chain reaction (RT-qPCR) for these genes. Total RNA was extracted from PBCs using the TRIzol reagent, and synthesized into cDNA using a Reverse Transcription Kit (Takara, Shiga, Japan). The cDNA was then used as a template for RT-qPCR using the CFX96 Touch (Bio-Rad, Hercules, CA, USA). The RT-qPCR primer sequences were synthesized by Sangon Biotech (Guangzhou, China) and were stored at −20°C for later use. According to the instructions of 2× SYBR Green qPCR Master Mix kit (Bimake, Houston, TX, USA), the RT-qPCR reaction was performed in triplicate and uses comprises 20 ml, containing 10 ml of 2× SYBR Green qPCR Master Mix, 0.4 ml of ROX Reference Dye, 1 ml of cDNA template, and a 0.5 mM concentration of specific primers. Thermal cycling parameters were as follows: 95°C for 5 min; 40 cycles of 95°C for 15 s, 60°C for 30 s, and 72°C for 30 s and 1 cycle of 95°C for 15 s, 60°C for 60 s, and 95°C for 15 s. Relative mRNA expression levels were calculated using the 2 −DDCt method and normalized using the expression of GAPDH [encoding glyceraldehyde-3-phosphate dehydrogenase, (Livak and Schmittgen, 2001)]. All the primers for RT-qPCR are shown in Table S1.

Sample Phenotypic Data Statistics
The egg production performance of lion head goose was divided into a high production group (>35 eggs per year) and a low production group (<25 eggs per year).
The phenotypic data of low egg production was graphically recorded in Figure 1A and high egg production was in Figure  1B. The sample size, maximum, minimum, average and standard deviation of the trait measured in the current experiment were presented in Table 1 and the boxplot is shown in Figure 1C. The sample size, maximum, minimum, average and standard deviation of the high egg production group were 136, 63, 35, 46, 54, while that in the low egg production group were 81, 25, 8, 17, 21, respectively. The annual egg production records for each individual are shown in Table S2.

Sequencing Data Statistics
Aligning the clean reads to the reference sequence allowed us to statistically analyze the sequencing depth, coverage rate, mapping rate, and mismatch rate, as shown in Table 2. Based on 217 original high egg production and low egg production samples, 8 were excluded because of mismatch, leaving 209 samples (131 high egg production, 78 low egg production). And the average of sequencing depth, coverage rate, mapping rate, and mismatch rate are 12.05%, 7.56%, 91.31%, and 1.48%, respectively.

Genetic Variation and Population Structure
To determine data validity and population structure, PCA was performed based on the variation of the sequence data, taking principal component 1 as the horizontal and principal component 2 as the ordinate (Figure 2). The differences among individuals in each group were small, having high similarity. However, the dispersion between the high and low egg production groups was large, showing obvious population differentiation and indicating that there was a great difference between the two groups.

Significant Single-Nucleotide Polymorphisms and Population Stratification Assessment
The PCA results were used as covariates and EMMAX was used for the GWAS analysis. In Figure 3, chromosomes 1-21 are shown separately with different colors. The corresponding horizontal lines indicated the 5% GWAS-wide significance levels and the threshold was expanded 20 times as a second suggested value. The results are shown in Figure 3.
The Q-Q plot showed that the screened SNPs were located above the diagonal line, indicating that the analytical model is reasonable. And the significantly higher points located at the top right corner of the graph represented potential candidate molecular markers associated with the trait (Figure 4).

Candidate Genes Analysis
To determine whether the candidate genes were differentially expressed in the high and the low egg production group, we performed RT-qPCR on these genes. The expression levels of BMP4 (encoding bone morphogenetic protein 4), FRMPD3 (encoding FERM and PDZ domain containing 3), LIF (encoding Leukemia inhibitory factor), and NFYC (nuclear transcription factor Y subunit gamma) in the high egg production population were significantly lower than those in the low egg production population (**P < 0.01). The expression levels of CRTC1 (encoding CREB-regulated transcription coactivator 1), FAAH2 (encoding fatty acid amide hydrolase 2), GPC3 (encoding glypican 3), and SERPINC1 (encoding serpin   family C member 1) in the high egg production population were significantly lower than those in the low egg production population (*P < 0.05). The expression levels of CLPB (encoding caseinolytic peptidase B protein homolog), GNA12 (encoding guanine nucleotide-binding protein subunit alpha-12), and ZMAT5 (encoding zinc finger, matrin type 5) in the high egg production population were significantly higher than those in the low egg production population (*P < 0.05, Figure 5A). The expression levels of the remaining genes (see Figure 5B for their symbols) were not significantly different between the two groups ( Figure 5B).

DISCUSSION
The Significance of Studying the Lion Head Goose The lion head goose is the largest meat goose currently bred in China. It is characterized by a large size, crude feed tolerance, fast growth, high forage reward, strong stress resistance, and has a delicious meat that has extremely high economic value and is deeply favored by consumers (Zhuang and Lin, 2006). Therefore, an in-depth study of the breeding problem of lion head geese will help to modernize the industry to meet market demand.  In this study, the high and low egg production populations selected in a previous study were analyzed, and the number of eggs per year was used as the parameter to carry out GWAS. The phenotypic records of the egg laying character in this research were normally distributed, which was consistent with the separation characteristics and population separation characteristics.

Application of Genome-Wide Association Study
In this study, we performed a GWAS for the egg production trait of a lion head geese population. Genomic studies have been carried out for many agricultural animals, such as chickens, swine, sheep, cattle, and geese, however, few of them have studied regionally important economic species such as lion head geese in China. To the best of our knowledge, this is the first GWA studies for the egg production character of lion head geese. Currently, database such as NCBI and Ensembl contain few reported goose sequences, which need to be further verification. FIGURE 4 | Quantile-quantile (Q-Q) plot of genome-wide association results for egg production. The blue points represent SNPs, and the red points represent the most significant SNPs. The research on the breeding of lion head geese has lagged behind that for other economically important species, which has led to many problems in the lion head goose breeding industry, such as backward breeding, and low productivity. With the development of genome-enhanced breeding and the improved the efficiency of genomic selection, it will be possible to protect and develop the breeding resources of the lion head goose, thus promoting the modernization and industrialization of the lion head goose industry.

Significance of This Research
Next generation sequencing technology based on highthroughput sequencing and molecular marker technology, enables the fine mapping of functional genes. Genome selection technology represents a new generation of molecular breeding technology for livestock and poultry. This technique has been successfully applied to the cultivation of sheep (Zhao and Zhang, 2019). The high egg production traits of the lion head goose breeding population, the identification of the SNPs, and the selection of functional genes for economic traits will lay the foundation for the development of genotyping technology for lion head goose breeding.

Detection and Verification of Key Genes
Bioinformatic analyses at the Ensembl and NCBI databases were used to identify the genomic location of SNPs that are significantly associated with the selected trait. Subsequently, bioinformatics and comparative genomics analysis were used to select key genes and make preliminary annotations on related gene functions.
In this study, a GWAS was conducted on the egg production trait of lion head geese, which detected 30 SNPs that were significantly associated with the high egg-production characteristic of lion head geese. We then screened the 30 genes that contained or were near, the SNPs.
In the present study, the expression levels of the BMP4, LIF, NFYC and FRMPD3 genes in the low egg production population of the lion head goose were significantly higher than those in the high egg production population.
BMP4 (bone morphogenetic protein 4), a member of the transforming growth factor beta (TGFb) superfamily of growth factors, was first characterized for its role in bone metabolism (Nilsson and Skinner, 2003). It was subsequently reported to be involved in the regulation of embryonic mesoderm formation, and the formation of primordial germ cells (Nilsson and Skinner, 2003). BMP4 mediates the formation of the mesoderm in mouse embryos, in which knockdown of BMP4 leads to death and neonatal malformation (Zhu et al., 2002). It was reported that BMP4 inhibits secretion of progesterone by granulosa cells and the expression of follicles in sheep and cattle (Monget et al., 2002;Da et al., 2018). Our results were consistent with these reports, i.e., BMP4 might negatively affect the egg production character of the lion head geese.
FRMPD3 (FERM and PDZ domain containing 3), located on the human X chromosome, is homologous to FRMPD4, which indicated that FRMPD3 might mediate significant functions related to excitability associated with neuronal migration abnormality; however, the functions of FRMPD3 have not been reported to be associated with poultry laying performance (Mardinly, 2013).
LIF (leukemia inhibitory factor), a secretory glycoprotein, is essential for the embryo implantation process in mice and humans (Aghajanova, 2004). Females lacking LIF are infertile, because their blastocysts cannot be implanted in the uterus, resulting in no clinical pregnancy (Steck et al., 2004). Our results would seem to conflict with those of previous reports, might reflect species inconsistency, or other, as yet unidentified factors. This result requires further verification and testing (Schofield and Kimber, 2005).
NFYC (nuclear transcription factor Y subunit gamma), a histone-fold domain-containing transcription factor, was identified in mice and humans as an oncogene required for the initiation and progression of tumors, and it engaged in chromatin remodeling (Tong et al., 2015). As far as we know, it has never been linked to reproductive function in any species.
The expression levels of the CRTC1, FAAH2, GPC3, and SERPINC1 genes in the high egg production goose population were significantly lower than those in the low egg production population.
CRTC1 (CREB-regulated transcription coactivator 1) is a transcriptional coactivator that has a biological function that affects energy balance and reproduction. Overexpression of CRTC1 in mice led to obesity and infertility (Altarejos et al., 2008). Breuillaud et al. showed that the CREB coactivator CRTC1 is indispensable for mouse fertility (Breuillaud et al., 2009). Our results were consistent with these reports, suggesting that CRTC1 plays a negative role in the laying trait of the lion head goose.
FAAH2 (fatty acid amide hydrolase 2), a member of the serine hydrolase family of enzymes, regulates several physiological processes, including appetite, inflammation, and various reproductive processes like secretion of gonadotropin-releasing hormone from the hypothalamus (Lunetta et al., 2015). FAAH2 may participate in negative regulation of egg laying.
GPC3 (glypican 3), a member of the heparan sulfate proteoglycans, has been widely studied as a target in human cancer, such as ovarian carcinoma. GPC3 mediates the synthesis of integral membrane proteins that interact directly with insulin like growth factor 2 (IGF2), which is considered to be an important growth factor in ovarian cancer (Ofuji et al., 2014;Wu et al., 2016). GPC3 induces apoptosis in ovarian cells, suggesting that it plays an important role in the development of ovarian cancer (Gonzalez et al., 1998). According to comprehensive research reports, we believe that low expression of GPC3 may promote egg laying in the lion head goose.
SERPINC1 (serpin family C member 1), is the main endogenous anticoagulant. Its mutations cause hereditary antithrombin deficiency and are associated with increased risk for all forms of pregnancy-related complications, which cause adverse pregnancy reaction (De la Morena-Barrio et al., 2012;Rogenhofer et al., 2014). Thus, SERPINC1 may encourage low egg production; however, its specific effects require to be further verified.
The expression levels of CLPB, GNA12, and ZMAT5 in the high egg production population were significantly increased compared with those in the low egg production population.
CLPB (caseinolytic peptidase B protein homolog) encodes an ATP-dependent chaperone. Disruption of CLPB is related to human congenital microcephaly and small birth weight (Capo-Chichi et al., 2015). Our results hint at a similar effect in geese. An increase in CLPB might lead to an increase in egg laying.
GNA12 (guanine nucleotide-binding protein subunit alpha-12), the a subunit of a heterotrimeric G protein, participates in cell transformation and embryonic development; is expressed in the cytoplasm of Leydig cells; and has the biological function of promoting the differentiation of cells and elongated sperm cells into mature sperm (Hu et al., 2008, Udayappan andCasey, 2017). Shen et al. showed that preeclampsia is associated with decreased methylation of GNA12 promoters (Shen et al., 2015). Thus, the expression of GNA12 might promote high egg production in the lion head goose.
For ZMAT5 (zinc finger, matrin type 5), there have been no reports of its effects on animal reproduction.

CONCLUSIONS
In this study, based on the breeding group of lion head goose, the blood DNA samples were collected to conduct a genomewide association study on egg production traits. Thirty SNPs related to egg-producing traits were identified, and thirty genes located in or near SNPs were screened. The selected key genes were verified using RT-qPCR. The BMP4, CRTC1, FAAH2, FRMPD3, GPC3, LIF, NFYC, and SERPINC1 genes might play a negative role in the egg production character of the lion head Goose. The CLPB, GNA12, and ZMAT5 genes might play a positive role in egg production character in the laying trait of the lion head goose. The ANTXR, CDH23, DDX49, ELOVL4, EPHB3, FBXL20, FGF9, FRY, GM2A, GPC4, HSD17B12, HTF3A, KCNAB2, LIMA1, NEXN, SLITRT6, SMG7, RXRA, and TMLHE genes might have no significant effect on egg production character of the lion head goose. These results require further verification and confirmation.
In the past few years, GWASs have devoted to the identification of key loci and genes related to the molecular breeding of livestock and poultry. These genes may provide novel target for hereditary approaches to improve breeding. Developments in this area will be exciting and will affect the future of genomic breeding. In view of the fact that most of these genetic connections are limited, a large number of sample studies are required in future investigations in order to detect these subtle variations.

DATA AVAILABILITY STATEMENT
The datasets PRJNA552198 for this study can be found in the NCBI (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA552198).

ETHICS STATEMENT
The use of animals in this study was approved by the South China Agricultural University Committee for Animal Experiments (approval ID: SYXK(Guangdong)2019-0136). All study procedures and animal care activities were conducted in accordance with the national and institutional guidelines for the care and use of laboratory animals. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
QZ and QX are the principal investigator for this article and contributed to the concept and planning of the article, collection of the data, and reporting of the work described. QZ, JC, XZ, ZL, and QX contributed to the planning of the article, collection of the data, and reporting of the work described. ZX, HL, and WL are the other principal investigators for this article and contributed to the concept of the manuscript, planning of the article, collection of the data, and reporting of the work described. All authors contributed to drafting the article or revising it critically for important intellectual content.

FUNDING
This work was supported by the National Modern Agricultural Industry Science and Technology Innovation Center in Guangzhou (2018kczx01) and the Agricultural Science and Technology Innovation and Promotion Project in Guangdong Province (2018LM1112). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

ACKNOWLEDGMENTS
The author would like to thank Dr. Shen from the Beijing Genomics Institute in Shenzhen and Dr. Xie from Ming Lead Gene Technology for helping us analyze the data, revise the manuscript and answer our questions.