Genome-Wide Association Study to Identify SNPs and Haplotypes Associated with Reproductive Performance and Egg Quality Traits in Geese

Background: Geese are one of the most economically important waterfowl worldwide. However, suboptimal reproductive performance hinders the development of the goose rearing industry. With the increasing public concern surrounding animal health and the regulation of the goose rearing industry, reproductive performance and egg quality have become a major focus for researchers. To promote the accuracy of selective breeding and identify important candidate genes associated with reproductive performance and egg quality traits, we examined the birth body weight (BBW), egg number at 48 weeks (EN48), egg number at 60 weeks (EN60) and egg yolk color (EYC) among 209 Sichuan White geese using a genome-wide association study (GWAS). Results: In this study, we obtained 2.896 Tb of raw whole-genome sequencing data with a mean depth coverage of 12.44. A total of 9,279,339 single nucleotide polymorphisms (SNPs) were identied in the goose genome. The GWAS analysis showed that 26 SNPs were signicantly associated with the examined birth body weight (BBW), egg number in 48- and 60-week-old geese (EN48 and EN60) traits and egg yolk color (EYC), respectively. Notably, two genomic regions, a 28.85 kb genomic region (4512855bp ~ 4541709bp) on chromosome 35, and 294.57kb genomic region (21069009 b ~ 21363580 bp) located on chromosome 5, were signicantly associated with egg number in 48- and 60-week-old geese (EN40 and EN60) egg yolk color (EYC), respectively. In addition, we performed the MALDI-TOP MS method to detected SNPs and haplotypes markers for EN48, EN60 and EYC traits were identied in the candidate gene TMEM161A, CALCR, TFPI2, and GLP1R, respectively Conclusions: The SNPs, haplotypes markers and genes identied in this study can be used for marker-assisted selection of reproductive performance and egg quality traits in geese, promoting the accuracy of selective breeding. In addition, the identied candidate genes signicantly associated with the provide a foundation for a better understanding of the mechanisms underlying reproduction and egg quality in MBB1A:Myb-binding 1A-like protein;

such as embryo viability, hatchability, egg production, and breakage rate [9][10][11]. With the increasing public concern surrounding animal health and the regulation of the goose industry, reproductive performance and egg quality have become a major focus for researchers [12,13].
In birds, reproductive performance and egg quality are quantitative traits, which, as well as having genetic determinants, are also in uenced by diet, housing, environment, management, disease, and other factors [13]. Genome-wide association studies (GWAS) are used to detect genes or genetic variations associated with speci c traits [14]. To date, various genes have been associated with reproductive performance, egg quality, and other economically important traits in poultry. Once validated, these loci can be used for marker-assisted selection to signi cantly improve economic value and selective breeding techniques [15][16][17]. However, little is known about genes and markers associated with reproduction performance and egg quality traits in geese.
In the current study, we used a whole-genome resequencing approach to identify single nucleotide polymorphisms (SNPs) in the goose genome. We then conducted a genome-wide association analysis of egg laying and egg quality traits to identify the corresponding genomic regions and candidate genes, laying the foundation for further study of the traits to accelerate goose research and genetic improvement.

Ethics statement
All experiments involving animals were performed according to the laws and regulations established by the Ministry of Agriculture of China (Beijing, China). The study was approved by the Animal Care and Welfare Committee of the Chinese Chongqing Academy of Animal Science (Chongqing, China).

Experimental animals and phenotypic traits
A total of 209 Sichuan White geese were used in this study. Each bird was reared from birth to the nonlaying period (65 weeks). All individuals were housed in single cages (600 × 800 × 900 mm) at the AnFu waterfowl-breeding base in Chongqing City, China (105.478°N, 29.343°E). All geese were fed the same diet with free access to water under natural temperatures and arti cial lighting (12 h of light per day). At week 28, blood samples were collected from the wings of all 209 geese using vacuum tubes containing ethylenediaminetetraacetic acid and used for DNA extraction.
During the egg laying period (weeks 28-64), we collected and marked eggs from each goose daily. The birth body weight (BBW), egg number at 48 weeks (EN48), and egg number at 60 weeks (EN60) were recorded for each goose. At weeks 35 to 40, three consecutive eggs were collected from each individual and assessed for egg yolk color (EYC).
Whole-genome resequencing and SNP calling DNA was extracted from the whole blood samples using a genomic DNA extraction Kit (DP332; Tiangen Biotech, Beijing, China). DNA concentration and quality were determined using a Nano Vue spectrophotometer (Cytiva Life Sciences, Marlborough, MA, USA) and agarose gel electrophoresis to ful ll the requirements for the Illumina sequencing platform.
Whole-genome resequencing data were obtained using the Illumina HiSeq X Ten platform (Illumina, San Diego, CA, USA). After read ltering, high-quality reads were mapped to a goose complete genome sequence assembled by our lab (data not shown) using BWA software [18]. SNP calling was conducted using GATK software [19].
Genotyping and quality control Quality control was conducted at both the individual and SNP levels. At the individual level, only samples with genotype data that were ≥95% complete were included in the trait's phenotype dataset. SNPs were ltered using Plink [20] according to the following criteria: Hardy-Weinberg equilibrium value > 10 -7 , minor allele frequency >0.05, and proportion of missing genotypes <0.05. In addition, individuals with <10% of genotypes missing were retained. Following quality control, SNPs identi ed from the 209 individuals were used for principal component analysis using the model implemented in Plink.

Genomic prediction
The identi ed traits and SNPs were then used to conduct a GWAS using a mixed linear model in the Genome-Wide E cient Mixed Model Association (GEMMA) software package [21] using the following statistical model: y = Wα + xβ + μ + ε, where y is the reproductive phenotype or egg quality value for all individuals, W is a covariance matrix used to control population structure, α is a vector of corresponding coe cients, including the intercept, x is the genotype of the SNP or haplotype marker, β is the effect size of the SNP or haplotype marker for the phenotypes, μ is a vector of random polygenic effects, and ε is a vector of random residuals. The Wald test statistic was used to calculate the signi cance of the SNPs and phenotypes, while the association analysis results were corrected using Bonferroni's method.
The genotypes of the signi cantly associated SNPs identi ed in the GWAS were determined by iPLEX matrix-assisted laser desorption-ionization time-of-ight mass spectrometry (MALDI-TOP MS) using the MassARRAY genotyping platform (Agena Bioscience, San Diego, CA, USA). Ampli cation and extension primers (Table S1) were designed to amplify the target sequences and hybridize and elongate the fragments at the nucleotide position of interest, respectively.
For all of the traits, Manhattan plots were generated using the GAP package (https://cran.rproject.org/web/packages/gap/) and quantile-quantile (QQ) plots were drawn using the qqman package (https://cran.r-project.org/web/packages/qqman/), both in R project software. We used the GenABEL package to calculate the extent of false-positive signals in the results using the genomic in ation factor (k) [22].

SNP-based haplotype blocks
Association analysis of the SNPs and haplotypes and phenotypic traits was carried out using a general linear model in JMP version 13.0 using the same equation: y = Wα + xβ + μ + ε. Haplotype blocks and SNPs were separately estimated for every trait. The least-squares mean of each genotype or haplotype block was then calculated, and differences between the genotypes were analyzed using the Bonferroni test.

SNP annotation
BEDTools software [23] was used to extract genetic information from 500-kb regions upstream and downstream of each potential SNP in the goose genome, while SNP annotation was conducted using Annovar software [24] (SnpEff, Annovar, VEP, oncotator). The detected genes were then searched against the chicken and human genomes using BioMart software (http://asia.ensembl.org/biomart/) to identify homologous genes, while gene function annotation was carried out using the metascape website [25] (https://metascape.org/gp/index.html#/main/step1).

Description of phenotypic traits
The body weight, egg laying, and egg-quality traits of all 209 Sichuan White geese were recorded throughout the laying period and are summarized in Table 1. The reproduction performance BBW, EN48, EN60 and EYC traits con rmed that Sichuan White geese are a high yield, medium body-sized breed. The egg laying traits of the study population were consistent with those described previously for the breed.

Whole-genome resequencing and SNP calling
We generated 2.896 Tb of raw whole-genome resequencing data from 209 female Sichuan White geese, with a Mean depth coverage of 12.44. After ltering, a total of 2.891 Tb of high-quality sequence data were mapped to a goose genome sequence, with an average mapping rate of 98.15% (96.58%-98.38%) ( Table S2). The average GC content was 42.85%, which is consistent with that previously reported for the goose genome [26][27][28]. A total of 9,279,339 SNPs identi ed in goose genome were used for further analysis ( Figure S1).

Genome-wide association study
Principle component analysis revealed slight strati cation within the goose population ( Figure 1). Therefore, we used a mixed linear model in the GEMMA software package to adjust for variations in population structure and conduct a GWAS using the selected traits. In total, 26 SNPs were found to be signi cantly associated with the traits (Table 2), located within 14 genes surrounded these regions ( Table   2). The Manhattan and QQ plots for the traits are shown in Figure 2.
Haplotype-based association analyses with EN48, EN60 and EYC traits Haplotypes were constructed for the regions containing the signi cantly associated SNPs. We found that 5 SNPs in TMEM161A (transmembrane protein 161A), located within a 28.85 kb genomic region (4512855bp ~ 4541709bp) of chromosome 35, were signi cantly associated with EN48 and EN60 ( Figure 4A). The 5 SNPs and genotypic distributions in goose were statistically analyzed, and associations between genotypes and EN48 and EN60 traits were determined (Table 3, Figure 4A). Haplotype association analysis con rmed that the corresponding haplotype for this region was signi cantly associated with both the EN48 and EN60 traits. Individuals with the genotype (CCAAAGAA) produced minimal eggs at 48 weeks and 60 weeks compared with individuals with the other genotypes (Table S5).
On chromosome 5, a haplotype within a 294.57kb genomic region (21069009 b ~ 21363580 bp) located on chromosome 5, was constructed based on 5 SNPs ( Figure 4B, Table 3) in the genes GLP1R, CALCR, and TFPI2 that were signi cantly associated with EYC. The 5 SNPs and genotypic distributions in goose were statistically analyzed, and associations between genotypes and EYC traits were determined (Table  3). Haplotype association analysis showed that the haplotype was also signi cantly associated with EYC, and that there were signi cant differences among genotypes. For example, individuals with the genotypes (AGCTGAGAGT) and (GGTTGGAAGG) showed signi cantly higher and lower EYC values, respectively, than individuals with other genotypes (Table S5).

Discussion
Sichuan White is a popular Chinese goose breed reared for both meat and eggs. While egg laying and egg quality traits have been studied intensively in avians, including chickens [29,30], ducks [15,31], and quail [32], few studies have examined traits other than those associated with egg laying in the goose genome [13].
In the current study, a GWAS using 209 geese identi ed 27 SNPs signi cantly correlated with BBW, EN48, EN60 and EYC traits. As one of the most economically important traits, BBW is a major focus of goose breeding and has been correlated with a number of other performance traits in birds [33]. A SNP located within SMAP2 (stromal membrane-associated protein 2) was signi cantly associated with BW. In mammals, SMAP2 is necessary for formation in spermiogenesis [34], with SMAP2-de cient mice displaying male infertility [35], globozoospermia [36], asthenozoospermia [36], and abnormal acrosome formation [36]. Therefore, SMAP2 may be essential for reproduction in geese, regulating the BBW trait.
It is noteworthy that four SNPs situated within TMEM161A were grouped into a haplotype block, and association analyses con rmed that both the SNPs and the haplotype were signi cantly associated with EN48 and EN60. TMEM161A, also known as an adaptive response to oxidative stress protein 29 (AROS-29), is involved in several cellular processes, including the adaptive response to oxidative stress [37], the response to ultraviolet radiation, and the response to retinoic acid, and is a negative regulator of the intrinsic apoptotic signaling pathway in response to DNA damage and a positive regulator of the DNA repair response. In birds, retinoic acid promotes embryo development and egg production and can regulate egg number [38,39]. Sichuan White geese display seasonal reproduction as an adaption to migration habit of the domesticate from wild goose and are therefore sensitive UV radiation. Because TMEM161A plays a key role in the response to UV light [40], it is possible that the arti cial selection pressures resulting from the intensi cation of animal production systems contributed to the appearance of the four SNPs in TMEM161A, satisfying the demand for increased egg production and aiding in adaptation to the production environment. Mover, we also identi ed the other 5 candidate genes for the EN48 or EN60 traits, PGM2L (Glucose 1,6-bisphosphate synthase), DP13B (DCC-interacting protein 13beta), DP13B (DCC-interacting protein 13-beta), RTJK (RNA-directed DNA polymerase from mobile element jockey), S2542 (Mitochondrial coenzyme A transporter SLC25A42). Previous studies have identi ed many candidate genes corresponding to reproductive traits in geese [41]. However, the positional candidate genes for egg number identi ed in the current study did not overlap with these genes. The egg number trait in poultry is quantitative and is regulated by numerous genes as well as the environment [42]. The positional candidate genes and SNPs identi ed here were all associated with egg production and microeffect control of the examined traits. In addition, previous studies have examined a range of different breeds raised under diverse conditions, resulting in differences in the arti cial selection pressures among studies that would undoubtedly in uence egg number. Further, while the population sizes used in these studies were su cient to draw goose breed-speci c conclusions, they were not large enough to make species-wide predictions.
The biology pathway analysis of the candidate genes for EN48 and EN60 highly enrich in the response to radiation (GO:0009314), response to light stimulus (GO:0009416), cellular response to hormone stimulus (GO:0032870), response to and peptide hormone (GO:0043434) ( Table 3). As we known, goose is the seasonal breeding goose were regulated by the light and the reproduction actives were controlled by the cycling reproduction hormones. We speculated that the candidate genes for EN48 or EN60 were response to the light or the reproduction hormones, involving in to regulate the reproduction activities, the mechanism needed to be further studied.
EYC is a crucial economic egg quality character because consumers associate yolk color with nutrition in [43]. Studies have shown that EYC is affected by genetic [44], environmental [45], housing [46], and dietary [47] factors. In the current study, twelve SNPs were signi cantly associated with EYC in Sichuan White geese. It is worth noting that ve of these SNPs were constructed into a haplotype, resulting in the identi cation of positional candidate genes CALCR (calcitonin receptor), TFPI2 (tissue factor pathway inhibitor 2), and GLP1R (glucagon-like peptide 1 receptor). CALCR is a receptor that binds the peptide hormone calcitonin, and is involved in calcium homeostasis, bone formation and metabolism, and lipid metabolism [48,49]. GLP1R, a member of the glucagon receptor family of G protein-coupled receptors, is involved in the control of blood sugar levels by enhancing insulin secretion [50]. Interestingly, CALCR and GLP1R regulate peptides that have anorexic effects, suppressing long-term food intake and promoting signi cant weight loss [51,52]. TFPI-2 belongs to the Kunitz-type family of protease inhibitors and regulates matrix metalloproteinase activation and extracellular matrix degradation. De ciency in this gene in mice accelerate the development of atherosclerotic lesions [53]. In addition, over-expression of TFPI-2 strongly inhibits the proliferation and migration of vascular smooth muscle cells, and affects smooth muscle cell proliferation and migration [53,54]. Therefore, these positional candidate genes may control the EYC trait by affecting food intake or lipid metabolism in geese. EYC is a quantitative trait in birds and is regulated by many genes in poultry [55]. In agreement with previous studies, we also identi ed multiple other positional candidate genes corresponding to the twelve SNPs, including MBB1A (Mybbinding protein 1A-like protein), CB042 (uncharacterized protein C2orf42), CACB2, (voltage-dependent Ltype calcium channel subunit beta-2), CCM2 (cerebral cavernous malformations protein 2 homolog), and ZBT46 (zinc nger and BTB domain-containing protein 46), the effects of which on EYC require further study.
The GO pathway analysis revealed that the regulation of epidermal growth factor-activated receptor activity (GO:0007176), regulation of epidermal growth factor receptor signaling pathway (GO:0042058) was remarkably enriched for EYC traits in goose. the two GO terms modulate the EGF-activated receptor activity. In mammal or birds, EGF-like growth factors play an important role in the ovulatory follicle [56], maturation of the cumulus-oocyte complex [57], oocyte maturation and development [58]. One of the key factors affected the EYC were genetic [59]. We propose that the candidate genes for EYC traits the EGFactivated receptor factors, might regulated the process of oocyte maturation and development, and determined the EYC traits in goose.

Conclusions
We predicted 26 candidate SNPs for the reproduction performance and egg quality traits in Sichuan white goose. Furthermore, The SNPs and haplotypes markers for EN48, EN60 and EYC, were identi ed in the candidate gene TMEM161A, CALCR, TFPI2, and GLP1R, respectively. The SNPs genotypes and SNPs haplotypes could be the markers for the egg production and EYC in the arti cial selection procession in Sichuan White goose.  Tables   Table 1  The statistic of    Note: *signi cant association P < 0.05; ** extremely signi cant association P < 0.01; Different letters in the same row indicate a signi cant difference between the genotypes (P < 0.05); the same letter indicates no signi cant difference between genotypes (P > 0.05).        Supplementary Files This is a list of supplementary les associated with this preprint. Click to download.