Genome-wide association study reveals novel loci associated with body size and carcass yields in Pekin ducks

Background Pekin duck products have become popular in Asia over recent decades and account for an increasing market share. However, the genetic mechanisms affecting carcass growth in Pekin ducks remain unknown. This study aimed to identify quantitative trait loci affecting body size and carcass yields in Pekin ducks. Results We measured 18 carcass traits in 639 Pekin ducks and performed genotyping using genotyping-by-sequencing (GBS). Loci-based association analysis detected 37 significant loci for the 17 traits. Thirty-seven identified candidate genes were involved in many biological processes. One single nucleotide polymorphism (SNP) (Chr1_140105435 A > T) located in the intron of the ATPase phospholipid transporting 11A gene (ATP11A) attained genome-wide significance associated with five weight traits. Eight SNPs were significantly associated with three body size traits, including the candidate gene plexin domain containing 2 (PLXDC2) associated with breast width and tensin 3 (TNS3) associated with fossil bone length. Only two SNPs were significantly associated with foot weight and four SNPs were significantly associated with heart weight. In the gene-based analysis, three genes (LOC101791418, TUBGCP3 (encoding tubulin gamma complex-associated protein 3), and ATP11A) were associated with four traits (42-day body weight, eviscerated weight, half-eviscerated weight, and leg muscle weight percentage). However, no loci were significantly associated with leg muscle weight in this study. Conclusions The novel results of this study improve our understanding of the genetic mechanisms regulating body growth in ducks and thus provide a genetic basis for breeding programs aimed at maximizing the economic potential of Pekin ducks. Electronic supplementary material The online version of this article (10.1186/s12864-018-5379-1) contains supplementary material, which is available to authorized users.


Background
China is the world's largest producer and consumer of domestic ducks, with Chinese duck production accounting for 75% of duck production worldwide (FAO). Furthermore, duck farming has been growing rapidly in China, and in addition to breast meat and eggs, secondary products, such as duck neck and wing are also very popular. However, these secondary products including the head, neck, feet, and wings are more expensive than breast muscle, and it is therefore necessary to pay more attention to the traits affecting these secondary products.
Previous breeding programs have not focused on the composition of these other body parts in ducks. However, molecular-based breeding programs now have been applied in animal breeding, and numerous quantitative trait loci (QTL) affecting carcass traits have been identified in various animals. Several biological pathways related to carcass traits in cattle have been identified, including peroxisome proliferator-activated receptor signaling, while some significant associations were detected in close proximity to genes with known roles in animal growth, such as glucagon and leptin [1][2][3]. In pigs, the number of copies of the vertebrae development associated gene (VRTN) was related to body length [4], while SSC1 and SSC8 include several loci related to carcass traits in pigs. Three genes, TBC1D1 (encoding TBC1 domain family member 1), BAAT (bile acid-CoA: amino acid N-acyltransferase), and PHLPP1 (PH domain and Leucine-rich repeat protein phosphatase 1) were highlighted as functionally plausible candidate genes for pig growth and fatness traits [5,6]. In addition to pigs and cattle, similar studies have been conducted in poultry, and numerous candidate genes associated with carcass traits have been found in chickens, including TBC1D1, LCORL (ligand-dependent nuclear receptor corepressor-like), LAP3 (leucine aminopeptidase 3), LDB2 (LIM-domain-binding 2), and TAPT1 (transmembrane anterior posterior transformation 1) [7][8][9][10]. However, compared with other domestic animals, study about the genetic basis of growth traits in ducks is lacking. Previous studies suggested that the fat mass and obesity-associated gene (FTO) [11] and mutations in intron 2 of the growth hormone gene (GH) [12] influenced duck carcass and meat quality traits, while Zhang et al. [13] also showed that the perilin gene (PLIN) affected duck carcass and fat traits.
The current chicken QTL database [14], includes more than 1500 QTLs related to body weight traits, but QTLs for traits related to the composition of subordinate body parts, such as neck length, fossil bone length, and foot weight, are still rare. Furthermore, few studies have been conducted in ducks, and no animal QTL database currently includes duck-related QTL. This lack of information needs to be addressed to support the development of duck breeding projects.
In the present study, we conducted a genome-wide association study (GWAS) in 639 Pekin ducks using genotyping-by-sequencing (GBS) [15]. We measured or derived a total of 18 body size and carcass yields traits, and aimed to identify potential loci and candidate genes affecting these traits. To our knowledge, this is the first large-scale GWAS investigation of duck carcass traits. These QTL information will not only facilitate the study of molecular genetic mechanisms, but may also improve the accuracy of genetic selection for carcass traits in ducks.

Descriptive statistics
Mean values (and standard deviations) for body size and carcass yields traits are shown in Table 1 and the phenotypic correlations are shown in Table 2. The highest phenotypic correlation (0.98) was between DW and EW, and the lowest correlation (− 0.24) was between DP and LMWP. The phenotypic correlations among the four body weight traits (DW, EW, HEW, and BW42) were all > 0.90, while the correlations between leg muscle percentage and the other 17 traits were all < 0.2, and were mostly negative.

Genetic parameters
The genetic parameters of all traits are shown in Table  2. The estimated heritability of these 18 traits ranged from 0.30-0.91. LMWP showed the highest heritability, suggesting a considerable genetic contribution to carcass yields and body size. There was also a high genetic   correlation between FBL and body weight traits, with correlation coefficients > 0.75, and a high genetic correlation between body weight and internal organ weight, with a correlation coefficient > 0.90.

Loci-based analysis
A total of 37 significant QTLs (P < 3.48E− 05) across 14 chromosomes were identified by loci-based analysis ( Table 3; Additional file 1: Figure S1). The estimated genomic inflation factor λ ranged from 1.01-1.06, suggesting no population stratification in the studied population (Additional file 1: Figure S2). One result indicated a genome-wide significant QTL (SNP Chr1_140105435 A > T) (Fig. 1). Eight non-overlapping QTLs were obtained for body size traits. The most significant of these (Chr2_24791499_A > T; located in the intron between exons 8 and 9 of LOC101799835) was related to BrW (P = 7.67E− 06) and accounted for 10.8% of the genetic variance. A SNP on chromosome 1 (Chr1 130,709,943 A > T; located in the intron between exons 8 and 9 of the Xg blood group gene, XG) was the only locus significantly associated with NL. Three significant QTLs in separate regions of Chr2 were associated with FBL, with the most significant (SNP chr2_52398515_C > T; P = 1.13E− 05) accounting for 6.2% of the genetic variance.
We identified 29 non-overlapping QTLs related to carcass traits. The traits DW, EW, HEW, BW42, and WW shared one genome-wide significant QTL (SNP Chr1_140105435 A > T) (Fig. 2), located in the intron between the last two exons of ATP11A. This QTL was also significantly (P = 5.27E− 08) associated with HEW, accounting for 5.4% of the genetic variance. There were two significant QTL for FW (Chr1_148339868 A > G, P = 7.73E-06; Chr4_59568289 A > G, P = 1.62E-05), one in the region of the ATP-binding cassette subfamily C member 4 gene (ABCC4) and the other near LDB2. We identified two significant QTLs (Chr3_11176648 C > T and Chr10_1656050 A > G) for BMW, located near the transmembrane protein 17 gene (TMEM17) and LOC101804888, respectively. Three significant QTLs for BMWP were identified on Chr1, of which the most significant locus LOC101803092 accounted for 8.1% of the genetic variance for this trait. No QTL was identified for LMW, and the SNP Chr20_2198171 C > G, located near the autism susceptibility candidate 2 gene (AUTS2), was the only potentially significant QTL for LMWP.

Gene-set analysis
We further examined the significance of the candidate genes within the QTL using gene set analysis (Table 4). Among the 18 traits, three candidate genes were found to be significant for the four traits BW42, EW, HEW, and LMWP: LOC101791418, tubulin gamma complex-associated protein 3 gene (TUBGCP3), and ATP11A (P < 1E− 05). The start and end positions of the three genes were concentrated within the genomic region between 139 and 140 Mb on Chr1. The highest significance was seen for ATP11A with HEW (P = 3.83E− 07). These results were consistent with the results of the association analysis above.

Functional annotation of significant regions
A total of 36 candidate genes were detected by GWAS analysis. We performed QTL annotation of these 36 chicken ortholog genes using the chicken QTL database [14] (Additional file 1: Figure S3). Twenty genes were annotated with QTL information in chickens. The QTL information for seven genes was consistent with the carcass traits measured in this study. ATP11A, calcium voltage-gated channel subunit alpha1 C (CACNA1C), which were associated with 42-day weight-related QTL in chicken studies, were also related to DW, EW, and HEW in ducks in the current study, respectively. Genes TMEM17 and ATPase plasma membrane Ca 2+ transporting 1 (ATP2B1), as candidate genes associated with BMW and BMWP, were associated with breast muscle weight/percentage QTL in chicken studies. We also annotated the candidate genes using the GO and KEGG databases (Additional file 2: Tables S1 and S2). A total of 25 genes were annotated. Clustering analysis showed that these genes are not significantly enriched in any specific function or biological pathway. The gap junction protein alpha 1 gene (GJA1) plays an important role in gap junctions, which are essential for many physiological events, including embryonic development, electrical coupling, metabolic transport, apoptosis, and tissue homeostasis. ATP2B1 is associated with the calcium signaling pathway and adrenergic signaling in cardiomyocytes. CACNA1C was found to be associated with seven pathways, including the calcium signaling pathway, gonadotropin-releasing hormone signaling pathway, and regulation of the actin cytoskeleton.

Heritability and correlation coefficients
The results of the current study indicated that body size and carcass traits demonstrated moderate to high heritability in Pekin ducks (0.30-0.91). Xu [16] previously estimated that breast muscle weight (BMW) and breast muscle weight percentage (BMWP) had moderate heritabilities (0.23 and 0.16, respectively), while the heritability of body weight was high (0.48). Furthermore, the heritability of BW42 in broilers in Nunes' study was 0.31 [17]. Compared with these results, we found similar heritabilities of most traits, other than BMW. In this study, we constructed a kinship matrix using SNPs and estimated the heritability of BMW as 0.63, which was obviously higher than Xu's results. This apparent discrepancy may have been due to population differences or differences in the estimation methods. However, estimations derived from a constructed pedigree from genomic information are likely to be more accurate than those from traditional pedigree records as the genomic information reflect more real genetic relationship than traditional pedigree. The largest phenotypic correlations were between DW and EW and between DW and HEW, while the genetic correlation between BW42 and DP was small (0.39). Nunes et al. [17] estimated the genetic parameters related to body weight, chemical carcass composition, and yield in a broiler-layer cross and found a genetic correlation between BW42 and DP of 0.33, which was in accord with the current results. We found highly positive genetic (0.74) and phenotypic correlations (0.63) between BW42 and BMW. Venturini et al. [18] also found genetic and phenotypic correlation coefficients of 0.86 and 0.84 between BW42 and BMW, respectively, which were consistent with our results. However, compared with Venturini et al.'s result (0.7), we found a smaller genetic correlation between BW42 and LMW (0.3), likely due to the use of a different species. Overall, our results indicated that the genetic parameters related to body weight traits in the Pekin duck population were consistent with previous studies.

Candidate genes related to body size, body weight and carcass traits
In this study, ATP11A was significant in both loci-based model GWAS and gene-set GWAS, and was associated with DW, EW, HEW, BW42, WW, and LMWP. Based on chicken QTL information, ATP11A was also associated with body weight and internal organ weight in chickens. Also, Segawa et al. [19] noted that the major flippases at the plasma membrane in most mammal cells are encoded by ATP11A and ATP11C. ATP11A encodes a complete membrane ATPase involved in the transport of phospholipids. Transporting creates membrane phospholipid asymmetry and initiates the biogenesis of transport vesicles. We hypothesize that this gene may be involved in metabolism in duck fat cells, and may be highly significantly related to five traits because body weight is a complex trait that involves the development of multiple organs. This gene may thus be involved in basic growth and development processes, especially in relation to fat deposition; however, more experiments are needed. The measurement of carcass traits in poultry breeding is both expensive and difficult, and can only be conducted after death. In contrast, BrW is an important live body measurement that is closely related to body weight and breast muscle production [20]. The current results revealed very high genetic and phenotypic correlation coefficients between BrW and EW, HEW, and BW42,   respectively. All of them share the same SNP (Chr2_19666206 A > G). It was one of the most significant SNPs for BrW, accounting for 9.8% of the observed genetic variance, with a MAF of 0.479. The candidate gene was PLXDC2, which coordinates the development and differentiation of nerve cells in various animals, including humans, mice, and chickens [21][22][23]. Our results demonstrated correlations between this site and BW42, DW, EW, HEW, and BrW, and chicken QTL annotation showed that the gene was associated with both body weight and leg muscle weight. Like BrW, FBL can also be measured easily and is significantly correlated with weight traits, and can thus be used for trait selection indirectly. The results of the present study showed that all the genetic and phenotypic correlation coefficients between FBL and BW42, DW, EW, HEW, and WW, respectively, were high. We also found that the SNP Chr2_52398515_C > T (located in intron of TNS3), which was significantly associated with FBL, accounted for 6.2% of the genetic variance. The TNS gene family plays an important role in the development and formation of bones. Dedicator of cytokinesis 5 (Dock5) and TNS3 genes have been shown to have a synergistic effect on the maintenance of osteoclast activity to ensure the correct organization of podosomes [24]. QTL information indicated that the QTL related to body weight and breast muscle weight/percentage localized to TNS3. However, the trait of FBL was not included in the chicken QTL information and we were unable to find any previous reports of FBL in poultry. We therefore provide the first report of a QTL associated with FBL in ducks, as an important reference for future research.
Secondary duck products after slaughter represent a large market share in Asia, and a better understanding of the genetic bases of the relevant economic traits will help to increase yield. In our study, the candidate gene for FW was LDB2, which encodes a LIM-domain-binding family protein that binds to a variety of transcription factors and plays a crucial role in brain development and angiogenesis [25,26]. Wang et al. [27] conducted a GWAS in 400 chickens from a conservation population of a local Chinese breed (Jinghai Yellow chickens). They identified five SNPs with genome-wide significance for FW, including one for which the candidate gene was LDB2, located on Chr4, as in the current study. Gu et al. [8] also found that the SNP within LDB2 had the strongest association with late growth (body weight from 7 to 12 weeks old and average daily weight gain from 6 to 12 weeks old).
The current study identified four suggestive sites for HW, distributed on four different chromosomes. The candidate gene for one of these, GJA1, encodes connexin43 and connexin45, which comprise part of the gap junction consisting of an array of intercellular channels that provides a pathway for the diffusion of low-molecular-weight substances between cells [28]. Proteins are thought to play a key role in the synchronous contraction of the heart and in embryonic development [29]. Recent studies have also demonstrated that GJA1 is an important functional gene in chicken growth and development, especially of chicken breast muscle [7,30]. Other functions of this gene have yet to be explored; however, the high correlation between BMW and HW suggests that GJA1 might be an important gene in terms of increasing breast muscle yield.
We also identified genes such as that encoding Kruppel-like factor 5 (KLF5) in relation to DW, EW, HEW, and HW, and ATP2B1 for BMWP. KLFs are zinc-finger transcription factors that act as key regulators of cellular differentiation and growth in adipocytes [31]. KLF5 is involved in the biological process that regulates lipid storage, and thus affects body weight-related traits. However, further studies are needed to clarify the role of ATP2B1 in poultry.
The use of molecular markers, revealing polymorphism at the DNA level, has been playing an increasing part in animal genetics studies. If we obtained more QTLs for duck, it would be possible to integrate these SNPs information in the genomic selection scheme to improve the selection accuracy in ducks.

Conclusions
In this study, we conducted a GWAS of 18 carcass traits in Pekin ducks. We detected 37 QTLs distributed across 26 chromosomes, and identified 36 candidate genes related to body size and carcass traits. These findings further our understanding of poultry genetics and provide a genetic basis for breeding programs aimed at maximizing the economic potential of Pekin ducks.

Ducks and phenotypes
A total of 639 (males: 314; females: 325) 21-day-old Pekin ducks from the same flock were randomly selected  [32]. The following traits were measured using a caliper and weighing scale and calculated as follows: Neck length: the distance between the first cervical vertebra and the end of neck; Fossil bone length: The distance between the anterior and the posterior border of the breast-bone crest; Breast width: the distance vertically between the backbone and the beginning of the breast-bone crest; Dressed weight: Weight after bloodletting and removal of feathers, foot cuticles, toes, and clam shells; Dressed percentage (%) = Dressed weight/slaughter weight × 100; Half-eviscerated weight: Weight of carcass after removal of trachea, esophagus, crop, intestines, spleen, pancreas, gallbladder, reproductive organs, stomach contents, and keratinocytes; Percentage of half-eviscerated yield (%) = Half-eviscerated weight/slaughter weight × 100; Eviscerated weight: Half-eviscerated weight minus the weight of the heart, liver, stomach, lungs, and abdominal fat; Percentage of eviscerated yield (%) = Eviscerated weight/slaughter weight × 100; Slaughter weight/42-day body weight: Weight after fasting for 6 h; Percentage of breast muscle (%) = Breast muscle weight/eviscerated weight × 100; Leg muscle: Total leg muscle weight after removing leg bones, skin, and subcutaneous fat; Leg muscle weight percentage of (%) = Leg muscle weight/eviscerated weight × 100.

GBS
Fresh blood was collected from the ducks before slaughtering and genomic DNA was extracted from the blood using a phenol-chloroform-based method. Genotyping was performed using GBS, as described previously [15]. A total of 544 million clean reads (63.25 GB) were generated, of which 96.12% (523 million reads) were mapped to the duck genome, with an average mapping rate of 96.25%. The data were deposited in the NCBI sequence read archive (SRP068685). GBS procedures are available on Zhu Feng's article [15]. Restriction enzyme (MSe1) for the PCR-RFLP assay was selected using information from REBASE [33]. A set of variable barcode adapters that recognize Mse1-compatible sequences were ligated to the digested DNA fragments. The restriction fragments were enriched by PCR amplification with adapter-specifc primers [34]. Te data of 2 × 125 bp pair-end reads were generated by the Illumina HiSeq2500.

Single nucleotide polymorphism (SNP) identification
Clean data were mapped to a reference genome using BWA (version 1.73) [35]. The reference genome was a chromosome-assembly version from BGI duck 1.0 reference (GCA_000355885.1) based on the RH map [36] with ALLMAP [37]. VCFtools [38] and PLINK (1.90) [39] were used for quality control of the data. SNP detection was performed using the GATK Hap-loCaller (3.7) [40]. All parameters were kept at default settings, except for -stand_call_conf 30. The data were imputed using Beagle (4.1) [41], using R 2 > 0.3 for low-quality filtering. A total of 62,067 SNPs met one or more of the following conditions: minor allele frequency (MAF) of > 1%, sample call rate ≥ 95%, and the Hardy-Weinberg equilibrium test (P > 10 − 6 ). All phenotype measurements were normalized using the rank transformation method, and the effects of batch and gender on the phenotype were examined using the variance test for subsequent covariates of the mixed model. Population substructure analysis was performed using EIGENSOFT (2.04), and the top ten PCA [42] components were used as covariates for further analysis. GCTA-LDMC [43] was used to estimate the genetic parameters. SNPs used in this study listed in Additional file 2: Table S3.

Statistical analysis
Loci-based analysis was performed using the generalized linear mixed model implemented in GEMMA [44], where the kinship matrix was calculated using the center method. The mixed model was mainly based on the additive effect of sites: where y is the vector phenotypes of per duck; μ is the overall mean; X is the covariance matrix (mainly containing the gender effect, batch effect, and first ten PCA principal components obtained from the analysis of population substructure); b is the estimator vector of fixed effects; u~(0, Gσ 2 u ) is the additive polygenic effect, with G the genomic kinship matrix, and σ 2 u is the additive effect variance; S is the design matrix containing the corresponding SNP sites; α is the substitution effect size corresponding to each site; and e~N (0, Iσ 2 e ) is the vector of random residual effects, with I the identity matrix and σ 2 e the residual variance. Gene-set analysis used the MAGMA Top model [45]. Gene analysis in MAGMA is based on a multiple linear principal components regression model, using an F-test to compute the gene P-value. The association level for each gene was the weighted sum of the associated statistics for the SNP sites in the region. Using this model can improve the statistical power for the identification of a candidate gene. To perform the gene-set analysis, for each gene g the gene p-value p g computed with the gene analysis is converted to a Z-value z g = Φ − 1 (1-p g ), whereΦ − 1 is the probit function. This yields a roughly normally distributed variable Z with elements z g that reflects the strength of the association each gene has with the phenotype, with higher values corresponding to stronger associations.
The genomic inflation factor (λ) was calculated using R package qqman [46]. Multiple test thresholds were calculated using the simpleM method [47]. A total of 28,707 valid inspections were obtained. The genome-wide significance level was 1.741735E− 06 (0.05/ 28,707), and the suggestive significance level was 3.483471E− 05 (1/28,707). LD statistic were performed using PLINK (−-r2) to calculates inter-variant allele count correlations [39]. Multiple consecutive significant sites were defined as separate QTL regions, and site-effect values were calculated using the equation 2pqβ 2 /σ 2 , with allele frequencies p and q, β is corresponding effect size of SNP identified in association study, σ 2 is phenotypic variance.

Functional annotation
The gene position information was annotated using BEDTools [48]. The functions of the genes were annotated using the KEGG and GO databases, and enrichment analysis was performed using the R package GOSeq [49]. Due to the absence of a duck QTL database, we used the orthologues of chicken genes and their QTL information for candidate genes in the Animal QTL Database [14].