Genome-Wide Association Study of Muscle Glycogen in Jingxing Yellow Chicken

Glucose metabolism plays an important role in many normal and pathological physiological processes in the body. The breakdown and synthesis of muscle glycogen provides ATP for muscle activities. A genome-wide association study for muscle glycogen was performed in 474 Jingxing yellow chickens to identify significant single nucleotide polymorphisms (SNPs) and insertions and deletions (INDELs) involved in muscle glycogen metabolism. A total of nine SNPs (p < 1/699341) and three INDELs (p < 1/755733) reached a significant level of potential association. The following results were obtained through a series of analyses, including additive effects and gene function annotation. Two significant SNPs were found in introns 12 and 13 of copine 4 (CPNE4) on chromosome 2. The wild-type and mutant individuals had significant differences in glycogen metabolism at two loci (p < 0.01 for both). Individuals carrying two mutations had increased muscle glycogen content. According to the gene annotation of chromosome 11, there is a significant INDEL in intron 6 of naked cuticle homolog 1 (NKD1). After the INDEL mutation, the glycogen content increased significantly. There was a significant difference between wild-type and mutant individuals (p < 0.01). These mutations likely affecting two genes (CPNE4 and NKD1) may affect glycogen storage in a pleiotropic manner. Gene annotation indicates that CPNE4 and NKD1 may affect the process of glucose metabolism. Our findings contribute to understanding the genetic regulation of muscle glycogen metabolism and provide theoretical support.


Introduction
Muscle glycogen is the main source of glucose for muscle glycolysis. Glycolysis is the most primitive metabolic pathway for producing energy in living organisms. Study of muscle glycogen function and influencing factors will reveal the genetic mechanism of glucose metabolism and lead to its application in future molecular breeding research.
Glycogen is the main form of glucose storage in the body, mainly in the form of liver and muscle glycogen. Glycogen is a branched polysaccharide found in the form of granules in the cytosol of the cytoplasm, and is connected by many glucosides. The blood glucose concentration is maintained at a relatively constant level through the synthesis and breakdown of liver glycogen. Numerous studies have suggested that the main function of muscle glycogen synthesis and breakdown is to provide the muscle with energy in the form of ATP. However, under some special physiological conditions, such as intense exercise, muscle glycogen plays an important role in maintaining blood sugar stability

Resource Population
The experimental animals were Jingxing yellow chicken. Experimental animals were selected for the laboratory-specific intramuscular fat selection line, which had been bred for 16 generations. A total of 474 chickens were selected for this trial, of which 217 were chickens in the intramuscular fat selection line and 256 chickens in the control line (the natural population without selection). All individuals were 98 days of age and allowed to eat and drink ad libitum. Blood samples were collected the day before slaughter by standard venipuncture into a tube with anticoagulant and stored at −20 • C for subsequent genomic DNA extraction. The breast muscle tissue samples from each chicken were stored at −80 • C for subsequent measurements of glycogen content.

Phenotypic Measurement
A sample of 0.1 g of pectoral muscle tissue was used for measurement. Glycogen is stable in concentrated alkali solutions. The tissue was placed into 300 µL concentrated alkali solution for 20 min before color development to destroy other components and preserve glycogen. Glycogen was then dehydrated with concentrated sulfuric acid to form an aldehyde derivative, which was reacted with anthrone to form a blue compound, which was colorimetrically quantified using a standard glucose solution treated in the same way, and the glycogen content was ultimately calculated.

Quality Control and Imputation
Blood collected from veins of the entire population was extracted using the standard phenol/chloroform extraction method. The quantity and quality of the DNA extracted from whole blood samples were determined using a Nanodrop ND-1000 Spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA) to measure their concentration and integrity, as well as by agarose gel electrophoresis to visually assess DNA integrity. Samples that passed the quality test were sent to the company for 10G whole-genome sequencing. A total of 474 adult females Jingxing yellow chickens (14 weeks of age) were selected to perform the GWAS. Data were first filled with the Beagle v5.0 software [17,18] for all SNP loci, and the filled SNP locus was quality-controlled with the Plink v1.9 software [19]. The standard procedure was to remove sites with a minimum allele frequency (MAF) of less than 5% (MAF < 0.05) and sites with a deletion rate greater than 5% (geno > 0.05).

Whole-Genome Resequencing
After the examinations, paired-end libraries were generated for each eligible sample using standard procedures. The average insert size was 300-500 bp, and the average read length was PE150 bp (150 bp double-ended). All libraries were separately sequenced on a Novaseq 6000 sequencer (Illumina, San Diego, CA, USA) to an average raw read sequence coverage of ×10 for the natural populations. Accession data code is CRA002643 at https://bigd.big.ac.cn/gsa/.

Population Structure and Association Analysis
As an effective means of GWAS analysis, the genome-wide efficient mixed-model association (GEMMA) v0.94 software, which can perform genomic localization of traits based on population SNP or INDEL information using an efficient exact mixed model approach, was implemented with the valid individuals and SNPs or insertions and deletions (INDELs) for univariate analysis. The method makes exact GWAS computationally feasible for large sample sizes. Independent SNPs or INDELs were used to compute the centered relatedness matrix, and the significance p-value level between SNPs or INDELs and phenotypes was calculated from a derived Wald test. The analysis was performed using the GEMMA software [21], and the calculation model was: where u ∼ MVN m 0, λτ −1 K , ε ∼ MVN n 0, τ −1 I n n is the number of individuals, m is the number of packets of random effects, y is the n × 1 dimensional quantitative trait phenotype value, W is the covariance matrix of n × c, α is the c × 1 covariate, β is the effective SNP or INDEL, Z is the n × m load matrix, u is m × 1 random effect vector, ε is an n × 1 error vector. τ -1 is the residual variance, λ is the ratio of the variance components, K is the m × m affinity matrix, and I n is the n × n unit matrix. The genomic inflation factor was calculated with R (https://www.r-project.org/). In this analysis, the first and second principal components were used as covariates, and the affinity matrix (plink, prune, r 2 > 0.2) was established by using unlinked SNPs or INDELs. Individual SNPs or INDELs were solved one by one. Additionally, the significance of the SNP or INDEL was determined by the likelihood ratio test (LRT).

Gene Identification and Annotation
The annotated genes that were the nearest or harboring significant SNPs were identified as candidate genes in which significant loci were located [22]. Genes in a specific genomic region [23,24] were detected by using Variant Effect Predictor (VEP) based on the galGal5 assembly supported by Ensembl (oct2018.archive.ensembl.org/Gallus_gallus/Info/Index) accessed in May 2019.

Phenotypic Data and Structural Analysis
The maximum and minimum glycogen traits of the 474 individuals were 8.92 mg/g and 0.23 mg/g, respectively, and the average was 2.37 mg/g, with a coefficient of variation of 56%. In the GWAS analysis, population stratification effects may lead to false positive results [25]. As our population was composed of two groups, we calculated the genome-controlled inflation factor (λ) of glycogen to be 1.155. Since this value should ideally be equal to 1, this result indicated that there was a significant population stratification effect. Therefore, in the GWAS, we took the first and second principal components of the population as fixed effects and analyzed them as a whole.

Genome-Wide Association Studies (GWAS) for Single Nucleotide Polymorphism (SNP) and Additive Effect
In this study, for the Q-Q plot presented as Figure 1. The Manhattan map of muscle glycogen is shown in Figure 2. p-value correction was first performed using the PLINKv1.9 software [19] for block analysis. This experiment used the Bonferroni correction method to correct the p-value. The D' was set to 0.8, the maximum chain length was 500 Kb, and a total of 699,341 independent linkage disequilibrium (LD) blocks were obtained.

Effect
In this study, for the Q-Q plot presented as Figure 1. The Manhattan map of muscle glycogen is shown in Figure 2. p-value correction was first performed using the PLINKv1.9 software [19] for block analysis. This experiment used the Bonferroni correction method to correct the p-value. The D' was set to 0.8, the maximum chain length was 500 Kb, and a total of 699,341 independent linkage disequilibrium (LD) blocks were obtained.  In the GWAS analysis of muscle glycogen, nine SNPs reached a significant level of potential association ( Figure 2), annotated as a 100-Kb gene near the locus (Table 1). There is no gene in the 100 Kb region of chromosome 1 plus or minus, and two SNP sites on chromosome 2 are in the intronic region of the CPNE4 gene. The chromosome 3 rs13720604 has three genes in this region. The mutation of TTC7A mainly acts on the gastrointestinal tract. The TTC7A protein is part of a complex that localizes phosphatidylinositol 4-kinase (PI4K) to the plasma membranes. SOCS5 is a cytokine signaling inhibitor that acts on IGF-1 receptor signaling and EGF signaling. There are three SNPs on chromosome 4 that reached a significant level. GRIA3 is a glutamate receptor that acts as a ligand-gated ion channel in the central nervous system and plays an important role in excitatory synaptic transmission. MFAP3L protein may participate in the nuclear signaling of EGFR and MAPK1/ERK2 pathway. Chloride voltage-gated channel 3 (CLCN3) is a protein coding gene which is associated with cystic fibrosis. Among its related pathways are the activation of cAMP-dependent PKA and the transport of glucose and other sugars, bile salts and organic acids, metal ions, and amine compounds. There are two genes associated with the chromosome 11 locus. CYLD encodes a cytoplasmic protein with three conserved cytoskeletal-associated protein-glycine (CAP-GLY) domains that functions as a deubiquitinating enzyme. Gene Ontology (GO) annotations associated with the other gene on this locus, SNX20, include phosphatidylinositol binding.
Genes 2020, 11, x FOR PEER REVIEW 5 of 12 In the GWAS analysis of muscle glycogen, nine SNPs reached a significant level of potential association (Figure 2), annotated as a 100-Kb gene near the locus (Table 1). There is no gene in the 100 Kb region of chromosome 1 plus or minus, and two SNP sites on chromosome 2 are in the intronic region of the CPNE4 gene. The chromosome 3 rs13720604 has three genes in this region. The mutation of TTC7A mainly acts on the gastrointestinal tract. The TTC7A protein is part of a complex that localizes phosphatidylinositol 4-kinase (PI4K) to the plasma membranes. SOCS5 is a cytokine signaling inhibitor that acts on IGF-1 receptor signaling and EGF signaling. There are three SNPs on chromosome 4 that reached a significant level. GRIA3 is a glutamate receptor that acts as a ligandgated ion channel in the central nervous system and plays an important role in excitatory synaptic transmission. MFAP3L protein may participate in the nuclear signaling of EGFR and MAPK1/ERK2 pathway. Chloride voltage-gated channel 3 (CLCN3) is a protein coding gene which is associated with cystic fibrosis. Among its related pathways are the activation of cAMP-dependent PKA and the transport of glucose and other sugars, bile salts and organic acids, metal ions, and amine compounds. There are two genes associated with the chromosome 11 locus. CYLD encodes a cytoplasmic protein with three conserved cytoskeletal-associated protein-glycine (CAP-GLY) domains that functions as a deubiquitinating enzyme. Gene Ontology (GO) annotations associated with the other gene on this locus, SNX20, include phosphatidylinositol binding.    Based on the SNP sites typing results of the whole genome sequencing data, the average genotype frequency and phenotype of nine SNPs were calculated ( Table 2). The data in Table 3 reveal that the dominant genotype rs314624646 is TT; the dominant genotypes rs313265900 and rs740265511 are CC; the dominant genotype rs13720604, rs733544657, and rs734443 are GG; and the dominant genotypes rs13720604, rs741487544, and rs315075611 are AA.
The additive effect of alleles is the average of the two homozygous dispersions. According to the cumulative effect, the glycogen content of the rs314624646, rs13720604, and rs740265511 mutations decreased, and the glycogen content of the remaining six SNPs increased. In stepwise conditional analysis, important SNP genotypes were added to the univariate model to elucidate independent signals, and significant differences were found between the nine SNP wild-type and mutant individuals ( Figure S1, Table S1: SNP1-SNP9).

GWAS for Insertions and Deletions (INDEL) and Additive Effect
In this study, for the Q-Q plot presented as Figure 3. The results of the GWAS analysis indicated that there were three significant levels of metabolism related INDELs in the whole genome (p < 1.32 × 10 −6 ) (Figure 4), which were located on chromosomes 1, 3, and 11 (Table 4). Genes such as NKD1 and FOSL2 were identified. The INDEL of chromosome 11 is located in the NKD1 gene, which encodes a protein that is an inhibitor of the canonical WNT signaling pathway. Glycogen synthase kinase 3A (GSK3A) is among the activated target genes of NKD1. At 29.7 Kb upstream of chromosome 3, 27425548 is the FOSL2 gene, which, as a dimer with JUN, activates LIF transcription and CEBPB transcription in PGE2-activated osteoblasts. CEBPB plays a major role in adipogenesis and gluconeogenesis. There is no gene in the region of chromosome 1 7169549 plus or minus 100 Kb.

GWAS for Insertions and Deletions (INDEL) and Additive Effect
In this study, for the Q-Q plot presented as Figure 3. The results of the GWAS analysis indicated that there were three significant levels of metabolism related INDELs in the whole genome (p < 1.32 x 10 -6 ) (Figure 4), which were located on chromosomes 1, 3, and 11 (Table 4). Genes such as NKD1 and FOSL2 were identified. The INDEL of chromosome 11 is located in the NKD1 gene, which encodes a protein that is an inhibitor of the canonical WNT signaling pathway. Glycogen synthase kinase 3A (GSK3A) is among the activated target genes of NKD1. At 29.7 Kb upstream of chromosome 3, 27425548 is the FOSL2 gene, which, as a dimer with JUN, activates LIF transcription and CEBPB transcription in PGE2-activated osteoblasts. CEBPB plays a major role in adipogenesis and gluconeogenesis. There is no gene in the region of chromosome 1 7169549 plus or minus 100 Kb.   The same method as that used for the calculation of SNPs in Subsection 3.2. was used to calculate the average genotype frequency and phenotype of INDELs (Table 3). The results indicate that the three INDELs (sorted from small to large on the chromosome) dominant genotypes are wild-type, heterozygous, and wild-type.

GWAS for Insertions and Deletions (INDEL) and Additive Effect
In this study, for the Q-Q plot presented as Figure 3. The results of the GWAS analysis indicated that there were three significant levels of metabolism related INDELs in the whole genome (p < 1.32 x 10 -6 ) (Figure 4), which were located on chromosomes 1, 3, and 11 (Table 4). Genes such as NKD1 and FOSL2 were identified. The INDEL of chromosome 11 is located in the NKD1 gene, which encodes a protein that is an inhibitor of the canonical WNT signaling pathway. Glycogen synthase kinase 3A (GSK3A) is among the activated target genes of NKD1. At 29.7 Kb upstream of chromosome 3, 27425548 is the FOSL2 gene, which, as a dimer with JUN, activates LIF transcription and CEBPB transcription in PGE2-activated osteoblasts. CEBPB plays a major role in adipogenesis and gluconeogenesis. There is no gene in the region of chromosome 1 7169549 plus or minus 100 Kb.   The same method as that used for the calculation of SNPs in Subsection 3.2. was used to calculate the average genotype frequency and phenotype of INDELs (Table 3). The results indicate that the three INDELs (sorted from small to large on the chromosome) dominant genotypes are wild-type, heterozygous, and wild-type.  The same method as that used for the calculation of SNPs in Section 3.2 was used to calculate the average genotype frequency and phenotype of INDELs (Table 3). The results indicate that the three INDELs (sorted from small to large on the chromosome) dominant genotypes are wild-type, heterozygous, and wild-type.
The same additive effect found that the three INDELs resulted in varying degrees of glycogen content. In stepwise conditional analysis, important INDEL types were added to the univariate model to identify independent signals, and significant differences were found between the three INDELs wild-type and mutant individuals ( Figure S1, Table S1: INDEL1-INDEL3).

Discussion
Muscle glycogen and glucose in the glucose metabolism cycle are the main compounds that maintain normal metabolism and function in skeletal muscle cells. Glycogen also affects meat quality through its GP [26].
The Efficient Mixed-Model Association eXpedited (EMMAX) test and the Genome Association and Prediction Integrated Tool (GAPIT) both adopt the strategy of unchanged variance components in the fixed zero model to improve the operation speed. This is actually an approximate algorithm which is not as accurate as GEMMA. GEMMA can directly use the plink binary format data without complex data format conversion. It is more comprehensive and can perform single-label GWAS, multilabel GWAS and multitrait GWAS analysis.
INDEL marker refers to the insertion or deletion of nucleotide fragments of different sizes at the same site in the genome between different individuals of the same species or between different individuals of different species [27]. The distribution of INDEL markers in the genome is second only to SNP markers, but it has good stability, high polymorphism, and simple banding [28]. We performed GWAS for SNPs and INDELs associated with glycogen metabolism, both of which found significant sites that enriched the results, and identified more candidate SNP or INDEL sites and genes [21].
This trial aims to identify relevant SNP or INDEL sites and genes that affect chicken muscle glycogen content. Although many studies have attempted to identify the genetic determinants of chicken meat traits, most of them focused on the use of chips to study the visual traits of the slaughtered animal, and to date, there has been no related research on glucose metabolism indicators, such as muscle glycogen.
According to the association analysis of data binding phenotypes of the 474 individuals using whole-genome sequencing, two SNP sites on chromosome 2 are on the intronic region of the gene CPNE4, which belongs to the highly-conserved copine family. It encodes a calcium-dependent, phospholipid-binding protein which is significantly expressed in cancer tissues [29]. To date, there have been no reports of CPNE4 affecting glycogen content. The two SNP mutations in chromosome 2 have a glycogen-increasing effect; the SNP is located on the intronic region of the gene, and may have an enhancing effect by regulating the promoter. However, there are fewer individuals with mutations, which may be due to the small size of the population sample, and a larger sample and further research are needed in the future. CPNE4 can be considered as a candidate gene that affects muscle glycogen of chicken muscle, and provides a reference for marker-assisted selection of Jingxing yellow chicken. However, the related mechanism needs further verification.
The site at 62.1 Kb upstream of the rs733544036 on chromosome 4 is located adjacent to the glutamate ionotropic receptor AMPA type subunit 3 (GRIA3) gene. It has been reported that after astrocyte uptake of glutamate, the glutamate transporter initiates the glycolysis process [30][31][32], GRIA3 is involved in muscle glycogen synthesis and affects muscle glycogen content and changes in its content. The additive effect showed that the phenotype increased after mutation at this site; the mean values of the phenotypes of the wild-type and mutant individuals were significantly different. At 32.8 Kb upstream of the rs315075611 is the gene CLCN3, which encodes a member of the voltage-gated chloride channel (ClC) family. This ClC protein is required for lysophosphatidic acid (LPA)-activated Cl-current activity and fibroblast-myofibroblast differentiation. Indeed, the identification of receptors in lysophospholipid cells revealed that LPA is a transcellular PPAR gamma (PPARG) agonist [33], and PPARG regulates two enzymes involved in glucose metabolism by inhibiting TGFB1-induced mitochondrial activation [34]. As an inducer of brown fat, PPARG plays an important role in regulating glucose oxidation and glucose uptake in brown fat [35].
A total of three INDELs in the GWAS analysis reached a potential level of association. The INDEL with the most significant association to the glycogen metabolism is the locus in the NKD1 gene on chromosome 11. NKD1 is an inhibitor of the canonical WNT signaling pathway and is involved in β-catenin-dependent WNT signaling. GSK3A is one of the major target genes for activation of the WNT pathway. GSK3A was originally identified as one of five phosphorylating [36,37], rate-limiting glycogen-synthase kinases that help insulin regulate glycogen synthesis by phosphorylating and inhibiting glycogen synthase 1 (GYS1) activity, and thereby inhibiting glycogen synthesis [38,39].
Close to the chromosome three INDEL is the gene FOSL2, whose protein product activates CEBPB transcription in PGE2-activates osteoblasts. It is rapidly expressed during adipogenesis, induces CEBPA and PPARG after activation by phosphorylation, and activates a series of adipocyte genes that cause the adipocyte phenotype. Glucose and lipid metabolism are a research hotspot. A negative correlation between muscle GP and its fat content (r = -0.27, p < 0.05) has been reported [40]. We speculate that the INDEL affects FOSL2, thereby influencing the expression of lipid metabolism genes, and ultimately, the glycogen content.
A total of nine SNPs and three INDELs reached a significant level, and the SNP and INDEL genotypes were added to the univariate model to identify independent signals in a stepwise conditional analysis, which revealed nine SNPs and three INDELs wild type and mutations. Significant differences existed between individuals, and both site mutations and insertional deletions resulted in significant changes in glycogen content. Additive effects and other analyses showed that the loci identified by our analysis are valid and can be used for subsequent further verification, and also provide a relevant theoretical basis for related research.
In view of the described GWAS analysis, the relevant loci, and the identification of genes near each locus, we speculate that CPNE4 and NKD1 may affect the muscle glycogen metabolism process in chickens.

Conclusions
This study comprised a GWAS analysis of SNP and INDEL associated with chicken glycogen traits, revealing genes associated with muscle glycogen content. The identified glycogen content-associated genes, CPNE4 and NKD1, may be important candidate genes involved in the regulation of muscle glycogen content by influencing glucose metabolism. Our findings provide new insights into the genetic mechanism of muscle glycogen metabolism.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.