Genome-wide comparative analysis reveals selection signatures for reproduction traits in prolific Suffolk sheep

The identification of genome-wide selection signatures can reveal the genetic mechanisms involved in the formation of new breeds through natural or artificial selection. In this study, we screened the genome-wide selection signatures of prolific Suffolk sheep, a new strain of multiparous mutton sheep, to identify candidate genes for reproduction traits and to unravel the germplasm characteristics and population genetic evolution of this new strain. Whole-genome resequencing at an effective sequencing depth of 20× was performed for genomic diversity and population structure analysis. Selection signatures were then investigated in prolific Suffolk sheep, Suffolk sheep, and Hu sheep using fixation index (FST) and heterozygosity (H) analysis. A total of 5,236.338 Gb of high-quality genomic data were obtained for the three breeds, and 28,767,952 SNPs were identified in prolific Suffolk sheep. Moreover, 99 selection signals spanning candidate genes were identified. Twenty-three genes were significantly associated with KEGG pathways and Gene Ontology terms related to reproduction, growth, immunity, and metabolism. Through selective signal analysis, genes such as ARHGEF4, CATIP, and CCDC115 were found to be significantly correlated with reproductive traits in prolific Suffolk sheep and were highly associated with the mTOR signaling pathway, the melanogenesis pathway, and the Hippo signaling pathway, among others. These results contribute to the understanding of artificial selection in prolific Suffolk sheep and provide candidate reproduction-related genes that may be useful for establishing new sheep breeds.


Introduction
China has the largest sheep flock and is the largest producer of sheep meat worldwide. According to the "National Breed List of Livestock and Poultry Genetic Resources" (2021 edition), there are 89 sheep breeds in China, including 44 indigenous breeds, 32 improved breeds, and 13 introduced breeds. Developing highly prolific sheep breeds has become a key goal in livestock breeding (Cottle, 2010; Miao et al., 2016; Gowane et al., 2017). Many new sheep breeds have been established in China in the past decade, such as Luxi Black Head sheep (Liu et al., 2022), Huang-huai sheep (Quan et al., 2020), and prolific Suffolk sheep (Wang et al., 2020), the last of which was established through grading hybridization between Suffolk sheep and Hu sheep over 12 years of breeding. Suffolk sheep make excellent sires for terminal crosses, while Hu sheep, which are known for their high fertility, are widely used in sheep farming in Xinjiang, China. Prolific Suffolk sheep therefore combine the high-quality meat of Suffolk sheep with the prolificacy of Hu sheep.
Selection signature analysis can help identify the genomic imprints left in livestock by domestication or artificial selection. Using this strategy, researchers can scan for regions associated with important economic traits that have been under selection during domestication, locate the selected genes or genetic markers, and identify the mutations associated with these traits, thereby supporting breed improvement and the creation of new germplasm (Horscroft et al., 2019). Additionally, population-based resequencing can reveal evolutionary relationships between populations, help identify valuable genetic resources in each breed, and improve understanding of the genetic diversity among populations, providing strong support for the selection of new breeds and the development of animal husbandry. Li et al. performed 25.7× whole-genome resequencing of wild and domestic sheep, revealing the genetic mechanisms underlying various agricultural traits in domesticated sheep (e.g., reproduction and wool production) and providing valuable genomic resources for sheep genetics research (Li et al., 2020). Similarly, Zhang et al. used genome resequencing to uncover molecular signatures of natural selection in wild and domesticated sheep. They identified IFI44, PNK2, and RNF24 as immunity-related genes, providing insights into the molecular mechanisms underlying the phenotypic variation induced by sheep domestication and improvement (Zhang et al., 2022). Sweet-Jones et al. used high-depth whole-genome resequencing to scan for selection signatures linked to the adaptability of Welsh sheep (Sweet-Jones et al., 2021) and reported strong selection signals at RNF24, PANK2, and MUC15, genes with potential roles in the environmental adaptability of local Welsh breeds. Furthermore, Wang et al. (2017) sequenced mixed pools of multiple-lamb and single-lamb Duolang sheep and identified six genes related to reproductive performance: INHBA, NCOA1, INGS, BMPR-IB, ARNT, and KLHL1 (Sui, 2017). Zhang et al. used fixation index (FST) analysis to detect genome-wide selection signals in five sheep breeds and found that RXFP2, GHR, and ASIP were related to horn shape, growth, and lipid metabolism (Zhang et al., 2013). Therefore, we used whole-genome resequencing to screen for genes related to important economic traits, reveal the genetic basis of newly developed breeds, and provide a basis for the selection and breeding of new mutton sheep breeds.
In this study, we resequenced the whole genomes of 90 sheep (Hu, Suffolk, and prolific Suffolk) to explore their genetic structure and the genetic variation among breeds and to identify candidate regions and genes related to reproductive traits. Additionally, FST and H analyses were used to identify selection signals unique to prolific Suffolk sheep, and functional enrichment analysis was performed to identify major genes closely related to reproductive traits. Our aim was to provide a theoretical basis for breeding prolific Suffolk sheep and to offer further insights into the selection of local sheep breeds in China.

Sample collection
In this study, 50 healthy, unrelated prolific Suffolk sheep of the same age (26 rams and 24 ewes; average lambing rate 196%) were selected based on pedigree information. The 24 ewes were divided into two groups: a multi-lamb group of 12 ewes with a 275% lambing rate (Figure 1A) and a single-lamb group of 12 ewes with a 117% lambing rate (Figure 1B). Twenty Hu sheep with a 230% lambing rate (Figure 1C) and 20 Suffolk sheep with a 140% lambing rate (Figure 1D) were also included (Supplementary Table S6). All sheep came from the sheep farm of the Xinjiang Academy of Agricultural and Reclamation Science. Blood samples (5 mL) were collected from the jugular vein into EDTA-Na2 tubes and stored at −20 °C until further processing.

DNA isolation and sequencing
DNA was extracted from blood using the TIANamp Blood DNA Kit (TIANGEN, China) according to the manufacturer's instructions. Genomic DNA quality was assessed by 1% agarose gel electrophoresis, and its concentration and purity were evaluated using a NanoDrop 2000 spectrophotometer. DNA libraries were prepared using the TruSeq Library Construction Kit: DNA was randomly fragmented to an average size of 350 bp by ultrasonication, and sequencing libraries were constructed following the manufacturer's instructions (Illumina, San Diego, CA, USA). The libraries were paired-end sequenced at ~20× coverage on the Illumina HiSeq 2500 platform (Illumina Inc.) by Beijing Compass Biotechnology Co., Ltd (Beijing, China). To ensure reliable data for subsequent analysis, the raw sequencing data were filtered to remove reads containing adapter (linker) sequences, reads with more than 10% N content, and low-quality reads (Q-value ≤ 5). All subsequent analyses were based on these clean data.
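The three read filters can be sketched as a per-read check. This is a minimal illustration, not the sequencing provider's pipeline: it assumes Phred+33 quality encoding, interprets "low quality" as more than 50% of bases with Q ≤ 5 (a common convention, but an assumption here), and detects adapters by an exact match to a hypothetical adapter prefix.

```python
# Hypothetical adapter prefix (Illumina TruSeq-style), used only for illustration.
ADAPTER = "AGATCGGAAGAGC"


def passes_filters(seq: str, qual: str, max_n_frac: float = 0.10,
                   low_q: int = 5, max_low_q_frac: float = 0.5,
                   adapter: str = ADAPTER) -> bool:
    """Return True if a read survives the adapter, N-content, and quality filters."""
    if adapter in seq:                                   # read contains adapter (linker) sequence
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:   # more than 10% N bases
        return False
    # Phred+33 encoding (assumed): quality score = ASCII code - 33
    low = sum(1 for c in qual if ord(c) - 33 <= low_q)
    if low / len(qual) > max_low_q_frac:                 # too many bases with Q <= 5 (assumed 50% rule)
        return False
    return True
```

For paired-end data, a pair would typically be discarded if either mate fails; that pairing logic is omitted here.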

Phylogenetic analysis and population dynamics
A neighbor-joining (NJ) phylogenetic tree was constructed from the quality-filtered SNPs using the Phylogeny Inference Package (PHYLIP) (Felsenstein, 1989). Population structure was inferred with ADMIXTURE (v.1.3.0) using the command "for K in 2 3; do admixture --cv sheep.bed $K | tee log${K}.out; done", with a maximum of 10,000 iterations. Principal component analysis (PCA) of the 90 samples was performed using the EIGENSOFT package, v.7.2.1 (Price et al., 2006).
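With --cv, ADMIXTURE reports a line of the form "CV error (K=2): 0.xxxxx", which the tee command above captures in each log file. The text does not say how the cross-validation errors were used; a common convention is to prefer the K with the lowest CV error. The sketch below assumes that convention and that line format; the function name is hypothetical.

```python
import re

# Matches lines such as "CV error (K=2): 0.51234" in ADMIXTURE --cv output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def best_k(log_text: str) -> tuple[int, dict[int, float]]:
    """Parse ADMIXTURE --cv output and return the K with the lowest CV error."""
    errors = {int(k): float(v) for k, v in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found")
    return min(errors, key=errors.get), errors
```

The log files for K = 2 and K = 3 can simply be concatenated and passed in as one string.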

Selective sweep analysis
In this study, H and FST were calculated with VCFtools v.0.1.14 using a sliding-window approach (100-kb windows with a 50-kb step) (Zhang et al., 2021). The VCFtools parameters were "--fst-window-size 100000 --fst-window-step 50000". Windows in the top 5% of FST values and in the lowest 5% of H values (i.e., the 5% with the most reduced heterozygosity) were used as thresholds to map selected loci on autosomes and to identify differences between each pair of populations. Windows passing both thresholds were considered candidate selection signals.
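The window scan and threshold intersection can be sketched as follows. This is a minimal illustration, not the VCFtools implementation: it assumes per-window FST and H values have already been computed and keyed by a window identifier, and it interprets the H cut-off as the lowest 5% of windows, consistent with the "H <" thresholds reported in the Results. All function names are hypothetical.

```python
def sliding_windows(chrom_length, size=100_000, step=50_000):
    """Yield half-open (start, end) windows of `size` bp every `step` bp.

    Windows near the chromosome end may be shorter than `size`.
    """
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def top_fraction(scores, frac, highest=True):
    """Return the keys of the `frac` share of windows with the most extreme scores.

    highest=True keeps the largest values (used for FST);
    highest=False keeps the smallest values (used for H).
    """
    k = max(1, int(len(scores) * frac))
    ranked = sorted(scores, key=scores.get, reverse=highest)
    return set(ranked[:k])


def candidate_windows(fst, het, frac=0.05):
    """Windows passing both the FST and the H threshold (their intersection)."""
    return top_fraction(fst, frac, highest=True) & top_fraction(het, frac, highest=False)
```

A window with high FST but ordinary heterozygosity is not reported, which is the point of requiring both signals.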
FST was calculated as
$$F_{ST} = \frac{H_T - H_S}{H_T}$$
where $H_T$ is the expected heterozygosity of alleles in the total population and $H_S$ is the weighted average heterozygosity of the subpopulations within the total population.
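As a worked illustration of this formula, the sketch below computes FST for a single biallelic SNP from subpopulation allele frequencies. It assumes sample sizes are the weights for $H_S$ (the text says only "weighted average"). Note that VCFtools' windowed FST uses the Weir and Cockerham (1984) estimator, so its values will differ from this plug-in formula.

```python
def fst_from_freqs(freqs, sizes):
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: alternative-allele frequency in each subpopulation.
    sizes: number of sampled individuals per subpopulation (assumed weights).
    """
    total = sum(sizes)
    # H_S: size-weighted mean of within-subpopulation heterozygosity 2p(1 - p)
    h_s = sum(n * 2 * p * (1 - p) for p, n in zip(freqs, sizes)) / total
    # H_T: heterozygosity expected from the pooled allele frequency
    p_bar = sum(n * p for p, n in zip(freqs, sizes)) / total
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

Two populations fixed for different alleles give FST = 1, identical frequencies give 0, and frequencies of 0.2 and 0.6 in equal-sized samples give (0.48 - 0.40) / 0.48 ≈ 0.17.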
The expected heterozygosity (H) was calculated as
$$H_e = 1 - \sum_{i} p_i^2$$
where $p_i$ is the frequency of the ith allele; the expected heterozygosity of a sliding window is the mean expected heterozygosity of all SNP sites in that window.
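A minimal sketch of this per-SNP and per-window calculation follows, assuming SNPs are given as (position, allele frequencies) pairs; the input layout and function names are hypothetical.

```python
def expected_het(allele_freqs):
    """H_e = 1 - sum(p_i^2) over the alleles observed at one site."""
    return 1.0 - sum(p * p for p in allele_freqs)


def window_het(snps, start, end):
    """Mean H_e of the SNPs whose position falls in [start, end).

    snps: iterable of (position, allele_freqs) tuples.
    Returns None for windows that contain no SNPs.
    """
    values = [expected_het(f) for pos, f in snps if start <= pos < end]
    return sum(values) / len(values) if values else None
```

For a window containing one SNP with frequencies (0.5, 0.5) and one with (0.9, 0.1), the window H is (0.5 + 0.18) / 2 = 0.34.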

Functional enrichment analysis
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to identify clusters of functionally related genes (Harris et al., 2004; Jiao et al., 2012). GO term analysis included the biological process (BP), molecular function (MF), and cellular component (CC) categories, and a significance threshold of p < 0.05 was used to determine GO term enrichment. KEGG pathway enrichment analysis was performed using KOBAS 2.0 (http://kobas.cbi.pku.edu.cn/), with a corrected p-value < 0.05 as the significance threshold. To limit false positives, enrichment p-values were corrected for multiple testing using the false discovery rate (FDR).
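The text does not state which over-representation test was used. The sketch below assumes the commonly used one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) and shows how an enrichment p-value for one GO term or pathway could be computed; the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes; K: background genes in the term or pathway;
    n: candidate genes tested; k: candidate genes found in the term or pathway.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more term genes among the candidates.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

For example, if all 5 candidates out of a 10-gene background fall into a 5-gene pathway, p = 1 / C(10, 5) = 1/252.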
The FDR was calculated as
$$\mathrm{FDR} = \frac{P \times n}{\mathrm{rank}(P)}$$
where P is the original p-value, n is the number of tests, and rank(P) is the rank of that p-value among all p-values sorted in ascending order. GO terms and pathways with FDR ≤ 0.05 were considered significantly enriched among the candidate genes.
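The formula above is the raw Benjamini-Hochberg quantity. The standard procedure additionally enforces monotonicity (an adjusted value never exceeds that of a larger p-value) and caps values at 1; the text does not say whether these steps were applied, so the sketch below assumes the standard form.

```python
def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values: P * n / rank(P), made monotone and capped at 1."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For p-values (0.01, 0.04, 0.03, 0.005), the adjusted values are approximately (0.02, 0.04, 0.04, 0.02).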

Sequencing and mapping
A total of 5,245.855 Gb of raw data were obtained for the 90 individuals by resequencing on the Illumina HiSeq 2500 platform, of which 5,236.338 Gb of clean reads remained after filtering. The Q20 value of the clean reads was ≥95.73%, the Q30 value was ≥89.69%, and the GC content ranged from 43.09% to 46.62%. The mapping rate to the sheep reference genome (Oar_v4.0) ranged from 98.65% to 99.32%. The average coverage depth was approximately 17.39× across all three breeds; more than 98.06% of the genome was covered at ≥1× and more than 94.29% at ≥4×, indicating that the data were accurate and reliable (Supplementary Table S1). SAMtools was used to collect summary information from the input binary alignment/map (BAM) files, compute the likelihood of each genotype, and convert the information into binary variant call format (BCF). ANNOVAR was used for functional annotation of the variants and for converting the data into variant call format (VCF) for subsequent analysis.
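The mean depth and the ≥1× / ≥4× figures are standard depth and breadth-of-coverage statistics. A minimal sketch, assuming a list of per-position read depths that includes zero-depth sites (for example, as produced by `samtools depth -a`); the function name is hypothetical.

```python
def coverage_summary(depths, thresholds=(1, 4)):
    """Return mean depth and the fraction of positions covered at >= each threshold.

    depths: per-position read depths, including 0 for uncovered positions.
    """
    n = len(depths)
    mean_depth = sum(depths) / n
    breadth = {t: sum(1 for d in depths if d >= t) / n for t in thresholds}
    return mean_depth, breadth
```

Zero-depth positions must be included; otherwise breadth is overestimated because uncovered sites never enter the denominator.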

SNP identification and annotation
In total, 84,505,591 SNPs across all three breeds and 28,767,952 SNPs in prolific Suffolk sheep were identified using SAMtools v.0.1.19 and annotated. In the prolific Suffolk sheep population, 204,304 exonic SNPs, 1,520,657 non-synonymous mutations (5.29%), and 2,101,043 synonymous mutations were identified. In Hu sheep, 29,080,742 SNPs were annotated, of which 207,059 were located in exons; 620,064 non-synonymous mutations (2.13%) and 866,996 synonymous mutations were also identified. In Suffolk sheep, 26,656,897 SNPs were annotated, of which 187,316 were located in exons; 606,196 non-synonymous mutations (2.27%) and 838,500 synonymous mutations were identified. The non-synonymous/synonymous ratio was approximately 0.72 in all three breeds (Supplementary Table S2).
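As a consistency check, the percentages and ratios above can be recomputed from the reported counts; the numbers below are taken directly from the text.

```python
counts = {
    # breed: (total SNPs, non-synonymous, synonymous), as reported above
    "prolific Suffolk": (28_767_952, 1_520_657, 2_101_043),
    "Hu": (29_080_742, 620_064, 866_996),
    "Suffolk": (26_656_897, 606_196, 838_500),
}


def summarize(total, nonsyn, syn):
    """Return (% of SNPs that are non-synonymous, non-synonymous/synonymous ratio)."""
    return round(100 * nonsyn / total, 2), round(nonsyn / syn, 2)


summary = {breed: summarize(*c) for breed, c in counts.items()}
```

All three breeds round to a ratio of 0.72, although the unrounded Hu value (about 0.715) is slightly lower than the other two (about 0.723 and 0.724).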

Phylogenetic analysis
To investigate the genetic relationships among the three breeds, a genetic distance matrix was calculated from the SNPs retained after whole-genome quality control, and a phylogenetic tree of the three populations was constructed using the NJ method based on the JTT + G model with 1,000 bootstrap replicates. The NJ tree separated the three breeds into three independent genetic groups (Figure 2A).
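The core of the NJ method can be illustrated with a textbook implementation applied to a precomputed distance matrix. This is a sketch for intuition, not PHYLIP's implementation: it assumes the SNP-based distance matrix is already available and does not perform bootstrapping. The function name and the Newick-string node labels are illustrative choices.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Neighbor-joining on a symmetric distance matrix.

    labels: taxon names; dist: list of lists with dist[i][j] = distance(i, j).
    Returns (newick, edges), where edges maps each joined node to its branch length.
    """
    # Dict-of-dicts so that joined nodes can be removed and new nodes added.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = {}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        edges[i], edges[j] = li, lj
        # Distances from the new internal node to every remaining node.
        d[u] = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in nodes if k not in (i, j)}
        for k, v in d[u].items():
            d[k][u] = v
        d[u][u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges[b] = d[a][b]  # the last edge joins the two remaining nodes
    return f"({a},{b}:{d[a][b]:g});", edges


# Example: distances generated by the tree ((A:1,B:2):1,(C:3,D:4))
labels = ["A", "B", "C", "D"]
dist = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 7], [6, 7, 7, 0]]
tree, edges = neighbor_joining(labels, dist)
print(tree)  # ((A:1,B:2),(C:3,D:4):1);
```

For additive distances such as this example, NJ recovers the generating tree and its branch lengths exactly.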

Population genetic structure
As shown in Figure 2B, at K = 2, Hu sheep formed one cluster, while prolific Suffolk sheep and Suffolk sheep formed another, with evidence of shared ancestry between them. At K = 3, prolific Suffolk sheep were clearly separated from both Hu and Suffolk sheep, although they remained closest to Suffolk sheep. Hu sheep were genetically distant from the other two populations. The close genetic relationship between prolific Suffolk sheep and Suffolk sheep is consistent with the breeding history of the former.

Principal component analysis
To examine the genetic relationships among and within the three breeds, we performed a PCA. The first eigenvector (PC1) clearly distinguished Hu sheep from prolific Suffolk sheep and Suffolk sheep, while the second eigenvector (PC2) distinguished prolific Suffolk sheep from Suffolk sheep (Figure 2C). As expected, the PCA results were consistent with those of the phylogenetic tree and population structure analyses, indicating that individuals of each breed clustered consistently.
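The PCA can be sketched with NumPy using a simplified version of the EIGENSOFT-style normalization (Price et al., 2006): each SNP is centered by its mean genotype and scaled by sqrt(p(1 - p)). This sketch assumes a complete individuals × SNPs matrix of 0/1/2 allele counts with no missing data, and it uses the plain sample allele frequency for p rather than EIGENSOFT's exact estimator.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of 0/1/2 allele counts.

    Returns (scores, explained): per-individual PC scores and the share of
    variance explained by each returned component.
    """
    g = np.asarray(genotypes, dtype=float)
    mean = g.mean(axis=0)
    p = mean / 2.0                       # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)             # drop monomorphic SNPs (zero variance)
    z = (g[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

The sign of each principal component is arbitrary, so only the relative placement of individuals along a component is meaningful.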

Selective imprints of prolific Suffolk sheep, Suffolk sheep, and Hu sheep
To identify genomic markers associated with the germplasm characteristics of prolific Suffolk sheep, autosomes were scanned for selection signals using a sliding-window approach (window size: 100 kb; step size: 50 kb). The top 5% of windows for FST (highest values) and H (lowest values) were used as thresholds to map selected loci on autosomes and to identify differences between each pair of populations. In the comparison between prolific Suffolk sheep and Hu sheep, 137 selected regions were identified and 154 candidate genes were mapped (FST > 0.249001 and H < 0.224707) (Figures 3A,C; Supplementary Tables S3 and S4). In the comparison between prolific Suffolk sheep and Suffolk sheep, 99 selected regions were identified and 59 candidate genes were mapped (FST > 0.178916 and H < 0.224707) (Figures 3B,C; Supplementary Tables S3 and S4). After duplicates were removed from the 213 candidate genes in the two comparisons, 190 candidate genes remained, 14 of which were related to reproduction traits, including WNT10A, SENP2, and WNT6 (Table 1). Furthermore, 23 genes, including ARHGEF4, CATIP, CCDC115, and CDK5R2, were unique to prolific Suffolk sheep (Figure 3D; Supplementary Table S5). These genes may represent germplasm-specific genetic information retained during the breeding and selection of prolific Suffolk sheep (Table 2).
Table 2 (excerpt): SNP positions for candidate genes including WDR33 and XRCC5.
Frontiers in Genetics frontiersin.org Yang et al. 10.3389/fgene.2024.1404031
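The reported gene counts are consistent with the inclusion-exclusion identity for the two candidate gene sets, where A (154 genes) and B (59 genes) denote the candidates from the comparisons with Hu sheep and Suffolk sheep, respectively. The implied overlap of 23 genes matches the number of genes described as unique to prolific Suffolk sheep, which suggests (although the text does not state it explicitly) that these are the genes detected in both comparisons:

$$|A \cup B| = |A| + |B| - |A \cap B| \quad\Rightarrow\quad |A \cap B| = 154 + 59 - 190 = 23$$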

Functional enrichment analysis
Gene Ontology (GO) enrichment analysis revealed key biological processes associated with the germplasm-specific genes of prolific Suffolk sheep. Candidate genes were enriched in terms such as positive regulation of defense response to virus by host and G protein-coupled purinergic nucleotide receptor activity, and were mainly related to signal transduction. KEGG pathway enrichment analysis indicated that the candidate genes WNT10A, PPP2R2A, and WDR33 were enriched in the mTOR signaling pathway, melanogenesis, and the Hippo signaling pathway (p < 0.05) (Figure 4).

Selective imprints of reproduction traits in prolific Suffolk sheep
Next, we explored the selective imprints of reproduction traits resulting from artificial selection in prolific Suffolk sheep. Selection signals were compared between the multi-lamb and single-lamb groups using 100-kb sliding windows with a 50-kb step across the genome, and FST and H analyses were combined to scan autosomes for selection signals. We identified 29 selected regions and 24 candidate genes (Figures 5A,B; Table 3). GO and KEGG enrichment analyses of these genes, which include reproduction-related genes such as MTNR1A, ITSN1, and GBE1, suggested associations with litter size. The candidate genes were mainly enriched in circadian entrainment, the MAPK signaling pathway, the AMPK signaling pathway, and the mTOR signaling pathway (Figures 5C,D).

Discussion
Sheep farming plays an important role in the global animal husbandry industry, supplying a wide range of products such as wool, meat, milk, and skin. In China, sheep farming has been an important part of the agricultural economy and rural life for centuries. In recent years, the meat sheep breeding industry has developed rapidly, with increasingly abundant germplasm resources, a gradually improving breeding system, and steadily increasing sheep production (Meadows and Kijas, 2010). However, a large gap remains between China and developed countries in this respect, mostly because of the low fecundity of most Chinese breeds (Han et al., 2015). Litter size is a complex trait influenced by many factors, including genetic background, nutritional level, and feeding management. It is the main component of sheep reproductive performance and one of the main selection objectives in meat sheep breeding programs (Baelden et al., 2005). Increasing litter size has proved to be an effective means of improving the economic benefits of the mutton industry (Dong et al., 2013). Therefore, in this study, we performed whole-genome resequencing to analyze the germplasm-specific genes, genetic variation, and genome map of prolific Suffolk sheep, a strain bred by our group. Furthermore, we identified the main genes and SNPs related to their reproductive traits. Our findings help unravel the germplasm characteristics and population genetic evolution of this new strain of Suffolk sheep and provide a basis for the creation of new breeds of prolific meat sheep in China.
Prolific Suffolk sheep is a new strain developed by our group through 12 years of breeding in Xinjiang, China, with Hu sheep as the female parents and Suffolk sheep as the male parents. The breeding process involved hybridization, fixation in a closed flock derived from a two-way cross, and herd propagation, yielding a strain with high prolificacy and high-quality meat performance. The new strain is a stabilized composite of 87.5% Suffolk and 12.5% Hu ancestry (Yang et al., 2021a). In this study, Hu sheep were highly distinct from the other two breeds, with a large genetic distance from both, which is consistent with the breeding process of prolific Suffolk sheep. Through selective signal analysis, the WNT10A, PPP2R2A, and WDR33 genes were found to be significantly related to reproductive traits in prolific Suffolk sheep and were highly associated with the mTOR signaling pathway, the melanogenesis pathway, and the Hippo signaling pathway, among others. mTOR is a highly conserved serine/threonine protein kinase that forms two distinct complexes, mTORC1 and mTORC2 (Sui et al., 2021). mTOR complexes respond to growth factors, amino acids, and oxidative stress and are involved in multiple biological processes, such as lipid metabolism, autophagy, protein synthesis, and nucleotide biosynthesis (Yao et al., 2021). Melanogenesis involves a complex series of enzymatically and chemically catalyzed reactions (Pillaiyar et al., 2017). The melanocortin 1 receptor (MC1R) and its melanocortin ligands are important positive regulators of melanin production (Dnyane and Gadgil, 2020). MC1R signaling activates the cyclic AMP (cAMP) response element-binding protein (CREB). Tyrosinase (TYR) is the rate-limiting enzyme in melanin synthesis, which occurs in specialized organelles called melanosomes; these are transferred to keratinocytes through mechanisms that have not been fully characterized (Varghese et al., 2021). The results of the present study indicate that phenotypic uniformity was achieved in prolific Suffolk sheep during the 12 years of breeding. A pure black coat on the head and limbs is one of the characteristics of the breed, suggesting that differences in melanin production may underlie this phenotype. Additionally, the Hippo and WNT signaling pathways are closely related. The Hippo signaling pathway comprises a group of conserved kinases that restrain cell proliferation and help regulate organ and tissue size (Heallen et al., 2011; Yang et al., 2021b). Phosphorylation activates the protein kinase Warts, a component of the Hippo signaling pathway, triggering a series of changes in signaling pathways associated with reproduction traits in female livestock (Vitezslav and Vladimir, 2018; Hidayah et al., 2019). WNT10A acts in the canonical Wnt/β-catenin signaling pathway; it is expressed in epithelial and mesenchymal cells throughout tooth development and plays an important role in this process (Kanchanasevee et al., 2020; Zeng et al., 2021). In this study, the selected genes were enriched in the WNT and mTOR signaling pathways, suggesting that WNT10A may have contributed to changes in tooth development, hair follicle growth, and reproductive performance during the breeding of prolific Suffolk sheep. Similarly, the PPP2R2A gene is mainly

FIGURE 5 Selective imprints of the reproductive traits in multi-lamb and single-lamb groups. (A) Selection signals in the multi-lamb and single-lamb groups. (B) Distribution of heterozygosity (H) on the autosomes of the multi-lamb group. (C) GO term enrichment analysis for genes related to reproductive traits. (D) KEGG pathway enrichment analysis for genes related to reproductive traits.

Starič et al. investigated the correlation between MTNR1A gene polymorphism and litter size in Slovenian sheep and similarly found that the g.17355452 locus had a significant effect on litter size (Starič et al., 2020). These results are similar to those of the present study. The protein encoded by the GBE1 gene is a glycogen-branching enzyme that may be involved in energy regulation during animal growth (Li et al., 2018). Uroporphyrinogen III synthase (UROIIIS), encoded by the UROS gene, may be involved in metabolic regulation during growth (Blouin et al., 2021). Our results suggest that GBE1 and UROS may regulate the growth, development, and metabolism of prolific Suffolk sheep.
Through selective signature analysis, we identified genes under selection specifically in prolific Suffolk sheep, a new strain developed by crossing Hu sheep and Suffolk sheep. These genes were mainly enriched in pathways related to reproduction, immunity, growth, energy metabolism, and sugar metabolism. Our findings provide a basis for the molecular breeding of new prolific meat sheep breeds, as well as target genes and functional sites for establishing other new sheep breeds.
In conclusion, this study provides comprehensive insight into the germplasm characteristics of prolific Suffolk sheep. By comparing prolific Suffolk sheep with Suffolk sheep and Hu sheep, we identified several germplasm-specific candidate genes and markers under selection. These genes play essential roles in reproduction, growth and development, and other economic traits. The large number of genetic variants identified in this study offers an opportunity to further explore genetic diversity and the associated phenotypic variation in prolific Suffolk sheep. Our results contribute to the understanding of the genetic makeup of prolific Suffolk sheep and provide valuable information for the future development and improvement of new breeds.

FIGURE 1
FIGURE 1 Sheep breeds. (A) Prolific Suffolk sheep in the multi-lamb group. (B) Prolific Suffolk sheep in the single-lamb group. (C) Hu sheep. (D) Suffolk sheep.

FIGURE 2
FIGURE 2 Population genetics analysis. (A) Neighbor-joining phylogenetic tree. (B) Population structure of 90 individuals as determined using ADMIXTURE with K = 2 and 3. (C) Principal component analysis (PCA) of the three sheep breeds.

FIGURE 3
FIGURE 3 (A) Distribution of FST values on autosomes in prolific Suffolk sheep and Hu sheep. (B) Distribution of FST values on autosomes in prolific Suffolk sheep and Suffolk sheep. (C) Distribution of heterozygosity (H) on autosomes in prolific Suffolk sheep. (D) Venn diagram of the genes unique to prolific Suffolk sheep.

FIGURE 4
FIGURE 4 Enrichment results for specific genes in prolific Suffolk sheep. (A) GO term enrichment. (B) KEGG pathway enrichment.

TABLE 1
Enriched items and genes related to reproduction in prolific Suffolk sheep.

TABLE 2
Genetic variation information for prolific Suffolk sheep.

TABLE 3
Variation information for genes in prolific Suffolk sheep.