Genome-wide analysis of genetic diversity and artificial selection in Large White pigs in Russia

Breeding practices adopted at different farms are aimed at maximizing the profitability of pig farming. In this work, we have analyzed the genetic diversity of Large White pigs in Russia. We compared genomes of historic and modern Large White Russian breeds using 271 pig samples. We have identified 120 candidate regions associated with the differentiation of modern and historic pigs and analyzed genomic differences between the modern farms. The identified genes were associated with height, fitness, conformation, reproductive performance, and meat quality.


INTRODUCTION
Human consumption drives the artificial selection of farm animals. Understanding how selection creates genetic differences between populations of different farms is essential for effective livestock development.
In the last two centuries, a common strategy was to maximize pig farming profitability of highly productive commercial breeds (such as Large White, Landrace, and Duroc) with high growth rates, good feed conversion and increased lean meat yield . As a result, these breeds became popular worldwide, including in the Russian Federation (Traspov et al., 2016c;Traspov et al., 2016a;Traspov et al., 2016b;Čandek Potokar et al., 2019;Čandek Potokar & Nieto, 2019).
Considering that Yorkshire pigs in Northern America are direct descendants of the European Large White lineage (Amer et al., 2020), Large White is the most common commercial breed group. Countries that develop production usually import the breeding stock of Large White pigs since these pigs have a flexible genetic structure adapted to selection pressure (Getmantseva et al., 2020). This flexibility and genetic variation of the breed make it an exciting object for scientists striving to find the genomic regions and genes responsible for the variation.
The initial livestock of Large White pigs (approximately 100 animals) was brought to the Soviet Union from England in 1923. As a result of continuous breeding efforts, a new regional population of the Large White breed was created in the USSR during the second half of the 20th century (Traspov et al., 2016a;Getmantseva et al., 2020). The fall of the Soviet Union caused another period of hardship for Russian pig farming (Smith, 2014). The breeding programs were nearly stopped, farming practices deteriorated, pigs were massively affected by diseases, and were culled in huge numbers. After the USSR's collapse, the Soviet livestock was almost entirely replaced by imported pigs from the leading breeding centers of Denmark, France, England, Holland, Ireland, etc. Mitochondrial DNA analysis of pigs from various European breeding centers shows significant genetic differences (Getmantseva et al., 2020). In this work, we compare the Large White pigs of Soviet breeding with the modern commercial pigs. We have also analyzed the DNA structure of contemporary Large White pigs within and between the breeding farms in Russia. We have identified selection signatures attributed to the socio-economic conditions and breeding centers' practices.

Animals and sample collection
According to standard monitoring procedures and guidelines, the participating holdings specialists collected tissue samples, following the ethical protocols outlined in the Directive 2010/63/EU (2010). The pig ear samples (ear pluck) were obtained as a general breeding monitoring procedure or during the slaughter. The collection of ear samples is a standard practice in pig breeding (Kunhareang, Zhou & Hickford, 2010 = 21, all samples collected between 2018 and 2020). The Landrace (L = 23) and Duroc (D = 43) samples were collected between 2018 and 2020. Genomic DNA was extracted from ear samples using a DNA-Extran-2 reagent kit (OOO NPF Sintol, Russia) following the manufacturer's protocol. The quantity, quality, and integrity of DNA were assessed using a Qubit 2.0 fluorometer (Invitrogen/Life Technologies, USA) and a NanoDrop8000 spectrophotometer (Thermo Fisher Scientific, USA).

Genotyping
The samples were genotyped using the GeneSeek R GGP Porcine HD Genomic Profiler v1 (Illumina Inc, USA), which includes 68,516 SNPs evenly distributed with an average spacing of 25 kb. Genotype quality control and data filtering were performed using PLINK 1.9, as recommended by Marees et al. (2018). The total genotyping rate is 0.999307; 41,262 variants and 271 pigs passed the QC filters and were retained for further analysis.

Data availability
The dataset can be accessed at http://www.compubioverne.group/data/PIG/.

Population structure analysis
To study population structure, we performed a singular value decomposition (SVD) decomposition of the GRM using the SVD function in R ( Barker et al., 2001;Van Raden, 2008). R package AdmixTools was used to compute various F 2 statistics for all pairs of populations and F 3 statistics outgroup statistics estimating the relative divergence time for pairs of populations, using the Duroc pigs as an outgroup (Lazaridis et al., 2014;Patterson et al., 2012). AdmixTools was also and to plot the trees (Patterson et al., 2012;Liu et al., 2019). Using the find_graphs routine, we have generated and evaluated admixture graphs to find the best-fitting arrangements. Although F ST and F 2 statistics also calculate genetic distance or divergence time, they may be influenced by population sizes. Statistic F 3 (outgroup; A, B) estimates the genetic distance between the outgroup and branching point between populations A and B (Maier & Patterson, 2020).
To study the genetic structure, we used the VanRaden genomic relationship matrix (GRM)) (Van Raden, 2008). This matrix is constructed from the SNP matrix Z, where rows correspond to individuals and columns to markers, as G = ZZ k , where denominator k is calculated using the allele frequencies of genotyped individuals: k = 2 i p i (1 − p i ). The denominator attains maximum when all allele frequencies are equal to 1 2 . We performed the SVD decomposition of GRM using the SVD function in R. Singular value decomposition (SVD) is a valuable tool for characterizing population genetic structure to detect and extract small signals even if the data is noisy (Berrar, Dubitzky & Granzow, 2007). Besides, a graphics package in R based on the GRM matrix is used to visualize the relationships between the studied populations of pigs. Plots of the first and second SVD components and a heat map were generated to visualize the SVD results. We used the singular value decomposition (SVD) approach (Golub & Reinsch, 1971) to assess the genetic structure of the studied populations of Large White pigs in Russia.
To visualize the relationship between the studied populations of pigs using the graphics package in R, based on the GRM matrix, we built a heatmap plot that has separated the pigs by breeds. (R: A Language and Environment for Statistical Computing, http://www.r-project.org).

Detection of selection signatures
We used two statistics that can be calculated for unphased genotypic data: F ST and F LK . Fixation index F ST is a measure of population differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data, such as single-nucleotide polymorphisms (SNP) or microsatellites. F ST value of a locus is calculated as a ratio of the variance of allele frequencies between the populations and the sum of the variances within and between populations. Positive selection is indicated by high F ST values relative to their heterozygosity (Weigand & Leese, 2018). Smoothing of F ST isused to identify contiguous genomic regions under selection. The smoothed F ST method is based on the pure drift model of Nicholson et al. (2002). According to this model, individual SNPs are grouped into genomic windows, and their average smoothed F ST values are calculated. Smoothed F ST isuseful for analyzing distantly related populations and reveals subtle differences between them Porto- Neto et al. (2013).
We compared LW_OLD and LW_New groups using the F ST analysis to find genomic traces of recent selection resulting from different socio-economic conditions. We compared pigs from different farms to analyze how the selection centers' preferences and breeding practices affect the genomes. Then each farm was compared to the rest of the subgroups combined. SNP regions with smoothed F ST values above the 95th quantile indicate positive selection; the gene content of each region was analyzed.
F LK is a population differentiation statistic. The calculation incorporates a kinship matrix representing the relationship between populations. F LK -based methods are optimally effective when working with closely related populations (Bonhomme et al., 2010;Fariello et al., 2014). The F LK test is an extension of the Lewontin and Krakauer (LK) test (Lewontin & Krakauer, 1973), which takes into account both hierarchical structure between populations and population size heterogeneity by modeling the genetic discrepancy between populations resulting from population drift and division (Bertolini et al., 2018). We used the hapflk software (Gautier, 2015) (https://forge-dga.jouy.inra.fr/projects/hapflk). F LK was used to compare the LW_OLD vs. all LW_New groups. We have estimated the False Discovery Rate (FDR) for SNP identified by F LK and F ST using the qvalue package in R (Storey et al., 2021); we assumed the FDR cut-off of 0.15.

Functional analysis
Ensembl! Annotation of Sus scrofa 11.1, https://www.ensembl.org/index.html was used to analyze genes in the identified regions. Gene set enrichment analysis with Fisher's Exact test was done using the PANTHER database (http://www.pantherdb.org/). We have also studied the GWAS literature for humans and animals for all identified genes.

RESULTS
Admixture (Alexander, Novembre & Lange, 2009) analysis was conducted for K = 2, . . . ,20 (Fig. 1). Across all K values, only Duroc, LW_2, and LW_4 were represented by a single admixture component. Even at K = 2, the population structure of the Large White pigs is visibly complex, and it can be partitioned into subpopulations that generally agree with the farm of origin. Also, the admixture profiles of modern LW pigs are different from the Soviet-bred LW pigs.
The smallest cross-validation error was obtained for K = 10 (Fig. S1), and we have used this value in the subsequent calculations. We used Kullback-Leibler distance to partition each population into subpopulations. Large White 2 (LW_2) was a homogeneous population, LW_3 had two subpopulations (31,17,9); Three groups of pigs were partitioned into three subpopulations: Duroc (17,17,9), Landrace (20, 2, 1), and LW_4 (18, 2, 1); LW_Old were divided into four subpopulations (36,31,19,13). Next, we have applied GPS (Elhaik et al., 2014) to test the assignment accuracy using the leave-one-out validation procedure for subgroups with at least two members: all subpopulation labels were correctly recovered. Therefore, there is a possible lack of genetic continuity between the Soviet and Russian pig breeding and the absence of common breeding standards.
To investigate this further, we computed F 2 statistics for all pairs of populations and F 3 outgroup statistics, estimating the relative divergence time for pairs of populations, using the Duroc pigs as an outgroup (Table 1). Figure 2 shows the best graph (identified by the find_graphs routine) connecting the studied old and new populations. F 3 analysis shows that although the modern Large White pigs differ from farm to farm, they share more with each other and with the old Large White pigs than with the Landrace pigs.
We used the singular value decomposition (SVD) approach (Golub & Reinsch, 1971) to assess the genetic structure of the studied populations of Large White pigs in Russia. Figures 3 (A, B, C) shows the SVD analysis output in axes PC1/PC2, PC1/PC3, and PC2/PC3. Pre-defined breed/farm groups correspond to well-separated clusters. In Fig. 3D (Heatmap), all Large White pigs formed a different cluster, separated from the Duroc and Landrace breeds. The same trend can be seen in the PCA plot (Fig. 3A). PCA plots show (Fig. 3C) clear separation of Durok, Landrace, and all Large White breeds. Also, LW pigs from different farms form separate clusters. This result agrees with the admixture, F 2 , and F 3 analyses.  (Table S1). According to the F LK method with the most significant signals, 158 SNPs were identified (Fig. 4, Table S2), of which 27 SNPs on Chr6: 107577819-12095419 were also determined by the F ST method.
After determining the overlap of the identified SNPs with the known QTLs, we have found that despite producing different SNP lists, F ST and the F LK methods have resulted in nearly identical QTLs lists. According to either method, the identified areas overlap with quantitative trait loci (QTLs) for traits related to meat and anatomical characteristics, animal fitness, meat color, meat quality, conformation indicators of pigs, defects, susceptibility to diseases, blood biochemistry, reproductive traits (fertility and reproductive organs), and productivity traits (growth and development) (Table S3 and  S4).
Based on the pathway enrichment analysis, nine significant pathways were identified by F ST (Table 2). The genes detected by both approaches belong to six pathways: Synaptic vesicle trafficking (regulates the processes of the nervous system), T-cell activation (ensures the functioning of the immune system), Alzheimer disease-amyloid secretase pathway (responsible for brain function and behavior), Muscarinic acetylcholine receptor 1 and 3 signaling pathway (participates in the peripheral nervous system, controls parasympathetic reactions), PDGF signaling pathway (responsible for the structural and functional  (Table S5). This region contains 21 QTLs, 17 of which are associated with reproductive traits (twelve QTLs-Number of mummified pigs, four QTLs-Litter size, and one QTL-Litter weight total) (Table S6). Also, there are three QTLs for production traits (''Ratio of lifetime non-productive days to herd life'' and ''Bodyweight at birth'') and 1 QTL Health trait (''White blood cell number''). In this region, seven genes were identified (AP1B1, EWSR1, KREMEN1, NEFH, THOC5, TTC28, ZNRF3) ( Table S5). The enrichment analysis has identified the Wnt signaling pathway involved in regulating embryonic development (Table 3). In LW_2, signals were found on Chr1: 62307146-66047984 (70 SNPs) (Table S7). This area overlaps with 5 QTLs: Reproductive traits (Litter size), Exterior traits (Behavioral and Body shape), and Meat and carcass traits (Palmitic acid content and Ham weight) (Table  S8). In this region, nine genes were identified (FBXL4,FHL5,GPR63,KLHL32,MANEA,MMS22L,POU3F2,U6,UFL1) (Table S7). No known pathways were significantly enriched.

DISCUSSION
In nature, individuals with the highest fitness tend to have more offspring, increasing favorable alleles in the population and leaving traces in genomes. These signatures of selection can be used to identify genomic regions under selection pressure (Getmantseva et al., 2020). The mechanisms underlying phenotypic differentiation induced by pig breeding have been investigated using genome-wide genotype data or high throughput sequencing (Groenen et al., 2012;Yang et al., 2014;Diao et al., 2018;Gurgul et al., 2018;Xu et al., 2020b;Xu et al., 2020a;Bovo et al., 2020). Genomic loci associated with growth traits, reproductive traits, coat color, ear shape, and other phenotypes are now known, as well as the genes that influence these traits (Wilkinson et al., 2013;Zhang et al., 2018;Yu et al., 2020). We compared the Large White pigs of Russian breeding (LW_Old) and modern commercial pigs (LW_New) in this work. We have identified 120 genes (52 by the F ST method and 68 by the F LK method, and 16 genes by both methods) in genomic regions associated with the differentiation of LW_OLD and LW_NEW pigs.
Gene CNTN1 (SSC5) is a member of the neural immunoglobulin (Ig) subfamily and is involved in the formation of axonal connections in the developing nervous system (Wang et al., 2019). In vertebrates, the contactin family (CNTN) includes six related cell adhesion molecules participating in the nervous system formation and maintenance and in building neural circuits. CNTN genes are associated with an increased risk for autism (Lin et al., 2016). Also, genes associated with neural processes are often represented in the genomic regions associated with animals' domestication (Alberto et al., 2018). The B4GALT6 (SSC6) gene encodes lactosylceramide synthase, an essential enzyme for the biosynthesis of glycolipids. The GAREM1 gene (SSC6) encodes an adapter protein that functions in a signaling pathway mediated by the epidermal growth factor (EGF) receptor. B4GALT6 and GAREM1 are likely responsible for cardiac abnormalities, causing pigs to die during transportation (Zurbrigg, 2013;Zurbrigg et al., 2017).
The FHOD3 and DTNA genes are considered candidate genes associated with hypertrophic cardiomyopathy, a heart disease that affects all age groups (Liu et al., 2017;Qing et al., 2017), being the most common cause of heart failure and sudden death. FHOD3 gene plays a role in the polymerization of actin filaments in cardiomyocytes, while DTNA belongs to the dystrobrevin subfamily and the dystrophin family. Lack of dystrophin causes Duchenne muscular dystrophy and Becker muscular dystrophy (Tsoumpra et al., 2020). In pigs, dystrophin gene SNPs are associated with alterations in the accumulation of the dystrophin protein in skeletal muscle. It is related to sudden death caused by stress (Joshua et al., 2015).
The MPZL1 (SSC4) gene, also known as PZR, is a cell surface glycoprotein belonging to the immunoglobulin superfamily (Jia et al., 2014). In studies by , the MPZL1 gene was identified as a candidate gene for growth in Simmental beef cattle.
Modern pigs of the Large White breed, relative to pigs of the Soviet breeding program, are distinguished by a high growth rate, good feed conversion, and a smaller fat thickness. They are used in the breeding system as a mother breed and are highly fertile. The affected loci are responsible for productive traits (growth, feed conversion), fitness (fat content and percentage), reproductive traits (number of piglets at birth, multiple births, nest weight at birth). However, based on the results obtained, it can be noted that significant genetic differentiation between the study groups is due to changes in the loci responsible for the quality of meat. Over the past decades, the efforts of commercial pig breeders to increase production efficiency by accelerating animals' growth and development have led to a decrease in the quality indicators and technological properties of meat. These changes were mainly reflected in the content of intramuscular fat and the composition of fatty acids. These indicators determine muscle color, texture, water retention capacity, and the nutritional value of meat.
We have detected signals in areas responsible for animals' brain and nervous system functions and behavioral characteristics. We hypothesize that it may be due to the artificial selection of well-behaved, obedient animals for breeding.
The study of the modern livestock of Large White pigs, stratified by the selection centers, made it possible to identify individual characteristics in each subgroup. In LW_1, the signatures of modern artificial selection were identified in the genome regions mainly responsible for reproductive traits. QTLs for the number of mummified piglets were overrepresented in this area. We speculate that intensive breeding practices aimed to increase the saws' fertility can serve as one reason for mummified piglets since the limited volume of the uterus can lead to embryonic death during days 30 -115 of fetal development. LW_2 shows a signal in the area associated with a complex of traits responsible for Litter Size and quality, Behavioral, Fatty acid content, Anatomy, and Conformation. LW_3 breeding efforts were focused on meat properties, such as Conductivity 45 min post-mortem, Back Fat, Ham and Loin weight, Dressing, Lean meat, and muscle protein percentage. Signatures of artificial selection in LW_4 were evident in Corpus luteum number, Teat number, Intramuscular fat content, and Conductivity 45 min post-mortem.
Since the same phenotype can result from multiple genotypes, similar breeding strategies can result in the same phenotype but different genotypes. Our results suggested that the main emphasis in selecting modern Large White pigs is aimed at productive characteristics, quality, and technological parameters of meat. This hypothesis can be further tested when a larger sample becomes available.

CONCLUSIONS
To identify the putative areas under selection associated with prevailing trends in various socio-economic conditions and the specific practices and preferences of selection centers, we compared large white pigs of USSR selection and modern Russian commercial livestock. As a result, we found possible selection signals related to traits of height, fitness, conformation, reproductive performance, and meat quality and suggested genes that may act as candidate genes for these traits. These regions can be carefully tested using a larger set of pig samples. We have also identified possible genetic discontinuity between the Soviet-bred and modern Russian pigs.