A genome wide association study for the number of animals born dead in domestic pigs

The number of animals born dead, which includes the number of mummified (NM) and stillborn (NS) animals, is the most important trait to directly quantify the reproductive loss in domestic pigs. In this study, 282 Landrace sows and 250 Large White sows were genotyped by sequencing (GBS). A total of 816 and 1068 litter records for NM and NS were collected from them. A genome-wide association study (GWAS) was conducted to reveal the genetic difference between NM and NS. A total of 248 and 10 genome-wide significant SNPs were detected for NM and NS across numerous parities in Landrace pigs. The corresponding numbers for Large White pigs were 175 and 6, respectively. All of the detected SNPs were parity specific for both NM and NS in two breeds. Based on significant SNPs, in total 242 (146 for Landrace pig, 96 for Large White pig) and 10 significant chromosome regions (8 for Landrace pigs, 2 for Large White pigs) were found for NM and NS, respectively. Among them, 237 (142 for Landrace pig, 95 for Large White pig) and 8 significant chromosome regions (6 for Landrace pigs, 2 for Large White pigs) for NM and NS were not reported in previous studies. A list of candidate genes at the identified loci was proposed, including HMGB1, SOX5, KCNJ8, ABCC9 and YY1 for NM, ASTN1 for NS. This is the first time when GBS data was used to identify genetic regions affecting NM and NS in Landrace and Large White pigs. Many identified informative SNPs and candidate genes advance our understanding of the genetic architecture of NM and NS in pigs. However, further studies are needed to validate using larger populations with more breeds.


Background
Sow reproductive performance is one of the most important production traits in the pig industry. The total number born (TNB) and the number born alive (NBA) are two important traits to measure sow reproductive performance. The difference between them is defined as the number of piglets born dead, which directly measures loss in litters from the foetal stage to farrowing. The number born dead includes the number mummified (died before farrowing) (NM) and stillborn (dies during or shortly before farrowing) (NS). NM and NS are the most important traits for quantifying the loss of reproduction. In a normal clean environment, mummified piglets usually coincide with large litters, an insufficient nutrient supply, delayed placental development [1] and genetic defects [2]. Stillbirths are usually related to large litters, older sow age, slow farrowing or farrowing difficulties [1].
The basis of litter loss is multifactorial due to the quantitative characteristics of the traits, variation is determined by many genes with mostly small effects and an unknown number of non-genetic effects. Using molecular markers, many linkage studies have been conducted to find QTL and causal genes for NM and NS [3]. In the past few years, with the availability of high throughput genotyping, genome-wide association studies (GWAS) have been developed to identify DNA variants that are associated with complex diseases and genetic traits in humans and animals [4]. GWAS can be used to study the genetic architecture of swine reproductive loss. Using SNP chip information, many GWAS have been implemented to find QTL and causal mutations for NM and NS [3]. Recently, the availability of genome sequence information enabled to study the genomic architecture of complex quantitative traits, and several GWAS have been conducted based on whole genome sequence data [5]. However, there are limitations on quantitative trait studies due to the very large number of samples required, and these limitations have resulted in a very large cost for genotyping in sequence-based GWAS. Reduced-representation sequencing methods that use restriction enzymes for digestion to reduce genome complexity were found to be suitable for detecting SNPs from large numbers of samples with high reproducibility and low sample costs [6]. Genotyping by sequencing (GBS) is one such highly efficient technology for SNP detection on a genome-wide scale. This process has been successfully applied to plants and animals [3].
In general, the parity of sows significantly affects the sow's reproduction performance because it reflects the sow's reproductive age and physiological status. At prime parity, the litter size is smallest partly due to physiological immaturity. Up to the parity 4, 5, and 6, the litter size rapidly increases and then tends to stabilize, but it begins to decrease around parity 8 [7]. However, very little is known about the genetic architecture controlling the number of animals that are born dead in a time series focused on sow parity. In addition, the differences in genetic architecture between NM and NS for different parities are also poorly understood. Therefore, the research objective of this study was to identify significant chromosome regions that affect NM and NS in numerous parities and generate a list of candidate genes for the identified significant chromosome regions. The findings of this study provide new insight into the genetic mechanism of the number of animals that are born dead in domestic pigs.

Populations and phenotypes
Two pig breeds populations were used in this study: one Large White and one Landrace pig population. Two populations derived from the same commercial pig farm (Sichuan Tianzow Breeding Technology Co., Ltd), and introduced from Canadian Hylife Company at 2008. A total of 282 Landrace and 250 Large White producing sows were sampled randomly, and the pedigree data include 520 Landrace and 758 Large White pigs in two generations. In total, 816 and 1068 litter records for NM and NS were collected from the database of pig farm in 2012-2015, respectively. These records were grouped according to sow parity. For the Landrace pigs, the litter numbers were 282, 234, 179 and 121 for parities 1, 2, 3 and 4, respectively. For Large White pigs, the litter numbers were 250, 249, 244, 208 and 117 for parities 1, 2, 3, 4 and 5, respectively. Details are listed in Table 1.

Genotypes and quality control
A total of 250 Large White and 282 Landrace sows were genotyped using GBS technology (Novegene limited Inc.). Genomic DNA was extracted from ear tissue using a surfactant and the protease pyrolysis method. Genomic DNA samples were checked using agarose gel electrophoresis, Nanodrop and Qubit. Then, 0.1-1 μg genomic DNA was digested with the restriction endonuclease MseI. Next, P1 and P2 barcode adapters that recognize Mse1-compatible sequences were ligated to the digested DNA fragments. The restriction fragments were enriched by PCR amplification with adapter-specific primers. The quality evaluation was performed using Qubit2.0, Agilent 2100 and Q-PCR. The data sequence from the pair-end reads were generated with the Illumina HiSeq PE150. In the raw reads, N contents with > 10% of sequence length or with low quality bases (< 5) and a number > 50% of the sequence length were removed. Then, barcode sequences were eliminated. The clean data were aligned to the pig reference genome (Sscrofa11.1, Ensembl) using Burrows-Wheeler Aligner (BWA) with the parameters mem -t 4 -k 32 -M [8]. The genome analysis toolkit GATK was used to detect the SNPs using a Bayesian model [9]. Initially, a total of 10,445,924 SNPs were found. Quality control was implemented by VCFtools [10] with a minor allele frequency (MAF) of 0.01, missing rate of 0.2, Hardy-Weinberg equilibrium of 10 − 5 , and sites with a mean depth greater than or equal to 3 (dp3). After quality control, a total of 345,570 SNP markers met the quality requirements.

Statistical analysis
For GWAS, single-marker regression analysis was conducted using GEMMA software [9]. Because the phenotypic distributions for NM and NS were not normal, a transform ffiffiffiffiffiffiffiffiffiffiffi x þ 1 p for all of phenotypes were conducted firstly, where x is the phenotype value. Then, the birth year and month are included as a fixed effect, and the DMU software [11] was used to adjust these phenotype value. A univariate mixed linear model was utilized to evaluate the association between qualified SNPs and phenotypic values, separately for NM and NS in each breed and each parity.
where y is the vector of phenotypic observation; β is the SNP effect; a is the vector of the remaining polygene effect following the multi-normal distribution (a MVN ð0; Aσ 2 a Þ), A is a numerator relationship matrix; e is the vector of residual effects following the multi-normal distribution (e MVNð0; Iσ 2 e Þ); Z, W are incidence matrices for β and a, respectively. Bonferroni correction methods were used to adjust P value in multiple testing. Using the Bonferroni method, a genome level (0.05/N) and a suggestive (1/N) thresholds were used in this study, where N is the number of SNPs used for analyses.
In Landrace, a total of 251,678, 247,225, 245,353 and 240,368 SNPs for parity 1-4 were used to analysis. In Large White pigs, a total of 277,080, 275,687, 274,867, 272,405 and 271,736 SNPs for parity 1-5 were used to analysis. The percentage of phenotypic variance explained by each top significant SNP was calculated by where σ 2 f and σ 2 r are residual variances of linear models with and without SNP genotypes as predictor variables, respectively.

LD analysis
To detect the linkage disequilibrium (LD) between significant SNPs, the 40Kb region centring on each top SNP (with the highest -log(P) value) was used to performed LD analysis by Haploview software [12].

Candidate gene search
To identify candidate genes for genome-wide significant loci, a search for annotated genes within a 40Kb region centring on each top SNP (with the highest -log(P) value) of significant loci in the pig reference genome assembly (Build 11.1) was implemented using ANNOVAR software [13].

Single-parity GWAS for NM in Landrace pig population
In each parity, a mixed linear model was used to implement a single SNP association test. The association patterns of SNPs with NM in Landrace pigs are shown in Table 2 and Fig. 1a (left side). In this study, two adjacent significant SNPs were merged together if their distance was smaller than 40Kb. Table 2 only lists those significant chromosome regions that are genome level significant (0.05/N) and the SNP number involved is at least  Table S1. Three peaks were observed from Fig. 1a. The most significant chromosome region was located in 1.28-7.13 Mb on SSC11 in parity 3 and its top SNP (Position: 7114076 bp; P = 5.25E-14) was located in HMGB1 gene, which explained 1.36% of the phenotypic variance. In this region, a total of 99 SNPs were detected and 8 positional candidate genes was located ( Table 2). The second significant chromosome region was in 51.85-55.30 Mb on SSC2 in parity 1. In this region, a total of 7 SNPs was detected and only a candidate gene OR2T6 was determined. The number of significant SNP detected in parity 3 is the most, then is parity 1, and the least is parity 4 (Additional file 1: Table S1).

Single-parity GWAS for NM in large white pig population
The association results of SNPs with NM in Large White pigs based on single SNP tests are shown in Table 3 (only including the significant chromosome region with SNP number larger than 2) and Fig. 1b (right side). All detected genome-wide significant SNPs are listed in Additional file 1: Table S2.
Four peaks were observed from Fig. 1b. The first significant chromosome region was in 49.74-52.74 Mb on SSC5 in parity 4 and its top SNP was located in SOX5 gene, which explained 2.26% of the phenotypic variance. In this region, a total of 28 SNPs were detected and a list of candidate genes was located, including SOX5, ST8SIA1, KCNJ8, ABCC9, GYS2, SPX, GOLT1B and PLEKHA5. The second significant chromosome region was located in 62.98-63.02 Mb on SSC15 in parity 4 and its number of SNP involved was 14. However, in this region, no candidate gene was found. The third significant chromosome region was located in 159.12-159.35 Mb on SSC2 in parity 3. A total of 13 SNPs were detected in this region, but no candidate gene was located. The fourth significant chromosome region was located in 121.12-121. 16 Mb on SSC7 in parity 5. A total of 5 SNPs were detected in this region and a candidate gene YY1 was found here. The association patterns of SNPs for NM in Large White pigs across five parities were distinctly different with Landrace pigs. The number of detected significant SNPs in parity 5 is the most, the latter is parity 4, and the least is parity 1 (Additional file 1: Table S2).

Single-parity GWAS for NS
The association results of SNPs with NS in Landrace and Large White pigs are shown in Table 4 and Fig. 2. Table 4 only lists one genome level significant chromosome region with SNP number larger than 2 for Landrace and Large White pigs. Other significant SNPs and significant chromosome regions for NS are listed in Additional file 1: Tables S3 and S4. All of these detected significant chromosome regions for NS in this study were parity-specific (Additional file 1: Tables S3 and S4).
In Landrace pig population, one significant chromosome region was located in 131.82-131.86 Mb on SSC3 in parity 3. However, no candidate gene was found in this region. In Large White pig population, one significant chromosome region was mapped to 118.97-119.18 Mb on SSC9 in parity 2, and two candidate genes (ASTN1 and BRINP2) were located in this region.

LD results
By carrying out associate test with GBS data, many significant SNPs were identified. This study suggested six important SNPs including SSC11: 2898066 bp, SSC11: 7114076 bp, SSC5: 51702429 bp, SSC5: 49763855 bp, SSC9: 118987531 bp and SSC7: 121138988 bp. The markers within these significant regions (40Kb region centring on each significant SNP) were used to conduct LD analysis. The LD block plots were shown in Fig. 3 for each significant region. Only one block were detected on four significant regions using confidence interval algorithm. And no LD block was found for the 40Kb region centring on two SNP (SSC5: 51702429 bp and SSC5: 49763855 bp).

Discussion
In this study, a GWAS based on GBS data was implemented using univariate mixed linear model for animals  [14] showed that the number of significant chromosome regions for NM and NS were (36, 25), (27,17) and (41, 21) at parity 1, 2 and 3, respectively. The number of significant chromosome regions for NS was just slightly less than NM, moreover, most of positions of detected significant chromosome regions were different from this study. The difference may be caused by different pig populations, different genotyping methods (genotyping by sequencing vs. SNP chip) and different statistic methods (single marker regression vs. SNP sliding window). In Landrace pig population, for NM, a total of 4 significant chromosome regions overlapped with previous studies (Table 5), and the other 142 significant chromosome regions were new genomic regions that were not associated with NM before [15]; for NS, 2 significant chromosome regions overlapped with previous studies and the other 6 significant chromosome regions were new genomic region (Table 5 and Additional file 1: Table S3). In Large White pig population, only 1 significant chromosome region for NM overlapped with previous studies (Table 5) and the other 95 significant chromosome regions were new genomic region (Additional file 1: Table S2, PigQTLdb); for NS, all detected significant chromosome regions were not previously reported in literature (Table 5 and Additional file 1: Table S4). These results showed that the GBS genotyping technology can further identify novel loci relative to common commercial SNP chips. Although a lot of new SNPs for NM and NS were found in the present two populations, more validation studies are needed. In this study, the record numbers of each parity were small (see previous section), the small population may decrease the reliability of GWAS. Thus, these studies should be further verified using larger populations with more breeds.
GWAS on NM and NS to reveal the differences in genetic architecture between them The increase in number of mummified pigs and stillbirth animals would reduce the number of animals born alive in pig production. The increase in number of mummified pigs and stillbirth animals would reduce the number of animals born alive, affecting production. However, by conducting GWAS for NM and NS, very few common significant SNPs were found. Three SNPs (SSC1, position = 85,721,479, P NM = 2.99E-09, P NS = 4.70E-08; SSC3, position = 129,076,023, P NM = 3.96E-06, P NS = 2.27E-08; SSC4, position = 73,553,176, P NM = 1.30E-06, P NS = 8.16E-10) were found in common between NM and NS in Landrace pig population at parity 3 ( Table 6). These results implied that the two traits have very different genetic architecture and are controlled by highly polygenetic traits. In general, a few mummified piglets or stillborn occurring is not caused by disease. In this study, most NM and NS was in the range of 0-3 (see the means in Table 1). Therefore, the variation for NM and NS were caused by sow's genetic characteristics. The GWAS results further indicated the genetic differences and different selection strategies between NM and NS.
GWAS on NM and NS to reveal the differences in genetic architecture between sow parities In this study, a phenomenon was observed in the GWAS, i.e., all of significant SNPs for both NM and NS were parity-specific, and no significant SNPs or candidate genes were in common between parities (Table 6). This results were similar to Onteru et al. (2012) [14]. They speculated that there were possible temporal gene effects for each sow's parity. There are different physiological characteristics in different parities of sows due to age. This result showed that the genomic structures that control NM and NS were different in different parities, and further confirm that different parities should be considered as different traits. This also verified that it was necessary to conduct GWAS in different parities.
Plausible candidate genes at significant loci By separately conducting GWAS for NM and NS in Large white and Landrace pig populations, several known genes that have their functions related to lethality, expression of reproductive system and embryo mesenchyme were found in this study. Thus, it is important to identify these genes.  Table 2). Two important functional candidate genes (HMGB1 and SPATA13) were located in this region. The HMGB1 (High Mobility Group Box 1) gene plays a role in several cellular processes, including inflammation, cell differentiation and tumor cell migration. Knockout mutations in the HMGB1 gene resulted in lethality in mice [16,17].
At 49.74-53.88 Mb on SSC5, a significant chromosome region for NM was detected in Large White population at parity 4 (Table 3). Three positional functional candidate genes (SOX5, KCNJ8 and ABCC9) were located in this region. The SOX5 (SRY-Box 5) gene is expressed in the embryo mesenchyme and many important physiological system [18] that could affect the health of sow and the embryo. And mutation of SOX5 seem to result in neonatal death, and carriers of a SOX5 deletion also show several clinical features [19]. The KCNJ8 (Potassium Voltage-Gated Channel Subfamily J Member 8) and ABCC9 (ATP Binding Cassette Subfamily C Member 9) genes are related to Hypertrichotic Osteochondrodysplasia [20]. The KCNJ8 is expressed in the embryo mesenchyme and reproductive system [21], and the mutation of KCNJ8 directly affected the immune system, homeostasis and mortality [22]. This gene plays an important role in reproductive and immune system and development of embryo. On one hand, this gene could directly affect the development of embryo. On the other hand, due to the effect on sow's health, it could indirectly result in fetal disease or death. Furthermore, the ABCC9 gene is also associated with homeostasis and mortality [23], and is involved in the body's immune system progress [22]. Thus, these two genes were proposed as important candidate genes for NM.
At 121.12-121. 16 Mb on SSC7, a significant chromosome region for NM was detected in parity 5 (Table 3). Its top SNP was located on YY1 (YY1 Transcription Factor) gene. This gene is a protein-coding gene and associated with peri-implantation lethality [24]. Therefore, it was proposed as candidate gene for NM.

Conclusions
In this study, we performed a GWAS study to detect genomic regions associated with NM and NS in different parities in two pig populations. These findings advance our understanding of the genetic architecture of the number of animals born dead in domestic pigs. Most of SNP detected were different between NM and NS. All of significant SNPs for both NM and NS were parity-specific, and no candidate gene was in common between parities. GBS technology explores many new SNPs for NM and NS, however, these SNPs should be further validated using larger populations with more breeds.

Additional file
Additional file 1: Table S1. Summary of significant chromosome regions including genomic level (0.05/N) for NM in Landrace population, N is the number of SNPs used for analyses. Table S2. Summary of significant chromosome regions including genomic level (0.05/N) for NM in Large White population, N is the number of SNPs used for analyses. Table S3. Summary of significant chromosome regions including genomic level (0.05/N) for NS in Landrace population, N is the number of SNPs used for analyses. Table S4.