Accelerated deciphering of the genetic architecture of agricultural economic traits in pigs using a low-coverage whole-genome sequencing strategy

Abstract

Background: Uncovering the genetic architecture of economic traits in pigs is important for agricultural breeding. However, high-density haplotype reference panels are unavailable in most agricultural species, limiting accurate genotype imputation in large populations. Moreover, the infinitesimal model of quantitative traits implies that weak association signals tend to be spread across most of the genome, further complicating the genetic analysis. Hence, there is a need to develop new methods for sequencing large cohorts without large reference panels.

Results: We describe a Tn5-based highly accurate, cost- and time-efficient, low-coverage sequencing method to obtain 11.3 million whole-genome single-nucleotide polymorphisms in 2,869 Duroc boars at a mean depth of 0.73×. On the basis of these single-nucleotide polymorphisms, a genome-wide association study was performed, resulting in 14 quantitative trait loci (QTLs) for 7 of 21 important agricultural traits in pigs. These QTLs harbour genes, such as ABCD4 for total teat number and HMGA1 for back fat thickness, and provided a starting point for further investigation. The inheritance models of the different traits varied greatly. Most follow the minor-polygene model, but this can be attributed to different reasons, such as the shaping of genetic architecture by artificial selection for this population and sufficiently interconnected minor gene regulatory networks.

Conclusions: Genome-wide association study results for 21 important agricultural traits identified 14 QTLs/genes and showed their genetic architectures, providing guidance for genetic improvement harnessing genomic features. The Tn5-based low-coverage sequencing method can be applied to large-scale genome studies for any species without a good reference panel and can be used for agricultural breeding.

Background: Uncovering the genetic architecture of economic traits in pigs is important for agricultural breeding. Two difficulties limit the genetic analysis of complex traits. First, high-density markers are unavailable for large populations in most agricultural species, which lack good reference panels. Second, association signals tend to be spread across most of the genome, as described by the infinitesimal model of quantitative traits. Findings: Here, we describe a Tn5-based, highly accurate, cost- and time-efficient low-coverage sequencing (LCS) method for obtaining whole-genome markers. We sequenced 2,869 Duroc boars at an average depth of 0.73× and identified 11.3 M SNPs. Based on these SNPs, a genome-wide association study (GWAS) detected 14 quantitative trait loci (QTLs) for 7 of 21 important agricultural traits in pigs and provided starting points for further investigation, such as ABCD4 for total teat number and HMGA1 for back fat thickness. The inheritance models of the different traits varied greatly. Most follow the minor-polygene model, but this can be attributed to different reasons, such as the shaping of genetic architecture by artificial selection in this population and sufficiently interconnected minor gene regulatory networks. Conclusions: GWAS results for 21 important agricultural traits identified 14 QTLs/genes and showed their varied genetic architectures, providing guidance for genetic improvement that harnesses genomic features. The Tn5-based LCS method can be applied to large-scale genome studies in any species without a good reference panel and can be widely used in agricultural breeding.
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits in humans and agricultural species [1,2]. The mapping resolution depends on the density of genetic markers that capture linkage disequilibrium (LD) in sufficiently large populations [3,4]. Despite the declining cost of sequencing, it is still expensive for agricultural breeding studies to apply whole-genome sequencing to all individuals in a large cohort (on the order of thousands). In many scenarios, imputation-based strategies, which impute low-density panels to higher densities, offer an alternative to systematic genotyping or sequencing [5,6]. To date, array-based genotype imputation has been widely used in agricultural species [7,8]. The imputation accuracy of this strategy depends crucially on the size of the reference panel and the genetic distance between the reference and target populations. However, the unavailability of large reference panels and array designs for target populations in agricultural species limits the improvement of array-based genotype imputation [9,10]. Inaccurate imputation influences the results of follow-up population genetic analyses. Among recently developed methods, low-coverage sequencing (LCS) of a large cohort has been proposed to be more informative than sequencing fewer individuals at a higher coverage [11-13] [...]

[...] 1 Mb (Fig. 3D), providing an indication of the expected mapping resolution obtainable with this population.

We further studied the high level of LD and found that it could be a consequence of [...] Table S4), reflecting the importance of smell when scavenging for food during long periods of environmental adaptation. This result is consistent with a previous study that reported that genes associated with olfaction exhibit fast evolution in pigs.

We also observed a significant enrichment of genes involved in the neurological system process (P = 8.64e-5). These genes may be associated with behavior and increased tameness and thus were under selection during early domestication. In addition, the hair cycle process (P = 0.004) and bone mineralization (P = 0.040) were also significantly enriched, which may represent the phenotypic changes in coat and body composition during pig domestication.
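
The enrichment P values above come from a test that this passage does not name. As a minimal sketch, assuming a one-sided hypergeometric (over-representation) test, the calculation could look like the following. The function name and all gene counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of background genes (hypothetical)
    annotated:  background genes carrying the functional term
    selected:   genes in the candidate (e.g. selection-signal) set
    overlap:    candidate genes that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the call.
    p = hypergeom_enrichment_p(population=20000, annotated=400,
                               selected=300, overlap=15)
    print(f"enrichment P = {p:.3g}")
```

In practice such P values are usually corrected for the number of terms tested; that step is omitted here.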

The 21 associated phenotypes used in this study are shown in Table 1 [...]

For the carcass traits, we identified six QTLs (Table 1 and Supplementary Table S5), in which a common narrowed QTL region on SSC7 of 30.24-30.52 Mb was identified to be significantly associated with back fat thickness (BF) and loin muscle depth (LMD) (Fig. 5 and Fig. S6). Among the QTLs associated with BF and LMD, the narrowed QTL on SSC7 was found to make the greatest contribution to the heritability, so this would be the location of the major genes in the region (Table 1 [...]

[...] Table S6). We checked the sequencing depths of these sites, all of which exceeded 2,100×, proving that these sites were completely fixed in our population with the same alleles as in the reference genome. This result reflects the long-term artificial selection history of this commercial Duroc population for growth traits and also explains the lost heritability and major QTLs.

The phenotype TTN data were acquired from Tan's study [27]. In detail, the number of [...]

The flow chart summarizes the steps used to identify and impute polymorphic sites, where the green block (left) represents the highly accurate pipeline used for the Tn5-based LCS analysis (BaseVar-STITCH). We also generated SNP results using the GATK-Beagle pipeline (right) and compared them with those found with the BaseVar-STITCH method. The data generated from the high-coverage sequencing analyses (middle) were used to assess the accuracy of the above results. The BaseVar-STITCH pipeline was used in the further GWAS presented in this study.

[...] in which the LD on chromosomes 6 (SSC6) and 10 (SSC10) represent the highest and lowest levels across the whole genome, respectively.
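
The LD levels compared across chromosomes are typically summarized with the pairwise statistic r². As a minimal sketch, assuming unphased genotypes coded as alternate-allele counts (0, 1, 2) and taking r² as the squared Pearson correlation between two SNPs, the statistic could be computed as follows. The surviving text does not say how LD was estimated in the study, so the function name and the genotype vectors are hypothetical.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two genotype vectors (0/1/2).

    Returns NaN when either SNP is monomorphic, because the
    correlation is undefined without variation.
    """
    if len(geno_a) != len(geno_b) or not geno_a:
        raise ValueError("genotype vectors must be non-empty and equal in length")
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in r^2.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return (cov * cov) / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical genotypes for two SNPs in eight animals.
    snp1 = [0, 1, 2, 1, 0, 2, 1, 0]
    snp2 = [0, 1, 2, 1, 0, 2, 0, 0]
    print(f"r2 = {ld_r2(snp1, snp2):.3f}")
```

Averaging such r² values over SNP pairs binned by physical distance gives an LD decay curve, which is one common way to compare chromosomes.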

Figure 4 Summary Manhattan plot of seven phenotypes with significant SNPs
Genome-wide representation of all quantitative trait loci (QTLs) identified in this study.
Light and dark grey dots show associations from the seven traits for which at least one QTL was detected, plotted at the tagging SNP positions (n = 258,662). The most significant SNP position at each QTL is marked with a colored dot.

Figure 6 Heritability and SNP significance and normalized effect of 21 traits
The SNP effect was estimated and normalized and is displayed in the black boxplot.

[...] The red tangles represent the pathways detected in this study, which include Bcl-2, NT3, TrkB and p75NTF.