Identifying pleiotropic variants and candidate genes for fertility and reproduction traits in Holstein cattle via association studies based on imputed whole-genome sequence genotypes

Genetic progress for fertility and reproduction traits in dairy cattle has been limited due to the low heritability of most indicator traits. Moreover, most of the quantitative trait loci (QTL) and candidate genes associated with these traits remain unknown. In this study, we used 5.6 million imputed DNA sequence variants (single nucleotide polymorphisms, SNPs) for genome-wide association studies (GWAS) of 18 fertility and reproduction traits in Holstein cattle. Aiming to identify pleiotropic variants and increase detection power, multiple-trait analyses were performed using a method to efficiently combine the estimated SNP effects of single-trait GWAS based on a chi-square statistic. There were 87, 72, and 84 significant SNPs identified for heifer, cow, and sire traits, respectively, which showed a wide and distinct distribution across the genome, suggesting that they have relatively distinct polygenic nature. The biological functions of immune response and fatty acid metabolism were significantly enriched for the 184 and 124 positional candidate genes identified for heifer and cow traits, respectively. No known biological function was significantly enriched for the 147 positional candidate genes found for sire traits. The most important chromosomes that had three or more significant QTL identified are BTA22 and BTA23 for heifer traits, BTA8 and BTA17 for cow traits, and BTA4, BTA7, BTA17, BTA22, BTA25, and BTA28 for sire traits. Several novel and biologically important positional candidate genes were strongly suggested for heifer (SOD2, WTAP, DLEC1, PFKFB4, TRIM27, HECW1, DNAH17, and ADAM3A), cow (ANXA1, PCSK5, SPESP1, and JMJD1C), and sire (ELMO1, CFAP70, SOX30, DGCR8, SEPTIN14, PAPOLB, JMJD1C, and NELL2) traits. These findings contribute to better understand the underlying biological mechanisms of fertility and reproduction traits measured in heifers, cows, and sires, which may contribute to improve genomic evaluation for these traits in dairy cattle.


Background
Fertility, reproduction, calving, and fertility disorders represent a group of traits that directly impact the economic efficiency and animal welfare in the dairy industry [1,2]. However, the long-term genetic selection for production traits (especially milk yield) in Holstein cattle has negatively impacted fertility and reproductive performance due to their unfavorable genetic correlations [3][4][5]. Fertility and reproduction traits were only added to worldwide official genetic evaluations over the past two decades [6,7]. In this context, the more recent implementation of genomic selection (GS) in North American Holstein cattle has greatly contributed to improving genetic gains for fertility and reproduction traits [8]. The usually low heritability estimates [9] and great complexity of biological mechanisms affecting fertility and reproductive performance impact GS accuracy, which relies, among other factors, on population-specific linkage disequilibrium (LD) between genotype markers and causal variants. Therefore, a promising approach is to genotype and use the causal or tightly linked variants within known quantitative trait loci (QTL) related to fertility and reproduction traits in GS schemes, which is expected to improve the prediction accuracy for such traits [10]. Despite the great efforts that have been made to identify QTL, functional genes, and putative causal variants related to fertility and reproduction traits in dairy cattle, it is expected that many potential candidate variants still remain to be uncovered, especially with the increase in detection power achieved by using whole-genome sequence (WGS) variants.
Genome-wide association studies (GWAS) have been popularly performed as a standard method for QTL mapping and candidate gene discovery in both humans and other species [11]. For instance, Fortes et al. [12] and Ma et al. [3], based on a systematic review, reported that a considerable number of QTL and candidate genes located in the Bos taurus autosomes (BTA) and in the X chromosome are associated with fertility and reproduction traits. However, most of these studies used the low-or medium-density single-nucleotide polymorphism (SNP) panels, which might hinder the discovery of candidate genes and causal variants. Due to the reduced cost of high-throughput sequencing approaches, several largescale genome resequencing initiatives have been undertaken, such as the 1,000 Bull Genomes Project [13] and the Canada's Ten Thousand Cows Genome Project [14]. Based on these large reference populations, WGS variants can be obtained cost-effectively through computational imputation from SNP panels to WGS [15,16]. The use of WGS variants for GWAS can enhance the discovery of variant-trait associations especially when there is only short-range LD surrounding causal variants [17].
Previous studies have successfully used real or imputed WGS variants for performing GWAS of fertility and reproduction traits in other dairy cattle populations (e.g., [18][19][20]).
There is a plethora of traditional and novel trait definitions for measuring fertility and reproductive performance in females and males, which are further complicated by alternative definitions across breeding programs [3,21,22]. Furthermore, fertility and reproduction traits recorded in heifers and lactating cows have always been treated as different traits, i.e., they have been analyzed separately due to their usually low genetic correlations [9,23]. Therefore, the detection of pleiotropic variants affecting multiple fertility and reproduction traits is paramount for unraveling their biological background. In this context, Bolormaa et al. [24] proposed an efficient multiple-trait analysis for directly combining the estimated effects of SNPs from single-trait GWAS based on a chi-square statistic, which facilitates the discovery of potential pleiotropic variants. In dairy cattle, this method has been applied to multiple fertility and reproduction traits, which were mainly based on SNP array data (e.g., [25][26][27]).
In this study, we first performed single-trait GWAS analyses for 18 fertility and reproduction traits in North American Holstein cattle, using imputed WGS genotypes. Following the method proposed by Bolormaa et al. [24], the multiple-trait chi-square statistics were subsequently calculated regarding different trait categories, including heifer traits, cows traits, sire traits, and their combinations. Our main objective was to identify candidate pleiotropic SNPs and genes associated with various fertility and reproduction traits in heifers, cows, and sires, which will contribute to a better understanding of the underlying biological mechanisms of fertility and reproduction traits in North American Holstein cattle.

Deregressed breeding values and imputed SNP variants
The analyzed traits and their phenotypic summaries (i.e., deregressed estimated breeding values, dEBV) are shown in Table 1. The number of included animals that have phenotypes ranged from 3,803 for SCSc (sire calf survival -cow) to 5,986 for CA (calving ability). The average accuracy of dEBV (± standard deviation, SD) was 0.55 ± 0.02. After performing the quality control (QC), about 5.6 million SNPs (ranging from 5,396,362 for CA to 5,880,012 for SCSc) were retained for single-trait GWAS analyses. These SNPs were distributed across all autosomes with a mean (± SD) pairwise distance of 445 ± 1,478 bp and a mean (± SD) minor allele frequency (MAF) of 18.7 ± 15.6% (Table S1).

Summaries of single-trait and multiple-trait GWAS analyses
The single-trait GWAS revealed a total of 1,484 SNPs that were significantly associated with the 18 fertility and reproduction traits with no overlapping SNPs between traits. The highest and lowest numbers of significant SNPs were observed for DO (days open) and NRRh (nonreturn rate heifer), respectively (Fig. 1A). These significant SNPs were broadly distributed across all autosomes (Fig. 1B), with the greatest number of significant SNPs located on BTA3, BTA22, and BTA23. For each trait, the top three chromosomes with the greatest numbers of significant SNPs are shown in Fig. 1C. BTA4, BTA20, and BTA22 were observed in common for four traits [i.e., CSc (calf survival cow), DCA (daughter calving ability), DO, and SCEc (sire calving ease cow) for BTA4, CSh (calf survival heifer), DF (daughter fertility), NRRh, and SCSh (sire calf survival heifer) for BTA20, and CA, CEh (calving ease heifer), FSTCh (first service to conception heifer), and SCEh (sire calving ease heifer) for BTA22]. The detailed distribution of significant SNPs identified across traits and chromosomes is shown in Table S2. The number of QTL identified in the single-trait GWAS ranged from 38 for NRRh to 73 for SCEc. There were three QTL overlapping between CA and CTFS (calving to first service), and three overlapping between CSh and SCEc (Table S3). The highest estimated genome inflation factor was equal to 1.090 (95% CI of 1.075-1.104) for DCA, and its mean ± SD across all traits was equal to 1.052 ± 0.024 (Fig. S1).
For the multiple-trait analyses, the defined trait categories of heifer, cow, and sire traits, and their combinations are shown in Table 1. There were 425 significant  SNPs revealed by multiple-trait analysis, including 87 for heifers, 72 for cows, 84 for sires, 96 for heifers and cows, and 86 for all animals. Of these, only one SNP (BTA9:21,350,462) was shared between heifers and cows, whereas no SNP for sire traits was shared with heifers, cows, or heifers and cows (Fig. S2). When both heifer and cow traits were analyzed together, there were 15 and 25 significant SNPs overlapping with the independent analyses of heifer and cow traits, respectively. In summary, the multiple-trait analyses revealed a total of 328 unique significant SNPs, which are distributed across all autosomes.

Multiple-trait analysis for heifer traits
The multiple-trait analysis of six heifer traits revealed 87 significant SNPs broadly distributed across 22 chromosomes ( Fig. 2 and Table 2). Of these, the highest numbers of significant SNPs were observed on BTA20, BTA22, BTA23, and BTA18. The three most significant SNPs were located on BTA18, BTA20, and BTA20. A total of The functional analyses revealed that the identified 184 candidate genes were significantly enriched into 21 Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Fig. S3). Among them, the immunity-associated biological functions were predominant, such as the antigen processing

Multiple-trait analysis for cow traits
There were 72 SNPs significantly associated with cow traits ( Fig. 2 and Table 3). These significant SNPs were distributed across 25 chromosomes, with the highest number of SNPs observed on BTA3, BTA8, and BTA10. The three most significant SNPs were located on BTA17, BTA13, and BTA27, respectively. There were 37 significant SNPs located within the exonic, intronic, or upstream/downstream regions of 23 genes, while 101 candidate genes were found within ± 100 Kb of 35 intergenic SNPs. Fifty-six QTL were found, and two of them  (Table S5).

Multiple-trait analysis for sire traits
Eighty-four significant SNPs were associated with sire traits, which were broadly distributed across 24 chromosomes ( Fig. 2 and Table 4). The three chromosomes that had the highest numbers of significant SNPs were BTA7, BTA17, and BTA22. The three most significant SNPs were located on BTA4, BTA5, and BTA26. There were one, 32, and one significant SNP located within the exonic, intronic, and upstream/downstream regions of 30 protein-encoding genes, respectively. Other 104 protein-encoding and 13 lncRNA genes were found within ± 100 Kb regions of 50 intergenic SNPs. Seventythree QTL regions were determined from the 84 significant SNPs, and 31 of them (42.5%) overlapped with 72 previously reported QTL associated with reproduction traits in sires such as scrotal circumference, daughter pregnancy rate (DPR), CR, IPC, and interval to first estrus after calving (IFEC); Table S6. Among all the 147 candidate genes found for sire reproduction traits, only one GO term of "N-acetyl-beta-D-galactosaminidase activity" was shown to be significantly enriched (Fig.  S3). Two genes located on BTA20, including HEXB (Hexosaminidase subunit beta) and LOC786974 (Betahexosaminidase subunit beta) were also involved in this biological function.

Multiple-trait analysis for the combined trait categories
When the 14 female fertility and reproduction traits were combined (heifers and cows), 96 significant SNPs were identified across 28 chromosomes ( Fig. 2 and Table S7). The three most significant SNPs were located on BTA23, BTA22, and BTA4, respectively. Ninety-six QTL were detected and 25 of them overlapped with 60 previously reported reproduction-associated QTL in cattle (Table S8). There were 171 candidate genes found, in which, nine GO terms and one KEGG, such as Phagosome (KEGG:04,145) and rough endoplasmic reticulum (GO:0,005,791), were significantly enriched for them (Table S9).
The multiple-trait chi-square statistics was also applied to all 18 traits of heifers, cows, and sires combined. Eighty-six significant SNPs and 178 candidate genes were identified across 26 chromosomes ( Fig. 2 and Table S10). All significant SNPs were clustered into 77 QTL, and 27 of them (35%) overlapped with previously reported reproduction-associated QTL in cattle for traits, such as CR, calving ease (CE), DPR, and IPC (Table S11). The functional enrichment analyses revealed four GO terms, including "2-acylglycerol O-acyltransferase activity", "Diacylglycerol biosynthetic process", "Acylglycerol O-acyltransferase activity", and "O-acyltransferase activity" (Table S9). For these biologically relevant genes identified for heifer, cow, and sire traits in this study, we also performed a proteinprotein interaction analysis, but no direct functional interactions among them were observed as shown in Fig. S4.

Refined studies on fertility and reproduction traits in dairy cattle
Due to the great economic importance of fertility and reproduction traits in cattle industry, GWAS have been frequently carried out especially since genotyping become affordable for large sample sizes. Besides the fact that SNP panels have been used in former GWAS of cattle fertility and reproduction traits [27][28][29][30][31][32][33][34][35][36][37][38][39], wholegenome sequence variants recently began to be used to refine QTL boundaries and identify causal genes [18][19][20]. Therefore, the imputed or real WGS variants is being increasingly used for GWAS, especially for traits with low heritability and that are highly polygenic. Because of intrinsic correlations among various indicator traits of fertility and reproduction, it is preferable to perform multiple-trait GWAS to detect pleiotropic variants and increase the detection power of important variants. The classical methods of multiple-trait GWAS [40,41] are difficult to implement in practice, especially on large-scale datasets due to computing requirements. Alternatively, an approximate method was proposed by Bolormaa et al. [24], which efficiently computes a multiple-trait chi-square statistic from estimated SNP effects from single-trait GWAS. In this study, we used imputed WGS variants and this approximate method for multiple-trait GWAS of fertility and reproduction traits in Holstein cattle, aiming to refine the associated pleiotropic variants, candidate genes, and QTLs.

Polygenic nature of fertility and reproduction traits
Most fertility and reproduction traits are lowly heritable in cattle. Berry et al. [21] reported a summary    [3,12]. In this study, a similar genetic landscape was observed, as all autosomes harbored significant SNPs for one or more traits analyzed, being consistent with the polygenic nature of these traits. Numerous fertility and reproduction indicator traits have been defined for measuring reproductive performance at different reproductive stages in female cattle [3,21]. However, some of these traits are by definition related with each other, such as CI is greatly determined by the interval from calving to conception. Therefore, pleiotropic variants are expected for these traits [10,42]. Overall, we found that the number of significant SNPs from multiple-trait analysis was less than the cumulative sum of separate single-trait analysis; and similar results were found in previous GWAS studies on fertility and reproduction [e.g., 27].

Differences between heifer and cow traits
It is well known that the genetic basis and physiological processes underlying reproduction differ between heifers and lactating cows. In general, heifer traits have higher heritability, but lower genetic correlations with each other compared to cow traits [43,44]. There are also only moderate genetic correlations between heifer and cow traits, which are lower than between cow traits in different parities [23,[44][45][46]. Likewise, both ovarian structures and circulating steroids (such as serum concentrations of progesterone and estradiol) were observed to act differently between heifers and cows. In addition, lactating cows have lower circulating steroid concentrations, but larger ovulatory follicles and luteal structures than heifers [47,48]. In previous GWAS for fertility and reproduction traits, different variants, candidate genes, and QTL were found significantly associated with heifers and cows [20,26,27,37]. For instance, Fang et al. [26] found a significant QTL on BTA17 associated with interval from first to last insemination (IFLI) and non-return rate (NRR) in heifers, but not in cows. For NRR, distinct significant QTL regions on BTA17 were also suggested for heifers and cows [20]. Liu et al. [27] analyzed four fertility traits (IFLI, FSC, IPC, and NRR) measured on both heifers and cows, and found that none of the significant QTL overlapped and most of these QTL were located on distinct chromosomes. In this study, our multiple-trait analysis found that there were only two candidate QTL (around at BTA9:21.35 Mb and BTA23:28.63 Mb) overlapping between heifers and cows. Furthermore, our functional analyses of candidate genes also suggested differences between heifer and cow traits. The candidate genes found for heifers were significantly involved in immunity-associated biological functions, whereas candidate genes found for cows were associated with acylglycerol O-acyltransferase activity and diacylglycerol metabolism.

QTL and candidate genes found for heifer traits
The mean CR of Holstein heifers was estimated to be 56.3% in the United States, which is influenced by the age at breeding, month of breeding, age of service sire, and other factors [45]. Accordingly, our multiple-trait analysis for heifer traits suggested dozens of associated QTL that are broadly distributed across three quarters of all included chromosomes. Notably, one-third of these QTL overlapped with previously identified reproductionrelated QTL in cattle. For example, the QTL located on BTA3:89.78-89.98 Mb was supported by a QTL previously reported to be associated with the commencement of LA and proportion of cows in LA between 25 and 60 days in milk [19], and with CR [49]. There were five QTL previously reported to be associated with DPR, CR, and IFEC [50][51][52] For the CR at first service and the number of times bred in Holstein heifers, Galliou et al. [54] found that three out of the five most significant pathways were involved in immune system regulation, which was consistent with our results. More importantly, the non-classical MHC I gene of BoLA-NC1, found in this study and involved in the KEGG pathway of allograft rejection, was suggested to increase maternal immunity against the fetus [55]. Another non-classical MHC I gene (JSP.1) was found to play a crucial role during early pregnancy in heifers in a previous study [56]. Similarly, Melo et al. [57] found that the MHC class II genes were significantly associated with pregnancy success in Nellore cows. As the link between autophagy and reproduction has been acknowledged [58], another autophagy-related gene of phosphatidylinositol 3-kinase catalytic subunit type 3 (PIK3C3) was identified in this study. These findings suggest the importance of immunological tolerance of dam to fetus and other immune responses in establishing successful pregnancy.

QTL and candidate genes found for cow traits
Our multiple-trait analysis for cow traits revealed that all significant QTL were broadly distributed across more than 85% of the chromosomes and one-third of them were supported by previously reported QTL in cattle. For instance, one QTL located on BTA18:63.08-63.28 Mb was previously reported significantly associated with CI and stillbirth [36,78]; one QTL located on BTA20:56.86-57.06 Mb was supported by two previously reported associations with CR, FSC, and IPC [49,54]; and one QTL located on BTA5:61.82-62.02 associated with FSC and CR [79]. In addition, many novel QTL identified in this study harbored candidate genes that have biological functions impacting reproduction according to the literature.
Our analyses of candidate genes for cow traits revealed functional implications that are in contrast to those for heifers. Instead of immune system, the most significant functions were associated with fatty acid metabolism, such as the acylglycerol O-acyltransferase activity and diacylglycerol metabolism. Mattos et al. [80] comprehensively reviewed effects of dietary fatty acids on reproduction in ruminants, which include the influences on ovarian follicle and corpus luteum function via improved energy status, and the synthesis of reproductive hormones, such as steroids and prostaglandins. In addition, the maternal lipid metabolism during pregnancy positively influence fetal growth [81,82]. Many genes involved in fatty acid metabolism were similarly found to be associated with reproductive performance in Nellore cattle [57]. The seasonal changes in Holstein fertility was related to fatty acid composition of follicles [83]. Furthermore, the relationship between dietary fatty acids and ovarian function was experimentally found in Holstein cows [84].
Interestingly, numerous candidate genes identified in this study also have the known biological functions associated with reproduction in literature. Two genes (GRIN2A and ITGAX) located on BTA25 were associated with CTFS in Iranian Holstein cattle [85] and with altered expression in polycystic ovary syndrome in women [86], respectively. There were four candidate genes located on BTA8, including the kinesin family member 13B (KIF13B) involved in oocyte meiosis [87], annexin A1 (ANXA1) with regulatory functions during early pregnancy in mice [88] and differentially expressed between less fertile and normally fertile Holstein [89], proprotein convertase subtilisin/kexin type 5 (PCSK5) that contributed to ovarian follicle development in rats [90], and zinc finger protein 483 (ZNF483), which is associated with age of puberty in Brahman heifers [91]. Two genes, the fem-1 homolog B (FEM1B) and sperm equatorial segment protein 1 (SPESP1) on BTA10, were reported to be involved in polycystic ovary syndrome in humans [92] and required for fully fertile sperm in mice [93], respectively. Four genes were associated with different reproduction traits, including cell adhesion molecule 2 (CADM2) on BTA1 with the number of piglets born dead in pigs [94], calcium voltage-gated channel auxiliary subunit beta 4 (CACNB4) on BTA2 with CTFS in Iranian Holstein Cattle [85], ST6 N-acetylgalactosaminide alpha-2,6-sialyltransferase 5 (ST6GALNAC5) on BTA3 with fertility index in Holstein cattle [95], and ATP/GTP binding protein like 1 (AGBL1) on BTA21 with out of season lambing in sheep [96]. Other suggestive candidate genes included the protein tyrosine phosphatase receptor type Z1 (PTPRZ1) on BTA4 differentially expressed between high-and low-fertility Holstein cows [97], LPS responsive beige-like anchor protein (LRBA) on BTA17 differentially methylated between high-and low-fertility bulls [98], peroxisome proliferator activated receptor gamma (PPARG ) on BTA22 involved in the release of oocytes each estrous cycle [99], and jumonji domain containing 1C (JMJD1C) on BTA28 required for long-term maintenance of male germ cells in mice [100].

QTL and candidate genes found for sire traits
In dairy cattle, the genetic evaluation of male reproduction has received much less attention in comparison with female traits [101,102]. Sire CR in Holstein was included in two previous GWAS studies [103,104], which revealed significant SNPs located on multiple chromosomes. In this study, we included four sire traits and found that the number and genomic distribution of QTL are comparable with or higher than that in females. Interestingly, more than 40% of these QTL identified in this study overlapped with previously reported reproduction-related QTL in cattle. For instance, all four QTL located on BTA17  [105,106].
Only one GO term (N-acetyl-beta-D-galactosaminidase activity) was significantly enriched among all candidate genes of sire traits. The two involved genes (HEXB and LOC786974) on BTA20 have no functional evidence in literature with respective to reproduction. However, we found many novel candidate genes that have direct functional implications in reproduction. For example, the engulfment and cell motility 1 (ELMO1) gene on BTA4 was found to play crucial roles in clearance of apoptotic germ cells and spermatogenesis in mice [107]. Two genes, JMJD1C and cilia and flagella associated protein 70 (CFAP70) on BTA28, are required for long-term maintenance of germ cells in mice [100] and are associated with multiple morphological abnormalities of sperm flagella in human [108]. Notably, both ELMO1 and JMJD1C genes were suggested by the significant SNPs located in intronic regions. One well-known candidate gene on BTA15 (follicle stimulating hormone subunit beta, FSHB), associated with semen quality and fertility in bulls [109], was also identified in this study.

Combining multiple trait categories
We combined multiple trait categories for multiple-trait analysis for exploiting potential pleiotropic effects for fertility and reproduction traits. When the 14 female traits were analyzed together, 10 out of the 33 suggestive candidate genes in heifer and cow traits were identified. In addition, 15 out of the 54 suggestive candidate genes in heifers, cows, and sires were identified when also combining the four male traits together. There were five additional identified candidate genes in the combined analysis that have biological implications in reproduction in the literature, including Cbp/p300 interacting transactivator with Glu/Asp rich carboxy-terminal domain 2 (CITED2) on BTA9, cadherin 1 (CDH1) on BTA18, fanconi anemia complementation group M (FANCM) on BTA23, DPY30 domain containing 1 (DYDC1) on BTA28, and neural EGFL like 2 (NELL2) on BTA5 [129][130][131][132][133]. Among them, NELL2 was recently found to be involved in lumicrine system essential for testis-epididymis-spermatozoa signaling and male fertility [133]. Furthermore, three chromosomes (BTA3, BTA11, and BTA24) no longer contained significant regions when all 18 traits were analyzed together. A possible explanation for these findings is that the increased amount of information might have removed spurious associations.

Implications of the study
The results of this study have three main implications. First, we enhanced compelling evidence that traits measured in heifers, cows, and sires hold relatively distinct polygenic nature, therefore having specific evaluation is needed in breeding programs. Second, the QTL and candidate genes found in this study can be specifically incorporated into genomic prediction models by giving greater weights to the more important markers. Furthermore, SNPs located in the relevant genomic regions identified can also be included in commercial genotyping platforms to increase the accuracy of genomic prediction. Third, we suggested several novel candidate genes associated with fertility and reproduction traits in cattle, which should be further investigated in sequel studies. Additionally, the use of closer to biology phenotypes that better capture the biological mechanisms underlying the fertility and reproduction traits and the use of multi-omic data (e.g., transcriptomics, metabolomics, epigenomics) can further facilitate the mapping of QTL and variants associated with fertility and reproduction. Future studies could further investigate the most promising candidate genes using gene editing or gene knock-out experiments and real WGS data (instead of imputed WGS), which would enable the investigation of rare alleles (usually removed due to low imputation accuracy) and structural variation in the genome (e.g., copy number of variants, insertions, deletions). For the significant intergenic SNPs found in this study, their possible functional roles will be explored when a comprehensive annotation of genomic regulatory elements are available.

Conclusions
The multiple-trait GWAS of 18 fertility and reproduction traits in North American Holstein cattle revealed several QTL and candidate genes associated with heifer, cow, and sire traits. These QTL were broadly distributed across the entire genome, which are consistent with the polygenic nature of fertility and reproduction traits. The biological functions of immune response and fatty acid metabolism were significantly enriched in heifer and cow traits, respectively, whereas no known functional enrichment was found for sire traits. The most important chromosomes, which had three or more significant QTL, were BTA22 and BTA23 for heifer traits, BTA8 and BTA17 for cow traits, and BTA4, BTA7, BTA17, BTA22, BTA25, and BTA28 for sire traits. Several candidate genes that have not been previously reported in cattle and other livestock were strongly suggested for heifer (SOD2, WTAP, DLEC1, PFKFB4, TRIM27, HECW1, DNAH17, and ADAM3A), cow (ANXA1, PCSK5, SPESP1, and JMJD1C), and sire (ELMO1, CFAP70, SOX30, DGCR8, SEPTIN14, PAPOLB, JMJD1C, and NELL2) traits. More than onethird of the QTL identified in this study overlapped with previously reported reproduction-related QTL in cattle, and more than 50 candidate genes were supported by functional implications in reproduction, as found in the literature. In addition, many important candidate genes were identified via significant SNPs located in their intronic regions. These observations indicate high detection power and mapping resolution of this study. Therefore, a comprehensive investigation on underlying genetic basis of fertility and reproduction in heifers, cows, and sires are provided in this study, whose findings can be used for improving genomic evaluation for fertility and reproduction traits in Holstein cattle.

Animals, phenotypes and genotypes
The phenotypic and genotypic datasets used in this study were provided by the Canadian Dairy Network (CDN), a member of Lactanet Canada (Guelph, ON, Canada). A total of 18 fertility and reproduction traits were analyzed in this study (Table 1). For all traits, dEBV were used as pseudo-phenotypes, which were computed following VanRaden et al. [134] and only dEBV with accuracy greater than 0.50 were kept for further analyses. The details of trait definitions and the effects included in the statistical model used to predict the original estimated breeding values (EBV) for each trait are reported in Oliveira Junior et al. [9].
Imputed WGS datasets containing 29,548,077 SNPs were available for 9,131 animals. The detailed genotype imputation process using the FImpute software [135], including the number of animals per SNP panel, methods, and QC, was described by Chen et al. [136]. In brief, genotype imputation was performed in two steps: (1) imputation from medium density panels ( (2) imputation from HD to WGS. The reference population for step 1 had 2,397 animals (from the same herds and Holstein population), whereas for step 2, there were 1,147 animals with WGS data (from the 1,000 Bull Genomes Project, which also included North American Holstein animals) [136]. An additional QC step was applied after imputation, which required the individual and genotype missing rates to be lower than 0.1 (this step was done as some markers were still missing after the imputation process), MAF to be higher than 0.01, and no extreme deviation from Hardy-Weinberg equilibrium (only retained SNPs with P > 1.0 × 10 -8 ). The SNPs that were poorly imputed were also removed from the analyses. This was done in a previous step where accuracy of imputation was calculated based on genotype concordance rate and allelic R 2 on a per-SNP basis, where the SNPs retained had a concordance rate and allelic R 2 greater than 0.95 and 0.85, respectively. All QC were conducted using the PLINK software [137], after which an average of 5,576,878 SNPs were retained for single-trait GWAS using 3,803 to 5,986 animals (depending on the trait, as showed in Table 1).

Single-trait association analyses
As all known fixed effects were fitted when predicting the EBVs used to compute the dEBVs, only SNP and polygenic effects were included in the mixed linear model used for the association analyses, i.e.: where y is the vector of dEBV for each analyzed trait; 1 is a vector of ones; µ is the overall mean; b is the fixed effect of the SNP tested for association, X is a vector containing the genotype score for the tested SNP; u is the random vector of polygenic effect with u ∼ N (0, Gσ 2 u ) , where G is the genomic-based relationship matrix (GRM), σ 2 u is the additive genomic variance of polygenic effects; Z is the incidence matrix for u ; and e is a random vector of residual effects with e ∼ N (0, Iσ 2 e ) , where I is an identity matrix and σ 2 e is the residual variance. The genetic relationship between individuals j and k in the GRM was computed as [138]: where p i is the frequency of the reference allele for the i th SNP; x ij and x ik are the numbers of copies of the reference allele for individuals j and k , respectively; and N is the total number of SNPs used. Instead of using all included SNPs, a total of 29 GRMs were constructed by randomly sampling 50,000 SNPs from the remaining 28 chromosomes (i.e., iterative process including all chromosomes except the one in which the analyzed SNP was located). These analyses were implemented using the GCTA software [138].
The strongly-linked SNPs were clumped out if they had high LD (r 2 > 0.9) with another SNP that have a lower P in the GWAS, using the PLINK software [137]. This clumping process avoids overestimating the number of significant SNPs and was suggested to be preferable compared to the pruning method without considering the P values from GWAS [139]. To check if there was potential stratification due to the population structure, the genomic inflation factor (λ) [140] was evaluated, along with its 95% confidence interval.

Multiple-trait chi-square statistics
Following Bolormaa et al. [24], the multiple-trait analysis was performed using five different trait categories (i.e., heifers, cows, sires, heifers and cows, and all traits), as shown in Table 1. The CA trait was simultaneously included into both heifer and cow categories as it is a sub-index incorporating other traits. For each category of n traits, the multiple-trait chi-square statistics of a SNP was obtained as follows [24]: where t i is a n × 1 vector of signed t-values for i th SNP (that is equal to the allele effect divided by its standard error) across the n analyzed traits; t ′ i is the transpose of vector t i ( 1 × n ); and V −1 is the inverse of n × n correlation matrix, in which the pairwise correlations of traits were calculated over the estimated SNP effects. The null hypothesis that the SNP has no significant effect on any of the tested traits was tested based on the χ 2 distribution with n degrees of freedom.

Adjustment for multiple-hypotheses testing
Due to the polygenic architecture of fertility and reproduction traits, a large number of QTL are expect, therefore a Bonferroni correction at 5% chromosome-wise significance level [141] was carried, by dividing 0.05 by the number of independent chromosome segments ( M e ) to account for dependency among tests. The M e was calculated as follow [142]: where N e is the effective population size, which was set to 66 according to a recent report in North American Holstein [143]; L is chromosome length expressed in centi-Morgans (cM; one cM was considered to be equivalent to 1 Mb). Thus, SNPs were considered as statistically significant if their −log 10 (P) was higher than the chromosome-wide threshold [ −log 10 (0.05/M e ) ], which ranged from 4.15 to 4.65 depending on the chromosome (average ± SD = 4.39 ± 0.12).

QTL, positional candidate genes, and functional analyses
The QTL were defined as chromosomal regions of ± 100 Kb around the significant SNPs, therefore, considering a flanking distance where, on average, high LD is expected in North American Holsteins [144]. The identified QTL were compared to previously reported QTL in the Cattle QTL Database -Release 2N e L ln(N e L) , 43 [145]. To most effectively avoid missing the potential causal genes, all positional candidate genes within QTL region, including protein-encoding and lncRNA, were retrieved using the biomaRt R package [146]. The ARS-UCD1.2 assembly (https:// ensem bl. org/ Bos_ taurus/ Info/ Index) was used as the reference genome. Functional enrichment analyses of the candidate genes were conducted using the g:GOSt function of the g:Profiler web server [147], including the target datasets of the GO terms [148] and KEGG pathways [149]. The default parameters and methods for adjusting for multiple hypotheses testing were used, targeting an adjusted 5% level of significance. For the biologically suggested candidate genes, we also performed a protein-protein interaction analysis using the STRING software [150].
Additional file 1: TableS1. Position distribution and minor allele frequencies of SNPs that passed thegenotype quality control; Table S2. Distribution of significant SNPs of single-traitGWAS; Table S3. Numbers of quantitative trait loci revealed by single-traitGWAS and their overlapping among traits; Table S4. The previously reported andreproduction-associated QTL for the significant SNPs revealed by multiple-traitanalysis of six heifer traits; Table S5. The previously reported andreproduction-associated QTL for the significant SNPs revealed by multiple-traitanalysis of nine cow traits; Table S6. The previously reported andreproduction-associated QTL for the significant SNPs revealed by multiple-traitanalysis of four sire traits; Table S7. Significant SNPs and candidate genesfrom multiple-trait analysis of 14 heifers and cows' traits; Table S8. Thepreviously reported and reproduction-associated QTL for the significant SNPsrevealed by multiple-trait analysis of 14 heifer and cow traits; Table S9. Thesignificantly enriched GO terms and KEGG from multiple-trait analysis of 18heifers, cows, sire traits; Table S10. Significant SNPs and candidate genesfrom multiple-trait analysis of 18 heifers, cows, sire traits; Table S11. Thepreviously reported and reproduction-associated QTL for the significant SNPsrevealed by multiple-trait analysis of 18 heifer, cow, and sire traits; Figure S1. Manhattan plots (left) and Quantile-quantile plots (right) of thesingle-trait GWAS results based on imputed WGS data; Figure S2. Numbers of significant SNPs foreach trait category and their overlaps found by multiple-trait analysis. Figure S3. Significantly enrichedbiological functions of candidate genes revealed by multiple-trait analysis. Figure S4. Potential proteinprotein interaction among biologically relevantgenes that were identified for heifer, cow, and sire traits in this study.