Genome-wide association study for female fertility in Nordic Red cattle

The Nordic Red Cattle (NRC) consists of animls belonging to the Danish Red, Finnish Ayrshire, and Swedish Red breeds. Compared to the Holstein breed, NRC animals are smaller, have a shorter calving interval, lower mastitis incidence and lower rates of stillborn calves, however they produce less milk, fat and protein. Female fertility is an important trait for the dairy cattle farmer. Selection decisions in female fertilty in NRC are based on the female fertility index (FTI). FTI is a composite index including a number of sub-indices describing aspects of female fertility in dairy cattle. The sub-traits of FTI are: number of inseminations per conception (AIS) in cows (C) and heifers (H), the length in days of the interval from calving to first insemination (ICF) in cows, days from first to last insemination (IFL) in cows and heifers, and 56-day non-return rate (NRR) in cows and heifers. The aim of this study was first to identify QTL for FTI by conducting a genome scan for variants associated with fertility index using imputed whole genome sequence data based on 4207 Nordic Red sires, and subsequently analyzing which of the sub-traits were affected by each FTI QTL by associating them with the sub-traits. A total 17,388 significant SNP markers (−log10(P) > 8.25) were detected for FTI distributed over 25 chromosomes. The chromosomes with the most significant markers were tested for associations with the underlying sub-traits: BTA1 (822 SNP), BTA2 (220 SNP), BTA3 (83 SNP), BTA5 (195 SNP), two regions on BTA6 (503 SNP), BTA13 (980 SNP), BTA15 (23 SNP), BTA20 (345 SNP), and BTA24 (104 SNP). The fertility traits underlying the FTI peak area were: BTA1 (IFLC, IFLH), BTA2 (AISH, IFLH, NRRH), BTA3 (AISH, NRRH), BTA5 (AISC, AISH, IFLH), BTA6 (region 1: AISH, NRRH; region 2: AISH, IFLH), BTA13 (IFLH, IFLC), BTA15 (IFLC, NRRH), and BTA24 (AISH, IFLH). For BTA20 all sub-traits had SNP markers with a –log10(P) > 10. Furthermore the genes assigned to the most significant SNP for FTI were located on BTA6 (GPR125), BTA13 (ANKRD60), BTA15 (GRAMD1B), and BTA24 (ZNF521). This study 1) shows that many markers within FTI QTL regions were significantly associated with both AISH and IFLH, and 2) identified candidate genes for FTI located on BTA6 (GPR125), BTA13 (ANKRD60), BTA15 (GRAMD1B), and BTA24 (ZNF521). It is not known how the genes/variants identified in this study regulate female fertility, however the majority of these genes were involved in protein binding, 3) a SNP in a QTL region for FTI on BTA20 was previously validated in three cattle breeds.


Background
The Nordic Red Cattle (NRC) consists of animals belonging to the Danish Red, Finnish Ayrshire, and Swedish Red breeds. Compared to the Holstein which is the dominating dairy cattle breed in the Denmark, NRC animals are smaller, have a shorter calving interval, lower mastitis incidence and lower rates of stillborn calves. However, they also produce less milk, fat and protein [1]. Female fertility is an important trait for the dairy cattle farmer. Poor fertility leads to extra inseminations, higher veterinary costs, and higher culling rate causing higher replacement costs. Impaired female fertility is the major reason for culling in the dairy industry [2]. In the Nordic countries genetic improvement of female fertility in the NRC cattle population is achieved by including a fertility index (FTI) in the breeding goal. FTI reflects how easily a cow or a heifer conceives given the index is composed of four different female fertility traits namely: number of inseminations per conception, interval from calving to first insemination, days from first to last insemination, and 56-day non-return rate.
Genetic evaluation for three NRC populations from Denmark, Sweden and Finland are carried out jointly by Nordic Cattle Genetic Evaluation (http://www.nordicebv. info). The trait definitions have been standardized across the three countries and the cattle industry has made the large NRC dataset available for cattle genetic research.
Previous genetic analyses of female fertility in NRC showed that the heritability is in the range of 0.02 to 0.04 [1]. A trait with low heritability requires large progeny groups in order to attain an acceptable level of accuracy for the breeding value. International cooperation is therefore beneficial as a large number of registrations increase the accuracy of the estimated breeding values.
Previous studies on the identification of DNA markers have been focusing on the individual red breeds [3,4], however a joint association analysis including the three NRC populations is lacking. Furthermore, a limitation of these previous studies [3,4] is that the marker panels used only represent a small fraction of the variants segregating in the population. With the availability of data from the three countries and the whole genome sequence (WGS), a joint analysis could improve the power to identify the causal mutation [5]. The power to detect QTL increases because there is an improvement of accuracy of the breeding value by pooling the data from the three countries together. Furthermore using WGS will capture the linkage disequilibrium structure of the population better due to the higher marker density and in addition WGS includes some of the causal mutations underlying FTI. The identified DNA markers influencing female fertility traits in NRC [4][5][6] can potentially be used for routine genomic evaluation where additional weight can be put on certain genomic regions/variants [7].
The joint association analysis of the NRC population with imputed WGS data allows the detection of QTL for FTI and the determination of which sub-traits are affected by these QTL for FTI. Therefore our first objective of this study was to identify QTL for FTI by conducting a genome scan for variants associated with FTI in the joint NRC population using imputed WGS. The second objective was to re-analyze the identified QTL region for FTI to ascertain which sub-trait(s) of FTI are influenced by these QTL.

Methods
The analysis was based on a two-step approach. In the first step the genome was screened for the fertility index only based on a sire model without considering relationship among sires. In the second step targeted regions were selected based on the genome scan and re-analyzed for the individual female fertility traits using an animal model considering relationship among all animals in the study.

Animals and phenotype data
No animal experiments were performed in this study, and, therefore, approval from the ethics committee was not required. A total of 4207 Nordic Red sires from Denmark (RDCDNK), Sweden (RDCSWE) and Finland (RDCFIN) with official breeding values for female fertility traits were used for the association study. Estimated breeding values (EBVs) were predicted by the Nordic Cattle Genetic Evaluation (NAV) for each animal as part of routine genetic evaluation of Nordic dairy cattle breeds. Predictions were based on first to third parity records. BLUP EBVs were predicted using multi-trait sire models (SM). For details regarding the phenotypes recorded and models used in routine breeding value prediction, see http://www.nordicebv.info. The reliability of the breeding value is in the range of 40 to 99 % (Additional file 1: Figure S1).
Female fertility index reflects the ease with which a heifer or cow was able to conceive. FTI is a compound index and its sub-traits are: number of inseminations per conception (AIS) in cows and heifers, the length in days of the interval from calving to first insemination (ICF) in cows, and days from first to last insemination (IFL) in cows and heifers. In addition 56-day non-return rate (NRR) in cows and heifers were analyzed even though it had no weight in the FTI. The reason to include NRR in our analyses was that NRR is a trait of biological importance understanding female fertility. For AIS, IFL and NRR EBVs were predicted separately for 1st parity animals (heifers, suffixed H) and 2nd and 3rd parity animals (cows, suffixed C). For more information about the heritability and correlations between the traits please see http://www.nordicebv.info/NR/rdonlyres/ 5CD2E4DC-F82A-4809-A770-3022E270E205/0/Principles Nyeste.pdf.
In step I FTI was analyzed for its association with SNPs. In step II selected QTL regions identified for FTI were targeted and analyzed for individual female fertility traits: AIS, IFL, ICF and NRR as well as for FTI with an animal model considering the relationship among all the animals in the study.

Whole genome sequence data
The 253 dairy cattle sequences originating from a combination of sequences processed at Aarhus University [8] and sequences from the 1000 Bull Genomes Project dataset Run2 [9] were available. The sequencing of Nordic bulls (Jersey, Holstein, Nordic Red Cattle) at Aarhus University, Foulum was done using Illumina sequencers at Beijing Genomics Institute, Shenzhen, China and at Aarhus University. Sequencing was shotgun paired-end sequencing. Where necessary, fastq data were converted from Illumina to Sanger quality encoding using a patched version of maq [10]. They were aligned to the UMD3.1 assembly of the cattle genome [11] using bwa [12]. They were converted to raw BAM files using samtools [10]. Quality scores were re-calibrated using the Genome Analysis Toolkit [13] following the Human 1000 Genome guidelines incorporating information from dbSNP [14]. Sequences were realigned around insertion/deletions using the Genome Analysis Toolkit. Variants were called using the Genome Analysis Toolkit's UnifiedGenotyper. Genomes for the 1000 Bull Genomes Project were sequenced in a number of laboratories and collected at the Department of Primary Industries, Victoria, Australia [9]. The breeds included in the sequencing project can be found at: http://www.1000bullgenomes.com/. Sequences were aligned to the same reference genome as used at Aarhus University. Variant calling was done using samtools' mpileup function. Variant Call Files from Aarhus University and the 1000 Bull Genomes project were combined using the Genome Analysis Toolkit's CombineVariants with precedence given to calls from the Nordic dataset for animals appearing in both datasets.

Imputation HD and sequence data
Imputation of 50 K SNP to the full sequence was done in two steps. First, in another study (N.K. Kadri, pers. comm.) the 50 K genotypes for 12,322 Nordic bulls (from Nordic Holstein, RDC and Danish Jersey) were imputed to HD genotypes using the software IMPUTE2 [14]. A reference containing HD genotypes for 2036 bulls (902 Holstein, 735 Nordic Red and 399 Danish Jersey) was available.
Second, the 12,322 bulls imputed to HD genotypes were further imputed to the full sequence variants. Variants from the 253 dairy bulls were used as reference to impute the imputed HD data to the whole genome level using the software Beagle [15]. For imputation from HD to sequence data, chromosomes were divided into chunks of about 20,000 consecutive markers with an overlap of 250 markers at each end to minimize imputation error at ends of the chunks. Only polymorphisms identified both at Aarhus University dataset and the 1000 Bull Genomes dataset were imputed. For positions containing both a SNP and an insertion-deletion polymorphism (INDEL), the INDEL was deleted. SNPs at positions with disagreements between alleles from sequence and HD data were deleted. The reference data was pre-phased with Beagle v3.3.2 [15] and all markers with an r 2 value below 0.9 were removed. This left a total of 8,938,927 markers for chromosomes 1-29.

Statistical method for association analysis
Step I: Genome scan for FTI A sire model was used for the initial genome screening for markers associated with FTI. A SNP-by-SNP analysis was carried out in which each SNP in turn was tested for association with the phenotype. The following model was used to estimate SNP effects: Where y ijk was the EBV of individual k belonging to the half-sib (sire) family j from population i, μ was the general mean, p i is fixed effect of i'th population (RDCDNK, RDCSWE and RDCFIN), b was the allelic substitution effect, x ijk was the number of copies of an allele (with an arbitrary labeling) of the SNP count in the individual k (corresponding to 0, 1, or 2 copies), and s j was the random effect of the j'th half-sib family assumed to follow a normal distribution s~N(0, σ s 2 ) where σ s 2 was the sire variance, and e ijk was a random residual of individual k assumed to follow a normal distribution e~N(0, σ e 2 ). All statistical analyses were conducted using DMU [16]. The null hypothesis H 0 : b = 0 was tested with a t-test. A SNP was considered significant if the -log 10 (P) value for the SNP exceeded a genome wide Bonferroni corrected threshold of 8.25 (= −log 10 (0.05/8,938,927)) corresponding to a nominal type I error rate of 5 % corrected for 8,938,927 simultaneous tests.
Step II: Association analyses for the targeted regions The results from the step I analysis were plotted in a Manhattan plot (Fig. 1). QTL peaks included in step II analyses were identified subjectively based on the Manhattan plot. The most significant SNP was the marker with the largest -log 10 (P) value in that peak. QTL regions were defined as a section of chromosome 0.5 Mb on both sides of the most significant SNPs for FTI in the genome. The selected regions were screened for association with the sub-indices. An animal model considering all relationships among individuals was used.
The association between the SNP and the phenotype was assessed by a single-locus regression analysis for each SNP separately, using a linear mixed model [17]. The model was as follows: where y ij was the EBV for j'th bull from i'th population, μ was the overall mean, p i was a fixed effect of i'th population (RDCDNK, RDCSWE and RDCFIN), b was the allelic substitution effect, x ij was the number of copies of an allele (with an arbitrary labeling) of the SNP count in the individual j (corresponding to 0, 1, or 2 copies), u ij was the random polygenic effects. The vector u, whose elements were the u ij was assumed to follow a multivariate normal distribution u~N (0, A σ u 2 ) where A was the additive relationship matrix between sires and σ u 2 was the polygenic variance, and e j was the random residual of individual j with a normal distribution N(0,σ e 2 ). The model was fitted by REML using the software DMU [16]. The estimates and standard errors of the fixed effects estimates were obtained from DMU. Testing for the presence of an effect of a marker was done using a ttest against a null hypothesis of H 0 : b = 0. A SNP was considered significant if the -log(p) value for the SNP exceeded a genome wide Bonferroni corrected threshold of 8.25 (= −log 10 (0.05/8,938,927)) corresponding to a nominal type I error rate of 5 % corrected for 8,938,927 simultaneous tests.

Annotation of the top SNP
All positions provided in this text refer to the UMD3.1 assembly. The annotations of the most significant SNP markers in the selected regions were extracted from the ENSEMBL data base (http://www.ensembl.org, [18]): ENSEMBL Bos taurus 75.31, UMD3.1.

Results
Step I The results of the step I analysis for FTI are presented Fig. 1 and in Additional file 2: Table S1. In total 17,388 significant SNP markers (−log 10 (P) > 8.25) were detected for FTI distributed over 25 different chromosomes. The most significant QTL was identified on BTA12 with 12,860 significantly associated SNP. For this QTL the causal variant, a 660-Kb deletion around 20.5 Mb at BTA12, has previously been described [5]. The other QTL were located on BTA1 ( Table 1.
Step II The regions selected based on association results for FTI (Table 1) were reanalyzed for the individual female fertility traits using sequence data in an animal model ( Table 2). The QTL shown in Table 2 showed significant association with fertility sub-traits, however the magnitude of test statistics varied for individual sub-traits.
On BTA1 the associations with the highest test statistic were found for IFLC and IFLH. The most significant SNP marker for IFLC was located in the TRPC1 gene, while the most significant SNP marker for IFLH was annotated as an upstream gene variant of the GRK7 gene. On BTA2 the associations with the highest test statistic were found for AISH, IFLH, and NRRH. The SNP BTA2:132321329, (rs209242168) was the SNP with the highest test statistics for all three traits. This SNP is located within the EIF4G3 gene. On BTA3 the associations with the highest test statistic were found for AISH and NRRH. For the QTL for AISH the most significant SNP was located within the RHOC gene, while for the QTL for NRRH the most significant SNP was l annotated as an upstream gene variant of the WDR77 gene. On BTA5 the SNP with the highest test statistic for FTI was BTA5:62781359 (rs135099682) with a -log 10 (P) of 10.79. The same marker was also the most significant marker for the traits AISC, AISH and IFLH with -log 10 (P) values of 9.99, 13.87, and 13.45 respectively.
On BTA6 the associations with the highest test statistic were found for AISH and NRRH. Both SNPs (Chr6:43223916, Chr6:44642153) were located in an intergenic region (Table 2). However the most significant SNP marker (BTA6:43511992, rs41983284) for FTI was located within the GPR125 gene. A second QTL on BTA6 was detected and showed the strongest association with AISH and IFLH. The SNP BTA6:99400480 (rs379908987) was the most significant SNP for both AISH and IFLH and were located within the SCD5 gene. The SNP with the highest test statistics for NRRH (BTA6:96953202, rs110379023) was located within the C4orf22 gene. Furthermore, FTI was most strongly associated with the SNP BTA6:98115824 (rs208894094). This SNP was also the most significant marker for AISC and IFLC.
On BTA13 the associations with the highest test statistic were found for IFLC and IFLH, whereas the marker with the highest test statistic for FTI was located in the ANKRD60 gene. On BTA15 the associations with the highest test statistic were found for IFLC and NRRH. The most significant SNP marker for IFLC is located within the GRAMD1B gene, while the most significant SNP marker for NRRH is located in an intergenic region. On BTA20 all individual female fertility traits had a -log 10 (P) above 10. All SNP markers detected were located in an intergenic region. On BTA24 the associations with the highest test statistic were found for AISH and IFLH. The significant SNPs were located within the ZNF521 gene.

Discussion
In this study we have applied a two-step approach to screen the cattle genome for QTL for the female fertility index. In the first step the cattle genome was screened for QTL for the FTI based on the whole genome sequence variants using a sire model without considering the relationship among all animals in the study except the sire-son relationship. This was done to pre-select markers to analyze with a linear mixed model. Based on the results in step I, targeted QTL regions for the FTI were reanalyzed for the underlying female fertility traits using sequence data and an animal model taking the relationship among sires included in the study. Below we discuss the findings from individual targeted regions.

BTA1
In this region for the FTI QTL AISC, IFLC and NRRC have their most significant SNP (BTA1:127295226, rs43271631) in common. AISC and IFLC are phenotypes which are highly connected as they measure the same i.e., AIS reflects the number of inseminations, whereas IFL measures the time this has taken. BTA1:127295226 is located in intronic region of TRPC1. TRPC1 has functions related to protein binding (GO:0005515). In mice it has been shown that osteoblast formation and function is regulated by a TRPC1 protein-dependent pathway [19]. Furthermore, AISH and IFLH share their most significant SNP. It is located up-stream of GRK7. Höglund et al. [6] validated two SNP in Nordic Holstein, Jersey and Nordic Red for NRRH at 124.8 Mb and 135.6 Mb, which is 4 to 5 Mb away from the QTL peak for FTI     [20].

BTA2
The most significant marker (BTA2:132321329, rs209242168) was in common for the traits AISH, IFLH and NRRH and located in the intron of EIF4G3. The function of EIF4G3 is related to protein binding (GO:0005515). Previously a SNP on BTA2 has been detected for NRRC at 12.4 Mb in Finnish Ayrshire [4]. Even though the Finnish Ayrshire is part of the Nordic Red cattle population the SNP at 12.4 Mb for NRRC did not show significant effect in our study.

BTA3
The most significant SNP for FTI (BTA3:33388881, rs43336559) was also the most significant SNP for AISC and IFLC. This SNP is intergenic close to SLC6A17 gene (BTA3: 33,337,960-33,373,481). SLC6A17 gene functions as a vesicular transporter selective for proline, glycine, leucine, and alanine [21]. Complex Vertebral Malformation (CVM), which is characterized by misshapen and fused vertebrae around the cervico-thoracic junction is caused by a mutation in the Golgi-resident nucleotide-sugar transporter encoded by SLC35A3 located a 43.4 Mb on BTA3 [22]. This QTL on BTA3 has most significant effect on number of insemination, non-return rate and interval from first to last insemination indicating that it could be related to early embryonic death. Therefore SLC6A17 is a strong candidate gene for this QTL. Sahana

BTA15
In the region 34.1 to 35.1 Mb FTI, AISC, ICF, and IFLC did not have their most significant SNP in common, but each of their significant SNP was located in the intron region of GRAMD1B. This gene's involvement with female fertility is unknown. Höglund et al. [6] confirmed one SNP marker influencing IFLH in three cattle breeds (Nordic Holstein, Jersey and Nordic Red), however this marker was located at 23.6 Mb and did not overlap with the QTL detected for FTI in this study.

BTA20
The most significant SNP for FTI is BTA20:67116858 (rs133488500). All the female fertility traits have high -log 10 (P) of over 10 in the region of 500 kbp around SNP marker BTA20:67116858 ( Table 2). None of these SNP markers were annotated to a gene. One marker (BTB-01617409) located at 66,926,929 bp on BTA20 has been validated in Nordic Holstein, Jersey and Nordic Red for FTI [6]. This marker is only 100 kbp away from the most significant SNP marker for ICF (BTA20:66806800, rs384363430). Previously a significant SNP was detected on BTA20 for ICF in Finnish Ayrshire [4]. However with the position of this marker around 4.9 Mb this is not overlapping with the region for FTI identified in this study.

BTA24
The most significant SNP for FTI (BTA24:31820659, rs137134841) coincided with the most significant SNP for AISC and IFLC. In the same region ranging from 31.1 to 32.1 Mb AISH, IFLH and NRRH shared their most significant SNP (BTA24:31817915, rs381174897). The two SNPs are close to each other in the genome and are located in an intron in the gene ZNF521. ZNF521 is among others related to protein domain specific binding (GO:0019904). This gene has not been reported to be associated with fertility before, but has been reported as a candidate gene for the trait, rear side view, in Chinese Holstein cattle [24]. Previously significant SNP associations on BTA24 have been identified for ICF on BTA24 in Finnish Ayrshire [4]. These SNPs however are not located in the QTL region detected in this study.

Intergenic regions
All the significant SNPs on BTA5 and BTA20 were located in intergenic regions. With the improvement of the Bovine annotation of the genes and localization of the set sites of gene transcription, initiation, termination as well as differential splicing one would expect the identification of the causal mutation to improve. To date information on genomic structure of organisms which are better annotated like mouse and human are used as an information source. However it has been shown that areas which contain mainly regulatory elements and non-protein coding regions the annotation are more challenging [25,26]. Though we have used the "full" sequence level variants in our analysis half of the total genetic variants identified in the WGS were filtered out for various quality control reasons. All the variants which were not bi-allelic were dropped due to limitations in the imputation software.
Therefore the actual causal polymorphism may be missing from the data analyzed here.

Cow versus heifer traits
The genetic correlation between cow and heifer fertility is generally low [27,28]. This indicates that little overlap in genes for cows and heifers influencing female fertility are expected. The results of this study show that if female fertility traits had their most significant SNP marker in common with the fertility index these traits were either cow traits or heifer traits. This is in agreement with previous QTL mapping studies [6,29].

Conclusion
The results of this study showed that 1) the traits AISH and IFLH had in many cases the same significant SNP marker in the QTL detected for FTI, 2) the majority of the candidate genes assigned to the most significant SNP regulating female fertility were involved in protein binding, and 3) a SNP marker on BTA20 located in the QTL region for FTI has previously been validated for female fertility in three different breeds.
Wrote the paper: JKH, BB. All authors contributed to the discussion of the results, read and approved the final manuscript.