QTL Identification Using Combined Linkage and Linkage Disequilibrium Mapping for Milk Production Traits on BTA6 in Chinese Holstein Population

Milk production traits are important economic traits for dairy cattle. The aim of the present study was to refine the position of previously detected quantitative trait loci (QTL) on bovine chromosome 6 affecting milk production traits in Chinese Holstein dairy cattle. A daughter design with 918 daughters from 8 elite sire families and 14 markers spanning the previously identified QTL region were used in the analysis. We employed a combined linkage and linkage disequilibrium analysis (LDLA) approach with two options for calculating the IBD probabilities, one was based on haplotypes of all 14 markers (named Method 1) and the other based on haplotypes with sliding windows of 5 markers (named Method 2). For milk fat yield, the two methods revealed a highly significant QTL located within a 6.5 cM interval (Method 1) and a 4.0 cM interval (Method 2), respectively. For milk protein yield, a highly significant QTL was detected within a 3.0 cM interval (Method 1) or a 2.5 cM interval (Method 2). These results confirmed the findings of our previous study and other studies, and greatly narrowed down the QTL positions.


INTRODUCTION
Milk production traits are important quantitative traits for dairy cattle. The goal of QTL mapping is to identify genes underlying these traits for us to gain a better understanding of their physiological and biochemical roles and for a more direct way of genetic improvement. Since the first QTL experiment on milk production was reported 14 years ago (Georges et al., 1995), many studies have reported mapping of QTL for milk production traits by linkage analysis (Khatkar et al., 2004). However, the mapping resolution achieved by linkage analysis is in general poor due to limited crossing-over events in pedigrees and marker density. Typically, the confidence intervals for many mapped QTL could span around 20-30 cM, which is too large for positional cloning of the underlying genes.
To overcome this limitation, some methods have been proposed to fine map QTL using the existing linkage disequilibria in natural populations accumulated from historical recombinations (Xiong and Guo, 1997). Methods have also been proposed to combine linkage disequilibrium (LD) with linkage analysis (LA) method (LDLA) to improve both power and precision of QTL mapping based on either variance component (Meuwissen et al., 2002) or likelihood (Farnir et al., 2002) approach. This strategy has been demonstrated to be able to narrow the mapping intervals for some QTL in a few studies (Olsen et al., 2004;Gautier et al., 2006;Holmberg et al., 2007;Olsen et al., 2008;Kim et al., 2009). Compared with LA analysis, the LDLA analysis utilizes the information of estimated identity by decent (IBD) probabilities between founders using linkage disequilibrium information, and thus would add more information for the inference. The theoretical advantages of LDLA analyses over LA or LD analyses are manifold: i) a marker for which a parent is homozygous does not contribute information in a linkage analysis, yet it does in LDLA analysis; ii) conversely, two parents may share the same haplotype but not necessarily the same QTL genotypes, and a pure LD analysis would be misleading, but in LDLA analysis the phenotype of offspring together with the ascertainment of alleles transmitted can be used to determine which are the most likely QTL genotypes of the parents; iii) an individual without relatives but with phenotype records can be included in the LDLA analysis, in contrast to LA analysis.
Bos Taurus autosome 6 (BTA6) has received considerable attention because many studies reveal that it harbors QTL for milk production traits (Georges et al., 1995;Spelman et al., 1998;Velmala et al., 1999;Ron et al., 2001;Freyer et al., 2003;Olsen et al., 2004;Olsen et al., 2005;Chen et al., 2006;Schopen et al., 2009). The results of those studies differ somewhat with respect to the number of QTL detected, their positions, and to what extent the milk traits are affected by QTL. Several studies performed in different breeds have reported segregation of at least one QTL for milk production traits close to marker BM143 in the middle of BTA6 (Spelman et al., 1996;Kuhn et al., 1999;Nadesalingam et al., 2001;Ron et al., 2001;Olsen et al., 2002;Olsen et al., 2004;Olsen et al., 2005). Using the LDLA method, Olsen et al. (2004) fine mapped a QTL affecting fat percentage and protein percentage to a 7.5 cM interval surrounded by the markers BMS2508 and FBN12 (near BM143) in Norwegian dairy cattle. Schnabel et al. (2005) fine mapped a QTL affecting protein percentage to a small interval in the vicinity of BM143 in a U.S. Holstein population.
In a previous study in a Chinese Holstein population, linkage analyses were carried out for mapping QTL affecting milk production traits in a region of 63.5 cM on BTA6 (Chen et al., 2006). A highly significant QTL was detected near marker BMS470 affecting milk fat yield. BMS470 is about 14 cM away from marker BM143. The aim of the present study was to further confirm the QTL previously detected and hopefully to narrow down the QTL region by increasing the marker density in the region around BMS470 and utilizing the LDLA method.

Population and phenotypes
The population consisted of eight sire families, which were proved to be segregating with QTL affecting milk yield, fat yield, protein yield or fat percentage in our previous study (Chen et al., 2006). The pedigree of each individual in the study was traced back as far as possible and revealed that they were sons or grandsons of North American Holstein, born in the years between 1993 and 1996. The sizes of these families were enlarged with the total number of daughters increasing from 746 in the previous study to 918 in the current study. The numbers of daughters in each family ranged from 47 to 233. These animals were from 15 Holstein cattle farms in Beijing with an average 305-d milk yield of about 8,500 kg. The Beijing Dairy Cattle Center official estimated breeding values (EBVs) of the daughters for the three milk production traits, i.e., milk yield (MY), fat yield (FY), and protein yield (PY) were used as phenotypes for QTL mapping. These EBVs were calculated using repeatability animal model BLUP and multi-lactation 305 d records pre-adjusted for calving age and month.

Markers and their linkage disequilibrium map
Based on the mapping results of our previous work (Chen et al., 2006;Mei et al., 2009), fifteen microsatellite markers between BMS690 and BM4528 were chosen from the latest MARC map (Ihara et al., 2004). The average map distance between adjacent markers was 1.02 cM, ranging from 0.42 to 1.85 cM. The map positions, number of alleles and polymorphic information contents (PIC) of all markers used in the current study are given in Table 1. First, the extracted DNA was routinely diluted and aliquoted to 96-well plates. Then, PCR was performed on the GeneAmp PCR System 9600 or 9700 (PE, Applied Biosystems, Foster City, CA) under the following conditions: 20-μl volume, 50 ng of genomic DNA, 1.3 μM of each primer (5′ ends of primers were labeled with fluorescein), 1.5 μM of MgCl 2 , 125 μM of dNTPs, and 0.5 IU of Taq polymerase. Annealing temperatures of PCR ranged from 55 to 61°C with 30 cycles of amplification. The PCR products were run on the ABI377 DNA sequencer (Applied Biosystems). Automated fragment analysis, size calling, and binning were then done by GeneScan v.3.1 and Genotyper v.2.5 (Applied Biosystems) to identify the alleles of each of the microsatellite loci.
A LD map for the studied region was implemented in the Graphical Overview of Linkage Disequilibrium (GOLD) package (Abecasis and Cookson, 2000), which provides an easy way to interpret graphical representation of the pattern of disequilibrium in the region of interest.

Statistical analysis
Fine mapping of QTL was performed using the LDLA method described by Meuwissen et al. (2002) based on a single QTL model. Basically, it is a variance component approach (Hoeschele et al., 1997), but is extended to take into account the information provided by the residual LD in the population. The procedure consisted of the following steps.
The first step was the reconstruction of maternally and paternally inherited marker haplotypes for each recorded individual. The haplotypes of all sires and daughters were estimated based on marker information using a Gibbs sampler (Meuwissen et al., 2002;Windig and Meuwissen, 2004). In a simulation study with evenly spaced markers, Grape at al. (2006) found that using 4 to 6 markers as a sliding window across the region to reconstruct haplotypes tended to give better accuracy. In this study, we used two analytical strategies, named Methods 1 and 2, to reconstruct haplotypes. In Method 1, all 14 markers were used simultaneously to reconstruct haplotypes. In Method 2, a sliding window of 5 contiguous markers was used to reconstruct haploytpes.
In the second step, the midpoint of each marker bracket was regarded as a putative position of the QTL. For each putative QTL position, the IBD probabilities of haplotype pairs were calculated based on marker and/or pedigree information and allele information (Meuwissen and Goddard, 2001;Meuwissen et al., 2002). The IBD probability matrix at putative QTL positions was deduced from the haplotype IBD matrix and from the known pedigree information (Fernando and Grossman, 1989). When the IBD probability of a haplotype pair was above 0.95 at a given position, these haplotypes were considered identical. The IBD probability depends on the effective population size and the number of generations starting from the base population. In this study, both of the effective population size and the number of generations were assumed to be 100.
The third step was the calculation of the likelihood at putative QTL positions using a variance component method (Hoeschele et al., 1997) based on the following linear mixed model: where μ is the overall mean, h is a vector of random haplotype effects of dimension q×l with q being the number of different haplotypes, Z is an (n×q) haplotype incidence matrix, u is a vector of random residual polygenic effects, and e is a vector of random errors. The variance-covariance matrices of h, u and e are , G 2 p h σ , A 2 u σ and , R 2 e σ respectively, where G p is the matrix of IBD probabilities among haplotypes, A is an additive genetic relationship matrix and R a diagonal matrix with 1 − j r (r j = reliability of the EBV of individual j) on the diagonals. The G p matrix and the corresponding likelihood of observations were evaluated for the putative QTL at the midpoint of each marker bracket. For each marker bracket, the likelihood ratio test statistic (LR) was calculated by maximizing the likelihoods with respect to the variance components for the full model (containing both haplotype and residual polygenic effects) and the reduced model (containing only residual polygenic effect), respectively. The marker bracket with the highest LR was taken as the most likely QTL bracket. In this study, the ASREML package (Gilmour et al., 2006) was used for calculating the LR values.

Significance level and confidence interval
It is generally regarded that the LR statistic follows approximately a x 2 distribution with degrees of freedom between one and two (Grignola et al., 1996). To be more conservative, we assumed here it followed a x 2 distribution with two degrees of freedom. Because the QTL was found to be highly significant in our previous study (Chen et al., 2006), a nominal p-value of 5% was considered to be significant in this study. The LOD drop-off method of Lander and Botstein (1989) was employed to estimate the approximate supporting intervals, which can be calculated by identifying the highest point in the LR profile and subtracting 1 LOD unit (equivalent to 4.6 LR) at both sides of the peak point.

Marker LD map
Each marker had 6 alleles on average, ranging from 5 to 10. Marker BM4322 had low polymorphic information content (0.48) in our population, hence it was removed from the analysis. The remaining 14 markers were finally employed in the LDLA analysis. Figure 1 shows the LD map of the remaining 14 markers. The average D' value between adjacent markers was 0.2941, indicating a substantial amount of LD over the targeted region.

QTL mapping
The results of the LDLA analysis with two methods for reconstruction of the marker haplotypes are presented in Table 2 and Figure 2. For the trait FY, the two methods revealed a significant QTL. Method 1 located the QTL in the bracket MNB208-BMS5010 while Method 2 located the QTL in the adjacent marker bracket BMS5010-FBN13, and the corresponding estimated genetic variance explained by the QTL was 10.297% and 11.495%, respectively, in this region. The 1-LOD drop-off interval was 6.5 cM and 4.0 cM for Methods 1 and 2, respectively. For PY, both methods revealed a significant QTL in the bracket M1-BMS470. The ratio of QTL variance to total genetic variance was estimated to be 9.613% and 9.408% for Method 1 and 2, respectively. The 1-LOD drop-off interval was 3.0 cM and 2.5 cM for Method 1 and 2, respectively. For MY, the two methods produced the largest LR value in the same bracket BMS5010-FBN13, but both corresponding p-values did not reach the 5% significance level.
The LR profiles for the milk traits are shown in Figure 2.

DISCUSSION
In this study, we successfully fine mapped QTL for FY and PY in a Chinese Holstein population. Our previous linkage analyses (Chen et al., 2006) suggested a QTL position affecting fat yield close to marker BM470, which is about 14.3 cM away from BM143, with a confidence interval of 55 cM for FY and 58 cM for PY. In the present  study, we further confirmed the previous result clearly while having much higher precision. Specifically, for FY, a QTL was detected in the contiguous marker brackets with the locations 60.96 cM and 62.24 cM for the two methods. The 1-LOD drop-off interval was calculated as 6.5 cM and 4.0 cM for method 1 and 2, respectively. For PY, both methods revealed a significant QTL in the bracket M1-BMS470 with the 1-LOD drop-off interval being 3.0 cM and 2.5 cM, respectively. Compared with our previous linkage studies, the confidence intervals of the detected QTL for FY and PY were narrowed dramatically. In addition, these results are also generally in agreement with several other studies. Zhang et al. (1998)  To date, the LDLA fine mapping methods have been used widely in dairy cattle. These studies showed that fine mapping of a previously identified chromosomal region using LDLA could greatly reduce the QTL interval and was an important step toward identification of the gene and its causative mutation (Farnir et al., 2002;Blott et al., 2003;Olsen et al., 2004;Cohen-Zinder et al., 2005;Schnabel et al., 2005). On BTA6, Ron et al. (2001) identified a QTL affecting milk protein and fat percentage within a 4 cM interval in Israeli dairy cattle. Olsen et al. (2004) mapped a QTL for fat percentage and protein percentage within a region of 7.5 cM between marker BMS2508 and FBN12 (near BM143) in Norwegian dairy cattle. Subsequently, Olsen et al. (2005) increased the marker density in the QTL region by typing additional SNPs followed by LDLA analysis. The haplotype analysis revealed six haplotypes that had significant effect on the protein content of milk. The most common of these haplotypes also had the most extreme effect. Further characterization of these haplotypes using sequence information, radiation hybrid mapping and comparative mapping with the human genome sequence resolved the QTL to a 420Kb haplotype region between gene ABCG2 and LAP3 (Olsen et al., 2005). Within this region, there are only four known human genes: IBSP, MEPE, OPN and PKD2 (Http://genome.ucsc.edu). Subsequently, Cohen-Zinder et al. (2005) and Schnabel et al. (2005) focused on this region to further identify genes ABCG2 and OPN, respectively, affecting milk traits. Recently, several candidate genes on BTA6, such as ABCG2, FAM13A1, PPARGC1A, OPN and SPP1, have been widely reported to be associated with milk production (Cohen et al., 2004;Cohen-Zinder et al., 2005;Schnabel et al., 2005;Weikard et al., 2005;Sheehy et al., 2009). The aim of this study was to refine the position of a previously identified QTL (Chen et al., 2006) by using more markers in the region and the LDLA mapping approach. Most of the aforementioned fine mapping studies on BTA6 provided evidence for the existence of a QTL affecting fat or/and protein percentage located near BM143 on BTA6. Differently from those studies, we focused on a novel QTL region between MNB208 to BMS470 based on our previous finding in a linkage study (Chen et al., 2006) in Chinese Holstein. We are attempting to fine map QTL underlying milk production traits to identify new candidate genes for further positional cloning. Two QTL signals corresponding to FY and PY were detected within the interval of MNB208 to BMS470. From the complete bovine genome sequence corresponding to this interval, a total of eighteen genes potentially relating to milk traits are involved, i.e., KLF3, TMEM156, WDR19, RPL9, LIAS, UGDH, UEB2K, RBM47, NSUN7, RPL7A, APBB2, UCHL1, TMEM33, SLC30A9, CCDC4, YIPF7, GABRA2 (http://www.ncbi. nlm.nih.gov/projects/mapview/). In the meantime, we should also be cautious regarding the regions neighboring this interval because of the great extent of LD in dairy cattle. In particular, a high level of LD in Holstein dairy cattle on BTA6 over ~2 cM has been reported (Khatkar et al., 2006). So, it is still possible that the underlying genes are outside of this interval. Our present findings have practical significance in exploring the novel candidate genes on BTA6 for milk traits besides those from other previous studies.
Two mapping methods for LDLA analysis were used in our study and the results show that Method 2 (using 5 markers for a sliding window in haplotype inference) gives a sharper peak and a smaller confidence interval in the QTL identification as compared to Method 1 (using all 14 markers) (Figure 2). These results are in agreement with the recent findings of Grapes et al. (2006) and Zhao et al. (2006). According to the simulation study of Grapes et al. (2006), derivation of IBD probabilities based on haplotypes of 4 to 6 markers was optimal, resulting in a higher precision of QTL position than that based on the haplotypes of 10 markers under various simulated scenarios. The possible reason is that too many markers used for inferring haplotype may generate a flatter likelihood curve and cannot discriminate between the alternative QTL position. Similar findings were also validated by Zhao et al. (2007), where 4 markers were able to get better mapping precision than other considerations.