Role of evolutionary history on haplotype block structure in the human genome: implications for disease mapping
Introduction
Linkage disequilibrium (LD) mapping of complex disease genes relies on identifying an association between polymorphic markers (i.e. microsatellites or single nucleotide polymorphisms [SNPs]), either individually or as haplotypes, and disease-causing mutations. Whether an association is found depends largely on the underlying genomic patterns of LD in the studied population. The NIH Haplotype Map (‘HapMap’) project, an extension of the genome sequencing project, aims to characterize patterns of haplotype structure and LD across the human genome to facilitate mapping of complex disease genes [1,2•].
LD is the nonrandom association between genetic markers (here we focus on SNPs) and can be empirically estimated for different genomic regions from nucleotide sequence data. When a mutation first arises, it is immediately in LD with the other SNPs found on the chromosome on which it occurred. Under a neutral, infinite allele model of molecular evolution, as this new mutation increases in frequency by genetic drift, LD between it and flanking SNPs is expected to decay over time as a result of recombination [3•,4•,5,6]. Thus, if recombination occurs randomly across the genome, the degree of association between SNPs should decrease as the chromosomal distance between them increases. However, recent quantification of sequence variation across genes and entire human chromosomes has shown that this expected inverse relationship between the degree of LD and the distance between SNPs is not universally found [3•].
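The expected decay described above can be illustrated with the standard deterministic result that the coefficient of disequilibrium D between two loci shrinks by a factor of (1 − c) per generation of random mating, where c is the recombination fraction. The sketch below is only an illustration under that textbook assumption (an effectively infinite population, no drift, selection, or new mutation); the function names are hypothetical and do not come from the review.

```python
import math


def ld_after_generations(d0: float, c: float, generations: int) -> float:
    """Expected D after `generations` of random mating: D_t = D_0 * (1 - c)**t.

    Assumes a deterministic, infinite-population model (no drift or selection).
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction c must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - c) ** generations


def generations_to_halve(c: float) -> float:
    """Number of generations for D to fall to half its starting value."""
    if not 0.0 < c <= 0.5:
        raise ValueError("recombination fraction c must lie in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - c)


if __name__ == "__main__":
    # Closely linked markers (small c) retain LD far longer than distant ones.
    for c in (0.001, 0.01, 0.1):
        print(c, ld_after_generations(0.25, c, 100))
```

Under these assumptions, two markers with c = 0.01 need roughly 69 generations for D to halve, whereas unlinked markers (c = 0.5) lose half of their LD every generation.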
Initial theoretical work predicted that LD should extend only a few kilobases from common SNPs in the genome [7]. However, it appears that there are ‘hotspots’ where historical crossing-over events (i.e. recombination or gene conversion events) are clustered and separate relatively large haplotype blocks (‘hapblocks’) where SNPs are strongly associated over distances as large as 170 kb [2•,8,9,10,11] (Figure 1). Recombination hotspots and hapblocks have important implications for the design and interpretation of disease-association studies [12,13,14,15,16]. If hapblocks can be defined across the genome and if they have similar block boundaries across populations, a subset of SNPs could be identified to distinguish common haplotypes, thus greatly reducing the number of SNPs that need to be genotyped for association studies. Here, we discuss approaches to defining hapblocks, recent evidence for variation in crossing-over rates and block size in the human genome, possible explanations for this phenomenon, and finally, the potential implications of hapblocks for disease-mapping studies.
Detecting hapblocks
LD is typically estimated with two different measures: D′ and r² (Box 1). Although D′ is often used in mapping studies, it is upwardly biased in small population sizes and for low-allele frequencies; therefore, the use of r² to detect LD, which is a direct function of the sample size and is comparable across studies, is now commonly preferred [3•,5,14,17•]. The expected value of r² is a function of the parameter ρ ≅ 4Nₑc, where c is the recombination rate between two markers and Nₑ is the
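As a concrete illustration of the two measures named in this section, the sketch below computes D, D′, and r² for a pair of biallelic SNPs from phased haplotype counts. It uses the standard textbook definitions, which are assumed (not confirmed) to match those given in Box 1; the function name and the example counts are hypothetical.

```python
def pairwise_ld(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts.

    Standard definitions (assumed to agree with Box 1 of the review):
      D   = p_AB - p_A * p_B
      D'  = |D| / D_max, where D_max depends on the sign of D
      r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max, d * d / denom


if __name__ == "__main__":
    # Only three of the four possible haplotypes are observed in this sample.
    print(pairwise_ld(60, 0, 20, 20))
```

In the example, one of the four possible haplotypes is absent, so D′ reaches its maximum of 1 while r² is only 0.375. This shows how the two measures can disagree when allele frequencies at the two SNPs differ.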
Identification of htSNPs
The density of SNPs needed for gene-mapping studies will depend on identification of SNPs in strong association within hapblocks, and the ability to identify haplotype-tagging SNPs (htSNPs) across sampled populations. Johnson et al. [25] characterized 135 kb of DNA from nine genes and found that 2–5 htSNPs could be used to define the 6 or fewer common haplotypes (>5% in frequency) observed at each gene that account for at least 80% of all observed haplotypes. htSNPs were identified either by eye or using a
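The idea of choosing a small set of SNPs that still distinguishes the common haplotypes can be sketched with a simple greedy search. This is not the procedure used by Johnson et al. [25], whose method is not described in the text above; it is a hypothetical illustration in which haplotypes are encoded as strings of '0' and '1' alleles and the function name is invented for the example. A greedy search is not guaranteed to find the smallest possible set.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Pick SNP positions that together distinguish every distinct haplotype.

    Greedy heuristic: at each step choose the SNP that separates the largest
    number of haplotype pairs not yet told apart. Returns sorted SNP indices.
    """
    haps = list(dict.fromkeys(haplotypes))  # drop duplicates, keep order
    if not haps:
        return []
    n_snps = len(haps[0])
    if any(len(h) != n_snps for h in haps):
        raise ValueError("all haplotypes must cover the same SNPs")

    unresolved = set(combinations(range(len(haps)), 2))
    chosen = []
    while unresolved:
        best_snp, best_split = None, set()
        for s in range(n_snps):
            if s in chosen:
                continue
            split = {(i, j) for (i, j) in unresolved if haps[i][s] != haps[j][s]}
            if len(split) > len(best_split):
                best_snp, best_split = s, split
        if best_snp is None:  # defensive; distinct haplotypes always differ somewhere
            raise ValueError("remaining haplotypes cannot be distinguished")
        chosen.append(best_snp)
        unresolved -= best_split
    return sorted(chosen)


if __name__ == "__main__":
    common = ["0000", "0011", "1100", "1111"]  # hypothetical common haplotypes
    print(greedy_tag_snps(common))
```

In this toy input, SNPs 0 and 1 carry identical information, as do SNPs 2 and 3, so one SNP from each pair is enough to tell the four haplotypes apart.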
Do recombination hotspots exist?
Recent analyses indicate a significant level of genomic recombination rate heterogeneity, which may be explained by mutation rate variability, allele frequency differences across samples, and both neutral and adaptive processes [18,27,28]. With the PCR amplification of gene loci from sperm, hapblock boundaries have been shown to coincide with crossing-over hotspots during meiosis, some clustering within intervals as small as 1–2 kb in the MHC class II region [29]. However, when recombination rate estimates
Role of evolutionary history
It is possible that the different assumptions among ‘block-finding’ methods, alone, may explain the variation in the size and number of blocks found in the genome, thus requiring more rigorous testing of methods, especially on the same genomic regions [24•]. However, hapblock structure both within and across populations is also influenced by evolutionary factors such as fluctuations in population size (Nₑ), mutation rate (μ), recombination rate (c), admixture, and selection. Several parameters
Effects of ascertainment bias
The results of empirical and simulation studies described above indicate that demographic history has influenced the distribution of genetic variation, LD, and hapblocks across the genome and across populations. The initial identification of SNPs in one or a few populations can result in an ascertainment bias (AB) towards high frequency, presumably older, SNPs and an underestimate of LD [17•,44]. Therefore, it is important to understand how AB, with respect to the selection of genetic markers
Conclusions: implications for mapping disease mutations
A goal of the HapMap project is to define blocks of LD to assist in mapping mutations associated with complex disease [1,2•]. These blocks are usually characterized by common SNPs identified in either one or a few populations. A major assumption is that the risk of common genetic disease is affected by common disease susceptibility alleles that are at high frequency at a few loci across ethnically diverse populations (i.e. the common disease/common variant hypothesis [49,50•,51]).
Update
Shortly before our review went to press, two studies regarding LD and haplotype block identification were published by Wall and Pritchard [53,54]. The authors’ analysis of the recent Gabriel et al. study [2•] shows that although recombination ‘hotspots’ do exist, ∼50% of the genome cannot be categorized as ‘block-like’. Consistent with our review, Wall and Pritchard stress the importance of a more dense SNP map to better define both the length and integrity of hapblocks in the human genome.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
- • of special interest
- •• of outstanding interest
Acknowledgements
Funded by a Burroughs Wellcome Fund Career Award, a David and Lucile Packard Career Award, and National Science Foundation grant BCS-0196183 (SA Tishkoff), and NSF IGERT training grant BCS-9987590 (BC Verrelli).
References (54)
- et al. Why is there so little intragenic linkage disequilibrium in humans? Genet Res (2001)
- et al. High-resolution haplotype structure in the human genome. Nat Genet (2001)
- Finding genes underlying risk of complex disease by linkage disequilibrium mapping. Curr Opin Genet Dev (2003)
- et al. Linkage disequilibrium and inference of ancestral recombination in 538 single-nucleotide polymorphism clusters across the human genome. Am J Hum Genet (2003)
- et al. Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am J Hum Genet (2002)
- et al. Population genomics: a bridge from evolutionary history to genetic medicine. Hum Mol Genet (2001)
- et al. A first-generation linkage disequilibrium map of human chromosome 22. Nature (2002)
- et al. Patterns of human genetic diversity: implications for human evolutionary history and disease. Annu Rev Genomics Hum Genet (2003)
- et al. A neutral explanation for the correlation of diversity with recombination rates in humans. Am J Hum Genet (2003)
- et al. Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat Genet (2003)