Role of evolutionary history on haplotype block structure in the human genome: implications for disease mapping

https://doi.org/10.1016/j.gde.2003.10.010

Abstract

With the completion of the first draft of the human genome sequencing project, a new challenge is to characterize patterns of linkage disequilibrium and haplotype structure across genomic regions to identify mutations associated with complex disease. Recent work shows considerable linkage disequilibrium heterogeneity, where genomic regions of extended haplotype blocks are punctuated by recombination hotspots. In this review we explore some of the current approaches to defining and characterizing ‘hapblocks’, mechanisms by which hapblocks may be generated, and the implications this block-like structure may have for successfully mapping mutations associated with complex disease.

Introduction

Linkage disequilibrium (LD) mapping of complex disease genes relies on identifying an association between polymorphic markers (e.g. microsatellites or single nucleotide polymorphisms [SNPs]), either individually or as haplotypes, and mutations causing disease. Finding an association depends largely on the underlying genomic patterns of LD in the studied population. The NIH Haplotype Map (‘HapMap’) project, an extension of the genome sequencing project, aims to characterize patterns of haplotype structure and LD across the human genome to facilitate mapping of complex disease genes 1., 2.•.

LD is the nonrandom association between alleles at genetic markers (here we focus on SNPs) and can be estimated empirically for different genomic regions from nucleotide sequence data. When a mutation first arises, it is immediately in LD with the other SNPs on the chromosome on which it occurs. Under a neutral, infinite allele model of molecular evolution, as this new mutation increases in frequency by genetic drift, recombination is expected to erode the LD between it and flanking SNPs over time 3.•, 4.•, 5., 6.. Thus, if recombination occurs randomly across the genome, the degree of association between SNPs should decrease as the chromosomal distance between them increases. However, recent quantification of sequence variation across genes and entire human chromosomes has shown that this expected inverse relationship between LD and inter-SNP distance is not universal [3].
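The decay described above can be made concrete with the textbook deterministic result, which we assume here purely for illustration: in an idealized, infinitely large, randomly mating population with no drift, selection, or new mutation, the disequilibrium coefficient D shrinks by a factor of (1 - c) each generation, so D_t = D_0(1 - c)^t, where c is the recombination fraction between the two markers. The function names and example values in this sketch are hypothetical.

```python
import math


def ld_after(d0: float, c: float, generations: int) -> float:
    """D after t generations: D_t = D_0 * (1 - c)**t.

    Assumes an infinitely large, randomly mating population with no drift,
    selection, or new mutation; c is the recombination fraction (0 to 0.5).
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction c must lie in [0, 0.5]")
    return d0 * (1.0 - c) ** generations


def half_life(c: float) -> float:
    """Generations needed for D to fall to half its starting value."""
    if not 0.0 < c <= 0.5:
        raise ValueError("c must lie in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - c)


if __name__ == "__main__":
    # Hypothetical recombination fractions for SNPs roughly 10 kb, 100 kb
    # and 1 Mb apart, assuming about 1 cM/Mb.
    for c in (0.0001, 0.001, 0.01):
        print(f"c = {c}: half-life = {half_life(c):.0f} generations, "
              f"D after 1000 generations = {ld_after(0.25, c, 1000):.2e}")
```

Under these assumptions the half-life is roughly 0.69/c generations, so LD between tightly linked SNPs decays far more slowly than LD between distant ones.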

Initial theoretical work predicted that LD should extend only a few kilobases from common SNPs in the genome [7]. However, historical crossing-over events (i.e. recombination or gene conversion events) appear to be clustered in ‘hotspots’ that separate relatively large haplotype blocks (‘hapblocks’), within which SNPs are strongly associated over distances as large as 170 kb 2.•, 8., 9., 10., 11. (Figure 1). Recombination hotspots and hapblocks have important implications for the design and interpretation of disease-association studies 12., 13., 14., 15., 16.. If hapblocks can be defined across the genome and have similar boundaries across populations, a subset of SNPs that distinguishes the common haplotypes could be identified, greatly reducing the number of SNPs that need to be genotyped for association studies. Here, we discuss approaches to defining hapblocks, recent evidence for variation in crossing-over rates and block size in the human genome, possible explanations for this phenomenon, and finally, the potential implications of hapblocks for disease-mapping studies.
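To illustrate how clustered crossing-over can produce block-like LD, the sketch below builds a hypothetical recombination map in which the per-kilobase rate is low everywhere except in a narrow 2-kb hotspot. It converts the map to the population recombination parameter ρ = 4N_e c (N_e being the effective population size) between SNP pairs and then applies Sved's classic approximation E[r²] ≈ 1/(1 + ρ). All positions, rates, and the N_e value are invented, and Sved's formula is our assumed link between ρ and r², not a claim about the analyses cited here.

```python
from typing import List, Tuple

# Hypothetical recombination map: (start_kb, end_kb, recombination fraction per kb).
Segment = Tuple[float, float, float]


def recombination_between(x_kb: float, y_kb: float, segments: List[Segment]) -> float:
    """Approximate recombination fraction c between two positions.

    Sums per-kb rates over the interval; this additivity is only
    reasonable while the total stays well below 0.5.
    """
    lo, hi = min(x_kb, y_kb), max(x_kb, y_kb)
    total = 0.0
    for start, end, rate in segments:
        overlap = min(hi, end) - max(lo, start)
        if overlap > 0:
            total += overlap * rate
    return total


def sved_expected_r2(x_kb: float, y_kb: float, segments: List[Segment], ne: float) -> float:
    """Sved's approximation E[r^2] ~ 1 / (1 + rho), with rho = 4 * Ne * c."""
    rho = 4.0 * ne * recombination_between(x_kb, y_kb, segments)
    return 1.0 / (1.0 + rho)


if __name__ == "__main__":
    # Cold background (1e-7 per kb) with a hotspot (1e-3 per kb) at 50-52 kb.
    segments = [(0.0, 50.0, 1e-7), (50.0, 52.0, 1e-3), (52.0, 100.0, 1e-7)]
    snps = [5.0, 25.0, 45.0, 60.0, 80.0]  # hypothetical SNP positions in kb
    ne = 10_000  # assumed effective population size
    for a in snps:
        row = [sved_expected_r2(a, b, segments, ne) for b in snps]
        print(f"{a:5.0f} kb: " + " ".join(f"{v:.2f}" for v in row))
```

With these invented parameters, SNP pairs on the same side of the hotspot keep an expected r² of roughly 0.86 to 0.93, whereas every pair spanning it drops to about 0.01, which is the block-like pattern described above.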


Detecting hapblocks

LD is typically estimated with two different measures: D′ and r² (Box 1). Although D′ is often used in mapping studies, it is upwardly biased in small samples and at low allele frequencies; therefore, r², which is directly related to sample size and is comparable across studies, is now commonly preferred for detecting LD 3.•, 5., 14., 17.•. The expected value of r² is a function of the parameter $\rho = 4N_e c$, where $c$ is the recombination rate between two markers and $N_e$ is the
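For readers who want the two measures in executable form, the sketch below computes D, D′, and r² from phased two-locus haplotype counts using the standard definitions: D = p_AB - p_A p_B; D′ = D/D_max, with D_max chosen according to the sign of D; and r² = D²/(p_A(1 - p_A)p_B(1 - p_B)). It assumes phased haplotypes and two biallelic, polymorphic loci; the function and field names are hypothetical, and Box 1 may present the measures differently.

```python
from dataclasses import dataclass


@dataclass
class PairwiseLD:
    d: float        # D = p_AB - p_A * p_B
    d_prime: float  # |D'| = |D| / D_max, between 0 and 1
    r2: float       # r^2 = D^2 / (p_A * q_A * p_B * q_B)


def pairwise_ld(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> PairwiseLD:
    """LD between biallelic loci A/a and B/b from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A
    p_B = (n_AB + n_aB) / n  # frequency of allele B
    q_A, q_B = 1.0 - p_A, 1.0 - p_B
    if min(p_A, q_A, p_B, q_B) == 0.0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # The largest attainable |D| depends on the sign of D.
    d_max = min(p_A * q_B, q_A * p_B) if d >= 0 else min(p_A * p_B, q_A * q_B)
    return PairwiseLD(d=d, d_prime=abs(d) / d_max, r2=d * d / (p_A * q_A * p_B * q_B))


if __name__ == "__main__":
    # Hypothetical sample of 100 haplotypes with no aB haplotypes.
    print(pairwise_ld(30, 20, 0, 50))
```

Because the aB haplotype is absent in this example, |D′| = 1 even though r² is only about 0.43, one illustration of how the two measures can give different pictures of the same pair of SNPs.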

Identification of htSNPs

The density of SNPs needed for gene-mapping studies will depend on identifying SNPs in strong association within hapblocks and on the ability to identify haplotype-tagging SNPs (htSNPs) across sampled populations. Johnson et al. [25] characterized 135 kb of DNA from nine genes and found that, at each gene, 2–5 htSNPs could distinguish the six or fewer common haplotypes (>5% frequency), which together accounted for at least 80% of all observed haplotypes. HtSNPs were identified either by eye or using a
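The excerpt breaks off before describing how Johnson et al. chose their htSNPs, so the sketch below is not their method. It shows one simple, assumed criterion: after discarding haplotypes at or below a 5% frequency threshold, search exhaustively for the smallest set of SNP positions whose alleles give every remaining common haplotype a distinct pattern. The haplotypes are hypothetical 0/1 strings, and exhaustive search is only practical for the handful of SNPs in a single block.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def common_haplotypes(freqs: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Keep haplotypes whose frequency exceeds the threshold (e.g. >5%)."""
    return [h for h, f in freqs.items() if f > threshold]


def minimal_tag_set(haplotypes: List[str]) -> Optional[Tuple[int, ...]]:
    """Smallest set of SNP indices that tells all given haplotypes apart.

    Tries every subset of size 0, 1, 2, ... in turn and returns the first
    one under which each haplotype has a unique allele pattern.
    """
    if not haplotypes:
        return ()
    n_snps = len(haplotypes[0])
    if any(len(h) != n_snps for h in haplotypes):
        raise ValueError("all haplotypes must cover the same SNPs")
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            patterns = {tuple(h[i] for i in subset) for h in haplotypes}
            if len(patterns) == len(haplotypes):
                return subset
    return None  # duplicated haplotypes can never be distinguished


if __name__ == "__main__":
    # Hypothetical frequencies for five-SNP haplotypes within one block.
    freqs = {"00000": 0.40, "11100": 0.25, "11111": 0.20, "00011": 0.12, "10101": 0.03}
    common = common_haplotypes(freqs)  # drops the 3% haplotype
    print(common, minimal_tag_set(common))
```

Here the search returns SNP positions (0, 3): two SNPs separate the four common haplotypes, compared with five SNPs in the block. Practical tagging methods must also cope with unphased genotypes, missing data, and differences in haplotype frequency across populations, all of which this sketch ignores.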

Do recombination hotspots exist?

Recent analyses indicate significant heterogeneity in recombination rate across the genome, which may be explained by mutation rate variability, allele frequency differences across samples, and both neutral and adaptive processes 18., 27., 28.. Sperm-typing studies, in which loci are PCR-amplified from sperm, have shown that hapblock boundaries coincide with meiotic crossing-over hotspots in the MHC class II region, with crossovers in some hotspots clustering within as little as 1–2 kb [29]. However, when recombination rate estimates

Role of evolutionary history

It is possible that differences in the assumptions of ‘block-finding’ methods alone explain the variation in the size and number of blocks found in the genome, which would call for more rigorous testing of these methods, especially on the same genomic regions [24]. However, hapblock structure both within and across populations is also influenced by evolutionary factors such as fluctuations in population size (Ne), mutation rate (μ), recombination rate (c), admixture, and selection. Several parameters

Effects of ascertainment bias

The results of empirical and simulation studies described above indicate that demographic history has influenced the distribution of genetic variation, LD, and hapblocks across the genome and across populations. The initial identification of SNPs in one or a few populations can result in an ascertainment bias (AB) towards high-frequency (and presumably older) SNPs and an underestimate of LD 17.•, 44.. Therefore, it is important to understand how AB, with respect to the selection of genetic markers
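One simple way to see why small discovery panels favor common SNPs: if a biallelic SNP with allele frequency p is screened in a panel of n chromosomes drawn at random from one population, it is detected as polymorphic only if both alleles appear, which happens with probability 1 - p^n - (1 - p)^n. The sketch below assumes random sampling from a single population and error-free genotyping; the function name and panel sizes are hypothetical.

```python
def detection_probability(p: float, n: int) -> float:
    """Probability that a biallelic SNP is seen as polymorphic in a panel.

    Assumes n chromosomes sampled at random from one population, allele
    frequencies p and 1 - p, and error-free genotyping: the SNP is missed
    only if all n chromosomes carry the same allele.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 - p ** n - (1.0 - p) ** n


if __name__ == "__main__":
    freqs = (0.01, 0.05, 0.2, 0.5)
    for n in (4, 8, 48):  # hypothetical discovery-panel sizes (chromosomes)
        probs = [detection_probability(p, n) for p in freqs]
        print(n, [round(x, 3) for x in probs])
```

Under these assumptions a SNP at 1% frequency is detected only about 4% of the time in a four-chromosome panel, whereas one at 50% is detected 87.5% of the time, so SNP sets discovered in small panels are enriched for common variants.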

Conclusions: implications for mapping disease mutations

A goal of the HapMap project is to define blocks of LD to assist in mapping mutations associated with complex disease 1., 2.•. These blocks are usually characterized by common SNPs identified in either one or a few populations. A major assumption is that the risk of common disease is affected by common susceptibility alleles at a few loci that occur at high frequency across ethnically diverse populations (i.e. the common disease/common variant hypothesis 49., 50.•, 51.).

Update

Shortly before our review went to press, two studies on LD and haplotype block identification were published by Wall and Pritchard 53., 54.. The authors’ analysis of the recent Gabriel et al. study [2] shows that although recombination ‘hotspots’ do exist, ∼50% of the genome cannot be categorized as ‘block-like’. Consistent with our review, Wall and Pritchard stress the importance of a denser SNP map to better define both the length and integrity of hapblocks in the human genome.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • of special interest

  •• of outstanding interest

Acknowledgements

Funded by a Burroughs Wellcome Fund Career Award, a David and Lucile Packard Career Award, and National Science Foundation grant BCS-0196183 (SA Tishkoff), and NSF IGERT training grant BCS-9987590 (BC Verrelli).

References (54)

  • C. Wiuf et al. A coalescent model of recombination hotspots. Genetics (2003).
  • K. Ardlie et al. Lower-than-expected linkage disequilibrium between tightly linked markers in humans suggests a role for gene conversion. Am J Hum Genet (2001).
  • D.E. Reich et al. Human genome sequence variation and the influence of gene history, mutation and recombination. Nat Genet (2002).
  • P.C. Sabeti et al. Detecting recent positive selection in the human genome from haplotype structure. Nature (2002).
  • M.A. Saunders et al. Nucleotide variability at G6PD and the signature of malarial selection in humans. Genetics (2002).
  • J. Wakeley et al. The discovery of single-nucleotide polymorphisms–and inferences about human demographic history. Am J Hum Genet (2001).
  • J.M. Akey et al. The effect of single nucleotide polymorphism identification strategies on estimates of linkage disequilibrium. Mol Biol Evol (2003).
  • C.S. Carlson et al. Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat Genet (2003).
  • D.E. Reich et al. On the allelic spectrum of human disease. Trends Genet (2001).
  • J.K. Pritchard et al. The allelic architecture of human disease genes: common disease-common variant…or not? Hum Mol Genet (2002).
  • J. Couzin. New mapping project splits the community. Science (2002).
  • S.B. Gabriel et al. The structure of haplotype blocks in the human genome. Science (2002).
  • J.K. Pritchard et al. Linkage disequilibrium in humans: models and data. Am J Hum Genet (2001).
  • J.D. Wall. Insights from linked single nucleotide polymorphisms: what we can learn from linkage disequilibrium. Curr Opin Genet Dev (2001).
  • K.G. Ardlie et al. Patterns of linkage disequilibrium in the human genome. Nat Rev Genet (2002).
  • L. Kruglyak. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet (1999).
  • N. Patil et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science (2001).