Much recent discussion addresses the question of ‘missing heritability’ in genome-wide association studies (GWAS). The problem can be illustrated using the example of human height. Classical pedigree studies show a high heritability of human height, in the order of 80%. This is part of our everyday experience: tall parents tend to have tall children. GWAS has identified more than 40 loci associated with height, but these variants together explain only a small part of phenotypic variation.1 A number of hypotheses have been advanced to identify the source of the missing heritability, including large effects of rare variants and effects of copy-number variation.2

A conceptual difference between pedigree studies and GWAS does not appear to have been considered: pedigree-based heritability measures the phenotypic effects of much larger chunks of chromosome than GWAS-based heritability. This distinction can be illustrated with a simple example that elides complexities arising from diploidy. Consider two SNPs (A/T and G/C) in linkage equilibrium that are located 0.1 cM apart. The SNPs could, for example, encode two amino acid substitutions within a single protein. From the perspective of pedigree-based measures of heritability, the four haplotypes (AG, AC, TG and TC) are inherited as four alleles at a single locus, but from the perspective of GWAS these are biallelic polymorphisms at distinct loci. Suppose that the combinations AG and TC add a little bit extra to height but AC and TG subtract a little bit. Then, neither SNP will be correlated with height in GWAS, but the haplotypes, which are correlated with height, will be reliably transmitted from parents to offspring and will contribute to estimates of pedigree-based heritability. Put another way, the genetic effect on phenotype appears as part of the additive genetic variance in pedigree studies but as an unmeasured gene × gene interaction in GWAS.
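The two-SNP example can be checked numerically. The following is a minimal sketch, not an analysis of real data: it assumes a haploid model, linkage equilibrium, allele frequencies of 0.5 at both SNPs, and an arbitrary effect size `d`. The function names (`haplotype_table`, `cov`) and the 0/1 allele coding are illustrative assumptions. Under these assumptions the covariance of each SNP with the height effect is zero, while the haplotypes carry a variance of `d` squared.

```python
from itertools import product


def haplotype_table(p_a=0.5, p_g=0.5, d=1.0):
    """Rows of (x1, x2, frequency, effect) for the four haplotypes.

    x1 = 1 codes allele A (0 codes T); x2 = 1 codes allele G (0 codes C).
    Frequencies are products of allele frequencies (linkage equilibrium).
    AG and TC add d to height; AC and TG subtract d.
    """
    rows = []
    for x1, x2 in product((1, 0), repeat=2):
        freq = (p_a if x1 else 1 - p_a) * (p_g if x2 else 1 - p_g)
        effect = d if x1 == x2 else -d
        rows.append((x1, x2, freq, effect))
    return rows


def cov(rows, u, v):
    """Frequency-weighted covariance of two functions of a haplotype row."""
    mean_u = sum(r[2] * u(r) for r in rows)
    mean_v = sum(r[2] * v(r) for r in rows)
    return sum(r[2] * (u(r) - mean_u) * (v(r) - mean_v) for r in rows)


rows = haplotype_table()

# What a single-SNP association test sees: each SNP against the height effect.
cov_snp1 = cov(rows, lambda r: r[0], lambda r: r[3])
cov_snp2 = cov(rows, lambda r: r[1], lambda r: r[3])

# What is transmitted within a pedigree: variance among the four haplotypes.
var_haplotype = cov(rows, lambda r: r[3], lambda r: r[3])

print("cov(SNP1, height effect):", cov_snp1)
print("cov(SNP2, height effect):", cov_snp2)
print("variance among haplotypes:", var_haplotype)
```

The exact cancellation depends on the assumed allele frequencies of 0.5. With unequal frequencies each SNP acquires a marginal effect, although part of the haplotype variance still remains invisible to single-SNP tests.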

The major constraint on measuring interactions in GWAS has been the very large number of possible interactions. If there are 10⁶ SNPs on an array, then there are 5 × 10¹¹ pairs of SNPs. However, the number of pairs is a much more manageable 10⁶ if analysis is restricted to neighboring SNPs.
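The pair counts follow from simple combinatorics, sketched below. The text does not define "neighboring", so the sketch assumes it means SNPs within a fixed number of positions of each other in array order, ignoring chromosome boundaries. The helper `neighbor_pair_count` and its `window` parameter are hypothetical names introduced for illustration.

```python
from math import comb


def neighbor_pair_count(n_snps, window=1):
    """Number of SNP pairs (i, j) with 0 < j - i <= window in array order.

    window=1 counts only immediately adjacent SNPs (an assumed reading of
    'neighboring'); chromosome boundaries are ignored.
    """
    window = min(window, n_snps - 1)
    # Pairs at separation w number n_snps - w; sum over w = 1..window.
    return window * n_snps - window * (window + 1) // 2


n_snps = 10**6

all_pairs = comb(n_snps, 2)                  # every unordered pair of SNPs
adjacent_pairs = neighbor_pair_count(n_snps)  # immediate neighbors only

print(f"all pairs:      {all_pairs:.3e}")
print(f"adjacent pairs: {adjacent_pairs:.3e}")
```

Widening the window scales the count roughly linearly: a window of 10 gives about 10⁷ pairs, still several orders of magnitude below the all-pairs total.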