MHC Genomics and Disease: Looking Back to Go Forward

Ancestral haplotypes are conserved but extremely polymorphic kilobase sequences, which have been faithfully inherited over at least hundreds of generations in spite of migration and admixture. They carry susceptibility and resistance to diverse diseases, including deficiencies of CYP21 hydroxylase (47.1) and complement components (18.1), as well as numerous autoimmune diseases (8.1). The haplotypes are detected by segregation within ethnic groups rather than by SNPs and GWAS. Susceptibility to some other diseases is carried by specific alleles shared by multiple ancestral haplotypes, e.g., ankylosing spondylitis and narcolepsy. The difference between these two types of association may explain the disappointment with many GWAS. Here we propose a pathway for combining the two different approaches. SNP typing is most useful after the conserved ancestral haplotypes have been defined by other methods.


Introduction
It has been very nearly 50 years since Terasaki, Brewerton, and their colleagues discovered the extraordinary association between ankylosing spondylitis (AS) and HLA B27 [1,2]. With a few caveats of great interest to clinicians, all patients with AS have this allele, justifying the idea that B27 is an essential requirement for the disease, effectively a sine qua non [3]. However, the allele is much more frequent than the disease, and is therefore not itself sufficient. Penetrance is low [4,5]. Other known requirements are male sex and adult age, indicating that the mechanisms of susceptibility and pathogenesis may be quite complex and difficult to unravel. This is still the case today, although the fundamental finding has been confirmed to exhaustion [6].
It has been nearly 40 years since it was established that other HLA associations are completely different [3,7]. For example, in Caucasoids, the more severe forms of systemic lupus erythematosus (SLE) and myasthenia gravis with thymic hyperplasia (MG) are associated with the 8.1 ancestral haplotype [8,9]. This sequence includes B8, but extends over more than a megabase from HLA A to HLA DP. The genetic factors responsible for susceptibility and severity must be numerous and widely spaced throughout the MHC, although most evidence implicates the central MHC, including the C2, Bf, C4, HSP, and TNF genes, together with their associated non-coding or regulatory sequences. In contrast to AS, female sex is important for these conditions [10].
By 1983, these two different types of association were well recognized [8]. Over subsequent decades, many more examples have been added. Narcolepsy is another example of the allelic form, in that the DQB1 0602 association is not restricted to one ethnic group [11,12]. In such cases, explanations invoking a direct role for a single allele are sought, and these may ultimately prove to be disarmingly simple, as for haemochromatosis [13,14] and some drug hypersensitivities [15].
Deficiencies of C4, C2, and 21-hydroxylase (21OH) are examples of associations with extensive ancestral haplotypes (8.1, 18.1, and 47.1, respectively) [16,17]. In each case, there is a plausible explanation, in that the particular sequence includes a missing or defective gene. The observations by Alper et al. [7,18,19] were important in leading to concepts of conserved population haplotypes, which have been faithfully inherited over thousands of generations and are best illustrated by 57.1, which is represented to various degrees in multiple ethnicities.
As different ancestral groups formed and migrated out of Africa and beyond, they carried conserved MHC sequences, which were fixed at each of the alpha, beta, gamma, and delta polymorphic frozen blocks (PFB) (see Figure 1 and Table 1). These sequences were shuffled laterally somewhat as populations mixed, and new combinations appeared, but the more polymorphic regions survive to this day [20–22]. In the case of these deficiencies, the defective or missing C4, C2, or 21OH genes remain within the frozen sequence.
Figure 1. The organization of the MHC provides a model for the genome. Each ancestral haplotype has its own map. Polymorphic frozen blocks are shaded. Not all genes are shown. PerB11 is now designated MIC. Frozen refers to the freezing of the sequence by inhibition of recombination and mutations, whether by (1) unequal crossing-over, (2) double recombination, (3) nucleotide replacement, (4) insertions and deletions, (5) duplication, or (6) other mechanisms. Source: [5], and adapted from [16].
It was only through extensive family studies that it became clear that recombination occurred outside polymorphic frozen blocks (Figure 2). Indeed, the frozen blocks must have been inherited faithfully over many generations, since identical haplotypes occur in subjects with extensive family trees showing no known relationship, and even in populations that were widely scattered geographically, implying only very remote common ancestry. The reality is that the frozen blocks occupy only a limited proportion of the whole MHC region of a megabase or more, and we have not been able to define the hard boundaries suggested in Figure 1 between frozen blocks and areas subject to recombination. This difficulty leads us to conclude that there are degrees of freezing as well as specific hotspots. Carrington [23] has contributed by identifying some of the regions which do recombine, but we suspect that such regions are dependent on the genomic environment rather than distance. For example, B8, which is Caucasoid-specific by any measure, occurs in non-Caucasoids with MG but hardly otherwise [24]. This observation implies that lateral transfer between haplotypes of very different ancestry may lead to thawing as a consequence of differences in the cis, trans, and epistatic interactions. Multiethnic mapping is a powerful way [25–27] to find which components carry susceptibility. In other words, ancestral haplotypes are preserved within ethnicities, but eventually fall apart in multiracial combinations.
Susceptibility to autoimmune diseases, such as SLE, MG, and insulin-dependent diabetes mellitus (IDDM), must be similarly frozen [8,28–32], although, to date, there is no mechanistic explanation for the susceptibility [20,32]. Largely for this reason, we have suggested a dominant role for non-coding regulatory sequences associated with duplicons, indels, and retroviral-like elements (Table 1). We also implicate epistatic, trans, and cis interactions, with their potential to increase the degree of functional polymorphism exponentially [5,16]. Epistatic refers to sequences on different chromosomes, which segregate independently. Trans refers to the sequence on the second (alternative) chromosome. Cis refers to the same chromosome.
Table 1. Lessons from MHC genomics.


• Human diversity is inherited from ancestors, rather than created by recent mutation.
• Diversity is regenerated at speciation and maintained by meiotic recombination between the ancestral haplotypes within polymorphic frozen blocks.
• The unit of inheritance is the ancestral haplotype.
• Such sequences carry specific alleles, duplicons, indels and retroviral-like elements (RLEs), which together regulate gene expression.
These lessons could not have been predicted from the prevailing concepts, which still underlie the thinking behind SNPs and GWAS (Table 2). This SNP-based thinking promotes an overemphasis on the role of ongoing mutation, compared to conservation of ancient polymorphism [33].
Table 2. Alternative dogma.
• Genetic diversity or inherited variation requires ongoing mutation.
• Diversity accumulates through errors in the copying of DNA.
• The unit of inheritance is the allele.
• At each locus alleles may be deleterious, beneficial, or neutral.
Clearly, any understanding of MHC genomics leading to disease must take account of the pragmatic observations implicating age- and sex-dependent penetrance. In Figure 3, we illustrate the potential importance of cis and trans interactions by proposing that ancestral haplotypes can be represented by meshing polymorphic cogs. The degree of meshing or interaction, whether cis or trans, is affected by size and density (expression), as well as by competition with similar cogs, including those representing paralogous sequences encoded on other chromosomes. Thus, MHC genotypically identical subjects may be affected to different degrees (Figure 3).
Figure 3. [...] vertically (in cis). The size and shape of cogs represent polymorphism or variant forms of the blocks. For example, the father has yellow-blue (a) and red-orange (b) haplotypes while the mother has green-cyan (c) and orange-brown (d) haplotypes. Haplotype (a) has been transmitted to the eldest daughter, eldest son, and youngest son. The middle two children have inherited haplotype (b). Reactive meshing is dependent on hormonal and other environmental influences. The oldest and youngest offspring are genotypically identical, but the interactions are different as a consequence of the sexual environment. Adapted with permission from [5].
These models have led to the concept of multifunctional polymorphic control of metabolic and other pathways. Cascades involving stepwise activation of related products, such as the complement system, are promising targets for further study. Table 3 illustrates how ancestral haplotypes were recognized initially. Today there are many more haplotypic markers, including noncoding sequences between the loci shown. As the number of haplotypic markers increased, including SNPs, it became more and more obvious that very few are haplo-specific. It follows that linkage disequilibrium (LD) cannot define such haplotypes. For example, 2 × 2 delta values cannot identify 18.1 and 18.2, because the alleles are shared by other haplotypes.
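The limitation of 2 × 2 delta values can be illustrated with a minimal sketch. It assumes the usual pairwise definition, delta = f(AB) − f(A) f(B), computed from phased haplotype counts. The allele names (`B_x`, `DR_n`, and so on), the three-locus layout, and all counts are hypothetical placeholders chosen for illustration; they are not real typing data and do not represent the actual content of 18.1, 18.2, or any other ancestral haplotype.

```python
# Hypothetical phased haplotypes: (B-locus allele, central marker, DR-locus allele).
# All names and counts are illustrative assumptions, not observed data.
HAP_COUNTS = {
    ("B_x", "C_p", "DR_m"): 20,   # one conserved haplotype carrying B_x
    ("B_x", "C_q", "DR_n"): 20,   # a second conserved haplotype sharing B_x
    ("B_y", "C_q", "DR_n"): 100,  # a different haplotype sharing C_q and DR_n
    ("B_z", "C_r", "DR_o"): 860,  # everything else, pooled
}


def delta(hap_counts, locus_a, allele_a, locus_b, allele_b):
    """Pairwise disequilibrium from a 2 x 2 table: f(AB) - f(A) * f(B)."""
    total = sum(hap_counts.values())
    f_a = sum(n for h, n in hap_counts.items() if h[locus_a] == allele_a) / total
    f_b = sum(n for h, n in hap_counts.items() if h[locus_b] == allele_b) / total
    f_ab = sum(
        n
        for h, n in hap_counts.items()
        if h[locus_a] == allele_a and h[locus_b] == allele_b
    ) / total
    return f_ab - f_a * f_b


def haplotypes_carrying(hap_counts, locus, allele):
    """List every full haplotype that carries the given allele."""
    return sorted(h for h in hap_counts if h[locus] == allele)


if __name__ == "__main__":
    # The same DR allele gives very different deltas depending on its partner,
    # because each 2 x 2 table pools all haplotypes carrying either allele.
    print("delta(B_x, DR_n):", delta(HAP_COUNTS, 0, "B_x", 2, "DR_n"))
    print("delta(B_y, DR_n):", delta(HAP_COUNTS, 0, "B_y", 2, "DR_n"))
    # B_x alone is not haplo-specific: it sits on two distinct haplotypes.
    print("carrying B_x:", haplotypes_carrying(HAP_COUNTS, 0, "B_x"))
```

In this toy population, each pairwise value summarizes two alleles pooled over every haplotype that carries them, so the two haplotypes sharing `B_x` are only separated when the full multi-locus sequence is examined, which is what segregation in families provides.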

Genome-Wide Association Studies and Single Nucleotide Polymorphisms
We believe that the above concepts and models of ancestral haplotypes suggest an alternative approach to the conduct and analysis of genome-wide association studies. When applied to the MHC, some results of commercial SNP typing have been disappointing, even to the point that a recent review by Kennedy et al. [36] essentially dismisses the many classic studies, and very unwisely blames "HLA typing errors, disregard of population structure and lack of replication". The same authors cite a paper which promotes a "focus on haplotypes . . . first suggested in 1987", thereby ignoring important previous contributions on haplotypes, including the original use of the term in 1967 by Ruggero Ceppellini [37,38], and a huge body of careful observations that have been confirmed repeatedly and rediscovered, without attribution, in the past few decades.
The International HapMap Project [39–41] is a potentially valuable resource of high-resolution SNP-typed individuals, and includes samples of cell lines used to define the genomic sequence of conserved, extended ancestral haplotypes, which segregate faithfully through families. Surprisingly, the proponents and users [42] of HapMap have ignored the opportunities for reconciliation with earlier studies, which addressed the shortcomings of LD analysis and focused on haplotypic sequences, including RLEs, indels, duplications, and single nucleotide polymorphism in the literal sense, as used by Gaudieri et al. [43] and Longman et al. [44] to map regions of extensive, interrupted sequence differences. The importance of PFB [16] or fixity [45] was also ignored until rediscovered [46]. A more balanced review by Petersdorf and O'hUigin [47] begins the daunting process of integrating population genetics, classic HLA associations, ancestral haplotypes, polymorphic frozen blocks, SNP typing, and gene expression. The authors hope for "the study of haplotype-associated phenotypic differences" and for "haplotype-matching" in transplantation. Indeed, there is already great encouragement for each of these ambitions. The functional differences conferred by ancestral haplotypes, such as 8.1, have been well known for more than 30 years, and include TNF and IgA concentrations, even though the latter is not encoded within the MHC [16,48,49]. The benefit of haplotype matching in renal and bone marrow transplantation was established decades ago [50,51].

Reconciling MHC Genomics and SNP Typing
While we recognize that the disconnect between classical MHC and later SNP genomics will decrease [47,52], we hope this happens quickly. To this end, we summarize the issues as follows:
1. There is no possibility of a single reference sequence. Rather, there are numerous ancestral haplotypes, each with its own very extensive and specific sequence [35,53–55].
2. These sequences are characteristic of ancestral populations or ethnicities. SNP typing on heterozygous mixed populations cannot reveal ancestral haplotypes or, at least, must be extremely inefficient [20].
3. MHC complexity is best managed by defining haplotypes through segregation in extensive family studies [56], not trios, since the power of segregation increases with the number of copies in different heterozygous combinations. Recombination can be demonstrated given sufficient generations to study.
4. Ancestral haplotypes include specific duplications, indels, etc. [54]. Fortunately, there are now many panels of homozygous cell lines and libraries of their sequences available [57,58]. These should be the references and should replace allele and SNP databases.
5. Penetrance is crucial but complex, and depends on age, sex, and a multitude of environmental factors that will vary from time to time and in different settings [53]. Cis and trans interactions are well-known contributors to susceptibility and severity [5]. These need to be included in the experimental design, but will be difficult to understand until the relevant pathways are defined. We recommend careful consideration of the concept of whole genome duplications resulting in paralogous sequences [16,59], which may compete with and modify the effect of any sequence implicated through SNP analysis.
6. Linkage disequilibrium (LD) is an incomplete reflection of conserved haplotypes and their relative frequencies.
7. There are at least two types of association with disease, as described above. No doubt, there will be further categories, especially as cis and trans interactions are defined. IDDM is an example of the need to address the mode of inheritance and multiple interactions, as described by Alper et al. [62,63]. Epistatic interactions may also be important.
8. The low positive predictive value of a genetic marker for a disease will remain so until the pathogenic pathways are understood. Fortunately, for clinical purposes, there are examples where the absence of an allele or sequence can be useful for the exclusion of a diagnosis. However, the presence of the same allele does not permit confirmation of the diagnosis; take, for example, HLA B27 [4]. Thus, those designing future studies should consider how the results will be of practical value and, at the same time, be useful in defining the biology.
9. Many regions of biological and statistical importance are not included in commercial SNP typing. In fact, duplications, indels, RLEs, and ambiguities may be very informative [16].
10. The MHC is a useful model for other genomic regions with polymorphic frozen blocks [64].
11. An understanding of synteny and paralogy is very valuable [65].
12. Many MHC associations, such as narcolepsy, 21-OH deficiency, and haemochromatosis, are not immunologically mediated. There is no justification for prejudice in interpreting results. Current limited understanding of pathological processes may lead to confusion. In fact, as illustrated by the value of informative clinic-to-laboratory studies, there is potential to elucidate these processes.
13. A promising approach includes understanding the processes and genetics responsible for autoimmune diseases induced by immune checkpoint inhibitors, vaccines, and drugs, such as D-Penicillamine. The value of genomics increases when the inducing agent is known [15].
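The point about low positive predictive value, and the contrasting usefulness of a negative result, can be made concrete with a short worked sketch based on Bayes' rule. The function name `predictive_values` and all three input figures (disease prevalence, proportion of patients carrying the marker, and marker frequency in the population) are illustrative assumptions chosen as round numbers; they are not estimates taken from the studies cited here.

```python
def predictive_values(prevalence, sensitivity, marker_frequency):
    """Return (PPV, NPV) for a genetic marker using Bayes' rule.

    prevalence       -- P(disease) in the population
    sensitivity      -- P(marker | disease)
    marker_frequency -- P(marker) in the whole population
    """
    p_marker_and_disease = sensitivity * prevalence
    if not 0 < marker_frequency < 1 or p_marker_and_disease > marker_frequency:
        raise ValueError("inputs are not mutually consistent probabilities")
    # PPV: probability of disease given that the marker is present.
    ppv = p_marker_and_disease / marker_frequency
    # NPV: probability of no disease given that the marker is absent.
    p_no_marker_and_disease = (1 - sensitivity) * prevalence
    npv = 1 - p_no_marker_and_disease / (1 - marker_frequency)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical round numbers: a marker carried by 8% of the population
    # and by 95% of patients, for a disease affecting 0.5% of the population.
    ppv, npv = predictive_values(
        prevalence=0.005, sensitivity=0.95, marker_frequency=0.08
    )
    print(f"PPV = {ppv:.3f}, NPV = {npv:.5f}")
```

Under these assumed inputs, only about 6% of marker-positive subjects have the disease, whereas fewer than 1 in 1000 marker-negative subjects do. This is the asymmetry described above: absence of the allele helps to exclude a diagnosis, but its presence cannot confirm one.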

Conclusions
One lesson might be that there are incompatible concepts and terminology, as well as a very patchy understanding of the incontrovertible facts of MHC associations and structure. Another might be that research is constrained by the limitations of commercial platforms.
We hope our commentary is helpful in providing background for new discoveries. Hence the subtitle: "Looking Back to Go Forward". Surely, it is important not to dismiss the history as "HLA typing errors". In fact, the International HLA workshops provide a useful resource for haplotype association with disease, as well as for matching donors and hosts for transplantation.
SNP typing is most useful after the conserved ancestral haplotypes have been defined by other methods.
Funding: This research received no external funding.