Several Regions in the Major Histocompatibility Complex Confer Risk for Anti-CCP-Antibody Positive Rheumatoid Arthritis, Independent of the DRB1 Locus

Recent evidence suggests that additional risk loci for RA are present in the major histocompatibility complex (MHC), independent of the class II HLA-DRB1 locus. We have now tested a total of 1,769 SNPs across 7.5 Mb of the MHC, located from 6p22.2 (26.03 Mb) to 6p21.32 (33.59 Mb), derived from the Illumina 550K Beadchip (Illumina, San Diego, CA, USA). In an initial analysis of the whole dataset (869 CCP+ RA cases, 1,193 controls), the strongest association signal was observed in markers near the HLA-DRB1 locus, with additional evidence for association extending out into the Class I HLA region. To avoid confounding that may arise due to linkage disequilibrium with DRB1 alleles, we analyzed a subset of the data by matching cases and controls by DRB1 genotype (both alleles matched 1:1), yielding a set of 372 cases with 372 controls. This analysis revealed the presence of at least two regions of association with RA in the Class I region, independent of DRB1 genotype. SNP alleles found on the conserved A1-B8-DR3 (8.1) haplotype show the strongest evidence of positive association (P ~ 0.00005), clustered in the region around the HLA-C locus. In addition, we identified risk alleles that are not present on the 8.1 haplotype, with maximal association signals (P ~ 0.001-0.0027) located near the ZNF311 locus. This latter association is enriched in DRB1*0404 individuals. Finally, several additional association signals were found in the extreme centromeric portion of the MHC, in regions containing the DOB1, TAP2, DPB1, and COL11A2 genes. These data emphasize that further analysis of the MHC is likely to reveal genetic risk factors for rheumatoid arthritis that are independent of the DRB1 shared epitope alleles.

MOL MED 14(5-6)293-300, MAY-JUNE 2008 | LEE ET AL.

Since the association of HLA with RA was first demonstrated in 1976 (10), the vast majority of case-control association studies have focused on the HLA-DRB1 locus, which encodes a group of risk alleles collectively called the "shared epitope" (SE) alleles (11,12). These alleles share a common sequence element containing Q/K-R-R-A-A at positions 70-74 of the DRB1 chain, with some minor variation from this canonical sequence in some risk alleles. Despite the appealing simplicity of the shared epitope as an explanation for disease association, it is quite apparent that there is a complex hierarchy of risk for the various DRB1 alleles that contain the shared epitope (13). In addition, certain genotypic combinations, such as DRB1*0401/0404, carry exceedingly high risk that cannot be explained simply by the number of shared epitope alleles that are present (12). This suggests that there may be haplotypic effects that modify the risk of particular shared epitope alleles.
In addition, although the DRB1 locus is clearly of predominant importance, several reports over the years have suggested the presence of additional risk loci within the MHC (14)(15)(16)(17). The arguments for these additional loci are often confounded by the complex patterns of linkage disequilibrium that are observed in this genetic region. By carefully matching cases and controls by DRB1 genotype, we now provide additional evidence for several new risk loci for RA located in the Class I region of the MHC, as well as in the region centromeric to the DRB1 locus.

Study Populations
RA cases and controls in the current analysis are taken largely from populations utilized for our previous whole genome association study using the Illumina 550K Beadchip (Illumina) (4). Briefly, RA cases were selected from four North American RA patient collections. The North American Rheumatoid Arthritis Consortium (NARAC) samples are from multiplex families (primarily affected sibling pairs); at least one sibling was required to have documented erosions on hand radiographs, with at least one sibling having disease onset between 18 and 60 years of age (18). The other collections include samples from the Wichita Rheumatic Disease Data Bank (WRDDB) (19), mean disease duration ten years; the National Inception Cohort of Rheumatoid Arthritis Patients (NICRAP) (20), enrolled within six months of clinical diagnosis; and the Study of New Onset Rheumatoid Arthritis (SONORA) (21), enrolled within 3-12 months of clinical diagnosis. All cases were anti-cyclic citrullinated peptide antibody positive (CCP+) with reported European-American ancestry. The controls were taken from 1,732 individuals who are part of the New York Cancer Project (NYCP) (22) and on whom HLA-DRB1 data were available. All subjects reported European-American ancestry. For the matched case-control studies, an additional set of 46 controls from the UK carrying the DRB1*0401/0404 genotype was included in the analysis. Informed consent was obtained for all samples using protocols approved by the local institutional review boards.

MHC Genotyping
Genotype data were obtained from the Illumina HumanHap550 genotyping array (Illumina) and included 2,094 SNPs in a 7.56 Mb region from 6p22.2 (26.03 Mb) to 6p21.32 (33.59 Mb) encompassing the entire MHC. Genotyping was performed at the Feinstein Institute for Medical Research according to the Illumina Infinium II assay manual (Illumina), as previously described (4). After stringent filtering, with removal of SNPs with > 2% missing genotypes, a total of 1,769 SNPs were available for analysis.
All participants were HLA-DRB1 typed using the SSOP low-resolution method (23); individuals with DRB1*04 and DRB1*01 were subsequently tested using a medium-resolution panel to allow for four-digit DRB1 subtyping. A proportion of the subjects had four-digit typing for all DRB1 alleles (see Supplementary Materials). The 46 UK controls (DRB1*0401/0404 heterozygotes) were typed by the PCR-SSO technique using RELI™ kits (DYNAL).
For the initial set of case samples from the whole genome association study (total n = 908, including 464 NARAC, 168 WRDDB, 162 NICRAP, and 114 SONORA), we first carried out genome-wide quality control filtering as follows: individuals with > 5% missing genotypes were excluded, as were SNPs with > 5% missing data, control HWE P-values < 0.00001, or MAF < 0.01. We removed samples with excess sharing across all pairs of individuals, as this pattern is consistent with DNA sample contamination. We also removed genetic outliers as determined by either 235 major ancestral informative SNPs or 1,411 European ancestral informative SNPs (24), using the first or second principal components (see Supplementary Materials). After filtering, 869 case samples and 1,193 controls remained for analysis. Of these, 700 cases and 1,172 controls had DRB1 oligotyping data available. Of these 700 cases, 74 overlap with NARAC cases that have been previously utilized for fine mapping studies in the MHC (16).
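The SNP-level thresholds listed above can be illustrated with a minimal Python sketch. It assumes genotypes are coded as 0, 1, or 2 copies of one allele, with `None` for a missing call, and it uses a 1-df chi-square test for Hardy-Weinberg equilibrium; the text does not state which HWE test was used, so that choice and all function names are assumptions made for illustration.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing calls; None marks a missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1, or 2 copies of one allele."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def hwe_chi_square_p(genotypes):
    """P-value of a 1-df chi-square HWE test (assumed test, not named in the text)."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)
    q = 1 - p
    expected = {0: n * q * q, 1: 2 * n * p * q, 2: n * p * p}
    chi2 = sum((called.count(k) - e) ** 2 / e for k, e in expected.items() if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def snp_passes_qc(all_genotypes, control_genotypes,
                  max_missing=0.05, min_hwe_p=1e-5, min_maf=0.01):
    """Apply the three SNP filters: missingness, HWE in controls, and MAF."""
    return (1 - call_rate(all_genotypes) <= max_missing
            and hwe_chi_square_p(control_genotypes) >= min_hwe_p
            and minor_allele_frequency(all_genotypes) >= min_maf)
```

The same `call_rate` helper can be applied per individual (across that person's SNPs) for the > 5% sample-level missingness filter.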

Statistical Analysis
We first conducted a case-control association study for single marker analysis using HelixTree 5.0.2 software (Golden Helix Inc, Bozeman, MT, USA) and the R program, using all samples (n = 869 cases, 1,193 controls) and MHC SNPs (n = 1,769) that passed quality control filters. We calculated Pearson's chi-square statistic based on allele counts.
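As a minimal illustration of the single-marker test, the following Python sketch computes Pearson's chi-square on a 2-by-2 allele count table. It assumes no continuity correction is applied (the text does not say either way), and the function names are hypothetical rather than part of HelixTree or R.

```python
import math

def allele_counts(n_aa, n_ab, n_bb):
    """Convert genotype counts (AA, AB, BB) into allele counts (A, B)."""
    return 2 * n_aa + n_ab, n_ab + 2 * n_bb

def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 df, uncorrected) for a 2x2 allele count table.

    Each argument is a pair (count of allele A, count of allele B).
    Returns (chi-square statistic, P-value).
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up counts, for illustration only.
chi2, p_value = allelic_chi_square(allele_counts(5, 20, 25), allele_counts(2, 16, 32))
```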
We further characterized association signals according to whether SNP alleles are found on the conserved ancestral A1-B8-DR3 (8.1) haplotype (25). Specifically, we distinguished association signals that were due to a higher frequency of the 8.1-related allele in the case group from signals characterized by a higher frequency of the 8.1 allele in the control group. To determine whether particular alleles are present on the A1-B8-DR3 (8.1) ancestral haplotype, we obtained the sequence of the 8.1 haplotype from complete genomic sequences of 8.1 homozygotes (COX cell line) (http://www.sanger.ac.uk/HGP/Chr6/MHC). As an independent check, we used 20 samples (one case and 19 controls) from DRB1*0301 homozygotes in the present study population to identify the alleles of the 8.1 haplotype. By definition, these samples carry two copies of the 8.1 haplotype at the DRB1 locus, and due to the long-range nature of the 8.1 haplotype, they also carry two copies of the 8.1 allele at nearby SNPs until the haplotype is broken by recombination. Because recombination at any given position occurred in only a fraction of these samples, the majority of the 20 samples had extended regions of SNP homozygosity reflecting the 8.1 haplotype. This homozygosity can be used to infer the 8.1 haplotype-associated allele for each SNP.
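The homozygosity-based inference can be sketched as follows. This is only an illustration: the majority rule, the `min_fraction` cutoff, and the two-letter genotype encoding are assumptions, since the text does not give the exact criterion that was applied.

```python
from collections import Counter

def infer_haplotype_allele(genotypes, min_fraction=0.5):
    """Infer the haplotype-associated allele at one SNP.

    genotypes: two-letter strings such as "AA" or "AG", one per
    DRB1*0301 homozygous sample. Returns the allele for which more than
    min_fraction of all samples are homozygous, or None if there is no
    such allele.
    """
    homozygous = Counter(g[0] for g in genotypes if len(g) == 2 and g[0] == g[1])
    if not homozygous:
        return None
    allele, count = homozygous.most_common(1)[0]
    return allele if count / len(genotypes) > min_fraction else None

# 17 of 20 samples are homozygous for A, so A is taken as the 8.1 allele.
inferred = infer_haplotype_allele(["AA"] * 17 + ["AG"] * 3)
```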
To remove the effect of DRB1, the predominant risk locus for RA, we constructed a matched dataset with equal numbers of case and control samples with exactly the same DRB1 genotype. For matching of cases and controls at the DRB1 locus, we had available a mix of two-digit (low resolution) typing for the majority of DRB1 alleles and four-digit (medium resolution) oligotyping for DRB1*04 alleles and a large fraction of DRB1*01-related alleles. The vast majority of DRB1*01 alleles with four-digit genotyping were DRB1*0101. Thus, matching on DRB1*04 and DRB1*01 subtypes was quite precise, while matching on other DRB1 genotypes was often based on membership in a common DRB1 allelic group. However, four-digit typing also was available for some of these other allelic groups. The complete list of the DRB1 genotypes found in the matched pairs is given in Table 1 of the Supplementary Material. Because of the large number of DRB1*0401/DRB1*0404 heterozygotes in cases, we included additional controls with this genotype from the National Blood Service, UK.
In addition to matching of cases and controls by DRB1 genotype, we also carried out principal components analysis (EIGENSTRAT) (26) to assure matching of cases and controls for ancestry, thereby minimizing population stratification unrelated to disease status. Because population stratification due to major ancestry differences was addressed by the removal of outliers during the quality control stage, the matching of case and control ancestry was mainly at the European subpopulation level, using 1,441 non-MHC markers selected to capture European population substructure (24). Therefore, case and control matched pairs were chosen not only by having the same DRB1 genotype, but also by virtue of proximity to each other in the PC1-PC2 plane (distance < 0.05) using [...]

For exploratory subset analyses, we characterized DRB1 alleles into three groups according to SE status and DRB1 family group. The SE(a) group contained 0401, 0404, 0405, and 0408; the SE(b) group contained 0101, 0102, 0901, and 1001; and the SE(-) group contained all other DRB1 alleles. For some analyses we selected particular genotypic subgroups, including 0401/not 0404 or 0404/not 0401.
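The 1:1 matching on DRB1 genotype and PC1-PC2 proximity can be illustrated with the sketch below. The pairing algorithm is not recoverable from the text, so a greedy closest-pair-first strategy is assumed here purely for illustration; the record layout (`id`, `drb1`, `pc1`, `pc2`) is likewise hypothetical.

```python
import math
from collections import defaultdict

def match_pairs(cases, controls, max_distance=0.05):
    """Greedy 1:1 matching of cases to controls.

    Each subject is a dict with 'id', 'drb1' (pair of allele strings),
    'pc1', and 'pc2'. A pair is eligible only if both DRB1 alleles agree
    and the Euclidean distance in the PC1-PC2 plane is below max_distance.
    """
    by_genotype = defaultdict(list)
    for j, ctrl in enumerate(controls):
        by_genotype[tuple(sorted(ctrl["drb1"]))].append(j)

    candidates = []
    for i, case in enumerate(cases):
        for j in by_genotype.get(tuple(sorted(case["drb1"])), []):
            ctrl = controls[j]
            dist = math.hypot(case["pc1"] - ctrl["pc1"], case["pc2"] - ctrl["pc2"])
            if dist < max_distance:
                candidates.append((dist, i, j))

    # Accept the closest eligible pairs first; each subject is used once.
    candidates.sort()
    used_cases, used_controls, pairs = set(), set(), []
    for dist, i, j in candidates:
        if i not in used_cases and j not in used_controls:
            used_cases.add(i)
            used_controls.add(j)
            pairs.append((cases[i]["id"], controls[j]["id"]))
    return pairs
```

Genotypes are compared as unordered pairs of allele labels exactly as recorded, so a two-digit label only matches the same two-digit label.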

Case-Control Association Study of SNPs Across the MHC Region in the Entire Dataset
To assess the patterns of association across the MHC, we analyzed our panel of MHC SNP markers extending from 6p22.2 (26.03 Mb) to 6p21.32 (33.59 Mb). These data were derived from a previous whole genome study using the Illumina 550K Beadchip (Illumina) (4). After quality filtering (see Methods), 1,769 SNPs were available for analysis on 869 CCP+ case samples and 1,193 controls.
As expected, the strongest association signal was observed in markers near the HLA-DRB1 locus (see Figure 1, which shows the single-SNP chi-square test statistic from the 2-by-2 allele count table). Nevertheless, several regions across the MHC, especially within the central MHC and Class I regions, also showed significant association with RA (chi-square values between 50 and 100). Additional association signals of this magnitude were also observed in the region just centromeric to the DRB1 locus. It is, of course, unclear from this analysis whether these additional signals simply reflect linkage disequilibrium with the known DRB1 risk alleles, or whether these are independent effects.

Analysis of Case Control Pairs Matched 1:1 by DRB1 Genotype
To control for the effects of LD due to the strong association signals at the DRB1 locus, we chose to select an equal number of cases and controls matched specifically by DRB1 genotype, thereby eliminating differences between cases and controls at this locus. This approach is more precise than matching cases and controls by the simple presence of risk alleles and non risk alleles, because it is well established that there is considerable heterogeneity in the strength and evidence of association at the DRB1 locus in these two categories (12).
We were able to select 372 cases and 372 controls that were matched at the DRB1 locus for both alleles. As described in the Methods section, we also took care to match each pair of cases and controls for ancestry, using a panel of ancestry informative markers (24) and selecting pairs to minimize the distance between pairs using a principal components approach. Obviously, this dataset is considerably smaller than the sum of available samples. For example, the various SE(-)/SE(-) combinations of DRB1 alleles are in vast excess in controls, compared with the small number of cases available for matching on these genotypes. Conversely, some risk genotypes (such as 0401/0401 or 0401/0404) are rather uncommon in control subjects, and thus not all cases with this risk genotype could be utilized for matching. However, at this stage of the analysis, we preferred to remove any potential cause of a false positive signal rather than gain statistical power. To partially address the lack of controls with these high risk genotypes, we specifically obtained a group of 46 DRB1*0401/0404 controls to increase our sample size.
Our approach to the analysis was also guided by our previous studies that implicated the A1-B8-DR3 (8.1) haplotype in contributing to MHC risk that is independent of the DRB1 locus (16). Therefore, we categorized all 1,769 SNPs as to whether the allele that is found on the 8.1 haplotype is positively or negatively [...]

Table 1. List of markers with χ² > 6.635 in 372 RA cases and 372 controls matched by DRB1 genotype. Odds ratios are shown assuming the risk allele is on the ancestral 8.1 haplotype (see Figure 2).
Columns: Marker; Position; [...]

[...] are located in the region around HLA-C, with maximal evidence for association within the HLA-C locus. This general region was implicated in our previous studies using microsatellite typing across the MHC (16), and the present result therefore confirms that haplotypic elements related to the 8.1 extended haplotype confer risk for RA, independent of the DRB1 locus. Specifically, this signal cannot be due to LD with DR3 because cases and controls are matched 1:1 for this allele, as well as for all other major DRB1 alleles. Interestingly, we also observed significant association signals in the region centromeric to the DRB1 locus, in regions that include the DOB, TAP2, DPB1, and COL11A2 genes (see Figure 2 and Table 1). These signals are mainly "positive," i.e., the 8.1 allele is enriched in the RA group.
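For orientation, the χ² > 6.635 cutoff used for Table 1 is the 1-df chi-square critical value for P = 0.01. The short Python sketch below checks this and shows how an allelic odds ratio oriented toward the 8.1 allele could be computed; the function names and the counts in the example are illustrative assumptions, not values from the study.

```python
import math

def p_value_1df(chi2):
    """Upper-tail P-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def odds_ratio_for_allele(case_allele, case_other, control_allele, control_other):
    """Allelic odds ratio for the chosen (here, 8.1 haplotype) allele.

    Values above 1 mean the allele is enriched in cases ("positive"
    signals); values below 1 mean it is enriched in controls.
    Assumes all four counts are non-zero.
    """
    return (case_allele * control_other) / (case_other * control_allele)

threshold_p = p_value_1df(6.635)                      # approximately 0.01
example_or = odds_ratio_for_allele(30, 70, 20, 80)    # made-up counts
```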
Our previous microsatellite studies of the MHC had suggested that a more telomeric portion of the Class I region also might be involved in RA susceptibility (16). In this case, the relevant microsatellite alleles were not related to the ancestral 8.1 haplotype, but rather could be found on certain DRB1*0404 haplotypes that were transmitted to affected subjects. It was, therefore, of interest that in the current study, evidence of additional association was observed in a region between 29.0 and 29.4 Mb, with maximal chi-square values over 8.9 for markers rs6923005 and rs6930903 (see Figure 2 and Table 1). When we examined associations in this region in the subgroup of paired samples containing any SE+ DRB1*04 allele (designated SE(a) alleles for our purposes), there was increased evidence for association in this region, as shown in Table 2. The vast majority of SE(a) alleles are either DRB1*0401 or DRB1*0404 in our dataset. To determine which particular DRB1*04 allele, 0404 or 0401, is connected to this signal, we examined these associations in individuals who carried 0404 in the absence of 0401, and vice versa. Strikingly, even without correcting for differences in sample size, the associations in this region are overwhelmingly found in subjects carrying a DRB1*0404 allele in the absence of DRB1*0401, and not in subjects carrying DRB1*0401 in the absence of DRB1*0404 (Table 2).
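The genotype subgroups compared here can be expressed as simple set tests. The sketch below assumes each DRB1 genotype is stored as a pair of four-digit allele strings without the "DRB1*" prefix; the function names are hypothetical, and the SE(a) allele list is the one given in the Methods.

```python
# SE(a) alleles as listed in the Methods section.
SE_A_ALLELES = {"0401", "0404", "0405", "0408"}

def carries_without(genotype, allele, excluded):
    """True if the genotype carries `allele` and does not carry `excluded`."""
    return allele in genotype and excluded not in genotype

def subgroups(genotype):
    """Return the exploratory subgroups that a DRB1 genotype belongs to."""
    labels = []
    if SE_A_ALLELES & set(genotype):
        labels.append("SE(a)")
    if carries_without(genotype, "0404", "0401"):
        labels.append("0404/not 0401")
    if carries_without(genotype, "0401", "0404"):
        labels.append("0401/not 0404")
    return labels
```

A 0401/0404 heterozygote therefore falls in the SE(a) subgroup but in neither of the two allele-specific subgroups.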
For exploratory purposes, we also carried out a subgroup analysis to determine if the association signals in the Class I or centromeric Class II region are present (or absent) within any particular subsets of DRB1 genotypes, perhaps suggesting interactions with the DRB1 locus (other than LD). For this purpose we categorized genotypes into five groups, based on the designations of alleles as belonging to the SE(a), SE(b), or SE(-) groups, as described in the Methods section. The rationale for these divisions relates to the generally higher odds ratios seen in the literature with SE(a) alleles compared with SE(b) alleles (12). Admittedly, these categories of risk alleles are imprecise, and were employed here only for exploratory purposes. We [...] Table 2 of Supplementary Materials). In addition to the differences discussed above that reflect the contribution of 0404 subjects to the signals in Class I, there were a variety of differences among the subgroups in terms of patterns of association outside DRB1. However, these findings need to be validated in a much larger sample set, and we wish to emphasize that none of our results are corrected for multiple testing.

DISCUSSION
In this report we have provided confirmatory evidence for the existence of additional risk genes for rheumatoid arthritis in the Class I region of the MHC that are independent of the well-known risk alleles at the DRB1 locus. The data are confirmatory in the sense that our previous study using 54 microsatellite markers across the MHC (16) suggested that at least two distinct regions in the Class I region are independently associated with RA, and these regions overlap with the regions implicated in the current study. Therefore, although we have utilized over 1,700 SNP markers in the present analysis, we have not carried out strict correction for multiple testing, because there is substantial prior evidence for association in these two Class I regions. Our new data also provide evidence for additional risk alleles in the extreme centromeric portion of the Class II region, although of course these results require replication. One should also note that the previous study (16) compared alleles transmitted to the RA patient with alleles that were not transmitted; thus all genetic information was obtained from the pedigrees of the case samples. In the current case-control study, separate control samples are used as the baseline for allele frequencies. The two studies can therefore be considered independent, with the exception of the 74 NARAC cases that are included in both studies (see Methods). As a further check of our current findings, we have also run the analysis after excluding these 74 case-control pairs, with results that are highly similar to those shown in Figures 1 and 2, with significant (χ² > 6.635) associations in both Class I regions, as well as in the region centromeric to DRB1 in the matched analysis (data not shown).
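For reference only, the scale of a strict correction can be illustrated with a Bonferroni threshold, which is one common choice and is named here as an assumption; the text does not specify a correction method. Because SNPs in the MHC are in strong linkage disequilibrium, such a threshold would be conservative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# 1,769 MHC SNPs tested at a family-wise alpha of 0.05 (alpha is an assumed value).
threshold = bonferroni_threshold(0.05, 1769)
```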
We have chosen an analytic approach that involves matching of cases and controls 1:1, based on DRB1 genotype. [...] sets available to us. In this way, we have completely removed the association signal at DRB1. Any residual minor mismatching at DRB1 is due to having only two-digit rather than four-digit typing on a proportion of SE-negative alleles. To reduce error, we also confined ourselves to the evaluation of SNPs that had very high call rates (> 98%) in this region for our entire dataset. In addition, we employed a set of ancestry informative markers to reduce population stratification. Thus, we view these data as providing very strong support for the hypothesis that several regions in the MHC provide risk for RA, independent of the DRB1 locus.
Our analysis was also particularly influenced by previous data suggesting that a fragment of the conserved ancestral 8.1 haplotype carries risk alleles for RA that are independent of DRB1. This hypothesis is derived from studies in our lab (16) as well as others (15,28). As discussed below, the ancestral 8.1 haplotype has been implicated in a number of different autoimmune diseases.
For the association signal in the region around HLA-C (see Figure 2), the risk alleles are found on a common ancestral haplotype carrying the classical HLA alleles A*0101-B*0801-C*0701-DRB1*0301-DQB1*0201-DQA1*0501-DPA1*0101-DPB1*0301, commonly known as the "A1-B8-DR3" or "8.1" ancestral haplotype. The 8.1 haplotype is of considerable medical and biological interest (25). A wide array of autoimmune phenotypes has been associated with this haplotype, including systemic lupus erythematosus (29), autoimmune hepatitis, and myasthenia gravis (25). Interestingly, recent data suggest that DRB1 is not the primary locus of risk for myasthenia gravis (30). Truncated 8.1 haplotypes that lack the DR3 allele are associated with higher levels of acetylcholine receptor autoantibodies in subjects with myasthenia gravis, and the primary association with disease appears to lie telomeric to the Class II region. In addition to associations with autoimmunity, the 8.1 ancestral haplotype has also been implicated in risk for IgA deficiency and common variable immunodeficiency (25). A low antibody response after hepatitis B immunization has also been associated with this haplotype (31), and some reports in the literature suggest that the 8.1 haplotype is associated with altered cytokine production and reduced Fc receptor function (32)(33)(34). Thus, there appears to be something immunologically distinctive about this extended haplotype.
The major association signals in RA that are related to the 8.1 haplotype encompass a number of potential candidate genes in the Class I region, and the current data do not permit us to choose between them with any degree of confidence (see Table 1). Many significant signals are present within the HLA-C locus, and the maximal association is within the HLA-C locus itself. A substantial literature on the complex and complementary relationship between HLA-C locus polymorphisms and the extensive molecular diversity at the KIR locus is now emerging (35). Interestingly, one report has suggested that certain combinatorial relationships at these loci may predispose to rheumatoid vasculitis (36), although others have not observed such a relationship for RA in general (37). Nevertheless, complex interactions have been described for psoriatic arthritis (38), outcome of HIV infection (39), nasopharyngeal carcinoma (40), and the maternal-fetal relationships underlying preeclampsia (41). The HLA-C allele found on the 8.1 haplotype is HLA-C*0701, a member of the "C-group 1" alleles that are characterized by an asparagine residue at position 80 (35). The C-group 1 alleles are ligands for inhibitory KIR2DL2/3 and activating KIR2DS2 receptors. A full characterization of HLA-C locus diversity in RA is now clearly mandated by the current findings, in the context of the most up-to-date approaches to the analysis of the KIR locus (35).
We (16) and others (17) have proposed previously that the MHC also contains genetic risk elements outside of DRB1 that are unrelated to the extended 8.1 haplotype. In particular, we were interested in extending our prior evidence for a signal in the Class I region found in subjects carrying DRB1*0404 haplotypes. As shown in Table 1 and Figure 2, a group of SNPs extending from 29.0-29.4 Mb are significantly associated with RA. To explore whether there is an interaction between this association and DRB1*0404, we compared the strength of this association in subjects carrying various DR4 shared epitope alleles (designated the "SE(a)" group of alleles in this discussion; see Methods) with individuals carrying the two major DR4 risk alleles, DRB1*0401 and DRB1*0404. Strikingly, the evidence for association in the region is considerably higher in the 0404 group, despite a much lower sample size in this group (see Table 2).
Our previous study suggested that some association signals in Class I might be due to alleles that are present in a subset of *0404 haplotypes. The current study does not allow us to address this issue directly. However, regardless of the haplotypes involved, the region extending from 29.0-29.4 megabases contains risk alleles that appear to interact with the DRB1*0404 allele. In this case, the term "interact" implies that these two loci are involved in the same mechanistic pathway. It will require a much expanded sample size of cases and controls carrying DRB1*0404 to understand this finding fully.
Finally, we have developed evidence of an additional independent risk effect in the region centromeric to the DRB1 locus. Areas of maximal association include the DOB, TAP2, DPB1, and COL11A2 genes. These findings are of interest because similar findings are emerging from a conditional analysis of rheumatoid arthritis in the Swedish population (Ding et al., manuscript submitted).
In summary, we have shown the presence of multiple independent risk regions for RA susceptibility within the MHC. The data strongly suggest the need for comprehensive HLA typing of large populations of RA patients, additional SNP mapping, and resequencing of candidate genes in the region. HLA typing should include full Class I typing so that potential interactions with KIR loci can be pursued. It is striking that after more than 30 years of investigation into the MHC relationships with rheumatoid arthritis, there is still much to be learned.

ACKNOWLEDGMENTS
Support was provided by the National Institutes of Health, RO1-AR44222 and U19 IMAGEN. The research also was supported in part by the intramural program of the National Institute of Arthritis and Musculoskeletal and Skin Diseases, as well as the Eileen Ludwig Greenland Center for Rheumatoid Arthritis at the Feinstein Institute. LAC is supported by R01 AI065841.