Genome-Wide Association Study of Determinants of Anti-Cyclic Citrullinated Peptide Antibody Titer in Adults with Rheumatoid Arthritis

INTRODUCTION Rheumatoid arthritis (RA) is one of the most common autoimmune diseases, affecting 0.5% to 1% of most populations (1). This systemic disease is marked by chronic inflammation that affects the synovial membrane of diarthrodal joints predominantly. RA is heterogeneous in terms of clinical presentation in disease aspects, progression, and severity, and in response to different clinical treatments. Its susceptibility is determined by a combination of multiple genetic and environmental factors. The most important genetic factor, which has been associated unequivocally with RA susceptibility, is variability in the major histocompatibility complex (MHC). Although the HLA associations with RA are complex (2,3), the majority of the genetic signal from the MHC is explained by alleles at the HLA-DRB1 locus (4), and accounts for approximately 30% of the genetic risk of RA (5). In individuals of European ancestry, the associated HLADRB1 alleles share a region of sequence similarity or “shared epitope” (SE) at amino acid positions 70–74 in the third hypervariable region of the HLA-DRB1 molecule (HLA-SE) (5). Anti-cyclic citrullinated peptide (antiCCP) antibody is an important biomarker for RA. It is as sensitive as, but more specific than, rheumatoid factor for diagnosing RA (6). Multiple studies have shown that the presence of anti-CCP antibodies in RA correlates with radiographic progression (6–11). Known genetic risk factors for RA are associated more strongly with CCP positive (CCP+) RA than with CCP negative (CCP–) RA (12–14); the most recent study by Ding et al. (14) definitively showed that the HLA region is associated specifically with CCP+ subset of RA. Previous studies of genetic determinants of anti-CCP have Genome-Wide Association Study of Determinants of Anti-Cyclic Citrullinated Peptide Antibody Titer in Adults with Rheumatoid Arthritis


INTRODUCTION
Rheumatoid arthritis (RA) is one of the most common autoimmune diseases, affecting 0.5% to 1% of most populations (1). This systemic disease is marked by chronic inflammation that affects the synovial membrane of diarthrodal joints predominantly. RA is heterogeneous in terms of clinical presentation in disease aspects, progression, and severity, and in response to different clinical treatments. Its susceptibility is determined by a combination of multiple genetic and environmental factors. The most important genetic factor, which has been associated unequivocally with RA susceptibility, is variability in the major histocompatibility complex (MHC). Although the HLA associations with RA are complex (2,3), the majority of the genetic signal from the MHC is explained by alleles at the HLA-DRB1 locus (4), and accounts for approximately 30% of the genetic risk of RA (5). In individuals of European ancestry, the associated HLA-DRB1 alleles share a region of sequence similarity or "shared epitope" (SE) at amino acid positions 70-74 in the third hypervariable region of the HLA-DRB1 molecule (HLA-SE) (5).
Anti-cyclic citrullinated peptide (anti-CCP) antibody is an important biomarker for RA. It is as sensitive as, but more specific than, rheumatoid factor for diagnosing RA (6). Multiple studies have shown that the presence of anti-CCP antibodies in RA correlates with radiographic progression (6)(7)(8)(9)(10)(11). Known genetic risk factors for RA are associated more strongly with CCP positive (CCP+) RA than with CCP negative (CCP-) RA (12)(13)(14); the most recent study by Ding et al. (14) definitively showed that the HLA region is associated specifically with CCP+ subset of RA. Previous studies of genetic determinants of anti-CCP have been limited to the HLA region (7,10,11,(15)(16)(17)(18). Almost all of the studies published on HLA-SE alleles and anti-CCP have the consistent finding that HLA-SE alleles are associated with CCP positivity, while some of these found that the HLA-SE is specifically associated with higher anti-CCP titer. Irigoyen et al. reported that HLA-DR3 is associated with lower anti-CCP titer, in contrast with HLA-SE alleles (16), using the NARAC collection.
Anti-CCP is relatively stable over time, making it a good quantitative trait to investigate. In this study, we treat anti-CCP antibody titer among RA subjects as a quantitative outcome and proxy for progressive RA in a genome-wide association study (GWAS) to elucidate the genetic basis of anti-CCP titer. Analyses were performed both with and without adjusting for HLA-SE status and HLA-DR3 to identify the additional contribution of genes other than HLA-DRB1 to variability in anti-CCP titer and severity of RA.

Study Population
RA patients were recruited at the Robert Breck Brigham Arthritis Center at Brigham and Women's Hospital as described previously (19). In brief, patients were eligible to join the study if they had a diagnosis of RA, were over 18 years old, and did not have a diagnosis of psoriatic arthritis or systemic lupus erythematosus. All study protocols were approved by the Brigham and Women's Hospital Institutional Review Board and informed consent was obtained from all subjects. A total of 575 RA samples were genotyped in this study, including all CCP+ patients enrolled in the BRASS cohort at that time, and 142 CCP-patients. The cohort is predominantly female (82.3%) with a mean age of 58.0 (±13.6) years old and average disease duration of 16.1 (±12.5) years. Anti-CCP antibodies were measured using a second generation Enzyme-Linked ImmunoSorbent Assay (ELISA) from Inova (Inova Diagnostics Inc., San Diego, CA, USA).

Genotyping and Quality Control
Genotyping of the samples was performed at the Broad Institute using the Affymetrix GeneChip 100K Mapping Array containing 116,204 SNPs (Affymetrix, Santa Clara, CA, USA). Samples were processed in a 96-well plate format using Biomek FX robotics according to the manufacturer's protocol. Genotypes were called using the BRLMM (Bayesian Robust Linear Modeling using Mahalanobis distance) algorithm.
To ensure high quality data and to minimize false positive results due to technical artifact, filtering criteria were set as follows: genotype call rate ≥ 0.95 per SNP and 0.9 per individual, minor allele frequency (MAF) ≥ 0.01, and P value for Hardy-Weinberg equilibrium test ≥ 0.0001. There were 97,248 autosomal SNPs from 546 self-reported Caucasians (82.3% female) that met these criteria. The average genotyping rate in the remaining individuals was 98.0%, and the SNP-wise genotyping rate was 98.9%. Samples were genotyped for HLA-SE alleles by lowresolution genotyping.
Individuals in BRASS were self-identified as Hispanic or non-Hispanic and as either Caucasian, American Native Indian, Pacific Islander/Asian, or African American. The entire study sample was utilized to screen for evidence of subtle population stratification through evaluation of pairwise identity by state (IBS) sharing distance across all samples using the software package PLINK (20). The hypothesis was that a more homogeneous population was more similar among individuals as reflected in small IBS distances. Clustering in terms of IBS-sharing distance among all possible pairs in the sample was explored through PLINK. A two-dimensional scaling plot from PLINK was used to visualize sample similarity. Standard classical multidimensional scaling was used to reduce representation of the data into two dimensions, enabling visualization on a two-axis scatter plot. Two dimensional scales ( Figure 1) visualized four wellseparated subgroups that correlated with self-reported ethnic groups. The IBS distance between African Americans, Asians, and Caucasians was distinguished. Also, the plot suggested Hispanic Caucasians were clearly genetically different from the majority of North American Caucasians. We further checked any potential outliers tion to immobilized sequence-specific oligonucleotide probes (16). CCP was measured using the same assay as in BRASS. Different sub-cohorts had different upper ranges of the titer; in particular some titers in the NARAC family collection were truncated at 413 (4%) or 210 (1%). In the primary analysis, CCP was transformed to a normal distribution by taking the inverse normal of the rank. GLM models for the 289 550K SNPs as predictors of ordinal CCP titer were run using PLINK (20), first unadjusted and then controlling for HLA-SE status for SNPs on Chromosome 6. As sensitivity analyses for the CCP range indicated differences among the sub-cohorts, logistic models also were run with CCP categorized as high (> 210) versus low (< 100) and the primary GLM models were rerun with cohort and disease duration as covariates.

Initial GWAS in BRASS
Demographics and clinical characteristics of the final set of 531 subjects in the BRASS cohort used in the genome-wide association analysis are summarized in Table 1. among the remaining self-reported non-Hispanic Caucasians using a Cook D test for the two-dimension scale. The Cook D value for each subject is the measurement of the parameter estimate change in analysis with that subject compared with the estimate without that subject. A larger value indicates that the subject is more different from the remaining subjects (for example, an outlier). We removed 19 individuals based on IBS analysis who also selfreported Hispanic and/or non-Caucasian race. Two more individuals who selfreported as non-Hispanic Caucasian, with Cook D value > 0.5 (1.07 and 0.62) were removed from analysis; the remaining subjects had a Cook D value no higher than 0.18. After quality control and removing outliers, 531 individuals were included in the genome-wide association testing.

Statistical Model
Anti-CCP titer was used as a continuous outcome variable. For the genomewide analysis, each SNP was tested for association with anti-CCP titer using general linear regression (GLM) assuming an additive model for all SNPs in PLINK. Secondary analyses were performed in which the HLA-SE, the wellestablished risk locus for anti-CCP RA, was controlled to determine whether the association was independent from the HLA-SE for SNPs on Chromosome 6. Analyses using only CCP+ (anti-CCP titer > 20) subjects (N = 401) also were performed. We additionally controlled for HLA-DR3 to investigate whether the signal was independent of HLA-DR3.
Because CCP titer was not normally distributed, simulation was used to validate the use of GLM. Ten million random genotypes were generated based on different minor allele frequencies (MAF) (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 respectively) for each individual. The empirical P value distribution was obtained by analyzing observed anti-CCP measures against the simulated genotypes using GLM. A pure null distribution was simulated, in which the type I error should follow the nominal level: for example, when we use 0.05 significance level, we should expect 1/20th of the results to be significant; if we use 0.001, we should get 1 in 1000 significant results; and so on. If no excess type I error was observed, this would support the validity of the GLM test.
Permutation testing was utilized to evaluate the significance of association analysis results. Based on individual's genome typing data, we permuted anti-CCP titer 1000 times. Then we randomly assigned genotype data to each CCP titer and ran pseudo-GWAS 1000 times to get a whole genome "association statistic." Such permutation creates the null distribution (that is, no real associations) and hence the number of significant associations observed in the permutation analysis is a measure of the expected false positive level across the genome.

Replication Population
Samples from the NARAC population were used for replication. NARAC samples were collected by four different study designs. NARAC cases were from multiplex families (primarily affected sibling pairs) in which at least one sibling had documented erosions. The other collections that were used for replication included samples from the National Data Bank of Rheumatic Diseases, the National Inception Cohort of Rheumatoid Arthritis, and the Study of New Onset Rheumatoid Arthritis. A more detailed description is given elsewhere (21). We genotyped 908 RA cases by SNP assay with Infinium Human Hap 550 version 1.0 (Illumina, San Diego, CA, USA) and applied the same quality control (QC) criteria as in BRASS. Among all NARAC RA patients, 849 individuals who passed QC filters, had CCP measurements, and were determined to have at least 90% European ancestry using STRUCTURE (22) were utilized in the replication analysis. For this 849-member cohort, 74.1% were female, with a mean age of 56.0 ± 12.3 years and average disease duration of 10.6 ± 11.1 years; they were all CCP+. Low resolution HLA-DRB1 typing for the allele groups DRB1*01 through DRB1*18, and high-resolution DRB1*04 typing were performed by initial PCR amplification, followed by hybridiza- The mean age at diagnosis was 41.6 years old. Patients had well-established disease; average disease duration was 16.4 years. Frequency of 0, 1, and 2 copies of the HLA-DRB1 SE among patients were 31.4%, 40.7%, and 27.9%, respectively. Seventy-six percent were clinically classified as CCP+.
The anti-CCP titer in the BRASS patients demonstrated a wide spectrum (from 3 to 449 units), but did not follow a normal distribution (Figure 2). Skewness and kurtosis were 0.42 and -1.09 respectively, and suggest a relatively flat and long right tailed distribution. Simulation was done to evaluate whether a general linear model (GLM) would handle type I error properly. Considering minor allele frequency (MAF) values of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5, 10 million genotypes were simulated. We observed the expected distribution of results at all α level and MAF thresholds ( Table 2); no excess type I error was observed, supporting the validity of the GLM test in this analysis.
Genome-wide association analysis was conducted for anti-CCP titer using 97,248 SNPs in 531 BRASS samples with GLM ( Figure 3). The top 25 SNPs with the smallest P values are shown in Table 3. The five most significant SNPs are in the MHC region, the well-known susceptibility region for RA, and the minor alleles all are associated with lower anti-CCP titers. After adjusting for HLA-SE status, coded as 0, 1, or 2 copies of SE allele, the association in these five SNPs remained significant (P < 10 -5 ) suggesting that there may be other factors in this region, besides the HLA-SE locus, that affect anti-CCP titer.
The primary results suggest that five SNPs in the HLA region are associated with CCP titer; however, this could be interpreted as an HLA association with CCP positivity rather than titer since CCP-subjects were included in the analysis. To test whether these findings are robust, we limited the analysis to 401 CCP+ RA subjects in a secondary analy-sis. The associations still held with slightly weaker P values, suggesting that these SNPs are associated with CCP level among those who are CCP+, not just with CCP positivity (Table 3).
Permutation analyses to examine the expected number of false positive findings with observed associations are summarized in Table 4 for two genotype success rates. For SNPs with a genotype rate greater than 95%, we observed 134 tests that reached significance at the level of < 0.001; by random chance we would expect 93 significant tests. Furthermore, a Q-Q plot (Figure 4) demon-   strated no excess signals that might be due to bias based on a genotype call rate of 95%.

Replication Analysis in NARAC
We selected 289 SNPs that were associated with anti-CCP titer in all BRASS cohort or CCP+ cohort at P ≤ 0.001 to assess replication in the NARAC Illumina 550K data set. We compared the two genotyping platforms using HapMap linkage disequilibrium (LD) information (23) to define proxy SNPs for replication. There were 289 tagging SNPs in the 550K panel that passed quality control (QC) and tagged one or more candidate SNPs with r 2 > 0.8 in the Hap Map CEU population (24). Among the SNPs analyzed in the NARAC sample, the most significant was rs1980493, which is in LD with the top five SNPs from BRASS (R 2 = 0.85) (unadjusted GLM P = 6.1 × 10 -5 [ Table  3], and SE-adjusted P = 0.00018). The minor allele was associated with lower anti-CCP levels. In sensitivity analyses to test for differences in subcohort CCP ranges (see Methods), rs1980493 remained the top SNP (GLM adjusted for HLA-SE, cohort, and disease duration, P = 0.0003; logistic regression of high/ low CCP subsets, P = 0.003). This SNP is located between HLA-DRA and the gene BTNL2. Two more candidate SNPs,  rs12478948 and rs2046159, achieved a P value of < 0.01 in the NARAC study (data not shown). Additional exploration of the HLA region was performed using BRASS data. Because the HLA-SE is actually a group of alleles in the HLA-DRB1 region, when we checked LD between the five identified SNPs with individual SE alleles DR1 and DR4, the highest LD r 2 with SE risk alleles observed was 0.09. We also examined LD defining HLA-SE as any copy of SE alleles and found an r 2 = 0.14 ( Table 5).
A previous study demonstrated an inverse association between HLA-DR3 and anti-CCP titer (16) in NARAC subjects. The five top SNPs in BRASS had an r 2 = 0.46 with DR3. We repeated the analysis adjusting for HLA-DR3 and the P value for the top five SNPs in BRASS were 0.0002 (data not shown). Among the CCP + BRASS subjects, the P values were 0.01. The P value for the single top SNP in NARAC was 0.6 after adjusting for DR3. The P value for a heterogeneity test between BRASS and NARAC was 0.23 for the top SNP. We then pooled the raw data and performed a pooled metaanalysis across the BRASS and NARAC studies, and the SNP rs1980493 was asso-ciated significantly with anti-CCP titer at 1.8 × 10 -9 , 5.3 × 10 -7 , 0.0014, and 0.0022 for unadjusted model, adjusted for HLA-SE, adjusted for HLA-DR3, and adjusted for both HLA-SE and DR3 in the same model respectively. When limiting BRASS to CCP+ only (N = 401) and pooling with NARAC for the same four analyses, results were 2.9 × 10 -7 , 1.0 × 10 -6 , 0.02, 0.01, respectively.

DISCUSSION
We performed GWAS on the quantitative RA trait of anti-CCP titer and confirmed that HLA is the most important region for this RA phenotype. However, the HLA-SE does not explain all the risk in the region, as there is a significant signal in the region that is independent of HLA-SE status. The minor alleles of the top SNPs in both cohorts are associated inversely with anti-CCP level. This signal, near BTNL2 and HLA-DRA, is in LD with HLA-DR3; therefore, the precise causal location of this effect remains to be determined.
Besides genetic susceptibility analysis in RA, there also are studies of genetic associations for anti-CCP level as a quantitative trait. Most of these studies focused on the HLA region (7,10,11,(15)(16)(17)(18), and consistently found that the HLA-SE is associated with higher anti-CCP titer. Irigoyen et al. reported that in contrast to the SE, HLA-DR3 alleles were found to be associated with lower anti-CCP titers. The top SNPs from BRASS have moderate LD with DR3, also are associated with lower anti-CCP titers, and the association holds after adjustment for DR3. The proxy SNP from NARAC has higher LD with DR3 (r 2 = 0.64, with D'≈1), and, after adjusting for DR3, the association is no longer significant. We do see significant association between the top SNP and anti-CCP titer after controlling for DR3 in the combined cohort. Although our results support independence of the SNP from the HLA-SE, further evidence will be needed to determine whether the SNP is associated with anti-CCP independent of HLA-DR3. Aside from traditional linkage and candidate gene studies, the availability of cost effective high-throughput SNP genotyping platform technology has generated high enthusiasm for GWAS for a wide range of disorders. Among these, four were done within several large RA case-control studies looking for RA risk loci (13,21,25,26). Remmers et al. (25) reported that a STAT4 SNP, rs7574865,was associated significantly with RA risk in two large cohorts. There were two SNPs on the 100K SNP platform, rs3821236 and rs3024886, in LD with STAT4 rs7574865 with R 2 of 0.74; however, we failed to find any association with anti-CCP titer. For non-MHC reported regions from WTCCC (26) that associated with RA risk, we did see signals from Chromosomes 6q, 21, and 22 in analysis of BRASS samples, but failed to replicate these results in the NARAC samples. Plenge et al. (21) reported that SNPs in LD with the TRAF1 and C5 genes were associated significantly with RA, and Raychaudhuri et al. (27) reported new risk loci for RA, CD40 and CCL21 from meta analysis. However, in the 100K SNP platform, none of the available SNPs are in high LD with the reported SNPs.
An important strength of this study is the inclusion of two large, wellcharacterized RA cohorts. Limitations include the risk of false positive findings due to the large number of tests showing the comparison between whole genome association observed -log10(p) and expected -log10(p). Expected -log10(p) assumes there is no association between SNPs with anti-CCP and shown on the abscissa. The observed -log10(p) is shown on the ordinate. inherent in genome-wide association testing. Therefore, it is crucial to confirm findings by replication studies. We included 130 CCP-subjects from BRASS in the primary analysis which raises the question of whether the significant associations in the HLA region are due to an HLA association with the CCP positive/negative phenotype. However, in a secondary analysis limited to CCP+ subjects only, our findings were similar, suggesting that the association with CCP titer is robust. Although we have a fairly homogeneous cohort with a sample size >500, we still lack power to detect loci with small effect sizes. The 100K SNP platform only covers ~20% of the common variants through the genome, so more comprehensive sets of SNPs are needed for a better understanding of the genetic basic of the quantitative trait anti-CCP titer.
After testing our top findings in NARAC sample, we identified a locus in the HLA-DRA/BTNL2 region in LD with DR3 with a strong signal in both samples. After adjustment for HLA-SE status, this signal remained significant at P = 5.3 × 10 -7 , suggesting a signal in this region that is independent of the HLA-SE.

ACKNOWLEDGMENTS
The BRASS Registry is supported by a grant from Millennium Pharmaceuticals, Biogen-Idec, and Crescendo Biosciences. The NARAC is supported by US National Institutes of Health grants R01-AR44422, M01 RR-00079, and NO1-AR-2-2263. J Cui is supported by NIH grants R01 AR49880, P60 AR047782, and the Alliance for Lupus Research. LA Criswell is supported by NIH grants K24 AR02175 and R01 AI065841. RM Plenge is supported by a NIH K08 AI55314-3, a private donation from the Fox Trot Fund, the Burroughs Wellcome Fund, and the William Randolph Hearst Fund of Harvard University. EW Karlson is supported by NIH grants R01 AR49880, P60 AR047782, and K24 AR0524-01. The Broad Insti-tute Center for Genotyping and Analysis is supported by grant U54 RR020278 from the National Center for Research Resources. We thank all participants and Investigators from BRASS and NARAC, especially the principle investigator Peter K Gregersen.

DISCLOSURE
The BRASS registry is supported by a research grant from Millenninum Pharmaceuticals, Biogen-Idec, and Crescendo Bioscience. E Izmailova and A Parker are employed at Millennium. R Roubenoff is employed at Biogen.