An Epigenetic Signature in Peripheral Blood Associated with the Haplotype on 17q21.31, a Risk Factor for Neurodegenerative Tauopathy

Little is known about how changes in DNA methylation mediate risk for human diseases including dementia. Analysis of genome-wide methylation patterns in patients with two forms of tau-related dementia – progressive supranuclear palsy (PSP) and frontotemporal dementia (FTD) – revealed significant differentially methylated probes (DMPs) in patients versus unaffected controls. Remarkably, DMPs in PSP were clustered within the 17q21.31 region, previously known to harbor the major genetic risk factor for PSP. We identified and replicated a dose-dependent effect of the risk-associated H1 haplotype on methylation levels within the region in blood and brain. These data reveal that the H1 haplotype increases risk for tauopathy via differential methylation at that locus, indicating a mediating role for methylation in dementia pathophysiology.


Introduction
Epigenetics is one of the most rapidly expanding fields in biology, and is uncovering additional levels of complexity in the human genome, including DNA methylation, histone modifications, and intra-and inter-chromosomal interactions mediated by chromatin proteins [1,2]. Changes in methylation represent a key area where environmental factors can modify or interact with inherited genetic factors (DNA sequence) to alter the functional output of the genome. Disease-causing genes involved in epigenetic modifications have been identified, most notably for neurodevelopmental disorders such as Rett syndrome [3]. A very limited number of studies have addressed specific epigenetic modifications relevant to neurological diseases and dementia (reviewed in [4][5][6][7]). Additionally, epigenetic signatures have been reported for different brain regions [8,9], for regional brain aging [10], and aging in general [11] further supporting epigenetic studies in patients with neurodegenerative diseases.
Progressive supranuclear palsy (PSP) is a neurodegenerative disease typically characterized by parkinsonism, postural instability, and cognitive impairment [12]. Pathologically, PSP is defined by the accumulation of tau protein in subcortical and cortical regions, (reviewed in Williams and Lees, 2009 [13]), showing substantial overlap with other neurodegenerative diseases characterized by tau accumulation and grouped under the generic name of tauopathies, including approximately one-half of all frontotemporal dementia (FTD) cases and Alzheimer's disease [14]. Both rare [15,16] and common [17] genetic variation have been shown to mediate risk for tauopathies. The major common variant risk for PSP, a prototypical tauopathy, involves a region surrounding the tau locus [18], but how such genetic variation might mediate risk is not known.
We profiled the methylation status in peripheral blood from patients with two tau-related neurodegenerative conditions, PSP and FTD, using Illumina DNA methylation arrays. We then integrated these methylation data with SNP and gene expression data to identify a mediating role for methylation in genetic risk for PSP. We replicate this finding in independent studies and show that it is conserved in brain, providing the first evidence for a role for DNA methylation in mediating the risk for neurodegenerative dementia.

Differential methylation analysis
We first analyzed methylation profiles in 171 patients with FTD (n = 128) and PSP (n = 43) and compared them with 185 subjects with no evidence of dementia or other neurological conditions using Illumina HumanMethylation 450 k arrays (Table S1). Two datasets were generated in two batches, samples were compared within each dataset to condition out a potential batch effect, and the resulting differentially methylated probes (DMPs) were combined (see Methods).
Differential methylation analysis identified a number of DMPs between affected subjects and controls, with partial overlap between PSP and FTD (Figure 1a-b, complete list of DMPs is in Table S2). DMPs were mostly clustered within CpG islands (defined according to the Illumina annotation), with most being hypermethylated in PSP vs. controls ( Figure 1c, Table 1). Gene ontology analysis of DMPs in PSP vs. controls showed overrepresentation of genes involved in a number of pathways, including DNA binding and transcription factor binding ( Figure S1). We then assessed the chromosomal distribution of the DMPs, and observed -only in PSP samples vs. controls -an overrepresentation of probes from chromosomes 19 (hypergeometric test pvalue = 1.32610 26 ), 22 (p = 8.63610 26 ), and 17 (p = 5.82610 25 , Figure 1d), with most top DMPs (after filtering for absolute average beta difference (abD).0.1) located within the 17q21.31 region (Figure 1e). The most significant DMPs when comparing PSP vs. controls (n = 14, absolute abD.0.1) are listed in Table 2. Of note, 4 DMPs (all hypomethylated in PSP) were located within the NFYA gene, encoding for a component of a nuclear transcription factor. Importantly, 3 of the 14 significant DMPs are located in 17q21.31 (Figure 1e, p = 2.23610 27 , hypergeometric test). Despite being located in a relatively limited genomic region, these 3 probes were both hypermethylated and hypomethylated in PSP vs. controls, suggesting complex diseaseassociated patterns of differential methylation in this region.

17q21.31 haplotype and methylation
The location of several DMPs in the 17q21.31 region was intriguing because the 17q21.31 locus contains an established risk factor for neurodegeneration, first reported in 1997 by Conrad et al. [18] for PSP and then confirmed in multiple series (reviewed in Wade-Martins, 2012 [19]). Two main haplotypes (H1 and H2) have been described at this locus. The more common H1 haplotype is over-represented (95% vs. 57%) in PSP vs. normal controls [18][19][20]. The H1/H2 locus spans at least 1.8 Mb and includes multiple genes (.40, many of which are actively transcribed in the brain), notably including MAPT, encoding for the microtubule-associated protein tau [19]. Mutations in MAPT cause FTD and PSP, and hyperphosphorylated tau accumulation is a hallmark in a number of neurodegenerative conditions, including AD, PSP, FTD and others, collectively named 'tauopathies'.
Consistent with previous reports, the H1 haplotype was overrepresented in our PSP cohort, with a H1 allelic frequency of 97.1% vs. 80.4% in controls (p = 1.86610 24 , Fisher's exact test, Table S1), further confirming -even in this relatively small data set -the H1 haplotype as a risk factor for PSP. We hypothesized that the clustering of DMPs in 17q21.31 in PSP cases vs. controls might be related to the H1 haplotype risk factor. To detect an effect of the 17q21.31 haplotype on methylation levels, we compared samples based on their genotype at this region, independent of disease classification. As for previous analyses, we compared samples within datasets to avoid potential batch effects. Genotype distribution across diseases and datasets is reported in Table S3.
We compared carriers of the risk-associated H1 haplotype (H1/ H1 and H1/H2 genotypes) to H2/H2 samples (dominant model) within each dataset and, after filtering DMPs for adjusted p#0.05, identified two overlapping sets of 57 and 34 DMPs (Figure 2a,b), markedly clustered within the 17q21.31 region (Figure 2d). Similar results (Figure 2a,c) were obtained when comparing H1/H1 samples to H2 carriers (H1/H2 and H2/H2, recessive model), supporting the hypothesis of a strong cis effect of the H1/H2 locus on methylation levels in peripheral blood ( Figure 2). After filtering for absolute abD.0.1, 8 of the top 9 DMPs identified in both datasets were within 17q21.31 (Table 3) in the dominant model. As noted in PSP cases vs. controls, DMPs in this region are both hyper-and hypo-methylated, suggesting a complex cis-regulation of methylation levels ( Figure 3a). Scatterplots of the methylation levels for the top DMPs shared between the dominant and recessive models indicate that the H1 haplotype influences methylation levels at these sites in a dose-dependent fashion (Figure 3b), accounting for a majority of methylation variability at these sites (e.g. R-squared = 0.835 and 0.866 in dataset #1 and #2, respectively, for cg22968622). Similar results were obtained when comparing subjects based on their genotype at 17q21.31, but only within controls, FTD, or AD patients (Text S1). The H1 haplotype can be further divided into sub-haplotypes [21]. We obtained sub-haplotype information for 93 H1 carriers in our cohort using the SNPs described in Kauwe et al 2008 [22]. Hierarchical clustering of the methylation signal in the 17q21.31 region and principal component analysis did not reveal a particular clustering of H1 sub-haplotypes (data not shown). These results -although based on a subset of our cohort -suggest that haplotype structure is not the major determinant of 17q21.31 methylation overall.
To test the contribution of haplotype status on PSP-associated DMPs, we repeated the differential methylation analysis only on samples with the H1/H1 haplotype (n = 31 PSP cases and 59 unaffected controls). Of the resulting 341 significant DMPs (after

Author Summary
Progressive supranuclear palsy (PSP) and frontotemporal dementia (FTD) are two neurodegenerative diseases linked, at the pathologic and genetic level, to the microtubule associated protein tau. We studied epigenetic changes (DNA methylation levels) in peripheral blood from patients with PSP, FTD, and unaffected controls. Analysis of genome-wide methylation patterns revealed significant differentially methylated probes in patients versus unaffected controls. Remarkably, differentially methylated probes in PSP vs. controls were preferentially clustered within the 17q21.31 region, previously known to harbor the major genetic risk factor for PSP. We identified and replicated a dose-dependent effect of the risk-associated H1 haplotype on methylation levels within the region in independent datasets in blood and brain. These data reveal that the H1 haplotype increases risk for tauopathy via differential methylation, indicating a mediating role for methylation in dementia pathophysiology.
application of the Benjamini-Hochberg procedure, FDR = 0.05), 21 were located in chromosome 17 and 2 were located in the 17q21.31 band. Neither the chromosome nor the region were found to be significantly overrepresented by the hypergeometric test (p = 0.342 and 0.149, respectively). The lack of overrepresentation within 17q21.31 after conditioning on strata defined by the 17q21.31 haplotype suggests that either the previously identified overrepresentation on chromosome 17 was due to the 17q21.31 haplotype effect on methylation levels, or that it could also reflect reduced power due to small sample size. To address the issue that the strata contained too few samples, we also carried out a multivariate regression model analysis that included 17q21.31 haplotype as covariate. Specifically, the methylation level of each of the 3 top PSP-related DMPs located in 17q21.31 (Table 2) was regressed on PSP status, 17q21.31 haplotype, ethnicity, and age using a multivariate linear regression model. We found that, except for cg23758822, other PSP-related DMPs were no longer significant (p = 0.410 on average, Table S4) in a multivariate model once it included the H1 genotype. We also calculated the relative weight of each predictor using the R package relaimpo [23], and determined that the H1 haplotype accounted for the majority of explained variance (78.2625.9%, Figure S2). Finally, we estimated relative cell count composition in peripheral blood using methylation data [24][25][26]. Correction for inferred cell count did not significantly change our findings (Text S1, Table S5, Figure S11).
Taken together, these findings indicate 1) a strong effect of the 17q21.31 haplotype on methylation levels at 17q21.31, 2) that the risk-associated H1 determines most of the methylation changes observed with confidence in PSP patients vs. controls, and 3) that additional DMPs outside the 17q21.31 region may be at play in determining risk susceptibility for PSP in H1 carriers, though larger sample sizes will be needed to clarify their importance.  Genome-wide methylation QTL analysis confirms a cis methQTL at 17q21.31 Our findings strongly indicate a cis regulation of methylation levels at the 17q21.31 locus. To test whether there were additional potential genetic determinants of methylation levels at 17q21.31 in our dataset, we performed a methylation QTL (methQTL) analysis in a subset of 226 individuals of European descent for whom whole-genome SNP and methylation data were available (Table S6). We assessed association of genetic variants with methylation levels at 3 CpGs within 17q21.31 (cg22968622, cg17117718, cg19832721) in each dataset. We identified on average 110 genome-wide significant signals (Bonferroni-adjusted p#0.05), all located within the 17q21.31 region ( Figure 3c, Table  S7). These variants accounted for a proportion of variability ranging between 25.5% and 98.2% (mean R-squared = 0.701, Figure S3, Table S8) further confirming that genetic variants at 17q21.31 are controlling methylation levels in cis in the same region. We focused on Caucasian individuals because of the differences in frequency of the H2 haplotype across populations. In fact, consistent with previous reports [27], we observed that the H2 haplotype occurs more frequently in Caucasians (H2 allelic frequency = 19.2%, Table 4) than in other ethnic groups (H2 allelic frequency in Asians = 1.3%; p = 3.83610 26 , Fisher's exact test). However, similar results were observed when including all the 273 individuals for whom SNP and methylation data were available (Text S1, Figure S4, Tables S9, S10).

locus in additional datasets
To confirm that the 17q21.31 haplotype regulates methylation in cis at this locus in an independent dataset, we downloaded and reanalyzed raw data from a previously published study, for which SNP and methylation data in peripheral blood from 12 samples were publicly available [28]. Using the rs1052553 SNP to call the H1/H2 haplotype and adopting the same statistical thresholds, we compared H1/H1 vs. H1/H2 subjects and identified one hypomethylated probe (cg22968622, adjusted p-value = 2.37610 28 , abD = 20.42) within 17q21.31, which was also identified in our analysis. We also performed a methQTL analysis for cg22968622 in the same dataset. Of the 310 significant SNPs, 206 were located in the 17q21.31 region (p = 0, hypergeometric test), further supporting the presence of a cis methQTL at this locus.
To provide independent validation of the methylation array assay, we performed reduced representation bisulfite sequencing (RRBS) on a representative set of 7 samples from the study (2 H1/ H1 controls, 1 H1/H1 PSP patient, 1 H1/H2 control, 1 H1/H2 PSP patient, and 2 H2/H2 controls). As a sequencing-based approach, RRBS would not suffer from some of the technical biases present in arrays, e.g. due to hybridization. At CpG sites that were covered by both RRBS and array, the methylation measurements were highly correlated (Pearson r.0.9) in all seven samples ( Figure S5).
To validate our findings from peripheral blood, we analyzed RRBS data from whole-blood DNA of a separate cohort of 80 healthy subjects (comprising 54 H1/H1, 24 H1/H2, and 2 H2/ H2). On average, the methylation level computed from RRBS was highly correlated with the array in both dataset #1 (r = 0.965) and dataset #2 (r = 0.963) ( Figure S6). Consistent with the array, we found differences in methylation that were significant even after strict Bonferroni correction for multiple testing, mostly localized to the 17q21.31 cytoband (Table S11). Since local methylation levels are often highly correlated, we would expect the differentially methylated CpGs identified by both the array and the sequencing method to be in close proximity. Indeed, the differentially methylated loci identified by RRBS in the 17q21.31 region are nearby those identified by the Illumina Human Methylation array, overlapping the same genes MAPT and KIAA1267. This degree of overlap is striking, given the large extent of the haplotype inversion ( Figure S7). Methylated regions identified by the array but not by RRBS may be a result of the higher power (greater sample numbers) in the array, and differences in coverage. Therefore, we looked at probes that both demonstrated haplotype-specific methylation in 17q21.31 on the Illumina array and were covered by RRBS reads in the additional cohort. One probe, cg08113562, met these criteria. The methylation pattern followed a statistically significant dose-dependent relationship with the H1 versus H2 haplotype, with mean methylation fractions of 0.001 in H1/H1 subjects, 0.022 in H1/H2 subjects, and 0.048 in H2/H2 subjects (two-sided p = 0.03, ANOVA), in the same direction as that reported by the array.
To assess the relevance of our findings to brain tissue, we analyzed 2 published methylation QTL studies in brain involving 150 [29] and 153 [30] subjects, respectively. Gibbs et al. [29] identified 9 SNP-CpG association pairs (out of 52,345 significant methQTL, mean R-squared = 0.232) at the 17q21.31 locus in two (frontal cortex and cerebellum) of the four studied brain regions; Zhang et al. [30] identified in cerebellar samples 122 SNP-CpG pairs (significant in at least one of the three thresholds they used) at the 17q21.31 locus (out of 12,117 significant methQTLs, mean R-squared = 0.136). Together, these data demonstrate that the cis methQTL we identified in our study are present in independent studies in peripheral blood, and are preserved in brain.

Causal inference identifies three methylated regions that may mediate PSP risk
The strong association between the haplotype at 17q21.31 and methylation status raises the question of whether methylation levels mediate the protective or pathogenic effects of haplotype variants. Recent developments in the field of causal inference have yielded quantitative methods to predict the hierarchy of causation given genetic variants [31][32][33][34]. Network Edge Orienting (NEO), for example, uses structural equation models to choose the best fitting causal model, assuming that the genetic variation is fixed by meiosis and thus ''anchors'' each model (that is, genotype precedes phenotype) [35]. NEO allows one to evaluate which of five testable causal models ( Figure S8) best explains the relationship between genetic variants, methylation levels, and disease status. For instance, the genotype may lead to patterns of methylation that directly contribute to the disease phenotype ( Figure S8b). Under this model, the DMPs within 17q21.31 would be the most interesting, as they would correspond to the epigenetic markers mediating the increased risk conferred by the H1 haplotype. Alternatively, the genotype may independently give rise to the methylation and disease phenotype, with neither contributing to the other ( Figure S8c). DMPs under this model are only associated   with the disease because of the common source of variation due to the 17q21.31 locus. We applied NEO to calculate a relative fitting index of the ''mediation model'' for the 9 haplotype-associated DMPs (Table S12) using 35 PSP cases and 184 unaffected controls for whom these data were available. The ''mediation model'' best explained the methylation pattern in three sites, one of which (cg17117718) was statistically significant (see Methods) ( Figure 3d). These results support the hypothesis that methylation status at certain sites likely is a causal mediator of the major known genetic risk related to PSP pathogenesis. Taken together, these results predict -for the first time -a link between epigenetic changes and tauopathies, and will need to be further validated with functional studies.

Haplotype-associated differences in gene expression
Methylation changes have been associated with changes in gene expression. We examined microarray expression data in peripheral blood available for 120 subjects, to test whether the methylation associated with the 17q21.31 haplotype had such an effect. Among 88 healthy subjects with H1H1 haplotype, 24 healthy subjects with the H1H2 haplotype, and 8 healthy subjects with the H2H2 haplotype, we identified three probes significantly differentially expressed in peripheral blood, mapping to MAP-K8IP1 (on chromosome 11), LRRC37A4 (located within the 17q21. The strong haplotype-associated methylation changes identified in our study included DMPs within the MAPT, KIAA1267, ARHGAP27, and DND1 genes. We used linear regression to test whether the 17q21.31 haplotype was associated with differential expression of these genes, and found no correlation between haplotype and gene expression for these transcripts (adjusted Rsquared = 0.006, 0.000, 20.008, and 20.011, respectively). Thus, while haplotype was shown to affect mRNA expression of MAPK8IP1 and LRRC37A4 in peripheral blood, there was no detectable correlation between DMP-containing genes and their corresponding expression levels.

Discussion
The goal of this study was to assess whether changes in DNA methylation in peripheral blood are observed in patients with neurodegenerative diseases. By performing microarray-based differential methylation analysis, we identified a methylation signature associated with disease status in PSP and, to a lesser extent, FTD. Using SNP data available in a subset of our series, we showed that a remarkable proportion of the observed changes in methylation status in PSP are associated with a common haplotype at the 17q21.31 locus, strongly suggesting the presence of a cis methylation QTL in this region. Although we included patients with neurodegenerative disorders in our analysis, the observed pattern seems to be related to the haplotype at 17q21.31, independent of disease status. Integrative analyses including SNP and gene expression data support a model whereby genetic variation at the 17q21.31 locus modulates the risk for neurodegenerative tauopathy at least partially via differential methylation.
The H1 haplotype at 17q21.31, a large linkage disequilibrium block due to an inverted chromosomal sequence of ,970 kb, is the major known risk locus for PSP [36] and other neurodegenerative diseases [17,37]. Although the genetic contribution of this locus to the risk for neurodegeneration is established and widely replicated, the mechanism by which risk is increased is largely unknown. This region spans from CRHR1 (corticotrophin) to IMP5 (a presenilin homologue) at the centromeric end of LD, while WNT3 and NSF (N-ethylmaleimide-sensitive factor) are at the telomeric end of the LD block [38]; therefore it spans at least 1.8 Mb, including 48 RefSeq genes -many of which actively transcribed in the brain -and constitutes the largest haplotype block in the human genome. Stefansson et al. [39] showed that the complete disequilibrium was due to an inversion occurring in the H2 haplotype relative to the H1 human reference and subsequent absence of recombination between inverted and noninverted chromosomes. The study of this region to understand susceptibility to neurodegeneration has been mostly focused on one gene, MAPT, encoding for the microtubule-associated protein tau. This focus on tau is well motivated, as hyperphosphorylated tau accumulates within neurofibrillary tangles -the pathological hallmark of AD -and because mutations in MAPT cause FTD, the second most common neurodegenerative dementia. Several in vitro studies have reported alterations of transcription levels in MAPT due to common variants in the region [40,41], but this finding has not been consistently replicated [42,43]. More consistent evidence exists for a higher expression of exon 3 in brains from H2 carriers [43,44] and of exon 10 in H1 carriers [41,45], suggesting that splicing abnormalities are involved in increasing risk.
Recently, additional genetic evidence has been reported implicating this locus in other neurodegenerative diseases, such as Parkinson's disease [46], essential tremor, and multisystem atrophy [47]. This is important, as these diseases are not typical tauopathies, suggesting that the effect of this risk-associated region may be complex and involve multiple genes [17].
Our results indicate a novel mechanism by which the H1/H2 locus may affect the risk for tauopathies: significant alterations in methylation mediating increased disease susceptibility. Importantly, these methylation changes are not at the MAPT locus only, but are consistently observed in at least 3 neighboring genes as well, suggesting that genes other than MAPT might be at play in increasing disease susceptibility. In addition, DMPs in the region were both hyper and hypo-methylated, suggesting a complex regulation of methylation levels at this locus. Further studies will be needed to understand whether the observed methylation signature at 17q21.31 is increasing susceptibility through a MAPTdependent or independent mechanism, or both. Interestingly, a recent study focused on rheumatoid arthritis [48] linked a genetic susceptibility region for the disease, the MHC locus, with methylation changes in the same region, supporting the notion that epigenetic changes might mediate complex disease susceptibility induced by genetic risk factors. This is the first study of DNA methylation levels in blood in PSP and FTD, disorders that mainly affect brain. Although methylation patterns may be tissue specific [49], comparative studies of blood and brain showed both methylation patterns that are tissuespecific and conserved across tissues [50]. We show that the particular H1 haplotype-related methylation pattern identified in blood is at least partially conserved in brain. This is encouraging, since -in contrast to brain -blood is available from living patients, yielding a higher potential for future use as biomarker and the possibility of large-scale studies. We and others have used gene expression in peripheral blood to gain insights into the biology of neurodegenerative disorders [51,52]. This study supports the notion that a disease-related signature is present in methylation data as well. Finally, we decided to focus on the risk-associated 17q21.31 region as an initial step, but many interesting candidates for further study emerged from the differential methylation analysis in PSP patients vs. controls, namely the nuclear transcription factor NFYA.
Although our analysis of published datasets supports the presence of the 17q21.31-associated methylation signature in brain tissues, further studies focused on brain samples from patients will be needed to test whether methylation changes are contributing to tissue-specific gene expression abnormalities, and ultimately explain the mechanism of action of genetic susceptibility alleles, and the striking regional vulnerability of these disorders.

Ethics statement
All subjects and/or their proxies signed informed consents for genetic studies. The research protocol was approved by the University of California San Francisco (UCSF) and Los Angeles (UCLA) University Institutional Review Boards for human research.

Sample description
Patients were enrolled as part of a large genetic study in neurodegenerative dementia (Genetic Investigation in Frontotemporal Dementia, GIFT) at the UCSF Memory and Aging Center (UCSF-MAC) [53]. 371 unrelated subjects were enrolled in the study (Table S1), including patients with neurodegenerative disorders (128 FTD, 43 PSP, and 15 AD), and 185 healthy controls.

Sample preparation
DNA was extracted from peripheral blood using standard methods. No cell sorting or cell selection was conducted, therefore our data measure methylation levels in whole blood. Total RNA was extracted from the same individuals from peripheral blood using Paxgene Blood RNA tubes (Qiagen).
Ethnicity. We inferred ethnicity for 271 samples (out of the 273 for whom SNP data were available), by using SNP clustering compared to Hapmap data. Briefly, MDS analysis was applied on a merged dataset, including 702 samples from our data (273 of which were included in present study) and 1184 subjects from HapMap phase III. MDS plot shows that the first two principal components can cluster samples by ethnicity, and our data had good overlap with HapMap data ( Figure S9). We used selfreported ethnicity for an additional 88 samples. For 12 samples ethnicity remained unknown either because we were not able to call ethnicity with certainty using SNP data (n = 2), or because of the lack of SNP data and self-reported ethnicity (n = 10).

Reduced representation bisulfite sequencing
RRBS was performed on seven samples multiplexed in two lanes of the Illumina HiSeq. Library preparation was performed using the Msp I restriction enzyme as previously described [54]. Read alignment and methylation level calls were performed using BS-Seeker2 [55] (parameters: --aligner = bowtie -m 5 -g hg18.fa -r --low = 50 --up = 500 -a adapter.txt).
Published datasets SNP data from Heyn et al. [28] reporting the methylome analysis of newborns and centenarians, including 40 samples assayed by Illumina HumanMethylation450 BeadChip and 14 genotyped by Illumina HumanOmni5-Quad BeadChip (both methylation and SNP genotyping data were available for 12 samples), were downloaded from the Gene Expression Omnibus Database (GEO, http://www.ncbi.nlm.nih.gov/geo/, GSE31438). H1/H2 haplotypes were inferred from SNP rs1052553. For methylation data, beta and detection p-values were downloaded from GSE30870, and 65 sites containing missing values were removed.

Statistical analysis
Methylation arrays. In order to avoid potential confounders from batch effects, the two datasets were processed separately. Raw data was processed using the Illumina GenomeStudio software (version 2010.3). Background correction and color normalization were performed using the R package minfi version 1.2.0 [56], and normalization using Subset-quantile Within Array Normalization (SWAN) [57]. Probes were excluded from further analysis if .95% samples had detection p-value .0.01. In summary, 3,027 probes were removed from dataset #1, and 26,306 probes were removed from dataset #2. In order to avoid potential confounders, 66,877 SNP-containing probes were also excluded from further analysis [58]. Beta values (ratio between methylated probe intensity and the overall intensity) were computed using the R package minfi.
Linear models and empirical Bayes methods as implemented in the limma package were used for differential methylation analysis [59]. P-values were adjusted by using the Benjamini-Hochberg (BH) false discovery rate method. Multi-dimensional scaling (MDS) did not show obvious biases between chips within each dataset, but a batch effect could be observed between datasets #1 and #2 (Figure S10), similar to what has been reported in the literature [30,60]. To avoid potential confounders due to this batch effect, we compared each disease category with the set of controls within the same batch (i.e. conditioning on batch effect). Similar differential methylation results were obtained when we performed a combined analysis across the entire dataset, after correcting for batch effects using ComBat [61] (Text S1). Two filters were applied to conservatively identify differentially methylated sites: 1) a p-value-based filter (BH-adjusted p#0.05) and 2) an absolute average beta difference (abD) filter (absolute abD.0.1). Chromosomal enrichment analysis was performed by using the hypergeometric test as implemented in the R phyper function.
The impact of relative cell counts in peripheral blood was estimated as previously described [1,2] and based on a subset (n = 385) of the 500 loci whose methylation levels reflect the relative proportions of immune cells in unfractionated whole blood. After estimating the blood cell type distribution for each sample using the methylation level of the 385 loci, we applied a linear mixed-effect model considering (1) main blood cell types distribution as dependent variables; (2) disease status (or 17q21.31 haplotype), age, ethnicity, and gender as fixed effects; and (3) chip number as a random effect (Text S1).
Raw and normalized methylation data were deposited in the Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo), accession number: GSE53740.
Methylation QTL analysis (methQTL) was performed by regressing methylation level at select CpG sites of interest on SNP genotypes. Age and the first two principal component generated from MDS analysis were also included in the multivariate regression model as covariates. Linear regression methQTL analysis was performed using PLINK [62].
Microarray-based expression analysis. Raw data were processed as previously described [63]. Briefly, after quantile normalization, batch effects were removed using ComBat [61]. Differential expression analysis was performed using the limma package [59], applying a false discovery rate filter of #0.05 and an absolute log fold change filter of .0.1.
Causality analysis. Causality analysis was performed using the software package Network Edge Orienting (NEO) [35], a structural equation modeling software for determining the direction of causality among various phenotypes (e.g. clinical, molecular) given genotype data. Subjects from all batches were included (n = 219). The relative fitting index of the model is estimated by the Single Marker LEO.NB score, defined as the base-10 logarithm of the probability ratio between the mediation model and the next most likely causal model (i.e., a LEO.NB score of 1 means that the fit of the mediation model is 10 times better than that of the next best alternative causal model). The genotype was encoded as the dosage of the minor allele (A) at rs1560310 tagging H2 (i.e., 0 for H1/H1, 1 for H1/H2, and 2 for H2/H2). Gene expression levels were included for differentially expressed probes according to the 17q21.31 haplotype, and were encoded with the ComBat-corrected relative expression levels. Clinical phenotype was encoded as a binary variable (i.e., 0 for unaffected, 1 for affected with PSP). The Single-Marker analysis option was used, and results surpassing the thresholds of a LEO.NB score (a likelihood ratio of model fit) .0.8 and RMSEA index ,0.05 were considered significant fits to the mediation model [64]. Figure S1 Over-represented gene ontology (GO), molecular function, level 3 (MF_3) categories among DMPs in PSP versus controls (in green the proportion of hypomethylated DMPs; in red the proportion of hypermethylated DMPs) sorted by 2log10 (pvalue). A 2log (p-value) of 1.3 corresponds to an overrepresentation p-value of 0.05. (PDF) Figure S2 Relative effect of covariates on methylation beta value variance. The H1 haplotype accounts for most of the explained variance. For the top 3 PSP-related DMPs, the relative importance of predictors in the multivariate linear regression model (including H1 frequency, diagnosis status, age, and ethnicity) was calculated using R package relaimpo. Error bars: 95% bootstrap confidence intervals. (PDF)           Text S1 Additional analyses including: 1) Testing the 17q21.31 haplotype effect in patients and controls separately; 2) Impact of estimated relative cell counts in peripheral blood; 3) Genome-wide methylation QTL analysis in the entire dataset (n = 273); 4) Differential methylation analysis using the combined dataset (n = 371 samples). (DOCX)