Main

Identifying human DNA polymorphisms that regulate gene expression is essential for understanding the genetic basis of common diseases but is technically challenging. Valuable insights into molecular mechanisms can be derived by in vitro analysis of protein-DNA binding and reporter gene expression, but we have limited tools to interrogate putative regulatory polymorphisms in vivo. Such analysis needs to consider natural chromatin structure and natural haplotypic combinations of polymorphisms. It also must be able to detect small differences in gene expression, which could be crucial for disease susceptibility if a physiologically important mediator is involved.

We attempted to address these issues for lymphotoxin-α (LTA), an important early mediator of immune and inflammatory responses1. Regulation of LTA must be finely calibrated to combat dangerous pathogens without causing inflammatory damage to the host, as seems to occur in bacterial sepsis2 and cerebral malaria3. An intronic SNP at LTA+252 is associated with susceptibility to myocardial infarction4, asthma5,6, bacterial sepsis7,8, cancer9,10 and autoimmune diseases11,12. Haplotypes bearing the LTA+252G allele show relatively high levels of reporter gene expression4 and of LTA protein production by peripheral blood mononuclear cells in some13,14 but not all15,16 studies. A study using allele-specific quantification of transcripts and of phosphorylated polymerase II loading was consistent with high LTA expression from haplotypes carrying the LTA+252G allele17. Inspection of those data suggested, however, that the main effect did not reside with the LTA+252G allele itself but rather with some other difference between the most common haplotype, which did not include the LTA+252G allele, and the haplotype containing this allele.

To investigate the relationship between genotypic variation at LTA and phenotype, we first determined the sequence diversity at the locus by cloning and sequencing the gene and its flanking regions in a panel of lymphoblastoid cell lines (LCLs). We previously reported an analysis of LTA haplotypes in five individuals who were heterozygous with respect to LTA+252 (ref. 17). Here we sequenced 52 chromosomes derived from 26 unrelated individuals of European descent. We found 12 polymorphic sites with an allele frequency >10%, comprising 11 SNPs and 1 tandem-repeat polymorphism. This variation resolved into three main haplotype clades spanning LTA (Fig. 1a).
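The >10% allele-frequency filter can be illustrated with a short sketch. It assumes, hypothetically, that each sequenced chromosome is encoded as a string with one character per polymorphic site (a tandem repeat would need a symbolic allele code), and it takes the second most common allele at a site as the minor allele. The function names and toy haplotypes are invented for illustration and are not the pipeline used in the study.

```python
from collections import Counter

def minor_allele_frequency(haplotypes, site):
    """Frequency of the second most common allele at `site`.

    `haplotypes` is a list of equal-length strings, one per chromosome;
    character `site` of each string is the allele on that chromosome.
    Returns 0.0 for a monomorphic site.
    """
    ranked = Counter(h[site] for h in haplotypes).most_common()
    if len(ranked) < 2:
        return 0.0
    return ranked[1][1] / len(haplotypes)

def common_sites(haplotypes, threshold=0.10):
    """Indices of sites whose minor allele frequency exceeds `threshold`."""
    n_sites = len(haplotypes[0])
    return [i for i in range(n_sites)
            if minor_allele_frequency(haplotypes, i) > threshold]

# Ten invented chromosomes typed at three sites.
haps = ["ACG", "ACG", "ATG", "ACG", "ATG",
        "ACG", "ACG", "ACG", "ACG", "GCG"]
# Site 0 (MAF exactly 0.1) and site 2 (monomorphic) fail the strict > 0.1 test.
print(common_sites(haps))  # [1]
```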

Figure 1: LTA protein production is associated with LTA haplotype.

(a) Haplotype structure at the LTA locus. SNPs present at an allele frequency >0.1 in the panel of 26 LCLs are shown (minor allele frequencies given in brackets) together with haplotypes (haplotype frequencies shown at the head of the columns). SNP designations refer to the nucleotide position relative to the transcriptional start site. Clade A is uniquely defined by LTA+10A, LTA+252G and LTA+723A; clade B is uniquely defined by LTA+80A and LTA+368C; and clade C is uniquely defined by LTA+438(10) and LTA+495C. The frequencies of these main haplotype clades were 0.27, 0.46 and 0.27, respectively. (b) Analysis of LTA production by haplotype pair. LTA concentration per 10⁶ cells in cell culture supernatant from the panel of LCLs is shown as a scatter plot, with one symbol per cell line and the geometric mean indicated by horizontal lines. A total of six replicate experiments were done. Cell lines heterozygous with respect to haplotypes A, B and C are categorized at each time point according to the pair of haplotypes present (A+B, A+C or B+C). At 24 h after stimulation, cells bearing haplotypes B+C or B+A were associated with significantly lower LTA concentrations than those bearing haplotypes A+C in a paired t-test (two-tailed; 226.7 versus 377.2, P = 0.001 and 260.5 versus 377.2, P = 0.031 for comparison of haplotypes B+C with A+C and of haplotypes B+A with A+C, respectively), whereas B+C versus B+A showed no difference (226.7 versus 260.5, P = 0.12). Similarly, after 48 h, cells bearing haplotypes B+C or B+A were associated with significantly lower LTA concentrations than those bearing haplotypes A+C in a paired t-test (two-tailed; 371.3 versus 557.0, P = 0.001 and 407.7 versus 557.0, P = 0.007 for comparison of haplotypes B+C with A+C and of haplotypes B+A with A+C, respectively), whereas B+C versus B+A showed no difference (371.3 versus 407.7, P = 0.25).

We examined the same panel of 26 LCLs before and after stimulation with a mitogen (phorbol myristate acetate (PMA) plus ionomycin) that strongly induces LTA in this cellular model. In six replicate stimulation experiments, we determined LTA concentration in cell culture supernatants by ELISA 0, 24 and 48 h after stimulation and normalized this concentration to cell numbers at each time point. When we analyzed SNPs individually, LTA−294T, LTA−293A, LTA+10A, LTA+80A, LTA+252G, LTA+368C, LTA+723A, TNF−863A and TNF−308A all showed significant associations with LTA production after stimulation and, to a lesser extent, in the resting state (Supplementary Table 1 online). This does not mean that all these SNPs are functional, as the high linkage disequilibrium at LTA distributes associations across the locus.
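The normalization to cell number and the per-group geometric means reported in Fig. 1b can be sketched as follows. This assumes that normalization is a simple division of supernatant concentration by cell count, rescaled to 10⁶ cells; the units and the `records` values are invented for illustration.

```python
import math

def per_million_cells(concentration, cell_count):
    """Rescale a supernatant LTA concentration to a per-10^6-cells basis
    (assumes normalization is a simple division by cell number)."""
    return concentration * 1e6 / cell_count

def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(values))); values must be > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def geometric_means_by_pair(records):
    """records: iterable of (haplotype_pair, concentration, cell_count)."""
    groups = {}
    for pair, conc, cells in records:
        groups.setdefault(pair, []).append(per_million_cells(conc, cells))
    return {pair: geometric_mean(vals) for pair, vals in groups.items()}

# Invented measurements: (haplotype pair, LTA concentration, cell count).
records = [
    ("A+C", 400.0, 2e6), ("A+C", 900.0, 3e6),
    ("B+C", 100.0, 1e6), ("B+C", 400.0, 1e6),
]
print(geometric_means_by_pair(records))
```

A geometric rather than arithmetic mean is the natural summary for concentrations that vary multiplicatively between cell lines, which is presumably why the figure uses it.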

As a starting point for identifying functional variants, we carried out stepwise linear regression analysis using as predictive variables either all SNPs with >10% allele frequency or the three main haplotype clades, with induced LTA protein production as the dependent variable. Analysis of individual SNPs identified LTA+368C and LTA+80A (which in this data set always occur together) as the only significant predictive variables of LTA protein production (at 48 h after mitogen stimulation, regression coefficient (b) = 143.9, standard error (s.e.) = 61, P = 0.027). These two SNPs uniquely define haplotype clade B, which was the only significant predictor when we entered the three haplotype clades as predictive variables of induced LTA production (at 24 h after mitogen stimulation, b = 129.6, s.e. = 61.8, P = 0.048; at 48 h after mitogen stimulation, b = 176.8, s.e. = 65.2, P = 0.013). Cells carrying haplotype clade B thus produce, on average, less LTA than other cells (Fig. 1b), consistent with our previous association of this clade with low polymerase II loading and transcript abundance17.
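The core computation behind one step of forward stepwise selection can be sketched with ordinary least squares on one predictor at a time. Each candidate (for example a per-cell-line count of a SNP allele or haplotype clade; this coding is a hypothetical choice) is regressed against LTA production, and the predictor with the largest |t| = b/s.e. is selected. This is a simplified sketch, not the SPSS procedure used in the study: a full stepwise analysis also tests entry and removal P values from the t distribution and refits with already-selected variables, both of which are omitted here.

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (b, se_b, t), where se_b is the standard error of the slope
    and t = b / se_b.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, se_b, b / se_b

def forward_step(predictors, y):
    """First step of forward selection: among candidates (name -> values),
    return (name, b, se_b, t) for the predictor with the largest |t|."""
    best = None
    for name, x in predictors.items():
        b, se, t = simple_ols(x, y)
        if best is None or abs(t) > abs(best[3]):
            best = (name, b, se, t)
    return best

# Invented data for four cell lines and two candidate predictors.
y = [1, 3, 2, 5]
print(forward_step({"snp1": [0, 1, 2, 3], "snp2": [1, 0, 1, 0]}, y))
```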

To explore possible molecular explanations, we investigated the haplotype-defining SNPs for effects on protein-DNA interactions in vitro using electrophoretic mobility shift assays (EMSAs). In addition to LTA+368C and LTA+80A, we studied LTA−293A, which defines a large subset of clade B. When we incubated nuclear extracts from unstimulated and mitogen-stimulated LCLs with oligonucleotide probes spanning these SNPs, we observed allele-specific binding with probes spanning LTA+80 (Fig. 2a). A slowly migrating inducible complex bound to the C allele but not the A allele (complex I), whereas two constitutive complexes bound only to the A allele (complexes II and III). Competition EMSA experiments showed the specificity of allelic binding for these complexes (Fig. 2b). In contrast, LTA−293 showed no allele-specific differences in protein-DNA binding, and LTA+368 showed a very slowly migrating constitutive complex that seemed to be specific for the C allele and was not investigated further (Fig. 2a).

Figure 2: Protein-DNA binding modulated by SNPs found on the functional haplotype.

(a) EMSAs using nuclear extracts prepared from unstimulated (open circle) or mitogen-stimulated (filled circle) LCLs incubated with radioactively labeled probes spanning the sites of each of the SNPs in the two allelic forms. In this and all subsequent EMSAs, the location of the wells is indicated by an asterisk. (b) Competition EMSAs using radioactively labeled probes with either C (lanes 1–8) or A (lanes 9–16) at the LTA+80 SNP in a binding reaction with nuclear extract prepared from unstimulated LCLs. Complex I was specifically competed by 10× molar excess of unlabeled LTA+80C competitor probe, whereas unlabeled LTA+80A probe competed only at 100× molar excess. Complexes II and III were abolished by 100× molar excess of LTA+80A, in contrast to unlabeled competitor LTA+80C, which competed only weakly for binding. Molar excess of unlabeled probe matching an unrelated consensus binding site for Oct-1 did not compete for binding by complexes I, II or III, in contrast to a number of faster migrating complexes, which seem to be non-specific.

Several lines of evidence suggested that the allele-specific effects of the polymorphism at LTA+80 might involve the basic helix-loop-helix (bHLH) family of transcription factors. First, the sequence spanning the SNP at LTA+80 shows homology with the E2-box consensus motif bound by class A bHLH proteins (CAGCTG)18: there is a 1-bp mismatch (at the fifth position) in the presence of an A at LTA+80 (CAGCAG) and a 2-bp mismatch (at the second and fifth positions) with a C at LTA+80 (CCGCAG). Second, when a probe matching a classical E-box binding element from the immunoglobulin heavy-chain (IgH) intronic enhancer19, μE5, was radioactively labeled and used in an EMSA with nuclear extracts from unstimulated LCLs, we observed two protein-DNA complexes that comigrated with complexes II and III observed with LTA+80A (Fig. 3a). These comigrating complexes were competed away by molar excess of unlabeled probe matching LTA+80A but only weakly by LTA+80C (Fig. 3a).
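The mismatch counts above can be checked mechanically. The sketch compares each allelic version of the hexamer quoted in the text with the E2-box consensus on the same strand only; a real motif scan would also consider the reverse complement and flanking sequence, which is left out here for brevity. The helper name `mismatches` is illustrative.

```python
def mismatches(site, motif):
    """Return the 1-based positions at which `site` differs from `motif`."""
    if len(site) != len(motif):
        raise ValueError("site and motif must have the same length")
    return [i + 1 for i, (s, m) in enumerate(zip(site, motif)) if s != m]

E2_BOX = "CAGCTG"  # E2-box consensus as given in the text

print(mismatches("CAGCAG", E2_BOX))  # LTA+80A hexamer: [5]
print(mismatches("CCGCAG", E2_BOX))  # LTA+80C hexamer: [2, 5]
```

Position 2 is the polymorphic base itself, whereas the position-5 mismatch is shared by both alleles.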

Figure 3: In vitro recruitment of E-box binding proteins at LTA+80A.

(a) Competition EMSAs using either a radioactively labeled probe matching LTA+80A or a consensus E-box binding element (μE5) with nuclear extract prepared from unstimulated LCLs in the presence of 10× and 100× molar excess of unlabeled competitor probes as shown. (b) Supershift EMSAs using radioactively labeled probe spanning LTA+80A with nuclear extract prepared from unstimulated LCLs incubated in the presence of antibodies as indicated. Unstimulated nuclear extracts were used in these experiments as complexes II and III were present constitutively.

Antibody studies of the allele-specific complexes II and III, which bound to the sequence spanning LTA+80A on EMSA, provided preliminary evidence that at least two E-box proteins were present (Fig. 3b). Complex II was supershifted on EMSA using nuclear extracts from LCLs in the presence of antibodies to ABF-1. ABF-1 is a class II bHLH family member, recently identified in LCLs, that is specifically expressed in lymphoid tissue20. We found no evidence of supershift with antibodies to the usual heterodimeric partner of ABF-1, the bHLH protein E47. Complex III was supershifted with antibodies to AP-4, a member of the bHLH-Zip subgroup of E-box binding proteins21. These data indicate that, in crude nuclear extracts, at least two E-box proteins are recruited in vitro to LTA+80A in an allele-specific manner.

To examine allele-specific binding by candidate E-box binding proteins further, we expressed ABF-1, E47 and AP-4 in COS-7 cells. We incubated protein extracts with radioactively labeled oligoduplexes corresponding to LTA+80C, LTA+80A and μE5, and analyzed them by EMSA (Fig. 4a,b). ABF-1 expressed alone did not bind any of the probes, consistent with previous observations that ABF-1 normally binds as a heterodimer20. In contrast, E47, which is capable of binding as a homodimer22, bound the consensus μE5 probe and, more weakly, the LTA+80A probe; no binding was seen for LTA+80C. When we coexpressed ABF-1 with E47, we observed binding to μE5 and LTA+80A but no binding to the LTA+80C allele probe. The complex was supershifted in the presence of antibody to ABF-1 but not E47. The specificity of the antibody to ABF-1 was confirmed by western blotting (see Supplementary Fig. 1 online). AP-4 bound strongly to both the μE5 and LTA+80A allele probes but only very weakly to the LTA+80C allele probe (Fig. 4c). These data indicate that the sequences spanning LTA+80 show allele-specific differences in their ability to bind E-box proteins in vitro.

Figure 4: Differential binding of in vitro expressed E-box binding proteins.

We analyzed total protein extracts from COS-7 cells transfected with either an empty vector (V) or with plasmids expressing ABF-1, E47 or AP-4 by EMSA. Control lanes show, in order, radioactively labeled probe alone and radioactively labeled probe incubated with protein extracts from nontransfected cells. (a) Protein extracts from COS-7 cells transfected with E47 and ABF-1 expression plasmids singly or together and incubated with a radioactively labeled probe for consensus E-box binding element, μE5. (b) Protein extracts from COS-7 cells transfected with E47 and ABF-1 expression plasmids singly or together and incubated with radioactively labeled probe spanning LTA+80 with either C or A at the site of the SNP. (c) Protein extracts from COS-7 cells transfected with AP-4 expression plasmid analyzed for binding to radioactively labeled probes for consensus E-box binding element, μE5, or probes spanning LTA+80 with either C or A at the site of the SNP.

We then investigated whether the allele-specific recruitment of E-box binding proteins observed in vitro modulated levels of gene expression, using transient transfection assays in which these proteins were overexpressed together with constructs bearing the LTA+80 SNP. We engineered reporter gene constructs in which 20-bp DNA fragments of the LTA 5′ untranslated region (UTR), corresponding to each of the two allelic variants of the LTA+80 SNP, were placed upstream of an SV40 promoter in a pGL3 luciferase reporter, and transiently transfected these constructs into COS-7 cells. When cotransfected with ABF-1 and E47, the LTA+80A construct had significantly lower reporter gene activity, expressed relative to the SV40 promoter construct alone, than did the LTA+80C construct (Fig. 5). This effect was dose dependent, with repression increasing as we cotransfected increasing amounts of ABF-1/E47 expression plasmids (Fig. 5). We found no significant difference between the two LTA reporter gene plasmids (P > 0.05) at any amount of AP-4 expression plasmid cotransfected (0.0625–0.5 μg). These data provide evidence that ABF-1 can modulate reporter gene expression in an allele-specific manner, in contrast to AP-4, which bound the LTA+80A allele more strongly in vitro but produced no allele-specific difference in expression after cotransfection.

Figure 5: Allele-specific modulation of gene expression by ABF-1 and E47.

We transiently transfected COS-7 cells with pGL3.prom reporter gene alone or with pGL3.prom reporters containing a 20-bp DNA fragment matching the LTA 5′ UTR (nucleotides +72 to +91) bearing C or A at the site of the LTA+80 SNP placed upstream of the SV40 promoter (designated pGL3.promLTA+80C and pGL3.promLTA+80A, respectively). We transfected increasing amounts of empty expression vector or expression vectors for ABF-1 and E47 with the reporter constructs, including pBSK as required to keep the total amount of expression plasmid constant (0.5 μg). The mean ± s.e.m. of five independent transient transfection experiments is shown, together with paired t-tests comparing constructs pGL3.promLTA+80A and pGL3.promLTA+80C (NS, P > 0.05; *P = 0.02; **P = 0.005; ***P = 0.003). Results for the pGL3.promLTA constructs are expressed relative to pGL3.prom, using firefly luciferase activity normalized to Renilla luciferase activity and expressed relative to the empty expression plasmid. When we transfected the E47 expression construct in the absence of ABF-1, we found no difference between levels of activity of the two LTA constructs using 0.0625, 0.125 or 0.25 μg of E47 expression plasmid. There was, however, a small but significant difference between LTA constructs using 0.5 μg of E47 expression plasmid (P = 0.043 in a paired t-test of five independent transfection experiments): mean luciferase activity of pGL3.promLTA+80A relative to pGL3.prom was 1.27 (s.d. 0.16) compared with 1.00 (s.d. 0.28) for pGL3.promLTA+80C relative to pGL3.prom. For pGL3.prom transfected alone into 4 × 10⁵ COS-7 cells, mean luminometer values of 2,398 light units were obtained.
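The chain of normalizations described in this legend (firefly to Renilla, then relative to pGL3.prom, then relative to the empty expression vector) and the paired t statistic used to compare constructs can be sketched as below. Function names are hypothetical; the P value would come from the t distribution with n − 1 degrees of freedom (two-tailed critical value about 2.78 for five experiments at α = 0.05), which the Python standard library does not provide, so only t is computed.

```python
import math
import statistics

def construct_vs_prom(ff, rl, ff_prom, rl_prom):
    """Renilla-normalized activity of an LTA construct relative to pGL3.prom."""
    return (ff / rl) / (ff_prom / rl_prom)

def fold_over_empty_vector(with_factor, with_empty_vector):
    """Express a construct_vs_prom value relative to the empty-vector transfection."""
    return with_factor / with_empty_vector

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched experiments."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Invented relative activities from three matched experiments.
t, df = paired_t([1.0, 2.0, 3.0], [0.5, 1.0, 2.5])
print(t, df)
```

Pairing by experiment removes day-to-day variation in transfection efficiency shared by both constructs, which is why a paired rather than unpaired test suits this design.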

The functional data from assays of protein-DNA binding demonstrated the capacity of the E-box binding proteins ABF-1 and AP-4 to bind in vitro in an allele-specific manner, modulated by the LTA+80 polymorphism. To seek evidence that these proteins were recruited in vivo, we investigated binding of ABF-1 and AP-4 at the LTA promoter and 5′ UTR by chromatin immunoprecipitation (ChIP) using an LCL heterozygous with respect to LTA+80 (Fig. 6a). ABF-1 and AP-4 were recruited to the region, with AP-4 present constitutively but not after mitogen stimulation. We then carried out haplotype-specific ChIP (haploChIP), using the LTA+80 polymorphism as a marker, to analyze allele-specific occupancy by ABF-1 and AP-4 in each of four B cell lines heterozygous with respect to LTA+80. We previously showed that relative allelic abundance can be accurately quantified by this method using primer extension–mass spectrometry (PE-MS)17. We generated standard curves for the LTA+80 polymorphism by analyzing known input ratios of plasmid DNA for the two allelic forms ranging from 1:4 to 4:1; the ratios observed by PE-MS showed a strong linear relationship with the input ratios (r² = 0.96). HaploChIP analysis of ABF-1 then showed progressively more binding to the LTA+80A allele over time after mitogen stimulation (one-way analysis of variance test for linear trend over time, P = 0.004, r² = 0.349; Fig. 6b). In contrast, the constitutive binding that we observed for AP-4 showed no significant allelic difference, with a mean LTA+80C:LTA+80A ratio of 1.08 (95% confidence interval (c.i.) = 0.56–1.25).
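A standard curve of the kind described here can be sketched as a least-squares line of observed against known input ratios, with r² as the goodness-of-fit measure. The text does not say on which scale the fit was done; this sketch assumes a log2 scale so that 1:4 and 4:1 sit symmetrically about zero, and the observed readings are invented.

```python
import math

def standard_curve(input_ratios, observed_ratios):
    """Fit log2(observed) = intercept + slope * log2(input) by least squares.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log2(r) for r in input_ratios]
    ys = [math.log2(r) for r in observed_ratios]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Known C:A input ratios from 1:4 to 4:1, with invented PE-MS readings.
inputs = [0.25, 0.5, 1.0, 2.0, 4.0]
observed = [0.30, 0.55, 1.20, 2.10, 4.60]
print(standard_curve(inputs, observed))
```

A non-zero intercept on this scale corresponds to a constant multiplicative bias between alleles, the kind of systematic PE-MS signal difference that normalization to input ratios is meant to remove.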

Figure 6: Allele-specific occupancy by ABF-1 in vivo.

(a) We analyzed recruitment of ABF-1 and AP-4 at the LTA promoter and 5′ UTR (nucleotides +171 to −46) by ChIP using an LCL stimulated with ionomycin and PMA (lanes 1–6) for the times shown. Input shows the products of PCR reactions containing 0.5% of the total amount of chromatin used in immunoprecipitation reactions. Mock immunoprecipitation reactions using irrelevant control antibodies (against T antigen pAb101) are shown together with immunoprecipitations done using antibodies specific for ABF-1 or AP-4. (b) Allele-specific loading of ABF-1 at the LTA+80 SNP, analyzed by PE-MS. We stimulated four cell lines heterozygous with respect to the LTA+80 SNP with ionomycin and PMA. The overall geometric mean ratios (± 95% c.i.) of ABF-1 loading between the C and A alleles are shown.

Phenotypic variation in LTA production has previously been correlated with the LTA+252 polymorphism4,13,14,16. Here we identified nine different polymorphisms that correlate with LTA protein production. When LTA sequence variation is analyzed in detail using multivariate statistics, this complex pattern of association can be reduced to the effects of a specific low-producing haplotype clade. This commonly occurring haplotype is uniquely defined by two markers, one of which, LTA+80, was found in vitro to modulate the binding of two E-box binding proteins, ABF-1 and AP-4, both of which bound preferentially to the LTA+80A allele. ABF-1 is a recently described transcription factor that has been particularly associated with lymphoid tissue20,23, raising the possibility that the functional effects of the LTA+80 polymorphism are tissue-specific. ABF-1 repressed transcription in our reporter gene model when the LTA+80A allele was present, consistent with previous reports that it acts as a transcriptional repressor20,23, whereas AP-4 had no effect on allele-specific expression.

The haploChIP method provides a powerful new tool for investigating whether regulatory polymorphisms alter specific DNA-protein interactions in vivo, that is, in their natural chromatin structure in living cells. The method was initially developed to assess the effects of allelic variation on polymerase II binding, as a surrogate marker of gene expression17. Here we show how it may be used to acquire specific information about the mode of action of a putative regulatory polymorphism. When we tested the hypothesis, generated from our in vitro data, that the LTA+80 polymorphism modulates binding of ABF-1 and AP-4, haploChIP revealed allele-specific differences in binding in vivo for ABF-1, whereas AP-4 showed constitutive binding that was unaffected by the polymorphism. The AP-4 ChIP signal may emanate from a nonpolymorphic binding site nearby, such as the E-box motif located between nucleotides −317 and −312 relative to the transcriptional start site; because sonicated chromatin fragments are typically about 1 kb long, fragments immunoprecipitated through that site could also carry the amplified region.

Can the functional effects of the LTA+80A allele explain the reported epidemiological associations between polymorphisms in the LTA region and susceptibility to common diseases4,5,6,7,8,9,10,11,12? Our analysis of haplotype structure in 52 chromosomes of European origin confirmed high linkage disequilibrium across the 10-kb region. The LTA+80A allele was present on 70% of chromosomes analyzed. The LTA+252G allele was not present on any of the chromosomes that carried the LTA+80A allele but was present on 57% of chromosomes that did not. Thus, in multivariate analysis, the significant association that we observed between the LTA+252G allele and high LTA protein production could be explained entirely by the ability of the LTA+80A allele to suppress LTA production. Analysis of larger data sets or different experimental models may identify other functional variants at the LTA locus4, but on the basis of these data we can at least state that LTA+80A is a good functional candidate to explain the reported disease associations of the LTA+252G allele, a possibility that future epidemiological studies need to address.

Methods

Plasmids and haplotype construction.

To generate the pGL3.promLTA+80C and pGL3.promLTA+80A constructs, we designed oligonucleotides spanning the LTA+80 site (sequences are available on request), which we annealed and cloned upstream of the SV40 promoter into a KpnI–BglII-digested, dephosphorylated pGL3-promoter vector (Promega). We verified all constructs by DNA sequencing. Expression plasmids used were pHBA.FLAG-ABF-1 (ref. 20), pHBA-E47 (ref. 20), pHBAPr-1-neo24 and pcDNA3-AP-4MD25. To generate the haplotypic data for the panel of LCLs, we carried out long-range PCR from genomic DNA, cloning and sequencing as previously described17, and used PolyPhred alignment to create haplotypes from each hemizygous clone. Contig locations (GenBank accession number Y14768) for the SNPs described were as follows: LTA−294, 47622; LTA−293, 47621; LTA+10, 47318; LTA+80, 47248; LTA+252, 47076; LTA+368, 46960; LTA+438, 46890; LTA+495, 46833; LTA+723, 46605; TNF−1031, 45081; TNF−863, 44913; TNF−308, 44358.
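The contig coordinates listed above are consistent with a fixed offset per gene. This offset is inferred from the listed values, not stated in the text: every LTA site satisfies contig = 47328 − position, and every TNF site satisfies contig = 44050 − position. The sketch below encodes these inferred offsets and checks them against all twelve listed sites. `OFFSETS` and `contig_position` are hypothetical names.

```python
# Offsets inferred from the coordinates listed in the Methods (an assumption):
# both genes run towards decreasing coordinates in contig Y14768.
OFFSETS = {"LTA": 47328, "TNF": 44050}

# TSS-relative positions and contig coordinates exactly as listed in the text.
LISTED = {
    ("LTA", -294): 47622, ("LTA", -293): 47621, ("LTA", 10): 47318,
    ("LTA", 80): 47248, ("LTA", 252): 47076, ("LTA", 368): 46960,
    ("LTA", 438): 46890, ("LTA", 495): 46833, ("LTA", 723): 46605,
    ("TNF", -1031): 45081, ("TNF", -863): 44913, ("TNF", -308): 44358,
}

def contig_position(gene, tss_relative):
    """Map a TSS-relative SNP designation to a Y14768 contig coordinate."""
    return OFFSETS[gene] - tss_relative

# Every listed coordinate is reproduced by the two inferred offsets.
assert all(contig_position(g, p) == c for (g, p), c in LISTED.items())
```

Under the same assumption, the ChIP amplicon from −46 to +171 would span contig positions 47374 to 47157.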

Nuclear extracts, EMSAs, western blotting and antibodies.

We annealed oligonucleotide probes (sequences are available on request) and used Klenow fragment to radioactively label the resulting oligoduplexes LTA+80C, LTA+80A, LTA+368G, LTA+368C, LTA−293G, LTA−293A, Oct-1 and μE5 with [α-³²P]dCTP (Amersham Pharmacia Biotech). We prepared nuclear extracts from 20 × 10⁶ LCLs as previously described26,27. We mixed 2 μg of nuclear extract with 0.2–0.5 ng of labeled probe (1–5 × 10⁴ c.p.m.) in a 10-μl reaction containing 12 mM HEPES buffer (pH 7.8), 100 mM NaCl, 1 mM EDTA, 1 mM EGTA, 12% glycerol and 0.5 μg poly(dA-dT) (Amersham Pharmacia Biotech) and incubated it at room temperature for 20 min. We analyzed the reaction by electrophoresis in a nondenaturing polyacrylamide gel at 4 °C in 0.5× TBE buffer. For supershift analysis, we preincubated nuclear extract with appropriate antiserum at room temperature for 20 min before adding the radioactively labeled probe. We used the following antibodies for supershift studies: ABF-1 (sc9555) and Oct-1 (sc232; Santa Cruz); monoclonal antibody to human E47/E12 (554199; BD Biosciences); and AP-4 (ref. 28). We transfected COS-7 cells with the expression constructs pHBA.FLAG-ABF-1, pHBA-E47, pHBAPr-1-neo, pcDNA3-AP-4MD and pcDNA3 and then prepared total protein extracts by lysing cells in lysis buffer (20 mM Tris-HCl (pH 8), 300 mM NaCl, 0.1% Nonidet P-40, 10% glycerol) supplemented with Complete protease inhibitors (Boehringer Mannheim). We carried out western blotting of total protein in COS-7 cell extracts by fractionation on a 4–12% NuPAGE Bis-Tris polyacrylamide gel (Invitrogen) in 1× MOPS buffer, followed by transfer to a PVDF membrane (Invitrogen) and incubation with primary antibody in 4% dried milk powder with 0.1% Tween in phosphate-buffered saline. We used primary antibodies to ABF-1 (sc9555) at 1:500 dilution and to β-tubulin 1 (T-7816; Sigma) at 1:1,000,000 dilution. We visualized bands by chemiluminescence with horseradish peroxidase–conjugated rabbit antibody to goat and goat antibody to mouse, respectively, using ECL Plus (Amersham).
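For the competition EMSAs (10× and 100× molar excess of unlabeled probe), the competitor mass needed follows from the probe amount and the duplex lengths. Probe sequences are not given, so the 20-bp length used below is hypothetical, as is the approximate average mass of 650 g/mol per base pair of double-stranded DNA.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol);
# a common rule of thumb, treated here as an assumption.
MASS_PER_BP = 650.0

def duplex_fmol(mass_ng, length_bp):
    """Convert a mass (ng) of double-stranded oligonucleotide to femtomoles."""
    grams = mass_ng * 1e-9
    return grams / (length_bp * MASS_PER_BP) * 1e15

def competitor_mass_ng(probe_ng, fold_excess, probe_bp, competitor_bp):
    """Mass of unlabeled competitor giving `fold_excess` molar excess over the probe."""
    return probe_ng * fold_excess * competitor_bp / probe_bp

# Hypothetical 20-bp probe at the top of the 0.2-0.5 ng range used above.
print(round(duplex_fmol(0.5, 20), 1))
for fold in (10, 100):
    print(fold, competitor_mass_ng(0.5, fold, 20, 20))
```

When probe and competitor have the same length, molar excess reduces to mass excess, so the per-base-pair mass cancels out of `competitor_mass_ng`.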

Cell culture, transfections and luciferase assay.

We obtained EBV-transformed B cell lines and cultured them as described previously17. We stimulated cells with 200 nM PMA (Sigma) and 125 nM ionomycin (Sigma) unless otherwise stated. We assayed LTA concentration in cell culture supernatants in duplicate using an LTA ELISA kit (AMS Biotechnology) according to the manufacturer's instructions. We analyzed data by linear regression using SPSS software. We cultured COS-7 cells in Dulbecco's modified Eagle medium supplemented with 10% fetal bovine serum, 100 U ml−1 penicillin, 100 μg ml−1 streptomycin and 0.2 mM L-glutamine at 37 °C in a humidified atmosphere of 5% CO2. We used PolyFect (Qiagen) for transient transfection of expression constructs, with and without reporter gene constructs, into COS-7 cells following the manufacturer's protocol. We transfected 4 × 10⁵ COS-7 cells with 1.5 μg of total DNA (0.5 μg of expression plasmid or pBSK, 0.167 μg of pGL3 reporter and 0.833 μg of pRL.TK) and collected them after 48 h. We cotransfected pRL.TK to normalize for transfection efficiency and kept the total amount of DNA used in each transfection experiment constant by addition of pBSK. We lysed cells in 500 μl of 1× lysis buffer (Promega) and analyzed 10 μl using the Dual-Luciferase reporter assay kit (Promega) and a Turner luminometer model 20 (Promega) following the manufacturer's protocol.
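The DNA mix for each transfection can be laid out as a small calculation: the 0.5-μg expression slot is filled with expression plasmid and topped up with pBSK, while reporter and pRL.TK amounts stay fixed, giving 1.5 μg in total. The sketch assumes, hypothetically, that the doses tested (0.0625–0.5 μg) refer to the total expression plasmid in the slot; how ABF-1 and E47 plasmids were split in cotransfections is not stated.

```python
# Per-transfection amounts (ug) as stated in the Methods.
TOTAL_UG = 1.5
EXPRESSION_SLOT_UG = 0.5   # expression plasmid topped up with pBSK
REPORTER_UG = 0.167        # pGL3 reporter
RENILLA_UG = 0.833         # pRL.TK

def transfection_mix(expression_ug):
    """DNA amounts (ug) for one transfection of 4 x 10^5 COS-7 cells."""
    if not 0.0 <= expression_ug <= EXPRESSION_SLOT_UG:
        raise ValueError("expression plasmid must be between 0 and 0.5 ug")
    mix = {
        "expression plasmid": expression_ug,
        "pBSK": EXPRESSION_SLOT_UG - expression_ug,
        "pGL3 reporter": REPORTER_UG,
        "pRL.TK": RENILLA_UG,
    }
    # Sanity check: the total should always equal the stated 1.5 ug.
    assert abs(sum(mix.values()) - TOTAL_UG) < 1e-9
    return mix

for dose in (0.0625, 0.125, 0.25, 0.5):
    print(transfection_mix(dose))
```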

Chromatin immunoprecipitation.

We prepared chromatin from 5 × 10⁸ cells, sonicated it and purified it on a cesium chloride gradient as previously described17. We incubated Dynabeads M-280 precoated with appropriate sheep antibodies to mouse or to rabbit IgG overnight with primary antibodies at 4 °C in phosphate-buffered saline containing 5 mg ml−1 bovine serum albumin (Sigma). To generate magnetic beads coated with antibody to goat, we incubated Dynabeads M-280 with biotinylated antibodies to goat IgG (227-AB-2G2; AMS Biotechnology) for 30 min at room temperature and then incubated the washed beads with primary antibody as described. We used specific antibodies against ABF-1 and AP-4 as described for supershift analysis and also did mock antibody controls using antibody to actin (A2066; Sigma). After washing to remove unbound antibody, we incubated antibody beads overnight at 4 °C on a nutator with 50 μg of chromatin in 1× RIPA buffer29 containing 1× Complete protease inhibitors (Roche) and 5 μg ml−1 pepstatin (Roche). We washed the bead-chromatin immunoprecipitates eight times in 1× RIPA buffer containing 500 mM LiCl (Sigma), followed by a final wash in Tris-EDTA buffer. We resuspended beads in 120 μl of Tris-EDTA digestion buffer containing 1% SDS, incubated them at 65 °C for 10 min, pelleted beads by centrifugation and incubated the supernatant at 65 °C overnight to reverse cross-links. We added proteinase K to 200 μg ml−1 final concentration, incubated at 37 °C for 2 h, extracted the DNA using phenol and chloroform and precipitated it with ethanol in the presence of 20 μg of glycogen. We resuspended pellets in 30 μl of Tris-EDTA containing 10 μg of RNase A, incubated them at 37 °C for 1 h, purified the reactions by spin column and assayed them by semiquantitative PCR30.
We amplified the LTA region between −46 and +171 relative to the transcriptional start site (primer sequences available on request) with PCR cycling parameters of 96 °C for 1 min; then 5 cycles of 96 °C for 35 s, 68 °C for 45 s and 72 °C for 35 s; then 20 cycles of 96 °C for 35 s, 64 °C for 45 s and 72 °C for 40 s; and then a final 6 cycles of 96 °C for 35 s, 56 °C for 60 s and 72 °C for 90 s. We carried out haploChIP analysis using PE-MS as previously described17. We carried out first-round amplification with primers and cycling parameters as described for semiquantitative PCR, except that we used 27 rather than 20 cycles at the 64 °C annealing temperature. We carried out primer extension for LTA+80 (extension primer sequence available on request) using MassEXTEND (Sequenom) as described previously17. We immunoprecipitated each cell line twice and analyzed the products of each immunoprecipitation by four independent PCR reactions, each of which was in turn spotted in quadruplicate on the chip before PE-MS. As we have observed previously for certain SNPs17, we found a systematic allelic difference in PE-MS signal intensity for LTA+80 when we compared observed ratios with a range of known input ratios (from 1:4 to 4:1). ChIP sample ratios reported here were therefore normalized to the input ratios.
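The normalization to input ratios and the summary as a geometric mean ratio with a 95% interval (as in Fig. 6b) can be sketched as follows. This is a simplified sketch under stated assumptions: it pools replicate readings flatly rather than modeling the 2 immunoprecipitations × 4 PCRs × 4 spots hierarchy, and the interval uses a normal approximation on the log scale because the text does not state how the c.i. was computed. All values are invented.

```python
import math
import statistics

def input_corrected(chip_ratio, input_ratio):
    """Divide an observed C:A ratio in ChIP DNA by the C:A ratio in input DNA,
    cancelling a systematic allelic bias in PE-MS signal intensity."""
    return chip_ratio / input_ratio

def geometric_mean_ci(ratios, z=1.96):
    """Geometric mean of allele ratios with an approximate 95% interval,
    computed on the natural-log scale with a normal approximation."""
    logs = [math.log(r) for r in ratios]
    mean = statistics.mean(logs)
    se = statistics.stdev(logs) / math.sqrt(len(logs))
    return math.exp(mean), math.exp(mean - z * se), math.exp(mean + z * se)

# Invented C:A ratios for one cell line; an input ratio of 1.25 mimics a
# systematic bias towards the C-allele signal.
chip = [1.10, 1.40, 0.95, 1.30]
corrected = [input_corrected(r, 1.25) for r in chip]
print(geometric_mean_ci(corrected))
```

Working on the log scale treats a 2:1 and a 1:2 ratio as equally distant from parity, which is why the resulting interval is asymmetric around the geometric mean on the ratio scale.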

Note: Supplementary information is available on the Nature Genetics website.