INTRODUCTION

There is growing evidence that non-coding genetic variation has an important role in the determination of individual susceptibility to complex disease.1, 2 Single nucleotide variants (SNVs) account for approximately 90% of all known sequence variation, most of which is located in non-coding regions of the genome.3 It is therefore likely that a significant number of SNVs that influence phenotype, that is, functional variants, actually lie outside amino-acid coding regions of genes and affect the regulation of gene expression as opposed to protein sequence and function.4, 5 Current estimates from the 1000 genomes data indicate that there are as many non-coding functional variants as coding ones.3 One of the greatest challenges currently facing genetics is determining which non-coding genetic variants have a functional effect on gene expression rather than being neutral.

The primary purpose of this study is to determine which of the candidate variants identified previously using an integrative genetics approach are functional. We have previously used this approach to identify novel cis-acting genes that correlate with HDL cholesterol levels, a predictive determinant for cardiovascular disease.6 Leukocyte-based RNA samples from 1240 individuals in the San Antonio Family Heart Study (SAFHS)7 were used to obtain genome-wide transcriptional profiles. Analysis identified 67 cis-regulated transcripts whose expression levels significantly correlated with HDL-C levels. One gene, Vanin-1 (VNN1 (MIM No. 603570)), showed strong support for both cis-regulation and positive correlation with HDL-C levels. VNN1 is a pantetheine hydrolase (pantetheinase)8 that catalyses the hydrolysis of pantetheine to pantothenic acid (Vitamin B5) and cysteamine, a potent antioxidant.9, 10 Administration of pantethine, the dimer of pantetheine, to hyperlipidemic subjects results in the lowering of serum TG, low-density lipoproteins and ApoB, whilst increasing HDL-C and ApoA levels,11, 12 supporting our data implicating Vanin-1 in lipid biosynthesis (for a review, see Kaskow et al13).

The objective of the current study was the identification and molecular characterization of the functional cis-acting sequence variation in the promoter of VNN1. The ‘gold standard’ approach for determining whether cis-regulatory elements such as SNVs regulate gene expression consists of the reporter gene assay14, 15, 16 and the electrophoretic mobility shift assay (EMSA).17, 18 In the reporter gene assay, allelic promoter regions are cloned upstream of a reporter gene such as luciferase, transfected into relevant cell lines and assayed for reporter gene activity as a measure of transcriptional activity. Differences in activity between the allelic constructs are considered to reflect functional differences. The EMSA is used to detect allelic differences in transcription factor binding when comparing nucleotide sequence probes carrying specific SNVs (for a review of these approaches, see Knight19). Although these in vitro approaches have traditionally been considered adequate to ascertain the functionality of sequence variation, there are many variables that can confound the results and render the ascertainment of function inconclusive, as we have shown previously.20 This led us to develop strategies that assess function in an in vivo context.21 Here we use traditional approaches and also further develop in vivo approaches22 to validate statistically prioritized variants within the VNN1 promoter. In addition, we illustrate the potential of a novel, robust method for assessing variant functionality on a genome-wide level23 that we have developed based on next-generation sequencing technology. The combination of the ‘gold standard’ approaches and our more rigorous in vivo tests allows for a more confident assessment of whether a variant does indeed regulate gene expression and yields insights into the allele-specific mechanism of action.

METHODS

Nucleotide numbering

Numbering of nucleotides reflects transcription initiation site numbering as defined before.6 The transcription initiation site is defined as the nucleotide 13 basepairs upstream of the A of the ATG translational initiation codon of VNN1 (GenBank accession number NG_012147.1). In reference to the hg19 genome, the translation initiation codon is located at position chr6:133035174 with −137 (rs4897612:G>T; −150) at chr6:133035325 and −587 (rs2050153:G>A; −600) at chr6:133035775 according to 1-based numbering. The alternative translation initiation numbering is provided in brackets at the first mention in the text. All SNVs can be found in the dbSNP database.24
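The relationship between the two numbering schemes can be illustrated with a short sketch. It assumes, as is conventional but not stated explicitly here, that neither scheme has a position 0 (so −1 is followed directly by +1); the function names are hypothetical and chosen only for illustration. The offset of 13 is taken from the definition above.

```python
# Offset between the two schemes: the transcription initiation site (+1)
# lies 13 bp upstream of the A of the ATG, as defined in the text.
TSS_OFFSET = 13


def _to_index(pos: int) -> int:
    """Map a position onto a gap-free axis (assumes there is no position 0)."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering scheme")
    return pos if pos < 0 else pos - 1


def _from_index(idx: int) -> int:
    """Inverse of _to_index: restore the numbering without a zero."""
    return idx if idx < 0 else idx + 1


def tss_to_atg(pos: int) -> int:
    """Convert transcription-initiation numbering to translation-initiation numbering."""
    return _from_index(_to_index(pos) - TSS_OFFSET)


def atg_to_tss(pos: int) -> int:
    """Convert translation-initiation numbering to transcription-initiation numbering."""
    return _from_index(_to_index(pos) + TSS_OFFSET)


if __name__ == "__main__":
    for tss_pos in (-137, -587, -720):
        print(tss_pos, "->", tss_to_atg(tss_pos))
```

Under these assumptions the sketch reproduces the bracketed values used in the text, for example −137 (−150) and −587 (−600).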

Statistical genetics

The genotyping and statistical genetic methods used to prioritize the VNN1 promoter variants have been described previously by Goring et al.6 The population stratification, measured genotype, quantitative trait disequilibrium test (QTDT) and quantitative trait linkage disequilibrium (QTLD) results for the −137 SNP (rs4897612:G>T) were computed on the inverse-normalized VNN1 mRNA phenotype using the qtld command in the SOLAR computer package.25 This procedure performs a test for population stratification as well as the commonly used association tests, QTDT26 and the measured genotype test.27 The QTDT procedure is not limited to the scoring of allele transmission from parents to offspring but assesses the entire pedigree structure of the SAFHS cohort, thus increasing the power to detect an association.28 The measured genotype test uses a standard threshold model assuming an underlying normal distribution of liability. The measured genotype graph for the −137 SNP was generated using geom_boxplot in ggplot2.29
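For readers unfamiliar with inverse normalization, the following is a minimal sketch of a rank-based inverse normal transformation. It is not the SOLAR implementation: the rank offset of 0.5 and the averaging of tied ranks are assumptions made for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normalize(values, offset=0.5):
    """Rank-based inverse normal transform (illustrative; offset is an assumption).

    Each value is replaced by the standard normal quantile of
    (rank - offset) / n, with tied values sharing their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only.
    print(inverse_normalize([3.2, 1.1, 2.5]))
```

The transformed trait has a mean near zero and units that approximate standard deviations, which is why effect sizes from the measured genotype test can be read on an SD scale.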

Cell culture

The Jurkat E6-1 cell line (ATCC No. TIB-152) was cultured in RPMI-1640 with L-glutamine (Life Technologies Co., Grand Island, NY, USA) supplemented with 100 μg/ml of penicillin/streptomycin (Life Technologies Co.) and 10% heat-inactivated FBS (Life Technologies Co.). Cells were maintained at 37 °C with 5% CO2 according to cell-specific guidelines. Epstein-Barr virus-transformed lymphoblastoid cell lines were created from the SAFHS cohort. Cell lines A04210, A01223, A01518, A02238, A00263, A01721 and A01897 were maintained as described above.

Reporter gene assays

To create reporter gene constructs, approximately 2 kilobasepairs (kb) of the VNN1 proximal promoter from −2021 to +14 was cloned into the pGL4.10 vector between the Acc65I and HindIII restriction sites immediately upstream of the firefly luciferase reporter gene. A construct representing the −587G/−137G haplotype was initially generated and subsequently each variant was mutated to the other allele using the QuikChange II Site-Directed Mutagenesis kit (Agilent Technologies Pty Ltd, Melbourne, VIC, Australia) according to the manufacturer’s instructions. Genotypes were confirmed by sequencing.

Mycoplasma-free Jurkat E6-1 cells were transiently transfected using the Amaxa Nucleofector II system (Lonza Inc., Walkersville, MD, USA) according to the manufacturer’s instructions. In brief, 2 × 10⁶ cells were harvested on day 7 of growth and resuspended in solution V. One microgram of DNA and 100 ng pRL-TK control vector were transfected per reaction using the setting X-001. Twenty-four hours post transfection, the cells were harvested, and luciferase activity was measured using the Dual-Luciferase Reporter Assay System (Promega Co., Sydney, NSW, Australia). Statistical significance was determined by the Student’s unpaired t-test. P<0.05 was considered significant.
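The normalization and comparison described above can be sketched as follows. The luminescence readings are hypothetical values chosen for illustration and are not data from this study; the function names are likewise hypothetical. The sketch computes the firefly/Renilla ratio per replicate and the equal-variance (Student's) t statistic; the P-value would then be read from a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def normalized_activity(firefly, renilla):
    """Firefly signal divided by the co-transfected Renilla control, per replicate."""
    return [f / r for f, r in zip(firefly, renilla)]


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    # Hypothetical raw readings for two allelic constructs (n=3 each).
    allele_1 = normalized_activity([1500, 1600, 1400], [100, 100, 100])
    allele_2 = normalized_activity([1000, 1100, 900], [100, 100, 100])
    fold = mean(allele_1) / mean(allele_2)
    t, df = student_t(allele_1, allele_2)
    print(f"fold difference = {fold:.2f}, t = {t:.3f}, df = {df}")
```

Normalizing each replicate before averaging, rather than dividing mean firefly by mean Renilla, keeps well-to-well differences in transfection efficiency from inflating the variance.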

EMSA

Complementary 5′ biotinylated oligonucleotides representing both allelic forms of the promoter variants were obtained commercially (Operon Biotechnologies, Inc., Huntsville, AL, USA). The sequences of the EMSA oligonucleotides VNN137T, VNN137G, VNN587G and VNN587A are shown in Table 1. Nuclear extract preparation, binding reactions, gel analysis and detection were as described previously.6, 30 For EMSA supershift reactions, 500 ng human Sp1 supershift-grade antibody (Santa Cruz Biotechnology Inc., Santa Cruz, CA, USA) was incubated with the nuclear extract for 30 min on ice, then incubated with the labeled oligonucleotide for 30 min on ice before analysis.

Table 1 Sequences of EMSA and PCR oligonucleotides used in this study

Chromatin arrangement assay

Chromatin arrangement assays were performed as previously described22, 31, 32 with the following modifications. Nuclei were prepared and digested with nuclease using approximately 1 × 10⁷ genotyped cells from the SAFHS cohort. DNA was purified using the QIAamp DNA blood mini kit (Qiagen Pty Ltd, Doncaster, VIC, Australia). Quantitative real-time PCR was performed using 100 ng DNA, 0.5 μM of primers (−137 Chart F and −137 Chart R or −587 Chart F and −587 Chart R; see Table 1) and SensiMix SYBR No Rox (Bioline (Aust) Pty Ltd, Alexandria, NSW, Australia) in a final reaction volume of 20 μl. Thermal cycling conditions for the CFX96 Real-Time Detection System (Bio-Rad Laboratories Inc., Hercules, CA, USA) were as follows: 95 °C for 10 min; followed by 40 cycles of 95 °C for 15 s, 58 °C for 15 s and 72 °C for 15 s. Percentage of chromatin accessibility was calculated as a ratio of relative abundance of digested to undigested amplicon, a measurement calculated using the CFX Manager software (Bio-Rad Laboratories). Statistics were performed using GraphPad Prism v4 (GraphPad Software, San Diego, CA, USA). Statistical significance was determined by the Student’s unpaired t-test. P<0.05 was considered significant.
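One common way to turn quantification cycle (Cq) values into a percentage of nuclease cutting is sketched below. This is an assumption for illustration, not the calculation performed by the CFX Manager software: it treats the fraction of intact amplicon remaining after digestion as efficiency^(Cq undigested − Cq digested), assumes perfect doubling per cycle by default, and reports accessibility as the fraction of template lost. The function name and parameters are hypothetical.

```python
def percent_accessibility(cq_undigested, cq_digested, efficiency=2.0):
    """Estimate percentage of template cut by nuclease from paired Cq values.

    Assumes equal DNA input in both reactions and a fixed per-cycle
    amplification efficiency (2.0 means perfect doubling).
    """
    # Fraction of intact amplicon left in the digested sample relative to
    # the undigested control; a later Cq after digestion means less template.
    remaining = efficiency ** (cq_undigested - cq_digested)
    return 100.0 * (1.0 - remaining)


if __name__ == "__main__":
    # Hypothetical Cq values: a one-cycle delay corresponds to half the
    # template having been cut.
    print(percent_accessibility(25.0, 26.0))
```

In practice the efficiency would be estimated from a standard curve for each primer pair, and replicate Cq values would be averaged before the calculation.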

Bisulfite sequencing

Genomic DNA was isolated from cultured human leukocyte cell lines and purified using DNA Blood mini columns (Qiagen). Purified DNA was treated with bisulfite to convert unmethylated cytosine bases into uracils and subsequently purified using EpiTect spin columns (Qiagen). The converted DNA was enriched for the VNN1 gene promoter by PCR using primers VNNB F and VNNB R (Table 1) flanking the predicted CpG island (between −758 and −496 upstream of the VNN1 transcriptional start site). The amplification products were cloned into the pCR2.1-TOPO vector (Life Technologies Co.) for sequencing (Australian Genome Research Facility, Perth, WA, Australia). Sequences were analyzed using the online analysis program Bisulfite Sequencing DNA Methylation Analysis (BISMA) to identify the methylation state of predicted CG dinucleotides in CpG islands.33
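The principle behind calling methylation from a bisulfite-converted clone can be sketched in a few lines. This is a simplified illustration and not the BISMA algorithm: it assumes the clone sequence is already aligned to the unconverted reference without gaps, considers only the strand shown, and uses a hypothetical function name. After conversion and PCR, an unmethylated cytosine reads as T, whereas a methylated cytosine remains C.

```python
def call_cpg_methylation(reference, converted_read):
    """Return {position: True/False/None} for each CpG cytosine in the reference.

    True  = read has C (cytosine protected, i.e. methylated)
    False = read has T (cytosine converted, i.e. unmethylated)
    None  = any other base (uninformative, e.g. sequencing error or variant)
    Positions are 0-based indices into the reference string.
    """
    if len(reference) != len(converted_read):
        raise ValueError("sequences must be aligned and of equal length")
    ref = reference.upper()
    read = converted_read.upper()
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            base = read[i]
            if base == "C":
                calls[i] = True
            elif base == "T":
                calls[i] = False
            else:
                calls[i] = None
    return calls


if __name__ == "__main__":
    # Hypothetical sequences for illustration only.
    print(call_cpg_methylation("ACGTTCGACCGA", "ACGTTTGATCGA"))
```

A variant that replaces the G of a CpG removes the site from the reference for that allele, which is why allele-aware analysis is needed when a SNP falls inside a CG dinucleotide.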

Chromatin arrangement–sequencing (CHA-seq)

Nuclei were isolated from 1 × 10⁷ cells of an immortalized leukocyte cell line (ID: A01721) heterozygous at −137 and −587 of VNN1 identified by our previous sequence analysis of the VNN1 promoter.6 Isolated nuclei were digested with micrococcal nuclease (New England Biolabs Inc., Ipswich, MA, USA) for 20 min at 37 °C, and DNA was purified using DNA Blood mini columns (Qiagen). DNA fragment sizes, including mononucleosomal and dinucleosomal fragments as well as intervening sizes (140–280 basepairs), were isolated following agarose gel electrophoresis to maximize the capture of non-nucleosomal nuclease sensitivity. Purified fragments were prepared for next-generation sequencing using the NEBNext DNA sample prep kit (New England Biolabs) according to the manufacturer’s protocol. The prepared library was enriched for VNN1 sequences using the SureSelect DNA capture array (Agilent Technologies Inc., Santa Clara, CA, USA), then sequenced on an Illumina Genome Analyzer II (Illumina Inc., San Diego, CA, USA). The sequence data were analyzed using the Integrative Genomics Viewer (IGV) 2.0 software.34

RESULTS

Identification of putative functional variants within the VNN1 promoter

To identify cis-acting functional elements within the VNN1 promoter, we previously re-sequenced 2 kb of proximal promoter upstream from the VNN1 transcriptional start site and identified 22 promoter variants in the SAFHS founder population.6 The six best candidate variants were typed in all 1240 SAFHS samples to assess association with VNN1 transcript level in the larger population. Analysis revealed five variants strongly associated with VNN1 transcript abundance, with the −137 (rs4897612:G>T) and −587 (rs2050153:G>A) variants showing the strongest relationship. In the current study, the population stratification, measured genotype, QTDT and QTLD results for the −137 variant were computed on the inverse-normalized VNN1 mRNA phenotype. The measured genotype results for the −137 variant, demonstrating an additive genetic effect, are shown in Figure 1. The strong linkage disequilibrium (LD) between the −137 and −587 SNPs (ρ²=0.99) impeded further statistical prioritization, as the two variants were providing essentially identical genetic information. Functional validation was necessary to resolve whether one or both of the variants were responsible for influencing the expression of the mRNA.

Figure 1

Box and whisker plot for the measured genotype analysis of the −137 SNP on VNN1 mRNA abundance in leukocytes from the SAFHS participants. The absence of the G allele strongly impairs expression (P=5.7 × 10⁻⁸³). The mRNA phenotype has been inverse-normalized for this analysis, resulting in a population mean of zero, with units approximating the SDs.

‘Gold standard’ functional tests of the −137 and −587 VNN1 promoter SNPs

In a previous study, we reported the identification of several SNVs within the proximal promoter of the VNN1 gene that showed significant association. The variant most highly correlated with VNN1 transcript levels in leukocytes (−137) was also shown by EMSA to exhibit differential nuclear factor binding supporting a potential regulatory role.6 To more fully assess the functional attributes of the −137 SNP and the next most significantly correlated variant at position −587, we have now used the ‘gold standard’ approaches for characterizing variant functionality. Data from the interrogation of the ENCODE databases using RegulomeDB35 did not provide any convincing evidence that the SNPs were located in a transcriptionally active region, although there is a paucity of ENCODE data for lymphocytes, the cell type with which the original study was done. Also, no ChIP-seq data were available for transcription factor Sp1, which we have tentatively identified as being relevant previously,6 and no evidence was found for the binding of miRNA or other non-coding RNAs. For the gold standard tests, a 2-kb region of the VNN1 promoter was inserted upstream of a firefly luciferase reporter gene. A construct representing the major −137T haplotype (−137T/−587G) was generated initially and subsequently mutated to the other allele to allow the independent assessment of variants at each site. Reporter gene analysis indicated that the −137G allele showed significantly higher expression (1.36-fold, P=0.005) than the T allele (Figure 2a). The functional attributes of the −137 alleles were also tested in the context of both −587 alleles. The results indicated that the −587 SNP alleles did not influence transcriptional activity differences due to the −137 alleles in this assay.

Figure 2

Gold standard assays assessing −137 and −587 SNP functionality. (a) Relative transcriptional activity of VNN1 promoter variant haplotypes transfected in Jurkat E6-1 cells is shown as a measure of firefly luciferase activity normalized to Renilla activity (n=3, bars represent mean±SEM, **P<0.005). (b) EMSA of the −137 variant and surrounding sequences. Approximately 20 fmol radiolabeled probe was incubated with 3 μg Jurkat nuclear extract, and analyzed by gel electrophoresis. Unbound labeled DNA probe is indicated. Binding of a specific complex to the G allele is shown; addition of Sp1 antibody results in an upward shift of this complex, as shown by the arrows. (c) EMSA of the −587 SNP shows no allelic difference in transcription factor binding.

To determine whether the allelic difference in transcriptional activity was related to differences in transcription factor binding, EMSA was performed. The results confirmed our previous finding of an allele-specific difference in transcription factor binding at the −137 SNP. As previous bioinformatic analysis indicated the presence of a putative Sp1 transcription factor binding site in the −137G region,6 supershift EMSA was done using an anti-Sp1 antibody. The major complex interacting with the −137G sequence was supershifted with this antibody, indicating that the major −137G complex contained the activating transcription factor Sp1 (Figure 2b). This is consistent with the reporter gene analysis where the −137G allelic reporter showed an increase in transcriptional activity compared with the −137T allele. In contrast, no binding was seen for either the −587 G or A allele (Figure 2c). Taken together, these ‘gold standard’ tests would indicate that the −137 SNP is functional, whereas the −587 variant is not.

Allele-specific chromatin accessibility analysis of the −137 and −587 SNPs

One limitation of the ‘gold standard’ approaches is that they fail to assess variant function in its in vivo chromosomal context. As chromatin context is dependent on DNA sequence per se as well as methylation status, variants can potentially alter the conformation of the surrounding chromatin and result in distinct transcriptional potential and hence alter the regulation of a gene. Given that the −587 G variant allele occurs as part of a CpG dinucleotide and is a potential site for methylation, the development of a chromatin accessibility assay was particularly important in assigning function. To determine the endogenous chromatin conformation around both the −137 and −587 SNPs, genotyped lymphoblastoid cell lines from the SAFHS population were used to perform chromatin arrangement by real-time PCR (CHART-PCR) assays.22, 32 The results showed that −587 A/A homozygotes displayed a greater chromatin accessibility (27%) around the −587 site than −587 G/G homozygotes (10%, Figure 3). As expected, the −587 heterozygotes showed an intermediate level of accessibility (16%). Similar results were obtained for the −137 SNP, where the T allele (27%) showed a greater accessibility than the G allele (13%) with intermediate levels for the heterozygote −137 G/T lines (19%). Overall, the results indicate that both the −587 and the −137 SNPs differentially influence chromatin arrangement and potentially the expression of VNN1.

Figure 3

Chromatin accessibility differences at the −137 and −587 SNPs. MNase-treated cell lines of the genotypes shown were used to calculate chromatin accessibility expressed as a ratio of undigested/MNase digested (percentage of cutting). The data were obtained from five different cell lines for each genotype assayed in triplicate. Significance was determined by the Student’s t-test. Error bars represent mean±SEM, *P<0.05, **P<0.005, ***P<0.001.

Allele-specific methylation status of the −587 SNP region

Bioinformatic analysis36 of the VNN1 promoter region surrounding the −587 and −137 SNPs indicated the presence of a CpG island encompassing the −587 site, the predicted extent of which was different for each allele. For the −587 G allele (which is in LD with the −137 T allele), the region was 264 bp in length, and for the A allele, a predicted 204 bp region (Figure 4a). Although no data were available for lymphocytes, ENCODE data also supported the presence of a CpG island encompassing the −587 variant in brain tissue.35 As the chromatin accessibility assay results indicated that the −587 G variant was in a region of condensed chromatin, it was of interest to determine whether the −587 G nucleotide methylation status was directly responsible for the difference predicted. We therefore performed bisulfite sequence analysis on genotyped lymphoblastoid cell lines from the SAFHS population to assess the DNA methylation status of the VNN1 promoter CpG island in the region in which the −587 SNP was embedded (Figure 4b). The −587 G site in both homozygotes and heterozygotes shows near complete cytosine methylation (on the opposite strand). In fact, all CpG dinucleotides displayed heavy methylation, except at the −720 (−733) CpG site. For this site, the methylation pattern correlated with −587 genotype. On haplotypes carrying the −587 G allele, the −720 CpG site was generally unmethylated. Conversely, when the −587 A allele is present, the −720 CpG site was almost exclusively cytosine methylated. This pattern was present even in the heterozygous lines, indicating that the allele-specific pattern is not cell line specific but is rather determined by genotype. The functional consequence of this is yet to be determined but is likely responsible for the −587 allelic differences apparent in chromatin arrangement.

Figure 4

Methylation status of the −587 SNP. (a) Bioinformatic predictions of CpG island length. Black circles represent a predicted CpG island length of 264 bp when the −587 G allele is present. Grey circles represent a CpG island length of 204 bp when the −587 A allele is present. Grey boxes indicate methylated CG dinucleotides. R=G or A. (b) Bisulfite sequence analysis on genotyped lymphoblastoid lines from the SAFHS population. Line designations and −587 genotype are shown on the left. Black boxes indicate methylation. White boxes indicate no methylation. ‘A’ represents the alternate allele at −587, which is not a substrate for CpG methylation. The grey shading indicates heterozygote cell lines. Near complete methylation is seen at the −587 G allele. The methylation status at −587 appears to affect methylation at −720, a non-variant site.

Assessment of chromatin arrangement at the −587 SNP using a novel genome-wide assay

As our results suggested that the effects of allelic variation may be manifest at regions remote from the actual site of sequence difference, we have translated the CHART-PCR assay into a genome-wide format (CHA-seq) in order to assess allele-specific differences in chromatin arrangement across an extended genomic region. A micrococcal nuclease-digested genomic library was prepared from a −587 G/A heterozygous cell line from the SAFHS cohort. The library was subjected to next-generation sequencing to map the nuclease digestion sites. The results showed that the region surrounding the −587 SNP displayed a periodicity in the number of nuclease cut sites consistent with the positioning of the nucleosomes in the region (Figure 5a). Comparison of the number of sequenced library fragments ending at a particular nucleotide (and hence mapping the sites of digestion by the nuclease) for each −587 variant indicated an allele-specific bias in the digestion pattern (Figures 5a and b). Only 12% of the total number of fragments carried −587 G, with the majority carrying the A allele (88%). This is consistent with the region of the haplome carrying the −587G allele being subject to greater chromatin condensation than that carrying the A allele.
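The allele-specific comparison can be made quantitative by testing whether the split of fragments between the two alleles departs from the 50:50 ratio expected in a heterozygote with equal accessibility. The sketch below is an illustration only: the read counts are hypothetical (the text reports percentages, not counts), the function names are hypothetical, and an exact two-sided binomial test is one reasonable choice rather than the analysis used in this study.

```python
from math import exp, lgamma, log


def allele_fractions(count_a, count_g):
    """Fraction of allele-informative fragments carrying each allele."""
    total = count_a + count_g
    return count_a / total, count_g / total


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Log-gamma is used so large n does not overflow.
    """
    def pmf(i):
        log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return exp(log_coeff + i * log(p) + (n - i) * log(1 - p))

    probs = [pmf(i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(x for x in probs if x <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Hypothetical counts chosen to mirror an 88:12 split.
    frac_a, frac_g = allele_fractions(88, 12)
    p_value = binomial_two_sided_p(12, 100)
    print(f"A: {frac_a:.2f}, G: {frac_g:.2f}, P = {p_value:.3g}")
```

Comparing the same statistic at nearby heterozygous sites that are not expected to be functional helps distinguish a genuine allelic effect from capture or mapping bias toward one allele.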

Figure 5

Assessment of the chromatin arrangement across the VNN1 promoter using CHA-seq. (a) Integrative Genomics Viewer output of the region surrounding the −587 SNP (rs2050153) of the VNN1 promoter. Shown is the total coverage across the region and the percentage of reads carrying each allele for the variants located in the region. The −587A allele is shown in black and the −587G allele in white. Three putatively non-functional variants (rs45513194:C>G; −430, rs13204527:G>A; −681, and rs7760367:A>G; −723) are shown and act as controls. Also shown are the putative positions of the nucleosomes based on sequence coverage. (b) Sites of digestion by micrococcal nuclease are plotted across the −587 VNN1 promoter region. The number of sequenced fragments ending at each nucleotide is indicated for each −587 allele (A or G).

DISCUSSION

The ‘gold standard’ functional tests have been useful over the past two decades for determining the functionality of genetic variants. However, both the reporter gene and EMSA assays are unreliable in a number of respects. We have developed a strategy that initially involves a statistical prioritization of sequence variants based on comprehensive genotyping and correlation with transcript expression levels,6 and in the current study, application of the ‘gold standard’ functional tests followed by novel in vivo tests that quantitate differences in chromatin arrangement as a measure of functional difference. Our strategy serves to greatly increase confidence in assigning function to a particular variant.

Our results indicated that the two SNPs within the VNN1 promoter, previously predicted to be functional following extensive statistical interrogation of transcript levels from the SAFHS,6 were functional. The ‘gold standard’ assays established that the −137 SNP was functional and showed a higher level of transcriptional activity for the −137G allele that is consistent with the ability of this allele to preferentially bind the activating transcription factor Sp1 in EMSA. In contrast, the −587 SNP alleles displayed no differences in either transcriptional activity or transcription factor binding.

However, application of our in vivo tests shows a significant allelic difference in chromatin accessibility surrounding the −587 SNP and consequently indicates strong functional potential at this variant also. As one of the alleles at −587 is contained within a CG dinucleotide and hence is a potential site of cytosine methylation, bioinformatic analysis of the region surrounding the −587 SNP revealed allele-specific differences in the extent of the CpG island predicted to be present in the VNN1 promoter. Assessment of the methylation status of the region surrounding the −587 SNP revealed an allele-specific methylation pattern at a single CG dinucleotide approximately 120 bp upstream of the −587 SNP, establishing that allele-specific chromatin differences can be seen at sites remote from the actual genetic variation. Similar results have been observed previously,37 where two PBL and one CD34-positive cell samples showed a similar pattern to our lymphoblastoid cell lines, of differential CpG methylation at the −720 site. However, in contrast to our results, the results from the previous study suggest that the genotype at −587 does not correlate with −720 methylation status. Given the almost complete LD between −587 and −137 in our SAFHS population, it is possible that the effects seen at the −720 site are due to allele-specific events at the −137 site. In partial support of this, the reporter gene analysis shows that the allele-specific effects seen for the −137 SNP are independent of −587 genotype.

One possible explanation for the allele-specific methylation events is that −137G allele-specific Sp1 transcription factor binding leads to allele-specific chromatinization of a restricted promoter region upstream of the VNN1 gene and consequent increased gene expression (Figure 6). We propose that regulatory factor/s are able to bind to a region upstream of the −720 site in the −137G allele due to this restricted chromatinization (Figure 6b), whereas in the −137T allele the extended CpG island blocks binding (Figure 6a). The upstream factor may serve to increase transcription via interaction with the downstream Sp1 binding at −137 (Figure 6b) or otherwise by remodeling the chromatin at the CpG island leading to derepressed Sp1-mediated activation. It is likely that the genotype at the −137 site dictates both the occupancy of the Sp1 site and also the chromatin conformation of the VNN1 upstream promoter region and serves to regulate genotype-specific VNN1 expression.

Figure 6

Model of the interplay between variants at the −137 and −587 sites and the upstream CpG island of the VNN1 promoter. (a) The −137T/−587G haplotype induces methylation of a region that extends past the −720 site blocking binding of regulatory factors to sites upstream. (b) In the −137G/−587A haplotype, the −720 site is methylated but the −587 site is not, leading to a restricted CpG island methylation which allows binding of regulatory factors upstream of −720. Also, the activating transcription factor Sp1 is able to bind at the −137G site. This has the effect of enhancing transcription by direct or indirect interaction between the upstream regulatory factor and the Sp1 transcription factor.

In any case, the results obtained from this study provide a platform to investigate the allele-specific regulation of the VNN1 gene and the influence of the genetic variation on CVD-related phenotypes. We will use the functional cis-regulatory SNP/s identified here to define, by association analyses in our lymphocyte transcriptome data set, those genes that are obligately downstream of VNN1. Identification of −137 as a true functional SNP also allows a simplified re-evaluation of genetic associations with multiple CVD risk-related phenotypes (including triglycerides, HDL-C, carotid wall thickness).

As we have shown previously,38 the functional effects of SNP variation are often only manifest under specific tissue-specific or stimulus-specific conditions. Although our current studies were carried out in genotyped lymphocytes or lymphocyte cell lines, a cell type that is easily obtained, many functional studies will be limited due to the context in which a functional effect will be seen. One of the potential limitations of our in vivo assays, which require the isolation of nuclei from genotyped cells, is the availability of the appropriate cell or tissue type, but results from our own expression QTL studies6 and those of others39 indicate that significant functional information can be obtained for genes that are not normally expressed in the cell type used.

Our approach to the identification of non-coding functional variation, involving statistical prioritization of variants that are associated with transcript levels of the locus, followed by the application of a comprehensive suite of standard as well as new in vivo tests for function, will greatly improve the positive identification of non-coding functional variants by reducing the levels of false-positive ascertainment in addition to providing more compelling evidence with respect to the mechanism of action.