Tools and Resources

Parallel functional testing identifies enhancers active in early postnatal mouse brain

Department of Psychiatry and Behavioral Sciences, University of California, Davis, United States
Department of Neurobiology, Physiology and Behavior, University of California, Davis, United States
Department of Pediatrics and Human Development, Grand Rapids Research Center, Michigan State University, United States
Helen Wills Neuroscience Institute, University of California, Berkeley, United States
Departments of Ophthalmology and Neurobiology, University of Pittsburgh, United States

Oct 4, 2021

https://doi.org/10.7554/eLife.69479

Open access

Version of Record

Accepted for publication after peer review and revision.


Version of Record published: November 9, 2021
Accepted Manuscript published: October 4, 2021
Accepted: October 2, 2021
Received: April 19, 2021
Preprint posted: January 15, 2021



Abstract

Enhancers are cis-regulatory elements that play critical regulatory roles in modulating developmental transcription programs and driving cell-type-specific and context-dependent gene expression in the brain. The development of massively parallel reporter assays (MPRAs) has enabled high-throughput functional screening of candidate DNA sequences for enhancer activity. Tissue-specific screening of in vivo enhancer function at scale has the potential to greatly expand our understanding of the role of non-coding sequences in development, evolution, and disease. Here, we adapted a self-transcribing regulatory element MPRA strategy for delivery to early postnatal mouse brain via recombinant adeno-associated virus (rAAV). We identified and validated putative enhancers capable of driving reporter gene expression in mouse forebrain, including regulatory elements within an intronic CACNA1C linkage disequilibrium block associated with risk in neuropsychiatric disorder genetic studies. Paired screening and single enhancer in vivo functional testing, as we show here, represents a powerful approach towards characterizing regulatory activity of enhancers and understanding how enhancer sequences organize gene expression in the brain.

Introduction

Cis-regulatory elements such as enhancers are critical drivers of spatiotemporal gene expression within the developing and mature brain (Nord, 2013). Enhancers integrate the combinatorial functions of transcription factors (Fuxman Bass et al., 2015) and chromatin organizers (Calo and Wysocka, 2013) to drive cell- and region-specific expression of genes across brain development. DNA sequence variation within enhancers has been associated with both evolution (McLean et al., 2011) and the genetic etiology of neurological disorders (Dunham, 2012; Perenthaler et al., 2019; D’haene and Vergult, 2021). The first identified enhancers were defined by their functional capacity to amplify transcriptional activity in reporter plasmids (Moreau et al., 1981; Banerji et al., 1981). Putative enhancers have been predicted via DNA conservation using comparative genomics and, more recently, by epigenetic signatures such as open chromatin from DNaseI hypersensitive site sequencing (DNase-seq) or assaying for transposase-accessible chromatin using sequencing (ATAC-seq) (Dunham, 2012), histone tail modifications from chromatin immunoprecipitation sequencing (ChIP-seq) (Nord, 2013; Visel, 2013), and 3D chromatin organization (Schoenfelder and Fraser, 2019); however, these approaches are proxy measurements that do not directly evaluate whether a DNA sequence acts as a functional enhancer (Benton et al., 2019).

Enhancer reporter assays assess the ability of candidate DNA sequences to drive expression of a reporter gene, and have been used as the primary means of functionally testing activity of predicted enhancers (Kvon, 2015). Transgenic mouse enhancer assays have been applied to characterize the regulatory activity of candidate DNA sequences in the mouse, in both the developing and mature brain (Nord, 2013; Visel, 2013; Silberberg, 2016), and recently viral vectors have been used to deliver enhancer reporter constructs to mouse brain as well (Graybuck et al., 2021; Mich, 2021). These assays offer exceptional information on tissue-specific enhancer activity but require one-by-one testing of individual candidates. The advancement of massively parallel reporter assays (MPRAs) has led to the ability to functionally screen a multitude of DNA sequences for enhancer activity in parallel within a single experiment using sequencing-based quantification (Inoue and Ahituv, 2015). To date, there are a few promising demonstrations of MPRA applied in vivo in brain (Shen et al., 2016; Shen, 2019), although screening approaches to characterize enhancers remain highly novel, particularly in developing brain. Functional assessment of enhancer activity and the use of enhancers to express reporters and genes in mouse brain cells has emerged as an area of major interest (Dimidschstein et al., 2016; Hrvatin, 2019; Nord and West, 2020; Vogt, 2014). In vivo screening approaches represent an ideal method to perform broad testing at scale and maintain tissue-type and developmental context.

Here, we adapted an MPRA functional enhancer screen, self-transcribing active regulatory region sequencing (STARR-seq) (Arnold et al., 2013), for application to early postnatal mouse brain via recombinant adeno-associated virus (rAAV) delivery. We screened a library of candidate human DNA sequences spanning putative brain-specific regulatory sequences and regions associated with genome-wide association studies of epilepsy and schizophrenia. We identified sequences capable of enhancing reporter expression in early postnatal mouse cortex, and validated positive and negative MPRA expression predictions, including within the neuropsychiatric disorder-associated third intron of CACNA1C (Moon et al., 2018; Ferreira et al., 2008; Sklar, 2008). Our results demonstrate the utility of parallel functional testing to interrogate the regulatory activity of enhancers in early postnatal mouse brain and highlight opportunities for functional screening in studies of normal brain development and function as well as the etiology of neurodevelopmental and neuropsychiatric disorders.

Results

Developing and validating AAV-MPRA strategy

We used a modified STARR-seq (Arnold et al., 2013) MPRA orientation, in which the candidates of interest were cloned in the 3’ untranslated region (3’-UTR) of a reporter gene (here EGFP) driven by the Hsp1a minimal promoter (HspMinP; formerly referred to as Hsp68 minP; Bevilacqua et al., 1995). This reporter construct was flanked by inverted terminal repeat regions (ITRs) necessary for packaging self-complementary adeno-associated virus (scAAV), which has the advantage of faster transduction rates compared to conventional AAV (McCarty, 2008; Figure 1A, Figure 1—figure supplement 1). Enhancers are proposed to function independently of orientation and of position relative to the transcriptional start site (Serfling et al., 1985), as has been demonstrated in in vitro applications of STARR-seq (Arnold et al., 2013; Klein et al., 2020). However, we first sought to verify that an enhancer sequence in the STARR-seq orientation not only increased transcription generally, but also did so in a cell-type-restricted manner in vivo when paired with HspMinP and delivered via scAAV. To validate this, we cloned a GABAergic-biased mouse Dlx5/6 enhancer (Dlx) (Dimidschstein et al., 2016; Lee et al., 2014) into the scAAV reporter vector. scAAV9 carrying HspMinP-EGFP-Dlx was mixed with AAV9 carrying the constitutively active CAG-mRuby3 reporter, and this mixture was delivered to postnatal day (P)0 mouse forebrain via intraventricular injection; brains were collected at P7. The expression of mRuby3 was used as a positive control and to locate the injection site for analysis. Red fluorescent cells, marking the focus of viral exposure, were found distributed in the cortex and hippocampus (Figure 1B, Figure 1—figure supplement 2A, B). We analyzed EGFP-expressing cells in these regions. The brains were fixed, cryosectioned, and stained with an antibody for GFP to enhance detection of Dlx activity (see Materials and methods).

Figure 1 with 4 supplements


Designing and validating 3’-UTR enhancer reporter AAV assay.

(A) Schematic of in vivo parallelized functional enhancer reporter assay. The test library was generated using the previral vector pscAAV-HspMinP-EGFP, which contained a multiple cloning site (light grey) between the EGFP reporter and polyadenylation site (PAS). Purified PCR products for test amplicons were cloned into the vector using Gibson assembly. The previral library was packaged into AAV9(2YF), and the viral library delivered to the brain via injection at P0. Brains were collected at P7. (B) Representative image of a coronal section of a P7 mouse brain injected at P0 with a virus mixture consisting of an AAV containing the STARR-seq vector carrying the inhibitory interneuron enhancer *Dlx* (scAAV9-HspMinP-EGFP-*Dlx*) and an injection control AAV containing an expression vector for mRuby3 under the control of CAG, a general mammalian promoter. EGFP expression was visualized via IHC using an anti-GFP antibody, while mRuby3 expression was visualized using native fluorescence. Insets show close-ups of boxed regions showing morphology of EGFP-expressing cells in the cortex. (C) Sections from P7 mouse cortex transduced with *Dlx*-driven STARR-seq reporter vector and mRuby3 injection control at P0, counterstained with an antibody for Lhx6, a transcription factor active in deep cortical layer interneurons. EGFP-expressing cells with Lhx6⁺ nuclei are indicated with arrows. Inset graph shows fraction of EGFP- or mRuby3-expressing cells co-labeled with Lhx6 in three replicate animals injected with scAAV9-HspMinP-EGFP-*Dlx* (Animal 1, n = 20 EGFP+ cells, 218 mRuby3+ cells; Animal 2, n = 18 EGFP+ cells, 435 mRuby3+ cells; Animal 3, n = 32 EGFP+ cells, 311 mRuby3+ cells) or one animal injected with AAV9-*Dlx*-βGlobinMinP-EGFP (Dimidschstein et al., 2016; Lee et al., 2014) (n = 31 EGFP+ cells, 63 mRuby3+ cells). (D) Ratiometric (log₂ RNA/DNA) activity of miniMPRA mouse library in P7 mouse cortex after injection at P0. Boxplot of distribution and individual replicates (N = 4) shown for the 16 tested candidates. NEG indicates putative negative candidate; otherwise, name indicates nearby gene and, if applicable, embryonic enhancer ID, for positive candidates.

We evaluated cell-type specificity in two ways: by assessing morphology and by counterstaining with an antibody for Lhx6, a cardinal transcription factor expressed in both developing and mature neurons derived from the medial ganglionic eminence (Sandberg, 2016; Alifragis et al., 2004). EGFP⁺ cells in these P7 brains had small cell somas and neurites, typical of interneurons compared to pyramidal cells and glia (Figure 1B, Figure 1—figure supplement 2A, B); many were Lhx6⁺, but this was quite variable across animals (Figure 1C, Figure 1—figure supplement 3). We noted that the EGFP⁺ cells that were negative for Lhx6 staining still primarily appeared to be interneurons based on morphology. Since Dlx should be active in multiple GABAergic interneuron types in the neocortex (Dimidschstein et al., 2016), we systematically analyzed the morphology of EGFP⁺ cells (Figure 1—figure supplement 2). The principal excitatory neurons in the cortex and hippocampus have a typical pyramidal shape with a tear-drop cell body and a long, primary apical dendrite, whereas inhibitory interneurons are more variable in their appearance. Furthermore, excitatory neurons in the hippocampus are restricted to specific pyramidal and granule layers, and spatial location could be used as a second classification criterion there to lend extra confidence to the morphological assessment. Based on these morphological and spatial criteria, the vast majority of EGFP-expressing cells resemble interneurons (92.7 %, n = 314 cells in five animals, Figure 1—figure supplement 2A, C). By contrast, 3.2 % of EGFP⁺ cells exhibited pyramidal morphology. We compared these results with expression driven from the same Dlx sequence in a conventional orientation upstream of a minimal β-Globin promoter, a construct that has previously been validated to exhibit interneuron-biased expression (Dimidschstein et al., 2016). The specificity for interneurons in our hands was similar in both constructs (Figure 1—figure supplement 2B, C).
These data, taken together, indicate that enhancer orientation and choice of minimal promoter did not greatly affect cell-type-specific activity of the enhancer reporter construct.

We next performed a small-scale parallel reporter assay (miniMPRA), testing 16 mouse sequences representing putative enhancer and negative control sequences (Figure 1D, Supplementary file 1). Long sequences of interest (~900 bp) were selected to maximize the chromosomal context of candidate enhancer elements within these regions. These sequences were predicted to have either enhancer activity or no activity in postnatal mouse brain based on previous transgenic mouse enhancer assays and on mouse brain H3K27ac ChIP-seq data (Nord, 2013; Visel, 2013). We batch-cloned these sequences into the 3’-UTR of our scAAV reporter construct, and we packaged this into a scAAV library using AAV9(2YF) (Dalkara et al., 2012). AAV9(2YF) is a tyrosine-mutated derivative of AAV9, which has increased transduction in the brain and similar tropism compared to standard AAV9. We delivered the scAAV library to P0 mouse prefrontal cortex by direct injection and collected the brains for DNA and RNA analysis at P7 (Figure 1A). In this proof-of-principle experiment, putative negative control sequences displayed low transcription levels based on the ratio of RNA-seq read counts to DNA-seq read counts. On the other hand, previously validated embryonic brain enhancers (Visel, 2013) and putative enhancers of interest near genes associated with neurodevelopmental disorders or epilepsy (Turner, 2016; Nakayama et al., 2010; Gao et al., 2017) were found to be active as enhancers in this assay at P7 (Figure 1D). Based on this proof-of-principle experiment, we moved forward with scaling up our scAAV STARR-seq strategy to testing hundreds of sequences in mouse brain.

Cloning and AAV packaging of MPRA library

We generated an MPRA library targeting 408 candidate human DNA sequences for testing, each ~900 bp in size, to assess functional enhancer activity in vivo in early postnatal mouse brain. These sequences were categorized into four groups (Figure 2A), two groups without ascertainment based on potential regulatory activity, and two groups selected based on presence of epigenomic signatures commonly used to predict enhancer activity (Supplementary file 2). For the first group (‘GWAS’), we selected genomic intervals containing single-nucleotide polymorphisms (SNPs) identified as lead SNPs from genome-wide association studies (GWAS) for epilepsy (Abou-Khalil, 2018) and schizophrenia (Ripke, 2014) as identified using the NHGRI-EBI GWAS Catalog (Buniello et al., 2019). For the second group (‘LD’), we selected five lead SNPs from these same studies, and included genomic intervals that included all SNPs in linkage disequilibrium (LD, r² >0.8, Auton et al., 2015) within these non-coding regions associated with the CACNA1C and SCN1A loci. Because SNPs can be inherited together, GWAS lead SNPs are not necessarily causal and further evidence is necessary to prioritize linked variants. Thus, most SNP-containing amplicons in these first two groups were expected to be negative for enhancer activity. Indeed, less than 20 % of GWAS and LD group sequences had strong evidence of open chromatin in fetal brain or adult neurons (Figure 2A, lower panel). For the third group (‘FBDHS’), we selected regions overlapping DNaseI hypersensitive sites in fetal human brain from ENCODE (Dunham, 2012), selecting candidates within copy number variant regions near autism risk genes (Turner, 2016; Turner et al., 2017). We expected this group to be enriched for active enhancers in our assay, although enhancers active in fetal brain may not continue to have activity at the P7 postnatal time point tested in our assay. 
Finally, we included human orthologs of the 16 sequences tested in our miniMPRA experiment (‘PutEnh’), which included putative positive and negative sequences. Thus our final library included a balance of sequences that are expected to have no enhancer activity (59 %), as well as sequences with evidence for regulatory capacity in the brain (41 %).

Figure 2 with 5 supplements


In vivo AAV MPRA yields amplicons capable of enhancing transcription, enriched for signatures associated with enhancers.

(A) Graphical representation of library composition. Top panel shows the fraction of the total library made up by each group of amplicons. Bottom panel shows the fraction of amplicons in each group that were positive for the given epigenomic signature. (B) Mean RNA and DNA representation in the assay for candidates that passed inclusion criteria (N = 308). Amplicons with significantly (p < 0.05, FDR < 0.1) increased model residual value (Res.) in RNA compared to DNA are shown in orange and red. Normal p-values; empirical FDR q-values (see Materials and methods). (C) Bar plot representing mean activity based on RNA/DNA ratio in the test assay with individual replicates shown as dots. Amplicons are sorted by linear model residuals (p < 0.05 colored red). (D) The top 20 active amplicons with consistent activity across both linear and ratiometric models. Bars represent mean activity based on RNA/DNA ratio and individual replicates are shown as dots. Three amplicons were used for downstream validation in single-candidate deliveries (magenta). (E) Amplicon intersection with fetal brain epigenomic datasets including DNase hypersensitive loci, H3K4me1, H3K4me3, H3K36me3, H3K9me3, and H3K27me3. Amplicons were divided into two groups based on the statistical significance of their activity in the MPRA. (F) Amplicon intersection with human neuron or glia ATAC-seq, vertebrate conserved elements, and digital transcription factor footprints in fetal brain, fetal lung, and K562 cells. Asterisks in E and F indicate significant enrichment for positive amplicons with annotation class (p < 0.05, permutation test).

We cloned all sequences of interest (referred to from here on as amplicons) from pooled human DNA via polymerase chain reaction (PCR) and Gibson assembly directly into the viral DNA plasmid backbone. Following batch cloning, we sequenced the pooled plasmid library, verifying presence of 345/408 (84.6 %) target amplicons. There was no obvious pattern among amplicons that did not make it into the batch cloned library, and we presume differences in PCR primer performance and general stochasticity were the primary drivers of batch cloning success. This cloned library of candidates was then packaged in an scAAV9(2YF) viral vector in the same manner as our miniMPRA.

Application of in vivo MPRA and assessment of reproducibility of biological replicates

We delivered the viral library to the left hemisphere of four neonatal mouse brains at P0 via direct injection into the prefrontal cortex and collected forebrain tissue seven days after transduction at P7. We isolated viral genomic DNA, representing input or delivery control, as well as total RNA, and generated amplicon sequencing libraries from the left hemisphere of each replicate as well as from the previral batch-cloned plasmid library. RNA samples were treated with DNase to prevent cross-contamination, and tagmentation libraries were prepared to capture the 3’-UTR variable region. Following sequencing, de-duplicated reads aligned to the human genome were used to generate amplicon summary counts (Supplementary file 3). We then filtered the dataset by removing amplicons with raw counts less than 200 and mean amplicon library proportions less than 2^–15 in any DNA sample, leaving 308/345 (89 %) passing these quality control thresholds. We observed significant correlation between biological replicates (Pearson correlation, p < 0.001), although correlation was consistently higher for DNA replicates compared to RNA replicates (r > 0.824 for DNA comparisons, and r > 0.546 for RNA comparisons, Figure 2—figure supplement 1A, B). Correlation within DNA or RNA was higher than between library types, suggesting limited, if any, impact of cross-contamination. We also observed strong correlation between amplicon read counts in the pooled previral plasmid library (‘Library’) and genomic AAV DNA (‘DNA’) collected from each injected P7 brain (r > 0.804, p < 0.001, Figure 2—figure supplement 1A). To test the effect of PCR bias during library preparation on amplicon counts, we generated one technical RNA replicate, Sample 4–35, using a higher cycle count (Figure 2—figure supplement 1B), and found similar amplicon representation and dropout patterns as its matched lower-cycle technical replicate (Pearson r = 0.827, p < 0.001).
Taken together, this demonstrates that viral packaging, neonatal delivery, P7 sample collection and processing, library generation, and sequencing did not substantially affect amplicon representation. Although correlation between amplicons generated from P7 cDNA (‘RNA’) was strong overall, particularly for amplicons with robust cDNA expression, each delivery replicate included a small subset of lower cDNA expression amplicons that showed replicate-specific cDNA dropout (Figure 2—figure supplement 1B). Amplicon representation in the previral library and P7 viral DNA was correlated with amplicon GC content (Figure 2—figure supplement 2A-B), indicating GC-based differences impact PCR and cloning efficiency. There was reduced strength in correlation between GC content and the MPRA RNA (Figure 2—figure supplement 2C-D) and no further GC bias arose between viral packaging, delivery, and recapture of the library, suggesting that GC bias originated from the batch cloning process. Input viral DNA and MPRA RNA amplicon counts were also generally correlated, indicating some basal MPRA transcription from the minimal promoter independent of candidate enhancer sequence activity, as reported in other MPRA studies (Lee, 2020; Ashuach et al., 2019).
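The quality-control filter and replicate correlation described above can be sketched in a few lines. This is an illustrative reading, not the authors' pipeline: it assumes an amplicon is dropped if, in any DNA sample, its raw count is below 200 or its share of that sample's amplicon reads is below 2^-15. The function names and toy counts are hypothetical.

```python
from math import sqrt

MIN_RAW_COUNT = 200        # raw-count threshold quoted in the text
MIN_PROPORTION = 2 ** -15  # library-proportion threshold quoted in the text


def passes_qc(dna_counts, dna_totals):
    """Return True if an amplicon clears both thresholds in every DNA sample.

    dna_counts: raw read counts for one amplicon, one value per DNA sample.
    dna_totals: total amplicon reads in each DNA sample, in the same order.
    Assumption: thresholds are applied per sample (one reading of the text).
    """
    for count, total in zip(dna_counts, dna_totals):
        if count < MIN_RAW_COUNT or count / total < MIN_PROPORTION:
            return False
    return True


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: two DNA samples, three amplicons.
dna_totals = [10_000_000, 12_000_000]
toy_counts = {
    "amp_A": [5000, 6100],  # clears both thresholds in both samples
    "amp_B": [150, 900],    # raw count below 200 in the first sample
    "amp_C": [250, 260],    # proportion below 2**-15 in both samples
}
kept = [name for name, counts in toy_counts.items()
        if passes_qc(counts, dna_totals)]
print(kept)
```

In practice `pearson_r` would be applied to pairs of replicate count vectors restricted to the amplicons in `kept`.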

Identification of putative enhancer activity in P7 brain based on MPRA analysis

Having confirmed reproducibility of input viral DNA and MPRA RNA collection, we performed activity estimation. We used two approaches to estimate activity: first, a regression residual-based estimate that corrects for background or basal activity, and second, a simple comparison of RNA to DNA amplicon representation.

First, we used the middle 80 % of amplicons (N = 248) ranked by RNA/DNA ratio (Figure 2—figure supplement 2E) to build a linear model to account for background MPRA RNA based on amplicon representation and GC content (Figure 2—figure supplement 2E-H). For regression-based estimation of activity, amplicon representation across each replicate was combined to generate a summary activity value. To identify amplicons with activity above expected based on background (i.e. presumed enhancers), we applied the model to the full set of amplicons (N = 308) to generate residual values that represent observed RNA levels compared to expected. p-values for amplicon activity versus background were empirically defined using the distribution of standardized residuals (Figure 2B, Figure 2—figure supplement 3, Supplementary file 4). We identified 41 amplicons (13 %) with significantly increased RNA (p < 0.05), suggestive of positive activity, with 17 passing with a false discovery rate (FDR) < 0.1. Amplicon group was confirmed to be a significant predictor of mean amplicon activity (One-way ANOVA, p = 0.006), with the highest activity reported for the PutEnh and FBDHS groups.
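A minimal sketch of this residual-based approach follows, under simplifying assumptions that differ from the model described above: it uses a single predictor (log DNA representation) and omits GC content, it standardizes residuals by their population standard deviation, and it takes p-values from the upper tail of a standard normal distribution. All names and the toy data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_activity(log_dna, log_rna, trim=0.10):
    """Fit background on the middle amplicons, then score every amplicon.

    Amplicons are ranked by log RNA minus log DNA (the log RNA/DNA ratio);
    the lowest and highest `trim` fractions are excluded from the fit only.
    Returns (residuals, one-sided p-values) for all amplicons.
    """
    n = len(log_dna)
    order = sorted(range(n), key=lambda i: log_rna[i] - log_dna[i])
    k = int(n * trim)
    middle = order[k:n - k]  # middle 80 % when trim = 0.10
    slope, intercept = fit_line([log_dna[i] for i in middle],
                                [log_rna[i] for i in middle])
    residuals = [log_rna[i] - (intercept + slope * log_dna[i])
                 for i in range(n)]
    mu, sd = mean(residuals), pstdev(residuals)
    z_scores = [(r - mu) / sd for r in residuals]
    p_values = [1 - NormalDist().cdf(z) for z in z_scores]
    return residuals, p_values


# Hypothetical toy library: 20 amplicons, one with elevated RNA (index 7).
log_dna = [-12 + 0.2 * i for i in range(20)]
log_rna = [d + 0.1 * ((i % 3) - 1) for i, d in enumerate(log_dna)]
log_rna[7] += 3.0
residuals, p_values = residual_activity(log_dna, log_rna)
active = [i for i, p in enumerate(p_values) if p < 0.05]
print(active)
```

With 308 amplicons and `trim=0.10`, this trimming leaves 248 amplicons for the fit, matching the count quoted above.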

Second, we also estimated activity based on RNA/DNA ratio (Figure 2C), a common metric used to report activity in published MPRAs (Inoue and Ahituv, 2015; Klein et al., 2020; Ashuach et al., 2019). We found that 71 amplicons were considered active using the criteria of RNA/DNA ratio > 1.5 and RNA/DNA standard deviation less than the mean. Amplicons that exhibited RNA/DNA ratio > 1.5 but did not pass the regression model-based p-value threshold potentially included sequences with weaker enhancer activity. Of the amplicons identified using model residuals, 78 % (32/41) were also active in the ratiometric comparison. Among the 41 positive amplicons with significant regression residual activity, 20 (49 %) had a mean ratio minus 1 s.d. > 1.5 across individual replicates, representing the amplicons with the strongest and most consistent MPRA-defined activity (Figure 2D). For statistical testing of RNA/DNA ratios, we used a Wilcoxon rank sum test, a non-parametric test comparing DNA vs RNA proportion values across delivery replicates. 43 amplicons passed p < 0.05 with confidence intervals greater than 0 (one-tailed for increased RNA), and all of them had Benjamini-Hochberg corrected FDR < 0.05. In comparison to the linear model approach, 26 amplicons were significant at p < 0.05 in both the Wilcoxon rank sum test and the linear model.
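The two ratiometric criteria can be written out as a short sketch. It assumes that the ratio is computed per replicate from RNA and DNA library proportions and then averaged, and that the standard deviation is the sample standard deviation across replicates; neither detail is stated in this section. The function name and toy values are hypothetical.

```python
from statistics import mean, stdev

RATIO_CUTOFF = 1.5  # RNA/DNA cutoff quoted in the text


def classify_ratio(rna_props, dna_props):
    """Apply the two ratiometric criteria to one amplicon.

    rna_props, dna_props: library proportions per replicate (same order).
    'active'     : mean ratio > 1.5 and s.d. of the ratio below its mean.
    'consistent' : mean ratio minus one s.d. still above 1.5.
    """
    ratios = [r / d for r, d in zip(rna_props, dna_props)]
    m, sd = mean(ratios), stdev(ratios)
    return {
        "mean_ratio": m,
        "active": m > RATIO_CUTOFF and sd < m,
        "consistent": m - sd > RATIO_CUTOFF,
    }


# Hypothetical four-replicate examples.
dna = [0.001, 0.001, 0.001, 0.001]
strong = classify_ratio([0.0040, 0.0050, 0.0045, 0.0055], dna)
variable = classify_ratio([0.0012, 0.0030, 0.0010, 0.0024], dna)
inactive = classify_ratio([0.0009, 0.0011, 0.0010, 0.0008], dna)
print(strong["active"], strong["consistent"])
print(variable["active"], variable["consistent"])
print(inactive["active"], inactive["consistent"])
```

The second example shows why the stricter criterion is useful: its mean ratio exceeds 1.5, but replicate-to-replicate spread pulls the mean minus one s.d. below the cutoff.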

We compared the 41 model-based significant amplicons to the remainder of the library for intersection with fetal brain epigenomics datasets from the Roadmap Epigenomics Project (Bernstein et al., 2010). We found higher than expected intersection of significant amplicons with genomic loci characterized as DNase hypersensitive (DNase, p = 0.0355). We also observed that significant amplicons had increased enrichment for H3K4me1 (p = 0.0274), and H3K4me3 (p = 0.0131), epigenomic marks associated with transcriptional activity and enhancer function. We did not find enrichment for H3K36me3, a histone mark associated with gene bodies, or for H3K9me3 and H3K27me3, histone marks associated with heterochromatin (Figure 2E).
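One way such an enrichment p-value could be obtained is a label-shuffling permutation test, sketched below. This section names a permutation test but does not detail the procedure, so the scheme here (shuffle the active/inactive labels, recount overlaps with the annotation) is an assumption, and the function name and toy data are hypothetical.

```python
import random


def permutation_enrichment(is_active, has_mark, n_perm=10_000, seed=0):
    """One-sided permutation test for overlap between two boolean labels.

    is_active: True for amplicons called active in the MPRA.
    has_mark : True for amplicons overlapping the epigenomic annotation.
    Returns (observed overlap, p-value), where the p-value is the share of
    shuffles with overlap >= observed, with a +1 correction.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(a and m for a, m in zip(is_active, has_mark))
    labels = list(is_active)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        overlap = sum(a and m for a, m in zip(labels, has_mark))
        if overlap >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical toy library: 100 amplicons, 20 active, 30 carrying the mark.
is_active = [True] * 20 + [False] * 80
enriched_mark = [True] * 15 + [False] * 5 + [True] * 15 + [False] * 65
null_mark = [True] * 6 + [False] * 14 + [True] * 24 + [False] * 56

obs_enriched, p_enriched = permutation_enrichment(is_active, enriched_mark,
                                                  n_perm=2000)
obs_null, p_null = permutation_enrichment(is_active, null_mark, n_perm=2000)
print(obs_enriched, obs_null)
```

The +1 correction keeps the estimated p-value from being exactly zero when no shuffle reaches the observed overlap.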

We also intersected amplicons with a human neuron and glia ATAC-seq dataset from the Brain Open Chromatin Atlas (BOCA) (Fullard et al., 2018), vertebrate evolutionary conserved elements (UCSC Genome Browser), and digital transcription factor (TF) footprints (Vierstra et al., 2020). Similar to Figure 2E, we found marginally significant increased overlap of significant amplicons with human neuron ATAC-seq peaks (p = 0.0497). We also found increased enrichment of significant amplicons in open chromatin in human glia ATAC-seq (p = 0.0159). There were no significant differences in enrichment among amplicon groups in TF footprints in fetal brain or lung, nor in K562 cell lines, nor for conserved elements (Figure 2F). We also examined each epigenetic mark as a covariate of a general linear model, to see if it improved a regression model predicting cDNA levels considering all amplicons (N = 308). If cDNA levels in our MPRA are reflective of true enhancer activity, we would expect epigenomic signatures associated with enhancer elements to improve prediction of cDNA expression. Indeed, fetal DNase hypersensitivity (p = 0.0002), H3K4me1 (p = 0.0002), adult neuron ATAC-seq (p = 0.0073), and fetal brain TF footprints (p = 0.04) were found to be significant, while signatures not relevant to enhancer activity in brain were not (Supplementary files 5 and 6). Fetal brain DNase hypersensitivity, H3K4me1, and human neuron ATAC-seq improved the regression model (reduced BIC), and fetal brain DNase hypersensitivity and H3K4me1 signatures had the strongest predictive power of amplicon activity (0.2 and 0.178 point-biserial correlations, respectively).

Based on the intersections with DNaseI hypersensitive and neuronal ATAC-seq signatures, there was a high representation of ascertained likely enhancers (20/41) among significantly active amplicons. These 20 active amplicons represented 15.9 % of the set of ascertained likely enhancers, while the remaining 21 active amplicons represented 11.5 % of predicted negative sequences. The difference in amplicons with MPRA activity (model-based p-value < 0.05) comparing ascertained positive and negative amplicon sets was not significant based on Fisher’s exact test (odds ratio of 1.45, one-tailed p-value = 0.176). The representation of ascertained likely enhancers was higher among the most significant amplicons with FDR < 0.1 (12/17). These 12 amplicons represented 9.5 % of the ascertained likely enhancers, while the remaining five active amplicons represented 2.7 % of the predicted negative sequences, resulting in an odds ratio of 3.73. The enrichment of this highly significant set of MPRA amplicons for pre-ascertained likely enhancers was statistically significant (Fisher’s exact test, one-tailed p-value = 0.011). Therefore, consistent with our expectations, enhancer ascertainment based on open chromatin patterns in the brain was associated with MPRA activity.
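The odds ratios and one-tailed Fisher's exact tests above can be re-derived from a 2×2 table, as sketched below. The group sizes are not stated directly in this section; the 126 ascertained likely enhancers and 182 predicted negatives used here are inferred from the reported percentages (20/126 = 15.9 %, 21/182 = 11.5 %, summing to the 308 amplicons that passed filtering) and should be treated as an assumption. The odds ratio computed is the simple sample odds ratio.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """Sample odds ratio and one-tailed (greater) Fisher's exact p-value.

    2x2 table layout:
                      active   inactive
        ascertained     a         b
        negative        c         d
    The p-value is the hypergeometric probability of observing at least
    `a` active amplicons in the ascertained group, given fixed margins.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)
    numerator = sum(comb(row1, k) * comb(row2, col1 - k)
                    for k in range(a, min(row1, col1) + 1))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, numerator / denominator


# Group sizes inferred from the reported percentages (assumption).
N_ASCERTAINED, N_NEGATIVE = 126, 182

# Amplicons with model-based p < 0.05: 20 ascertained, 21 negative.
or_p05, p_p05 = fisher_one_tailed(20, N_ASCERTAINED - 20,
                                  21, N_NEGATIVE - 21)

# Amplicons with FDR < 0.1: 12 ascertained, 5 negative.
or_fdr, p_fdr = fisher_one_tailed(12, N_ASCERTAINED - 12,
                                  5, N_NEGATIVE - 5)

print(round(or_p05, 2), round(or_fdr, 2))
```

The two odds ratios round to 1.45 and 3.73, matching the values quoted above, which supports the inferred group sizes.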

Lack of strong effects and insufficient power for allele-specific activity analysis

Our MPRA library included a subset of amplicons harboring SNPs that were either lead SNPs from GWAS studies (121 amplicons), or SNPs in LD with lead SNPs within associated intervals at CACNA1C (21 amplicons) and SCN1A (62 amplicons). As amplicons were cloned from pooled human population DNA, we anticipated capturing representative alleles for these SNPs in our MPRA library. Indeed, 439/440 of these SNPs, representing 313/345 cloned amplicons, were represented in our library before viral packaging, but only 147 of them had minor allele frequency above 0.1, critical for allele comparison. Among the eight amplicons with significant MPRA activity, none exhibited significant allelic differences (Figure 2—figure supplement 4G). We further used luciferase assays in two cell lines (HEK and SH-SY5Y) to compare haplotype activity for one MPRA active enhancer, finding no significant difference between the alleles of the CACNA1C SNP containing amplicon #3 (Figure 2—figure supplement 4H). We note that despite strong DNA reference allele frequency correlation in the viral library DNA, limited allele coverage, unbalanced allele frequencies, and reduced correlation of RNA allele frequencies resulted in lack of power to detect moderate to small allelic differences in our experiments (Figure 2—figure supplement 4, Supplementary file 7).

Activity reproduces between full MPRA and miniMPRA experiments

Since the PutEnh group in our MPRA contained human orthologs of the miniMPRA sequences, this experiment enabled comparison of human and mouse sequences in an in vivo mouse brain context and offered further validation of reproducibility for our MPRA results. Eight amplicons were included in both experiments, and we observed general correlation between activity in the mouse miniMPRA and the human orthologs in the full MPRA library. These results further indicate reproducible activity and that there is conserved function between orthologous mouse and human sequences tested in the same postnatal mouse brain context (Figure 2—figure supplement 5).

Confirmation of in vivo P7 cortex MPRA results for single candidate sequences

To validate enhancer activity in the mouse brain at P7, we next cloned individual amplicons for selected positive and negative hits from the MPRA into the same HspMinP-EGFP 3’-UTR oriented reporter and generated scAAV9 for each construct. Amplicon #161 (FBDHS group), an enhancer candidate that overlaps both a DNaseI hypersensitive site in fetal human brain and a copy number variant region near the autism- and epilepsy-associated gene SCN2A, and that displayed particularly strong activity in the screen (Figure 3—figure supplement 1A), showed consistent expression of EGFP in the mouse brain at P7 (Figure 3A, top row). EGFP expression driven by this amplicon was seen wherever the brain had been exposed to the virus (based on mRuby3 expression), with a trend toward greater density of EGFP⁺ cells observed in the lower-middle layers of the cortex (lower half of Layer IV and Layer V). On the other hand, a predicted negative sequence that did not display enhancer activity in the screen, amplicon #264, did not show expression of EGFP (Figure 3A, bottom row, Figure 3—figure supplement 1B).

Figure 3 (with 2 supplements)

Functional validation of STARR-seq screen.

(A) Validation of positive and negative hits from in vivo AAV MPRA screen. Representative images of coronal sections of AAV-transduced P7 brains stained with an anti-GFP antibody are shown (left panels). Close-ups of the boxed regions are shown in the panels on the right (from left to right: Red channel, mRuby3 injection control; Green channel, EGFP expression driven by candidate amplicon; Merge with DAPI in gray). The brain shown in the top row was transduced with AAV9-CAG-mRuby3 (injection delivery control) and scAAV9-HspMinP-EGFP carrying Amplicon #161, a highly active amplicon in the AAV MPRA, in the 3’-UTR. In the bottom row, a similar transduction is shown for Amplicon #264, a negative control with no predicted enhancer activity that did not display activity in the in vivo AAV MPRA. (B) Functional validation of enhancer activity in different cell types. Brains were transduced as in A with AAV9-CAG-mRuby3 and scAAV9-HspMinP-EGFP carrying either Amplicon #161 or *Dlx* in the 3’-UTR. Brains were collected at P7 and stained for GFP and Ctip2, a transcription factor necessary for axon development in excitatory projection neurons in Layer V during embryonic development. Representative staining of coronal sections is shown (left panels). Zoomed-in views of boxed regions are shown in single-channel images in the panels on the right (Green, EGFP; Red, mRuby3; Magenta, Ctip2). Ctip2 channel images are shown with EGFP⁺ cells outlined (top) and mRuby3⁺ cells outlined (bottom). (C) Quantification of Ctip2 co-labeling shown in B with additional animals co-transduced with AAV9-CAG-mRuby3 and AAV9-*Dlx*-βGlobinMinP-EGFP included for comparison. Individual GFP⁺ and mRuby3⁺ cells were counted and scored for whether each cell contained a Ctip2-positive nucleus. Cell counts were summed across images for the same brain. Data are presented as mean ± SEM for the fraction of fluorescent cells that are Ctip2⁺. Cells that expressed EGFP under the control of the inhibitory interneuron enhancer *Dlx* displayed a lower frequency of Ctip2⁺ nuclei compared to cells that expressed EGFP under the control of amplicon #161 or mRuby3 under the control of the general mammalian promoter CAG (n = 5 animals co-injected with HspMinP-EGFP-#161 and CAG-mRuby3, four animals co-injected with HspMinP-EGFP-*Dlx* and CAG-mRuby3, and two animals co-injected with *Dlx*-βGlobinMinP-EGFP and CAG-mRuby3).
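The quantification described in panel C (counts summed across images per brain, then mean ± SEM of the Ctip2⁺ fraction across animals) can be sketched as follows. This is an illustrative outline only: the function names and all cell counts are hypothetical, and the authors' actual scoring workflow is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def fraction_per_animal(image_counts: List[Tuple[int, int]]) -> float:
    """Sum counts across images of one brain, then take the Ctip2+ fraction.

    image_counts holds one (ctip2_positive_cells, total_fluorescent_cells)
    tuple per image.
    """
    positive = sum(pos for pos, _ in image_counts)
    total = sum(tot for _, tot in image_counts)
    return positive / total


def mean_and_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical counts for three animals in one condition (illustration only).
animals = [
    [(12, 40), (9, 35)],   # animal 1: two images
    [(20, 60)],            # animal 2: one image
    [(7, 30), (8, 25)],    # animal 3: two images
]
fractions = [fraction_per_animal(images) for images in animals]
group_mean, group_sem = mean_and_sem(fractions)
print(f"Ctip2+ fraction: {group_mean:.3f} ± {group_sem:.3f} (n = {len(fractions)})")
```

Summing counts within a brain before taking the fraction makes the animal, rather than the individual image, the unit of replication.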

To further validate enhancer function in excitatory glutamatergic neurons, we counter-stained for Ctip2, a transcription factor involved in axonal development in excitatory cortical projection neurons in Layer V of the cortex (Arlotta et al., 2008; Leyva-Díaz and López-Bendito, 2013). Although expression of Ctip2 is not exclusive to excitatory neurons in adult mice (Nikouei et al., 2016), it is commonly used as an excitatory Layer V marker during embryonic and early postnatal development (Leyva-Díaz and López-Bendito, 2013; Alcamo et al., 2008; Leone et al., 2015; Gompers et al., 2017). We reasoned that if our in vivo MPRA reporter construct accurately reproduced cell-type specific enhancer activity, then we should see a difference in the Ctip2 overlap with EGFP driven by amplicon #161 compared to EGFP driven by Dlx. Indeed, after counterstaining brain sections transduced as described above with antibodies for Ctip2, we found that EGFP⁺ cells driven by Dlx exhibited significantly lower frequencies of Ctip2⁺ nuclei compared to either CAG-driven mRuby3⁺ cells or EGFP⁺ cells driven by amplicon #161 (Figure 3B and C). These results demonstrate that our MPRA could accurately reflect enhancer activity of particular candidate sequences in vivo.

Dissection of regulatory elements within the third intron of CACNA1C

Our library included amplicons spanning a psychiatric disorder-associated LD interval within the ~330 kb third intron of the gene CACNA1C, which encodes the α1 subunit of the L-type voltage-gated calcium channel Ca_V1.2. This region contains regulatory elements previously defined in vitro (Eckart et al., 2016; Roussos, 2014) that harbor schizophrenia- or bipolar disorder-associated SNPs, predominantly rs1006737 (Ferreira et al., 2008; Sklar, 2008), rs2007044 (Ripke, 2014), rs4765905 (Hamshere et al., 2013), and rs4765913 (Ripke, 2014). Via MPRA, we assessed the activity of 17 amplicons within the CACNA1C intron covering SNPs in LD (r² > 0.8). As a comparison set, we also included five amplicons covering SNPs in LD (r² > 0.8) with a non-neuronal SNP associated with hematocrit (rs7312105) (van der Harst et al., 2012).
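For readers unfamiliar with the r² measure of linkage disequilibrium used for the 0.8 cutoff, the sketch below computes the standard statistic r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), with D = p_AB − p_A p_B, from phased haplotypes. The haplotype data and function name are hypothetical; the source of the LD estimates used in the study is not specified in this section.

```python
from typing import List, Tuple


def ld_r_squared(haplotypes: List[Tuple[int, int]]) -> float:
    """Compute LD r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele_at_snp1, allele_at_snp2), with alleles coded 0/1.
    Returns NaN when either SNP is monomorphic, where r^2 is undefined.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    return d * d / denom


# Hypothetical haplotypes (illustration only).
perfect_ld = [(0, 0), (1, 1), (0, 0), (1, 1)]   # alleles always co-occur
no_ld = [(0, 0), (0, 1), (1, 0), (1, 1)]        # alleles are independent
in_strong_ld = ld_r_squared(perfect_ld) > 0.8
```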

Three amplicons within the CACNA1C psychiatric disorder LD interval drove significant RNA transcript expression in both linear and ratiometric models in our MPRA: amplicons #3 (overlapping rs1108075 and rs11062166), #6 (overlapping rs12315711 and rs2159100), and #7 (overlapping rs11062170 and rs4765905) (Figure 4A). In comparison, no amplicons from the hematocrit LD interval passed the significance threshold for activity (Figure 4—figure supplement 1). We validated #3 and #6 (highlighted in blue, Figure 4A) for enhancer activity in postnatal brain using our single-candidate reporter construct strategy. We also validated lack of activity for #2, an amplicon in the same LD block that did not have significant enhancer activity based on the MPRA results (highlighted in red, Figure 4A). Similar to the negative controls and consistent with MPRA findings, #2 did not drive significant EGFP expression in the brains of P7 mice (Figure 4B, top panels). On the other hand, #3 and #6 drove detectable EGFP expression in cells throughout the cerebral cortex wherever viral exposure was detected by the co-injected CAG-driven mRuby3 positive control (Figure 4B, middle and bottom panels, respectively). In a follow-up experiment, we transduced scAAV9-HspMinP-EGFP-#3 and AAV9-CAG-mRuby3 at P0 but waited to collect the brains until P28, at which time we observed that amplicon #3 drove EGFP expression in cortical neurons of adolescent mice (Figure 4C, Figure 4—figure supplement 2).
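To make the two kinds of activity score mentioned above more concrete, here is a minimal sketch of one common way to compute them: a ratiometric score as the log2 ratio of an amplicon's RNA fraction to its DNA fraction, and a linear-model score as the residual from an ordinary least-squares fit of RNA on DNA abundance. This is an assumption-laden illustration, not the study's pipeline: the exact model specifications, normalization, and significance testing are described in the paper's methods, and all names and counts below are hypothetical.

```python
from math import log2
from typing import Dict, List


def ratiometric_activity(rna_counts: Dict[str, int],
                         dna_counts: Dict[str, int]) -> Dict[str, float]:
    """log2 of (RNA fraction / DNA fraction) per amplicon.

    Assumes every amplicon has nonzero RNA and DNA counts.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    return {
        amp: log2((rna_counts[amp] / rna_total) / (dna_counts[amp] / dna_total))
        for amp in dna_counts
    }


def linear_residuals(x: List[float], y: List[float]) -> List[float]:
    """Residuals of y from an ordinary least-squares line fitted on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical read counts for three amplicons (illustration only).
rna = {"amp_a": 200, "amp_b": 100, "amp_c": 100}
dna = {"amp_a": 100, "amp_b": 100, "amp_c": 200}
ratio_scores = ratiometric_activity(rna, dna)

# Residuals of log-scaled RNA abundance after regressing on DNA abundance.
names = list(dna)
residuals = linear_residuals([log2(dna[a]) for a in names],
                             [log2(rna[a]) for a in names])
```

In both formulations, an amplicon with more RNA than its DNA representation predicts receives a positive score; the residual-based score additionally adjusts for the overall RNA-to-DNA trend across the library.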

Figure 4 (with 3 supplements)

Functional dissection of the large third intron of *CACNA1C*.

(A) UCSC Genome Browser representation of amplicons #1 through #7 in the third intron of *CACNA1C* (hg38, chr12:2,220,500–2,242,499). UCSC tracks for GENCODE v36 and 100 vertebrate conservation, normalized coverage of aligned reads for the previral library, and DNA and RNA samples for the four biological replicates are shown; y-axis scale is 0–50,000 reads. MPRA analysis is shown as graphs of linear model residuals and -log₁₀ transformed p-values. Three amplicons, #3, #6, and #7, were found to be significantly active in our assay. Amplicons that were tested in single-candidate experiments are highlighted (red for no activity in MPRA, blue for significant activity in MPRA). (B) Confocal images of single-candidate validation of amplicons #2, #3, and #6. Mice were transduced at P0 with two AAV vectors: one for an HspMinP-EGFP-3’-UTR enhancer reporter construct carrying the indicated amplicon and a second control vector, CAG-mRuby3. Brains were fixed at P7, sectioned, and stained with an antibody for EGFP for signal amplification. Tiled, whole-section images are shown on the left. Close-ups of boxed regions are shown in the panels on the right. Green, EGFP; red, mRuby3; grey, DAPI. These experiments validated robust EGFP expression driven by the two positive MPRA hits (#3 and #6), with substantial EGFP reduction for the MPRA negative amplicon #2. (C) Mice were transduced with AAV including positive amplicon #3 and processed as in B, but were raised to P28 before fixing, sectioning, and staining.

These results suggest that the AAV MPRA implementation was effective at screening putative regulatory sequences for in vivo activity in the brain, with reporter expression concordant between MPRA results and single candidate tests of EGFP expression in P7 mouse forebrain for five individually validated amplicons, and that these results could be extended to study activity in later development.

Discussion

The ability to test in parallel the regulatory capacity of candidate sequences in vivo offers considerable potential for elucidating the role of enhancers in the developing brain and for efficient identification of enhancers that are capable of driving precise expression patterns. Here, we report successful rAAV-mediated delivery of a 3’-oriented parallelized enhancer reporter assay to early postnatal mouse brain and demonstrate the utility of this approach in screening human DNA sequences for regulatory activity in the brain. Via this MPRA, we identified novel presumed enhancers active in P7 mouse cortex, showcasing example applications including identifying regulatory sequences at ASD-associated loci, screening amplicons that include lead SNPs from human genetic studies, and comprehensive testing for enhancers harboring SNPs from disease-associated non-coding LD intervals. We show that amplicons active in our MPRA were more likely to have enhancer signatures across functional genomics datasets and that orthologous mouse sequences tested in an independent parallel reporter assay showed strong activity correlation. Finally, we validated MPRA activity predictions via imaging of EGFP for four positive and two negative sequences in vivo in P7 brain, and we confirmed that activity for one of these positives continued to P28. This study provides a model for applying this powerful screening approach in vivo in mammalian brain.

Our study represents one of the first parallelized enhancer assays testing human sequences in early postnatal mouse brain via rAAV-mediated delivery. We validated MPRA performance, verifying both the capacity for the construct to drive characteristic cell-type restricted expression for a known interneuron enhancer, as well as demonstrating EGFP protein expression driven by novel putative enhancers that were active in vivo in P7 mouse cortex. Based on MPRA and validation experiments, the main sources of variation across MPRA replicates were rAAV transduction rate and injection site (Figure 1—figure supplement 4, Figure 3—figure supplement 2, Figure 4—figure supplement 3), highlighting the need for delivery controls to ensure reproducibility when applying MPRAs in vivo via viral delivery. We observed some variability in amplicon RNA dropout across replicates, likely due to a combination of transduction efficiency variability across animals, sensitivity recapturing amplicons with low viral representation, and PCR stochasticity (Kebschull and Zador, 2015). Based on the high degree of correlation between biological replicate #4 and its high amplification cycle technical replicate, we do not believe that PCR amplification bias explains this replicate-specific dropout. On the other hand, we consider it likely that stochastic differences in the number and population of transduced cells in each replicate would alter the apparent activity of amplicons in the assay. For this reason, it is critical that AAV MPRAs use high titer preparations and take advantage of serotypes with high infectivity for the target tissue to maximize the number and diversity of cell types sampled. While not an obvious issue here, the effect of PCR-based clonal amplification bias in other MPRAs has been shown to be reduced with the addition of barcodes and unique molecular identifiers to the reporter construct (Neumayr et al., 2019).

Our rAAV-based approach enabled efficient transduction of the mammalian central nervous system. Our approach differs from recent MPRA strategies which use lentiviral vectors (Klein et al., 2020; Gordon et al., 2020; Morgan et al., 2020; Inoue et al., 2017) in that AAV is expressed episomally while lentivirus must first integrate into the genome. Lentivirus is especially useful for in vitro applications with difficult to transfect cell types, as AAV is not very efficient at transducing cell lines (Ellis et al., 2013). AAV offers a complementary strategy to lentivirus in vivo, as well as considerable flexibility in targeting various tissue types based on capsid serotypes (Srivastava, 2016). AAV is gaining traction as a powerful method for enhancer-driven, cell-type-specific manipulations (Graybuck et al., 2021; Mich, 2021; Rubin et al., 2020).

Amplicons in this assay were approximately 900 bp long, representing some of the longer sequences tested in MPRAs to date. Our rationale, that longer sequences afford more native biological context to enhancer activity, appears in accordance with a recent study comparing MPRA designs (Klein et al., 2020), which finds that longer sequences add biological signal including enrichment of an RNA polymerase III catalytic subunit, histone-modifying enzymes, and increased transcription factor binding. In our vector design, ~900 bp was the maximum length an amplicon could be and still be efficiently packaged into an scAAV. Using traditional, single-stranded AAV vectors would increase packaging capacity, potentially with a trade-off of reduced transduction efficiency. However, increased length, especially when inserted into the 3’-UTR, may contribute to increased variability across replicates (Klein et al., 2020), and may drive mRNA degradation (Rabani et al., 2017) that would inhibit RNA transcript detection in the assay. We detected a number of amplicons that had reduced RNA compared to expected background levels, which may represent sequences with silencer or repressor activity such as those that have been reported in recent MPRA screens (Doni Jayavelu et al., 2020). However, because of both the insertion of these amplicons into the 3’-UTR and the length of the amplicons, the modified 3’-UTR may have caused transcripts to be subject to RNA degradation. Thus, we are hesitant to draw conclusions about these amplicons with lower than expected activity without further characterization. Nevertheless, while artifacts from inserting sequences into the 3’-UTR may impact the STARR-seq design and our results, we show that orthologous mouse sequences exhibited correlated presence and absence of activity and similarly that amplicons with absence or presence of MPRA activity across replicates consistently reproduced in independent single deliveries to the brain. 
Further, active amplicons were enriched for DNase hypersensitive sites, H3K4me1, and H3K4me3 peaks generated from fetal human brain tissue, and for ATAC-seq peaks from FACS-purified neurons and glia from postmortem human brain. Overall, the combination of reproducible MPRA activity, enrichment for neuron-specific enhancer signatures, and activity validation in AAV experiments provides evidence that sequences scoring positive in the MPRA indeed act as enhancers.

Parallelized enhancer assays such as the one reported here have the possibility to become fundamental for assessing active enhancers in the brain and offer great potential for functional dissection of sequence-based enhancer activity. The vast majority of sequence variants associated with genetic risk for neurodevelopmental and neuropsychiatric disorders are found in non-coding regions (Zhang and Lupski, 2015), many of which are presumed to be located in enhancers (Ripke, 2014). As an example, our assay enabled functional annotation of 17 intronic CACNA1C amplicons spanning regions harboring schizophrenia- and bipolar disorder-associated SNPs within strong linkage disequilibrium and identified three regions with enhancer activity in the brain. Notably, we detected no activity in our assay from amplicon #5 (LD group), which spans a region harboring the SNPs rs1006737 and rs2007044, two of the most statistically significant risk SNPs (Moon et al., 2018) from GWAS of schizophrenia and bipolar disorder (Figure 4A). Two of the three amplicons we identified as enhancers in P7 brain had prior evidence of enhancer activity in in vitro models (Eckart et al., 2016; Roussos, 2014), indicating our assay can reproducibly detect enhancer activity identified in orthogonal studies. The region spanning amplicon #6 was previously annotated as a putative enhancer in physical proximity to the CACNA1C promoter in human induced pluripotent stem cell (hiPSC)-derived neurons (Roussos, 2014). The region spanning amplicon #7 exhibited enhancer activity via luciferase assay in SK-N-SH cells (Eckart et al., 2016). In addition, #7 was predicted as a putative enhancer based on open chromatin signatures in both fetal and postmortem human brain (Dunham, 2012; Bernstein et al., 2010; Fullard et al., 2018). Finally, we show that activity of one of these CACNA1C intronic enhancers continues at P28, highlighting the potential to apply the MPRA and single enhancer AAV testing at later ages. 
Although our design was not well powered to compare the activity of sequence variants (Figure 2—figure supplement 4), future studies using allele-balanced libraries, libraries with reduced complexity, or libraries with allele-specific barcodes could greatly improve the sensitivity and power of this assay to evaluate the effects of sequence variation on enhancer activity. We attempted to assess allelic differences in enhancer function for amplicon #3 using a luciferase assay, but we found no difference in activity between the reference and variant alleles. However, disease-relevant SNPs may alter enhancer function by modulating activity in specific cell-type or developmental contexts that were not replicated in the in vitro luciferase assay. For this reason, in vivo parallel functional assays will be critical in understanding how sequence variation within enhancers contributes to altered gene expression, including for the disease-associated CACNA1C locus.

In summary, our in vivo parallelized enhancer reporter assay in P7 mouse brain enabled us to identify novel enhancers active in early postnatal mouse brain and pinpoint potential regulatory regions where disease-associated sequence variation (e.g. GWAS SNPs) may contribute to transcriptional pathology. Our results highlight the opportunities gained by in-depth functional dissection of enhancers in the developing brain via in vivo adaptation of MPRAs, toward a deeper understanding of enhancer biology across neurodevelopment and in the etiology of neuropsychiatric disorders.

Author details

Jason T Lambert
Linda Su-Feher
Karol Cichewicz
Tracy L Warren
Iva Zdilar
Yurong Wang
Kenneth J Lim
Jessica L Haigh
Sarah J Morse
Cesar P Canales
Tyler W Stradleigh
Erika Castillo Palacios
Viktoria Haghani
Spencer D Moss
Hannah Parolini
Diana Quintero
Diwash Shrestha
Daniel Vogt
Leah C Byrne
Alex S Nord
