Q-STARZ: Quantitative Spatial and Temporal Assessment of Regulatory element activity in Zebrafish

Noncoding regions of the genome harbouring cis-regulatory elements (CREs) or enhancers drive spatial and temporal gene expression. Mutations or single nucleotide polymorphisms (SNPs) in enhancers have been widely implicated in human diseases and disease-predispositions. However, our ability to assay the regulatory potential of genetic variants in enhancers is currently very limited, in part because of the need to assay these elements in an appropriate biological context. Here, we describe a method for simultaneous quantitative assessment of the spatial and temporal activity of wild-type (Wt) and disease-associated, mutant (Mut) human CRE alleles using live imaging in zebrafish embryonic development. We generated transgenic lines harbouring a dual-CRE dual-reporter cassette in a pre-defined neutral docking site in the zebrafish genome. Using this single transgenic cassette, the functional activity of each CRE allele is reported via expression of a specific fluorescent reporter, allowing the simultaneous visualisation of the activity of both alleles. This can reveal where and when in embryonic development the wild-type allele is active and how this activity is altered by the disease-associated mutation.


INTRODUCTION
Mutations or single-nucleotide polymorphisms (SNPs) in noncoding regions of the human genome functioning as cis-regulatory elements (CREs) or enhancers have been widely implicated in human diseases and disease-predispositions [1,2].
Disease-associated sequence variation in enhancers can alter the transcription factor (TF) binding sites, leading to aberrant enhancer function and altered target gene expression [2]. Application of next generation sequencing technologies combined with molecular genetic approaches has enabled widespread identification of presumptive CREs and associated putative pathogenic mutations in patient cohorts [3]. However, incomplete understanding of the TF binding potential of CREs impedes the functional assessment of pathogenicity of these variants compared to genetic variants in coding regions. Thus, determining how mutations in the vast stretches of the human noncoding genome contribute to disease and disease-predisposition remains a huge unmet challenge.
Functional analysis of enhancer activity, and assessment of the impact of diseaseassociated variation on this activity, depends on the availability of the right TFs in the right stoichiometric concentrations, which is only precisely captured in vivo. Enhancerreporter transgenic assays have been widely employed in a variety of model organisms, including the mouse, for in vivo assessment of enhancer function [4][5][6][7][8].These assays however can be affected by the random integration of transgenes and have limited application for studying the temporal aspects of enhancer function over the complete time course of embryonic development since, for example, live imaging is challenging due to the opaqueness of the mouse embryo and its in utero embryonic development.
Zebrafish (Danio rerio) is a highly suitable in vivo vertebrate model for visualizing tissue-specific enhancer activity. Robust transgenesis methods allow rapid generation of transgenic lines yielding transparent embryos which develop externally [9,10]. The activities of a large number of putative human and mouse CREs have been assessed in transgenic zebrafish models, irrespective of the primary sequence conservation of the mammalian CREs in zebrafish [5,[11][12][13][14][15][16]. However these assays were based on Tol2-recombination which mediates random integration of the CRE-reporter cassette in the zebrafish genome [17]. The measured CRE activities were strongly influenced by the variable site and copy number of integrations, necessitating analysis of each element in multiple transgenic lines and precluding quantitative assessment of CRE activities. These biases can be alleviated by targeted integration of the transgenic cassette into pre-defined neutral sites in the zebrafish genome using phiC31-mediated recombination [18][19][20].
Previously, we developed a system in which dual fluorescence CRE-reporter zebrafish transgenics allow for direct comparison of the in vivo spatial and temporal activity of wildtype (Wt) and putative SNP/mutation (Mut) bearing CREs in the same developing embryo [5]. The functional output from each CRE version (Wt/Mut) is visualized simultaneously as eGFP or mCherry signal within a live developing embryo bearing both transgenes. This enables unambiguous comparison of the activity of both wildtype and mutant CREs in a developmental context, simultaneous assessment of multiple separate elements for subtle differences in spatio-temporal overlap, and the validation of putative TFs by analysing the effect of morpholino-mediated depletion of the putative TF on CRE activity [5]. The assay had clear advantages over other conventional CRE-reporter transgenic assays, notably rapid, unambiguous detection of subtle differences in CRE activities using a very low number of animals. However, as the CRE alleles were on separate constructs randomly integrated into the zebrafish genome, the assay was not suitable for quantitative assessment of altered CRE activity. Furthermore, multiple transgenic lines had to be analysed for each CRE to eliminate any bias arising from the site of integration.
Here we describe Q-STARZ (Quantitative Spatial and Temporal Assessment of Regulatory element activity in Zebrafish), a new and significantly improved design of our previous transgenic reporter assay, based upon targeted integration of a dual-CRE dual-reporter cassette into a pre-defined site in the zebrafish genome ( Fig. 1). A unique feature of this design is the single transgenic cassette containing both wildtype and mutant CREs, separated by strong insulator sequences, with the transcriptional potential of both CREs read out as expression of different fluorescent proteins. Qualitative and quantitative activity of the two CRE alleles is analyzed from eGFP/mCherry fluorescence in real-time by live imaging of embryos obtained from the founder (F0) lines bearing the dual CRE dual-reporter cassette. This allows robust, unbiased assessment of spatial and temporal activities of both CREs using a single transgenic line, thereby reducing animal usage by up to 75% compared to the previous design. We utilize disease-associated mutations in well characterized CREs from the PAX6 and SHH loci to demonstrate the salient features of the Q-STARZ method.

Targeted integration of dual-CRE dual-reporter transgenic cassette in the zebrafish genome
Analysis of enhancer activities in conventional reporter assays in zebrafish suffer from bias arising from position effects due to the random integration of the transgene at Tol2 sites naturally distributed at a low frequency throughout the zebrafish genome [9,17]. Most assay designs also harbour only one CRE per transgene introducing ambiguity in the analysis when comparing CREs with highly similar activities or subtle changes in sequence (e.g. disease-associated mutations or SNPs). Q-STARZ is a versatile, robust and cost-effective analysis pipeline designed to alleviate both these limitations ( Fig. 1).
We first generated 'landing lines' harbouring phiC31 attB integration sites at inert positions in the zebrafish genome. Using Tol2-mediated transgenesis, we integrated 'landing pads' at random sites in the zebrafish genome ( Fig. 1, supplementary fig. 1).
To visualise successful integration events, the landing pads contain 'tracking CREs' (Supplementary table 1) driving expression of a 'tracking reporter gene'. These CREs had previously well-characterised activities, enabling us to select transgenic lines devoid of bias arising from the site of integration [5]. We assessed reporter gene expression in F1 embryos derived from several independent F0 transgenic lines for each tracking CRE (Fig. 2, supplementary table 2). F1 embryos in which the activity of CRE was not influenced by the site of integration were raised to adulthood to establish 'landing lines' presumed to be harbouring the phiC31 attB sites in an inert position of the zebrafish genome (Fig. 1B). The CRE activities were observed to be highly influenced by the site of integration in F1 embryos derived from founder lines bearing SOX9-CNEa and Pax6-SIMO CREs, but we obtained landing lines with clean, single-site integration using Shh-SBE2 as the tracking CRE (Fig. 2, supplementary table 2). Shh-SBE2 is a forebrain enhancer driving Shh expression in the hypothalamus [21]. Based on these observations, we decided to use the Shh-SBE2 landing line for all subsequent experiments described in this study. The precise integration site of the landing pad in the Shh-SBE2 line was determined using ligation mediated PCR (LM-PCR) (Supplementary fig. 2).
In the second part of the Q-STARZ pipeline, we generated a 'dual-CRE dual-reporter replacement construct' containing two CRE-reporter cassettes separated from each other by strong insulator sequences (Fig. 1A, supplementary fig 1B). The replacement construct was co-injected with mRNA encoding phiC31 integrase into F2 embryos derived from the Shh-SBE2 landing line. Recombination-mediated cassette exchange between the attB sites on the landing pad construct and attP sites on the replacement construct integrates a single copy of the dual-CRE dual-reporter cassette at the predefined site in the zebrafish genome (Fig. 1B). Injected embryos were scored for loss of Shh-SBE2 driven CRE activity in the forebrain and gain of mosaic eGFP and mCherry signals from the replacement CRE-reporter cassette. Selected embryos were raised to sexual maturity to establish independent founder transgenic lines for each replacement cassette analysed in this study (Supplementary table 2). Activity of both CREs in the replacement construct was visualised simultaneously as eGFP or mCherry signals by live imaging of F1 embryos derived from outbreeding the founder lines with wild-type zebrafish (Fig. 1B). Detailed protocols for the various steps described in this section are provided in materials and methods.

Robust, quantitative assessment of CRE activity using Q-STARZ
A key feature of Q-STARZ is the simultaneous assessment of activities of the two CREs present on the replacement cassette. In order to prevent cross-talk between the two enhancers, a well-characterised insulator sequence from the chicken genome, chicken b-globin 5'HS4 (cHS4) [22,23] was placed between the two CRE-reporter cassettes ( Fig. 1A, supplementary fig. 1). We optimised the assay using replacement constructs bearing two CREs with previously well-characterised tissue-specific activities from the PAX6 regulatory domain (Fig.3). PAX6 is a transcription factor with vital pleiotropic roles in embryonic development [4,24,25]. Over 30 CREs have been characterised upstream and downstream of the PAX6 transcription start site which coordinate precise spatial and temporal PAX6 expression in the developing eyes, brain and pancreas [2]. We selected PAX6-7CE3 and PAX6-SIMO for this analysis as they have well established and highly distinct tissue-specific activities during zebrafish embryogenesis. PAX6-7CE3 drives expression in the hindbrain and neural tube from 24hpf-120hpf, while PAX6-SIMO activity is in developing lens and forebrain from 48hpf-120hpf [14,16] (Supplementary table 1).
When the two CRE-reporter cassettes were separated by a 'neutral' sequence -a randomly selected region from the mouse genome with no insulator activity, we observed complete crosstalk of the two CRE activities (Fig. 3A, supplementary fig. 3, and supplementary table 2). We also performed a dye-swap experiment wherein the eGFP and mCherry reporters were swapped between the two CREs. We observed no significant difference in CRE activities in the dye swap experiment, indicating no bias was introduced in the analysis by varying signal intensities from the two fluorophores used (Supplementary fig. 3). Next, we substituted the neutral sequence with one, two or three tandem copies of the cHS4 insulator. Enhancer blocking activity of this insulator has been attributed to its ability to bind CTCF [26]. Crosstalk between the two enhancer-reporter cassettes was progressively reduced with increasing copies of cHS4, with complete insulation achieved in replacement cassettes bearing three copies (3xcHS4) ( was active in the rostral part of forebrain while Shh-SBE4 activity was restricted to caudal forebrain (Fig. 4B, supplementary movie 2). This analysis highlights the importance of simultaneous visualization of CRE activities in the developing embryo to define the precise spatial and temporal activity of each CRE.

Robust assessment of the effects of disease-associated mutations on CRE activity
As well as qualitative comparison of activity between two different CREs, a key strength of the Q-STARZ pipeline is its suitability for discerning the precise effects of disease-associated mutations or SNPs within a specific CRE. We tested this in SBE2, a regulatory element that controls SHH expression in the developing forebrain, using a point mutation (C>T) identified in a patient with holoprosencephaly, and shown to abrogate the activity of SBE2 in the rostral hypothalamus of the mouse (Fig. 5A) [5,27]. We simultaneously visualised the activities of the human Wt(C) and Mut(T) SBE2 alleles in our dual-CRE dual-reporter system by live imaging of transgenic zebrafish embryos from 24hpf -72hpf (Fig. 5B, supplementary movie 3). We detected no difference in the activities of the two alleles in very early development until ~40hpf.
However, from ~ 48-72hpf, activity of the alleles started to diverge. Expression driven by the Wt allele was observed in the developing rostral and caudal hypothalamus of transgenic embryos while the Mut allele was only active in the caudal hypothalamus, indicating that the mutation disrupts rostral activity of the SBE2. Upon quantification of reporter gene expression associated with each allele, we observed no significant difference in activity was observed between the two alleles at 28hpf. However at later stages of development (48hpf and 72hpf) the mutant allele failed to drive reporter gene expression in the rostral hypothalamus, and had significantly weaker activity in the caudal hypothalamus (Fig. 5C).Our analysis thus unambiguously and precisely uncovered where and when in embryonic development the mutation associated with holoprosencephaly alters the enhancer activity of SBE2.

DISCUSSION
The noncoding region of the human genome is estimated to contain approximately one million enhancers [28,29]. The widespread application of whole-genome sequencing for understanding genetic diseases (rare, common and acquired -i.e. cancer), combined with genome-wide identification of chromatin signatures associated with active enhancers, has led to the identification of a large number of putative enhancers with disease-associated or disease risk-associated sequence variation [1,2,30,31]. A complete understanding of how these sequence changes alter enhancer function is a necessary first step towards establishing roles of the CREs in the aetiology of the associated disease. Thus there is a pressing need for rapid, cost effective assays for robust unambiguous comparisons of mutant CRE alleles with the activities of wild-type alleles. Importantly, this has to be done in the appropriate context, relevant to the biology of the associated disease. CRE activity depends on precise stoichiometric concentrations of specific transcription factors, which is only achieved in the right physiological context inside a developing embryo or in cell lines that closely model the cellular phenotypes of developing tissues [32,33].
The Q-STARZ assay we describe here is highly versatile and enables unambiguous assessment of human tissue-specific CRE function in vivo at all stages of early embryonic development in a vertebrate model system. A distinctive feature of the assay is the targeted integration of a single transgenic cassette bearing two independent CRE-reporter units into a pre-defined inert site in the zebrafish genome.
We establish an analysis pipeline that enables simultaneous robust qualitative and quantitative analysis of enhancer function without any bias from position effects or copy number variation between the two CREs analysed.

Similar methods of targeted integration of enhancer-reporter transgenic cassettes
have been developed for zebrafish as well as mouse models [18,34]. Q-STARZ however offers a unique advantage when analysing the effects of disease-associated sequence variation on CRE function by enabling direct comparisons of activities of wild-type and mutant alleles inside the same, transparent developing embryo using live imaging.
Using CREs with previously well-established activities we demonstrate that inclusion of three tandem copies of a strong insulator sequence in the replacement construct robustly prevents cross talk between the two CREs analysed. This feature enables direct comparisons of the spatial and temporal dynamics of both CREs by simultaneous visualization of functional outputs in live embryos at all stages of development. We demonstrate here that this can uncover the precise sites and time points in embryonic development where the CRE functions are unique and where they overlap with each other. Q-STARZ is therefore an ideal tool for generating a detailed cell-type specific view of CRE usage during embryonic development. This will enhance understanding of the roles of CREs in target gene regulation, particularly for the complex regulatory landscapes of genes with key roles in development like PAX6 and SHH. Analysis of CREs derived from these loci in conventional transgenic assays has revealed multiple CREs apparently driving target gene expression in the same or highly overlapping tissues and cell-types. This has led to the concept of redundancy in enhancer function conferring robustness of expression upon genes with key roles in embryonic development [35][36][37]. However, our analysis of the Shh-SBE2 and SBE4 enhancers, previously reported as forebrain enhancers with overlapping functions, reveals subtly distinct spatial and temporal activity domains of each enhancer during development. Based on these results, we hypothesize that there are small but important differences in the timing of action or precise localisation in cell-types within the forebrain where these enhancers exert their roles that are overlooked when analysed independently in conventional transgenic assays.
Finally, we demonstrate that Q-STARZ can robustly detect differences in activities of mutant and wild-type CRE alleles. Live imaging of transgenic embryos carrying a reporter cassette with a previously validated disease-associated point mutation in the SHH-SBE2 enhancer revealed the loss-of-activity of the mutant allele in the rostral hypothalamus compared to the wild-type CRE. This recapitulates a similar loss of rostral activity of the SBE2 mutant that has been previously reported in mouse transgenic assays [27]. However, since we could visualise the activities of both the wild-type and mutant alleles simultaneously in the same embryos in real-time, we were able to determine the precise time point in development when the mutation affects CRE function. We propose that Q-STARZ will be a powerful tool to define the precise cell-types and stages of development where CRE function is affected by mutations or SNPs identified by GWAS and other studies, thus this could significantly improve our ability to discern potentially pathogenic and functional sequence variation from background human genetic variation, which is currently a major challenge for human genetics.

Ethics statement
All zebrafish experiments were approved by the University of Edinburgh ethical committee and performed under UK Home Office license number PIL PA3527EC3; PPL IFC719EAD.

Generation of landing pad and dual-CRE dual-reporter replacement vectors
All the constructs in this study were generated using the Gateway recombination cloning system (Invitrogen). PCR primers with suitable recombination sites were used for amplification of CREs from the genomic DNA (Supplementary table 1

Generation of zebrafish transgenic lines
Zebrafish were maintained in a recirculating water system according to standard protocols [38]. Embryos were obtained by breeding adult fish of standard stains (AB ) and raised at 28.5°C as described [38]. Embryos were staged by hours post fertilization (hpf) as described [39]. Final CRE-reporter plasmids were isolated using Qiagen miniprep columns and were further purified on a Qiagen PCR purification column (Qiagen), and diluted to 50 ng/ml with nuclease free water. Tol2 transposase mRNA and phiC31 integrase mRNA were synthesized from a NotI-linearized pCS2-TP or pcDNA3.1 phiC31 plasmid, respectively [40,41] using the SP6 mMessage mMachine kit (Ambion), and final RNA diluted to 50 ng/ml. Equal volumes of the reporter construct(s) and the transposase RNA were mixed immediately prior to injections. 1-2 nl of the solution was micro-injected per embryo and up to 200 embryos were injected at the 1-to 2-cell stage. Embryos were screened for mosaic fluorescence at 1-5 days post-fertilization i.e. 24-120 hpf and raised to adulthood.
Germline transmission was identified by outcrossing sexually mature F0 transgenics with wild-type fish and examining their progeny for reporter gene expression/ fluorescence. F1 embryos from 3-5 F0 lines showing the best representative expression pattern for each construct were selected for confocal imaging (Fig. 1B). A few positive embryos were also raised to adulthood and F1 lines were maintained by outcrossing. A summary of the number of independent lines analysed for each construct and their expression sites is included in supplementary table 2.

Mapping of transgene integration site in the landing lines
Transgenic embryos obtained from lines harbouring the landing pad vectors were sorted into eGFP-positive and eGFP-negative groups. Genomic DNA was purified from ~ 100 pooled embryos using QIAGEN DNeasy blood and tissue kit (Cat No./ID: 69504). Ligation-mediated PCR (LM-PCR) [42] was used for mapping the landing pad integration site using previously published protocol [43]. 1μg of genomic DNA was digested with either NlaIII, BfaI or DpnII and purified using a Qiagen QIAquick PCR purification kit (Cat No./ID: 28104). A 5μl aliquot was added to a ligation reaction containing 150 μmoles of a double stranded linker. Ligations were performed using high concentration T4 ligase (NEB, M020S) at room temperature for 2-3 hours.
The first round of the nested PCR was performed using linker primer 1 with either Tol2

Imaging of zebrafish transgenic lines
Embryos for imaging were treated with 0.003% PTU (1-phenyl2-thio-urea) from 24hpf to prevent pigmentation. Embryos selected for imaging were anaesthetized with tricaine (20-30mg/L) and mounted in 1% low-melting point (LMP) agarose. Images were taken on a Nikon A1R confocal microscope and processed using A1R analysis software. Time-lapse imaging was performed on an Andor Dragonfly spinning disk confocal, and processed using Imaris (Bitplane, Oxford Instruments) and ImageJ.      (CH) hypothalamus for F1 embryos derived from founder lines bearing the replacement construct described in (A). At 28hpf, no significant difference in activity was observed between the two alleles. However, at later stages of development (48hpf and 72hpf) the mutant allele failed to drive reporter gene expression in the RH, and had significantly weaker activity in the CH at 72hpf. ****p<0.0001, **p<0.01