Transcriptional network orchestrating regional patterning of cortical progenitors

Significance Development of cortical areas begins in cortical stem cells through the action of morphogens controlling the graded expression of transcription factors (TFs). Here, we have systematically identified the TFs and gene regulatory elements (REs) that together control regional pattering of the cortical progenitor zone; these data have led us to propose a cortical regionalization TF network. To identify REs active in this network, we performed TF chromatin immunoprecipitation followed by sequencing (ChIP-seq) and chromatin-looping conformation experiments as well as assays for epigenomic marks and DNA accessibility in purified ventricular zone (VZ) progenitor cells in wild-type and patterning mutant mice. This integrated approach has laid the foundations to identify a TF network and cortical VZ REs involved in cortical regional patterning.


Emx2 antibody production
A guinea-pig Emx2 antibody was generated by Genscript. It was raised against a peptide of the Nterminus of mouse Emx2 (aa 1-155) which specifically excluded the homeodomain of Emx2.
ChIP-seq libraries were prepared using the Ovation Ultralow DR Multiplex System (NuGEN). Generated libraries were size selected using Blue Pippin (centered around 300 bp), QC tested on a Bioanalyzer (Agilent) and sequenced as single-end 50 nucleotide reads on a HiSeq 4000 (Illumina) at the Center for Advanced Technology at UCSF (http://cat.ucsf.edu/).

ChIP-seq Computational Analysis
Clustering, base calling, and quality metrics were performed using standard Illumina software. Sequencing libraries were analyzed for overall quality and were filtered, and reads were mapped to the mouse genome (mm9) using Burrows-Wheeler Alignment (BWA) (9).

Pairwise Pearson Correlation
Pearson correlation between aligned read counts of pairs of TFs was determined using DeepTools (10) to show association between TFs and between different TF replicates.

Motif analysis
De novo motif discovery and enrichment was performed using HOMER version 4.9 (11) in the called peaks for each individual TF, using standard parameters, 300 bp up-and downstream of TF peaks. We compared the significant motifs discovered with the JASPAR (25) database. Motif enrichment was determined for all motifs present in the HOMER known motif database with p-value < 10 -100 . We established the average motif coverage enrichment plot around 300 bp of peaks for each TF using custom R scripts.

Gene Ontology analysis
Gene Ontology analyses were conducted using the GREAT algorithm.

VISTA enhancer annotations
Relevant VISTA enhancers were annotated by at least two experts in the field. Their region of activity was ascertained on whole-mount embryos and where necessary, using sections. Gradients of activity were only defined when they were clear and when consensus was obtained between at least two observers.

Gradient modelling
We established the model associating TF binding, enhancer spatial activity gradients, and enhancer regional activity by determining the pairwise correlation of among the different factors. The TF binding combinations and other factors were intersected using custom R scripts, and the correlation matrix and plot with R ggcorrplot package. Only correlations with p < 0.01 was shown. Non-relevant associations were manually removed.

Histone mark ChIP
We used Cell Trace Yellow Cell Proliferation Kit (ThermoFisher, #C34567) or Cell Trace CFSE Cell Proliferation Kit (ThermoFisher, #C34554) to label the ventricular zone of Pax6 and Emx2 mutants and their control littermates at E12.5. FlashTag labeling was conducted by injecting 0.3 µl of 10 mM of a carboxyfluorescein succinimidyl ester (CellTrace Yellow or Cell Trace CFSE, ThermoFisher) bilaterally into the fourth ventricle of E12.5 embryos (12). Gentle manual pressure was then applied to the exterior of the embryonic head to promote even mixing of the dyes. After 20 minutes, the cortices were then dissected and papain-treated (Papain dissociation system, Worthington Biochemical Corporation) for 15-30 minutes at 37°C with rotation. After inhibiting the papain, dissociated cortical cells were resuspended in 4% FBS/1x-PBS and the singlet FTag-positive population was sorted using the BD FACS Aria III Cell Sorter at Helen Diller Cancer Building (UCSF). Approximately 200,000 cells were used for native ChIP as described in (13). Wild-type cells and mutant littermates were always injected, sorted and processed side by side using the same number of nuclei. Basically, nuclei were extracted from the sorted cells and digested for 8 mins with micrococcal nuclease (MNase, Sigma). Mono and di-nucleosomes were combined and used for ChIP of two epigenomic marks: H3 acetylated lysine 27 (H3K27ac, Abcam, ab472) and H3 trimethyl lysine-27 (H3K27me3, Active motif, 39157). After immunoprecipatation, DNA and libraries were prepared as for TF ChIP as described above.

ATAC-seq
ATAC-seq was performed on around 80,000 sorted nuclei. Basically, we fluorescently labeled the VZ of wild-type and mutant littermates using the FlashTag procedure as indicated above. After making nuclei, the pellet was resuspended in 25uL of Tagment DNA buffer and 2uL of enzyme (Tagment DNA Enzyme, Nextera DNA Library Prep Kit, 15028211, Illumina). Tagmentation was performed at 37°C for 30 mins without shaking. Samples were then purified using MinElute columns (Qiagen), PCR-amplified for 8-10 cycles using the NEB Next High Fidelity 2x PCR Master Mix (NEB, 72 °C 5 min, 98 °C 30 s, (98 °C 10 s, 63 °C 30 s, 72 °C 60 s) per cycle, held at 72 °C). The generated amplified libraries were purified on Ampure XP Bead (Beckmann Coulter) and bioanalyzed. Sequencing was carried out on a HiSeq4000 (50 bp PE, Illumina).

PLAC-seq
PLAC-seq libraries for E12.5 cortex were prepared similar to the previously published protocol (14). 3 to 7 million cells were used for each library. If the cells appeared aggregated, they were dissociated with gentle MACS dissociator or dounce homogenizer. Each PLAC-seq library was prepared using DpnII as the restriction enzyme and Dynabeads M-280 sheep anti-rabbit IgG (Invitrogen #11203D) mixed with 5ug of H3K4me3 (04-745, Millipore) for the chromatin immunoprecipitation step. Finally, libraries were prepared with the Illumina Tru-seq adaptors and the final libraries were sent for paired-end sequencing on the HiSeq X Ten (150 bp reads).

PLAC-seq data analysis using the MAPS pipeline
We applied the MAPS pipeline (15) to detect statistically significant H3K4me3-centric long-range chromatin interactions from PLAC-seq data. We only analyzed intra-chromosomal interactions for autosomal chromosomes, and identified chromatin interactions at 10Kb bin resolution between 20Kb and 1Mb. We first mapped the raw paired-end reads (i.e. fastq file) to the mm10 reference genome using bwa mem. Next, we applied filtering steps to remove PCR duplicates, low quality reads (MAPQ <= 30) and chimeric reads (15). We then split the remaining mapped reads into short-range (distance between pair ends within 1Kb) and long-range reads (distance between pair ends between 1Kb and 1Mb). We used the short-range and long-range reads to measure protein immunoprecipitation (IP) efficiency and detect longrange chromatin interactions, respectively. In addition, we collected ChIP-seq peaks called by MACS2 from cortical cells. Among all 10Kb bin pairs, we only examined those 10Kb bin pairs in which at least one 10Kb bin overlapped with MACS2 ChIP-seq peaks, since these 10Kb bin pairs are enriched by H3K4me3 antibody during the PLAC-seq experiment (16). We fitted a positive Poisson regression model on all selected 10Kb bin pairs with more than one raw count, taking into consideration multiple bias factors including 1D genomic distance between two interacting bins, restriction enzyme cut site frequency, GC content, mappability score, and H3K4me3 antibody IP efficiency measured by the number of short-range reads in each bin. After modeling fitting, we obtained expected contact frequency, p-value and false discovery rate (FDR) for each 10Kb bin pairs. Finally, we defined a 10Kb bin pair as a statistically significant long-range chromatin interaction if the raw contact frequency >= 12, normalized contact frequency (defined as observed contact frequency/expected contact frequency) >= 2, and FDR < 0.01. We further merged adjacent significant chromatin interactions together, and defined those isolated significant chromatin interactions as singletons. We applied a more stringent FDR threshold (FDR < 0.0001) among those singletons to reduce potential false positives.

Enhancer-Gene maps (definitions and annotation strategy).
Associations were previously defined by correlation between the H3K27ac profile of putative enhancers and the total-RNA profile of annotated genes across embryonic development. Two different maps were used, based on a dataset of 29 (17) or 66 (18) samples representing time-courses considering up to 12 tissues and 7 time points. Given a list of genomic regions, a custom C++ script was used to annotate each one of these regions to the overlapping putative enhancers in these two maps (if any) and to associate them to the computationally inferred target gene.

Validated enhancers from the VISTA Enhancer Browser (derivation and annotation strategy)
Human and murine validated elements available on August 25, 2017 on the VISTA enhancer browser (http://enhancer.lbl.gov) (19). Human regions (hg19) were lifted to the mouse genome (mm10) using liftOver (20). This was run with default parameters except for minmatch that was set to 0.1 for mouse to human conversions and to 0.95 for mapping between mm9 and mm10. After that, the same script and strategy outlined in the previous paragraph were used to annotated a given list of genomic intervals to the regions in VISTA.

PLAC-seq data annotation
Using the same script and strategy outlined in the previous paragraphs, each end of the interaction was annotated to any overlapping: (1) VISTA element; (2) the putative target gene, as inferred separately from the two maps described in the previous paragraph; (3) TF-binding events, as inferred by ChIP-seq; in case of multiple overlapping peaks, the region was assigned the peaks with the highest enrichment score; (4) the closest gene on the linear genome, using the TSS of RefSeq genes as landmark. Coordinates of RefSeq genes were downloaded from the UCSC genome browser (19) on May 30, 2015. Merging of the resulting annotations was performed using the statistical computing environment R v3 (http://www.rproject.org).

Defining the interactome for loci in the Cortical Regionalization TF Network
We defined genomic loci based on the farthest PLAC-seq interaction between promoters and pREs for each loci. In cases in which this chromatin binding domain was restricted to only one side (upstream or downstream) of the gene body (i.e. Bcl11a or Dmrt3, Figure S5), we added a 100kb buffer on the other side of the TSS or 5'UTR.

Transgenic enhancer assays
Transgenic assays were performed according to published methods (21,22) and the VISTA enhancer browser can be consulted here: https://enhancer.lbl.gov. A summary of the methodology can be found in Lindtner et al, 2019 (23).

Contact for Reagent and Resource Sharing
Further information and requests for resources should be directed to and will be addressed by the Lead Contacts, John L. R. Rubenstein (John.Rubenstein@ucsf.edu) and Alex S. Nord (asnord@ucdavis.edu).
(C) Table of regulatory elements (pREs) around the Nr2f1, Npas3 and Lmo3 loci, that may participate in rostral LVP patterning Shown are NR2F1 and PAX6 ChIP-seq peaks, differential epigenomic peaks in the Nr2f1 -/and Pax6 -/and pRE/TSS interactions (see Figures 4, 6, 7).  (B) Examples of cortical VISTA enhancers with graded patterns of activity. The wholemounts and sections for each example are placed along the central grid according to their annotated gradient of activity in the developing pallium. hs1035 has a DV and RC gradient; hs798 has a DV and CR gradient; hs636 has a RC and VD gradient; and finally, hs1172 has a CR and VD gradient.
(C) Correlalogram showing the expansion of the modeling presented in Figure 4D showing the different combinations of TF binding and their predictions of gradients of activity (p < 0.01)

Figure S4. Epigenomic profiling of CRTFN genes (Cortical Regionalization TF Network) in the cortical VZ.
(A) Histology of Flash-Tag staining of the cortical VZ at E12.5; note that the Flash-Tag labeled VZ cells do not overlap with TBR2 immunostained SVZ progenitors.
(B) Example FACs plots showing data for Flash-Tag (CFSE) positive progenitor cells prepared from the E12.5 cortex. The histogram of FITC counts shows a bi-modal FITC negative population (VZ -) and a FITC positive population (VZ + ). The dot plot depicting FSC vs. FITC shows the gating which was used to collect the FITC + (VZ + ) population. Black bar above VZ + correspond to the cells collected for further analysis.
(C) Overview of epigenomic enrichment in pREs with differential combinatorial TF binding (0-5 TFs) for CRTFN TF gene loci. Number of pREs at each locus are in brackets (n=#pREs). Percentage of total pREs are indicated by a teal histogram bar. Percentage of pREs with the following epigenetic marks are indicated by the following colors: ATAC = yellow; H3K27ac = dark green; H3K27me3 = red. See Figure  S4A which summarizes this data for each TF locus.  (B) Statistical analysis shows that the H3K27ac/H3K27me3 ratio is significantly lower for pREs in genomic loci of TFs with MP expression (red) compared to pREs for the non-MP TFs (purple) (p<0.05, unpaired two-tailed t-test).
-pallial VISTA enhancer wholemount (and sections if available) showing activity in the cortex at E11.5.
- Table with a comprehensive list of pREs that shows differential regulation in Pax6 -/-, Emx2 -/and Nr2f1 -/around the interactome of the relevant TF. Each pRE in this table is highlighted by a yellow column in the genome tracks below. The highlighted columns have orange numbers matched to the row number in this table.