A multi-omics approach to visualize early neuronal differentiation from hESCs in 4D

Summary Neuronal differentiation of pluripotent stem cells is an established method to study physiology, disease, and medication safety. However, the sequence of events in human neuronal differentiation and the ability of in vitro models to recapitulate early brain development are poorly understood. We developed a protocol optimized for the study of early human brain development and neuropharmacological applications. We comprehensively characterized gene expression and epigenetic profiles at four timepoints, because the cells differentiate from embryonic stem cells towards a heterogeneous population of progenitors, immature and mature neurons bearing telencephalic signatures. A multi-omics roadmap of neuronal differentiation, combined with searchable interactive gene analysis tools, allows for extensive exploration of early neuronal development and the effect of medications.


INTRODUCTION
Brain region and cell-specific transcriptional and epigenetic landscapes are fundamental for investigating disease mechanisms and therapeutic interventions. However, the role of epigenetic regulation on the establishment and maintenance of cellular identity during early neuronal differentiation is not well understood Yao et al., 2016). Studies of chromatin modifications and global open chromatin in early human brain, neuronal differentiation of human embryonic stem cells (hESCs) and organoid model systems have been insightful on the regulation of neuronal gene programs (Luo et al., 2016;Markenscoff-Papadimitriou et al., 2020;Reilly et al., 2015;Xie et al., 2013). Single-cell resolution analyses have further subtyped the neuronal cells present in human fetal brain development and organoids (Amiri et al., 2018;Eze et al., 2021;Trevino et al., 2020Trevino et al., , 2021Ziffra et al., 2021).
Developmental trajectories can only be spatiotemporally resolved by single-cell omics systematic studies which characterize cells at intermediate differentiation timepoints. Although neuronal differentiation of pluripotent stem cells (PSCs) is an established method to study early development, disease and neurotoxicity (Riemens et al., 2018), fewer omics studies have targeted 2D neuronal differentiation from hESCs and how well they recapitulate early human brain development.
Combining RNA-seq, global DNA methylation, single-cell RNA-seq and ATAC-seq data and analyzing the integration across four timepoints (4D analysis), we constructed a molecular timeline and correlated transcription factors (TFs) with time-and population-specific chromatin states in hESCs, early fate commitment and during differentiation. We provide access to the single-cell data in user-friendly, interactive web applications that enable visualization of gene cluster regulation during the neuronal differentiation protocol for novel insights and as a basis for future studies. This integrative analysis delineated transcription programs and identified over 26,000 putative cis regulatory elements that link to expressed genes during 2D neuronal differentiation.

Global expression profiles reveal neuronal differentiation and maturation signatures
To increase gene expression sensitivity, we performed bulk gene expression analysis with higher sequencing depth (Figures 2 and S1). Overall, we found 11,313 differentially expressed genes (DEGs) comparing cells from Day 0 to 20 (Table S1). More genes were differentially expressed during neural induction (Day 0 to 7), compared to the later stages, with self-patterning (Day 7 to 13) and maturation (Day 13 to 20) stages (Table S1). The most extensive transcriptional changes occurred between Day 0 and Stage I, with loss of pluripotency and gain of neuralization markers (Figures 2A, S2A, and S2F). We confirmed that bulk RNA-seq analysis for selected marker genes correlates well with ddPCR ( Figures 2B and S2G).
Specific gene expression patterns drive the loss of pluripotency towards neuronal maturation ( Figure 2). These may be steep decreases after neuronal induction, as seen for LIN28A and CDH1, expression peaking at Day 7 or Day 13 (such as RAX and FOXG1, respectively) or gradual increase for genes, such as OTX2 and SOX6 (Figures 2A, S2A, and S2B). Moreover, the expression of neuroectodermal patterning Wnt/b-Catenin negative regulator AMER2 (Pfister et al., 2012), and neuronal differentiation marker STMN2 (Wang et al., 2019b) increase at Day 13, and increase further at Day 20 (Figures 2A,S2D,and S2E). On Day 20 we also find genes correlated to specific neuronal types, such as GRIA1, and GNRH1 ( Figure S2F) and the top 50 differentially expressed genes between Days 0 and 20 were plotted as a heatmap ( Figure 2C).
We further identified biological processes (BP) enriched among the DEGs with gene ontology (GO) analyses ( Figure 2D; Table S2), which revealed upregulated BPs related to neuronal maturation from Day 0 to 20 ( Figure 2D). Stage-specific GO analyses revealed enrichment of BPs involved in neurogenesis and neuron development differentiation at stage I, and BPs involving synaptic organization and neurotransmitter regulation and secretion at the end of the maturation stage (Table S2).
Deconvoluting these changes temporally, the highest number of DMCs was observed between Day 0 and 7 (n = 161,600), with fewer changes in the self-patterning phase (Days 7-13, n = 39,545) and during cell maturation n = 47,676) (Table S1). Next, we used GOMETH analysis (Maksimovic et al., 2021) to explore shared biological functions among DMCs. In line with the gene expression results, from Day 0 to 20 we observed enrichment of BPs involved in neurogenesis, brain development and others ( Figure 3C) showing how DNAm modulates neuronal differentiation ( Figure 3A). Similarly, these analyses identified DMCs between Day 0 to 7 and Day 0 to 13, with BPs involved in neuron projection morphology, which fits well with the cell transitions ( Figures S3D and S3E). One of the most significant GO terms is ''neuron migration'', evidenced by expression of genes such as DCX and its partner PAFAH1B (Nadarajah and Parnavelas, 2002) (Figure 3).
To explore the correlation between DNAm and gene expression, we combined the DNAm and RNA-seq data sets based on CpG probe location and gene locus ( Figures 3D, S3F, and S3G; Table S1). Of the Stage I gene annotated DMCs, 72% overlap with differentially expressed genes, inferring functional impact on gene expression. For genes with DMCs we generally observed a decrease on transcriptional activation iScience Article or an increase for genes becoming repressed during the course of differentiation. The expression levels of the majority of the differentially expressed genes between Day 0 and 20 are predicted to be associated with DNAm changes (8,011 of the 11,313 DEGs). The expression of markers of late trophectoderm (e.g., KRT18), pluripotency maintenance (POU5F1), suppression of pluripotency (NR6A1) (Wang et al., 2016), metabolic reprogramming (LDHA) (Zheng et al., 2016), or spatiotemporally regulated cortical TFs and cell cycle related genes (LHX2, CDKN1C) (Chou and Tole, 2019; Laukoter et al., 2020) and neuronal differentiation and maturation markers, (such as DCX), may be regulated by one or more CpGs ( Figures 3C, S3F, and S3G; Table S3).
Methylated non-CpGs (mCpHs) have been associated with transcriptional repression in the mouse genome (Xie et al., 2012), and also detected in human ESCs, brain and organoid models (Lister et al., 2009;Luo et al., 2016;Schultz et al., 2015). We demonstrate that the distribution of methylated non-CpGs vary across timepoints ( Figure 3B). Non-CpG DNAm levels were enhanced at Day 0 cells and declined during differentiation ( Figure 3B), similar to the decline observed during differentiation of hESC to cerebral organoids (Luo et al., 2016).

Identification of heterogeneous populations of progenitors, mature and immature neurons with telencephalic signatures
To characterize the gene expression signatures, composition, differentiation pathway trajectories and the maturation level of the cell types derived, we performed single-cell RNA-seq (scRNA-seq) analyses at Days 0, 7, 13 and 20 (Figures 4 and S4; Table S4). We used the batch correction algorithm Harmony on the unfiltered data and found our data to be very consistent across replicates ( Figures S4D and S4E). Moreover, differential expression analyses between days were performed to investigate if DEGs were comparable between bulk RNA-seq and scRNA-seq (Table S4). We found an overlap of genes between DEGs for global RNA-seq and scRNA-seq of 62-91%, confirming a high correlation between the two datasets.

Inferring quantitative analysis of cell cycle phase
Neuronal development involves major cell cycle alterations. G1-phase lengthening is associated with the transition to more differentiated cell types, whereas S-phase duration is linked to progenitor cell expansion (Arai et al., 2011). The cell cycle-specific gene trajectories documented the decreased proportion of cells in S phase and an increased proportion of cells in G1 phase as cells transition from Day 0 to 20 (Figures 4B and  hESCNeuroDiffscRNA). Although, terminally differentiated neurons are found in G0 (not in G1), and the analyses tools currently available cannot accurately predict cells in the G0 phase, the results are consistent with previous studies showing that maintenance of proliferation and pluripotency in PSCs, neural stem cells and progenitor cells are cell cycle-regulated (Becker et al., 2006;Boward et al., 2016;Liu et al., 2019;Soufi and Dalton, 2016). The cell cycle regulator CDK1 was expressed by 60% Day 0 cells, 45% at Day 7, 53% at Day 13, and reduced to 33% at Day 20 ( Figure S4G, % from the hESCNeuroDiffscRNA). Moreover, utilizing CytoTRACE to predict single-cell hierarchies (Gulati et al., 2020) showed that cell potency gradually decreased from Day 0 to 20 ( Figure 4C), confirming the cell cycle phase prediction.

Development and differentiation markers used for cluster resolution and annotation
For a preliminary assessment of the cell types in our neuronal differentiation dataset, we utilized a scRNAseq Human Brain dataset (La Manno et al., 2016) and RNA-seq data from Human Primary Cell Atlas (Aran et al., 2019) ( Figures 4D and S4F). According to the Human Primary Cell Atlas, D0 hESCs overlap with ESCs See also Figure S3 and Tables S1 and S3. iScience Article and IPSCs, whereas the majority of D7, D13 and D20 cells overlap with neuroepithelial cells ( Figure S4F). When we compared our datasets to the La Manno Brain Data, the D0 cell population correlated with ESCs, and D7, D13 and D20 cells that are dividing and REST positive (Figures 4B and 4F) annotate to human neuronal progenitors whereas the cells in the D20 clusters R11-13 correlated with neuroblasts ( Figure 4D).

Characterizing the unsynchronized hESC population
We identified three Day 0 clusters (R1-3) where all cells expressed POU5F1 ( Figure S4G) verifying their pluripotency. TFs essential in establishing and maintaining pluripotency (i.e., GAL, TDGF1, ID1, FOXH1 and SOX2) were highly expressed in clusters R1-3 ( Figures S4F and S4G, as can be observed hESCNeuroDiffscRNA). R3 cells expressed the highest levels of LRRC75A ( Figure S4G). As others have reported (Chen et al., 2021, p. 1), PHC1 was highly expressed in hESC clusters R1-R3, and its expression was greatly reduced in differentiating cells. Downregulation of PHC1 was compensated by increased PHC2 expression, indicating a role for PHC2 in human neuronal differentiation (Figures 4F and S7B). Focusing on the FOX family of TFs, a clear switch was observed from FOXH1 expression in R1-R3, to the expression of the master regulator of brain development FOXG1 (Beyer et al., 2013;Chiu et al., 2014) in all other clusters ( Figures 4F and S5B).

LSX FOREBRAIN INDUCTION CUES EVIDENT AT THE END OF STAGE I
Under LSX induction, hESCs undergo morphogenetic events and form neural rosettes. These Day 7 cells mapped to a single cluster (R4, Figure 4F), enriched in the rostral marker LIX1 ( Figure S4G). In contrast, expression of the preplacodal genes EYA1 and SIX1 (Ikeda et al., 2007;Schlosser, 2014) and of the caudal markers PAX5 and GBX2 (Kirkeby et al., 2012;Maroof et al., 2013) was low throughout differentiation, confirming that the LSX efficacy is enabling cell-fate commitment persistence.

Self-patterning does not affect fate commitment
At Day 13, which marks the end of the self-patterning stage, 75% of the cells mapped to cluster R5 and most of them expressed REST ( Figures 4F and S4G). SIX3, DLX5 and BMP4 ( Figure S4G) were expressed in the FGF8/HES1 enriched R6 cells. Moreover, R6 was enriched in TAGLN, but absent in the R5 cells (F) Violin plots representing gene expression levels and distribution in clusters R1 to R15 for selected genes. See also Figure S4 and iScience Article co-expressing NKX2.1 and SOX6. CNTN1, a potent inducer of neuronal migration and Notch ligand (Hu et al., 2003) and potent inducer of neuronal migration (Lee et al., 2014), was exclusively expressed in R6 cells negative for NTN1, DLL1, FABP7 and POU3F2. Comparing to results of 8 week human embryonic tissue (Kirkeby et al., 2012), no midbrain and hindbrain markers were detected at Day 13, confirming that the self-patterning phase does not affect fate commitment.

Characterization of the day 20 heterogeneous population
Day 20 cells retained their identity and clustered in R7-13 ( Figure S4G). Some cells expressed high levels of CDK1 ( Figure S4G), whereas other cells were still regulated by REST and expressed DLX5 and CDKN1C. CDKN1C, which forms complexes with histone deacetylases to repress neuronal genes in non-neuronal cells (Laukoter et al., 2020) is inversely correlated with REST expression and enhanced in R11-13 ( Figure 4F). Of interest, ARX, a regulator of cortical progenitor expansion by repression of CDKN1C (Colasante et al., 2015) was only expressed in R13 cells ( Figure S4G). Neuronal differentiation correlated with CDK6 upregulation and G1 shortening. CDK6 is directly regulated by GLI3 and expression of GLI3 (Hasenpusch-Theil et al., 2018) (detectable at R4) dropped significantly in R12-13. REST is known to be downregulated during neurogenesis and in differentiating neurons and the pattern was recapitulated in this study ( Figure 4F).

Neuronal maturation signatures
Day 20 cells were highly enriched for MAP2, and clusters R11-13 were enriched for DCX, which is a marker of migratory neurons. Genes expressed in proliferating neuroblasts associated with cortical migration control and developing rostral brain structural patterning, such as EMX2 (Pang et al., 2008;Spalice et al., 2009;Verrotti et al., 2010), decreased in clusters R11-R12 and were undetectable in R13 ( Figure S4G). FGF8, an anterior-posterior patterning molecule, acting mainly via EMX2 repression (Hao et al., 2019, p. 8), was expressed in R4 and R6 cells and in a few Day 20 cells, mainly in R8 and R10 clusters. Furthermore, FGF17 (Figure S4G) and FGF18 were mostly expressed at Day 20 R8 cluster. HES6-enriched cluster R11 ( Figure S4G) was composed of Day 13 and 20 cells, and most of the R11 NEUROG1-negative cells were Day 13 cells. Neural stem and progenitor marker ZEB1, which was downregulated on neuronal differentiation to permit proper migration of immature neurons (Wang et al., 2019a), was expressed in almost all cells ( Figure S4G). In addition, FOXG1-enriched R13 cells also express high levels of DLX5 ( Figure S4G).
Of note, the expression of GNRH1 (Gonadotropin Releasing Hormone 1) was expressed in 30% of the R12-13 cells (9% of Day 20 cells) ( Figure S4G). Of these cells, some expressed GABAergic or glutaminergic processing enzymes. As the mechanisms that contribute to the development of extrahypothalamic GnRH neurons are not fully described, such data are vital for studies of development, puberty and reproduction.

Chromatin accessibility analysis identifies regulation signatures during differentiation
In line with the bulk RNA-seq data, the scRNA-seq results showed downregulation of pluripotency genes and upregulation of brain development genes correlating with neuronal transcriptional programs during differentiation. To further assess the epigenetic landscape changes on differentiation, we performed single-cell assay for transposase accessible chromatin sequencing (scATAC-seq). The analysis focused on chromatin-based gene regulation from loss of pluripotency at Day 0 to Day 20 ( Figure 5 and S5).

Reanalyzing scRNA-seq datasets for integration with scATAC-seq data
To integrate scATAC-seq and scRNA-seq, the scRNA-seq datasets for Day 0 and Day 20 were reanalyzed. 1910 Day 0, and 3033 Day 20 cells were projected in 13 clusters ( Figure 5A) and in accordance with the maturation trajectory seen in the corresponding CytoTRACE plot ( Figure 5B). scRNA-seq clusters were numbered and annotated, cohering to the initial four-timepoint analysis ( Figures 5A, S5A, and S5B). Thus, Day 0 cells resolved into five clusters (R0-3 and R14) whereas Day 20 clusters were resolved into nine clusters (R7-13 and R15).

Chromatin accessibility changes globally during differentiation
The analysis of 4,901 Day 0 nuclei and 2,847 Day 20 nuclei and the scATAC-seq data showed good distribution of fragment sizes, fragment numbers, and transcriptional start site (TSS) enrichment ( Figures S5C-S5F). The supervised pseudotime trajectory scATAC-seq analysis, which predicts paths for gene regulatory changes in cells during differentiation, showed a similar profile to the single-cell gene expression CytoTRACE analysis ( Figures 5B and 5C). We mapped four chromatin accessibility clusters at ll OPEN ACCESS iScience 25, 105279, November 18, 2022 9 iScience Article Day 0 (C1-4) and five at Day 20 (C9-5; Figure 5D). Genome tracks demonstrate differential chromatin opening in scATAC-seq clusters for gene loci POU5F1, REST, GAD2 and DCT ( Figure S5G). We next generated a gene score matrix heatmap representing a score of chromatin opening of 200 kb genomic regions. Higher

Correlation of chromatin regulatory dynamics and gene expression
To better connect the regulatory open chromatin with gene expression we performed integrated analysis of scATAC-seq with scRNA-seq using ArchR (Granja et al., 2021). Following constrained alignment of cell populations after integration of scATAC-seq and scRNA-seq, the integrated clusters were renamed to correspond to the previously annotated scRNA-seq clusters (Figures 5F and S5B We assessed scATAC-seq peaks across TGDF1, CDH1, CDH2, STMN2, and DCX loci across the integrated clusters (R0, R2, R7, R9 and R12) and found cluster-specific chromatin opening correlating with detected level of gene expression in single-cell transcriptome data ( Figures 5I and S5J). Furthermore, the peakto-gene co-accessibility arcs show that the expressed genes STMN2, TDGF1, CDH1, CDH2 and DCX are linked to potential cis regulatory regions (CREs).
To explore the interaction of putative CREs with gene expression, we mapped 95,800 peak-to-gene links grouped into 25 clusters and observed a clear correlation of the predicted CREs and gene expression ( Figure 5G). Chromatin accessibility peak annotation analysis revealed varying enrichment across integrated clusters at promoters, intronic, exonic and distal regions ( Figure S5H) but with a clear enrichment at TSS in every cluster ( Figure 5H). These results agree with previous studies showing that neuronal gene activation in early human brain development depends on multiple distal regulatory regions (de la Torre-Ubieta et al., 2018;Markenscoff-Papadimitriou et al., 2020;Reilly et al., 2015;Trevino et al., 2020Trevino et al., , 2021Ziffra et al., 2021).
To understand the dynamics of lineage-defining factors at pluripotency and differentiation endpoint, we mapped motifs within accessible chromatin regions for TFs found in our analyses ( Figures 5J and S5K). Motif footprinting for POU5F1 underlies a regulatory function in accessible chromatin in pluripotent clusters R0 and R2, whereas ASCL1 and OTX2 footprints were more enriched in differentiated clusters R12 and R7. The motifs of POU5F1, DLX6, ASCL1 and OTX2 and motif enrichment in open chromatin in individual cells illustrate how lineage-defining TFs can dynamically regulate gene expression programs during neuronal differentiation (Figures 5J and S5K).

Molecular signatures of neuronal differentiation
To further elucidate the molecular regulation during neuronal differentiation we focused on Day 20 and linked chromatin accessibility to gene expression to identify potential novel enhancers. We identified 26, 189 putative CREs in five linked groups (k-means clusters 1-5) (Figures 6A and 6B; Table S5). GO enrichment for the linked genes for groups 1-2 and to a lesser degree group 5 were linked to processes such as ''nervous system development'', ''neurogenesis'', ''neuron differentiation'' and ''anterograde trans-synaptic signaling'' ( Figure 6C; Table S5). Corresponding GO enrichment for the identified putative CREs in km 1-5 by GREAT (McLean et al., 2010) showed very similar BPs for all groups such as ''nervous system development'', ''neurogenesis'' and ''generation of neurons'' (Table S5). A total of 1183 potential CREs overlapped with enhancer regions from a collection of Ensembl Human Regulatory Regions (Zerbino et al., 2015; Figure 6D; Table S5). As many of the annotated Ensembl enhancers may operate in other cell types, we next compared our integrated peak-to-gene link Day 20 data with a similar dataset from early human brain. We utilized a single-cell atlas from human cortical development post-conceptional weeks 16-24 and extracted the linked genes from inferred peak-to-gene link pairs after a pseudotime annotation from all five interaction clusters (Trevino et al., 2021) ( Figure 6E; Human telencephalon enhancer activity patterns are a subject under intense study (Visel et al., 2013). To address whether any of the putative Day 20 CREs may function as enhancers we compared their genomic location with the coordinates of active VISTA enhancers for ASCL1 and ID4. We show an overlap with enhancers hs967, hs988, hs1354, hs1533 and hs1540 previously described to be active in the brain of transgenic mice, but no overlap was observed for hs1114 ( Figure 6F) (Visel et al., 2007(Visel et al., , 2013. However, other CREs did not overlap with previously characterized VISTA enhancers and may potentially be new candidate enhancers. For example, we have shown that GNRH1 was expressed in R12-13 cells and our integrative analysis identified a set of potential CREs 3.5 kilobases upstream of the GNRH1 locus ( Figure 6F). This genomic distance from GNRH1 is similar to a rat E2 upstream Gnrh1 enhancer that has been shown together with enhancers E1 and E3 to be important in driving robust neuronal expression (Iyer et al., 2010). To better understand the regulation of neuronal differentiation programs we mapped TF motifs that were enriched in the putative CREs and correlated these with expression in Day 20 cells (Figures 6G  and 6H; Table S5). Motif enrichment with higher correlation included neuronal TF regulators such as neuronal marker NHLH1, the ONECUT family known to promote neuronal differentiation (van der Raadt et al., 2019), and proneural factor NEUROG1 with representative sequence logos ( Figure 6I; Table S5).
We also generated peak-to-gene links for hESCs and identified 13, 463 putative CREs in five groups (k-means clusters 1-5) ( Figures S6A and S6B; Table S6). Because of fewer peak-to-gene links identified in Day 0, the GO functional enrichment analyses of the linked genes in the five groups displayed less overlap and more general processes such as ''regulation of cellular process'', ''biological regulation'' and this was also true for the corresponding GREAT analyses of potential CREs ( Figure S6C; Table S6). Furthermore, 388 of the putative CREs overlapped with annotated enhancers from the Ensembl regulome ( Figure S6D; Table S6). Reflecting a smaller dataset, fewer TF expressed in hESCs with motifs enriched identified linked putative CREs were identified ( Figures S6E and S6F; Table S5). HMGA1, an architectural chromatin protein that previously has been shown to be highly expressed in hESCs and can prevent differentiation (Shah et al., 2012), showed both high expression and motif enrichment in Day 0 cells with representative sequence logo ( Figure S6G).

Exploration of single-cell data using interactive webtools
It is generally necessary to have expertise in bioinformatics for the analysis of large single-cell sequence data analysis. Open-access interfaces based on open-source tools enabled us to make our scRNA-and scATAC-seq data available to more people, abiding by the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles (Ouyang et al., 2021;Sharma et al., 2021).The users can explore scRNAseq data in hESCNeuroDiffscRNA and plot high resolution figures of their genes of interest under seven different tabs (Figures S7A-S7H; Table S7). This includes exploration of 1) Gene expression UMAPS as illustrated for POU5F1 and NTRK1; 2) gene co-expression analysis, here shown for PHC1/PHC2 and NEUROG1/NTRK1; 3) different gene and cluster expression configurations, such as heatmaps, violin-, box-, proportion-and bubble plots. The platform also allows for correlation with other published gene expression datasets ( Figure S7H).  iScience Article expression of ZIC4, mostly undetectable at early timepoints, appears at Day 13 and peaks at Day 20 ( Figure 7A). DNAm levels at the CpGs in the ZIC2 locus were stable, whereas DNAm of 13 CpGs in the ZIC4 locus were positively or negatively correlated with gene expression across differentiation ( Figure 7B). Differential expression of ZIC2 and ZIC4 across individual cells at Day 0, 7, 13 and 20 in UMAPs ( Figure 7C) can be compared and correlated with TFs important in the neuronal differentiation protocol ( Figure 7C).
The scATAC-seq data can be explored in ''hESC Neuro Differentiation scATAC seq'' (hESCNeuro DiffscATAC) (Figures S7I-S7N; Table S7). Users can visualize chromatin accessibility, motif enrichment or integration of scATAC-seq with scRNA-seq in different UMAPs ( Figure S7I). The webtool enables investigation of gene score and motif matrix, showing the representative sequence logo calculated from open regions. (gene examples at Figure S7J). Chromatin opening can be explored across the genome for different clusters, as shown for SOX2 and POU5F1 or in different heatmaps ( Figures S7K-S7N; Table S7). Pseudotime trajectories or peak-togene linkage can be viewed as heatmaps, which define linked chromatin opening peaks with promoters of expressed genes (Granja et al., 2021) and may deduce enhancer promoter interactions (Baek and Lee, 2020). Gene score and gene integration analysis showed that ZIC2 and ZIC4 were active in different cells ( Figure 7D) whereas their footprints and representative sequence logos were identified. Our 4D approach highlights the epigenetic regulation and gene expression of these genes using this neuronal differentiation protocol.

DISCUSSION
Here we present comprehensive multi-omics analyses to characterize a 2D neuronal differentiation protocol from pluripotent hESCs towards a ventrally committed, telencephalic population of progenitors, mature and immature neurons. We assessed stage-to-stage transition using ddPCR and immunofluorescence imaging and used bulk RNA-seq and scRNA-seq to validate gene expression and cell populations over time. The scATAC-seq and DNAm analyses further characterized the epigenetic and gene regulatory landscapes. We have identified cis-regulatory elements and DNA binding TFs that regulate neuronal differentiation gene expression programs.
The deconvolution of early human neuronal differentiation at the level of molecular regulation provides insight to an otherwise inaccessible developmental window. Animal models are valuable, but evidence shows that the human neocortex develops under the effect of additional mechanism (Massimo and Long, 2021;Pinson and Huttner, 2021;Xing et al., 2021). Thus, neuronal differentiation studies from PSCs provide an alternative method to characterize developmental transcriptome trajectories and the roles of specific genes in human brain formation and patterning.
To specify the patterning and cell maturation identities, we followed up the trajectories of major TFs (O'Leary and Sahara, 2008). Ventral telencephalic markers, such as EMX2 and ASCL1, were already expressed at the end of Stage II, whereas dorsal markers such as EMX1 and NEUROG2 were absent. Absence of expression of HOXB2, PAX7, and GBX2, confirmed that self-patterning after neural induction, had no effect on lineage commitment.
GO analyses revealed stage dependent enrichment of biological processes correlated to neurogenesis, pattern specification, signaling and neurotransmitter regulation, migration, synaptic organization and neuronal maturation. The DNAm analyses showed alternating, stage-dependent changes for various patterning genes and TFs important to neurogenesis. In most developmentally regulated genes involved in neuronal lineage commitment, DNAm levels decreased on transcriptional activation and increased for genes becoming repressed during the time of differentiation. We also observed that sometimes downregulation of gene expression in the self-patterning stage might intersect the upregulation of gene expression seen both in stage I of neural induction of hESCs and during maturation at stage III. This could be because of the combined effect of non-CpG and CpG DNAm implicated in the regulation of RNA splicing in ESCs and neurons, respectively (Ball et al., 2009;Laurent et al., 2010). Non-CpG DNAm accumulates in neurons during synaptogenesis and synaptic pruning , and CpH DNAm is associated with transcriptional repression (Luo et al., 2016;Xie et al., 2012). Whether and how CpH DNAm plays a role in self patterning following the LSX induction is not known, and future studies are needed to explore this. Furthermore, the in vitro model for DNAm changes presented here is advantageous for neuropharmacological studies. Whether these changes can be translated to distinct early developmental events, cannot be ascertained. iScience Article networks. The identification of common DNAm modification sites and chromatin openness regions may present candidate loci for future studies of early human development and may advance translational studies of the impact of drugs used early in human pregnancy.
The effect of loss of pluripotency towards neuralization, irrespective of the intermediate timepoints, was investigated through an integrative analysis of the scATAC-and scRNA-seq. We used ArchR (Granja et al., 2021) for this analysis as this pipeline was flexible for small scATAC-seq and scRNA-seq datasets. The juxtaposition of the transcriptome to the regulatory elements in ESCs and differentiating Day 20 cells allowed us to infer neuronal gene regulatory network information.
We identified linked sets of genes and putative CREs unique to hESCs and Day 20 that enabled us to infer epigenome regulation. Less than 2.7% of the putative CREs identified in Day 0 or Day 20 overlapped with the annotated ENSEMBL enhancers. However, our functional assessment of the potential CREs suggests that we have identified new enhancers specific for neuronal differentiation. A greater number of expression linked putative CREs was identified for Day 20 than in the more homogenous hESCs, and the functional annotations for the linked analysis correlates with the BPs from differential analysis of RNA-seq and DNAm. Our multi-omics approach showed that the in vitro 2D neuronal differentiation protocol recapitulates stages of neuronal progenitor proliferation and specification. Thousands of transient chromatin accessible regions linked to expressed genes have been mapped in early human brain development (de la Torre-Ubieta et al., 2018;Trevino et al., 2020Trevino et al., , 2021Ziffra et al., 2021). Using the same correlation based approach for scATAC-seq and scRNAseq integration, we were able to compare CRE-linked genes that were active in Day 20 with CRE-linked genes from human brain tissue, at weeks 16 to 24 after conception (Trevino et al., 2021). A total of 68% linked genes from the early human brain single-cell pseudotime integration overlapped with the Day 20 CRE-linked genes. Moreover, we show that the genomic location of some of the putative CREs for neuronal genes overlapped with active early human brain enhancers assessed in transgenic mouse embryos (Visel et al., 2013). Future work need to establish the functional role of the predicted Day 20 CREs in neuronal differentiation. Analyses of datasets from studies such as ours could define the epigenetic regulatory programs preceding and regulating gene expression during cortical development. The causal direction of epigenetic regulation in early brain development was further corroborated with the in vitro differentiation via sets of open CREs and the regulation of neuronal TF-centered networks. However, only 10% of regulatory TFs identified in putative CREs in the same pseudotime developmental brain program overlapped with Day 20 mapped regulatory TFs (Table S5) (Trevino et al., 2021). Buenostro et al. have reported observations that epigenetic regulation are found to precede transcription (Ma et al., 2020). We suggest that some of the detected chromatin accessible regions in Day 20 cells define gene expression for future cellular states. Our comprehensive gene expression and epigenetic regulation network is a resource for future studies including, but not restricted to, the effect of specific disease variants or drugs in early brain physiology and development.

STRENGTHS OF STUDY
Although numerous studies have used the LSX cocktail for neural induction, to our knowledge this is the first study that has shared all scRNA-seq and scATAC-seq data in such transparent and interactive format.
Thus the strength of this study is the high quality data and the presentation of our single-cell data in two visualization tools, ShinyCell and in-house developed ShinyArchR.UiO (Ouyang et al., 2021;Sharma et al., 2021), that are openly available for users. These tools allow the users to explore candidate genes and utilize a comprehensive set of functionalities, beyond the analysis presented. These tools enable insight into the molecular and structural partners of stage-specific markers and time-stamped TFs, their transcriptional regulation and cell cluster identities. Programming scripts for data analysis are made available and can be easily customized for further studies and the incorporation of other data.

Limitations of the study
The protocol described is an in vitro study, and as such, it is difficult to know how well it reflects the in vivo dynamics. Furthermore, this study has several limitations at the level of neuronal subtype characterization, as we did not assess neuropeptide diversity or secretion and the membrane electrochemical and electrophysiological maturation properties were not evaluated. Protein quantification or intracellular localization of markers and trajectories were also beyond the scope of this multi-omics characterization. A limitation of scATAC-seq is the relatively low genome- iScience Article better understand G0 dynamics. The outcome would be the design of tools that can accurately predict the cell cycle phase in differentiating and differentiated neurons.
In conclusion, in this study we describe the characterization of a 2D neuronal differentiation protocol where the unparalleled power of multi-omics was used to decipher fate specification. We assessed the functional regulation of transcription factors and developmentally regulated genes, from loss of pluripotency towards neuronal differentiation. Integration of scATAC-seq and scRNA-seq provide invaluable insight on the complexity of fate decisions and enable other researchers to fine-tune future studies. Finally, the reader has access to the single-cell sequencing data in two searchable, user-friendly webtools to visualize intraand inter-time point and cell cluster regulation, interactively.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Human embryonic stem cell culture and maintenance and neuronal differentiation protocol The hESC line HS360 (Stockholm Medical Biobank, Sweden, KIe009-A) (Main et al., 2020;Strö m et al., 2010) was used in this study. hESCs were maintained in Essential 8ä Medium (Thermo Fisher), in feeder-free conditions on Geltrex Matrix solution (Thermo Fisher) pre-coated culture plates and media was replaced daily. The full description of hESC cultivation and a step-by-step description of the 2D neuronal differentiation protocol, all stock solutions and dilution recipes, coating instructions and times, critical points of the protocol, a Day-by-Day timeline of high-resolution brightfield images, immunocytochemistry, qRT-PCR marker genes for HS360 are available at STAR Protocols (Samara et al., 2022). Brightfield images were acquired on an EVOSâ FL Cell Imaging System AMF4300 (Thermo Fisher).

Immunofluorescence analysis
In brief, cells grown on 13mm glass coverslips, were washed once and fixed in 4% paraformaldehyde for 15 min at room temperature (RT). After 3 washes, the cells were permeabilized with 0.3% Triton X-100 (Thermo Fisher) in blocking buffer containing 2% BSA (Sigma-Aldrich) and 0.01% Tween in 13PBS for 30minat RT, washed 3 times, and blocked with 10% horse serum for 30 min. Primary antibodies were diluted (as in KRT) in 13PBS containing 0.03% Triton X-100, and coverslips were incubated overnight at 4 C. Next, coverslips were equilibrated at RT for 2 h and washed 3 times. The secondary antibodies were diluted (see key resources iScience Article scRNA-seq data analysis The Cell Ranger 3.1.0 Gene Expression pipeline (10x Genomics) was used to demultiplex the raw base-call files and convert them into FASTQ files. The FASTQ files were aligned to the GRCh38-3.0.0 human reference genome, and Cell Ranger count was used with default parameters for computing read counts for Days 0, 7, 13 and 20.
We utilized Harmony for batch correction after merging and performing log normalization on the two replicates (Korsunsky et al., 2019). We aggregated sequenced replicates for each day into single datasets using Cell Ranger Aggr command. The Seurat Package v.4.0. (Hao et al., 2021) was used to perform quality control and normalization on the count matrices obtained after the aggregation. The gene count per cell, UMI count per cell and percentage of mitochondrial and ribosomal transcripts were computed using the functions of the Seurat package. Low quality cells expressing few genes (less than 200) were excluded from the downstream analysis. Genes expressed in less than three cells were removed. Duplicates, dead cells and cells with greater than 5 median absolute deviations (MADs) for mitochondrial reads were filtered out (Mc-Carthy et al., 2017). We used the MAD-based definition of outliers, using the Isoutlier function of scater package version 1.0.4, to remove putative low-quality cells from the dataset. For normalization we used scTRANSFORM to better understand cell to cell heterogeneity after performing cell cycle regression analysis Tirosh et al., 2016). Counts were adjusted for cell-specific sampling ('normalized') using the scTRANSFORM function with regression of cell cycle genes and mitochondrial content. To cluster the cells, we used resolution of 0.55, obtained by determining the optimum number of clusters (cell grouped together sharing similar expression profiles) in the dataset using the Clustree R package (Zappia and Oshlack, 2018) (Figures S4B and S4C). SingleR (Aran et al., 2019) was used to annotate the cells against two different reference data sets. Global RNA-seqdata from Human Primary Cell Atlas was accessed using the celldex (Aran et al., 2019), and scRNA-seq data from a Human Brain dataset (La Manno et al., 2016) was accessed using the scRNAseq R package . Cell types with < 15 cells annotated were excluded from the plots (Figures S4F and 4D). FindMarkers from the Seurat R package was used to perform differential expression analysis between days.

OPEN ACCESS
iScience 25, 105279, November 18, 2022 29 iScience Article Cell-cycle state assessment using scRNA-seq data The cell cycle state assessment was performed using a gene set and scoring system previously reported (Tirosh et al., 2016). The S and G2M scores were calculated based on 43 S phase-specific genes and 54 G2 or M phase-specific genes and the Seurat package Cell cycle scoring function was used for calculation of actual scores.

Clustering and dimensionality reduction
The Resolution for finding clusters were computed using scClustviz SNN-based clustering (Innes and Bader, 2018). Then, principal component analysis was performed using RunPCA function of the Seurat package. For UMAP visualization, clusters were identified using FindNeighbors and FindClusters Seurat function using resolution 0.55, followed by the RunUMAP function across samples with the same parameters. UMAP preserves aspects of global structure in larger datasets and was therefore preferred for visualization over t-SNE (Becht et al., 2019). We used FindMarkers or FindAllMarkers functions to compute differentially expressed genes between the clusters.

CytoTRACE
We utilized CytoTRACE (Cellular (Cyto) Trajectory Reconstruction Analysis) (Gulati et al., 2020) with gene counts for all datasets (Merged Days 0, 7, 13 and 20 and Day 0 and Day 20) for prediction of differentiation state of cells from scRNA-seq data. In short, CytoTRACE leverages single cell gene counts, covariant gene expression and local neighbourhoods of transcriptionally similar cells to predict ordered differentiation states from scRNA-seq data. CytoTRACE uses smoothing steps within the dataset to remove confounding factors associated with direct comparison of genes expressed by each cell in cross-study differences in depth and sensitivity.

scATAC-seq Library Preparation and sequencing
Cells from HS360 hESCs were differentiated in one time-course experiment, and at Day 0 and Day 20, the cells were washed twice with 1xPBS and detached to single cell suspension by application of Accutase (STEMCELL Technologies) at 37 C for 7 min. The detached cells were washed with appropriate base media with added 0.04% BSA (Sigma-Aldrich) and filtered using MACS SmartStrainers (Miltenyi Biotech) to remove cell aggregates. Nuclei isolation was done according to the 10x Genomics protocol CG000169 (Rev D) using 2 min of incubation in lysis buffer diluted to 0.1x and 0.5x for Day 0 and Day 20 cells, respectively. We used the Countess II FL Cell Counter (Thermo Fisher Scientific) to quantify nuclei and confirm complete lysis and microscopy to confirm high nuclei quality. Nuclei were further processed on the 10x Chromium controller (10x Genomics) using Next GEM Chip H Single Cell Kit (10x Genomics), Next GEM Single Cell ATAC Library & Gel Bead Kit v1.1 (10 x Genomics) and Chromium i7 Multiplex Kit N Set A (10x Genomics) according to the Next GEM Single Cell ATAC Reagent Kits v1.1 User Guide (CG000209, Rev C). The targeted nuclei recovery was 5,000 nuclei per sample. The resulting 4 sample libraries were sequenced on a NovaSeq Sp flow cell (Illumina) with 50 cycles for read 1, 8 cycles for the i7 index read, 16 cycles for the i5 index read and 49 cycles for read 2.

scATAC sequencing analysis
Cell Ranger ATAC version 1.2.0 with reference genome GRCh38-1.2.0 was used to pre-process scATACseq raw sequencing data into FASTQ files. Single cell accessibility counts for the cells were generated from reads using the 'cellranger-atac count' pipeline. Reference genome HG38 used for alignment and generation of single-cell accessibility counts was obtained from the 10x Genomics (https://support. 10xgenomics.com/single-cell-atac/software/downloads/). Downstream analysis of the scATAC-seq data was performed using the R package ArchR v1.0.1 (Granja et al., 2021). A tile matrix of 500-bp bins was constructed after quality control, removal of low-quality cells and doublet removal using the doubletfinder function of ArchR. The ArchR Project contained the filtered cells that had a TSS enrichment below 3 and <1000 fragments. ArchR has implemented Harmony for batch correction (Korsunsky et al., 2019). A layered dimensionality reduction approach utilizing Latent Semantic Indexing (LSI) and Singular Value Decomposition (SVD) applied on Genome-wide tile matrix. Uniform Manifold approximation and projection (UMAP) was performed to visualize data in 2D space. Louvain Clustering methods implemented in R package Seurat (Stuart et al., 2019) was used for clustering of the single-cell accessibility profiles. We also obtained links (peak2genelinks) between gene TSS and putative CRE's to expression profiles of genes from ArchR. The peak2genelinks pair represents enhanced gene interactions. We correlated the putative CREs to a list ll OPEN ACCESS iScience Article QUANTIFICATION AND STATISTICAL ANALYSIS Statistical analyses were performed in R version 4.1.2 (R Core Team, 2019) using SARTools v.1.6.8 (Varet et al., 2016), DESeq2 v.1.22.1 , Limma , ArchR (Granja et al., 2021), Seurat (Hao et al., 2021;Stuart et al., 2019), MORE (Conesa, 2018) and ggpubr package v.0.4.0 (Kassambara, 2020). Details are described in the relevant methods sections above.