The Parkinson’s-disease-associated mutation LRRK2-G2019S alters dopaminergic differentiation dynamics via NR2F1

to develop neurodegenerative diseases. We investigate the occurrence of developmental abnormalities in dopa-minergicneuronsinamodelofParkinson’sdisease(PD).Wemonitorthedifferentiationofhumanpatient-speciﬁc neuroepithelial stem cells (NESCs) into dopaminergic neurons. Using high-throughput image analyses and sin-gle-cell RNA sequencing, we observe that the PD-associated LRRK2-G2019S mutation alters the initial phase of neuronal differentiation by accelerating cell-cycle exit with a concomitant increase in cell death. We identify the NESC-speciﬁc core regulatory circuit and a molecular mechanism underlying the observed phenotypes. The expression of NR2F1 , a key transcription factor involved in neurogenesis, decreases in LRRK2-G2019S NESCs, neurons, and midbrain organoids compared to controls. We also observe accelerated dopaminergic differentiation in vivo in NR2F1 -deﬁcient mouse embryos. This suggests a pathogenic mechanism involving the LRRK2-G2019S mutation, where the dynamics of dopaminergic differentiation are modiﬁed via NR2F1. mutations or genetic variations, infection, trauma, or environmental factors. using permeabilization buffer 1 mL 0.05% Saponin-1% BSA-1xPBS. Next, cells were subjected to titrated primary antibodies and isotype controls in equal concentration in permeabilization buffer for 1h at 4 (cid:4) C. After two 10%FBS-1xPBS wash steps cells were resuspended in secondary antibody mix and incubated for 30 min at 4 (cid:4) C. Following two 10% FBS-1xPBS washing steps, cells were resuspended in 1xPBS and analyzed in a Fortessa ﬂow cytometry analyzer (BD Biosciences), Becton Dickinson Biosciences. Resulting ﬂow cytometric data was further processed using FlowJo, LLC software. First, we gated-out doublets using conservative double gating via SSH-A/SSH-H and SSH-A/SSC-W). Samples were further analyzed while considering appropriate negative, compensation, and isotype controls. Gated quantitative numbers were subjected to different groupings


Correspondence jens.schwamborn@uni.lu
In brief Walter et al. show that differentiating iPSC-derived neurons carrying the LRRK2-G2019 mutation are characterized by faster initial neuronal differentiation and cell-cycle exit, as well as increased cell death, compared to controls. Downregulation of the transcription factor NR2F1 appears key to mediate LRRK2-G2019S-dependent phenotypes.

INTRODUCTION
Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder after Alzheimer's disease. It is histopathologically characterized by the progressive loss of midbrain dopaminergic neurons (mDAs) in the substantia nigra (Przedborski, 2017;Schapira et al., 2017). However, the pathology's wide spectrum of clinical manifestations is only partially explained by the progressive loss of mDAs. Several lines of evidence have substantiated the hypothesis that PD might have a neurodevelopment component. This is, for example, suggested by an analysis of PD autoptic brain samples (Huisman et al., 2004). The number of mDAs has been found to be decreased in the sub-stantia nigra but strongly increased in the olfactory bulb (Huisman et al., 2004). Neurogenesis has been reported to occur in human olfactory bulb until the first 18 months after birth (Sanai et al., 2011;Wang et al., 2011). Thus, it is reasonable to conclude that the increased number of mDAs was specified during development. Further evidence for a neurodevelopmental component of PD comes from leucine-rich repeat serine/threonine-protein kinase 2 (LRRK2). Mutations in the gene encoding for LRRK2 have been associated with both familial and sporadic PD (Funayama et al., 2002;Lesage et al., 2006;Ozelius et al., 2006). LRRK2 is highly expressed during development (Zechel et al., 2010), and it binds to proteins of the dishevelled (DLV) family (Sancho et al., 2009), key mediators of the Wnt signaling pathway. The Wnt pathway is essential in embryonic development (Croce and McClay, 2008), controlling several key processes such as cell fate specification (Logan and Nusse, 2004), including that of mDAs (Arenas, 2014). LRRK2 mutations in the Roc domain have been shown to disrupt its interactions with DLV1-3 (Sancho et al., 2009). Consistent with the neurodevelopmental hypothesis, LRRK2 has been shown to regulate synaptogenesis (Parisiadou et al., 2014;Winner et al., 2011).
The penetrance of the LRRK2-G2019S mutation is incomplete and variable (Healy et al., 2008), and it is modulated by age, individual genetic background, and environmental factors (Chen et al., 2017;Hill-Burns et al., 2016;Lorenz et al., 2017). It has already been shown that genetic mutations can interfere with development, resulting in defects of variable severity and diverse clinical manifestations (Marín, 2016;Poduri et al., 2013). Increasing evidence suggests that neural stem cell (NSC) deregulation and the resulting compromised neurodevelopment play a role in PD (Le Grand et al., 2015;Marxreiter et al., 2013;Schwamborn, 2018). The existence of a neurodevelopmental predisposition to PD could potentially explain also some of the extra-nigral symptoms associated with PD (Schapira et al., 2017).
The use of PD-patient-derived human induced pluripotent stem cells (hiPSCs) represents an enabling tool to model developmental alterations in an in vitro human-based system (Gammill and Bronner-Fraser, 2003;Reinhardt et al., 2013b). hiPSCderived cells resemble embryonic identity at the gene expression level and recapitulate early developmental events (Mariani et al., 2012). hiPSC-derived neural cells carrying the LRRK2-G2019S mutation reproduced several in vivo PD phenotypes, including reduced neurite ramification (Reinhardt et al., 2013b), synaptic alterations (Borgs et al., 2016;Reinhardt et al., 2013b;Sá nchez-Dané s et al., 2012), mitochondrial abnormalities (Cooper et al., 2012;Sanders et al., 2014), reduced stress sensitivity (Reinhardt et al., 2013b), autophagosomal dysfunctions (Sá nchez-Dané s et al., 2012), and stem cell alterations (Liu et al., 2012;Sanders et al., 2014). Previous studies mainly characterized the mature differentiated mDAs without considering potential alterations in the transition dynamics from NSCs to neurons. In contrast, here, we addressed this gap and extensively study the impact of the LRRK2-G2019S mutation on the dynamics of cell fate specification throughout neuronal differentiation of patient-specific neuroepithelial stem cells (NESCs).

RESULTS
In vitro differentiation of human NESCs follows an in vivo developmental process To evaluate whether LRRK2-G2019S mutation could influence the differentiation dynamics of dopaminergic neurons, we cultured NESCs from an early-onset patient (PD2/G2019S) and the corresponding isogenic control line, where the mutation was corrected (PD2/WT) ( Figure S1A). We induced dopaminergic differentiation and collected the cells after 4, 10, 14, and 42 days. To analyze neuronal differentiation at a high-level view, we performed droplet-based single-cell RNA sequencing (scRNA-seq) at the mentioned time points.
To assess the quality and identity of our in vitro preparation, we performed a prototype analysis using a machine-learning tool and the information contained in a single-cell atlas of the human developing ventral midbrain (La Manno et al., 2016). The algorithm compares the transcriptomes measured from our in vitro model to the in vivo reference human fetal cell types. The approach allows evaluating the similarity of each cell to many reference cell types. The classifier assigns each cell a probability to belong to each of the prototypes. An analysis of the top 18 expressed candidate genes relevant to midbrain progenitors, neuroblast medial (NbM) stem cells, and mDAs was conducted. The analysis revealed that the in vitro mDA differentiation is incomplete. However, the in vitro cells resemble good-quality NbM and progenitors ( Figure S1A). Hence, the used cell culture model represents a suitable tool to address early developmental processes.

LRRK2-G2019S causes accelerated neuronal differentiation
After proving the in vivo relevance of our in vitro model, we evaluated whether the LRRK2-G2019S mutation altered cell fate acquisition. We used 13 iPSC-derived NESC lines generated from three patients carrying the LRRK2-G2019S mutation and four age-and gender-matched healthy individuals. Four isogenic NESC lines (two where the mutation was introduced and two where the mutation was corrected) were also used (Figures 1A and S1A). We used two different clones for patient 1 (P1 and P1.1). The derivation of NESCs from iPSCs and their mDA differentiation and characterization have been previously described ( Figure S1A) (Reinhardt et al., 2013a).
Throughout the study, we grouped the cells in two ways; we indicated the disease condition of the donor as healthy (H) or PD and the absence (WT) or presence (G2019S) of the LRRK2-G2019S mutation. H/WT indicates cells from a healthy individual, while H/G2019S are healthy cells after the introduction of the LRRK2-G2019S mutation (isogenic control). The PD/G2019S lines are cells from PD patients carrying the LRRK2-G2019S mutation, and they are annotated as PD/WT after the gene correction. We analyzed all NESC lines according to the following groupings: (1) H + PD/G2019S versus H + PD/WT), (2) PD/G2019S versus H/WT), (3) PD/G2019S versus PD/WT, and (4) (H/WT versus H/G2019S).
To study the potential contribution of neurodevelopmental alterations to PD, we analyzed the temporal dynamics of neuronal and mDA differentiation via automated image analysis. We focused the analysis on days 4, 10, and 14 of differentiation (Figure 1B). We analyzed the appearing DA neuronal marker tyrosine hydroxylase (TH + ) and neuron-specific class III beta tubulin (TUJ1 + ) cells in combination with the LIM homeobox transcription factor 1-alpha (LMX1A) as a marker for mDA (Andersson et al., 2006). LMX1A and TH staining confirmed the midbrain identity of the cells during differentiation and showed the higher   . LRRK2-G2019S correction in PD-patient-derived cells rescued the dynamics of mDA specification to healthy control levels. Furthermore, we observed a significant increase in the mDA fraction in healthy genetic background lines when the LRRK2-G2019S mutation was introduced ( Figure S2A). We also analyzed the ratio of TUJ1 + / Hoechst ( Figure S2B) and TH À -TUJ1 + /Hoechst ( Figure S2C) and observed a similar tendency for neurons and non-mDA.
To gain further insights into the exact dynamics of this phenotype, we analyzed the cultures with a flow-cytometry-based approach (fluorescence-activated cell sorting [FACS]). We immunostained the cells using antibodies against TUJ1 and TH. TH + /TUJ1 + neurons were measured prior induction of mDA differentiation (day 0) and at days 2, 4,6,8,10,14,28,and 42 of differentiation (Figures 1C,S3A,and S3B). LRRK2-G2019S cultures at days 4, 10, and 14 differentiated into mDAs more rapidly than control cultures. However, at the endpoint of differentiation (day 42), no significant difference in the number of mDA was detectable when comparing LRRK2-G2019S and control cultures. Interestingly, correction of LRRK2-G2019S mutation in patient-derived cells rescued the impaired mDA differentiation dynamics to the levels observed in healthy controls ( Figure 1C). We also quantified TH in combination with the mDA marker forkhead box protein A2 (FOXA2) (Sasaki and Hogan, 1994). FOXA2 was present in the majority of the cells (e.g., $95%) at day 14 ( Figures 1D, S3C, and S3D). Quantification of TH + /FOXA2 + neurons confirmed the initial increase of mDA in LRRK2-G2019S cultures compared to controls. Thus, we conclude that the LRRK2-G2019S mutation increases the rate of neuron and mDA formation during differentiation.
To analyze cell identities during neuronal specification, we reverted to the scRNA-seq and analyzed the global transcriptome changes due to LRRK2-G2019S during neuronal differentiation. We generated targeted gene lists specifically for progenitor cells, mDAs, astrocytes, microglia, oligodendrocytes, endothelial cells, and cell cycle (Table S1) based on the available literature (La Manno et al., 2016;Reinhardt et al., 2013a;Whitfield et al., 2002). We profiled 2,000 qualitycontrolled cells at the four differentiation time points (250 cells per condition on days 0, 10, 14, and 42). After pre-processing of the data ( Figure S4A), we calculated cumulative gene expression scores for the targeted cells or cellular processes (NESCs and mDA-based defined gene lists). The analysis of cumulative gene expression distributions and differentially regulated gene expression ( Figure S4B; Table S2) showed significant differences between the genotypes for most time points and gene lists. We pooled the data obtained from both genotypes and performed a principal-component analysis. We used the 20 first principal components in t-distributed stochastic neighbor embedding (t-SNE) for further dimensionality reduction (van der Maaten and Hinton, 2008). The resulting t-SNE plots revealed separated cell clusters, particularly on days 10 and 14 ( Figure 2A). To characterize the clustering structures and the relation of the cells with clusters of similar gene expression, we fitted a Gaussian mixture model to the t-SNE plots. This resulted in the identification of defined clusters per each time point, as shown by the gray shadowed re-gions. We mapped individual cells on this clustered structure using color-coded cluster assignments. We observed an increasing specification and heterogeneity of the cells during differentiation. The rather diffuse arrangement of cells over the clusters in the NESCs became more constrained at later time points, forming more distinct clusters. Interestingly, this heterogeneity converged to a more homogeneous gene expression pattern at differentiation day 42. Collectively, the results of the cluster-based scRNA-seq analyses verified the accelerated mDA specification in LRRK2-G2019S-expressing neurons. Furthermore, the pattern of initially diverging and eventually converging gene profiles was clearly visualized by the distinct clusters ( Figure 2A). Cellular identity mapping revealed highly specific neuronal differentiation and highlighted the tendency of LRRK2-G2019S NESC cultures toward differentiation. The panel shows dimensionality reduction of the scRNA-seq, clustering, and identification of cell types based on lists of genes. The panel shows dimensionality reduction of the scRNA-seq, clustering, and identification of cell types based on lists of genes for both PD2/WT and PD2/G2019S. Cells identified as neurons are colored in yellow, while cells identified more specifically as mDA are in green. There is one table per time point. The first line of the table reports which percentage of cells belong to which cell type. We can see that neurons represent 5.1% (day 0), 21.6% (day 10), 34% (day 14), and 28.4% (day 42), while mDAs represent 1.8% (day 0), 8.6% (day 10), 11% (day 14), and 5% (day 42) of all cell types. Importantly, the fraction of neurons/mDAs is growing over time, except for the last time point. The second line of the table reports which percentage of cells within every cell type belong to the PD2/G2019S sample. For neurons, these percentages are 88.2% (day 0), 76.8% (day 10), 75.8% (day 14), and 70.8% (day 42); for mDAs, these percentages are 77.7% (day 0), 74.4% (day 10), 84% (day 14), and 22.8 (day 42). Thus, both neurons and mDA are considerably more numerous in the PD2/G2019S cultures than in PD2/WT at every time point (except for mDAs at day 42).
Coherently, LRRK2-G2019S-expressing NESCs appeared to be more prone to differentiate even before neuronal induction, as indicated by the significant increased number of mDA-specific transcripts ( Figure 2B). Furthermore, the cumulative expression analysis statistically verified the early upregulation of mDA-related transcripts at days 10 and 14 of differentiation.
To quantify and compare the developmental state between the genotypes, we used the branching and pseudotime analysis provided by Monocle (Trapnell et al., 2014;Qiu et al., 2017aQiu et al., , 2017b. The branching analysis identifies three branching points, which reflect the differentiation process. It becomes evident that cells carrying the LRRK2-G2019S mutation originate from the same cell state and that both cell lines differentiate toward similar cell states ( Figures 2C and S4F). Moreover, LRRK2-G2019S preceded LRRK2-WT cultures, suggesting a faster differentiation. To quantify this observation, we compared the pseudotimes as a measure of the developmental state of individual cells from both cultures, LRRK2-G2019S and LRRK2-WT, at each day. Figure 2D shows the distribution of pseudotimes of LRRK2-WT and LRRK2-G2019S cells in gray and blue, respectively. In the NESCs, the two distributions coincided demonstrating that cells started from a very similar state. At day 10, the mutant cells exhibited already a significant fraction of cells close to the final differentiation state, while the majority of control cells still exhibited a stem cell state. The faster differentiation of LRRK2-G2019S compared LRRK2-WT cells was maintained at day 14, but eventually, the distribution of both cell lines converged at day 42.
Overall, these data show that the LRRK2-G2019S mutation causes an acceleration of neuronal differentiation and mDA (B) Histograms showing cumulative gene expression distributions, for mDA-specific genes, during differentiation. 250 cells per population per day were analyzed. Statistical analysis was performed via z-test followed by Bonferroni post hoc test, ****p < 0.0001. Differentially expressed genes and percentages of particular lists are presented in Figure S4B and in Table S2. specification. Additionally, these results demonstrate that prior to differentiation, the gene expression pattern in LRRK2-G2019S NESCs already differs from that of the isogenic control cells.
LRRK2-G2019S induces early cellcycle exit, loss of stemness, and decreased viability in differentiating mDA To better understand the dynamics of LRRK2-G2019S-associated early enrichment of mDAs during differentiation, we performed a cell-cycle exit analysis. Using automated image analysis, we quantified the fraction of cycling cells using Ki-67 staining (Scholzen and Gerdes, 2000). We observed that the aberrant enrichment of neurons in LRRK2-G2019S cultures was accompanied by an earlier exit of LRRK2-G2019S NESCs from the cell cycle. At days 4, 10, and 14 of differentiation, there were significantly fewer Ki-67 + cells in LRRK2-G2019S than in wild-type cultures ( Figures  3A and S5A).
We then used the scRNA-seq data to investigate differences in cell-cycle-related gene expression ( Figures 3B, 3C, and S4D). The resulting expression matrices visualized the loss of cycling cells during differentiation and highlighted the more rapid loss of cell-cycle-related gene expression in LRRK2-G2019S compared to LRRK2-WT cultures ( Figure 3B).
We then quantified the differences in the expression of cellcycle-related genes in a cumulative expression approach ( Figure 3C).
Since the exit from the cell cycle is associated with loss of stemness, we measured the fraction of cells positive for the NSC marker SRY-box 1 (SOX-1) (Bylund et al., 2003; Pevny (C) Histograms visualizing loss of cumulative gene expression of cell-cycle-specific genes during differentiation. Population histograms were analyzed via z-test followed by Bonferroni post hoc test. (D) Dynamics of SOX-1 + loss during differentiation quantified by FACS (experiments were performed at least four times, with one technical replicate). Data were analyzed via unpaired t test. (E) Histograms showing cumulative gene expression distributions for NESC-specific genes. Population histograms were analyzed via z-test followed by Bonferroni post hoc test. Differentially expressed genes and percentages of particular lists are presented in Figure S4B and Table S2. (F) Representative confocal images of cleaved PARP (c.PARP) staining and c.PARP detection at days 4, 10, and 14 of differentiation (scale bar, 50 mm). Quantification of c.PARP + pixels at days 4, 10, and 14. Each data point represents one micrograph (experiments were performed three times, with one technical replicate). For all panels, the data are presented as the mean ± SEM. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001. The scRNA-seq experiment was performed one time with one replicate per time point.  OPEN ACCESS et al., 1998). In NESC maintenance cultures (day 0), more than $85% of the cells were positive for SOX-1 ( Figures 3D, S5B, and S5C). In all comparisons, the starting NESC cultures were similar with respect to SOX-1 expression. As expected, the fraction of SOX-1 + cells decreased markedly during neuronal differentiation. The reduction in the fraction of SOX-1 + cells was more prominent in LRRK2-G2019S than in LRRK2-WT cells (Figure 3D). We further confirmed the accelerated loss of stemness in LRRK2-G2019S compared to LRRK2-WT cells by considering the cell distributions over gene expression for stemness genes ( Figure 3E).
We then investigated whether the increased apoptosis was the reason for the progressive time-dependent decrease in mDA number in LRRK2-G2019S cultures. We quantified the presence of cleaved poly(ADP-ribose) polymerase (c.PARP) + cells (Duriez and Shah, 1997) at days 4, 10, and 14 of neuronal differentiation. The automated image analysis revealed a significant increase of apoptotic cells during differentiation in LRRK2-G2019S cultures ( Figures 3F and S5D). The isogenic controls confirmed the clear dependency of the c.PARP phenotype to the LRRK2-G2019S mutation. The increased apoptosis at day 42 was also confirmed by scRNA-seq ( Figures S4B-S4I). In Figure S4I, we compute the distributions of gene expressions for pro-apoptotic, anti-apoptotic, and caspases across cells and compute the corresponding fold change (FC) between the expression in the PD2/G2019S and PD2/WT cells. FC > 0 indicates overexpression in PD2/G2019S (compared to PD2/WT), FC < 0 indicates under-expression in PD2/G2019S (compared to PD2/WT), FC = 1 indicates that PD2/G2019S cells express (on average) twice as much as PD2/WT cells, and FC = 0 indicates that PD2/G2019S and PD2/WTcells express the same levels. We observed that at day 10, PD2/G2019S cells significantly overexpressed anti-apoptotic genes compared to PD2/ WT. At day 42, PD2/G2019S cells overexpressed both proapoptotic and caspase genes, indicating that at day 42, increased cell death is particularly occurring in PD2/G2019S cells.
Overall, LRRK2-G2019S NESCs differentiated into mDAs significantly more rapidly than healthy cells. This accelerated differentiation was highlighted by a faster loss of proliferation capacity and stemness, and it is accompanied by increased apoptosis.
The core regulatory circuit of NESCs is deregulated in PD-patient-derived cells As NESCs carrying the LRRK2-G2019S mutation showed an increased propensity to differentiate into mDAs, we next sought to investigate the molecular cause of this alteration. Different cell types have been shown to maintain their cell identity through core regulatory circuits (CRCs) of transcription factors. These factors are often controlled by super-enhancers and bind to each other's regulatory regions to establish feedback mechanisms maintaining cell-type-specific gene expression profiles (Boyer et al., 2005;Hnisz et al., 2013). To derive the NESC-specific CRC, we performed chromatin immunoprecipitation sequencing (ChIP-seq) analysis of histone H3 lysine 27 acetylation (H3K27ac) in NESC lines H1 and H4 from healthy individuals and identified between 225 and 256 active super-enhancers in NESCs, depending on the cell line (Tables S3 and S4). Next, the transcription factors that are expressed and associated with an active super-enhancer in NESCs were identified, and the obtained data were used as an input for CRC mapper (Saint-André et al., 2016). With CRC mapper, the transcription factors that contain their DNA-binding sequence motif in their own super-enhancer region, thus forming an autoregulatory loop, are first identified. Next, the super-enhancer regions of these autoregulated transcription factors are scanned for the presence of the DNA-binding motifs of other similarly controlled transcription factors. Finally, the identified feedback loops are used to form the cell-line-specific CRCs (Figure 4). The analysis identified a CRC of six transcription factors in each cell line, with five transcription factors (NR2F1, NR2F2, SOX2, POU3F2, and POU3F3) contributing to the CRC reproducibly in each cell line, thus forming the consensus CRC of NESCs.
To test whether the CRC genes were altered in LRRK2-G2019S NESCs and whether this could underlie the observed differences in dynamics of neuronal fate acquisition, we used the scRNA-seq data to compare the expression levels of the CRC genes across the cell populations.
We considered the gene expression across the cell populations (of 250 cells each) for the five transcription factors of the core CRC. We compared the expression for each of these genes in PD2/WT and PD2/G2019S cells at days 0, 10, 14, and 42. This was performed by computing the gene expression FCs of PD2/ G2019S with respect to PD2/WT cells ( Figure 5A). NR2F1 was significantly downregulated in PD2/G2019S compared to PD2/ WT cells. NR2F2 was also significantly downregulated at days 10, 14, and 42. Furthermore, we computed the gene-gene correlation among the expression levels of these five transcription factors, retained only those correlations that were statistically significant (p value < 0.05), and averaged them over the number of gene-gene combinations. As shown in Figure 5B, the value of the average gene-gene correlation across the CRC genes in the NESCs was similar between PD2/WT and PD2/G2019S cells. During differentiation, its value increased in the PD2/WT cells, indicating that these genes are co-expressed in PD2/WT. In contrast, it decreased to zero in PD2/G2019S cultures at days 14 and 42. This indicates no statistically significant correlation between the expressions of these genes in PD2/G2019S cells upon differentiation. This further supports the idea that the CRC is dysregulated by PD2/G2019S mutation upon differentiation. Moreover, we observed in both PD2/WT and PD2/G2019S groups that at day 10 of differentiation the average correlation is higher than the values at day 0 and at day 14 in the corresponding group. This behavior is consistent with the fact that the average genegene correlation is expected to be higher when cells are undergoing a transition between cell states than when they are in a welldefined cell state. The majority of PD2/G2019S cells showed a clearly reduced expression of NR2F1, while only a few of the PD2/WT cells show any reduction in its expression ( Figure 5C). qRT-PCR confirmed the decreased expression of NR2F1 in both NESC and dopaminergic cultures after 8 days of differentiation ( Figures 5D and 5E). NR2F1 expression decreased in H lines when LRRK2-G2019S mutation was introduced and increased in PD lines when it was corrected. Overall, these data show that LRRK2-G2019S mutation altered NESC CRC compared to Cell Reports 37, 109864, October 19, 2021 7 Article ll OPEN ACCESS LRRK2-WT cultures. It also highlights the peculiar role of LRRK2-G2019S in downregulating NR2F1 expression levels.
To evaluate NR2F1 downregulation in a more complex in vitro system, we used human midbrain organoids (hMOs). We have previously shown that midbrain organoids are composed of dopaminergic neurons producing and secreting dopamine (Monzel et al., 2017;Smits et al., 2019). hMOs derived from PD patients carrying the LRRK2-G2019S mutation presented a decrease in the number and dendritic tree complexity of mDAs compared to control organoids ((Smits et al., 2019)). We looked into the scRNA-seq in (Smits et al., 2020) describing the transcriptomic profile of hMOs generated from the healthy line H3/ WT in which the mutation was introduced (H3/G2019S). We observed that NR2F1 mRNA expression is downregulated not only in mDAs but also in glia, stem cell progenitors, and nondopaminergic neurons in H3/G2019S organoids compared to H3/WT ( Figure 5F). The decrease in NR2F1 expression was observed after both 35 and 70 days of differentiation.
Next, we wanted to understand how the reduced expression of NR2F1 might be mediated. First, we analyzed existing proximity ligation-assisted ChIP-seq (PLAC-seq) data from primary human cortical neurons to identify the regulatory regions associ- Representation of the CRC derived for two healthy donor NESC lines (one technical replicate) using CRC mapper (Saint-André et al., 2016). The five shared TFs forming the consensus circuit are highlighted via underlying gray color. For each locus encoding for the indicated TFs, the detected H3K27ac ChIP-seq signal is shown. The called corresponding super-enhancer region is indicated with a burgundy bar, and the locations of the different DNA-binding motifs for the core transcription factors are indicated in the respective colors. For both cell lines, one additional CRC TF (ZBTB16 and HOXA2) was mapped as indicated. ated with NR2F1 gene via chromatin looping interactions in vivo ( Figure 6A) (Nott et al., 2019). The identified regulatory regions showed high consistency with the active enhancer regions identified in our NESCs, with most interactions clustering at the NR2F1 super-enhancer region and some chromatin loops connecting from several hundred kilobases away (Figure 6). Then we performed further ChIPseq analysis in additional cell lines to map the differentially active enhancer regions between LRRK2-WT and LRRK2-G2019S cells. In detail, we performed ChIP-seq for H3K27ac in two of the patient-derived cell lines (PD1 and PD2) and compared them either to an agematched healthy NESC line (H1 versus PD1/G2019S) or tthe isogenic control line used in the scRNA-seq analysis (PD2/G2019S versus P2/WT) using THOR, a tool for differential peak calling (Allhoff et al., 2016). This analysis confirmed the presence of a large super-enhancer flanking the NR2F1 locus also in the additional cell lines, but it did not reveal any consistent changes in it between the two comparisons ( Figures 6B and 6C). However, the analysis did reveal the presence of an additional super-enhancer almost 300 kb downstream of NR2F1 that was also interacting with NR2F1 transcription start site according to the PLAC-seq data. Interestingly, the differential peak calling revealed a consistent reduction in the H3K27ac signal along a 5-kb region of the distal super-enhancer in the patient-derived stem cells ( Figure 6B) in a LRRK2-G2019S-dependent manner (Figure 6C). This suggests that the downregulation of NR2F1 in PD patient cells could be due to LRRK2-dependent alterations in the activity of a distal downstream super-enhancer.
Dopaminergic differentiation is accelerated in vivo in Nr2f1-deficient mouse embryos NR2F1 (Nr2f1 in mouse) is a key factor for the development of several neural structures (Armentano et al., 2007;Bertacchi et al., 2019;Parisot et al., 2017;Tomassy et al., 2010), and its mutations or deletions have been implicated in human pathological Negative numbers indicate that LRRK2-G2019S expresses that gene less than LRRK2-WT, and positive numbers indicate the opposite. p values were generated by applying a z-test followed by Bonferroni post hoc test. (B) Average gene-gene correlations among the five genes of the consensus core CRC. Correlations were filtered by their p value (required to be < 0.05, and Bonferroni correction was applied), thus assuming no correlation exists for couples of genes where the correlation was not statistically significant. (C and D) Distributions of cells populations across gene expression of NR2F1 in the scRNA-seq data. The expression of NR2F1, normalized within each day, was expressed on the horizontal axes. p values were generated by applying a z-test followed by Bonferroni post hoc test. The scRNA-seq experiment was performed one time with one replicate per time point. NR2F1 mRNA expression levels were measured by qRT-PCR in NESCs (C) and after 8 days of mDA differentiation (D).
(legend continued on next page) Cell Reports 37, 109864, October 19, 2021 9 Article ll OPEN ACCESS conditions (Bertacchi et al., 2018;Bosch et al., 2014;Chen et al., 2016). To test Nr2f1 possible implication in the development of mDA, we first investigated its co-expression pattern with midbrain-specific marker FoxA2 in mouse embryos at embryonic day 12.5 (E12.5) ( Figure S6A and S6B), when both neural progenitors (NPs) and neurons start accumulating at the ventral side of the mesencephalon (Ang, 2006;Gale and Li, 2008). Nr2f1 is expressed in virtually all FoxA2 + cells, at higher level in differentiating neurons compared to NPs ( Figures S6A and S6A 0 ). The expression of Nr2f1 can be modulated by using heterozygous (HET) or homozygous mouse model (Armentano et al., 2006), in which one or both alleles are removed; consequently, Nr2f1 protein level can be roughly half in HET mice or completely abolished in KO ones ( Figures S6A 00 -S6B). At E12.5, TH + differentiated dopaminergic neurons found at the ventral-most midbrain region widely express Nr2f1 (Figures S6C-S6E), with more than 64.4% ± 2.7% showing high protein levels ( Figure S6D). In HET embryos, the loss of one NR2F1 allele is sufficient to lower Nr2f1 protein level, with the majority of the TH + population (65.2% ± 3.4%) showing low levels of Nr2f1 ( Figure S6E). Taken together, our data show that Nr2f1 expression follows the differentiative process of dopaminergic neurons and can be efficiently modulated with a genetic approach.
As NR2F1 is one of the CRC genes downregulated in LRRK2-G2019S cultures, we exploited the mouse model to test the possible effect of Nr2f1 loss on mDA differentiation in vivo. At early time points (E10.5), Nr2f1 HET and KO embryos already show decreased FoxA2 + progenitors and a parallel increased production of the first FoxA2 + TUJ1 + neurons exiting the cell cycle ( Figures 7A-7D), which we reconfirmed 2 days later ( Figures  7E-7H). To further investigate early DA differentiation, we analyzed the appearing DA marker TH in the different genotypes and found an increased DA neuronal number also at E13.5 . As for in vitro human cell differentiation, a double staining with cleaved-caspase3 revealed that TH + neurons undergo apoptosis when Nr2f1 is downregulated ( Figure 7M). Consistently, at E15.5, the number of TH + neurons in mutants was back to a level that is comparable to WT brains ( Figure 7N), suggesting that aberrantly differentiated dopaminergic cells are removed by apoptotic process. All in all, the downregulation of Nr2f1 triggers aberrant early mDA differentiation, followed by increased apoptosis, implying that Nr2f1 is key for dopaminergic differentiation. This highlights the in vivo relevance of our findings.

DISCUSSION
Cells in the early neurodevelopmental phase are particularly vulnerable to insults (Rice and Barone, 2000). Changes in the brain microenvironment during the sensitive developmental period might have deleterious effects that become apparent over time (H€ ubener and Bonhoeffer, 2014). Even subtle develop-mental alterations might increase the susceptibility toward the development of certain diseases, especially those associated with aging. The contribution of such alterations toward the development of several brain-related diseases has been previously demonstrated (Marín, 2016;Poduri et al., 2013). In this context, we suggest that the PD-associated LRRK2-G2019S mutation alters mDA developmental dynamics via NR2F1.
To model the potential neurodevelopmental contribution of the PD-associated LRRK2-G2019S mutation, we used human iPSCbased neuronal cultures, which are particularly suitable for modeling early human embryonic neurodevelopment (Mariani et al., 2012). The rejuvenation process during reprogramming, which normally represents a major obstacle when studying the age-related aspects of neurodegenerative diseases (Studer et al., 2015), is in this case beneficial. We analyzed the dynamics of the early transition phase of neuronal differentiation from NESCs to mDAs. We observed an accelerated neuronal and mDA fate specification in cells carrying LRRK2-G2019S mutation compared to control cultures. Previous studies have reported that the presence of the LRRK2-G2019S mutation does not significantly affect the number of mDAs at a time point that is comparable to our endpoint (Reinhardt et al., 2013b;Sá nchez-Dané s et al., 2012). Also, in our study, the amount of mDAs at the last time point assessed is comparable between LRRK2-G2019S and LRRK2-WT cultures. However, the dynamics by which this comparable proportion is reached are significantly different. We previously studied the LRRK2 mutation R1441G and found that murine NSCs carrying the mutation showed increased cell death and decreased neuronal differentiation, as shown by doublecortin staining after 4 days of differentiation (Bahnassawy et al., 2013). The R1441G and G2019S mutations are located in the Roc and kinase domains of the LRRK2 protein, respectively. It is likely that the dynamics of the dopaminergic differentiation might be differentially affected in models carrying the two mutations considering the location of the two-point mutations. Furthermore, differences in the utilized cell culture models as well as species differences might play a role.
An alteration in the temporal specification of mDAs might have subtle but important consequences in the substantia nigra. The timing of brain development influences the signals (growth factors, chemokines, and neurotransmitters) that a neuron receives and impacts the establishment of proper neuronal connections (Jiang and Nardelli, 2016). Neurodevelopment is a competitive and selective process (Yamaguchi and Miura, 2015), and the presence of an altered number of mDAs during specific phases of neurodevelopment might impact correct neuronal integration (Bergami and Berninger, 2012). Such subtle changes in the specification dynamics probably are not sufficient to cause severe abnormalities at that specific time point but might increase the susceptibility toward the onset of PD after a second hit (Sulzer, 2007). This second hit can consist of additional (E) Data were normalized to the NR2F1 expression level of H1. Statistical significance was assessed with unpaired t test or Mann-Whitney test according to normality test. The comparison H1/WT versus H1/G2019S was assessed with a one-sample t test. *p < 0.05, **p < 0.01. qRT-PCR experiments were performed three times in triplicate. (F) Gene expression data of NR2F1 levels from scRNA-seq data of human midbrain organoids (hMOs) generated from the line H3/WT with its isogenic control H3/ G2019S. hMOs were cultured for 35 and 70 days; 30 organoids were used for each genotype and time point. In agreement with the literature (Liu et al., 2012;Reinhardt et al., 2013b), we detected a loss of viability in LRRK2-G2019S cultures and confirmed this phenotype throughout several stages of mDA differentiation. The loss of viability and the altered cellular fate transition of LRRK2-G2019S mDAs could result from multiple factors. Increased cell death was in fact observed also in LRRK2-G2019S NESCs compared to controls (Walter et al., 2019;Nickels et al., 2020). According to the scRNA-seq analysis, the presence of the LRRK2-G2019S mutation induced the acquisition of a pro-differentiation state in NESCs. This is highlighted by the significant upregulation of neuronal and mDA transcripts and the downregulation of NESC transcripts. Overall, LRRK2-G2019S cultures showed increased and accelerated mDA generation accompanied by increased cell death.
Strikingly, the transcription factor NR2F1 appears to be involved in this accelerated differentiation. NR2Fs, also called chicken ovalbumin upstream promoter transcriptional factors (COUP-TFs), are a family of orphan receptors whose putative molecular ligands are still elusive (Bertacchi et al., 2018). NR2F1 and NR2F2 are the major homologs of the family identified in vertebrates and thought to have similar functions.
The transcription factor Nr2f1 has been shown to play a key role in development, in particular during neocortical area patterning (Greig et al., 2013;Terrigno et al., 2018). Its expression levels seem to regulate temporal competence in NPs (Bertacchi et al., 2018). Nr2f1, along with Nr2f2, regulates the timing of the switch of NSCs from neurogenesis to gliogenesis (Naka et al., 2008). In addition, it regulates neuronal differentiation by coordinating several processes within the neuroepithelium, including cell-cycle exit, neurogenesis, and fate determination (Faedo et al., 2008). These processes seem to be mediated via receptor tyrosine kinase signaling and b-catenin-mediated Wnt signaling (Faedo et al., 2008).
By knocking out Nr2f1, we were able to phenocopy the accelerated mDA differentiation phenotype in vivo in the mouse mesencephalon. Interestingly, Nr2f1 downregulation in mouse to half dosage (HET mutants) only shows a trend toward accelerated DA neuron differentiation. Only following complete loss of Nr2f1 expression in vivo (i.e., full KO mutants) were a significant increase of early differentiation and increased apoptosis found. This discrepancy could be due to the fact that Nr2f1 is only one out of five CRC factors for DA development. The loss of Nr2f1 alone in mouse development could be compensated by other factors of the same network, notably Nr2f2, as redundancy between the two homologs has been demonstrated in other brain areas (Flore et al., 2017;Tang et al., 2010;Zhou et al., 2015). Nr2f1-2 double knockdown was shown to cause sustained neurogenesis in stem cells at the expense of gliogenesis both in vitro and in vivo in developing mouse brain (Naka et al., 2008). The in vivo relevance of NR2F1 in the context of dopaminergic neurons comes also from a study investigating NR2F1 expression in the olfactory bulb of mice (Bovetti et al., 2013). The study proved that odor deprivation resulted in decreased expression of TH and Nr2f1. Considering that hyposmia is an early clinical manifestation of PD, it is notable that Nr2f1 is involved in the regulation of sensory-dependent plasticity in the adult olfactory bulb.

LIMITATIONS OF THE STUDY
This study contains convincing pieces of evidence describing abnormalities in the differentiation trajectory of LRRK2-G2019S dopaminergic neurons mediated by NR2F1. The LRRK2-G2019S-dependent decrease in NR2F1 levels is associated with altered distal super-enhancer activity, but further studies are necessary to clarify precisely the molecular signaling connecting the two proteins. NR2F1 contains several putative phosphorylation sites, some of which have been proven to be   (Gay et al., 2002). From here, it is tempting to speculate that NR2F1 can be targeted by cellular kinase, such as LRRK2.
A limitation of this study is that the in vivo relevance of the observed LRRK2-G2019S-dependent neurodevelopmental defects has yet to be mechanistically proven in the context of PD.
The key point that remains unsolved is whether the accelerated differentiation is the major driver of the increased cell death observed in LRRK2-G2019S differentiating neurons or whether other factors are contributing. The latter scenario seems more likely, as cell death has been observed also in LRRK2-G2019S NESCs ( Walter et al., 2018;Nickels et al., 2019), accompanied by altered mitochondrial morphology and functionality ( Walter et al., 2018). A major experimental challenge to fully address the role of LRRK2-G2019S in dopaminergic neurogenesis and its impact on PD pathology remains how to recapitulate in vitro complex phenomena ranging from brain development to ageassociated neurodegeneration, which in vivo take place over decades. The use of aging inducers might certainly help to recreate on dish the desired time-dependent cellular changes within a reasonable experimental setup. An additional strategy would be the possibility of gene editing postmitotic neurons to, for example, correct the LRRK2-G2019 mutation only in fully differentiated neurons. This would allow us to evaluate whether these neurons are still susceptible to cell death as a consequence of the altered neurogenesis even though the mutation was corrected. These types of studies currently still present technical challenges, but they are the necessary step forward to unravel key unresolved questions in the field of neurodegeneration.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Jens Schwamborn (jens.schwamborn@uni.lu). Certain materials are shared with research organizations for research and educational purposes only under an MTA to be discussed in good faith with the recipient.

Material availability
This study did not generate new unique reagents.
Data and code availability scRNA-seq data have been deposited at GEO (GSE128040) and are publicly available. All the data of this manuscript is openly available https://doi.org/10.17881/y9k6-xa72. The codes used in the paper are available on GitHub through the page https://doi.org/10.17881/y9k6-xa72. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Human iPSCs were used in this study. Their use was approved by the local ethical committee (Ethic Review Panel of the University of Luxembourg, PDiPS project and CNER No 201305/04).
All mouse experiments were conducted in accordance with relevant national and international guidelines and regulations (European Union rules; 2010/63/UE), and have been approved by the local ethical committee in France (CIEPAL NCE/2019-548). Nr2f1 heterozygous (HET) and homozygous (KO) mice were generated and genotyped as previously described (Armentano et al., 2006). Littermates of HET and KO mice with normal Nr2f1 alleles were used as control mice (herein called WT). Midday of the day of the vaginal plug was considered as embryonic day 0.5 (E0.5). Control and mutant mice were bred in a 129S2/SvPas background. Both male and female embryos were used in this study; age is specified for each embryo used in specific experiments. Standard housing conditions were approved by local ethical committee in France (CIEPAL NCE/2019-548); briefly, adult mice were kept on a 12 hours light-dark cycle and housed three per cage with the recommended environmental enrichment (wooden cubes, cotton pad, igloo) with food and water ad libidum.

Chromatin Immunoprecipitation (ChIP)
The chromatin immunoprecipitation of hNESCs was performed as previously described (Gé rard et al., 2019). For immunoprecipitation 30 mg of chromatin was used per reaction with 5 mg of an antibody against H3K27ac (Abcam, ab4729) and incubated overnight.

ChIP-seq
The sequencing of the ChIP samples was performed at the Genomics Core Facility in EMBL Heidelberg, Germany or at the sequencing platform of the Luxembourg Centre for Systems Biomedicine (LCSB) at the University of Luxembourg with single-end and unstranded reads sequenced in an Illumina HiSeq 2000 machine with read length of 50 bp or in an Illumina NextSeq 500 machine with read length of 75 bp, respectively. In total approximately 30 million raw reads per sample were obtained. The read quality control and mapping were performed as previously described ( Gé rard et al., 2019) using human reference genome GRCh38. The enhancer and super-enhancer regions were called with Hypergeometric Optimization of Motif EnRichment (HOMER) version 4.7.2 (Hnisz et al., 2013) using the respective input samples as controls. For super-enhancer calling a stitching distance of 10 kb was used.
Identification of the core regulatory circuit (CRC) ChIP-seq data of H3K27ac enrichment from NESCs and the HOMER called super-enhancer coordinates were used as an input for ROSE (Warren Andersen et al., 2013) to associate super-enhancers to actively expressed genes and to derive a list of superenhancer controlled transcription factors in NESCs using the default parameters. The obtained matrix was in turn used as an input for CRC mapper (Saint-André et al., 2016) to derive a core regulatory circuit per NESC line using the default parameters.
Identification of differential H3K27ac peaks THOR (Allhoff et al., 2016) was used to identify differential H3K27ac regions between conditions. H3K27ac signal normalization was performed using the TMM approach implemented in THOR and differential H3K27ac regions were considered significant at a p value of 0.05. Only differential H3K27ac peaks that show an absolute difference in read count superior or equal to 1000 were kept for downstream analysis.
Single-cell RNaseq using DropSeq Microfluidics devices were fabricated using a previously published design (Macosko et al., 2015;Walter et al., 2019). All the procedures for obtaining the single-cell suspension as well the processing of the Dropseq data were performed as previously described (Walter et al., 2019).

Analysis of differentially regulated genes
The output of the DropSeq experimental setup ( Figure S4A) is the expression matrix where rows represent genes, and each column is an individual cell. The elements of this matrix represent the measured intensity of gene expression, i.e., the measured number of mRNA molecules leading to the gene expression matrix.
To extract the relevant information on the developmental status of cells from the large dataset, we defined five lists of specific genes (Table S1). In particular, these lists contain genes covering stemness (NESC), dopaminergic neurons (mDA neurons), cell cycle, and related to pro-apoptotic and caspases genes.
For each of these lists, we compared the expression of the corresponding genes between genotypes within sampling points. Since the gene expression levels were measured at the single cell level, we analyzed the distribution of gene expression across cells. For this purpose, we consider in Figures 2A, 3A, and S4C histograms showing the distributions for cells of each group of the cumulative gene expression, for each of the lists considered above (rows) and for each day (columns), where cumulative gene expression corresponds to the sum of all mRNAs of listed genes leading to a single cumulative score for each cell. Since total numbers of mRNAs are not comparable between days the values in the horizontal axis are normalized, separately for each day and each list of genes, to the maximum of the cumulative gene expression (across the genotypes GC and G2019S) for that particular day and list. Thus, 1 corresponds to the maximal cumulative gene expression within one day and for one list of genes, while 0 corresponds to no expression for all of the genes on that list on that day.
We applied a z-test to assess a statistically significant difference between the means of the cumulative gene expression of the genotypes. The test was corrected with Bonferroni for multiple testing. Since between Figures 2B, 3C Article ll OPEN ACCESS to five lists of genes and across four different days, which means to correct for a total of 20 repetitions of the test. Standard p values thresholds are indicated on the panels as p values * < 0.05,** < 0.01, *** < 0.001, **** < 0.0001.
Furthermore, for each day we determined how many genes were differentially expressed between control and mutant, by applying to all genes (independently for each gene) the following three statistical tests: one-way ANOVA test, a one-way ANOVA test on ranks (Kruskal-Wallis test), and a Mutual Information based test. The minimum p value obtained by each gene across these three tests was retained, and any gene was considered differentially expressed with statistical significance when the corresponding p value was below the threshold of 0.01. In order to account for multiple testing, Bonferroni correction was included, which represents a conservative choice. The test was applied for all the approximately 20000 genes for each time point. This means that the individual p value for each gene is required to be < 0.01/20000 in order to claim statistical significance with an overall p value < 0.01. Figure S4B shows the number of genes that were differentially expressed between genotypes with a statistical significance corresponding to a p value < 0.01, for each day. In the tables also the percentages were reported, indicating how many of the genes of each individual list (of the five lists outlined above) are differentially expressed at each day. The colors indicate the list of genes to which the percentages refer to, and the color coding is the same as in Figure S4C.

Cell cycle analysis
To characterize the cellular maturity, we subsequently investigate the cell cycle in more detail. Figure 3B shows a subset of the expression matrices corresponding to cell cycle genes of Figure 3C for each day (columns), and each group (upper and lower row respectively), where the color scale indicates the intensity of expression. We further investigate cell cycle by means of dimensionality reduction, in particular, applying tSNE, in Figure S4D.
Normalization to the mean gene expression value was further performed to obtain the relative expression levels. This was done by subtracting the average expression value (log 2 (TPM+1)) of each gene from all the cells of the DGE matrix. Previous cell cycle analysis have shown two prominent gene expression programs (G1/S and G2/M) to contain several gene expressions of different cell cycle phases to overlap among the two (Macosko et al., 2015;Tirosh et al., 2016;Whitfield et al., 2002). Furthermore, these gene expression patterns have also been observed in the G1/S and G2/M cell cycle phases, in the bulk sample analysis of the differentially synchronized HeLa cells (Whitfield et al., 2002). Based on this data, a core set of 100 genes (G1/S) and 133 genes (G2/M) was considered for the cell cycle analysis (Table S1).
Due to the sparsity of the scRNaseq data, the expression for each cell cycle phase was refined by first evaluating the correlation between each of the genes in the scRNaseq data with the average gene expression values of all the genes involved in the respective cell cycle program (G1/S & G2/M), and by then including all the genes with high correlation value (R 2 > 0.3; p value < 0.05). Hierarchical clustering of the data demonstrates that some cells are cycling with high relative expression of most of the genes included in either of the cell cycle program or both of the programs, while other cells show basal expression for most of these genes ( Figure 3D).
To determine the cell cycle score for individual cells and to set a threshold for classifying the cells as cycling cells, we proceed as follows. First, to determine the cell cycle score for individual cells (Figure S4D), the average value of the relative expression of all the genes involved in both cell cycle programs were evaluated. Second, the average basal expression score for the cell cycle program was calculated for all the genes involved in both cell cycle programs (G1/S and G2/M) for the 10% of the cell population showing the least expression magnitude. Finally, the classification of a cell as cycling or non-cycling was performed using the t test statistics on the expression value of different genes in both cycling programs, considering the averaged basal cell cycle score (cycling cells; FDR < 0.05).

Pseudo-temporal analysis
To cross-validate our analysis and to quantify the developmental differences between the PD2/WT and PD2/G2019S cell lines, we also employed the single cell R analysis package Monocle ( Qiu et al., 2017). For a cross time point analysis, data of all 3055 cells from both cell lines and all time points were combined into one expression matrix shown in Figure S4A. We then performed the standard quality control steps of Monocle for gene filtering, considering a minimal expression threshold of 10% and only genes expressed in at least 10 cells, leading to a reduction of the number of considered genes from 20,766 to 16,992. To remove outliers and doublets, only cells with at least 2123 and less than 23350 transcripts were considered leading to 2954 cells. Using the same gene lists led to a very similar population composition compared to our customized analysis (see below).
For the cluster analysis across days, we reduced batch effects by excluding residuals caused by the number of expressed genes. The resulting clustering shown in Figure S4E demonstrates that cells at day 0 do not differ between the two cell lines but diversify during the differentiation process.
To quantify and compare the developmental state between the genotypes, we used the branching and pseudotime analysis provided by Monocle. For this purpose, we restricted our analysis to the most variable genes identified by the dispersion characteristics calculated by Monocle. Subsequently, we again reduced the dimensionality for stable calculation of cell trajectories based on reverse graph embedding ( Qiu et al., 2017). Based on the resulting trajectories, Monocle identifies branching points and calculates pseudotimes by ordering cells on the resulting branches based on shortest paths similar to the diffusion map approach (Haghverdi et al., 2016).
Cluster detection, cell identity mapping, and quantification For a deeper analysis of cell states and the differentiation process, we developed a custom analysis pipeline using MATLAB (version R2017a; Mathworks) scripts. To preprocess the gene expression matrices as described. We restricted the number of cells (columns) Cell Reports 37, 109864, October 19, 2021 e5 Article ll OPEN ACCESS to a subset of the most maximally expressed cells, according to various factors such as the minimum level of expression, the cumulative level of expression as described. This led to the number of cells used for each group and day as indicated (Figures S4G and  S4H). Then, we performed a cluster analysis of this high dimensional data to investigate cell type-specific gene expression across different days, and for the different cell lines. We first reduced the dimensionality of the data by principle component analysis (PCA) (Bishop, 2006). This technique allowed obtaining the orthogonally transformed space that better captures the variance of the data. For the sake of data analysis and visualization, we kept only the first 20 principal components (PCs), reflecting more than 75% of the variance for all days.
Next, the resulting matrices for each day containing cells of both genotypes were used as inputs to the t-SNE (t-distributed stochastic neighbor embedding) algorithm (van der Maaten and Hinton, 2008). While PCA is transforming the data by capturing the highest mode of variation, allowing to preserve as much variance as desired and still reduce the dimensionality, t-SNE rather looks at finding a non-linear transformation that preserves the neighboring structure of nearby points in the original data. Therefore, its purpose its merely enabling the visualization of the data, originally in a high dimensional space, by projecting them to 2 or 3 dimensions. Ideally, the projected data may possess some non-random structure, such as separated clusters of points organized together, that will allow unveiling different properties and relations between the samples, in our case, the different cells. Using the results of PCA as input and mapping the data to three dimensions we selected the best viewpoint in terms of clearness of the existing clusters. When running t-SNE, we used the Barnes-hut algorithm for the approximate computation of the joint distributions (van der Maaten and Hinton, 2008), and Euclidean distance as a measure of similarity between points.
From the scatterplots obtained by t-SNE we already got some insights into the structure of the data. But to more deeply investigate the data, we designed a fully automated and unbiased pipeline for performing the different analyses presented in Figure 2A. In the next step, a cluster analysis on the resulting 3D data was performed by fitting a Gaussian mixture model using expectation maximization (Bishop, 2006). Each cluster is represented by one resulting ellipsoid, whose centroid and shape depends on the mean and covariance matrix of the underlying 3D Gaussian distribution fitted to the neighboring points. We fitted eight components in all conditions, as shown in Figure 2A.
In Figure 2A we color-coded the cells by their cell type identity. The seven potential identities considered were: oligodendrocyte, neuron, mDA neuron, microglia, endothelial, astrocytes, and NESC. Each of these phenotypic identities was defined by a list of genes known to be highly expressed in the corresponding cell types (Table S1). For each cell, we obtain then seven scores (one per identity) as the mean value of the expression level of all genes contained in each particular list. This way, we compared unbiasedly all these scores and just selected the one with the highest value as the mostly expressed identity, which finally enabled the generation of all scatterplots in Figure 2A.
Similarly, we have a different list of genes that defines different phenotypes and/or states during cell development (Table S1). Using these lists, we first obtained for each cell a mean expression value. Then, using these scores, we compared all the cells of one cluster to the cells in the remaining clusters by means of the Mann-Whitney U test in order to obtain a p value from the comparison. This test was chosen due to the fact that the normality assumption was not fulfilled. This was done for all possible comparisons, eight in our case, the number of fitted components. Then, we discarded all the comparisons where the mean expression level of all cells in the chosen cluster is smaller than the mean expression level of the remaining ones. Finally, among the comparisons left, we chose the one with the lowest p value. This way, we led to one cluster that we define as the ''cluster with the highest expression,'' as opposed to the remaining cells. Additionally, the size of the marker depends on the expression level.
Additionally, we provide the percentages of either PD2/G2019S or PD2/WT cells in the highlighted cluster with respect to the total number of cells of that type, indicated with square and rounded markers respectively. The p value provides the statistical significance which of the genotypes is more highly concentrated within the highlighted cluster. This p value was obtained by running a permutation test on the data, i.e., we randomly shuffle the elements of the label vector that contains the correspondence between each cell and the group it belongs to. Then, using this shuffled vector, we counted again the number of cells of each genotype contained in the particular cluster. We repeated this permutation 20,000 times, finally leading to the distributions of a number of cells for each group in the cluster. Hence, given these distributions and the original number of cells of each genotype in the cluster, we computed a p value that accounts for an abnormal number of cells of one specific type in the highlighted cluster.
Data pre-processing scRNA-seq in organoids From midbrain organoids datasets, cells having unique feature counts over 2,500 were removed as probable doublets or multiplets. Similarly, low-quality cells or empty droplets, were further filtered out with unique feature counts below 100 (for day35 data) -200 (for day70 data), and mitochondrial transcripts above 30% ( Figure S1). MT genes were removed from midbrain organoid datasets after QC by deletion of all row names with pattern 'MT'. After QC, WT35 midbrain organoids included 2864 cells, WT70 -2005 cells, MUT35 -2946 cells and MUT70 -2660 cells. To better transmit the biological information between in vivo and in vitro ventral midbrain datasets, midbrain organoid data (WT35, WT70, MUT35 and MUT70) were integrated with embryonic midbrain data (La Manno et al., 2016) using Seurat integration analysis workflow (Stuart et al., 2019). Integration was performed based on the top 20 dimensions. RNA assay data of integrated object were log normalized and scaled to 10'000 transcripts per cell.

OPEN ACCESS
Cell type identification scRNA-seq in organoids After integration of embryonic midbrain and midbrain organoid datasets, integrated object was scaled and principal component analysis (PCA) was applied. Cell clustering was performed based on the top 20 principal components using Louvain algorithm modularity optimization with a resolution of 0.5. UMAP was used for cell cluster visualization (Becht et al., 2018). Nine distinct cell clusters were identified in UMAP plot. Clusters 0 and 7 were present only in midbrain organoids, and located in a close proximity to each other in UMAP plot, indicating their high similarity and in vitro specificity. Because of this overclustering both clusters were pulled, resulting in eight distinct cellular identities. For cell type identification, binarized gene list across cell types from (La Manno et al., 2016) was used. For more details on how this list is generated please refer to (La Manno et al., 2016).

Characterization of the core CRC genes expression by sc-RNaseq
To investigate via scRNaseq the difference in expression of the five genes of the core CRC, we considered the gene expression across the cell populations (of 250 cells each) for the five transcription factors. We compared the expression for each of these genes within the PD2/WT cells with the one within the PD2/G2019S cells, for day 0, 10, 14 and 42. This was done by computing the gene expression fold-change of the G2019S compared to the WT ( Figure 5A). For each gene and day, this is obtained as the logarithm in base two of the ratio between the average expression (across cells) of that gene in the G2019S, and that in WT, so negative numbers indicate that PD2/G2019S expresses that gene less than PD2/WT, positive numbers the converse. The number of cells considered is 250 per population and per day, and the p value comes from applying a z-test, and accounting for repeated hypothesis testing by applying Bonferroni correction for the total number of time the test was repeated for all the genes of the consensus. Error bars are obtained considering the error on each of the two averages (PD2/WT, PD2/G2019S) by the standard error of the mean (SEM), and combining the two using the variance formula for calculating error propagation. Therein, the statistical significance was assessed with a z-test, including Bonferroni correction for multiple hypothesis testing.
The expression levels of the transcription factor NR2F1 are shown in detail in terms of cells distributions across gene expression in Figure 5C. These histograms exhibit the normalized NR2F1 expression on the horizontal axes. They show the details of the distributions of cells corresponding to the red bars in Figure 5A. Statistical significance is again assessed by means of a z-test corrected by Bonferroni correction accounting for the repetition of this test on the full-list of consensus and non-consensus genes of the core CRC tested.
Furthermore, the average gene-gene correlation between the five genes of the consensus core CRC was computed and we retained only those correlations which were statistically significant (p value < 0.05). These were averaged over the number of genegene combinations. Gene-gene correlation is computed for each day and condition among each couple of genes, then averaged over genes. Correlations are filtered by their p value, thus assuming no correlation for couples of genes where the correlation is not statistically significant, i.e., whenever the corresponding p value is < 0.05 after applying Bonferroni correction for multiple hypothesis testing across all genes, days and conditions.
In vivo prototype mapping To compare our in vitro cell preparation to in vivo cells embryonic human development we used, as a reference, a single-cell atlas of the embryonic human ventral midbrain (described in La Manno et al., 2016). To perform the comparison we used ''prototype analysis'' a machine learning approach that is described in more detail in that paper. In brief, the analysis strategy is to compare every singlecell transcriptome to a set of predefined cell types using a machine learning classifier. First a set of cell type identities, also ''prototypes,'' are each defined by taking in consideration a group of cells from the human embryonic atlas. Then, a regularized logistic regression classifier is trained to discriminate cells from different prototypes. The model formulation is probabilistic and can output a probability for each possible assignment. The assigned prototype is the one of maximum likelihood.
With respect to the original paper (La Manno et al., 2016), we made the following minor changes. We performed a normalization of the NESC dataset. This was a normalization by total molecules across the cells of the sample and normalized the whole dataset to the same scale of the training dataset (the atlas). In particular, the normalization is this is:x ij = m j: xij P j xij where m yj = mean P i y ij . x ij and y ij indicate the value of the i th gene for the j th cell for the test and train set respectively. This size normalization was meant to buffer two effects: cell-to-cell variation in RNA detection that in the DropSeq can be stark, and the difference in average sequencing depth of the test with respect to the reference dataset since they were generated with different technologies (STRT versus DropSeq). Prototypes that were used to train the classifier are so defined: Embryonic stem cells (eES) prototype was learned by the model using eSCa, eSCb and eSCc as a reference. Radial glia-like cells type 1 (Rgl1) were trained on hRgl1; radial glia-like cells type 2 (Rgl2) trained on hRgl2, hRgl2b, hRgl2c; radial glia-like cells type 3 (Rgl3) trained on hRgl3, Neural progenitors (NProg) on hNProg, neuroblast mediolateral type 1 on hNbML1, serotonergic neurons (Sert) trained on hSert; dopaminergic neurons (DA) trained on hDA0, hDA1 and hDA2 populations, neuroblast medial (NbM) on hNbM; Early progenitor cells (Prog) on hProgFPM, hProgFPL, hProgM and hProgBP. The expression of the top 18 genes for the twenty best cells identified as DA, NbM and Prog are compared to their corresponding embryo standards in Figure S1.