Identification and characterization of functional modules reflecting transcriptome transition during human neuron maturation

Neuron maturation is a critical process in neurogenesis, during which neurons gain their morphological, electrophysiological and molecular characteristics for their functions as the central components of the nervous system. To better understand the molecular changes during this process, we combined the protein-protein interaction network and public single cell RNA-seq data of mature and immature neurons to identify functional modules relevant to the neuron maturation process in humans. Among the 109 functional modules in total, 33 showed significant gene expression level changes (discriminating modules) which participate in varied functions including energy consumption, synaptic functions and housekeeping functions such as translation and splicing. Based on the identified modules, we trained a neuron maturity index (NMI) model for the quantification of maturation states of single neurons or purified bulk neurons. Applied to multiple single neuron transcriptome data sets of neuron development in humans and mice, the NMI model made estimation of neuron maturity states which were significantly correlated with the neuron maturation trajectories in both species, implying the reproducibility and conservation of the identified transcriptome transition. We identified 33 discriminating modules whose activities were significantly correlated with single neuron maturity states, which may play important roles in the neuron maturation process.


Background
As the central organ of the nervous system, the brain is composed of multiple types of neurons and glia ina complex cyto-architecture. By means of synaptic contacts, neurons form local and long-distance networks, which is a key component for brain function. Prior to the establishment of neuronal connections, neurons are generated from neuronal progenitor cells (NPC) located in the areas near the ventricles, and start a long maturation process comprised of a series of sequential and sometimes overlapping steps: neuronal migration, axon elongation, dendrite formation, synaptogenesis and refinement of connections (pruning). This complex developmental process leads immature neurons to eventually acquire their mature appearance and full electrical excitability [1][2][3]. However, while the molecular changes and regulatory mechanisms of NPC proliferation have been described in detail [4,5], our knowledge of neuron maturation is still relatively sparse. A comprehensive investigation of neuron maturation at the molecular level could largely expand our understanding not only of brain development and function, but also of neurodevelopmental disorders such as autism and schizophrenia. It could also spark the quantitative measurement of neuronal maturity states, which may provide a powerful tool for future studies.
Here, we adapted an insulated-heat-diffusion-based network smoothing procedure with a topological overlap matrix-based module identification method to analyze differences between immature and mature neurons on the transcriptome level, based on published single-cell RNA sequencing data of adult and fetal human brain tissues and the protein-protein interaction network annotated in the Reactome database. With the identified functional modules discriminating neurons in different maturity states, we developed machine-learning-based neuron maturity indices (or NMIs), which aim to quantify the level of neuron maturity. By applying the NMI models to multiple human and mouse single-cell or purified bulk RNA-seq data from neurons at different developmental stages and conditions, we verified the identified transcriptome transition during neuron maturation in human neuron in vitro models, as well as its high conservation in mouse neurons. The constructed NMI models thus show their potential in describing and comparing a variety of neuron maturity states.

Detection of protein-protein interaction modules relevant to human neuron maturation
To comprehensively investigate changes of functional modules during the process of neuron maturation in humans, we adapted the module detection algorithm based on the topological overlap matrix (TOM) [6], from the widely used gene co-expression network analysis pipeline WGCNA [7], to the protein-protein interaction network annotated by Reactome [8,9]. To include gene differential expression information, each edge in the network was weighted by the difference of expression level changes between linked genes, which were smoothed with the insulated heat diffusion procedure to reduce influence of noise (see Materials and Methods). Gene expression level changes during the neuron maturation process in humans were estimated based on the published single cell RNA-seq (scRNA-seq) data of fetal and adult human brains. [10].
The analysis resulted in 109 functional modules with sizes ranging from 21 to 203 genes, with a median size of 38 genes (Fig. 1a). As shown by the calculated adjusted random index (ARI) [11,12], choice of insulating parameter did influence the identified modules, but the modular composition remained generally robust (Additional file 1: Figure S1). A two-sided Wilcoxon signed rank test was applied to each module in order to identify functional modules with significant expression level changes with concordant direction. Thirty-three functional modules with significant directional changes, which were referred to discriminating modules, were identified (Benjamini-Hochberg (BH) corrected P < 0.05, Additional file 2: Table  S1). Among them, 17 modules accounting for 964 genes in the network showed higher activity in mature neurons (referred as mature-high modules). On the other hand, the remaining 16 modules accounting for 1125 genes showed higher activity in immature neurons (referred as immature-high modules). Among the 33 discriminating modules, 31 of them were significantly overlapped with the three co-expression modules identified by applying WGCNA to the 20 cell-pooling samples (Fig. 1b). Gene Ontology (GO) enrichment analysis by topGO [13] and GOSemSim [14] indicated that genes encoding for membrane proteins which participate in cell communication, signaling and oxidation-reduction processes for energy generation were strongly enriched in mature-high modules (Fig. 1b, Additional file 3: Table S2). On the other hand, genes encoding for nuclear proteins related to transcription and post-transcriptional processing including splicing and translation were enriched in immature-high modules (Fig. 1b, Additional file 3: Table S2). The neuron specificity index (NSI) for each of the 33 discriminating modules suggested that, genes in majority of those modules (22 out of 33, one-sided binomial test P = 0.04, Fig.  1b) presented a trend of higher gene expression levels in neurons compared to glial cells, which further implied the biological importance of those modules in neurons.
Although lacking additional data for in vivo transcriptome of human neurons across the whole neuron maturation process, it has been reported that neuron maturation explains the majority of brain transcriptome changes during prenatal and new-born postnatal development [15]. Therefore, we took the advantage of fetal and early postnatal brain RNA-seq dataset in BrainSpan and another age series RNA-seq data [16], to compare the brain transcriptome before and after postnatal day 100. Remarkably, 28 out of the 33 discriminating modules showed significant concordant expression level changes (one-sided Wilcoxon signed rank test to fold changes (FC), BH-corrected P < 0.1) in at least one dataset, while 20 of them showed significant concordance in both datasets (Fig. 1). In addition, although not significant, another two modules showed consistent direction of changes in both datasets. These results suggest that the discriminating modules represent the reproducible functional modules discriminating mature and immature neurons.

Adversarial functional module pairs
Interestingly, a further comparison with PPI functional modules, which were detected without integrating with expression level differences, identified six adversarial functional module (AFM) pairs. The two modules in one AFM pair were corresponding to the same module when the differential expression information was not integrated (Fig. 1). Three out of six AFM pairs (M4-M14, M36-M108, M8-M72) showed significant expression changes with consistent directions in at least one bulk brain RNA-seq dataset (one-sided Wilcoxon signed rank test, BH corrected P < 0.01). In addition, consistent discordance in all pairs were observed in both bulk brain datasets (one-sided Wilcoxon rank sum test, P < 0.01). Further functional analysis revealed highly consistent, connected but varied GO term and biological pathway enrichment in each pair of adversarial modules (topGO with the parentchild algorithm for GO terms, one-sided Fisher's exact test for pathways; BH-corrected P < 0.05, Additional file 3: Table S2). This analysis indicated that highly connected biological pathways may play distinct roles during neuron maturation in humans. They may reflect decoupling of components in the same pathway during the neuron maturation process.
A good example is the AFM pair M4-M14 (Fig. 2). Genes in both modules participate in signaling by Rho GTPases, and more specifically, by activating the Rhotekin and Rhophilins pathway according to the Reactome annotation. Interestingly, this pathway splits into two parts: RHOB/C and RTKN in mature-high M4, and RHOA, RHPN1/2 and TAX1BP3 in immature-high M14. This partition implies that, although Rhotekin and Rhophilins both participate in Rho GTPases signaling, they interact with different members of the Rho protein family and play different roles in the process of neuron maturation. Rhophilins interact with RhoA and take part in neuron maturation including neuron migration, which is supported by previous studies suggesting interaction between them [17] as well as the role of Rhophilins in cell migration [18]. Rhetekin, on the other hand, while being important in neural differentiation and neurite outgrowth, is also required for neuron survival [19]. This may explain why the expression level of RTKN gene remains high in mature neurons.
Another AFM pair, M32-M39, represents another scenario. While both modules show significant enrichment of pathways related to endocytosis, genes in the two modules also participate in distinct pathways (Fig.  2). Spry regulation of FGF signaling pathway, which has been reported to be required for cortical development Color darkness indicates whether the change is significant according to Wilcoxon rank sum test. Boxes mark adversarial module pairs. Signature functions, defined as module-specific GO-BP terms annotated to majority of genes in modules are shown next. The rightmost bars show the neuron specificity index (NSI) of each module: orangepositive NSI; greennegative NSI [20], only appears in the immature-high module M32, whereas EPH-ephrin mediated cell repulsion, whose role extends from development to adulthood regulating neuronal plasticity [21], only appears in the mature-high module M39. In summary, the pleiotropy of genes and pathways leads to the separation of the two modules.

Identified functional modules discriminated different maturity states of neurons from in vitro models
To further estimate how well the neuron-maturationrelated transcriptome transitions we identified, especially genes participating in the detected discriminating modules, reflect status of neuron maturation, we establish a machine-learning-based quantitative estimate of neuronal maturity state and tried to apply it to other data sets. In brief, we constructed a LASSO-regularized logistic regression model based on the standardized expression level of genes involved in each identified module. Each model provided a value ranging between zero and one, namely a modular Neuron Maturity Index (mNMI), with values closer to 1 indicating higher maturity. Ten-fold cross-validation suggested high performances for most of the mNMIs (median AUC = 0.87, Additional file 4: Figure  S2). Applying the models in the test set also resulted in accurate estimations (median AUC = 0.84, Additional file 4: Figure S2), with those based on discriminating modules performing marginally better (two-sided Wilcoxon rank sum test, P = 0.11). The mNMIs were further added to two integrated NMIs to represent the overall maturity state, by taking their averages weighted by their performances. This procedure was done by either including all mNMIs (transcriptome NMI or tNMI), or only those based on discriminating modules (discriminating NMI or dNMI). Both general NMIs performed perfectly in distinguishing mature and immature neurons in the test set (AUC = 1, Additional file 4: Figure S2).
With the NMI models constructed, we applied them to neuron transcriptome data sets of in vitro neuron models in order to check whether the identified transcriptome transition could be reproduced and therefore represent the general molecular signature of neuron maturation. In a previous study, Bardy et al. combined patch clamping and scRNA-seq to investigate the relationship between transcriptome and electrophysiology of iPSC-derived neurons [22]. The estimation of NMIs indicated trend of increased neuron maturity accompanying increased action potential, i.e. the electrophysiological maturity, especially between the most immature and mature neurons (one- sided Wilcoxon rank sum test, P = 0.12 for dNMI, P = 0.02 for tNMI, Fig. 3 and Additional file 5: Figure S3).
While this dataset was limited by its relatively small number of neurons (N = 56), Close et al. applied scRNAseq to interneurons generated by in vitro differentiation of human embryonic stem cells (hESCs) to characterize temporal interneuron transcriptome during its maturation, generating another dataset which involved 1733 cells [23]. By estimating NMIs for each DCX + interneuron (N = 993) , we observed the significant increase of integrated NMIs across the time course, especially between 54-day and 100-day (Wilcoxon rank sum test, P < 0.0001, Fig. 3 and Additional file 5: Figure S3). We also noticed that both tNMI and dNMI did not present significant increase between 100-day and 125-day interneurons (Wilcoxon rank sum test, P = 0.26 for tNMI, P = 0.58 for dNMI), which is consistent with the weak discrimination between them at the whole transcriptome level proposed by Close et al.
It is worth noting that even at the most electrophysiologically mature state (Bardy et al. dataset) or at the latest time point (Close et al. dataset), a large proportion of interneurons were still in immature state ( Fig. 3 and Additional file 5: Figure S3). These observations may be due to the technical issue that the NMI model failed to provide prediction of mature neurons, or reflected the failure to complete the neuron maturation process in vitro. To answer this question, we examined the human single neuronal nucleus RNA-seq in adult brains [24], resulting in both tNMI and dNMI values significantly larger than 0.5 ( Fig. 3 and Additional file 5: Figure S3). As expected, no significant difference of both tNMI and dNMI was observed between excitatory and inhibitory neurons (Wilcoxon rank sum test, P = 0.22 for tNMI, P = 0.27for dNMI, Fig. 3 and Additional file 5: Figure S3). Hierarchical clustering based on Pearson's correlation coefficient among samples revealed that cell type makes more contributions to sample separation than source of dataset, showing that the estimation is less likely to be biased by batch effect (Additional file 6: Figure S4). The above results suggested the potential maturation arrest of the in vitro differentiated neurons.

Transcriptome transition during maturation is conserved in mouse neurons
To check whether the detected transcriptome transition during neuron maturation was conserved in mice, the most widely used animal model for brain development and mental disorders, we applied the constructed human-based NMI model to neuron transcriptome data in mice. Chen et al. extracted maturing interneurons from mouse embryonic medial ganglionic eminence (MGE) and applied scRNA-seq to measure their transcriptome [25]. Estimation of dNMI suggested a boost of maturity state at E17.5, the latest time point across the time course. This result suggested that the human-based NMI models successfully recurred the neuron maturation process in mouse, implying the conserved maturation programs of neuron between humans and mice. Interestingly, the three subtypes of maturing interneurons identified in the study showed significantly different dNMIs (ANOVA, df 1 = 2, df 2 = 130, F = 55.2, P < 0.0001, Fig. 4a), suggesting that they represented interneurons at distinct stages of maturation.
Activities of mature-high modules reflect mature neuron functionality level Interestingly, applying the dNMI model to the purified neuron transcriptome of PS2APP Alzheimer's disease mouse model [26] suggested a significantly weaker maturity state than controls (median dNMI PS2APP = 0.782, median dNMI control = 0.791, two-sided Wilcoxon rank sum test, P = 0.003). Further studies on each of the mNMIs indicated that three mNMIs, all of which were based on mature-high modules, significantly decreased in PS2APP neurons (Wilcoxon's rank sum test, BH corrected P < 0.1). In addition, among the top-ten of the 27 discriminating modules with reliable mNMIs (AUC > 0.8 in crossvalidation in training set) and strongest decrease in PS2APP comparing to control neurons, eight were mature-high modules (Fisher's exact test, odds ratio (OR) = 4.25, P = 0.1). The bias of changes to the mature-high modules was different from observation from the above MGE interneurons data set, as only nine out of 15 (60%) modules with mNMIs significantly different among the three subtypes of maturing interneurons were maturehigh modules (Fisher's exact test, OR = 1.83, P = 0.49).
Considering that the mature-high modules are more likely to be responsible for mature neuronal function maintenance, the biased changes implied that the lower tNMI of PS2APP neurons represented impairment of neuronal function rather than maturation, which has been reported previously [27]. Therefore, we constructed the third integrated index, the neuron functionality index (NFI), which integrated mNMIs from only the maturehigh discriminating modules. As expected, the estimated NFIs of PS2APP neurons were significantly lower than those of control neurons (median NFI PS2APP = 0.836, median NFI control = 0.850, Wilcoxon rank sum test, P = 0.05, Fig. 4b). On the other hand, the integrated NMIs of immature-high discriminating modules did not show any significant difference (Wilcoxon rank sum test, P = 0.58, Fig. 4b). For comparison, no significant difference of either dNMI or NFI was observed between purified neuron transcriptome of a lipopolysaccharide-treated neuroinflammation mouse model and control mouse (Fig. 4b). These results indicated that the activities of mature-high, but not the immature-high, modules may act as signatures of neuron functionality.

Discussion
In this study, we studied the transcriptome changes during neuron maturation in humans and those functional pathways involved. For this purpose, we developed a new bioinformatics framework, by integrating module identification in the protein-protein interaction (PPI) network and differential expression (DE) analysis. Our strategy revealed 33 discriminating modules, each of which represents distinct biological pathways, which may be relevant to neuron maturation.
In general, the 17 modules whose genes show significantly higher expression levels in mature neurons, namely mature-high modules, tend to participate in processes relevant to neuronal function and electrophysiology. For instance, there are six discriminating modules, all of which are mature-high modules, which show enrichment of synaptic genes and have been reported to be relevant to the electrophysiological maturity of in vitro differentiated neurons [22]. Genes in M90, the module enriched for voltage-gated potassium channel complex components, also show higher expression levels in mature neurons. Directly checking those genes in the Bardy et al. dataset suggests higher expression levels in neurons with higher action potential than in neurons with lower action potential in marginal significance (permutation test, P = 0.052). Furthermore, energy consumption is suggested to grow during neuron maturation, as genes in functional modules related to both respiratory chain (M24) and tricarboxylic acid cycle (M37) show higher expression levels in mature neurons. As previously reported, higher neuronal activity increased mitochondrial oxidative phosphorylation [28]. Therefore, the increasingly active energy generation machinery in mature neurons we observed may be an adaptive strategy of mature neurons to its higher electrophysiological activity.
On the other hand, it is interesting that the 17 immature-high modules whose genes show significantly higher expression levels in immature neurons tend to show enrichment for nuclear functions, which are mainly related to housekeeping processes including RNA and protein metabolism. Indeed, genes in the immature-high modules are significantly overlapped with human housekeeping genes [29], especially when comparing with genes in the mature-high modules (one-sided Fisher's exact test, odds ratio (OR) = 2.1 P < 0.0001 compared to all genes in the network; OR = 3.0, P < 0.0001 compared to genes in mature-high modules). There are two possible explanations. The decreased activities of housekeeping processes may be an artificial observation due to the increased activities of pathways related to neuronal functions, since the quantification of expression assumes constant amount of transcripts in samples. In such case, genes in the immature-high modules share similar expression level differences which are not relevant to significances of modular expression level differences. However, ANOVA results suggest that genes in different immature-high modules show different amplitude of changes (F = 6.37, P < 0.0001). Partial Pearson correlation (PPC) between statistical significances (log-transformed P) and modular expression level changes (average expression alteration score) given the module sizes as condition (PPC = 0.57, P = 0.025) suggest dependency between them. Therefore, although this possibility cannot be completely ruled out, there is another scenario, where at least parts of these "housekeeping" processes may play more important roles in the maturation process compared to the final mature stage. This hypothesis is supported by previous studies where mRNA metabolism has proven relevant to some neuronal diseases such as spinal muscular atrophy (SMA) [30], and many regulators of transcription, mRNA translation and protein synthesis have been reported to be related to neurodevelopmental disorders such as autism [31]. Together with the significant neuron enrichment of gene expression patterns of our identified discriminating modules, these results to some extent relieve us from the concern about the sensitivity of our method to identify functional modules related to neuron maturation. Meanwhile, we also admit that to integrate more proteinprotein interactions with neuron specific functions in the future, rather than just base on PPI network in Reactome many pathways of which play housekeeping functions, might be helpful to increase the sensitivity of our method and have a more detailed interpretation of the neuron maturation process.
To further verify the observed differences between mature and immature neurons, we generated the LASSO logistic regression based neuron maturity index (NMI) model based on the detected gene differential expression, to estimate overall maturity states of neuronal samples. By applying NMI to two public data sets of neurons in vitro generated from neuronal progenitor cells (NPC), we find that the constructed NMI model correctly predict neuron maturation statuses. It suggests that the observed transcriptome differences represent general transition during neuron maturation which can be reproduced in neuron models in vitro. Meanwhile, we also observed that neurons in vitro generated from neuronal progenitor cells (NPC) are likely undergoing maturation arrest, as their estimated maturity states hardly attain complete maturation. In a previous study comparing the transcriptome of in vitro neuron models to spatiotemporal human brain transcriptome, in vitro neuron models were suggested to be similar to fetal brains [32]. However, the comparison between bulk neural samples with both neurons and proliferative cells can hardly tell whether this similarity is due to the similar NPC:neuron combination, or similar maturity states. Our results suggest that in vitro neuronal models are likely to be far from full maturation, which may be due to the lack of environmental stimulation that has been shown to be relevant to neuronal development [33].
Results of applying the NMI model in the mouse medial ganglionic eminence (MGE) single cell RNA-seq data suggests that our observed transcriptome transition happened during neuron maturation is applicable and conserved in mouse. At the same time, it is interesting to see that the three subtypes identified by the study represent neurons with distinct maturity states [25]. In the original study, three neuronal subtypes were identified on a spatial distinction basis: neurons from lateral ganglionic eminence (LGE) expressing LGE markers, neurons from MGE expressing MGE markers, and LGE/ MGE neurons expressing both markers. Our study suggests that neurons expressing LGE markers tend to be more mature, and those expressing MGE markers tend to be immature. This observation provides an alternative explanation on a developmental sequential basis, which reconciles with spatial distinction basis explanation, as a previous study has reported that interneurons are generated in MGE and migrate to LGE during their maturation [34]. Together, they provide a more comprehensive description about the origin of interneurons during brain development.
It is worth to mention that our NMI model, although was originally developed to verify the detected transcriptome transition between immature and mature neurons in other data sets, has the potential to corroborate or benchmark transcriptome changes during neuron maturation. Previous studies have developed statistical tools to evaluate maturity state of neural samples, e.g. CoNTExT [32]. Those tools were designed to be used for bulk tissue samples, e.g. dissected brain samples and in vitro neural cultures, which consist of multiple cell types including neuronal progenitor cells, immature and mature neurons, as well as non-neuronal glial cells. The NMI model, on the other hand, serves homogeneous neuronal samples, including single neurons and purified neuron populations. In the era of single cell biology, pseudo-time construction analysis, e.g. TSCAN [35], is commonly used to study transcriptome trajectory of cell development, and may be applied also to study neuron maturation [23]. This analysis, however, is limited by lacking benchmark of maturation stages. Although expression of several biomarkers may be helpful in a rough manner, the quantitative description is still missing. The relatively large sample size required to reconstruct a reliable pseudo-time series is also one limitation (although with less significance), as many studies only measured limited number of neurons [22,25,36]. In such a scenario, the NMI model can be implemented into, and complement, the existing framework, thus potentially benefitting future research.

Conclusions
To our knowledge, our study is the first report to comprehensively investigate and characterize molecular functions related to the transcriptome transition happened during neuron maturation in humans. By comparing public single cell RNA-seq data with both immature and mature neurons in vivo, we identified 33 functional modules with activities related to neuron maturity states and participating in varied biological processes, including synaptic functions, energy consumption and housekeep processes such as translation and splicing. The detected transcriptome transition was further validated by public human brain transcriptome profiles during development, as well as its high predictive power of neuron maturity states in multiple human neuron in vitro models. We also showed that such transition is conserved in mammals, considering its reasonable predictive power of neuron maturity states in mouse neurons.

Identification of neuron maturation relevant functional modules in the human protein-protein interaction (PPI) network
The human protein-protein interaction network was retrieved from the Reactome database (v57) [8,9], which is comprised of 8170 proteins and 200,260 undirected interactions. Proteins encoded by genes whose expression was undetectable in brains were excluded, with 5962 proteins and 125,437 interactions remaining.
Single-cell RNA-seq (scRNA-seq) data of human brains was retrieved from SRA (SRP057196) [10]. The RNA-seq reads were mapped to the human genome hg38 using STAR 2.3.0e [37] with default parameters. The number of reads covering exonic regions of each protein-coding gene annotated in GENCODE v21 was counted and normalized using DESeq2 [38]. FPKM was calculated for each gene in each sample. Average FPKM of each gene was calculated for mature and immature neurons, as the mean FPKM across all cells classified as "neurons" and "fetal quiescent", respectively. Expression level difference between mature and immature neurons of each gene was represented by expression alteration score s: where f is the fold change between average FPKM of mature and immature neurons, and p is the P-value of ANOVA with neuron maturity state as the independent variable. A heat-diffusion-based network smoothing procedure, as described and implemented in HotNet2 [39], was then applied to the obtained PPI network where the above expression alteration scores were assigned to corresponding nodes. In brief, a diffusion matrix, which describes the amount of heat diffused between each node pair in the network during the insulated heat diffusion process when the system reaches equilibrium, was defined as: Here, β is the insulating parameter (set to 0.55 in this study), and W is the normalized adjacency matrix. The smoothed expression alteration score of nodes in the network was then calculated as: where s is the vector of expression alteration scores of all nodes in the network. Weights were assigned to the edges which represent the annotated protein-protein interactions: A topological overlap matrix (TOM) based module identification procedure [6], as implemented in WGCNA [7], was then applied to resulted weighted PPI network. In brief, TOM was defined as a N × Nsquare matrix with N as the number of nodes in the network: where a i,j is the weight of edge between node i and node j, k i is the degree of node i. Hierarchical clustering with average linkage method was applied using TOM as the distance matrix, followed by the dynamic tree cutting procedure implemented in the R package DynamicTree-Cut [40], requiring minimal module size as 20. For each identified module, a Wilcoxon signed rank test was applied to the expression alteration scores of proteins in the module. Modules with Benjamini-Hochberg (BH) corrected P < 0.05 were defined as discriminating modules. Discriminating modules with positive median expression alteration scores were defined as mature-high modules, while the remaining ones were defined as immature-high modules. The pipeline to identify functional modules has been implemented as an R package and can be downloaded at https://github.com/maplesword/TOMRwModule.
To compare the discriminating modules identified with our PPI-assisted approach to the WGCNA-based co-expression modules, WGCNA analysis was applied to the same data set. Considering the high inter-cellular variability of single-cell RNA-seq data, gene expression levels of randomly selected cells with the same cell type were firstly averaged, resulted in 20 cell-pooling samples (10 for mature neurons and 10 for immature neurons). WGCNA was then applied to the cell-pooling samples. Eigen-gene patterns of each identified modules were correlated with the maturity labels of the pseudo-bulk samples to determine the significance and direction of its expression level changes during neuron maturation.

Characterization of functional modules
A Gene Ontology (GO) enrichment analysis was performed for each identified discriminating functional module using the parentChild algorithm [41] implemented in topGO [13], with all genes encoding for proteins involved in the PPI network as background. Pairwise functional similarities of discriminating modules were calculated using GOSemSim [14], by averaging similarities of the three GO categories: cellular component (CC), biological process (BP), and molecular function (MF). Hierarchical clustering with complete linkage was applied to the distances among discriminating modules defined as one minus the calculated similarity. The signature function of each module was defined as the module-specifically enriched representative GO terms under the Biological Processes category. The module-specific representative term was required to be annotated to at least half of genes in the module. When multiple module-specific terms met this criterion, the one covering the largest number of genes in the module was chosen. For module pairs with high interconnectivity and highly shared functions, terms shared by the two modules were considered, but limited to the ones annotated to more than half of the genes in both modules.
Functional pathway annotation was performed for each identified discriminating functional module based on the pathway gene set annotation in Reactome using a one-sided Fisher's exact test to compare with all genes encoding for proteins in the PPI network. Pathways with BH corrected P < 0.05 were selected.
The neuron specificity index (NSI) was calculated for each discriminating module to estimate neuron specificity of its expression pattern. More specifically, utilizing the published RNA-seq data of purified cell types in mouse brains [42], a neuron enrichment score (NES) was firstly calculated for each detected gene, as the Pearson correlation coefficient between its expression patterns across the purified brain cell type samples and the neuron-representative binary pattern [43]. NSI was then calculated as the difference between the average NES of genes in one discriminating module and the average NES of genes in non-discriminating modules. A positive NSI indicates a trend of higher gene expression levels in neurons compared to glial cells.

Generation of neuron maturity index (NMI)
The NMI models were constructed aiming at the discrimination of mature and immature neurons. To objectively build and test the models, the mature and immature neuron scRNA-seq data mentioned above were randomly separated into two groups. The training set included 99 mature neurons and 82 immature neurons. The test set included 32 mature neurons and 28 immature neurons.
Based on the training set, LASSO logistic regression as implemented in glmnet [44] was applied to each identified functional module, with standardized expression levels of genes in the module in each sample set as independent variables and neuron maturity state as the dependent variable. Expression level standardization was performed for each gene separated as following: e ¼ log 10 e þ 1 ð Þ−mean s∈T log 10 e s þ 1 ð Þ À Á sd s∈T log 10 e s þ 1 ð Þ À Á : Here, mean s∈T (log 10 (e s + 1)) and sd s∈T (log 10 (e s + 1)) represent the mean and standard deviation of the log10transformed expression levels (in FPKM) across all samples in the training set. The LASSO regularization parameter λ was then determined using ten-fold cross-validation to maximize area under curve (AUC) of receiver operating characteristic (ROC) of the model. For each sample given the expression levels in FPKM, the resulted LASSO logistic regression model of each module predicted the probability of the sample being mature neuron in relative to immature neuron; therefore, it was defined as the modular neuron maturity index (mNMI) of the functional module. Those mNMI models were then applied to the test set for performance evaluation, as well as other neuron scRNA-seq data or purified neuron bulk RNA-seq data for further investigations.
To integrate multiple mNMIs of different functional modules, a weighted mean of multiple mNMIs was calculated for module set S: Here, the weight of mNMI i (w i ) was defined as AUC i -0.5, where AUC i is the AUC of ROC of mNMI during the ten-fold cross-validation in the training dataset. When S = {all functional modules, the corresponding iNMI S was defined as transcriptome NMI (tNMI). Discriminating NMI (dNMI), on the other hand, was defined as the iNMI S when S = {all discriminating modules}. Lastly, neuron functionality index (NFI) was defined as the iNMI S when S = {all mature-high modules}.
The NMI models have been implemented as an R package (neuMatIdx) and can be downloaded at https:// github.com/maplesword/neuMatIdx.