Significant Evolutionary Constraints on Neuron Cells Revealed by Single Cell Transcriptomics.

Abstract Recent advances in single-cell RNA-sequencing technology have enabled us to characterize a variety of different cell types in each brain region. However, the evolutionary differences among these cell types remain unclear. Here, we analyzed single-cell RNA-seq data of >280,000 cells and developmental transcriptomes of bulk brain tissues. At the single-cell level, we found that the evolutionary constraints on the cell types of different organs significantly overlap with each other and the transcriptome of neuron cells is one of the most restricted evolutionarily. In addition, mature neurons are under more constraints than neuron stem cells as well as nascent neurons and the order of the constraints of various cell types of the brain is largely conserved in different subregions. We also found that although functionally similar brain regions have comparable evolutionary constraints, the early fetal brain is the least constrained and this pattern is conserved in the mouse, macaque, and humans. These results demonstrate the importance of maintaining the plasticity of early brain development during evolution. The delineation of evolutionary differences between brain cell types has great potential for an improved understanding of the pathogenesis of neurological diseases and drug development efforts aimed at the manipulation of molecular activities at the single-cell level.


Introduction
The brain is the most complex organ known. Its complexity is not only due to the variety of behaviors originating from different neural circuits but also the complex interactions among these circuits. The brain regions involved in complex neural circuits are very distinct at the molecular level (Kang et al. 2011;Hawrylycz et al. 2012Hawrylycz et al. , 2015. At the cellular level, the latest single-cell sequencing technology has enabled researchers to reveal the complexity of cell types within a single brain region. Recent studies have found that tens of different cell types can be classified within one brain area, enabling a functional investigation of each cell type (Zeisel et al. 2015(Zeisel et al. , 2018Tasic et al. 2016).
The functional diversity in different brain regions indicates that the brain endures significant constraints during evolution, because any accumulation of deleterious mutations may cause abnormalities in brain function (Wang et al. 2006;Somel et al. 2013;Brainstorm et al. 2018;Gandal et al. 2018). In fact, the evolutionary rate of brain tissue preferred genes is relatively low (Khaitovich et al. 2005;Wang et al. 2006) and gene expression differences between human and chimpanzee are the smallest in brain compared with other organs (Khaitovich et al. 2005). As biological structure is the basis of their function and brain regions are composed of different cell types, the evolutionary constraints of these cell types might be different from each other as well.
High-expressed genes usually evolve slowly resulting in a negative correlation between expression level and evolutionary rate of a gene at the tissue level (E-R anticorrelation) (Drummond et al. 2006;Liao et al. 2006;Zhang and Yang 2015;Gandal et al. 2018). Several hypotheses have been proposed to explain this phenomenon. Both empirical data and theoretical simulations have provided a large amount of evidence that selection reduces errors in several molecular processes including protein translation, mRNA folding, or protein-protein interactions explain E-R anticorrelation (Zhang and Yang 2015). Therefore, highly expressed genes are more harmful than lowly expressed genes due to these errors (Zhang and Yang 2015). "Evolutionary constraint" refers to the extent that harmful mutations are removed from the population, as it causes molecular errors and reduced fitness. There are considerable variations of E-R anticorrelation among different brain regions and cortical brain regions are subject to more constraints than subcortical brain regions (Tuller et al. 2008). These findings suggest that E-R anticorrelation can be used as a reliable and robust index for assessing the strength of evolutionary constraints at the transcriptome level.
Does E-R anticorrelation exist at the single-cell level? Do different cell types exhibit different E-R anticorrelation levels? Are they under different evolutionary constrains to avoid biological errors? To answer these questions, we first examined the E-R anticorrelation with various single-cell transcriptome data sets and then compared the distribution of this parameter among >700 cell types. Finally, we validated this parameter in ten brain regions as well as during brain development.

Materials and Methods
Tissue Level RNA-Seq Data Tissue level RNA-seq data from adult human brains were downloaded from the Gene Expression Omnibus (GEO) data set with accession number GSE58604 (Wang et al. 2015), previously released by us. Human and macaque brain development and maturation data including 480 and 366 samples, respectively, were downloaded from the PsychENCODE website via the following link: http://development.psychencode.org/ (Li et al. 2018;Zhu et al. 2018). Mouse brain (hypothalamus) development data including 48 samples were obtained from GEO data set GSE21278 (Shimogori et al. 2010). To study the E-R anticorrelation changes during mammalian brain development, we downloaded human-macaque and human-mouse orthologous genes from Ensembl database (through BioMart). We only consider those genes expressed in all three species. This resulted in 10,858 gene sets that were used for further transcriptomic analysis.

Single-Cell Level RNA-Seq Data Sources
Single-cell RNA-seq data form for mouse primary visual cortex, somatosensory cortex, and hippocampus were downloaded from GEO data set, accession numbers GSE71585 and GSE60361 (Zeisel et al. 2015;Tasic et al. 2016). Singlecell RNA-seq data for human and macaque brain development were downloaded from http://development.psychencode.org/ (Zhu et al. 2018). Single-cell RNA-seq data for mouse cell atlas were obtained from GEO, accession number GSE108097 (Han et al. 2018) and for human LGN and MTG brain cells were downloaded from Allen Brain Atlas via the following link: http://celltypes.brain-map.org/rnaseq. All genes expressed in those data sets were included in the analysis. Single-cell RNA-seq data for dorsal root ganglion (Hu et al. 2016) were downloaded from GEO, accession number GSE71453. The single-cell prefrontal cortex development data were downloaded from GSE104276 (Zhong et al. 2018) and were used to examine the dynamic expression of neurological disease-related genes (supplementary fig. S9, Supplementary Material online).
Detailed information on all the scRNA-seq data analyzed can be found in supplementary table S1, Supplementary Material online.

Computation of E-R Anticorrelation
We retrieved dN and dS (Ka and Ks) of mouse-rat, humanchimpanzee, and human-macaque ortholog genes from European Bioinformatics Institute (https://www.ensembl.org/ biomart/martview, last accessed December 10, 2019). Genes with dS equal 0 or dN/dS >2 were not considered for further analysis. For genes with multiple dN or dS values, averaged dN or dS values were used. For analyzing evolutionary constraints, genes expressed in at least one cell were retained and cells with <200 expressed genes were removed. The Spearman correlation between expression values and the corresponding dN and dN/dS ratio of each gene were calculated.

Permutation Test Experiment
For each permutation test, the mean expression levels of each genes for each cell subtypes were resampled. Then, we recalculated their correlation coefficients with dN and dN/dS. About 10,000 experiments were performed in a total of 4,684 cells from both Tasic et al. (2016) and Zeisel et al. (2015) data sets.

Mouse Cell Atlas Data Analysis and Computation of Corrected E-R Anticorrelation
A total of 242,533 single cells from 38 tissues were included in the initial data set. Then, we removed the cells with <200 expressed genes, leaving 226,456 cells for further analysis. We used the R package "Seurat" for dimension reduction and clustering, following the tutorial at https://satijalab.org/ seurat/mca.html. To compare the evolutionary constraints among different tissues and cell types, cells with nonsignificant E-R anticorrelation (adjusted P > 0.05) were removed. To get rid of the effect of detected gene numbers, the residues from the linear regression between E-R anticorrelation and detected gene numbers were used (corrected E-R anticorrelation).

Gene Dating
The human and mouse gene age data were obtained as reported previously (Zhang et al. 2010). Briefly, the origins of Ensemble v51 protein coding genes were dated by determining the presence and absence of their orthologs along the vertebrate phylogenetic tree. "Young genes" were defined as primate-specific genes (phylogenetic branch !8, 1,828 genes) in human-and rodent-specific genes (phylogenetic branch !8, 3,111 genes) in mouse, respectively. The remaining genes which originate prior to the primate and rodent split were defined as "old genes."

Neurological Disease Genes
We manually obtained 427 Mendelian disease genes from Mendelian Inheritance in Man (OMIM) database (Blekhman et al. 2008). These genes are annotated with 17 different neurological disease phenotypes, including mental retardation, schizophrenia, autism spectrum disease, Alzheimer's disease, Parkinson's disease, neurodegeneration, amyotrophic lateral sclerosis, dementia, epilepsy, learning disability, intellectual disability, intellectual development disorder, cognitive impairment, depression, alcohol abuse, sleep disorder, and neurodevelopment disorder. We refer to these genes as neurological disease genes in the main text.

Statistical Analyses
The Wilcoxon signed ranks test (Wilcoxon test) and Kruskal-Wallis test were used to compare the evolutionary constraints of different tissues/cell types. One-tailed Wilcoxon test was used to calculate the upregulated genes in neuronal cell types. We defined neuronal upregulated genes with fold change >2 and adjusted P < 0.05 compared with nonneuronal types. A total of 2,532 and 7,725 neuronal upregulated genes were detected in Tasic

E-R Anticorrelations Widely Exist at the Single-Cell Level
Firstly, we examined whether E-R anticorrelation exists at the single-cell level. A total of 44,000 single cells, which were from 4 different types of human and mouse tissues (tumor, blood, brain, and stem cells), were collected (supplementary table S1, Supplementary Material online). Despite the fact that different sequencing platforms, library preparation protocols, and sequencing depths were used, we observed E-R anticorrelation for the majority of cell types (supplementary fig. S1, Supplementary Material online). We also found that the number of genes detected in a sample due to differences in sequencing depth can influence this index ( fig. 1B and C  and supplementary fig. S1, Supplementary Material online). When the number of detected genes increases, a stronger E-R anticorrelation is seen for each cell type. Thus, we regressed out the effect of detected gene numbers in the following analysis and refer to this value as E-R anticorrelation (or corrected E-R anticorrelation, see Materials and Methods). Since vertebrate immune cells are under positive selection and After controlling for detected gene number, we found that the brain transcriptome have stronger E-R anticorrelation than other organs ( fig. 2A and supplementary fig. S3A, Supplementary Material online) (one-sided Wilcoxon test: P < 2.20 Â 10 À16 ). We then calculated the E-R anticorrelation for each cell type (760 cell types in total) of the 38 organs. We found that significant variations exist in the evolutionary constraints among them, and a great overlap of E-R anticorrelation exists among different cell types of each organ ( À0.055 to 0.173. Even more interesting is the fact that the neuron cell transcriptomes, including neurons from the central nervous system (CNS) and peripheral nervous system (PNS), are under the strongest evolutionary pressure compared with other cell types ( fig. 2C and supplementary fig.  S3C and table S2, Supplementary Material online, one-sided Wilcoxon test, P CNS neuron <2.20 Â 10 À16 , P PNS neuron <2.20 Â 10 À16 ). We also observed a relatively strong E-R anticorrelation in stem cells, indicating that rapidly differentiated cells are sensitive to mutations ( fig. 2C and supplementary figs. S2B and S3C, Supplementary Material online, onesided Wilcoxon test, P stem cells < 2.20 Â 10 À16 ). Therefore, we further analyzed E-R anticorrelation levels in data sets from neuron cells at different stages of maturation (Li et al. 2018). These analyses suggest that mature neurons have a stronger evolutionary constraint than nascent neurons and neuron stem cells while rapidly evolving genes tend to be expressed in the earlier stages of maturation (supplementary fig. S4C, Supplementary Material online). These results emphasize the importance of investigating the evolutionary differences of organs based on their cell types.

The Strength of Evolutionary Constraints in Different Cell Types Is Conserved
To better distinguish the evolutionary constraints of different cell types within neuronal tissue, two data sets with high-sequencing quality were employed (Zeisel et al. 2015;Tasic et al. 2016). In both data sets, significant differences of E-R anticorrelation among cell types were observed ( fig. 3A and B). Interestingly, excitatory and inhibitory neurons show stronger E-R anticorrelation than nonneuronal cells, even after controlling for the detected gene number ( fig. 3A and B and supplementary table S3, Supplementary Material online). E-R anticorrelation is stronger in excitatory neurons than in inhibitory neurons, implicating stronger evolutionary constraints on the transcriptome of excitatory neurons. We further noticed that the strength of evolutionary constraints rank for different cell types by and large is conserved between these two data sets (excitatory neuron > inhibitory neuron > oligodendrocyte and astrocyte > microglia) implying that the selective pressure acting on different cell types is relatively stable in different brain regions ( fig. 3C, one-tailed Wilcoxon test, P < 0.0001 in all the comparison). This ranking order was further validated by the RNA-sequencing transcriptome data which were from fluorescence-activated cell sorting purified brain cells with higher read depth (Zhang et al. 2014) (supplementary fig. S5, Supplementary Material online), implying the robustness of our analysis. As both excitatory and inhibitory neurons can be classified into tens of subtypes, more investigations are required to further clarify whether the transcriptome of certain subclasses of inhibitory neurons is more constrained than that of excitatory neurons at some specific developmental periods or in some specific brain regions. Further, we found that the correlation between E-R anticorrelation and mean expression level in Tasic

Evolutionary Constraints of Single Cells Are Stronger Than Expected
Are the evolutionary constraints of each cell type in the nervous tissue stronger than expected by chance? To answer this question, 10,000 randomization experiments were preformed, where the expression of each gene was shuffled and their E-R anticorrelations were recalculated. We found that the E-R anticorrelation of each cell type is much stronger than in the permutation experiments (supplementary fig. S8, Supplementary Material online, one-tailed Wilcoxon test, P < 0.05 for all the 12 comparisons).

Both Neuronal Upregulated Genes and Neurological Disease-Related Genes Contribute to the Strong Evolutionary Constraints of Neurons
We next estimated the effect of neuronal upregulated genes on the high evolutionary constraints of neurons from Tasic et al. (2016) data set and found that the evolutionary rate of those genes (dN and dN/dS) is lower than other expressed genes in neurons (for dN, P ¼ 7.25 Â 10 À17 ; for dN/dS, P ¼ 9.20 Â 10 À15 , one-tailed Wilcoxon test). After removing those genes, the E-R anticorrelation of neuronal cells is decreased (supplementary table S4, Supplementary Material online, for dN-Exp, P ¼ 3.67 Â 10 À27 ; for dN/dS-Exp, P ¼ 1.26 Â 10 À39 , one-tailed Wilcoxon test). This effect also exists for neurological disease genes (for dN-Exp, P ¼ 7.78 Â 10 À6 ; for dN/dS-Exp, P ¼ 3.92 Â 10 À7 , one-tailed Wilcoxon test) and similar results were found using the data set from Zeisel et al. (2015). This indicates that both neuronal upregulated genes and neurological disease genes are under higher selective pressure and contribute to the greater constraints of neuronal cells in evolution.
We also found that there are more genes expressed in adult brain (525.34 6 258.30) than fetal brain (453.00 6 147.10) (P ¼ 2.40 Â 10 À9 , one-tailed Wilcoxon test) and neurological disease genes have higher expression levels in the adult brain than in the fetal brain (P ¼ 2.25 Â 10 À25 , one-tailed Wilcoxon test) (supplementary fig. S9A, Supplementary Material online). Thus, the adult brain is generally more susceptible to neurological diseases. Finally, the expression of these pathogenic genes is slightly increased during the development of prefrontal cortex (supplementary fig. S9B, Supplementary Material online).

Early Brain Development Exhibits the Least Evolutionary Constraints
In order to further study the evolutionary constraint of brain neurons on the tissue level and temporal scale, we next calculated E-R anticorrelation in the previously reported ten language-related Brodmann areas (Wang et al. 2015), which are all from healthy adult samples. As shown in supplementary fig. S10, Supplementary Material online, there are no significant differences in E-R anticorrelations among these ten brain regions (Kruskal-Wallis v 2 ¼ 3.69, P ¼ 0.93), indicating that evolutionary constraints are very similar in functionally similar brain regions. This result is not in contradiction with a previous report as more distinct brain regions were used before (Tuller et al. 2008). We then studied how evolutionary constraints change during brain development by analyzing the recently published PsychENCODE data, which profiled the transcriptome of distinct human brain regions from different developmental stages (Li et al. 2018;Zhu et al. 2018). Our results suggest that the evolutionary constraints are the weakest in the early infancy brain and then increase during development ( fig. 4A). This pattern is consistent for different subregions of the human brain (supplementary fig. S11, Supplementary Material online) and is conserved in both monkey and mouse ( fig. 4C and E), which indicates that the early development of the mammalian brain requires more plasticity and thus fewer constraints during evolution. Additionally,  consistent with a previous report (Zhang et al. 2011), we found that the young genes are upregulated in the fetal brain of humans and monkeys, but not in those of the mouse ( fig. 4B, D, and F).

Discussion
By using single-cell sequencing data from >280,000 cells, we showed that E-R anticorrelation is well established for all cell types. At the cellular level, the selection constraints in different cell types vary with differentiated cells being under more constraints. Thus, there is no direct relationship between the selective constraints and the physical distance of the cells and adjacent cell types may have great variations in terms of their evolutionary constraints. We further found that the evolutionary constraints of neuronal cells are nearly always the strongest among different somatic cell types. At last, we analyzed the evolutionary constraints of brain at different developmental stages on tissue level. Although functionally similar brain regions have similar constraints, the early fetal brain exhibited the weakest evolutionary constraints and this pattern is conserved across three species.
The development of single-cell RNA-sequencing technology has allowed us to isolate and compare neuronal cells between species on a large scale. A recent comparison of different cell types between the cortex of two reptilian species and those of the mouse/human has suggested that novel excitatory neurons are generated while inhibitory neurons are mostly conserved during the evolution of amniotes (Tosches et al. 2018), which has highlighted different evolutionary pathways of excitatory and inhibitory neurons. In this study, our results demonstrate that neuronal cells have stronger evolutionary constraints than nonneuronal cells, implicating that more functional divergence of neuronal cells exists compared with nonneuronal cells in the nervous systems. The stronger constraints in neurons are partly due to upregulated neuronal genes and neurological disease-related genes that evolve more slowly than other genes in the cell. Our results are consistent with the recent finding that oligodendrocytes have undergone an accelerated evolution compared with neurons in the human lineage (Berto et al. 2019). Interestingly, the study by Berto et al. (2019) observed that human-specific genes in oligodendrocytes tend to be related to neuropsychiatric disorders, highlighting the importance of myelination and oligodendrocytes for the pathobiology of these disorders (Berto et al. 2019).
We found that early mammalian brain evolution was highly plastic compared with later life span stages. Consistent with an earlier study (Zhang et al. 2011), the human and monkey brains have more "young" genes upregulated than the mouse brain, demonstrating accelerated evolution of the primate infant brain. Additionally, a recent study has shown that the most considerable transcriptomic divergences between the human and macaque brains occur during the early stages of development (Zhu et al. 2018), which is consistent with our results. In summary, we have shown that the transcriptional plasticity of the early brain may be one of the key factors determining the direction of mammalian brain evolution.
Finally, more detailed evolutionary differences between cell types of different organs are revealed with single-cell sequencing technology. Since even within the same organ various selective constraints among different cell types exist and those constraints can overlap, it may be of particular importance to classify and investigate the transcriptional differences of distinct cell types across multiple organs rather than within a specific organ. Thus, understanding the evolutionary differences between cell types has great potential for shedding light on the pathogenesis of neurological diseases and contributing to the development of drugs based on the molecular activities at the single-cell level.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.