Single-cell transcriptome analysis reveals secretin as a hallmark of human enteroendocrine cell maturation

The traditional nomenclature of enteroendocrine cells (EECs), established in 1977, applied the “one cell - one hormone” dogma, which distinguishes subpopulations by the specific hormone each secretes. These hormone-specific subpopulations included S cells producing secretin (SCT), K cells producing glucose-dependent insulinotropic polypeptide (GIP), N cells producing neurotensin (NTS), I cells producing cholecystokinin (CCK), D cells producing somatostatin (SST), and others. Over the past 15 years, however, reinvestigations of murine and human organoid-derived EECs have strongly questioned this dogma and established that certain EECs coexpress multiple hormones. Using the Gut Cell Atlas, the largest available single-cell transcriptome dataset of human intestinal cells, this study confirms that the original dogma is outdated not only for murine and human organoid-derived EECs but also for primary human EECs, showing that the expression of certain hormones is not restricted to their designated cell type. Moreover, specific analyses of SCT-expressing cells find no cell population, previously referred to as S cells, that exhibits significantly elevated secretin expression compared to other cell populations. Instead, this investigation indicates that secretin is produced jointly by other enteroendocrine subpopulations, extending corresponding observations in murine EECs to human EECs. Furthermore, our findings corroborate that SCT expression peaks in mature EECs, whereas progenitor EECs exhibit markedly lower expression levels, supporting the hypothesis that SCT expression is a hallmark of EEC maturation.

Multiple EEC hormones are expressed in all three subpopulations. GIP, CCK, and GCG show the highest normalized read counts in K, I, and L cells, respectively. All three subpopulations showed positive SCT expression. In this probability density plot, genes with a mean normalized log2 read count of 0 and a probability density greater than 1 have been removed for better visibility. These lines did not provide meaningful information for the analysis, and their removal allows for a clearer representation of the remaining data.

May 5, 2024

Fig. S2. Logarithmic expression of SST, GCG, and TAC1 counts sorted by annotation. SST, GCG, and TAC1 read counts (grouped by annotation) after normalization, log transformation, and batch correction. SST, GCG, and TAC1 showed the highest expression in D, N, and EC cells, respectively. An overlap in the expression of EEC hormones is noticeable.

Fig. S4. Mean expression levels of EEC hormones in each annotation. After normalization, log2 transformation, and batch correction of the data, the mean normalized read counts of SCT, GIP, TAC1, SST, GCG, NTS, and CCK were calculated for all EEC populations. The results show an overlap in the hormonal expression of EECs. SCT is expressed at nearly the same level in N, K, I, and L cells. As described by the “one cell - one hormone” dogma, SST, GIP, NTS, and CCK showed the highest expression in D, K, N, and I cells, respectively. GCG showed the highest expression in N cells, followed closely by L cells. TAC1 was expressed most strongly by N and EC cells.
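The per-annotation averaging described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the data layout (one annotation label per cell and one list of values per gene), and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_expression_by_annotation(annotations, expression):
    """Return {gene: {annotation: mean value}}.

    `annotations` holds one label per cell; `expression` maps each gene to a
    list of per-cell values in the same cell order (hypothetical layout).
    """
    cells_per_annotation = defaultdict(int)
    for label in annotations:
        cells_per_annotation[label] += 1

    means = {}
    for gene, values in expression.items():
        sums = defaultdict(float)
        for label, value in zip(annotations, values):
            sums[label] += value
        means[gene] = {
            label: sums[label] / n for label, n in cells_per_annotation.items()
        }
    return means


def top_annotation(means, gene):
    """Annotation with the highest mean expression of `gene`."""
    return max(means[gene], key=means[gene].get)


# Toy values, invented for illustration only (not taken from the dataset).
annotations = ["D", "D", "K", "K", "N"]
expression = {
    "SST": [4.0, 3.0, 0.5, 0.5, 1.0],
    "GIP": [0.0, 1.0, 3.0, 5.0, 0.5],
    "SCT": [2.0, 2.0, 2.5, 1.5, 2.0],
}
means = mean_expression_by_annotation(annotations, expression)
print(top_annotation(means, "SST"), top_annotation(means, "GIP"))
print(means["SCT"])
```

In this toy example, SST and GIP peak in their designated cell types while SCT has the same mean in every group, which mirrors the kind of pattern the caption reports.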

Fig. S5. Distribution of SCT expression in the gastrointestinal tract. Violin plot showing SCT expression across the gastrointestinal tract, with regions labeled as Small Intestine (SmallInt), Large Intestine (LargeInt), and Rectum (REC). The plot illustrates the log2-transformed expression counts of secretin-producing cells, indicating a higher expression of SCT in the small intestine compared to the large intestine and rectum.

Fig. S6. Secretin dynamics in mature vs. progenitor EECs. Violin plots depicting the expression levels of CHGA and SCT and of the progenitor markers NEUROG3 and SOX4 in SCT high and SCT low enteroendocrine cells (EECs). The plots on the left compare the expression of CHGA (top) and NEUROG3 (bottom) in cells with high versus low levels of SCT. The plots on the right display the expression of SCT (top) and SOX4 (bottom) under the same conditions. Each plot shows the distribution and density of gene expression counts, with dots representing individual cells. Higher-density regions of the plot are wider, illustrating the concentration of cells with similar expression levels. These visualizations indicate higher CHGA and SCT expression in mature EECs (SCT high), whereas NEUROG3 and SOX4 are more highly expressed in progenitor EECs (SCT low).

Fig. S7. Differential gene expression analysis between SCT high and SCT low enteroendocrine cells (EECs). The plot displays gene names ranked by their negative log-transformed p-values, indicating the level of differential expression. Genes such as PCSK1N, TTR, SCT, CHGB, and CHGA are among the most significantly upregulated in SCT high EECs. The x-axis represents the gene rank based on the significance of the expression difference, and the y-axis represents the negative log-transformed p-values, highlighting the statistical significance of each gene's differential expression.

Fig. S9. Expression of EEC hormones in N, D, and EC cells. Expression of EEC hormone counts in N, D, and EC cells. Multiple EEC hormones are expressed in all three subpopulations. NTS, SST, and TAC1 show the highest normalized read counts in N, D, and EC cells, respectively. All three subpopulations showed positive SCT expression. In this probability density plot, genes with a mean normalized log2 read count of 0 and a probability density greater than 1 have been removed for better visibility. These lines did not provide meaningful information for the analysis, and their removal allows for a clearer representation of the remaining data.

Fig. S10. Quality covariates plotted before filtering. The left plot shows the number of genes expressed in each sample. Each sample expresses between 500 and 8,000 genes. The right plot shows the number of total counts per sample, with values ranging from 400 to about 120,000 counts for some samples. The middle plot shows the number of genes by counts against the total counts to better visualize their joint effect on filtering.

Fig. S11. Proportion of mitochondrial genes in the Gut Cell Atlas dataset. Proportion of mitochondrial genes grouped by region (upper left), subregion (upper right), diagnosis (lower left), and age group (lower right). Adult samples show overall high expression, while fetal samples show low expression of mitochondrial genes.

Fig. S12. Proportion of mitochondrial genes grouped by dataset annotations. The annotations present in the dataset express mitochondrial genes in varying proportions. Overall, increased expression is observed. Paneth cells show very high expression of mitochondrial genes. Goblet, tuft, microfold, colonocyte, and enterochromaffin (EC) samples also show relatively high expression (≥ 15%). Enteroendocrine samples (L, I, D, K, N, M, and EC), on the other hand, are not as affected (≈ 10%) by this phenomenon.

Fig. S13. Dependence of the number of samples on the mitochondrial threshold. The left plot shows the number of enteroendocrine cell samples in the dataset for different thresholds of the proportion of mitochondrial genes. The right plot shows the number of total samples in the dataset for the same thresholds. For very stringent thresholds (≤ 10%), most samples would be filtered out. For less stringent thresholds (≥ 20%), most enteroendocrine cells and total samples are retained.
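The threshold sweep behind such a plot can be sketched as follows. This is only an illustration under stated assumptions: the function name, the inclusive comparison (a sample exactly at the threshold is kept), and the toy percentages are hypothetical and do not come from the dataset.

```python
def retained_per_threshold(pct_mito, thresholds):
    """Map each threshold to the number of samples whose mitochondrial
    proportion (in percent) is at or below it."""
    return {t: sum(1 for p in pct_mito if p <= t) for t in thresholds}


# Toy values, invented for illustration only.
pct_mito_all = [4.0, 9.5, 12.0, 14.5, 18.0, 19.0, 22.0, 35.0]
is_eec = [True, False, True, True, False, True, False, False]
pct_mito_eec = [p for p, flag in zip(pct_mito_all, is_eec) if flag]

thresholds = [5, 10, 15, 20, 25]
total_curve = retained_per_threshold(pct_mito_all, thresholds)
eec_curve = retained_per_threshold(pct_mito_eec, thresholds)

for t in thresholds:
    print(f"threshold {t}%: {eec_curve[t]} EEC samples, {total_curve[t]} total")
```

Plotting the two dictionaries against the thresholds would give curves analogous to the left and right panels.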

Fig. S14. Quality covariates plotted after filtering. Based on the information obtained from the quality metrics, an upper limit of 5,500 and 40,000 was set for the number of expressed genes and the number of total counts, respectively. A permissive threshold of 20% was used to filter the proportion of mitochondrial genes expressed. Samples above the set thresholds were filtered out of the dataset. The three covariates of data quality, i.e., the proportion of mitochondrial genes, the number of genes by counts, and the number of total counts expressed, were plotted again. The defined thresholds for the covariates were verified to be effective, resulting in only the desired samples being kept in the dataset.
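As a minimal sketch of the filtering rule stated in this caption, the following applies the three upper limits to per-sample quality covariates. The thresholds are those quoted above; the class, its field names, the choice to keep samples exactly at a limit, and the toy values are assumptions made for illustration, not the authors' code.

```python
from dataclasses import dataclass

# Upper limits quoted in the caption.
MAX_GENES = 5_500
MAX_TOTAL_COUNTS = 40_000
MAX_PCT_MITO = 20.0


@dataclass
class SampleQC:
    """Per-sample quality covariates (field names are hypothetical)."""
    barcode: str
    n_genes_by_counts: int
    total_counts: int
    pct_counts_mt: float


def passes_qc(sample: SampleQC) -> bool:
    """Keep a sample only if no covariate exceeds its upper limit.

    Treating a value exactly at the limit as passing is an assumption.
    """
    return (
        sample.n_genes_by_counts <= MAX_GENES
        and sample.total_counts <= MAX_TOTAL_COUNTS
        and sample.pct_counts_mt <= MAX_PCT_MITO
    )


def filter_samples(samples):
    """Return the samples that pass all three thresholds."""
    return [s for s in samples if passes_qc(s)]


# Toy data, invented for illustration only.
samples = [
    SampleQC("cell_a", 2_100, 9_800, 8.5),
    SampleQC("cell_b", 6_200, 35_000, 12.0),  # too many expressed genes
    SampleQC("cell_c", 4_900, 52_000, 15.0),  # too many total counts
    SampleQC("cell_d", 3_300, 14_000, 27.0),  # mitochondrial proportion too high
]
kept = filter_samples(samples)
print([s.barcode for s in kept])
```

Replotting the three covariates for `kept` corresponds to the verification step the caption describes.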

Fig. S15. Highest expressed genes in EEC subpopulations. The figure consists of six subplots labeled a) to f). Each subplot shows a boxplot of the highest expressed genes in a specific type of cell: N, D, EC, K, I, and L cells, respectively. The genes shown in each subplot were identified using the sc.pl.highest_expr_genes function (Scanpy), which calculates the fraction of counts assigned to each gene over all cells in the dataset. The boxplots in each subplot show the normalized read count levels of the 40 genes with the highest mean fraction over all cells. The genes are sorted by decreasing mean fraction, with the gene with the highest mean fraction at the top of each boxplot. NTS, SST, MLN, GIP, CCK, and GCG are the most highly expressed enteroendocrine hormones for N, D, EC, K, I, and L cells, respectively.
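The ranking criterion described here (the fraction of each cell's counts assigned to a gene, averaged over cells) can be written out in plain Python. This sketch illustrates the idea only and does not reproduce Scanpy's implementation; the function name, the handling of cells with zero counts, and the toy count matrix are assumptions.

```python
def top_genes_by_mean_fraction(counts, gene_names, n_top):
    """Rank genes by their mean per-cell fraction of total counts.

    `counts` is a list of rows (one per cell), each with one raw count per
    gene in the order of `gene_names`. Cells with zero total counts are
    skipped, which is an assumption of this sketch.
    """
    sums = [0.0] * len(gene_names)
    n_cells = 0
    for row in counts:
        total = sum(row)
        if total == 0:
            continue
        n_cells += 1
        for j, value in enumerate(row):
            sums[j] += value / total
    if n_cells == 0:
        return []
    means = [s / n_cells for s in sums]
    ranked = sorted(zip(gene_names, means), key=lambda pair: pair[1], reverse=True)
    return ranked[:n_top]


# Toy count matrix, invented for illustration only.
gene_names = ["NTS", "SCT", "CHGA"]
counts = [
    [6, 3, 1],  # cell 1: fractions 0.6, 0.3, 0.1
    [5, 1, 4],  # cell 2: fractions 0.5, 0.1, 0.4
]
top = top_genes_by_mean_fraction(counts, gene_names, n_top=2)
print([gene for gene, _ in top])
```

With a real dataset, `n_top=40` applied to the cells of one annotation would correspond to one subplot.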