Single-nucleus multi-omics of human stem cell-derived islets identifies deficiencies in lineage specification

Insulin-producing β cells created from human pluripotent stem cells have potential as a therapy for insulin-dependent diabetes, but human pluripotent stem cell-derived islets (SC-islets) still differ from their in vivo counterparts. To better understand the state of cell types within SC-islets and identify lineage specification deficiencies, we used single-nucleus multi-omic sequencing to analyse chromatin accessibility and transcriptional profiles of SC-islets and primary human islets. Here we provide an analysis that enabled the derivation of gene lists and activity for identifying each SC-islet cell type compared with primary islets. Within SC-islets, we found that the difference between β cells and awry enterochromaffin-like cells is a gradient of cell states rather than a stark difference in identity. Furthermore, transplantation of SC-islets in vivo improved cellular identities over time, while long-term in vitro culture did not. Collectively, our results highlight the importance of chromatin and transcriptional landscapes during islet cell specification and maturation.

The development of methods to differentiate human pluripotent stem cells (hPSCs) into islet-like clusters has the potential to generate an unlimited number of insulin-producing stem cell-derived β (SC-β) cells for the treatment of insulin-dependent diabetes 1-3 . This process utilizes temporal combinations of small molecules and growth factors 4-6 , microenvironmental cues 7 and other sorting or aggregation 4,8-14 approaches to drive cells through several intermediate progenitor cell types. The resulting SC-β cells possess many features of primary human β cells, including the expression of β-cell-specific markers, glucose-responsive insulin secretion and the ability to reverse severe diabetes in animal models 5,6,15-18 . As a result, these cells have the potential to provide a functional cure for human patients with type 1 diabetes. However, these stem cell-derived islets (SC-islets) are a heterogeneous tissue that also contains stem cell-derived α (SC-α) and δ (SC-δ) cells, as well as an endocrine cell type of intestinal identity denoted here as stem cell-derived enterochromaffin (SC-EC) cells 19 . The presence of this awry population suggests that there are inefficiencies in lineage specification during these directed differentiation protocols 9 . Correcting this aberrant signalling could enhance specification to a β-cell identity and further improve the function of SC-β cells.
The specific pattern of gene expression and chromatin state within a cell directs differentiation and maintains its final identity 13,14,20-23 . Thus, characterizing both the transcriptional and chromatin landscape of cells during this differentiation process can provide insight into the degree that a particular cell type resembles its in vivo analogue. Single-cell RNA sequencing has been used to characterize the transcriptional environment of SC-islets, demonstrating that SC-β cells express many, but not all, of the important genes found in human primary β cells 9,16,17,24,25 . Studies have also begun to investigate the chromatin state within pancreatic cell differentiations using bulk approaches 26-29 , and recent work has demonstrated the advantages of studying primary human islets using single-cell assay for transposase-accessible chromatin (ATAC) sequencing 30-32 . ATAC sequencing can describe whether particular chromatin regions are in an open, accessible state that is ready to be transcribed or interacted with, providing valuable information about cell identity that is missing when only transcriptional data are investigated. These data can define, for instance, the chromatin accessibility of promoters for specific genes that are capable of being transcribed. Furthermore, they can describe the chromatin accessibility of the DNA sequence motifs where specific transcription factors bind to help activate the transcription of different genes. Importantly, modulating the chromatin states of particular genes and motifs that are mis-expressed in in vitro-derived cell types is likely to drive cells closer to the identity of their in vivo counterparts. However, a comprehensive index of the combined chromatin accessibility and transcriptional signatures of each in vitro-differentiated SC-islet cell type is currently lacking.
In this Resource, we have provided and analysed the transcriptional and chromatin landscapes of SC-islets at the single-cell level, identifying important genes and motifs for each cell type. Interestingly, SC-EC and SC-β cells formed a gradient of cell identities, with subpopulations exhibiting characteristics of the other. Primary human islets had a more defined chromatin state than SC-islets, the latter of which had open chromatin regions associated with other lineages. Transplantation into mice for 6 months closed many of these accessible chromatin regions and improved lineage-specific gene expression, while extended in vitro culture did not have the same effect. We identified and modulated chromatin regulators important for SC-β cell identity, highlighting the importance of the chromatin landscape. Our findings improve the characterization of cellular identities within SC-islets and provide a resource to guide the development of strategies to improve SC-β cell differentiation.

Multi-omic SC-islet analysis improves cell identity resolution
We analysed transcriptional and chromatin features of cells produced through directed differentiation of hPSCs to pancreatic islets to define cell identity and investigate lineage specification deficiencies rigorously 7,33 (Extended Data Fig. 1a,b and Supplementary Table 1). These SC-islets were processed and sequenced using single-nucleus multi-omics, obtaining both gene expression (messenger RNA) and chromatin accessibility (ATAC) information for each cell (Fig. 1a and Supplementary Table 2). Analysis of gene expression and chromatin accessibility both individually and in combination enabled us to identify specific islet cell types, including SC-β cells (Fig. 1b, Tables 3 and 4; 29,526 cells from 3 independent differentiations). We were able to identify two subpopulations of SC-EC cells (denoted as SC-EC1 and SC-EC2) only when using the integrated analysis of both mRNA and ATAC data (Fig. 1b). SC-β, SC-α and SC-δ populations had strong chromatin peaks represented by their respective INS, GCG and SST gene regions, as expected (Fig. 1d). Unexpectedly, however, the INS gene region had open accessibility across all detected cell types, including non-endocrine cell types to some extent. In contrast, GCG and SST gene regions had relatively few chromatin peaks outside of SC-α and SC-δ cells, respectively.
We found distinct transcription factor-binding motif groups enriched in each specific cell type (Fig. 1e, Extended Data Fig. 1e and Supplementary Table 5). Within the endocrine population, SC-β cells had enrichment of motifs that correspond to known β-cell-associated transcription factors 9,25,34,35 . Interestingly, this included enrichment of the MAFA binding motif, but the expression of the transcription factor itself remained at very low levels in SC-β cells, as previously described 19,36 . Motifs that were specifically enriched in SC-EC cells include GATA6, CDX2 and LMX1A, which are known intestinal markers 37,38 . Interestingly, SC-β and SC-EC cells possessed shared enrichment of certain motifs, such as PDX1, PAX4 and NKX6-1, that are important in human primary β cells 31 (Fig. 1e).
In an effort to better characterize these cell populations within SC-islets, we cross-referenced transcription factor DNA-binding motif chromatin accessibility and gene expression information to identify active transcription factors that both are highly expressed and can access their target binding motifs to promote transcription of other genes (Supplementary Table 5). Here we highlight the top ten identified active transcription factors enriched in specific endocrine populations compared with the average of the other endocrine populations (Fig. 1f). Highly active transcription factors in SC-β cells include RFX1, PDX1 and PAX6, which are important for β-cell identity 31,39 . Other highly active transcription factors, such as MAFG, EBF1, ISX and PLAGL2, have not been highlighted previously, demonstrating the utility of using both mRNA and chromatin accessibility data for identifying cell types. In contrast, SC-EC cells have co-enrichment of LMX1B, LMX1A, MNX1 and GATA6. Notably, the SC-β and SC-EC populations both have high activities of NKX6-1 and PAX4, suggesting that they share common features essential for their identities. This similarity is further demonstrated with motif chromatin accessibility plots, where NKX6-1 and PDX1 were enriched in both SC-β and SC-EC populations (Fig. 1e,g). In contrast, binding motifs for PAX6 were enriched in SC-β cells compared with SC-EC cells while LMX1A was enriched in SC-EC cells compared with SC-β cells. Unexpectedly, we found a mismatch in RNA expression and motif accessibility across cell types for certain transcription factors, such as LMX1B and ISX (Fig. 1f,g). We also investigated transcription factors specifically implicated during endocrine cell development by RNA expression and motif accessibility (Extended Data Fig. 1h).
Of note, we observed FEV being upregulated in the SC-EC population, HNF1A in all endocrine populations, and PDX1 expressed predominantly in SC-EC and SC-β cells by both RNA and chromatin motif accessibility. We also observed a mismatch of NEUROG3 in the final SC-β cell population, where there was no detectable RNA but enriched motif accessibility. Collectively, this multi-omic analysis has generated important lists of genes to better classify the cell types found within SC-islets, providing better resolution of cell identity and insights into the relative importance of different transcription factor activity within each cell type.

SC-EC and SC-β cells show gradient, not distinct, identities
The serotonin-producing SC-EC cells comprise an awry cell population that arises during in vitro SC-islet differentiation protocols but that does not positively contribute to tissue function 9,40,41 (Fig. 1b). Detection of substantial amounts of serotonin occurred only at the end of this differentiation protocol (Extended Data Fig. 2a). While it is not known how serotonin affects in vitro differentiation to SC-β cells, a study has reported its involvement in regulating β-cell mass during pregnancy 42,43 . Although both enterochromaffin and β cells arise from a definitive endoderm precursor during in vivo development, enterochromaffin cells originate from an intestinal lineage, while islet cells differentiate from a distinct pancreatic origin. During in vitro differentiation of SC-islets, however, they appear to share a common progenitor lineage, and how this enterochromaffin cell population emerges is not well understood 9 . Interestingly, the combined mRNA/ATAC clustering analysis allowed us to resolve two distinct SC-EC populations that were adjacent to the SC-β cell population (Fig. 1a), suggesting that there may be multiple cell states, or a gradient of cell states, between SC-EC and SC-β cells. Given the apparent similarity of SC-β and the two SC-EC cell populations in our multi-omic clustering (Fig. 1b), we performed a trajectory analysis to detect differences in both gene expression and chromatin accessibility between SC-β and SC-EC cells (Fig. 2a, Extended Data Fig. 2b and Supplementary Table 6). As expected, enterochromaffin cell identity genes were most highly expressed in the SC-EC side of the trajectory map and β-cell identity genes were most highly expressed towards the SC-β cell side of the trajectory map (Fig. 2b,c and Extended Data Fig. 2c). ATAC peaks around SC-β cell and SC-EC cell marker genes also highlight differences in chromatin accessibility patterns, some of which were predicted to be cis-regulatory elements (Extended Data Fig. 2d). Differential motif chromatin accessibility analysis identified LMX1A, GATA6 and CTCF motifs to be enriched in SC-EC cells, while NEUROD1, HAND2, MAFA, PAX6, ONECUT2 and RFX2 motifs were enriched in SC-β cells (Fig. 2b,c).
While key identity genes and motifs were enriched on their respective sides of the trajectory map, their expression was a mixture of SC-β and SC-EC cell identities across the trajectory (Fig. 1b). To further probe the transcriptional and chromatin landscape of cells along this trajectory map, we performed subclustering analyses separately on the SC-β cell population and the SC-EC cell populations (Fig. 2d). Differential expression analyses revealed that each of the SC-β cell subpopulations had distinguishing gene expression and motif accessibility features (Fig. 2d). Additionally, we performed trajectory analysis using another tool 44 that produced consistent results (Extended Data Fig. 2g). Collectively, these data suggest that the SC-EC and SC-β cell populations produced from in vitro differentiation form a continuum of cell states rather than exhibiting clear exclusivity of gene expression and chromatin accessibility, suggesting deficiencies in fully specifying each endocrine cell population in vitro.
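To make the idea of a continuum concrete, the following minimal sketch orders cells by a simple identity score (mean β-marker expression minus mean enterochromaffin-marker expression). It is an illustration only and is not the trajectory inference used in this study; the marker groupings, the cell names and all expression values are hypothetical, although the gene symbols themselves appear in the text.

```python
from statistics import mean

# Illustrative marker sets (assumed grouping; gene symbols are taken from the text).
BETA_GENES = ("INS", "IAPP")
EC_GENES = ("TPH1", "FEV")

def identity_score(expr):
    """Mean beta-marker expression minus mean EC-marker expression for one cell."""
    beta = mean(expr.get(gene, 0.0) for gene in BETA_GENES)
    ec = mean(expr.get(gene, 0.0) for gene in EC_GENES)
    return beta - ec

def order_along_gradient(cells):
    """Return cell ids sorted from most EC-like to most beta-like."""
    return sorted(cells, key=lambda cell_id: identity_score(cells[cell_id]))

# Toy normalized expression values, made up for illustration.
cells = {
    "cell_a": {"INS": 5.0, "IAPP": 2.0, "TPH1": 0.1, "FEV": 0.2},
    "cell_b": {"INS": 3.0, "IAPP": 1.0, "TPH1": 1.5, "FEV": 1.0},
    "cell_c": {"INS": 0.5, "IAPP": 0.1, "TPH1": 4.0, "FEV": 2.5},
}

for cell_id in order_along_gradient(cells):
    print(cell_id, round(identity_score(cells[cell_id]), 2))
```

In this toy example the intermediate cell carries both marker sets at moderate levels, so its score falls between the two extremes rather than in a separate cluster, which is the pattern a gradient of cell states would be expected to produce.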
Because changes in chromatin accessibility influence which genes can be expressed in a cell and consequently its identity, chromatin regulators could influence where a cell lies on this gradient between SC-β and SC-EC cells during in vitro differentiation. To this end, our integrated multi-omic analysis identified the chromatin remodeller CCCTC-binding factor (CTCF) as the transcription factor binding motif having the greatest increase in accessibility in SC-EC cells compared with SC-β cells (Fig. 2c). Because CTCF is a chromatin remodeller that is involved in other developmental and differentiation processes 45 , we sought to further examine its impact on pancreatic differentiation. Unfortunately, despite trying several approaches, we were unable to knock down its expression, an intervention that we hypothesize would improve differentiation to SC-β cells. Instead, we utilized a doxycycline-inducible VP64-p65-Rta CRISPR activation (CRISPRa) stem cell line 46 to increase transcription of CTCF during differentiation (Fig. 2f and Extended Data Figs. 2h,i and 3a). Upregulation of CTCF during stage 5 endocrine cell induction 7,33 resulted in drastic reductions in the expression of β-cell identity markers, insulin content and glucose-stimulated insulin secretion along with notable upregulation of intestinal lineage/EC-cell-associated genes (Fig. 2g,h and Extended Data Fig. 3a-e). CTCF overexpression had the most pronounced impact during the endocrine induction stage, indicating its specific role in selecting between SC-EC and SC-β cell fates (Extended Data Fig. 3f). Furthermore, single-nucleus multi-omic sequencing demonstrated that CTCF overexpression caused increased accessibility of the CTCF binding motif and decreased accessibility of β-cell-associated transcription factor binding motifs (Fig. 2i, Extended Data Fig. 3g-l and Supplementary Table 8; 12,467 cells from 2 datasets, 1 of each condition from the same differentiation batch).
These results demonstrate that elevated CTCF expression disrupts the development of SC-β cells and redirects pancreatic progenitors towards an intestinal EC-like cell fate.

Multi-omic analysis defines distinct primary islet cell types
To compare the transcriptional and chromatin signatures identified in our SC-islets to their in vivo counterparts, we sequenced and characterized primary human islets (Table 5). Notably, primary β cells had accessible binding sites for PDX1, NKX6-1, MAFA and ISX, similar to SC-β cells. Surprisingly, motifs identified as enriched in SC-EC cells, such as LMX1A and MNX1, were enriched in β cells when compared with other endocrine populations. Furthermore, re-clustering analysis of the primary β cells identified three subpopulations that displayed unique gene expression and accessible motif signatures (Extended Data Fig. 4f-i and Supplementary Table 7), consistent with previous studies 25,31,47,48 .
Similar to our multi-omic analysis of SC-islets, we also examined transcription factor activity in primary islet endocrine cells as assessed by both relative increases in mRNA transcripts and chromatin accessibility of the corresponding DNA-binding motif (Fig. 3f,g). For β cells, NKX6-1 was the only transcription factor that was shared in the top ten active factor list with SC-β cells (Figs. 1f and 3f). While some of the active transcription factors were shared with the list from SC-islets, this analysis illustrates that primary islet cells have a unique transcriptional and chromatin landscape when compared with their stem cell-derived counterparts, highlighting specific deficiencies in lineage specification during directed differentiation protocols.

Fig. 2 legend (continued): Control represents cells without doxycycline treatment. Statistical significance was assessed by unpaired two-sided t-test. h, ICC quantification of cells expressing C-peptide protein (P = 5.3 × 10⁻⁶) and SLC18A1 protein (P = 3.1 × 10⁻⁴) with or without CTCF overexpression, plotting mean ± s.e.m. (control; n = 6 biologically independent samples, doxycycline; n = 7 biologically independent samples). Control represents cells without doxycycline treatment. Statistical significance was assessed by unpaired two-sided t-test. i, Volcano plots showing differential motif chromatin accessibility analysis of SC-endocrine population comparing control and CTCF overexpression (12,467 cells from 2 independent biological samples, 1 of each condition from the same differentiation batch; integration of all samples). Statistical significance was assessed by logistic regression. Control represents cells without doxycycline treatment. SC, stem cell derived; EC, enterochromaffin cells; CRISPRa, CRISPR activation.

Primary islet chromatin is more restricted than in SC-islets
Previous studies have demonstrated that β cells derived from in vitro differentiation of hPSCs are functionally and transcriptionally different from their in vivo counterparts 5,6,9,16,17,19 . We compared the transcriptional and chromatin landscapes of SC-islets and primary islets using single-nucleus multi-omic sequencing to better understand these differences and similarities (Fig. 4a; 47,566 cells from 5 samples; 3 SC-islets, 2 primary islets). Both SC-islets and primary islets contained β, α and δ cells (Extended Data Fig. 5a,b). However, only SC-islets contained EC-cells, while only primary islets contained PP cells, consistent with prior analysis 9 . In general, SC-islet cell types were transcriptionally most similar to their primary cell counterparts (Fig. 4b). Surprisingly, promoter chromatin accessibility did not show this trend, as cell origin (in vitro derived versus primary) was a greater factor in determining similarity rather than cell type. In general, promoter accessibility of SC-islet cells was open across cell types for endocrine and non-endocrine identity genes and lacked cell type distinctiveness (Fig. 4c,d), even when compared with additional primary islets (Extended Data Fig. 5c-e; 52,317 cells from 5 samples; 3 SC-islets, 2 primary islets). Primary β cells had greater chromatin accessibility of the INS gene compared with SC-β cells (Fig. 4e). While SC-α and SC-δ cells also shared chromatin accessibility peaks around the INS gene, this was not observed in primary α and δ cells. Chromatin accessibility analysis of MAFA and UCN3 revealed predicted cis-regulatory elements that differed considerably between SC-β and primary β cells (Fig. 4f).
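The contrast between similarity by cell type and similarity by origin can be illustrated with a small correlation sketch. The code below is an assumption-laden toy rather than the analysis behind Fig. 4b-d: it uses Pearson correlation between per-population profiles, and the four-feature profiles, the population labels and the helper names `pearson` and `nearest_population` are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def nearest_population(profiles, query):
    """Return the population (other than `query`) whose profile correlates best."""
    others = (name for name in profiles if name != query)
    return max(others, key=lambda name: pearson(profiles[query], profiles[name]))

# Toy profiles keyed by (origin, cell type); all values are made up.
expression = {
    ("SC", "beta"): [9, 1, 1, 2],
    ("SC", "alpha"): [1, 8, 1, 2],
    ("primary", "beta"): [10, 0, 1, 1],
    ("primary", "alpha"): [0, 9, 1, 1],
}
promoter_access = {
    ("SC", "beta"): [6, 4, 7, 7],
    ("SC", "alpha"): [4, 6, 7, 7],
    ("primary", "beta"): [8, 0, 1, 1],
    ("primary", "alpha"): [0, 8, 1, 1],
}

print(nearest_population(expression, ("SC", "beta")))
print(nearest_population(promoter_access, ("SC", "beta")))
```

In the toy data the last two features stand in for regions that are open in every SC-derived population and closed in primary cells, which is enough to make origin, rather than cell type, dominate the accessibility comparison while expression still pairs each SC population with its primary counterpart.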
To explore similarities and differences in endocrine cell identity between SC-islets and primary islets, we analysed differential gene expression and motif chromatin accessibility in β-, α- and δ-cell subpopulations (Fig. 4g, Extended Data Fig. 5f and Supplementary Table 9). The mRNA data from the multi-omic analysis yielded results consistent with previous RNA-only studies, demonstrating SC-β cells had lower expression of IAPP and other maturation genes 9,16,25,49,50 . ARID1B, a chromatin regulator 51 , had higher expression in SC-β cells. Surprisingly, ONECUT2 had increased mRNA expression and DNA-binding motif chromatin accessibility in SC-β cells compared with primary β cells, despite this being a gene whose increased expression is associated with adult human β cells compared with juvenile β cells 34 . The chromatin accessibility data demonstrated that SC-β cells had more enriched motifs that are associated with off-target or progenitor cell states compared with primary β cells. Primary β cells showed enriched motifs linked to FOS/JUN family genes, expression of which is related to better function following transplantation 16 . MAFA surprisingly had similar chromatin accessibility of the associated DNA-binding motif between SC-β and primary β cells despite low mRNA expression and chromatin accessibility of the gene in SC-β cells (Fig. 4f,g). Thus, the role of MAFA in β-cell identity may be regulated by the chromatin state of the associated gene and not accessibility of its binding motif. These findings reveal substantial differences in mRNA expression and chromatin accessibility between SC-β cells and primary β cells. SC-β cells display more off-target cell type features and lack key adult primary β-cell identity signatures, whereas primary β cells have a more restricted chromatin state.
We compared β, α and δ cells from SC-islets and primary islets by analysing the activity of transcription factors in each cell type from both origins (Fig. 4h and Extended Data Fig. 5g). For SC-β and primary β cells, several transcription factors had comparable mRNA expression and motif accessibility, while others exhibited differences, such as enriched RFX1 motif accessibility, reduced MAFA RNA expression and reduced NFE2L1 and BACH2 motif accessibility in SC-β cells. We identified the top ten transcription factors with the greatest differential activity in SC-β cells compared with primary β cells and found that many of these factors were enriched in SC-β cells, indicating ambiguity in their cell identity due to incorrect or incomplete cell fate specification. Our analyses reveal fundamental differences in mRNA expression, chromatin accessibility and transcription factor DNA-binding motifs between cell types in SC-islets and their in vivo counterparts.
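The notion of an active transcription factor used in these comparisons (a factor that is both expressed and able to access its binding motif) can be sketched as a simple ranking. This is a hedged illustration, not the scoring used to build the top ten lists: the enrichment definition (target value minus the mean of the other populations), the use of the weaker of the two signals as the score, the function names and all numbers are assumptions made for the example.

```python
from statistics import mean

def enrichment(values, target):
    """Value in the target population minus the mean across the other populations."""
    others = [v for population, v in values.items() if population != target]
    return values[target] - mean(others)

def active_factors(expression, motif_access, target, top_n=10):
    """Rank factors enriched in BOTH expression and motif accessibility for `target`."""
    ranked = []
    for tf in expression.keys() & motif_access.keys():
        expr_gain = enrichment(expression[tf], target)
        motif_gain = enrichment(motif_access[tf], target)
        if expr_gain > 0 and motif_gain > 0:
            # Score by the weaker signal so that one modality cannot dominate.
            ranked.append((tf, min(expr_gain, motif_gain)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Toy per-population values (made up); gene symbols are taken from the text.
expression = {
    "PDX1": {"beta": 2.0, "alpha": 0.2, "EC": 1.8},
    "LMX1A": {"beta": 0.1, "alpha": 0.0, "EC": 2.0},
    "NEUROG3": {"beta": 0.0, "alpha": 0.0, "EC": 0.0},
}
motif_access = {
    "PDX1": {"beta": 1.5, "alpha": -1.0, "EC": 1.0},
    "LMX1A": {"beta": -0.5, "alpha": -1.0, "EC": 2.0},
    "NEUROG3": {"beta": 1.0, "alpha": 0.0, "EC": 0.0},
}

print([tf for tf, _ in active_factors(expression, motif_access, "beta")])
print([tf for tf, _ in active_factors(expression, motif_access, "EC")])
```

Requiring both signals is what excludes a factor such as the toy NEUROG3 entry, whose motif is accessible but whose transcript is absent, mirroring the kind of expression and accessibility mismatch described in the text.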

In vivo, not in vitro, time enhances SC-β cell signatures
SC-β cells can acquire improved phenotypes over weeks in vitro or months after transplantation 4,9,17,18,27,52 . We performed multi-omic sequencing on SC-islets that underwent extended in vitro culture or transplantation into mice to characterize the chromatin and transcriptional changes that occur over time under these conditions. SC-islets cultured in vitro were sequenced periodically for up to 12 months (Fig. 5a and Extended Data Fig. 6a,b; 40,332 cells from 5 datasets, 1 from each timepoint). While SC-β cells displayed increased INS transcript with time, insulin secretion improved only until week 4 in vitro (Extended Data Fig. 6c,d). SC-α, SC-δ and SC-EC cells increased GCG, SST and TPH1 expression, respectively, until week 4 but declined by month 6 in vitro (Extended Data Fig. 6c). We found reductions in both the number and identity features for SC-EC cells starting at month 6 and SC-α cells by month 12 (Extended Data Fig. 6e,f). Drastic shifts in chromatin accessibility were observed broadly across all endocrine cell types at months 6 and 12 (Extended Data Fig. 6g).
In SC-β cells, expression of many β-cell transcription factors and the accessibility of their associated motifs were upregulated at week 4 of in vitro culture but decreased in the long term (months 6 and 12) (Fig. 5b-d). Expression and promoter accessibility of other β-cell genes, including UCN3, increased, while those associated with off-target identities, such as SC-EC cell, diminished with long-term culture (Fig. 1b,d and Supplementary Table 10). Similarly, ATAC peaks demonstrated accessibility around the INS genomic region increased in SC-β cells but diminished in SC-α and SC-δ cells (Fig. 5c). Collectively, multi-omic analysis of in vitro cultured SC-islets revealed that, while the transcription of some important β-cell genes increased over time, the chromatin state of these cells became much more restricted in long-term culture. While this restriction helped to decrease off-target populations, many crucial β-cell transcription factors were also downregulated, possibly leading to the observed decrease in function.
In parallel, we also performed multi-omic sequencing of SC-islets transplanted for 6 months in mice (Fig. 5e and Extended Data Fig. 7a,b; 7,162 cells from 3 datasets, each dataset used 3 mice). Gene expression and promoter accessibility information demonstrated that endocrine identities became even more distinct than before transplantation (Extended Data Fig. 7c-e). Motif accessibility was also well defined for each cell type, with transplanted SC-β and SC-EC cells no longer displaying shared open motifs, such as for PDX1 (Extended Data Fig. 7f,g). Comparison of these 6 month transplanted SC-islets with those before transplantation revealed that transplanted SC-islets acquired gene expression and chromatin accessibility signatures that were more similar to their primary cell counterparts (Extended Data Fig. 8a-c and Supplementary Table 11). These improvements in cell identity were also reflected in the chromatin accessibility around the INS gene, which exhibited diminished peaks in non-β cell populations (Fig. 5f). No major shifts in composition were seen, suggesting this was set by the end of the in vitro differentiation process and that transdifferentiation was likely not occurring (Extended Data Fig. 8a). Gene sets and motif accessibility for many β-cell markers were upregulated in transplanted SC-β cells, whereas markers of immature and off-target populations decreased (Fig. 5g and Extended Data Fig. 8d-f). Collectively, transplanted SC-β cells resembled primary β cells more in gene expression, promoter accessibility and motif accessibility, indicating that transplantation enhances both transcriptional and chromatin landscapes.
To compare 6 month in vitro and 6 month transplanted SC-islets, we integrated these datasets with the 2 week in vitro data (Fig. 6a and Extended Data Fig. 9a,b). Overall, transplanted SC-islet cells displayed greater enrichment of genes and motifs for transcription factors associated with their cell types, demonstrating that both time and environment are necessary for maturation (Extended Data Fig. 9c). In particular, transplanted SC-β cells exhibited several upregulated and downregulated genes and motifs compared with their 6 month in vitro counterparts, allowing us to compile a list of markers relevant for β-cell identity from both environments (Fig. 6b,c, Extended Data Fig. 9d and Supplementary Table 12). In vitro culture closed many chromatin regions, including some associated with β-cell identity. In vivo time opened chromatin regions linked with cell identity while restricting others, enabling transplanted SC-β cells to acquire a chromatin and transcriptional landscape closer to primary β cells. Furthermore, SC-EC cells also retained expression of identity genes when transplanted, in contrast to during 6 month in vitro culture.
Throughout our analysis, the chromatin regulator ARID1B emerged several times as being differentially expressed in β cells (Figs. 4g and 5d and Extended Data Figs. 8c and 10a). Knockdown of ARID1B during the in vitro culture of SC-islets increased β-cell identity markers, including     Table 13; 21,969 cells from 2 datasets, 1 of each condition from the same differentiation batch). These findings demonstrate that the chromatin landscape is important for SC-β cell identity and that targeting chromatin regulators can further improve SC-β cell maturation in vitro.

Discussion
SC-β cells have the potential to functionally cure type 1 diabetes 53 but do not perfectly match the transcriptional and functional features of primary β cells. An increased understanding of the deficiencies in lineage specification could improve SC-islet differentiation to prevent non-endocrine cell generation and increase SC-β cell function. Our single-nucleus multi-omic approach provided a more robust definition of cell types than using either data type in isolation and identified important genes and chromatin signatures for SC-islet cell type specification. Comparison with primary human islets allowed for identification of deficiencies in the chromatin and transcriptional landscape of SC-islet cell types. Although the analysis of only four islet donors may not encompass all possible variability across the general population, these data still offer important insights into the chromatin and transcriptional landscape of SC-islet cell types and helped identify differences from their primary counterparts. These differentially expressed genes and chromatin accessibility signatures identified here can be targeted to improve SC-islet cell differentiation.
Our multi-omic analysis revealed important insights about the gene expression and chromatin state dynamics of SC-islet cell types compared with primary human islets. While SC-islet cell types were less distinct from each other by chromatin accessibility compared with primary cells, this difference was mainly driven by continued open chromatin accessibility of genes expressed by progenitor cell types, such as NEUROG3 (ref. 54) and GP2 (ref. 14), and alternative cell fates that were closed in primary cells. After transplantation into mice, SC-islet cell types developed more distinguished transcriptional and chromatin accessibility signatures that matched their respective cell identities 16,17 . However, extended in vitro culture broadly restricted access to chromatin regions, including those associated with a β-cell identity, and the underlying mechanism responsible for this difference in cellular identity remains unclear. These findings may have implications for other in vitro differentiation systems where immaturity is commonly observed 55,56 .
Previous research suggested that SC-EC and SC-β cells are distinct populations that arise from the same pancreatic progenitor population during SC-islet differentiation 9 . However, our multi-omic study revealed that these cells form a continuum of cell types with varying degrees of both enterochromaffin and β-cell features, rather than distinct cell populations. This suggests that current directed differentiation methodologies are insufficient for fully specifying each endocrine cell type. Furthermore, our data support the idea that chromatin accessibility is a major regulator of fate decisions between SC-EC and SC-β cells, particularly by CTCF. Interestingly, a recent publication 57 using a different single-cell approach suggested that SC-EC cells resemble a pre-β cell population in the pancreas. Transplanted SC-islets retained SC-EC cells after 6 months in vivo, but this population became less similar to β cells over time. As SC-EC cells may be detrimental 9 to SC-β cell function, understanding how to reduce or eliminate them is desired. Analysis tools that take account of both chromatin and mRNA 44 could lead to insights on cell fate decisions using our datasets.
Our study highlights the important role of chromatin regulators in the generation and final identity of islet cell types, as evidenced by the drastic differences in chromatin accessibility between cell types and within a given cell type from different origins. By modulating the chromatin regulators CTCF and ARID1B, we were able to alter SC-β cell identity, demonstrating the potential for better control of the chromatin landscape to improve SC-islet differentiation protocols. Our comprehensive indexing of single-cell transcriptional and chromatin accessibility states in SC-islets provides a valuable resource for further development of these protocols, as both aspects of islet cell identity will probably need to be targeted for enhanced differentiation strategies. Further comparisons with other multi-omic approaches 57 may yield additional insights into SC-islet identity and biology.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41556-023-01150-8.

Stem cell culture and differentiation
The HUES8 human embryonic stem cell (hESC) line was provided by Douglas Melton (Harvard University) 5 . The H1 hESC line, which contains a doxycycline-inducible dCas9-VPR transgene in the AAVS1 locus (CRISPRa system) 46 , was provided by Lindy Barrett (Broad Institute) with permission from WiCell. All hESC work was approved by the Washington University Embryonic Stem Cell Research Oversight Committee (approval no. 15-002) with appropriate conditions and consent. mTeSR1 (StemCell Technologies; 05850) was used for the culture of undifferentiated stem cells. All cell cultures were maintained in a humidified incubator at 5% CO₂ and 37 °C. Cells were passaged every 4 days by washing the cells with phosphate-buffered saline (PBS) and incubating with TrypLE at 0.2 ml cm⁻² (Gibco; 12-604-013) for 10 min or less at 37 °C. Dispersed cells were then mixed with an equal volume of mTeSR1 supplemented with 10 µM Y-27632 (Pepro Tech; 129382310MG). Cells were counted on a Vi-Cell XR (Beckman Coulter) and spun at 300g for 3 min at room temperature (RT). The supernatant was aspirated, and cells were seeded at a density of 0.8 × 10⁵ cm⁻² for propagation onto Matrigel (Corning; 356230)-coated plates in mTeSR1 supplemented with 10 µM Y-27632. After 24 h, medium was replaced daily with mTeSR1 without Y-27632. SC-islet differentiation was performed as described previously 7,33 . Briefly, hESCs were seeded at a density of 6.3 × 10⁵ cells cm⁻².
Twenty-four hours later, the mTeSR1 was replaced with differentiation medium supplemented with small molecules and growth factors as outlined in Supplementary Table 1.
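The seeding and dissociation quantities above scale linearly with culture surface area. The following is a minimal sketch of that arithmetic, assuming the densities and TrypLE volume per area stated in the text; the function names and the example growth area are hypothetical and are not part of the original protocol.

```python
# Values taken from the text above.
PROPAGATION_DENSITY = 0.8e5      # cells per cm^2 for routine propagation
DIFFERENTIATION_DENSITY = 6.3e5  # cells per cm^2 before differentiation
TRYPLE_ML_PER_CM2 = 0.2          # ml of TrypLE per cm^2 of culture surface


def cells_to_seed(area_cm2: float, density_per_cm2: float) -> float:
    """Number of cells needed to cover a surface at the given density."""
    return area_cm2 * density_per_cm2


def suspension_volume_ml(cells_needed: float, cells_per_ml: float) -> float:
    """Volume of a counted cell suspension that contains the required cells."""
    if cells_per_ml <= 0:
        raise ValueError("cell concentration must be positive")
    return cells_needed / cells_per_ml


def tryple_volume_ml(area_cm2: float) -> float:
    """Volume of TrypLE used to disperse cells from a surface."""
    return area_cm2 * TRYPLE_ML_PER_CM2


if __name__ == "__main__":
    area = 10.0  # cm^2; hypothetical growth area, not from the source
    print(cells_to_seed(area, PROPAGATION_DENSITY))
    print(cells_to_seed(area, DIFFERENTIATION_DENSITY))
    print(tryple_volume_ml(area))
```

The cell concentration passed to `suspension_volume_ml` would come from the cell counter reading after dispersion.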

SC-islet and primary islet cell culture
After 7 days in stage 6 of the differentiation protocol, cells were dispersed from the culture plate with TrypLE at 0.2 ml cm⁻² (Gibco; 12-604-013) for 10 min or less at 37 °C. The cells were mixed with an equal volume of stage 6 enriched serum-free medium (ESFM), centrifuged at 300g, and resuspended in ESFM at a concentration of 1 million cells ml⁻¹. Five millilitres of this suspension was pipetted into each well of a six-well plate and placed on an orbital shaker (Orbi-Shaker CO₂, Benchmark Scientific) at 100 r.p.m. to form SC-islet clusters. These clusters were maintained by aspirating and replacing 3 ml of ESFM every 2-3 days. SC-islets in long-term culture were similarly maintained with ESFM for up to 1 year without passaging. Primary human islets were acquired as clusters and shipped from Prodo Laboratories, which required consent from the donor relatives for use in research. Consent information can be found on their website (https://prodolabs.com/human-islets-for-research). These islets have been refused for human islet transplants and meet specific criteria for research use. Donor details can be found in Supplementary Table 2. Our study consists of four donors. Upon arrival, islets were transferred into a six-well plate on an orbital shaker at 100 r.p.m. and maintained with 4 ml per well of CMRL1066 Supplemented medium (Corning; 99-603-CV) with 10% heat-inactivated foetal bovine serum (Gibco; 26140-079). Primary human islets were submitted for sequencing within 2 days after arrival.

Mouse transplantations and SC-islet cell retrieval
Seven-week-old male mice with the NOD.Cg-Prkdc^scid Il2rg^tm1Wjl/SzJ (NSG) background (Jackson Laboratories; 005557) were randomly assigned to experimental groups. Mice were housed in an ambient facility with 30-70% humidity and a 12 h light/dark cycle and were fed a chow diet. Animal studies were performed by unblinded individuals in accordance with the Washington University International Care and Use Committee guidelines (approval 21-0240). Mice were anaesthetized using isoflurane and injected with ~5 × 10⁶ SC-islet cells under the kidney capsule. At 6 months post-transplantation, mice were killed, and the kidney transplanted with SC-islets was removed. Sliced kidney samples were placed into a solution of 2 mg ml⁻¹ collagenase D (Sigma; 11088858001) in RPMI (Gibco; 1187-085). A total of nine mice were used to produce three samples. Each sample consisted of three pooled transplanted kidneys (one kidney from each mouse) to achieve the cell numbers required for sequencing. The tissue was incubated for 40 min at 37 °C before being diluted with PBS, mechanically disrupted with a pipette and filtered through a 70 µm strainer (Corning; 431751). The flowthrough was centrifuged, and the remaining cell pellet was resuspended in MACS buffer (0.05% bovine serum albumin (BSA) in PBS). The Miltenyi mouse cell isolation kit (Miltenyi; 130-104-694; LS column, 130-042-401) was used to remove excess mouse cells. The flowthrough was centrifuged and resuspended in PBS with 0.04% BSA for nuclei processing and sequencing. Data collection and analysis were not performed blind to the conditions of the experiments. No animals or data points associated with transplantation were excluded.

Single-nucleus sample preparation and sequencing
Cells were processed and delivered to the McDonnell Genome Institute at Washington University for library preparation and sequencing. Samples were processed by dispersing cells into a single-cell suspension using TrypLE for 10 min at 37 °C and quantified for viability using the Vi-Cell XR (Beckman Coulter). Before proceeding, all samples were confirmed to have >90% viability to minimize dead-cell carry-over in sequencing. Single-cell suspension samples were processed into nuclei according to the 10x Multiome ATAC + Gene Expression (GEX) protocol (CG000338). Cell samples were collected and washed with PBS (with 0.04% BSA), lysed with chilled Lysis Buffer for 4 min, washed three times with wash buffer and resuspended with 10x nuclei buffer at 3,000-5,000 nuclei µl⁻¹. Nucleus samples were processed using the 10x Genomics Chromium instrument, with a target cell number of 7,000-10,000. The 10x Single Cell Multiome ATAC + Gene Expression v1 kit was used according to the manufacturer's instructions for library preparations. Tape station figures for single-nucleus ATAC libraries can be found in Supplementary Fig. 1. Sequencing of the library was performed using the NovaSeq 6000 System (Illumina).

Processing and filtering of multi-omic sequencing data
Multi-omic sequenced files were processed for demultiplexing and analysed using Cell Ranger ARC v2.0. Genes were mapped and referenced using human reference genome  Table 2 and Supplementary Fig. 2.

Dataset normalization, integration and assay build
Gene expression data were processed with 'SCTransform'. ATAC data were processed with 'RunTFIDF' and 'RunSVD'. Integration of datasets was performed by anchoring using 'FindIntegrationAnchors'. 'rpca' was used as the reduction method, with 'SCT' normalization to correct for batch differences in RNA expression. For integrated ATAC data, 'lsi' reduction was used along with 'RunTFIDF' and 'RunSVD' for batch correction. 'FindMultimodalNeighbors' with the gene expression-based 'pca' and ATAC-based 'lsi' reductions was used to generate a joint neighbour graph. 'FindClusters' with the SLM algorithm was used to identify clusters. Promoter accessibilities were determined from ATAC data using 'GeneActivity' (2,000 base pairs upstream of the transcription start sites 89,90 ). ATAC peaks were called using MACS2 and linked using LinkPeaks and Cicero to determine cis-regulatory elements 89,91 . 'RunChromVAR' of the chromVAR 1.12.0 package 92 and the JASPAR version 2020 database 93 were used to compute motif enrichment. Motif sequence IDs were converted to transcription factor motifs using 'ConvertMotifID'. Promoter accessibility and motif enrichment information were incorporated as assay data in the integrated Seurat object files.
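To illustrate the kind of normalization applied to the ATAC counts before dimensionality reduction, the following is a minimal sketch of one common term frequency-inverse document frequency (TF-IDF) formulation on a small peak-by-cell count matrix. The exact variant and scale factor used in this study are not stated in the text, so the log-scaled form and the `scale_factor` default below are assumptions, and the function name is hypothetical.

```python
import math


def tfidf_normalize(counts, scale_factor=1e4):
    """Apply a log-scaled TF-IDF to a peak-by-cell matrix (list of rows).

    TF  = count / total counts in that cell (column).
    IDF = number of cells / total counts for that peak (row).
    Output value = log(1 + TF * IDF * scale_factor).
    This is one common formulation, assumed here for illustration only.
    """
    n_peaks = len(counts)
    n_cells = len(counts[0]) if counts else 0
    cell_totals = [sum(counts[p][c] for p in range(n_peaks)) for c in range(n_cells)]
    peak_totals = [sum(row) for row in counts]

    normalized = []
    for p, row in enumerate(counts):
        # Peaks with no counts get an IDF of zero rather than dividing by zero.
        idf = n_cells / peak_totals[p] if peak_totals[p] > 0 else 0.0
        new_row = []
        for c, value in enumerate(row):
            tf = value / cell_totals[c] if cell_totals[c] > 0 else 0.0
            new_row.append(math.log1p(tf * idf * scale_factor))
        normalized.append(new_row)
    return normalized


if __name__ == "__main__":
    toy_counts = [[1, 0], [1, 2]]  # two peaks (rows) by two cells (columns)
    for row in tfidf_normalize(toy_counts):
        print(row)
```

In a typical latent semantic indexing workflow, a singular value decomposition of the normalized matrix would follow; that step is omitted here to keep the sketch dependency-free.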

Analysis of multi-omic datasets
Cell types in the integrated datasets were identified by performing differential expression analysis. 'FindMarkers' using the 'wilcox' test method was used to determine highly expressed features. The 'LR' test method in 'FindMarkers' was used to determine upregulated motif accessibility. 'CoveragePlot' of the Signac package was used to plot ATAC peaks with cis-regulatory element information (for example, Figs. 1d and 4f). Uniform manifold approximation and projection (UMAP) and violin plots were generated using 'FeaturePlot' and 'VlnPlot' (for example, Figs. 1g and 2d). Heat maps were generated from average values computed with 'AverageExpression' and plotted with 'heatmap.2' (for example, Fig. 1c). Top active transcription factors were identified by computing, for both transcription factor gene expression and motif enrichment, the average fold change of selected populations or conditions compared with other cell populations. Transcription factors upregulated in both the gene expression and motif enrichment assays were designated active transcription factors (for example, Fig. 1f). Volcano plots were generated using 'EnhancedVolcano' (for example, Fig. 2c). Gene set enrichment analysis was performed using 'DEenrichRPlot' of the enrichR package with databases from Gene Ontology, KEGG_2021_Human and MSigDB_Hallmark_2020.
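The selection of active transcription factors described above amounts to intersecting two ranked lists. The following is a minimal sketch of that logic, assuming per-factor fold-change scores have already been computed for each assay; the function names, thresholds, pseudocount and placeholder factor names are hypothetical and do not reproduce the study's exact calculation.

```python
import math


def log2_fold_change(target_values, other_values, pseudocount=1.0):
    """log2 ratio of the mean in the selected population to the mean elsewhere.

    The pseudocount (an assumption) avoids division by zero for silent genes.
    """
    target_mean = sum(target_values) / len(target_values)
    other_mean = sum(other_values) / len(other_values)
    return math.log2((target_mean + pseudocount) / (other_mean + pseudocount))


def shared_upregulated(rna_score, motif_score, rna_min=0.0, motif_min=0.0):
    """Return factors upregulated in BOTH assays, ranked by the RNA score."""
    shared = [
        tf
        for tf in rna_score
        if tf in motif_score
        and rna_score[tf] > rna_min
        and motif_score[tf] > motif_min
    ]
    return sorted(shared, key=lambda tf: rna_score[tf], reverse=True)


if __name__ == "__main__":
    # Placeholder scores for illustration only; not data from the study.
    rna = {"TF_A": 1.5, "TF_B": 0.8, "TF_C": -0.2, "TF_D": 2.0}
    motif = {"TF_A": 0.6, "TF_B": -0.1, "TF_C": 0.9, "TF_D": 0.3}
    print(shared_upregulated(rna, motif))
```

Requiring agreement between the two assays discards factors whose transcript is abundant but whose motif is not enriched in accessible chromatin, and vice versa.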

Trajectory analysis
Trajectory analysis was performed using SeuratWrappers and the Monocle3 1.0 package 94 . 'Subset' was used to isolate specific populations (enterochromaffin cells and β cells) from the integrated dataset. A Monocle-compatible cds object was generated using 'as.cell_data_set'. Pseudotime was computed and determined using the 'cluster_cells' and 'learn_graph' functions. Trajectories were re-established by selecting the initial node pseudo-timepoint using 'order_cells'. Dynamics of gene expression and motif enrichment were obtained by conducting differential feature analyses along the pseudotime trajectory. 'graph_test' was used with the 'neighbor_graph' parameter set to 'principal_graph'. Genes or motifs with high differential expression along pseudotime were determined by excluding features with low Moran's I scores (features with morans_I > 0.05 were retained). Expression values were Z-scored and plotted using 'Heatmap' with the k-means parameter (km = 2). Pseudotime information from this analysis was incorporated into the Seurat object using 'AddMetaData'. The MultiVelo 44 package running on Jupyter Notebook (Python) was used to perform alternative trajectory analysis. ATAC and RNA files were imported using 'sc.read_10x_mtx' and 'scv.read'. Aggregated peaks around gene regions were computed using 'mv.aggregate_peaks_10x', and mapped with RNA and ATAC information using 'pd.Index'. RNA counts and ATAC peaks were normalized using 'scv.pp.filter_and_normalize' and 'mv.tfidf_norm', respectively. Smoothing of gene peak aggregates was performed using 'mv.knn_smooth_chrom', followed by 'mv.recover_dynamics_chrom' to execute the multi-omic dynamic model to predict cell state trajectories. Pseudotime was computed using 'mv.latent_time'. Genes of interest were plotted along pseudotime using 'mv.dynamic_plot'.
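The feature selection and scaling steps that precede the pseudotime heat map can be sketched as follows. This is a minimal illustration, assuming a table of Moran's I scores is already available from the graph-based test; the function names are hypothetical, and the use of the sample standard deviation for Z-scoring is an assumption rather than a detail given in the text.

```python
from statistics import mean, stdev


def filter_by_morans_i(scores, threshold=0.05):
    """Keep features whose Moran's I exceeds the threshold, highest first."""
    kept = [feature for feature, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda feature: scores[feature], reverse=True)


def z_score(values):
    """Centre and scale one feature's values across cells.

    Uses the sample standard deviation (an assumption); a constant or
    single-value feature is returned as all zeros to avoid dividing by zero.
    """
    if len(values) < 2:
        return [0.0 for _ in values]
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Placeholder scores and expression values for illustration only.
    morans = {"gene_a": 0.30, "gene_b": 0.01, "gene_c": 0.12}
    expression = {"gene_a": [1, 2, 3], "gene_b": [5, 5, 5], "gene_c": [0, 4, 8]}
    for gene in filter_by_morans_i(morans):
        print(gene, z_score(expression[gene]))
```

Z-scoring each feature independently places genes with very different absolute expression on a common scale, so that the clustering of the heat map reflects the shape of each profile along pseudotime rather than its magnitude.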

Transduction of CTCF gRNA in CRISPRa line
CRISPRa genetic engineering of the H1 dCas9-VPR line 46 was performed using custom guide RNAs (gRNAs, MilliporeSigma, Supplementary Table 1). gRNAs were resuspended to a final concentration of 100 µM in water. Primers were phosphorylated and ligated together by adding T4 ligation buffer and T4 Polynucleotide kinase enzyme (NEB; B0202A and M0201S) and running on a thermocycler under the following conditions: 37 °C for 30 min; 95 °C for 5 min; and ramp down to 25 °C at 5 °C min⁻¹. Oligos were then diluted with 90 µl of ultrapure water. These oligos were then inserted into the single guide RNA (sgRNA) library backbone (Lenti sgRNA(MS2)_puro) using the Golden Gate reaction. This was achieved by adding a 25 ng µl⁻¹ plasmid backbone to a master mix of Rapid Ligase Buffer 2X (Enzymatics: B1010L), FastDigest Esp3I (Thermo: FD0454), dithiothreitol (Promega: PRP1171), BSA (NEB: B9000S), T7 DNA ligase (Enzymatics: L6020L) and the diluted gRNA oligos in a total reaction volume of 25 µl. The Golden Gate assembly reaction was then performed in a thermocycler under the following conditions: 15 cycles of 37 °C for 5 min and 20 °C for 5 min, with a final hold at 4 °C. The Lenti sgRNA (MS2) puromycin optimized backbone was a generous gift from Feng Zhang (Addgene plasmid no. 73797). This final plasmid was then transfected into STBL3 following the same methods as described below in the 'Lentiviral design, preparation and transduction' section.
To transduce the CRISPRa H1 dCas9-VPR stem cell line, lentiviral particles containing gRNA were added at a multiplicity of infection of 5 with polybrene (5 µg ml⁻¹) in culture for 24 h. At confluency, transduced stem cells were passaged and cultured with medium containing puromycin (1 µg ml⁻¹) for selection. To induce CRISPRa expression, doxycycline (MilliporeSigma) was added at 1 µg ml⁻¹ for 7 days during stage 5 of the differentiation protocol.
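The multiplicity of infection fixes the ratio of infectious viral particles to cells, so the volume of viral stock follows from the cell number and the stock titre. The following is a minimal sketch of that arithmetic; the multiplicity of 5 is from the text, whereas the function names, the cell number and the titre in the example are hypothetical values chosen only for illustration.

```python
def transducing_units_needed(n_cells: float, moi: float = 5.0) -> float:
    """Infectious particles required for n_cells at the given multiplicity."""
    return n_cells * moi


def virus_volume_ul(n_cells: float, titre_tu_per_ml: float, moi: float = 5.0) -> float:
    """Volume of viral stock (in microlitres) delivering the requested multiplicity."""
    if titre_tu_per_ml <= 0:
        raise ValueError("titre must be positive")
    volume_ml = transducing_units_needed(n_cells, moi) / titre_tu_per_ml
    return volume_ml * 1000.0


if __name__ == "__main__":
    cells = 1e6   # hypothetical number of cells in the well
    titre = 1e8   # hypothetical stock titre, transducing units per ml
    print(virus_volume_ul(cells, titre))
```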

Real-time PCR
Cells were lysed directly with RLT buffer from the RNeasy Mini Kit (74016; Qiagen) followed by RNA extraction following the manufacturer's instructions. Complementary DNA was synthesized from the RNA using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems; 129382310MG) on a T100 thermocycler (BioRad). PowerUp SYBR Green Master Mix (Applied Biosystems; A257411) was used to run samples on the QuantStudio 6 Pro (Applied Biosystems), and results were analysed using ΔΔCt methodology. The housekeeping genes TBP and GUSB were both used for normalization. Primer sequences used in this paper are listed in Supplementary Table 1. qPCR data were collected from the QuantStudio 6 Pro using Design & Analysis 2.6.0.
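The ΔΔCt calculation with two housekeeping genes can be sketched as follows. This is a minimal illustration, assuming that the reference Ct is the arithmetic mean of the housekeeping gene Ct values and that amplification efficiency is 100% (a doubling per cycle); neither detail is specified in the text, and the function names and Ct values are hypothetical.

```python
def delta_ct(target_ct: float, reference_cts) -> float:
    """Target Ct minus the mean Ct of the housekeeping genes (e.g. TBP, GUSB)."""
    reference = sum(reference_cts) / len(reference_cts)
    return target_ct - reference


def fold_change(sample_target_ct, sample_reference_cts,
                control_target_ct, control_reference_cts) -> float:
    """Relative expression of the sample versus the control as 2^(-ddCt)."""
    ddct = (delta_ct(sample_target_ct, sample_reference_cts)
            - delta_ct(control_target_ct, control_reference_cts))
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Placeholder Ct values for illustration only; not data from the study.
    print(fold_change(24.0, [20.0, 22.0], 26.0, [20.0, 22.0]))
```

Averaging the housekeeping Ct values is equivalent to normalizing against the geometric mean of the two reference quantities, which reduces the influence of variation in either reference gene alone.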

ICC
Fluorescence images were taken on a Zeiss Cell Discoverer 7 confocal microscope. For immunocytochemistry (ICC), cells were fixed in 4% paraformaldehyde (Electron Microscopy Science; 15714) for 30 min at RT. For staining, fixed cells were incubated in ICC solution (PBS (Fisher; MT21040CV), 0.1% Triton X (Acros Organics; 327371000) and 5% donkey serum (Jackson Immunoresearch; 01700-121)) for 30 min at RT. Samples were subsequently treated with primary and secondary antibodies in ICC solution overnight at 4 °C and 2 h at RT, respectively. DAPI (Invitrogen; D1306) was used for nuclear staining. Samples were incubated in DAPI for 12 min at RT, washed with ICC solution and stored in PBS until imaging. Antibody details and dilutions can be found in Supplementary Table 1. ImageJ was used for analysis. Quantification was performed by manual counting of cells from analysed fluorescence images and can be found in Supplementary Fig. 3.

Flow cytometry
Cells were single-cell dispersed by washing with PBS and adding 0.2 ml TrypLE cm⁻² for 10 min at 37 °C. Cells were washed with PBS, centrifuged and fixed by resuspending the cells in 4% paraformaldehyde at 4 °C for 30 min. After another PBS wash, samples were treated with ICC solution for 45 min at RT. Primary antibodies were prepared in ICC solution and incubated on cells overnight at 4 °C. Samples were washed with PBS and incubated for 2 h with secondary antibodies in ICC solution at 4 °C. Antibody details and dilutions can be found in Supplementary Table 1. Cells were washed twice with PBS and filtered before running on the LSR Fortessa flow cytometer (BD Bioscience) using BD FACSDiva. FlowJo v10.8.0 (Becton, Dickinson, and Company) was used for analysis. The gating strategy used can be found in Supplementary Fig. 4.

Statistics and reproducibility
No statistical methods were used to pre-determine sample sizes, but our sample sizes are similar to those reported in previous publications 5,9,16,31,33 . This study contains multi-omic sequencing datasets of SC-islets from three independent differentiation batches (three datasets), human islets from four donors (four datasets), SC-islets after 3 weeks, 4 weeks, 6 months and 12 months into stage 6 with one replicate each (five datasets), CTCF CRISPRa differentiations of one replicate from each condition (control and doxycycline-induced CTCF overexpression; two datasets), and ARID1B shRNA SC-islets of one replicate from each condition (control and shARID1B knockdown; two datasets). Information on datasets used and sample details can be found in Supplementary Table 2. Cell selection criteria or exclusion methods for single-cell analysis can be found in Supplementary Fig. 2. Statistical significance from the multi-omic analyses was calculated using the Wilcoxon rank sum test for RNA expression and logistic regression for motif chromatin accessibility, with Bonferroni correction to account for multiple testing. For in vitro experiments, we performed unpaired or paired parametric t-tests (two-sided) and one-way ANOVA with Tukey's multiple comparison testing to determine significance. Data distribution was assumed to be normal, but this was not formally tested. All in vitro experiment data points presented are biological replicates and can be found in Source Data. Significant values are marked on the basis of P values using non-significant (NS) >0.05, *<0.05, **<0.01, ***<0.001 and ****<0.0001.
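The multiple-testing correction and the significance markers described above can be sketched as follows. This is a minimal illustration, assuming the standard Bonferroni adjustment (each P value multiplied by the number of tests and capped at 1); the function names are hypothetical, and the treatment of a P value exactly equal to 0.05 as non-significant is an assumption, because the text leaves that boundary undefined.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def significance_marker(p: float) -> str:
    """Map a P value to the marker scheme used in the figures."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"  # P of exactly 0.05 is treated as NS here (assumption)


if __name__ == "__main__":
    # Placeholder P values for illustration only; not data from the study.
    raw = [0.01, 0.04, 0.5]
    for p_raw, p_adj in zip(raw, bonferroni(raw)):
        print(p_raw, p_adj, significance_marker(p_adj))
```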

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Sequencing data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) under accession code GSE199636. Source data are provided with this paper. GRCh38 human genome 95 , MSigDB 96 and JASPAR2020 93 databases were used. All other data supporting the findings of this study are available from the corresponding author on reasonable request.

Code availability
Codes used for analysing single-nucleus multi-omic sequencing are available on https://github.com/punnaug. No custom codes or mathematical algorithms were developed or used in this study.