The single-cell transcriptional landscape of innate and adaptive lymphocytes in pediatric-onset colitis

Summary Innate lymphoid cells (ILCs) are considered innate counterparts of adaptive T cells; however, their common and unique transcriptional signatures in pediatric inflammatory bowel disease (pIBD) are largely unknown. Here, we report a dysregulated colonic ILC composition in pIBD colitis that correlates with inflammatory activity, including accumulation of naive-like CD45RA+CD62L− ILCs. Weighted gene co-expression network analysis (WGCNA) reveals modules of genes that are shared or unique across innate and adaptive lymphocytes. Shared modules include genes associated with activation/tissue residency, naivety/quiescence, and antigen presentation. Lastly, nearest-neighbor-based analysis facilitates the identification of “most inflamed” and “least inflamed” lymphocytes in pIBD colon with unique transcriptional signatures. Our study reveals shared and unique transcriptional signatures of colonic ILCs and T cells in pIBD. We also provide insight into the transcriptional regulation of colonic inflammation, deepening our understanding of the potential mechanisms involved in pIBD.


In brief
The role of innate lymphoid cells (ILCs) in pediatric IBD (pIBD) remains understudied. Here, Kokkinou et al. show that while some transcriptional features of colonic ILCs in pIBD are unique, some are shared with T cells. pIBD inflammation leads to transcriptional and cellular changes in both ILCs and T cells. INTRODUCTION increasing. 2 PIBD is characterized by more severe symptoms and a more extensive disease at onset than adults, and the CD phenotype shows less isolated ileal manifestations and more colonic inflammation compared with adult patients with CD. 2 The etiology and pathogenesis of IBD is considered multifactorial. Host genetic as well as environmental factors may result in a dysregulated immune response and dysbiosis that is believed to drive chronic inflammation. 3 T cells are implicated in intestinal inflammation in both mouse models 4 and in humans, where conventional (CD4 + and CD8 + ) [5][6][7] as well as unconventional T cells, including innate-like, gd T cells and mucosal invariant (MAIT) 8,9 T cells, show dysregulated composition. In patients with IBD, biologic treatments antagonizing immunomodulating factors that either activate T cells (interleukin [IL]-12 /23, ustekinumab, etc.) or are expressed by T cells (a4b7 integrin; velodizumab, tumor necrosis factor [TNF]; infliximab, etc.) are effective in the majority of the patients. 10 Innate lymphoid cells (ILCs), including natural killer (NK) cells, are viewed as the counterparts of adaptive T cells, which, in concert, execute specialized effector programs. 11 In the inflamed adult IBD colon and ileum, the mucosal ILC composition is altered, resulting in reduced frequencies of IL-22-producing NKp44 + type 3 ILCs (ILC3s) and increased interferon (IFN)-g-producing ILC1s 12,13 as well as IL-17-producing ILC3s. 14 However, it remains unknown whether these alterations in ILC frequencies are also present in pIBD.
Single-cell RNA sequencing (scRNA-seq) studies have provided transcriptional signatures of T cells in IBD mucosa of children and adults. 5,[15][16][17][18][19] However, characterization of intestinal ILCs in parallel to T cells in pediatric patients with IBD has not yet been performed.
Here, we provide a comprehensive characterization of ILCs and T cells in colonic biopsies from newly diagnosed and treatment-naive pediatric patients with IBD colitis and non-IBD controls. Flow cytometry showed inflammation-specific alterations of ILC frequencies, including the accumulation of CD45RA + CD62L À naive-like ILCs, which correlated with disease severity. scRNA-seq revealed gene co-expression networks that were shared between or unique to innate and adaptive lymphocytes. Finally, a nearest-neighbor-based approach provided unbiased means to identify the ''most inflamed'' or ''least inflamed'' lymphocytes in our dataset based on the histological inflammation score of the samples. Overall, our study provides a comprehensive immunophenotypic and transcriptional characterization of ILCs and T cells in pediatric patients with IBD colitis. We uncover previously unknown co-regulatory dynamics of innate and adaptive lymphocytes that potentially play a role in disease pathogenesis and may be targeted for therapeutic intervention.

Dysregulated colonic ILC composition correlates with inflammatory activity in pIBD colitis
To characterize ILCs in pediatric patients with newly diagnosed IBD, we collected colonic biopsies from endoscopically inflamed and non-inflamed sites from 19 treatment-naive patients with CD or UC colitis (Table S1). Pediatric patients who underwent colonoscopy because of intestinal symptoms but did not have IBD or other endoscopic inflammation served as the non-IBD control group (n = 10). For the patients with IBD, endoscopic inflammation was evaluated with the SWIBREG 20 local endoscopic disease activity score (denoted ''endoscopic score'') (Table S2), and local microscopic inflammation was evaluated with a modified version of the D 0 Haens et al. histological score (denoted ''histo score''; see STAR Methods). 21 Human intestinal CD127 + ILCs were analyzed by flow cytometry as previously described 13,22 ( Figure S1A). The identity of ILC1s, ILC3s, and NK cells was confirmed by staining of lineage-defining transcription factors in both blood and colon samples ( Figure S1B). In contrast to CD3 + T cells and CD56 + NK cells, the frequency of CD127 + ILCs among lymphocytes was reduced in endoscopically inflamed IBD compared with non-IBD and non-inflamed IBD colon mucosa ( Figure 1A). The majority of ILCs expressed the tissue residency marker CD69 (Figure S1C) but also showed increased expression of markers associated with naive T cells, CD45RA and CD62L ( Figure S1C). While HLA-DR expression was elevated in pIBD ILCs, the expression remained unchanged in total CD3 + T cells (Figure S1C). However, elevated HLA-DR expression has been previously detected in CD4 + T cells in IBD. 5 Further phenotyping of pIBD T cells and NK cells can be seen in Figure S1C.
We detected a skewed ILC composition in endoscopically inflamed IBD versus non-IBD and non-inflamed IBD colon, as illustrated by increased frequencies of NKp44 À ILCs and ILC1s, while NKp44 + ILC3 frequencies were reduced ( Figure 1B), similar to our previous findings in adult IBD. 13 There was a positive correlation between the endoscopic and histo scores and the frequencies of ILC2s and NKp44 À ILCs, whereas the frequency of NKp44 + ILC3s was inversely correlated with both endoscopic and histo scores ( Figure 1D). Overall, our findings show that the mucosal ILC composition is dysregulated in patients with pIBD, displaying a pattern similar to what we previously reported for adult IBD.

Single-cell transcriptional profiling of colonic ILCs, NK cells, and T cells in pIBD colitis
To characterize the transcriptional signatures of both innate and adaptive lymphocytes in pIBD, we performed 3 0 103 scRNA-seq of biopsies from six treatment-naive pediatric patients with newly diagnosed CD-colitis (total n = 12 samples) ( Figure 2A; Table S3). To assess the transcriptional signatures of inflammation within the same patient, biopsies were obtained from the endoscopically least and most inflamed area of the colon and subsequently fluorescence-activated cell sorted (FACS) for Lin À CD127 + CD161 + (ILCs), Lin + CD127 À CD56 + (NK cells), and CD3 + (T cells) ( Figure S2A). After quality control, normalization, and integration of the datasets, 44,717 lymphocytes were obtained for analysis. Louvain clustering analysis allowed for the separation of the pre-sorted cell populations into 12 clusters, which were represented by all patients and samples ( Figures 2B and S2B). Clusters were annotated based on signature genes ( Figure 2C), where one cluster represented ILCs expressing IL7R, ID2, and IL22, two clusters represented different activation stages of NK cells expressing ID2, ZBTB16, PRF1, GZMA, and IFNG, and nine clusters represented subsets of CD4 + and CD8 + T cells. These included naive/central memory CD8 + and CD4 + T cells (CD8 + and CD4 + Tn/cm cells; CD8, A B C Figure 1. Dysregulated colonic ILC composition correlates to inflammatory activity in pIBD colitis (A) Bar graphs showing the frequencies of total ILCs, T cells, and NK cells (gated as shown in Figure S1A) in non-IBD (n = 10), endoscopically non-inflamed (n = 12), and endoscopically inflamed (n = 15) pIBD colon samples. A total of 8 samples were derived from paired inflamed and non-inflamed samples. Frequencies shown out of CD45 + lymphocytes. (B) Bar graphs showing the frequencies of ILC1s, ILC2s, NKp44 + ILC3s, and NKp44 À ILCs (gated as shown in Figure S1A) in non-IBD (n = 10), endoscopically noninflamed (n = 12), and endoscopically inflamed (n = 15) pIBD colon samples. Frequencies shown out of total ILCs. (C) Correlation plots between frequencies of ILC1s, ILC2s, NKp44 + ILC3s, and NKp44 À ILCs and the endoscopic and histological scores determined for each of the endoscopically non-inflamed (n = 12) and endoscopically inflamed (n = 15) pIBD colon samples. Statistical significances in (B) were detected with Mann-Whitney test. Bar graphs are shown as median ± interquartile range (IQR). Spearman's rank correlation test was applied for assessing correlations between variables. In (A)-(C), cells from each donor were analyzed in independent experiments.  (Figures 2B and 2C).
Altogether, scRNA-seq revealed an overall diverse landscape of innate and adaptive lymphocytes in the pediatric colonic mucosa.
Gene module analyses of colonic innate and adaptive lymphocytes reveal subset-specific and shared programs of co-expressed genes Complex biological processes like inflammation are orchestrated by co-expressed gene networks. In addition, ILCs and T cells largely overlap in terms of phenotype and functions, suggesting coordinated gene regulation. To identify such patterns of gene expression and regulation dynamics, we performed weighted gene co-expression network analysis (WGCNA), 23,24 where we used the top 4,000 variable genes to discern 50 modules of co-regulated genes among the lymphocytes ( Figure 3B). With this approach, we identified gene networks that were either confined to a specific cell type or shared across cell types. More specifically, we identified three modules (1, 11, and 43) preferentially expressed by the ILC cluster ( Figures 3C and S3A). Intrigu-ingly, module 1 showed coordinated expression between key ILC cytokine genes IL22, CSF2 (encoding GM-CSF), the co-stimulatory molecule CD83, and IL4I1, the latter encoding a secreted L-amino acid oxidase reported to promote aryl hydrocarbon receptor (AHR) activation while limiting Th17 proliferation. 25,26 The expression of SOX4, IL4R, and CYSLTR1 (encoding the cysteinyl leukotriene receptor) in module 11 showed that although a distinct cluster of ILC2s could not be identified in our data, ILCs expressing transcripts with known regulatory function in ILC2s/Th2 cells are present in the pIBD colon. Two gene modules were shared between the two innate lymphocyte clusters (ILCs and NK cells; modules 8 and 19), including the cytotoxicity-associated adapter proteins TYROBP and FCER1G, as well as the effector molecule AREG that was expressed together with BHLHE40, encoding a transcriptional regulator previously reported to control cytokine production in T cells 27,28 (Figures 3D and S3B). Highlighting the shared features of ILCs and T cells, module 26 was coordinately expressed by ILCs and Trm cells, including innate-like genes (ID2 and KLRB1) and genes associated with tissue residency (ZFP36 and NR4A2) ( Figures 3E and S3C). Module 7 was shared by ILCs and CD4 + Teff cells, Trm cells, and Activ. T cells and included genes typically expressed by Th17 cells (RBPJ, ICOS, CCL20, and CCR6) ( Figures 3F and S3D). In line with previous reports showing antigen-presenting features of ILCs, 29-32 module 20 contained a set of transcripts associated with antigen presentation (HLA-DRA and HLA-DPA1) that were shared by ILCs and Teff cells ( Figures 3G and S3E). Additional shared signatures between ILCs and T cells were related to genes that regulate tissue residency (FOS and NR4A1; module 18) as well as quiescence (TSC22D3, DUSP1, ZFP36L2, and KLF6; module 22) ( Figures 3H and S3F), the latter recently shown to be expressed by plastic skin ILCs in a mouse model of psoriasis. 33 Lastly, module 30 showed coordinated expression of KLF2 and TCF7, genes typically expressed by naive T cells. However, module 30 was highly expressed by ILCs and naive T cells, indicating the presence of ILCs with naive features as previously described in the adult gut 24,34 (Figures 3I and S3G).
Overall, these findings reveal subset-specific and common gene signatures that may regulate ILCs and T cells.

Gene modules associate with ILC subclusters and identify quiescent/naive-like ILCs
To better understand the expression pattern of unique and shared gene modules among ILCs, we focused our analysis only on the ILCs. Reclustering of ILCs revealed 7 transcriptionally distinct clusters that were annotated by differential expression (DE) analysis (Figures 4A, 4B, S4A, and S4B). Interestingly, the pIBD ILC subclusters displayed a similar degree of heterogeneity with ILCs from adult IBD colon, while tonsillar ILCs were less heterogeneous ( Figures S4C and S4D). Intriguingly, the module 1 genes CD83, IL4I1, IL22, and CSF2 were confined to ILC subcluster 3 ( Figures 4C and S4E), but these genes associated poorly with expression of the key ILC3 genes NCR2 ( Figure 4C) and RORC ( Figure S4F), which were widely expressed among the ILC clusters. We further identified that module 20 genes (HLA-DPA1, HLA-DRA, and HLA-DQB1) were uniquely expressed ( Figures 4B, 4D, and S4E) by subcluster 7, suggesting that they represent antigen-presenting ILCs. Furthermore, subcluster 1 expressed many genes in modules 18 and 22, including TSC22D3, DUSP1, and CD69 ( Figures 4B, 4E, and S4E), previously associated with quiescent and tissue-resident ILCs. 33 Interestingly, in line with our previous report, 34 we identified a SELL-expressing ILC subcluster 4 ( Figures 4B and 4F), with low expression of NCR2, HLA-DR, and RORC transcripts ( Figures 4B, 4D, and S4F), likely representing naive-like ILCs enriched in adult IBD. Genes in the ''naivety'' module 30 ( Figure 3I) were not solely expressed by subcluster 4. In fact, TCF7 was homogenously expressed across all ILCs, and KLF2 was also expressed by subcluster 1 (Figures 4B and 4G). These data indicated the presence of at least two subclusters of ILCs with features of naive T cells.
To better characterize the naive/quiescent ILCs in pIBD colon, we stained ILCs for markers of activation/differentiation. We have recently described that the less-differentiated and less-activated (NKp44 À HLA-DR À ) ILCs in human tissues, including adult IBD gut, form a heterogeneous population that includes naive-like/quiescent ILCs marked by the presence of surface CD45RA expression. 34 Confirming and extending our previous findings, CD45RA also marked such cells in pIBD ( Figure 4I). As we observed in the adult IBD gut, surface CD62L was not expressed on this population despite the presence of SELL transcripts ( Figure 4J). Similarly to adult IBD, the frequency of these CD45RA + CD62L À ILCs (referred to as CD45RA + ILCs) was increased in the endoscopically inflamed pIBD gut samples, and their frequency was positively correlated with the local endoscopic as well as histo score ( Figures 4K and 4L).
In summary, our analyses reveal gene networks confined to certain ILC subsets, including a naive-like module. Using previously described surface markers that identify naive-like/quiescent ILCs in tissues, we observed an accumulation of CD45RA + ILCs in the inflamed gut of pediatric patients with IBD that correlated with the degree of inflammation.
Nearest-neighbor-smoothed histo scores reveal the most and least inflammatory cells Next, we set out to explore inflammation-related transcriptional signatures of innate and adaptive lymphocytes. During endoscopy, the grade of inflammation is evaluated based on the presence of visual signs. 35 In contrast, histological findings more precisely assess both the acute and chronic cellular changes occurring in the intestinal epithelium and lamina propria. 36 We were therefore interested in defining the transcriptional signatures of these cellular changes with scRNA-seq ( Figures 5A  and 5B). Indeed, when plotting the histo scores on the uniform manifold approximation and projection (UMAP), we could observe overrepresentation of lymphocytes from samples with high histo scores in certain clusters ( Figure 5C). However, this crude representation did not consider donor representability. Therefore, we repurposed a nearest-neighbor-based smoothing method deployed elsewhere. [37][38][39] Here, each cell was assigned the average histo scores of the 100 nearest-neighbor cells in the Euclidean space, which was created by a 100-dimensional principal-component analysis based on the 4,000 most variable genes ( Figure 5D). To balance out the number of neighbors for each histological group (defined below), we conducted subsampling of the data so that all three groups consisted of the same number of neighbors ( Figure S5A). Distribution of lymphocytes across tissues with different histo scores allowed for categorization of the cells into three groups: least inflamed (histo score % 4), ''intermediate'' (histo score = 5-8), and most inflamed lymphocytes (histo score R 9) ( Figure 5E). Visualization of the smoothed histo scores revealed enrichment of the most inflamed and least inflamed lymphocytes in certain clusters ( Figures 5F and S5B). The most inflamed samples were mostly derived from 4 samples ( Figure S5C). However, similar enrich-ment was predicted when batch correction was applied (Figure S5D) and also when we performed differential abundance analysis with DAseq ( Figure S5E). 38 The most inflamed lymphocytes were predominantly NK cells (cluster 2), CD8 + Teff cells (cluster 5), CD4 + Tef cells (cluster 9), and Activ. T cells/Tregs (cluster 12) ( Figure 5F). Conversely, the least inflamed cells were enriched in ILCs (cluster 1), CD8 + Tn/cm cells (cluster 4), CD4 + Tn/cm cells (cluster 8), CD8 + Trm cells (cluster 7), and CD4 + Trm cells (cluster 10). To uncover transcriptional signatures of the most inflamed lymphocytes, we performed DE analysis in the most inflamed cells (shown as MOST) versus the rest of the cells (shown as REST) within each cluster. A total of 96 genes were upregulated in the most inflamed cells with 35 genes being shared among the clusters (Table S4). The most inflamed NK cells showed elevated expression of the transcript for the antimicrobial molecule granulysin (GNLY) and genes involved in immunoregulatory interactions (CD81 and CLEC2B), while the most inflamed CD8 + T cells expressed the selenoprotein K (SELK) and the cAMP regulator CREM ( Figure 5G). Interestingly, the most inflamed CD4 + Teff cells displayed a cytotoxic signature with expression of the granzymes GZMK and GZMH ( Figure 5G), while the most inflamed Activ. CD4 + T/Tregs upregulated genes involved in redox balance (SELK, PRDX1, PRDX2, and SOD1), the caspase inhibitor CARD16, and the inhibitory molecules LAIR2 and TNFRSF18 (GITR) ( Figure 5G). Similarly, to transcriptionally characterize the least inflamed lymphocytes, we performed DE analysis in the least inflamed cells (shown as LEAST) versus the rest of the cells (shown as REST) ( Figure 5H). The least inflamed lymphocytes demonstrated larger transcriptional differences compared with the most inflamed lymphocytes with an upregulation of 226 genes, of which 70 were shared among the clusters (Table S4). Transcripts related to tissue residency such as NR4A2 and JUND as well as PTGER4 and ICOS were commonly expressed by the least inflamed cells across several clusters ( Figure 5H). Additionally, the least inflamed ILCs revealed expression of CD83, involved in dendritic cell maturation as well as lymphocyte activation. 40 The least inflamed CD8 + and CD4 + Tn/cm cells expressed molecules related to their identity (SELL, LEF1, CCR7, TCF7) like the least inflamed Trm cells that additionally showed overexpression of genes associated with tissue residency (ICOS, JUND, NR4A2, and PTGER4) ( Figure 5H). The least inflamed CD8 + Trm cells additionally upregulated genes related to ''innateness'' (RORA, KLRB1) and effector functions (IFNG, TNF, CCL4) ( Figure 5H). To validate these findings, we performed flow cytometry using intestinal specimens of adult patients with and without IBD (Table S5). We showed that, indeed, CD4 + Trm and CD8 + Trm cells (defined as CD69 + and/or CD161 + ) were lower in frequency in IBD, as predicted by the nearest-neighbor analysis ( Figure 5I). The prediction of ILCs being mostly represented in the least inflamed landscape was additionally confirmed by our flow cytometry analysis of total ILCs ( Figure 1A).
Collectively, based on nearest-neighbor analysis of transcriptional profiles and histo scores, we identified most inflamed and least inflamed lymphocytes with potential roles in colonic inflammation. The genes expressed by these cells deserve further functional exploration to assess their usefulness as targets for immunotherapy in IBD.

DISCUSSION
In our study, we combined high-dimensional single-cell transcriptional and immunophenotypic analyses to comprehensively characterize innate and adaptive lymphocytes from colonic biopsies obtained from treatment-naive pediatric patients with newly diagnosed IBD colitis. We focused on pIBD as it generally shows a more aggressive phenotype and allows for studies of disease mechanisms that are less influenced by diagnostic delay, 41 age-related environmental exposures, for example smoking, and co-morbidities compared with adults. A number of studies have investigated IBD-associated patterns of T cells and ILCs in human tissues at the single-cell level. 5,15,16,18,19 However, these studies describe discrete cell populations of either ILCs or T cells without directly comparing these two cell types. Therefore, we complemented our protein analysis of ILCs with in-parallel single-cell transcriptional profiling of ILCs, NK cells, and T cells to uncover transcriptional networks that govern both shared and unique properties. Through our approach, we provide insights into ILC and T cell biology in pIBD. Understanding the regulation of these transcriptional networks could possibly lead to the development of therapies for IBD.
Flow cytometry revealed a dysregulated composition of mucosal ILCs in pIBD that correlated with the inflammatory activity. More specifically, we observed a reduction of NKp44 + ILC3s and an increase of NKp44 À ILCs and ILC1, the latter which could, at least to some extent, be explained by ILC3-ILC1 plasticity as described for adult IBD. 12 However, it is currently unclear how well the FACS-based definition of human ILCs correlates with transcriptional and functional features of ILCs in the gut mucosa. Similar to our FACS definition of ILC1, scRNA-seq analysis revealed an ILC1-like cluster expressing TBX21 (encoding T-bet) but low KIT (encoding CD117) and RORC. These cells likely represent the previously described CD127 + ILC1s 12 and not the intraepithelial CD103 + ILC1s, as those lack CD127 expression. 42 Expression of NCR2, encoding NKp44, was seen in several ILC subclusters, indicating transcriptional heterogeneity among NKp44 + ILC3s. It has been well described that human NKp44 + ILC3s harbor IL-22-producing ILCs 43 and that ILC3derived IL-22 is crucial for gut homeostasis in mice. 44 However, here we show that IL22 transcription is confined to a specific ILC subcluster with relatively low expression of RORC. This is biologically possible as RORC-deficient patients are still capable of IL-22 production. 45 Of note, IL22 was co-transcribed with other effector molecules such as CSF2, CD83, and IL4I1 in a gene module that was preferentially expressed in ILCs compared with T cells, potentially reflecting the more rapid response of ILCs compared with T cells. Notably, IL4I1 can activate AHR  25 which in turn supports IL-22 production in ILCs. 46 Thus, IL4I1 may promote AHR in ILC3s. Additional studies are needed to investigate how IL4I1 or other co-expressed molecules regulate IL-22 production by intestinal human ILCs.
Although ILC2s represent a large population in the mouse intestine 47 and human fetal intestine, 48 the healthy human adult intestine is largely devoid of conventional CD127 + ILC2s, 48,49 similar to what we also observed in our flow cytometric and scRNA-seq analysis of pIBD colon tissue. Although it is possible that intestinal ILC2s show an unconventional profile that we excluded with our current sorting gating strategy, we identified an ILC-restricted gene module containing genes associated with the regulation of ILC2s. This included positive regulators such as IL4R and CYSTL1 but also SOX4, which can suppress Th2 differentiation. 50 Thus, we speculate that the absence of a transcriptionally distinct subset of CD127 + ILC2s in the human intestine could be due to gut environmental factors that suppress ILC2s with a conventional phenotype. This topic requires further investigation.
One of the most striking overlaps between innate and adaptive lymphocytes in our analysis was the expression of a gene module containing genes associated with tissue residency including CD69, CXCR4, and KLF6. This gene module was particularly highly expressed in ILCs and Trm cells, indicating similar programs for tissue residency in innate and adaptive lymphocytes, a topic that deserves further investigation. However, we also noted that this module included genes associated with cellular quiescence, TSC22D3, DUSP1, and ZFP36L2. 33 These genes were highly transcribed in a SELL-expressing ILC subcluster (ZFP36L2) and in another ILC subcluster (TSC22D3 and DUSP1), which co-transcribed the transcription factor KLF2, typically expressed by naive T cells. The SELL-expressing ILC subcluster lacked features of mature ILCs including RORC, NCR2, TBX21, or EOMES, reminiscent of the CD62L + naive-like ILCs we recently described in tonsils. 34 The second naive/quiescent cluster, which lacked expression of TBX21 and EOMES, likely represented the quiescent/ naive-like ILCs that we previously reported in human tonsil and adult gut, 34 where we showed that these less-differentiated quiescent ILCs are marked by the presence of CD45RA expression and, despite expression of SELL transcripts, the absence of CD62L. 34 The frequency of this subset was increased in the inflamed gut of adult patients with IBD, and they could differentiate to ILC1s and IL-22-producing ILC3s. 34 In the present study, we extended this finding to show that CD45RA + ILCs accumulate in the inflamed gut of pediatric patients with IBD as well. These cells could be ready for trans-differentiation, as previously shown in a mouse model of psoriasis where ILC3s accumulated by trans-differentiation of ILC2s via a quiescent-like state of ILCs. 33 We further identified a gene module that was commonly expressed by a distinct subcluster of ILCs and T cells and that contained transcripts of major histocompatibility complex (MHC) class II molecules involved in antigen presentation. Indeed, we and others have previously shown that ILCs can acquire antigen-presenting properties that in turn can be suppressed by the human tumor microenvironment 29 and the gut microenvironment in mice. 31 However, the antigen-presenting capacity of ILCs in the context of human IBD, in the presence of a more complex microbiota compared with transgenic mice, remains under-explored. Hence, it is possible that ILCs could orchestrate CD4 + T cell responses during gut inflammation in humans, similarly to what was described in a mouse model of acute colitis. 30 To unravel inflammation-driven transcriptional changes in innate and adaptive lymphocytes, we made use of the histological inflammation score of our samples rather than grouping them in two endoscopically defined groups with variable histo scores. A nearest-neighbor-based smoothing of histo scores identified so-called least inflamed cells that were mostly defined as ILCs, CD4 + and CD8 + Tn/cm cells, and innate-like CD8 + Trm cells. The least inflamed colon areas of patients with pIBD may represent tissue that was previously inflamed and now healed or tissue that is yet unaffected by inflammation. Regardless, the transcriptomes of least inflamed cells represent a footprint of innate and adaptive lymphocyte homeostasis in the colon. This includes genes associated with the lineage identity of ILCs, CD4 + /CD8 + Tn/cm cells, CD4 + Trm cells, and innate-like CD8 + Trm cells, suggesting that maintained function of these cells is critical for immune homeostasis in the gut and that loss of these cells is associated with gut inflammation. In contrast, the most inflamed cells were dominated by NK cells, cytotoxic CD4 + T cells, Activ. CD4 + T cells/Tregs, and CD8 + Teff cells.
It is possible that the compositional differences in least and most inflamed cells is due to recruitment of lymphocytes from the blood to inflamed areas of the colon. However, it could also be due to plasticity and/or differentiation of cells. Plasticity of ILCs and T cells is well described, and for ILCs, conversion of non-cytotoxic human ILCs to NK cells has been demonstrated in vitro. 51 Most inflamed NK cells upregulated transcripts of the antimicrobial protein GNLY and CLEC2B, the latter mediating NK cell activation via ligation to NKp80. 52 GNLY has been shown to be released in response to infections and to facilitate antigen presentation in dendritic cells (DCs) via Toll-like receptor 4 (TLR4). 53 Interestingly, GNLY-expressing CD94 + CD127 + ILCs are expanded in patients with CD, 54 illustrating the fluidity in ILC-NK cell functions and the possibly that ILCs are reduced at the expense of NK cells during gut inflammation. Similarly, the underrepresentation of CD4 + Tn/cm and CD4 + Trm cells among the most inflamed cells could be due to differentiation to cytotoxic CD4 + T cells and/or Activ. T cells/Tregs. Several studies have described the presence of cytotoxic CD4 + T cells (CD4 + CLT cells) in IBD, 55,56 but their origin remains unknown. We identified a CD4 + T cell subset expressing GZMH and GZMK in inflamed IBD. Interestingly, this is in line with a recent report describing that CD4 + CTL cells exhibit a diverse granzyme expression in nasal polyp tissue, including a subset that expressed these non-conventional granzymes. 57 GZMK and GZMH expression is seen in CD8 + T cells with limited TCR repertoire and expression of exhaustion markers in aging 58 and autoimmunity, 59 suggesting that the CD4 + T cell subset that we detect herein might be a highly differentiated/exhausted subset. Future studies should be aimed to address the origin and function of these CD4 + CLT cells and Activ. T cells/Tregs in the context of gut mucosal inflammation in IBD.
We also noted accumulation of CD8 + Teff cells among the most inflamed cells. While it is tempting to speculate that these cells could be derived from the CD8 + Tn/cm and/or CD8 + Trm cells among the least inflamed cells, further work is required to Article ll OPEN ACCESS address this. The role for the CD8 + T cell compartment is conflicting in IBD. A single-cell atlas study of CD8 + T cells in UC described innate features of CD8 + IL26 + T cells, 18 while another study reported reduced frequencies of cytotoxic IFN-g + CD8 + T effector memory cells re-expressing CD45RA (TEMRA) cells in UC mucosa. 5 Similarly, we found increased expression of IFNG, TNF, and innate-like transcripts in the least inflamed IBD CD8 + Trm cells, suggesting that these cells have important homeostatic functions in the gut.
In summary, we report that pediatric patients with IBD display dysregulated ILC frequencies that are similar to those observed in adult patients with IBD, suggesting a common pathogenic program in this aspect. Using a combination of high-dimensional single-cell analysis on the transcriptional and protein levels, we provide a roadmap of ILCs and T cells in pIBD, including transcriptional signatures associated with inflammation. Further mechanistic studies are necessary to understand whether and how these findings contribute to pathogenesis and how they could facilitate the design of immunotherapeutic targets.

Limitations of the study
Although paired non-inflamed biopsies were used to infer inflammatory changes in the inflamed samples, the main limitation of the study is the lack of a non-symptomatic, healthy non-IBD control group in the single-cell analysis. Furthermore, one of the main transcriptional findings, on the reduced presence of Trm cells in colon inflammation, was validated on the protein level using samples from adult patients with IBD. However, we were not able to validate this, or other interesting transcriptional findings, in samples from patients with pIBD. Finally, since we performed our studies in samples from patients with ongoing inflammation, we cannot determine if the observed differences are a cause or consequence of IBD and what the role of these differences are in driving inflammation. In summary, the limitations of this study represent important aspects to address in future studies.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
We would like to thank all the patients, and for the pediatric patients, also their parents, for their generous participation. Additionally, we thank all nurses, research nurses, and physicians that contributed to the study and the Flow Cytometry Core Facility, Department of Medicine Huddinge, the Bioinformatics and Expression Analysis core facility at Karolinska Institutet, and the VIB Nucleomics Core, Leuven, Belgium, for their essential help. Support by NBIS (National Bioinformatics Infrastructure Sweden) is gratefully acknowledged. P.C. is financially supported by the Knut and Alice Wallenberg Foundation as part of the National Bioinformatics Infrastructure Sweden at SciLifeLab.

Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Jenny Mjö sberg (jenny. mjosberg@ki.se).

Materials availability
The study did not generate new reagents.
Data and code availability d ScRNA-seq data (count matrix) are available at GEO under the accession number GSE169136. d All codes (R scripts) have been deposited at: https://github.com/ramvinay/Kokkinou_pIBD and https://github.com/jtheorell/ Kokkinou_gut_neighbors/ d Software and packages used for this analysis are listed in the key resources table. d Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Human patients and study design Pediatric patients at the Karolinska University Hospital or Sachs' Children and Youth Hospital were included. All patients were treatment-naive and included upon diagnostic colonoscopy. IBD colitis diagnosis was endoscopically and histologically verified.
For flow cytometry analysis, 16 biopsies were obtained from two areas in the colon of pediatric patients with IBD (n = 19): the endoscopically most affected area (inflamed) and the endoscopically least-affected area (non-inflamed) (Table S1). Some patients had pancolitis or restricted ileal or upper GI-tract involvement, and thus, samples were taken only from the endoscopically inflamed or non-inflamed colonic mucosa, respectively. Non-IBD controls (n = 10), displayed unspecific gastrointestinal symptoms and colonoscopy was performed to rule out IBD diagnosis. All non-IBD controls had endoscopically and histologically normal gut mucosal findings with no signs of active or chronic inflammation.
For the scRNA-seq, treatment-naïve pediatric patients with CD-colitis were included (n = 6). Biopsies were sampled from endoscopically most inflamed and adjacent least inflamed areas of the colon and histologically evaluated (Table S3).
Characterization and inflammation activity scoring of patients with pIBD Disease activity was evaluated with endoscopic and histopathological scorings. To evaluate the disease activity endoscopically, we used the endoscopic part of SWIBREG (Swedish Inflammatory Bowel Disease Register) scoring system (Table S2). 20  CD patients, SES-CD endoscopic score and Paris phenotypic classification were additionally assessed to characterize the patient group. For histopathological scoring, we used a modification of colonic global histologic disease activity score of diagnostic samples collected in close proximity to the research biopsies. 21 The evaluation was based on the written statement of an IBD-experienced pathologist. Established IBD biomarkers (shown in Tables S1 and S5), were taken as routine diagnostic examination: blood 0-36 (median 3) and fecal samples 0-60 (median 18) days prior to biopsy sampling and are listed for characterization of the patient groups.

METHOD DETAILS
Cell isolation from intestinal samples and blood Cell isolation from colonic biopsies was performed as described in Mazzurana et al 2019, where the detailed protocols are provided. 66 Briefly, colonic biopsies were collected in wash buffer containing HBSS supplemented with antibiotics. For subsequent enzymatic digestion, wash buffer was removed and replaced with 1mL IMDM (supplemented with antibiotics), collagenase II 0.25 mg/mL; Sigma) and DNase (0.25 mg/mL; Roche). Magnetic stirring was performed at 37 C, 450rpm. For 3 pIBD patients (Table S1), peripheral blood mononuclear cells (PBMC) were isolated as previously described 29,66 using Lymphoprep TM (Stemcell Technologies) with gradient density centrifugation at 2000 rpm for 20 min, room temperature (RT). Cells were collected, washed, counted and subjected to cell staining.

Flow cytometry and cell sorting
For flow cytometry, colon cells or PBMC were stained with dead cell marker (DCM) and antibodies targeting surface antigens for 20-30 min at RT, followed by washing with FACS buffer (PBS containing 1% FCS). The cells were fixed with either 2% PFA for 10 min at RT, or in case of intracellular staining, with Foxp3 Fix/Perm solution kit (eBioscience) for 30 minutes at RT. For the latter, cells were subsequently washed with permeabilization buffer (eBioscience) and stained with antibodies targeting intracellular antigens for 30 min at RT. Samples were immediately acquired on a BD LSR Fortessa or BD Symphony A5 flow cytometer. Analysis of flow cytometry data was performed using FlowJo versions 9.9.6, 10.4.2 and 10.6.2 (TreeStar). A detailed list of the antibodies are shown in the key resources table.
For cell sorting, colonic ILCs, T cells and NK cells were surface-stained with lineage specific antibodies (all antibodies are indicated in the key resources table). Cells were sort-purified in cold RMPI using the BD FACSAria Fusion TM . Gating strategy for sorted ILCs, T cells and NK cells is shown in Figure S1A. After sorting cells were immediately subjected to RNA library preparation.
10x genomics single-cell RNA sequencing Chromium Single-Cell v2 3ʹ Reagents Kit (10x Genomics) was used to generate single-cell RNA Sequencing libraries. In brief, cell suspension was spun down and resuspended in 17.4ml PBS with 0.04% BSA at a final concentration of 1000 cells/ml (target recovery of 10 000 cells). Accordingly, Master Mix was prepared at the adjusted volume of 16.4ml Nuclease-Free water. Cells were loaded on the Chromium Controller Single Cell A Chip to generate Gel Bead Emulsions (GEMs). GEMs were proceeded to GEM-RT incubation, cDNA amplification and library construction following the Chromium TM Single Cell 3 0 library Kit v2 User Guide (10x Genomics). Quality controls for cDNA amplification and final barcoded libraries were performed using High Sensitivity DNA Assays from 2100 Agilent Bioanalyzer to assess the quantity and fragment size. Sequencing libraries were loaded at 2.1pM loading concentration on an Hi-Seq4000 with custom sequencing metrics (single-indexed sequencing run, 28/8/0/98 cycles for R1/i7/i5/R2) (Illumina, San Diego, CA).

QUANTIFICATION AND STATISTICAL ANALYSIS
Single-cell RNA sequencing data analysis 10x Cell Ranger toolkit (v2.1.1) was used for filtering and aligning reads onto human genome hg19 and counting barcodes and unique molecular identifiers (UMI). Seurat (v4.1.1) 60 was used for downstream 10x scRNA-seq data analysis. Briefly, cells with a percentage of mitochondria genes > 5% were considered as dead cells and were removed. Cells with < 200 genes and > 2500 detected genes were excluded. Genes detected in < 3 cells were excluded. Raw UMI counts were scaled and normalized through log transformation. We used Harmony for performing the integration of samples. 61 The top 4000 variable genes were selected for PCA and the first 40 dimensions were selected for UMAP and clustering analysis with resolutions 0.6. Seurat function FindAllMarkers (logFC threshold = 0.25) were used to find cluster-and sample-specific marker genes. Further, UMAP clusters were annotated by well-known marker genes and marker genes were detected by the FindAllMarkers function. The functions FeaturePlot, VlnPlot and DotPlot were used for feature plots, violins plots, heatmaps and dot plots, respectively, with default parameters. For subcluster analysis, we used the ComBat function from sva (v3.32.1) 62 R package to remove known batches (donors). Wilcoxon test was used for the DE analysis. Trajectory analysis on ILCs was performed using the Slingshot version 2.2.1 63 R package with Seurat object. We used UMAP coordinates and ILC subclusters 4 and 1 as the starting point (root). Weighted gene co-expression network analysis (WGCNA) We used the weighted co-expression network analysis (WGCNA) 23 algorithm to identify gene co-expression networks. Specifically, 4000 variable genes were used for calculation of gene-to-gene Pearson correlations. The adjacency matrix was directly used for creation of the topological overlap matrix (TOM). Gene module identification was done with agglomerative hierarchical clustering on the TOM using Ward's squared linkage (Ward.D2) with a fixed number of 50 modules. Gene counts belonging to the same module were normalized between 0-1 individually and then averaged per cell for calculation of each gene module expression pattern. Modules were prioritized by fitting a general linear model (GLM) using each gene module expression pattern as the effect and the sample metadata (Louvain cluster, Donor) as the explaining variables. Each module and metadata parameter was tested independently. The output of the GLM fit results in the residual metric of ''deviance'' (dev model ) after fitting the model and ''null deviance'' (dev null ) when considering a NULL model. The percent difference in these two deviances (Ddev) was used as a measure of how much a particular metadata factor explains the expression pattern of a gene module: This way, modules with higherDdevwere prioritized for further exploration.
Euclidean neighbor-based smoothing of histology inflammation score Since samples that are endoscopically scored as non-inflamed or inflamed are composed of samples with variable histological scores we needed a method to better define ''inflammatory'' and ''non-inflammatory'' cells to potentially discover genes involved in pIBD inflammation. This method builds on the concept of Euclidean neighbor-based smoothing, which has previously been used both in cytometry and scRNAseq analysis. 37,38 Its core assumption is that Euclidean neighbors in the transcriptomic space have similar functions.
The processing of the data was done using the neighSmooth function in the BioConductor package DepecheR, in a similar fashion as in Hart et al 37 . To even out differences in cell numbers between donors and conditions, the samples were divided into least inflamed (histo score 0-4), intermediate (histo score 7) and most inflamed (histo score 9-12). Then, random subsets of cells were chosen, so that each sample within these overarching inflammation groups had the same number of cells, and simultaneously that the least, intermediate, and most inflamed sample groups had the same number of total cells. In the end, a total number of 22065 neighbor cells were identified, ranging between 1461 and 3681 cells for each sample. After this, the 100 closest Euclidean neighbors were identified from this neighbor cell subset for all cells in the dataset, and each cell was given the average histo score for these 100 nearest neighbors. Each cell was further assessed for the number of samples represented among its 100 nearest neighbors. An arbitrary cutoff of five samples was used to filter out cell populations strongly overrepresented in outlier samples. For this more representative group of cells, aided by a histogram, cutoff values of % 4 for least inflamed cells and R 9 for most inflamed cells were introduced.
To confirm the results of the nearest neighbor-smoothing analysis of the histological scores, DAseq was used. 38 As this is a discriminant analysis tool, a few adaptions of the data was necessary. First, all samples with score 7 were excluded. Second, if both samples from one individual had low (<4) or high (R9) histological scores, the least extreme sample was removed. Subsequently, all remaining cells in the low or the high inflammation group were given these discriminant labels. This was followed by the standard DASeq pipeline. As a prediction threshold, the value of 0.5 was used.