Single-cell dissection, hdWGCNA and deep learning reveal the role of oxidatively stressed plasma cells in ulcerative colitis

Ulcerative colitis (UC) develops as a result of complex interactions between various cell types in the mucosal microenvironment. In this study, we aim to elucidate the pathogenesis of ulcerative colitis at the single-cell level and unveil its clinical significance. Using single-cell RNA sequencing and high-dimensional weighted gene co-expression network analysis, we identify a subpopulation of plasma cells (PCs) with significantly increased infiltration in UC colonic mucosa, characterized by pronounced oxidative stress. Combining 10 machine learning approaches, we find that the PC oxidative stress genes accurately distinguish diseased mucosa from normal mucosa (independent external testing AUC=0.991, sensitivity=0.986, specificity=0.909). Using MCPcounter and non-negative matrix factorization, we identify the association between PC oxidative stress genes and immune cell infiltration as well as patient heterogeneity. Spatial transcriptome data is used to verify the infiltration of oxidatively stressed PCs in colitis. Finally, we develop a gene-immune convolutional neural network deep learning model to diagnose UC mucosa in different cohorts (independent external testing AUC=0.984, sensitivity=95.9%, specificity=100%). Our work sheds light on the key pathogenic cell subpopulations in UC and is essential for the development of future clinical disease diagnostic tools.


Introduction
Ulcerative colitis (UC) is a chronic, idiopathic, inflammatory disease that affects the colonic mucosa [1,2]. The incidence and prevalence of UC are increasing every year, especially in developing countries [3,4]. UC is challenging to cure entirely, and patients often require long-term treatment [5]. Furthermore, persistent UC can increase the risk of colorectal cancer [6]. However, there is a lack of diagnostic methods for UC other than endoscopic biopsy pathology [7]. A delayed diagnosis of UC significantly increases the risk of surgical interventions and other potentially life-threatening complications [8]. Thus, it is crucial to identify reliable diagnostic markers for the early detection of UC and to develop new molecular stratification techniques to guide personalized treatment of UC patients.
Mechanistically, UC can result from synergistic interactions among multiple epithelial, immune, and stromal cell types. Dysfunction of these cells might contribute to the disease [9]. A large number of studies have reported that the pathogenesis of UC is associated with various immune cell types, such as macrophages [10,11], innate lymphocytes [12-14] and CD4 T cell subsets [15,16]. Furthermore, it was revealed that the composition of CD8 T cells is widely heterogeneous and that UC-associated CD8 effector T cells could reduce the regulatory function of excessive inflammation by producing tumor necrosis factor [17]. Heterogeneity of plasma cells is also believed to be linked to the progression and outcome of UC [18]. It has been demonstrated that there is a highly dysregulated B-cell response in UC [19]. This evidence illustrates that exploring the molecular mechanisms and functions of cellular subpopulations in the UC microenvironment could provide strategies for identifying new biomarkers.
In this work, we found a significant increase in plasma cell subpopulations in UC patients by single-cell sequencing analysis. We identified specific PC gene modules by high-dimensional weighted gene co-expression network analysis (hdWGCNA). Protein-protein interaction (PPI) network analysis and enrichment analysis showed that the upregulated plasma cell subpopulation is mainly associated with oxidative stress. PC oxidative stress genes facilitate UC diagnosis via machine learning and are associated with the infiltration of multiple immune cell types. We also verified the existence of oxidatively stressed PCs in colitis through spatial transcriptomics. Finally, we synthesized images from 11 key genes and immune cells to establish a gene-immune convolutional neural network deep learning model for the diagnosis of UC.
Single-cell sequencing data processing and high-dimensional WGCNA (hdWGCNA)
Cells expressing between 200 and 10,000 genes were retained. A mitochondrial gene proportion of 20% was also set as a filtering threshold. The top 2000 highly variable genes were identified for scaling with the FindVariableFeatures function in the R package Seurat [23]. FindNeighbors and FindClusters were subsequently applied to obtain cell clusters. Next, the single-cell dataset was annotated with known cell markers [24]. To further identify specific subpopulations of plasma cells in UC patients, the plasma cells were subjected to a second round of dimensionality reduction and clustering. For WGCNA of the single-cell sequencing data, the hdWGCNA package was used according to its standard pipeline [25,26].
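
The filtering thresholds described above can be illustrated with a minimal Python sketch. This is not the Seurat code used in the study: the Cell record, its field names, and the toy values are hypothetical, and only the thresholds (200 to 10,000 detected genes, mitochondrial proportion of 20%) come from the text. Treating the bounds as inclusive is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the text; inclusive bounds are an assumption.
MIN_GENES = 200
MAX_GENES = 10_000
MAX_MITO_FRACTION = 0.20


@dataclass
class Cell:
    """Hypothetical per-cell QC summary (not a Seurat object)."""
    barcode: str
    n_genes: int          # number of genes detected in the cell
    mito_fraction: float  # fraction of counts from mitochondrial genes


def passes_qc(cell: Cell) -> bool:
    """Return True if the cell satisfies both QC thresholds."""
    return (MIN_GENES <= cell.n_genes <= MAX_GENES
            and cell.mito_fraction <= MAX_MITO_FRACTION)


def filter_cells(cells):
    """Keep only cells that pass QC, preserving input order."""
    return [c for c in cells if passes_qc(c)]


# Toy data, invented for illustration only.
toy_cells = [
    Cell("AAAC", 1500, 0.05),   # passes both thresholds
    Cell("AAAG", 150, 0.02),    # too few detected genes
    Cell("AAAT", 12000, 0.03),  # too many detected genes
    Cell("AACA", 3000, 0.35),   # mitochondrial fraction too high
]
kept = filter_cells(toy_cells)
print([c.barcode for c in kept])
```

Keeping the thresholds as named constants makes it easy to check how sensitive the downstream clustering is to the chosen cutoffs.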

Pseudotime analysis and cell-cell interaction analysis
Plasma cells were extracted for subsequent pseudotime analysis. The cell differentiation state was determined using the DDRTree method and the reduceDimension function in the monocle package. Then, we used the plot_cell_trajectory function to visualize the differentiation trajectory of cells [27,28]. To study intercellular interactions mediated by ligand-receptor complexes, the CellChat package was used [29]. The major signal inputs and outputs in all UC cell subpopulations were assessed with CellChatDB.human in CellChat [29,30].

Protein-protein interaction analysis and enrichment analysis
The GeneMANIA data resource (http://www.genemania.org) was used to further identify potential interaction networks between target module proteins [31,32]. The nodes represent proteins, and the edges represent the interaction between two proteins. Functional enrichment was performed using the clusterProfiler package [33] and the Metascape online tool [34].

Non-negative matrix factorization (NMF), gene set scoring and immune infiltration analysis
The NMF algorithm was used to classify patients into subtypes based on bulk RNA-seq gene expression profiles [35-37]. The optimal number of clusters was selected by the cophenetic value. Scores for plasma cell subpopulations characterized by oxidative stress were calculated using single-sample gene set enrichment analysis (ssGSEA) [38]. To further reveal the relationship between UC and the immune microenvironment, MCPcounter [39] was utilized to assess the infiltration abundance of 10 immune cell types.
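
To make the NMF-based subtyping concrete, the following is a minimal sketch in plain Python. It uses the Lee-Seung multiplicative update rules, which may differ from the NMF implementation used in the study, and it omits rank selection by the cophenetic value. The toy matrix, the function names, and the rule of assigning each patient to the factor with the largest weight are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (genes x patients) as W H with multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_cols)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n_rows)]
    return w, h


def assign_subtypes(h):
    """Assign each patient (column of H) to the factor with the largest weight."""
    return [max(range(len(h)), key=lambda a: h[a][j])
            for j in range(len(h[0]))]


# Toy non-negative expression matrix (4 genes x 6 patients), invented here.
v_toy = [
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 3.0, 3.0],
    [0.0, 0.0, 0.0, 6.0, 6.0, 6.0],
]
w, h = nmf(v_toy, k=2)
labels = assign_subtypes(h)
print(labels)
```

In practice the factorization would be repeated over several candidate ranks and random starts, and the rank would be chosen from the stability of the resulting patient assignments.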

Spatial transcriptomic analysis
For spatial transcriptomic data, the Seurat pipeline was utilized. Mitochondrial and ribosomal genes were filtered out, as were genes expressed in fewer than 10 spots [40]. Then, the expression profile was normalized with SCTransform. SpatialDimPlot and SpatialFeaturePlot were used to visualize the landscape of the section. Cd138 was selected as the marker for plasma cells, and a signature of the response to oxidative stress was obtained from the Gene Ontology database. Gene set evaluation was performed using the AddModuleScore function [30].

Machine learning and deep learning
Univariate logistic regression was utilized to identify key diagnostic genes in plasma cells characterized by oxidative stress. Subsequently, the Least Absolute Shrinkage and Selection Operator (LASSO) was employed to further filter the variables [41]. To establish models with high accuracy and stable performance, the models were built with the mlr3 package [42]. The best model was selected among 10 machine learning models, including k-nearest neighbor (KNN), linear discriminant analysis (LDA), logistic regression (LR), multinomial logit model (multinorm), naïve Bayes (NB), quadratic discriminant analysis (QDA), random forest (RF), recursive partitioning and regression trees (RPART), support vector machine (SVM) and extreme gradient boosting (XGBoost). Finally, to validate the accuracy of the machine learning results and the relationship between the target genes and immune cells, 11 key genes and 10 immune cells were synthesized into images for deep learning ($N_{j,i} = \mathrm{immune}_i / \mathrm{gene}_j$). A deep learning model based on the Keras and TensorFlow frameworks using convolutional neural networks (CNNs) was constructed.
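
The image synthesis step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: it follows the formula above (immune score divided by gene expression), whereas the Results describe the ratio as gene expression over immune infiltration, so the orientation is an assumption. The small constant added to the denominator and the min-max rescaling are also assumptions, and the input values are toy numbers.

```python
def gene_immune_image(gene_expr, immune_scores, eps=1e-8):
    """Build an (n_genes x n_immune) matrix with entry [j][i] = immune_i / gene_j.

    eps guards against division by zero; it is an assumption, not part of
    the published formula.
    """
    return [[imm / (g + eps) for imm in immune_scores] for g in gene_expr]


def min_max_scale(image):
    """Rescale all pixels to [0, 1]; a hypothetical preprocessing step."""
    flat = [x for row in image for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in image]
    return [[(x - lo) / (hi - lo) for x in row] for row in image]


# Toy inputs for one patient: 11 gene expression values, 10 immune scores.
toy_genes = [float(j + 1) for j in range(11)]
toy_immune = [float(i + 1) for i in range(10)]

image = gene_immune_image(toy_genes, toy_immune)
scaled = min_max_scale(image)
print(len(image), len(image[0]))
```

Each patient yields one 11 x 10 single-channel image, which can then be stacked into an array and passed to a convolutional network.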

Statistical analysis
The Wilcoxon test was used to make comparisons between two groups. The Kruskal-Wallis test was used to make comparisons among three groups. Pearson correlation analysis was used to reveal the relationship between the 11 key genes and 10 types of immune cells. A P value<0.05 was considered statistically significant. All analyses were performed in R (version 4.1.3).
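
As an illustration of the gene-immune correlation analysis, the sketch below computes Pearson correlation coefficients between gene expression vectors and immune cell scores in plain Python. The study performed this analysis in R; the function names and all numeric values here are hypothetical, and only the gene symbols and the idea of correlating genes with immune cell abundances come from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(genes, immune):
    """Return {gene: {cell_type: r}} for every gene and immune cell pair."""
    return {g: {c: pearson(gv, cv) for c, cv in immune.items()}
            for g, gv in genes.items()}


# Toy values across five samples, invented for illustration only.
toy_gene_expr = {
    "XBP1": [2.0, 3.5, 4.0, 6.5, 7.0],
    "PRDX4": [5.0, 4.0, 4.5, 2.0, 1.5],
}
toy_immune_scores = {
    "B lineage": [1.0, 2.0, 2.5, 4.0, 4.5],
    "Neutrophils": [3.0, 3.0, 2.0, 1.0, 0.5],
}

corr = correlation_matrix(toy_gene_expr, toy_immune_scores)
for gene, row in corr.items():
    print(gene, {cell: round(r, 3) for cell, r in row.items()})
```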

Altered proportions of plasma cell subpopulations in UC mucosa
To explore the altered cellular composition in UC mucosa compared to normal mucosa, we used the GSE182270 dataset to dissect the cellular composition of UC. After integration by harmony, dimensionality reduction, clustering and cell type annotation, we discovered that the change in the proportion of plasma cells (PCs) was the most pronounced in UC (Figure 1A). To further illustrate which subgroups of PCs were altered, we extracted the PCs and defined subgroups. Interestingly, not all plasma cell subgroups were increased in UC; the increase was confined to Clusters 1, 8 and 9 (Figure 1B). Additionally, we noted that UC PCs possessed stronger interactions in the MIF pathway than other PCs, receiving MIF signals from T cells, epithelial cells and stem cells (Figure 1C). Thus, UC PCs were considered to be mediators of the MIF pathway (Figure 1D).

hdWGCNA revealed that Cluster 9 is characterized by the blue module
To obtain the characteristics of each small subpopulation, we performed hdWGCNA on PCs. We selected a power value of 10 to construct a scale-free network and generated 5 gene modules (Figure 2A,B). Among the gene modules, blue, brown and green were more inclined to be expressed in Clusters 1, 8 and 9 and showed a significant positive correlation (Figure 2C,D). Notably, Cluster 9 featured the blue and green modules, and the blue module in particular was the most distinctive in the two-dimensional feature plot (Figure 2D,E and Supplementary Table S1). Furthermore, pseudotime analysis indicated that Cluster 9 is at the end of plasma cell development (Figure 2F).

Genes in the blue module participate in oxidative stress
To explore the functions of the genes in the blue module, we first conducted protein-protein interaction (PPI) analysis for the blue module genes. We noticed that the genes at the center of interactions were mainly involved in oxidative phosphorylation, ATP synthesis and metabolic pathways (Figure 3A). Moreover, KEGG enrichment showed that the genes participated in oxidative phosphorylation (Figure 3B). In addition, we used Metascape to verify the results, which further implied that Cluster 9 PCs might be those with high levels of oxidative stress (Figure 3C).

Machine learning with PC oxidative stress genes
Considering the presence of oxidatively stressed B cells in the UC mucosa, we sought to use the genes of oxidatively stressed B cells, i.e., the genes of the blue module, to distinguish diseased from normal mucosa. LASSO regression reduced the number of genes to 11 (Figure 4A and Supplementary Table S2). In integrated machine learning, we incorporated 10 machine learning algorithms, including LDA, SVM, XGBoost, etc. In the training set GSE87466, we first conducted 10 repetitions of 5-fold cross-validation to examine the stability of each model. Most of the models possessed good stability. Among the algorithms, RPART and XGBoost performed poorly, while SVM exhibited the best performance (Figure 4B and Table 1). Additionally, the AUCs of internal validation showed that SVM had the best diagnostic performance (Figure 4C). Therefore, SVM was applied as the final model and was tested in the external validation set GSE75214, where it also performed well (AUC=0.991, sensitivity=0.986, specificity=0.909) (Figure 4D).
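
The evaluation scheme described above (10 repetitions of 5-fold cross-validation, followed by AUC, sensitivity and specificity) can be sketched in plain Python. This is not the mlr3 workflow used in the study: the splitter below is unstratified, which is an assumption, the AUC is computed through the Mann-Whitney pairwise formulation, and the labels, scores and the 0.5 decision threshold are toy values.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                       # new random partition per repeat
        folds = [idx[f::k] for f in range(k)]  # k nearly equal-sized folds
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f
                         for i in fold]
            yield r, f, sorted(train_idx), sorted(test_idx)


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, preds):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    tn = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 1 = UC mucosa, 0 = normal mucosa (values invented).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]

print(auc(labels, scores))                     # 15 of 16 pairs ranked correctly
print(sensitivity_specificity(labels, preds))
print(sum(1 for _ in repeated_kfold(20)))      # 10 repeats x 5 folds
```

Averaging the per-fold AUC over all 50 splits gives the kind of stability estimate reported for each model in Figure 4B.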

PC oxidative stress genes are associated with immune infiltration and available for UC patient subtyping
We next sought to explore the relationship of PC oxidative stress genes with immune infiltration and patient heterogeneity at the bulk level. First, we applied ssGSEA to score the PC oxidative stress genes in normal and UC mucosa from three independent UC datasets: GSE87466, GSE75214 and GSE165512. The UC mucosa had a higher PC oxidative stress score in all three datasets (Figure 5A). Next, we evaluated the immune cell infiltration of the UC mucosa through MCPcounter. The hub genes selected by LASSO had strong correlations with immune cells, indicating that PC oxidative stress genes might impact multiple immune cells (Figure 5B). To reveal the heterogeneity of patients, we clustered patients using the PC oxidative stress genes and the NMF algorithm. Patients were classified into 3 subtypes (Figure 5C). We found that subtypes 1 and 3 had higher levels of immune infiltration and higher PC oxidative stress scores, while subtype 2 had a low level of immune infiltration (Figure 5D,E). Moreover, spatial transcriptomic data from mice with dextran sulfate sodium (DSS)-induced colitis were used to verify the specific upregulation of stressed PCs. We found that the infiltration of plasma cells was increased in diseased regions compared with non-lesion regions (Figure 5F and Supplementary Figure S1, white arrow). Remarkably, plasma cell abundance and the cellular stress signature were spatially colocalized, as indicated by the white arrow (Figure 5F). We thus validated the presence of oxidatively stressed PCs in colitis.

Establishment of the gene-immune convolutional neural network model
Since the important relationship between PC oxidative stress genes and immune cells has been mentioned above, coupled with our need to establish diagnostic techniques independent of the batch effect of the datasets, we developed a gene-immune CNN classifier.
In brief, we constructed a gene-immune heatmap (11 units long by 10 units wide) for each patient, with the value of each square being the ratio of the expression of a gene to the infiltration of a particular immune cell in that patient (Figure 6A). The convolutional neural network was trained for 200 epochs, with GSE87466 as the training set and GSE75214 as the testing set (Figure 6B). We found that the gene-immune CNN performed well in both the training and testing sets (training: AUC=0.602, sensitivity=95.2%, specificity=93.1%; testing: AUC=0.984, sensitivity=95.9%, specificity=100%), suggesting its broad application prospects (Figure 6C).

Discussion
UC is a chronic gastrointestinal disorder of unknown origin characterized by continuous inflammation of the colonic mucosa. Understanding the cellular and molecular mechanisms underlying UC has been a challenge, but recent advances in single-cell RNA sequencing and machine learning have enabled investigation of oxidative stress-related gene-gene and cell-cell interactions in UC. In this study, we used single-cell RNA sequencing and high-dimensional weighted gene co-expression network analysis to identify a disease-specific subgroup of plasma cells characterized by oxidative stress. Additionally, we identified signature genes of oxidatively stressed plasma cells that could be used for disease diagnosis and immune microenvironment analysis.
Plasma cells play a critical role in maintaining intestinal homeostasis, but their involvement in UC remains understudied. A highly dysregulated B-cell response in UC has been previously demonstrated, mainly reflected in the expansion and decreased diversity and maturation of plasma cells [19]. Additionally, it was found that UC is associated with an increase in plasma cells infiltrating colonic tissue [18]. However, few studies have investigated the specific subtypes and corresponding functions of plasma cells associated with UC. Our research emphasizes the role of plasma cells in the pathogenesis of UC. We observed a significant increase in PC abundance and extensive intercellular communication in UC mucosa at the single-cell RNA sequencing level, particularly in the subtype characterized by oxidative stress.
Oxidative stress has been recognized as a potential risk factor for UC [43]. Under inflammatory conditions, reactive oxygen species are produced in large amounts and accumulate to generate oxidative stress, which in turn leads to DNA damage, driving the malignant progression of UC [44]. Thus, targeting oxidative stress might provide an exciting avenue to alleviate colonic inflammation, as well as combat inflammation-related DNA damage and subsequent carcinogenesis [45,46]. In this study, oxidative stress-related genes were identified in UC-specific plasma cells by hdWGCNA, which was consistent with the reported correlation between plasma cell aggregation and oxidative stress [47].
B cells are particularly vulnerable to oxidative stress, which can lead to B-cell dysfunction and contribute to the pathogenesis and progression of various diseases, including nonalcoholic fatty liver disease (NAFLD) and hepatocellular carcinoma associated with nonalcoholic steatohepatitis (NASH) [48]. In systemic lupus erythematosus (SLE), oxidative stress in B cells contributes to immune system dysregulation, abnormal activation and processing of cell-death signals, and autoantibody production [49]. In atherosclerosis, oxidative stress leads to the accumulation and dysfunction of B cells in the arterial adventitia layer, thus accelerating atherosclerotic plaque formation [50]. Moreover, increased oxidative metabolism in B cells has been associated with impaired plasma cell differentiation [51]. Taken together, these studies underscore the significance of oxidative stress-related dysfunction of B cells in various disease settings. However, the role of oxidative stress occurring in B cells in UC has not been extensively studied. Our research demonstrated that plasma cells specific to UC were distinguished by a dysregulated oxidative stress gene signature, which could aid in the identification of UC lesions.
The diagnosis of UC is mainly based on clinical presentation and endoscopic findings [52]. Molecular markers for identifying patients with early UC, which are critical for timely and effective treatment, remain less studied. Fortunately, machine learning has been applied to aid the diagnosis of IBD and to predict the risk of relapse and carcinogenesis, with encouraging outcomes [53]. In our study, 11 oxidative stress-related genes were identified in the blue module. SLC25A3 encodes the mitochondrial phosphate carrier, whose mutation can lead to mitochondrial phosphate-carrier deficiency [54]. SLC25A3 is also differentially upregulated in UC and plays a potential carcinogenic role in UC-associated colorectal cancer [55]. XBP1 is a major endoplasmic reticulum stress-linked transcription factor and has been reported to contribute to cellular resistance to oxidative stress [56]. XBP1 abnormalities result in intestinal inflammation, thus increasing susceptibility to IBD [57]. PSME1 has been identified as a protein biomarker of active IBD using advanced label-free quantification technology for proteomes [58]. COX6A1 is involved in oxidative phosphorylation and mitochondrial respiration [59,60], but to date, its engagement in IBD has never been reported. PPIB helps cells adapt to oxidative stress and hypoxic conditions [61] but has not been studied in IBD. ARF4 has been demonstrated to have anti-apoptotic activity in glioblastoma by inhibiting stress-mediated apoptotic signals [62]. PRDX4 has been reported to ameliorate lipotoxicity-induced oxidative stress and apoptosis in diabetic cardiomyopathy [63] but has never been studied in the gastrointestinal tract. NDUFAB1 encodes a subunit of a mitochondrial respiratory chain complex, which plays an essential role in mitochondrial homeostasis in colitis [64]. MYL6 is strongly related to increased mitochondrial efficiency, especially under oxidative stress conditions. In summary, almost all the genes in the blue module are strongly related to oxidative stress. Meanwhile, UC mucosa scored significantly higher by the ssGSEA algorithm based on genes in the blue module. Additionally, the expression pattern of blue module genes is closely associated with immune infiltration, indicating their possible role in immunomodulation.
Endoscopy is currently the gold standard for evaluating UC lesions, but misdiagnosis of UC can occur due to its complex and varied clinical manifestations. Molecular diagnosis based on RNA sequencing of colon tissue has significantly improved diagnostic accuracy [65]. In our study, we demonstrated that our oxidative stress-related signature had stable performance in distinguishing UC lesions through machine learning and deep learning, which could assist in clinical diagnosis and subtype classification and lead to improved UC management.
In conclusion, we successfully identified a disease-specific subgroup of plasma cells characterized by oxidative stress in UC. Our study has established promising machine learning and deep learning models based on PC oxidative stress genes, providing a new paradigm for future UC research.

Figure 1. Altered proportions of plasma cell subpopulations in UC mucosa. (A) Left panel, plasma cells (PCs) were upregulated in UC (dotted box). Right panel, proportions of each cell type. (B) Left panel, clusters 1, 8 and 9 were upregulated in UC (dotted box). Right panel, proportions of each cluster of plasma cells. (C) Cell-cell interactions between other cell types and PCs. (D) UC PCs acted as mediators of MIF signaling.

Figure 2. hdWGCNA revealed that Cluster 9 is characterized by the blue module. (A) The network reached a scale-free distribution at a power value of 10. (B) Highly variable genes were clustered into 5 modules through hdWGCNA. (C) The correlations between modules. (D) FeaturePlots of different module scores in PCs. (E) Dot plot of the different module scores in PCs. (F) Pseudotime analysis of Cluster 9 featured by the blue module.

Figure 3. Genes in the blue module participate in oxidative stress. (A) Protein-protein interaction of the genes in the blue module. (B) KEGG enrichment of the genes in the blue module. (C) Functional enrichment of genes in the blue module with Metascape.

Figure 4. Machine learning with PC oxidative stress genes. (A) LASSO regression reduced the number of genes to 11. (B) Mean AUC of 10 repetitions of 5-fold cross-validation for each model. (C) ROC curves and confidence intervals for each model. (D) ROC curve of the model in the independent external validation set.

Figure 5. PC oxidative stress genes are associated with immune infiltration and available for UC patient subtyping. (A) UC mucosa had higher blue module scores than normal mucosa in three independent datasets. (B) Correlations between genes selected by LASSO regression and immune cells. (C) NMF classified patients into 3 subtypes. (D) Blue module scores of the 3 subtypes. (E) Immune infiltration abundance in the 3 subtypes. (F) Spatial transcriptomic colocalization of PCs and the oxidative stress signature.

Figure 6. Establishment of the gene-immune CNN model. (A) Principle of the gene-immune CNN model. (B) Training process of the gene-immune CNN model. (C) Performance of the gene-immune CNN model in the training and validation sets.

Table 1. Performance of 10 machine learning methods
mucosa at the single-cell RNA sequencing level, particularly the subtype characterized by oxidative stress.