1. Single-cell sequencing data analysis.
GSE209872 comprises 5 samples. In this analysis, the data samples were initially screened based on nFeature RNA and nCount RNA (nFeature RNA > 50 & nFeature RNA < 5000 & percent.mt < 5) (Fig.1A-B). Moreover, we presented the analysis of the top 10 genes with the highest standard deviation (Fig.1C). Our study revealed that there were no significant batch effects observed between samples, as determined through PCA (Fig.2A). Additionally, the ElbowPlot analysis identified 7 as the optimal number of PC (Fig.2B). By utilizing TSNE analysis, a total of 16 subgroups were identified (Fig.2C). Subsequently, annotation was carried out for each subtype using the R package Single R, which resulted in the categorization of the 16 clusters into six cell types: Neurons, Astrocytes, Endothelial cells, Macrophages, Fibroblasts, and Microglia (Fig.2D). Furthermore, unique marker genes for each cell subtype were extracted from the single-cell data using the FindAllMarkers function (Supplemental table1). The software package Cellchat was employed to analyze the ligand-receptor relationships of features in the single-cell expression profile. Notably, complex interactions were identified among various subtypes of these cells. Of particular interest, closer relationships were observed between Astrocytes, Endothelial cells, and other cells (Fig.2E). Consequently, we selected the marker genes of Astrocytes as the candidate gene set (Fig.2F).
2. Identification of key genes related to DR
We obtained datasets GSE94019 and GSE102485 from the GEO database, which are associated with DR. These datasets consist of expression profile data from a total of 38 patients, including 7 control individuals and 31 patients with DR. The expression data was processed using the SVA algorithm to correct for batch effects. PCA plots were used to visualize the batch effects before and after correction. The results demonstrated a reduction in batch effects between the chips following correction using the SVA algorithm (Fig.3A-B). To identify differentially expressed genes between the DR and control groups, we utilized the limma package. Through this analysis, we detected 322 genes that exhibited differential expression, with 152 genes being up-regulated and 170 genes being down-regulated (Fig.3C). In order to explore the co-expression network of genes within the DR cohort, we constructed a weighted gene co-expression network using the WGCNA approach. The soft threshold was set to 11, and gene modules were identified based on the TOM matrix (Fig.3D-E). A total of 10 gene modules were identified, each associated with a different color: black (282 genes), blue (713 genes), green (1226 genes), green yellow (146 genes), grey (487 genes), magenta (253 genes), pink (254 genes), purple (247 genes), turquoise (866 genes), and yellow (526 genes) (Fig.3F). Among these modules, the pink module exhibited the highest correlation with disease (cor=0.63, p=0.05) (Fig.3F). Furthermore, we performed an intersection analysis between the differentially expressed genes within the pink module and the marker genes of astrocytes, resulting in the identification of 6 common genes (Fig.3G). These 6 genes, namely CD44, CPLX4, MMP14, PMEPA1, PMP22, and POSTN were determined as the key genes for our subsequent research (Fig.3G). When comparing the DR group to the control group, all key genes were found to be upregulated, except for CPLX4, which exhibited downregulation (Supplemental table2).
3. Signaling pathways involved in key genes
We subsequently investigated the particular signaling pathways implicated in the key genes and explored the potential molecular mechanisms through which these genes impact the pathological process of DR. The GSEA findings delineated that CD44 high expression was significantly enriched in pathways such as Jak-STAT signaling pathway, neurotrophin signaling pathway, TGF BETA signaling pathway, and toll like receptor signaling pathway (Fig.4A). CPXL4 low expression was notably enriched in pathways such as neurotrophin signaling pathway, TGF BETA signaling pathway (Fig. 4B). MMP14 high expression was highly concentrated in pathways like B cell receptor signaling pathway,Fc epsilon RI signaling pathway, NOD-like receptor signaling pathway, P53 signaling pathway, and Toll-like receptor Signaling Pathway (Fig. 4C). PMEPA1 highly expressed was enriched in apoptosis, dorso ventral axis formation, JAK-STAT signaling pathway, MAPK signaling pathways, and tight junction (Fig. 4D). PMP22 highly expressed was notably enriched in neurotrophin signaling pathway, Pathogenic Escherichia coli infection, regulation of actin cytoskeleton, and sphingolipid metabolism (Fig. 4E). POSTN highly expressed was enriched in chemokine signaling pathway, Fc epsilon RI signaling pathway, Jak-STAT signaling pathway, Leukocyte transendothelial migration and Toll-Like Receptor signaling(Fig. 4F). These findings suggest that key genes may impact DR progression through these pathways.
4. Key gene immune infiltration analysis
The microenvironment is primarily composed of fibroblasts, immune cells, extracellular matrix, various growth factors, inflammatory factors, and distinct physicochemical features. It has a significant impact on the diagnosis, prognosis, and clinical treatment sensitivity of diseases. We further examined the distribution of immune infiltration levels in DR (Fig.5A). A heatmap was generated to illustrate the correlation between immune cells, with red indicating positive correlation, blue indicating negative correlation, and the intensity of the color indicating the strength of the correlation (Fig.5B). Compared to the control group, the DR group exhibited significantly higher levels of T cells CD8, Macrophages M1, and Macrophages M2 (Fig.5C). We further investigated the relationship between key genes and immune cells and observed multiple key genes highly correlated with immune cells. The results revealed a significant positive correlation between CD44 and Macrophages M2, as well as a significant negative correlation with Dendritic cells activated, T cells follicular helper, B cell memory, etc. (Fig.5D). CPLX4 displayed a significant positive correlation with T cells focal helper, B cells memory, Plasma cells, and Dendritic cells activated, while displaying a significant negative correlation with Macrophages M2 and T cells CD4 memory activated (Fig.5E). MMP14 exhibited a significant positive correlation with Macrophages M2 and Macrophages M1, and a significant negative correlation with T cells follicular helper, T cells regulatory (Tregs), B cell memory, Dendritic cells activated, and T cell gamma delta (Fig.5F). PMEPA1 displayed a significant positive correlation with Macrophages M2, and a significant negative correlation with T cells follicular helper, Dendritic cells resting, Dendritic cells activated, B cell memory, and cell gamma delta (Fig.5G). PMP22 exhibited a significant positive correlation with Macrophages M2, and a significant negative correlation with T cells follicular helper Dendritic cells activated B cell memory, Dendritic cells resting, and T cells regulatory (Tregs) (Fig.5H). POSTN displayed a significant positive correlation with Macrophages M2 and Macrophages M1, and a significant negative correlation with T cells focal helper, B cells memory, Plasma cells, Dendritic cells activated, Dendritic cells resting and Macrophages M0 (Fig.5I). These analyses suggest a close relationship between these key genes and the level of immune cell infiltration, highlighting their crucial role in the immune microenvironment.
5. Analysis of transcriptional regulation related to key genes
In this study, we considered these 6 key genes for the gene set analysis and observed that they are regulated by common mechanisms, including the influence of multiple transcription factors. Consequently, we conducted enrichment analysis using cumulative recovery curves to investigate these transcription factors. By annotating the Motif-TF and analyzing the selection results of important genes, we identified that the cisbp_M5693 Motif exhibited the highest normalized enrichment score (NES: 4.7). Four key genes (CPLX4, MMP14, PMP22, and POSTN) were enriched in this particular motif. The enriched motifs, along with their corresponding transcription factors for the key genes, are depicted in Fig. 6. Moreover, all the enriched motifs and corresponding transcription factors for the 6 key genes are displayed in Supplemental table3.
6. Correlation between key genes and disease regulatory genes
We obtained disease-related genes associated with diabetic retinopathy from the GeneCards database (https: www. genecards. org). By analyzing inter-group expression differences of the disease-related genes, we found differential expression of ACE, IL6, KCNJ11, NEUROD1, PPARG, and TCF7L2 between the disease and control groups (Fig.7A). We analyzed the expression levels of the top 20 genes ranked by Relevance score and found significant correlations between the expression levels of six key genes and multiple disease-related genes. Among them, POSTN and NEUROD1 were significantly negatively correlated (r=-0.675), while CPLX4 and NEUROD1 were significantly positively correlated (r= 0.875) (Fig. 7B). The above results suggested that NEUROD1 may take part in the regulation of POSTN and CPLX4 on DR.