3.1. Data collation and transcriptomic differential analysis results
After visualizing the distribution of the merged expression matrix derived from the transcriptomic data, we observed a substantial batch effect between the two original datasets. Batch effects introduce systematic, cumulative errors that can distort downstream gene expression analysis and interpretation. To address this, we performed batch correction and normalization (Fig. 1a) so that the gene expression data would be more reliable and comparable across datasets.
Next, we intersected the resulting gene set with the 232 autophagy-related genes provided by the Autophagy Database (http://www.autophagy.lu/index.html). Differential analysis identified 36 differentially expressed autophagy-related genes, 21 upregulated and 15 downregulated (log fold change threshold 0.4, p-value cutoff 0.05). These genes are shown in a volcano plot (Fig. 1b), and their up- and downregulation patterns are displayed in a heatmap (Fig. 1c). Among the 36 differentially expressed genes, the five most upregulated were ITGA6, HDAC1, LAMP2, BAG3, and KIF5B, and the five most downregulated were RAB1A, CAPNS1, CX3CL1, HSP90AB1, and ATG10 (Fig. 1d). These results suggest that the expression of autophagy-related genes is altered in Alzheimer's disease.
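The gene filtering described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a differential-expression table has already been computed (gene mapped to a log fold change and p-value), and it treats both cutoffs as strict inequalities, which the text does not specify. The function name `select_de_autophagy_genes` and the gene values are hypothetical.

```python
# Minimal sketch: select differentially expressed autophagy-related genes.
# Cutoffs follow the text (|logFC| > 0.4, p < 0.05); strictness is an assumption.

LOGFC_CUTOFF = 0.4
P_CUTOFF = 0.05


def select_de_autophagy_genes(de_table, autophagy_genes,
                              logfc_cutoff=LOGFC_CUTOFF, p_cutoff=P_CUTOFF):
    """Return (up, down) lists of gene symbols.

    de_table maps gene symbol -> (logFC, p_value) from a differential test;
    autophagy_genes is a collection of autophagy-related gene symbols.
    """
    # Keep only genes that are both measured and in the autophagy gene list.
    candidates = set(de_table) & set(autophagy_genes)
    up, down = [], []
    for gene in sorted(candidates):
        logfc, p_value = de_table[gene]
        if p_value >= p_cutoff or abs(logfc) <= logfc_cutoff:
            continue  # not significant, or effect size below the cutoff
        (up if logfc > 0 else down).append(gene)
    return up, down


# Hypothetical values for illustration only (not data from this study).
example_table = {
    "GENE_A": (0.9, 0.001),
    "GENE_B": (-0.6, 0.01),
    "GENE_C": (1.2, 0.2),     # p-value too large
    "GENE_D": (0.1, 0.001),   # fold change too small
    "GENE_E": (2.0, 1e-6),    # not an autophagy gene
}
up, down = select_de_autophagy_genes(example_table,
                                     {"GENE_A", "GENE_B", "GENE_C", "GENE_D"})
print("up:", up)      # up: ['GENE_A']
print("down:", down)  # down: ['GENE_B']
```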
For the single-cell datasets GSE5281 and GSE138260 (which include samples from healthy individuals and Alzheimer's disease patients), we selected four samples (GSM5348375, GSM5348374, GSM5348377, and GSM4403286), labeled HC1, AD1, HC2, and AD2, respectively. We compared the total RNA count (nCount_RNA), the number of detected RNA features (nFeature_RNA), the percentage of mitochondrial gene counts (pMT), and the percentage of hemoglobin gene counts (pHB) across these samples using violin plots (Fig. 2a). As with the bulk data, we observed a noticeable batch effect. To reduce it, we applied the "LogNormalize" method, which scales each cell's raw counts by its total count and log-transforms the result, making the samples comparable. We then performed principal component analysis (PCA) on the processed data (Fig. 2b), and the elbow plot indicated that 15 principal components (PCs) were optimal (Fig. 2c). Using these parameters, we visualized the reduced-dimensional data in t-SNE space and identified 31 clusters (Fig. 2d). We displayed the expression patterns of marker genes across clusters in a bubble plot (Fig. 2e) and performed enrichment analysis for each cluster (Fig. 2f). Annotation yielded six major cell types: T cells, B cells, endothelial cells, fibroblasts, myeloid cells, and epithelial cells.
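The per-cell quality-control metrics and the LogNormalize transform can be sketched in plain Python as below. This assumes Seurat-style definitions: LogNormalize divides each count by the cell's total, multiplies by a scale factor (10,000 is Seurat's default), and applies the natural log1p. The `MT-` prefix and the explicit hemoglobin gene list are assumptions for human gene symbols; the original analysis may have used different patterns.

```python
import math
import re

# Assumption: human gene symbols; mitochondrial genes start with "MT-".
MT_PATTERN = re.compile(r"^MT-")
# Assumption: hemoglobin genes listed explicitly (avoids matching e.g. HBEGF).
HB_PATTERN = re.compile(r"^HB(A1|A2|B|D|E1|G1|G2|M|Q1|Z)$")


def qc_metrics(cell_counts):
    """cell_counts maps gene symbol -> raw count for a single cell."""
    n_count = sum(cell_counts.values())
    n_feature = sum(1 for c in cell_counts.values() if c > 0)
    mt = sum(c for g, c in cell_counts.items() if MT_PATTERN.match(g))
    hb = sum(c for g, c in cell_counts.items() if HB_PATTERN.match(g))
    p_mt = 100.0 * mt / n_count if n_count else 0.0
    p_hb = 100.0 * hb / n_count if n_count else 0.0
    return {"nCount_RNA": n_count, "nFeature_RNA": n_feature,
            "pMT": p_mt, "pHB": p_hb}


def log_normalize(cell_counts, scale_factor=10_000):
    """LogNormalize-style transform: ln(1 + count / total * scale_factor)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {g: 0.0 for g in cell_counts}
    return {g: math.log1p(c / total * scale_factor)
            for g, c in cell_counts.items()}


# Hypothetical single cell.
cell = {"GENE_A": 6, "MT-CO1": 3, "HBB": 1, "GENE_B": 0}
print(qc_metrics(cell))
# {'nCount_RNA': 10, 'nFeature_RNA': 3, 'pMT': 30.0, 'pHB': 10.0}
print(log_normalize(cell)["GENE_B"])  # 0.0
```

Note that this normalization corrects for sequencing depth per cell; it does not by itself remove dataset-level batch structure.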
3.2. Results of multi-omics analysis of the degree of immune cell infiltration
Differentially expressed genes provide an important data basis for immune infiltration analysis. Building on them, we performed immune infiltration analysis to identify changes in immune cell composition associated with the differentially expressed autophagy genes (Fig. 3a), which helps clarify their potential roles in disease development and the immune response. We labeled the experimental group "high" (high-risk group for Alzheimer's disease) and the control group "low" (low-risk group for Alzheimer's disease) (Fig. 3b).
Immune infiltration analysis identified the following immune cell types with significantly different infiltration levels: [1] Naive CD4+ T cells (CD4+ T cells that have not undergone antigen stimulation, differentiation, or functional maturation): a decrease in naive CD4+ T cells may reflect an imbalance in the immune system (32), possibly due to an impaired response to antigen stimulation or abnormal maintenance of immune homeostasis, which would affect the body's ability to respond to potential pathogens or abnormal proteins. [2] Activated memory CD4+ T cells: activation of CD4+ T cells suggests that the immune system is responding to potential inflammation or infection (33), which may partly reflect an immune reaction to pathology in the nervous system. [3] Regulatory T cells: their presence suggests that the immune system is attempting to regulate its own activity or to dampen inflammatory responses (34). [4] Activated and resting NK cells: an increase in NK cells in both states may indicate an immune response to Alzheimer's disease-related pathology, such as inflammation or the clearance of abnormal proteins (35). [5] Resting dendritic cells: the activity state of dendritic cells may influence signal transmission and synaptic connections between neurons, so their decreased infiltration may impair neural signaling and exacerbate cognitive dysfunction in Alzheimer's disease patients (36). [6] Neutrophils: as the most abundant granulocytes in the immune system, their significantly different infiltration may indicate pathological processes such as inflammation or infection in Alzheimer's disease patients (37).
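The text does not name the statistical test used to compare infiltration levels between the "high" and "low" groups. A two-sided Wilcoxon rank-sum (Mann-Whitney U) test is a common choice for deconvolved cell-type fractions, so the sketch below assumes it. It uses a normal approximation without tie or continuity correction, so its p-values are approximate, particularly for small groups; the function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U for group_a, approximate p-value). No tie correction.
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    r1 = sum(ranks[:n1])
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p_value


# Hypothetical infiltration fractions of one cell type per sample.
high_group = [0.12, 0.15, 0.18, 0.20, 0.22]
low_group = [0.05, 0.06, 0.08, 0.09, 0.10]
u, p = mann_whitney_u(high_group, low_group)
print(u)  # 25.0 (every high sample exceeds every low sample)
```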
Comprehensive immune infiltration analysis helps reveal the role of the immune system in disease processes and is important for understanding disease mechanisms and identifying therapeutic targets. To further explore immune-related information in Alzheimer's disease patients, we visualized the proportions of the different cell types in the previously processed single-cell transcriptomic data (Fig. 3c). Myeloid cells made up a larger proportion of cells in Alzheimer's disease samples than in healthy samples. We then visualized the distribution of cells with high and low immune scores in a heatmap (Fig. 3d), which showed that cells with high immune scores were mostly myeloid cells.
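The per-sample cell-type proportions compared above amount to counting annotated cells in each sample and dividing by that sample's total. A minimal sketch, assuming each cell carries a sample label and an annotated cell type; the example cells are hypothetical.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """cells: iterable of (sample_label, cell_type) pairs, one per cell.

    Returns {sample: {cell_type: fraction}}; fractions sum to 1 per sample.
    """
    counts = defaultdict(Counter)
    for sample, cell_type in cells:
        counts[sample][cell_type] += 1
    proportions = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        proportions[sample] = {ct: n / total for ct, n in counter.items()}
    return proportions


# Hypothetical annotated cells.
cells = ([("HC1", "T cells")] * 3 + [("HC1", "Myeloid cells")]
         + [("AD1", "Myeloid cells")] * 2 + [("AD1", "T cells")] * 2)
props = cell_type_proportions(cells)
print(props["HC1"]["Myeloid cells"], props["AD1"]["Myeloid cells"])  # 0.25 0.5
```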
3.3. Enrichment results by multiple methods and pathways
To better understand the roles and regulatory mechanisms of these genes in cellular processes and biological functions, we performed enrichment analysis on the differentially expressed autophagy genes identified above. This analysis aimed to reveal the metabolic pathways, signaling pathways, and other biological processes in which these genes are involved, and thus their roles in disease development and physiological regulation. Gene ontology (GO) enrichment analysis covering cellular component (CC), biological process (BP), and molecular function (MF) terms (Fig. 4a, b, c, d) showed that the most prominent terms included macroautophagy, regulation of neural growth, positive regulation of cell, regulation of the nervous system, and regulation of autophagy.
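GO over-representation analysis is commonly based on a one-sided hypergeometric test followed by multiple-testing correction. The sketch below assumes that approach with Benjamini-Hochberg adjustment; the text does not state which tool, background gene set, or correction was used, so these are assumptions, and the function names are illustrative.

```python
from math import comb


def hypergeom_upper_tail(k, n_background, n_term, n_selected):
    """P(X >= k) for X ~ Hypergeometric(N=n_background, K=n_term, n=n_selected).

    k is the overlap between the selected (e.g. differentially expressed)
    genes and the genes annotated to one GO term.
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(comb(n_term, i) * comb(n_background - n_term, n_selected - i)
               for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical numbers: 36 selected genes, a 50-gene term, 5 overlapping,
# against an assumed background of 20,000 genes.
p = hypergeom_upper_tail(5, 20_000, 50, 36)
print(benjamini_hochberg([p, 0.04, 0.3]))
```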
Macroautophagy clears abnormal protein aggregates and damaged organelles in nerve cells and helps maintain intracellular homeostasis (38). Research has shown that the expression and activity of the autophagy pathway in the brain are abnormal in Alzheimer's disease patients, leading to protein aggregation and organelle dysfunction and thereby exacerbating disease progression (39). In addition, regulation of neural growth and positive regulation of cell are also associated with Alzheimer's disease (40). In the nervous system, neurogenesis and synapse formation require precise coordination of the intracellular and extracellular environments, in which positive regulation of cells plays a critical role. When this regulation is disrupted, neurogenesis and synapse formation may be impaired, further aggravating nervous system abnormalities.
Furthermore, KEGG and GSEA enrichment analyses indicated that autophagy-related genes are also enriched in pathways associated with various cancers (Fig. 5a, b, c, d, e). This highlights the role of autophagy in both neurodegenerative diseases and cancer, offers insight into the complex regulation of autophagy in disease development, and broadens the scope of autophagy research. This shared regulatory mechanism may open new research directions at the intersection of these two disease classes and underscores the general importance of autophagy in cell survival and metabolic regulation.
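For readers unfamiliar with GSEA, its core statistic is a running-sum enrichment score over a ranked gene list: walking down the list, the sum increases at genes in the set (weighted by the absolute ranking score raised to a power, 1 by default in the original method) and decreases at genes outside it, and the enrichment score is the maximum deviation from zero. The sketch below computes only this score; permutation-based significance, normalization, and the ranking metric used in this study are not shown and would be assumptions.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.

    ranked_genes: genes sorted by decreasing ranking score.
    scores: ranking scores aligned with ranked_genes.
    gene_set: set of gene symbols to test.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    # Normalizer for the weighted "hit" increments.
    norm_hit = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / norm_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero (signed)
    return best


# Hypothetical ranked list: a set at the top gives a positive score.
genes = ["A", "B", "C", "D"]
ranking = [2.0, 1.0, -1.0, -2.0]
print(round(enrichment_score(genes, ranking, {"A", "B"}), 6))  # 1.0
```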
3.4. Protein-protein interaction network and hub gene screening
Based on the enrichment results, we constructed a protein-protein interaction (PPI) network and screened for hub genes. The PPI network (Fig. 6a) reveals the interaction relationships between genes and allows identification of "hub" genes that play key roles in the overall network (Fig. 6b). Hub genes interact with many other genes and strongly influence the stability and function of the network. Identifying them therefore helps clarify the interaction patterns and biological functions of these genes and guides subsequent bioinformatics analysis.
We ranked the nodes of the PPI network with the Maximum Neighborhood Component (MNC) method and selected the top 10 hub genes (Table 1): CXCR4, GAPDH, RHEB, EGFR, ITGB1, IKBKB, IFNG, HDAC1, CDKN2A, and HSP90AB1.
Table 1
Top 10 hub genes and their changes based on the MNC algorithm.
Gene Symbol | Change Type |
CXCR4 | UP |
GAPDH | DOWN |
RHEB | DOWN |
EGFR | UP |
ITGB1 | UP |
IKBKB | UP |
IFNG | DOWN |
HDAC1 | UP |
CDKN2A | UP |
HSP90AB1 | DOWN |
Note: The change type for each gene is taken from the differential analysis results above.
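The MNC score behind Table 1 can be sketched as follows: for each node v, take the subgraph induced by its neighbors N(v) (excluding v itself) and report the size of its largest connected component. This follows the usual definition of MNC; the toy edge list is hypothetical and stands in for the study's PPI network, whose source and score cutoff are not given here.

```python
from collections import deque


def mnc_scores(edges):
    """Maximum Neighborhood Component score for every node.

    edges: iterable of undirected (u, v) pairs. Self-loops are ignored.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for node, neighbors in adj.items():
        best = 0
        unvisited = set(neighbors)
        while unvisited:
            # Breadth-first search restricted to the neighborhood of `node`.
            queue = deque([unvisited.pop()])
            size = 1
            while queue:
                current = queue.popleft()
                for nxt in adj[current] & unvisited:
                    unvisited.discard(nxt)
                    size += 1
                    queue.append(nxt)
            best = max(best, size)
        scores[node] = best
    return scores


def top_hubs(edges, k=10):
    """Top-k nodes by MNC score (ties broken alphabetically)."""
    scores = mnc_scores(edges)
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical toy network: H is linked to A, B, C, and A-B are linked.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B")]
print(mnc_scores(toy_edges))  # H: 2, A: 2, B: 2, C: 1 (key order may vary)
print(top_hubs(toy_edges, k=2))  # ['A', 'B']
```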
3.5. LASSO regression model feature gene selection results
After constructing the PPI network and screening for hub genes, we applied a LASSO regression model to select key genes. This machine learning step further narrows the analysis to the most informative genes and deepens our understanding of the molecular changes associated with Alzheimer's disease. The LASSO model helps identify genes with a strong influence on disease status, laying a foundation for further bioinformatics analysis and clinical translational research.
During model training, as the regularization parameter increases, the model becomes simpler and its performance on the training set gradually worsens (Fig. 7a). On the validation set, performance peaks at an intermediate value of the regularization parameter and then deteriorates as the parameter increases further; this optimal value gives the best validation performance. The cross-validation plot shows that the optimal model contains 20 terms (Fig. 7b), namely the coefficients of 19 genes plus an intercept term (Table 2). We then plotted a ROC curve to evaluate the model (Fig. 7c). The AUC was 0.970, indicating that the model discriminates well between healthy and Alzheimer's disease samples in this binary classification task.
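The AUC reported above can be read as the probability that a randomly chosen Alzheimer's disease sample receives a higher model score than a randomly chosen healthy sample, with ties counted as one half. A minimal sketch of that definition, using hypothetical labels and scores:

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for Alzheimer's disease, 0 for healthy control.
    scores: model outputs (higher means more likely AD).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four samples.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is quadratic in sample size, which is fine for cohorts of this scale; rank-based formulas give the same value faster.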
Table 2
Coefficients and intercept for the 19 key genes in the LASSO model.
Gene | Coef |
(Intercept) | -8.482628529 |
RAB1A | -0.203382697 |
KLHL24 | 0.27250216 |
BAG3 | 0.430830843 |
EEF2 | 0.291031202 |
CAPNS1 | -0.536966269 |
RAF1 | 1.089670214 |
VAMP7 | -1.144675163 |
CAPN2 | 0.419745604 |
SESN2 | 0.130015567 |
RB1 | 1.186959801 |
BAX | -0.745967665 |
CX3CL1 | -0.083016706 |
CAMKK2 | -0.106564441 |
CXCR4 | 0.443820169 |
ATG10 | -0.368013442 |
TNFSF10 | 0.128330603 |
EIF4EBP1 | 0.0308 |
IFNG | -0.157499347 |
CDKN2A | 0.108990092 |
Note: (Intercept) denotes the intercept term.
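The coefficients in Table 2 define a linear predictor, the intercept plus the sum of each coefficient times the gene's expression value. The sketch below assumes a binomial (logistic) LASSO, which is typical for a binary healthy-versus-AD task but is not stated explicitly in the text, and assumes the input expression values are on the same scale and preprocessing as the training data. The function name `lasso_probability` is illustrative.

```python
import math

# Coefficients copied from Table 2.
INTERCEPT = -8.482628529
COEFFICIENTS = {
    "RAB1A": -0.203382697, "KLHL24": 0.27250216, "BAG3": 0.430830843,
    "EEF2": 0.291031202, "CAPNS1": -0.536966269, "RAF1": 1.089670214,
    "VAMP7": -1.144675163, "CAPN2": 0.419745604, "SESN2": 0.130015567,
    "RB1": 1.186959801, "BAX": -0.745967665, "CX3CL1": -0.083016706,
    "CAMKK2": -0.106564441, "CXCR4": 0.443820169, "ATG10": -0.368013442,
    "TNFSF10": 0.128330603, "EIF4EBP1": 0.0308, "IFNG": -0.157499347,
    "CDKN2A": 0.108990092,
}


def linear_predictor(expression):
    """expression: gene -> value, assumed on the model's training scale."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing genes: {sorted(missing)}")
    return INTERCEPT + sum(c * expression[g] for g, c in COEFFICIENTS.items())


def lasso_probability(expression):
    """Predicted AD probability, assuming a logistic (binomial) model."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(expression)))


# Hypothetical sample with every gene set to zero: only the intercept remains.
zeros = {g: 0.0 for g in COEFFICIENTS}
print(linear_predictor(zeros))  # -8.482628529
```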
3.6. Venn diagram and nomogram of key genes
After this series of screening steps, we took the intersection of the results from the preceding analyses to obtain the final set of key genes. This step integrates the results of different screening methods, retains genes that are consistently identified as important, and highlights their key roles in biological processes. We intersected five gene sets: the original common gene set (Common gene), the autophagy gene set (autophagy), the set of significantly differentially expressed genes (Difference), the key genes from the LASSO model (Lasso), and the PPI hub genes (Hub) (Fig. 8a). CDKN2A, CXCR4, and IFNG were present in all five sets and were identified as strong candidate key genes (Fig. 8e). Using these genes as variables in a logistic regression analysis, we constructed a nomogram to assess the degree of autophagy deterioration in Alzheimer's disease (Fig. 8b). The ROC curve of the model (Fig. 8c) gave an AUC of 0.753, indicating acceptable discriminative performance. The calibration curve agreed well with the ideal diagonal line (Fig. 8d), further supporting the performance of the multivariable logistic regression model in assessing the degree of autophagy deterioration in Alzheimer's disease. CDKN2A, CXCR4, and IFNG may therefore serve as indicators for evaluating the degree of autophagy deterioration in Alzheimer's disease.
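The Venn intersection can be reproduced for the two gene sets listed in this section, the hub genes in Table 1 and the LASSO genes in Table 2. The other three sets (Common gene, autophagy, Difference) are not listed in full in the text, so this sketch intersects only the two tabulated sets; passing further sets to `key_genes` would narrow the result further.

```python
# Gene lists copied from Table 1 (Hub) and Table 2 (Lasso).
HUB_GENES = {"CXCR4", "GAPDH", "RHEB", "EGFR", "ITGB1", "IKBKB",
             "IFNG", "HDAC1", "CDKN2A", "HSP90AB1"}
LASSO_GENES = {"RAB1A", "KLHL24", "BAG3", "EEF2", "CAPNS1", "RAF1",
               "VAMP7", "CAPN2", "SESN2", "RB1", "BAX", "CX3CL1",
               "CAMKK2", "CXCR4", "ATG10", "TNFSF10", "EIF4EBP1",
               "IFNG", "CDKN2A"}


def key_genes(*gene_sets):
    """Genes present in every supplied set (empty set if none given)."""
    if not gene_sets:
        return set()
    return set.intersection(*map(set, gene_sets))


print(sorted(key_genes(HUB_GENES, LASSO_GENES)))  # ['CDKN2A', 'CXCR4', 'IFNG']
```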
3.7. Localization of key autophagy genes and pseudotime analysis results
After screening the key autophagy genes in Alzheimer's disease using transcriptomic data, we used single-cell transcriptomic data to localize the expression of the strong candidate key autophagy genes CDKN2A, CXCR4, and IFNG across cell types (Fig. 9a, b). After dispersion-based selection of genes from the single-cell data and ordering of the cells (Fig. 9c), we constructed a pseudotime trajectory of cell development (Fig. 9d) and examined how the more highly expressed key genes, CDKN2A and CXCR4, changed along it (Fig. 9e). CDKN2A expression first increased and then decreased along the trajectory, whereas CXCR4 expression showed an increasing trend. Pseudotime analysis of single-cell data thus clarifies the developmental trajectories of target genes in different cell types and the dynamic changes in cell state, revealing how key autophagy genes change during Alzheimer's disease.
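Dispersion-based gene selection keeps genes whose variability across cells is high relative to their mean expression. Tools such as Monocle compare each gene's empirical dispersion with a fitted mean-dispersion curve; the sketch below deliberately simplifies this to the raw variance-to-mean ratio with a minimum-mean filter. The parameters `min_mean` and `top_n` are hypothetical, not values from this study.

```python
from statistics import mean, pvariance


def dispersion_table(expr_by_gene):
    """gene -> list of per-cell counts; returns gene -> (mean, variance/mean)."""
    table = {}
    for gene, values in expr_by_gene.items():
        mu = mean(values)
        if mu == 0:
            continue  # never detected: dispersion undefined
        table[gene] = (mu, pvariance(values, mu) / mu)
    return table


def select_dispersed_genes(expr_by_gene, min_mean=0.1, top_n=2000):
    """Genes above min_mean, ordered by decreasing dispersion (simplified)."""
    table = dispersion_table(expr_by_gene)
    kept = [g for g, (mu, _) in table.items() if mu >= min_mean]
    kept.sort(key=lambda g: (-table[g][1], g))
    return kept[:top_n]


# Hypothetical counts across four cells.
counts = {"A": [0, 10, 0, 10], "B": [5, 5, 5, 5], "C": [0, 0, 0, 0]}
print(select_dispersed_genes(counts))  # ['A', 'B']
```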
3.8. Cell communication analysis results
Localization and pseudotime analysis showed that CDKN2A, CXCR4, and IFNG have different expression patterns across cell types (Fig. 9a). CDKN2A and CXCR4 are more highly expressed in myeloid cells than in other cells, whereas IFNG is expressed mainly in T cells, at lower levels. Taking CDKN2A as an example, we divided the myeloid cell cluster into two groups according to CDKN2A expression (Fig. 10a). We then examined the communication networks and outgoing/incoming signaling patterns among these cell groups (Fig. 10b, c) and plotted a bubble chart of the signaling pathways of the different cell groups (Fig. 11a). The NRG signaling pathway appeared frequently across the cell communication network. Visualizing the interactions between cell groups in the NRG signaling pathway (Fig. 11b) revealed a close connection between the mast cell and endothelial cell groups. We speculate that changes in CDKN2A expression are related to the reciprocal interactions of the mast cell group and B cells through the NRG signaling pathway. The same analysis was applied to the other two genes.
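The grouping and communication steps can be sketched as below, under stated assumptions. The text does not give the cutoff used to split myeloid cells by CDKN2A expression, so the median is assumed here. The interaction score is a deliberately simplified product of mean ligand expression in sender cells and mean receptor expression in receiver cells; it is not the probability model used by dedicated cell-communication tools. For NRG signaling one would pass, for example, NRG1 as ligand and an ERBB receptor; the example uses placeholder gene names.

```python
from statistics import mean, median


def split_by_expression(cells, gene):
    """cells: cell_id -> {gene: normalized expression}.

    Returns (high, low) sets of cell ids, split at the median (assumption).
    """
    values = [expr.get(gene, 0.0) for expr in cells.values()]
    cutoff = median(values)
    high = {cid for cid, expr in cells.items() if expr.get(gene, 0.0) > cutoff}
    low = set(cells) - high
    return high, low


def lr_score(cells, senders, receivers, ligand, receptor):
    """Simplified ligand-receptor score: mean ligand x mean receptor."""
    lig = mean(cells[c].get(ligand, 0.0) for c in senders)
    rec = mean(cells[c].get(receptor, 0.0) for c in receivers)
    return lig * rec


# Hypothetical cells with placeholder ligand/receptor names.
cells = {
    "c1": {"CDKN2A": 0.1, "LIG": 1.0},
    "c2": {"CDKN2A": 2.0, "LIG": 3.0},
    "c3": {"CDKN2A": 3.0, "REC": 2.0},
    "c4": {"REC": 4.0},
}
high, low = split_by_expression(cells, "CDKN2A")
print(sorted(high), sorted(low))  # ['c2', 'c3'] ['c1', 'c4']
```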
3.9. Mendelian randomization analysis results
To explore the associations between different exposure factors and the onset of Alzheimer's disease and to better understand possible pathophysiological mechanisms, we performed Mendelian randomization analysis with depression and vascular inflammation as exposures and Alzheimer's disease as the outcome. Because it uses genetic variants as instrumental variables, Mendelian randomization allows causal relationships between exposures and outcomes to be assessed with less susceptibility to confounding than conventional observational analyses, which can help inform clinical decisions. This approach may therefore provide scientific evidence for more effective prevention of and intervention in Alzheimer's disease.
In the sensitivity analysis (Fig. 12b), the central estimates for vascular inflammation and mood fluctuations were greater than 0, suggesting a relatively robust positive association of these exposures with Alzheimer's disease. Further analysis of the exposure factors produced a forest plot showing that, under the inverse variance weighted (IVW) method, worsening mood fluctuations were associated with an increased risk of Alzheimer's disease. By contrast, the leave-one-out plot indicates that the results for depression are unstable, and the box plot shows that the results for vascular inflammation are of lower significance (Fig. 12a, c). Additionally, the funnel plot (Fig. 12d) suggests that the analysis is consistent with the random allocation of alleles described by Mendel's second law (the law of independent assortment), on which Mendelian randomization relies.
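The IVW estimate and the leave-one-out check mentioned above can be sketched with the standard fixed-effect formulas: for SNP-exposure effects b_x, SNP-outcome effects b_y, and outcome standard errors se_y, the estimate is sum(b_x b_y / se_y^2) / sum(b_x^2 / se_y^2), with standard error 1 / sqrt(sum(b_x^2 / se_y^2)). Random-effects variants and the software used in this study are not shown; the SNP summary statistics below are hypothetical.

```python
import math


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted causal estimate and its SE."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx * bx / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, 1.0 / math.sqrt(den)


def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """IVW estimates recomputed with each SNP removed in turn."""
    results = []
    for i in range(len(beta_exposure)):
        keep = [j for j in range(len(beta_exposure)) if j != i]
        results.append(ivw([beta_exposure[j] for j in keep],
                           [beta_outcome[j] for j in keep],
                           [se_outcome[j] for j in keep]))
    return results


# Hypothetical SNP summary statistics (log-odds scale for the outcome).
bx = [0.1, 0.2, 0.3]
by = [0.2, 0.4, 0.6]
se_y = [0.05, 0.05, 0.1]
beta, se = ivw(bx, by, se_y)
print(round(beta, 6), round(math.exp(beta), 4))  # 2.0 7.3891 (odds ratio)
```

A leave-one-out estimate that shifts markedly when one SNP is dropped signals that the overall result depends on a single instrument, which is the instability described above for depression.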
Mendelian randomization helps control for confounding and minimizes random differences between comparison groups. Because alleles are randomly allocated at conception, grouping individuals by genotype resembles the random assignment of a randomized controlled trial: it balances potential confounders and random errors between groups, so that differences in the outcome can be attributed to the exposure rather than to other factors. This reduces the influence of confounding on the results, improves their reliability and comparability, strengthens statistical inference, and supports causal conclusions about the relationship between exposures and outcomes, which can then be tested formally (41).