3.1 Identification of DEGs and Merged DEGs between COVID-19 and Aging Datasets
We created a flowchart (Fig. 1A) to describe the processes involved in our study. The first step in the flowchart was the selection and grouping of our research data. The second step involved a preliminary analysis of the data to identify the differential genes that characterize the disease. The third step was to conduct further correlation analysis of differential genes using machine learning and immune infiltration. To investigate the disease characteristic genes in aging patients infected with severe COVID-19, we analyzed blood samples downloaded from the GEO database. Heatmap and volcano plots of the different expression genes in the COVID-19 and Aging datasets are shown in Supplementary materials (Figure S1A-D). Assuming that GSE164805 and GSE69832 were the training sets, we identified 7488 DEGs in the COVID-19 dataset(GSE164805)with |Log2 Fold Change| > 1.0 and | adj.P.Val. | < 0.05 (Supplementary Table 1). In the Aging dataset༈GSE69832༉, we identified 74 DEGs with |Log2 Fold Change| ≥ 0.585 and | adj.P.Val. | < 0.05 (Supplementary Table 2). Heatmap and volcano plots depicting the DEGs of COVID-19 and Aging are shown in Supplementary Figure S1A, B and Figure S2A, B respectively. GSE180594 was assumed as the verification set, and we identified 49 DEGs with |Log2 Fold Change| ≥ 0.585 and | adj.P.Val. | < 0.05 (Supplementary Table 3). Heatmap and volcano plots depicting the DEGs in the verification set GSE180594 are shown in Supplementary Figure S3A, B. The subsequent heatmap and volcano plots revealed that 8 DEGs were common to both COVID-19 and Aging datasets (Fig. 1B, C; Supplementary Tables 4, 5). Our findings suggest that Aging and COVID-19 share 8 closely related genes (ALOX12, CXCR1, C1QB, IL1B, IL23A, MX2, PTGER4) that may be involved in the disease processes.
3.2 Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of disease characteristic gens.
In order to explore the biological characteristics and pathway information of the eight disease characteristic genes related to aging severe COVID-19, we conducted GO and KEGG enrichment analysis. Our study revealed that the 8 disease characteristic genes were enriched in several biology process (BP) pathways including myeloid leukocyte migration, granulocyte chemotaxis/migration, and neutrophil chemotaxis/migration (Fig. 2A). In terms of cellular component (CC), these genes were enriched in the function of nuclear envelope (Fig. 2A). Furthermore, in molecular function (MF) analysis, the eight genes were found to be enriched in cytokine activity and cytokine receptor binding (Fig. 2A). The GO circle diagram indicated that most of the differential genes were enriched in the BP biological characteristic pathway, followed by CC and MF (Fig. 2B). In addition, KEGG pathway analysis identified the top six enriched pathways to be inflammatory mediator regulation of TRP channels, pertussis, chagas disease, human cytomegalovirus infection, and coronavirus disease-COVID-19, cytokine − cytokine receptor interaction (Figs. 2C, D). These results suggest that the eight DEGs are strongly associated with the activity of certain signaling pathways that regulate the response of inflammatory cytokines located on the nuclear membrane surface of cells. This may lead to a poor prognosis in aging severe COVID-19 patients.
3.3 Diseases Ontology (DO) and Gene Set Enrichment Analysis (GSEA) enrichment analysis results of disease characteristic genes.
Our results demonstrate a strong correlation between the eight hallmarks genes and various diseases, including ankylosing spondylitis, bronchiectasis, membranoproliferative glomerulonephritis, ulcerative colitis, and end stage renal disease (Figs. 3A, B). These findings suggest that the characteristic genes may play a significant role in the pathogenesis of bronchitis, kidney immune disease and gastrointestinal immune system diseases in older patients. Furthermore, the GSEA analysis of the test group revealed that the top five enriched KEGG pathways enriched with the eight characteristic genes were chemokine signaling, cytokine-cytokine receptor interaction, ECM receptor interaction, focal adhesion, leishmania infection (Figs. 3C, D). Notably, the ES value for ECM receptor interaction, focal adhesion, and leishmania infection pathways was over 0.5.
3.4 Machine learning LASSO and SVM-RFE screen characteristic genes in aging severe COVID-19 patients.
We applied two algorithms to evaluate the diagnostic values of our screened DEGs. First, we used LASSO logistic regression algorithm to evaluate eight DEGs, resulting in the most relevant genes (Fig. 4A). Subsequently, the SVM-RFE algorithm was employed to validate these characteristic genes in the determined DEGs of LASSO logistic regression algorithm (Fig. 4B). Then Venn diagram shows 7 overlapping genes (ALOX12, C1QB, CXCR1, IL1B, IL23A, MX2, and PTGER4) obtained using the two algorithms (Fig. 4C). The expression levels of these genes were compared between control and patient groups using boxplot, and p-value less than 0.05 indicating a significant difference. Six genes showed significant differences except MX2 (p = 0.47). The up-regulated genes were ALOX12, C1QB and CXCR1, whereas IL1B, IL23A and PTGER4 were down-regulated in severe covid-19 patients compared with the control group (Figs. 4D).
3.5 Diagnostic efficacy of characteristic genes
To determine the diagnostic efficacy, ROC curves were employed to compare the expression of 7 characteristic genes in the training dataset and the validation dataset. The results of the 7 genes in the training set were as follows: ALOX12 (AUC = 0.81, [95% CI 0.646–0.935]), C1QB (AUC = 0.842, [95% CI 0.714–0.946]), CXCR1 (AUC = 0.938, [95% CI 0.857–0.991]), IL1B (AUC = 0.891, [95% CI 0.753–0.988]), IL23A (AUC = 0.946, [95% CI 0.860-1.000]), MX2 (AUC = 0.814, [95% CI 0.665–0.929], and PTGER4 (AUC = 0.991, [95% CI 0.967-1.000]) respectively(Fig. 3A). Similarly, in the validation dataset, the results were ALOX12 (AUC = 0.944, [95% CI 0.833-1.000]), C1QB (AUC = 0.817, [95% CI 0.619–0.952]), CXCR1 (AUC = 0.913, [95% CI 0.770-1.000]), IL1B (AUC = 0.813, [95% CI 0.603–0.976]), IL23A (AUC = 0.913, [95% CI 0.778-1.000]), MX2 (AUC = 0.599, [95% CI 0.373–0.810], and PTGER4 (AUC = 0.976, [95% CI 0.921-1.000]) indicating that all genes, except MX2, had strong predictive ability (Fig. 3B).
3.6 Immune cell infiltration analysis
We employed the CIBERSORT algorithm to calculate the immune cell infiltration in aging patients. The immune cell components of the aging severe COVID-19 can be seen in the bar chart (Fig. 6A). Our findings suggest that B cell activation positively correlated with B cell memory (r = 0.92) and negatively with activated NK cells activated (r = -0.6). Additionally, B cell memory is positively correlated with T cells gamma delta (r = 0.79) and negatively with NK cells activated (r = -0.69). Furthermore, activated CD4 memory T cells exhibited a negative correlation with master cells activated (r = -0.76), while T cells CD4 memory negatively correlate with dendritic cells resting (r = -0.68) and positively correlate with NK cells activated (r = 0.75) (Fig. 6B). The violin diagram demonstrated significant differences in expression levels between control groups (\(<\)65 years) and treat groups (\(\ge\)65 years) for B cells memory ( p = 0.023), T cells CD4 memory activated (p \(<\)0.001), T cells regulatory (Tregs) (p = 0.035), T cells gamma delta (p = 0.037), macrophages M0 (p = 0.037), macrophages M1(p = 0.028), dendritic cells resting (p = 0.005), mast cells resting (p = 0.002) and mast cells activated (p\(<\)0.001) (Fig. 6C). We then evaluated the expression of seven characteristic for each immune cells and depicted the corresponding p-values in the lollipop chart. (Different colors show different expression multiples, and the size of dots shows the degree of correlation, Fig. 6D). Lastly, we utilized scatter plots to analyze the association between each type of immune cell and up-regulated and down-regulated genes, respectively (Supplementary Figure S4-5).