Meta-analysis of single-cell RNA-seq data reveals phenotypic switching of immune cells in severe COVID-19 patients

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has resulted in the global coronavirus disease 2019 (COVID-19) pandemic. Despite several single-cell RNA sequencing (RNA-seq) studies, conclusions cannot be reached owing to the small number of available samples and the differences in technology and tissue types used in the studies. To better understand the cellular landscape and disease severity in COVID-19, we performed a meta-analysis of publicly available single-cell RNA-seq data from peripheral blood and lung samples of COVID-19 patients with varying degrees of severity. Patients with severe disease showed increased numbers of M1 macrophages in lung tissue, while the number of M2 macrophages was depleted. Cellular profiling of the peripheral blood showed a marked increase of CD14+, CD16+ monocytes and a concomitant depletion of overall B cells and CD4+, CD8+ T cells in severe patients when compared with moderate patients. Our analysis indicates the presence of faulty innate-to-adaptive switching, marked by a prolonged innate immune response and a dysregulated adaptive immune response in severe COVID-19 patients. Furthermore, we identified cell types with a transcriptome signature that can be used as a prognostic biomarker for disease state prediction and the effective therapeutic management of COVID-19 patients.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected millions of people and caused more than 2 million fatalities worldwide, resulting in the coronavirus disease 2019 (COVID- 19) pandemic. Most COVID-19 patients have mild to moderate symptoms, but approximately 20% develop more severe systemic inflammation accompanied by acute respiratory distress syndrome (ARDS), often leading to death [1,2]. Multiple studies have implicated the immune response in severe cases, which plays a crucial role in determining the outcome of COVID-19 patients [3][4][5][6][7][8][9][10][11]. However, these studies were heavily reliant on either the local or peripheral response and included limited numbers of patient samples. Thus, an understanding of the local and peripheral immune landscape in relation to disease severity and progression is required and would help improve the therapeutic management of COVID-19 symptoms.
Recent single-cell RNA sequencing (RNA-seq) studies of severe COVID-19 patients have provided evidence of a reduced lymphocyte count and higher numbers of inflammatory myeloid cells [3][4][5]. Other studies have found that severe COVID-19 patients have abundant pro-inflammatory monocytes or monocyte-derived macrophages in bronchoalveolar lavage fluid (BALF) [6][7][8]. Older COVID-19 patients exhibit more severe symptoms compared with children and young adults. However, the molecular mechanisms that underlie the imbalanced host response leading to disease severity remain unclear.
Innate immune cells act as the first line of defense against invading viruses and clear the infection by producing type I interferon (IFN) [12]. Severe and terminally ill patients exhibit higher levels of cytokines and chemokines, such as IL6, IL36G, CXCL2, CXCL10, CCL2, CCL3, and CCL5. This finding points toward an overactive innate immune cell-mediated cytokine storm, leading to the development of ARDS. However, the type I IFN-mediated innate immune response is reportedly impaired and delayed [13], suggesting a dynamic shift in the overall immune response.
The adaptive immune response controls the excessive innate immune response; thus, lower lymphocyte counts in severe COVID-19 patients partly explain the observed hyperactivated innate immune response [14]. Furthermore, recent studies reported that a lower lymphocyte count and a reduced ratio of lymphocytes in the blood contribute to the increase in myeloid cells in the blood and BALF [15,16]. Indeed, older patients aged 60 years or more exhibit low numbers of CD4 + and CD8 + T cells [17], whereas children with SARS-CoV-2 infection have normal lymphocyte counts [18]. Therefore, we posit that this weakened adaptive response, along with a hyperactivated innate immune inflammatory response, may increase the fatality rate.
To understand the immune landscape in the lungs and blood of COVID-19 patients, we collated single-cell RNA-seq datasets from healthy controls and moderate, severe, and fatal cases. Our metaanalysis revealed phenotypic switching of immune cell types and proportionality in the lung and blood of severe cases. Principal component analysis (PCA) of important cell types could discriminate disease severity. Furthermore, cells that were strongly associated with severe disease or mortality had high expression levels of inflammatory genes.

Ethics statement
All required ethical guidelines were followed, and ethics committee approvals were obtained by the original study groups.

Datasets
Single-cell RNA seq raw datasets were collected from the gene expression omnibus (GEO) of the National Center for Biotechnology Information database. GSE163668 [19] and GSE166992 [20] were used for the analysis of blood samples, while GSE168215 [21], GSE169471 [22], and GSE145926 [6] were used for lung samples. GSE157344 [23] was used for both blood and lung samples. Briefly, patients who were not admitted to an ICU for longer than 3 days and did not require mechanical ventilation/intubation were designated as moderate patients. Patients admitted to an ICU for a longer period and/or required mechanical ventilation/intubation were designated as severe patients.

Single-cell data analysis
Seurat (version 4.0.2) in the R program environment (4.0.1) was used for the data quality control and analysis, as comprehensively outlined by the package developer [24] and others, owing to its fast processing time and ability to integrate multiple datasets across platforms [25]. Briefly, Seurat objects were created from individual expression matrices. Unique molecular identifier (UMI) counts were scaled by library size and a natural log transformation. Gene counts for each cell were divided by the total UMI count of that cell, scaled by a factor of 10, 000, and then transformed via the natural log plus 1 function, "Nor-malizeData". The normalized data were further scaled with the "Scale-Data" function so that the mean expression across cells was 0 and the variance was 1. PCA was used with the "RunPCA" function in Seurat to reduce the dimensionality of the data by clustering similar cells from different datasets. Next, we identified anchors across datasets using the "FindIntegrationAnchors" function in Seurat by embedding cells in a k-nearest neighborhood-based approach to identify mutual neighbors from different datasets and scored them based on their mutual nearest neighbors. Noise and batch effect variances among the datasets were taken into consideration by using reference cells from each dataset. These anchors were then used to integrate data across datasets using the "IntegrateData" function. The uniform manifold approximation and projection (UMAP) method [26] was used to visualize high-dimensional cellular data in an easy and comprehensive manner. Differentially expressed genes (DEGs) were identified using the default "FindMarkers" function in Seurat based on the non-parametric Wilcoxon rank-sum test.

Cell type annotation
The automated cell annotation program scCATCH (version 2.1) [27] was used in the R environment. scCATCH uses paired comparisons and evidence-based scoring to identify potential marker genes and annotate clustered cells.

Gene ontology analysis
Metascape [28] (web version) was used to assess the functional enrichment of DEGs. Metascape queries publicly available databases, such as Gene Ontology, Kyoto Encyclopedia of Genes and Genomes [29], and assigns DEGs to their respective enriched pathways by calculating the pairwise similarity between any two terms. The hypergeometric test and Benjamini-Hochberg p-value correction algorithm were used to identify significantly enriched ontology terms.

Principal component analysis
We performed PCA on selected gene expression changes (log2 fold change) of target cell types to distinguish the COVID-19 disease states (i. e., control, moderate, severe, deceased). The PCA involved six steps. In step 1, we standardized the expression changes in the target feature gene list by calculating the mean and standard deviation. In step 2, the covariance matrix for the feature genes was calculated. In step 3, the eigenvalues and eigenvectors for the covariance matrix were calculated. In step 4, eigenvalues and their corresponding eigenvectors were sorted. In step 5, k = 2 eigenvalues were selected to form a matrix of eigenvectors. In step 6, the original matrix was transformed (i.e., feature matrix * top k eigenvectors = transformed data). In this analysis, we selected the first two PCs because the variances explained by them were significant for evaluating the disease state. Loading plots of the first two PC coefficients for the various disease states showed that the gene expression changes (log2 fold change) widely varied among the moderate, severe, and deceased COVID-19 patients and healthy controls. Fig. 1 presents an overview of the dataset curation and analysis. We assigned the datasets into healthy control, moderate, severe, and deceased groups. The datasets, comprising several individuals, were combined and integrated for each group (Table 1 and Table 2). Visualization using UMAP showed distinct differences between the COVID-19 patient groups and the healthy controls. Moderate COVID-19 cases were comparable to the controls in terms of the presence of immature transitional B cells and CD8 + T cells, whereas these cell types were depleted in severe and fatal cases ( Fig. 2A and Fig. 2B). Moderate cases had a slight increase of CD14 + monocytes and memory B cells ( Fig. 2A and B). We did not observe any non-classical CD16 + monocytes in healthy controls or moderate cases; however, CD16 + monocytes were increased in severe cases and constituted approximately 50% of the total cell ratio in fatal cases. These data indicate an imbalance between lymphocytes and monocytes in the blood of severe COVID-19 cases. Our analysis also revealed significant alterations in B-cell subsets. In moderate cases, we detected transitional B cells and memory B cells, which constituted approximately 50% of the total cell population. In contrast, these B-cell subsets accounted for only 1%-4% of the total cell population in severe and fatal cases (Fig. 2B). Interestingly, we observed platelets only in fatal cases ( Fig. 2A), suggesting a role of platelets in blood clot formation, which may be a decisive factor in the outcome of SARS-CoV-2 infection.

Macrophage class switching was evident in severe COVID-19 patients
The lungs are the most severely affected organ in SARS-CoV-2 infection, and therefore we focused our analysis on the variability of the cellular landscape in lung. M2 macrophages resolve inflammation, and are known as anti-inflammatory macrophages, and were abundant in the lungs of healthy controls (Fig. 3A and Fig. 3B). The total number of M2 macrophages was decreased in moderate, severe, and fatal cases, and the number of M1 (pro-inflammatory) macrophages increased with the progression of disease severity (Fig. 3A and B). The number of airway secretory cells was increased in severe cases and accounted for approximately 45% of the total cell ratio in fatal cases (Fig. 3B).

Gene set enrichment analysis showed a lack of adaptive response in the blood of fatal cases
The cellular landscape was altered in SARS-CoV-2 infection, and the dynamic shift was associated with the degree of COVID-19 severity. We next analyzed whether the functional properties of the investigated cells were also altered. Gene set enrichment analysis (GSEA) was performed with the top 30 genes of the cells that demonstrated a large dynamic shift, either positively or negatively. GSEA showed that moderate cases had an adaptive response in immature transitional B cells, whereas these cells were absent in severe and fatal cases (Fig. 4A). In moderate cases, the CD8 + T cells showed a similar pattern as in healthy controls, with a nearly identical adaptive response (Fig. 4B). However, the CD8 + T cells in severe and fatal cases did not express genes involved in the adaptive immune response. These data demonstrate the requirement for a lymphocyte-mediated adaptive response for the successful resolution of SARS-CoV-2 infection.
Interestingly, CD16 + monocytes were not found in healthy controls. However, CD14 + monocytes did not exhibit much difference among the disease states, apart from a milder response in terms of positive regulation of the defense response in severe and fatal cases (Fig. 4C). The GSEA of CD16 + monocytes closely clustered severe and fatal cases and indicated a strong response of the lipopolysaccharide-like phenotype and neutrophil migration (Fig. 4D). This could lead to prolonged inflammation and the development of a sepsis-like condition.

Fatal cases showed dysfunctional surfactant metabolism in the lung
Intriguingly, airway secretory cells accounted for almost half of the total cell population in the lung of fatal cases (Fig. 3B). The GSEA of airway secretory cells identified dysfunctional surfactant metabolism in all COVID-19 patient groups, whereas airway secretory cells from healthy controls were enriched with genes involved in surfactant metabolism (Fig. 5A). We were unable to detect M1 macrophages in healthy controls; by contrast, M1 macrophages were increased in severe cases. The GSEA showed that M1 macrophages activate antigen processing and presentation (Fig. 5B). Interestingly, M2 macrophages demonstrated a similar phenotype of antigen processing and presentation via MHC class II in moderate, severe, and fatal cases (Fig. 5C). Macrophages that were positive for FCGR3A and FCGR3B were unable to initiate phagocytosis in severe and fatal cases (Fig. 5D). This not only indicates that the cellular landscape was altered but that cellular function was also impaired by SARS-CoV-2 infection.

Principal component analysis could discriminate COVID-19 disease states
Functional alterations of the cellular landscape were evident and may decide the outcome of patients infected with SARS-CoV-2. We next investigated whether differential cellular landscapes and gene expression patterns could distinguish between disease states. We performed PCA on significantly altered cell types individually. The PCA of immature transitional B cells shifted directions in accordance with gene expression (Fig. 6A). Immature transitional B cells exhibited almost identical gene expression in healthy controls and the moderate patient group (Fig. 6A). CD8 + T cells were not present in fatal cases, but the gene expression patterns could discriminate between the healthy controls, moderate cases, and severe cases (Fig. 6B). CD14 + monocytes were clearly discriminatory among the groups. Severe and fatal cases were in close proximity to each other and distinct from healthy controls and moderate cases (Fig. 6C). Auto-regulatory genes such as LAGLS1, LGALS3, and S100A4 were upregulated in the CD14 + monocytes of COVID-19 patients but not in healthy controls. Notably, the classical monocyte markers S100A8 and S100A9 were upregulated in moderate cases, but their expression levels were reduced in severe and fatal cases (Fig. 6C). Non-classical CD16 + monocytes were not present in healthy controls, but their gene expression pattern could distinguish moderate cases from severe and fatal cases. Compared with severe cases, CD16 + monocytes from moderate cases had higher expression levels of complement factors such C1QA/B/C; they also showed higher interferoninduced transmembrane protein 3 expression compared with these cells from severe and fatal cases. Severe and fatal cases distinctively expressed inflammatory chemokine CXCL8 and other inflammatory marker genes such as PTGS2, S100A12, S100A8, and MMP9 (Fig. 6D).
The airway secretory cells of healthy controls expressed numerous S100 fused-type proteins (SFTPs). The PCA of airway secretory cells clearly demonstrated that COVID-19 cases did not express these SFTP genes (Fig. 7A). M1 macrophages from moderate cases expressed IF130, but this expression diminished with disease severity. Notably, M1 macrophages from all disease states expressed several chemokines, such as CCL2, CCL7, and CXCL10 (Fig. 7B). Additionally, genes responsible for inflammation, such as APOE, CD68, GRN, human leukocyte antigen (HLA)-DQA1, HLA-DQB1, HLA-DRB1, SERPINA1, and CCL18, were induced in M1 and M2 macrophages ( Fig. 7B and C), indicating that macrophages play a predominantly inflammatory role. The PCA of macrophages revealed distinct clustering among groups. Macrophages from severe and fatal cases expressed HSPA1A, HSPB1, and HSPA6, indicating that an unfolded protein response was triggered. Furthermore, the severe and fatal cases showed higher expression levels of CCL3L1 and CCL4L2, indicating a chemokine-mediated action or neutrophil chemotaxis activation (Fig. 7D).

Discussion
Mapping the immunological cell landscape of SARS-CoV-2 infection is of great importance to help combat the COVID-19 pandemic. An enormous effort by scientists across the globe has enabled us to understand the pathological nature of this virus. However, many investigations have been directed toward specific target tissues, and a lack of synchronized effort in collecting different tissues across the disease severity spectrum makes it difficult to form a complete picture. Here, we provide a comprehensive understanding of the varying disease states based on the cellular landscape of blood and lung in moderate, severe, and fatal COVID-19 cases.
Cross-sectional data analysis of moderate, severe, and fatal COVID-19 cases enabled us to identify links between the cellular landscape and disease outcome. For instance, we observed the abundant presence of lymphocytes in the blood of moderate cases, while B-cell subtypes and CD8 + T cells were largely missing in severe and fatal cases. This was also evident in previous studies [30][31][32] and in cases of SARS-CoV-1 infection [33]. T cells control hyperactive innate immune responses [34]. The loss of B-cell subtypes and CD4 + and CD8 + T cells may heighten the innate immune response for a prolonged period. This is in accordance with Zhou et al. [35], who identified a heightened innate immune response in the BALF of COVID-19 patients.
We observed that increased numbers of non-classical CD14 + and CD16 + monocytes were present in severe and fatal cases. Monocytes from older adults exhibited a higher proportion of non-classical monocytes but expressed a basal level of cytokines [36,37]. Another study reported that COVID-19 patients with ARDS have a higher percentage of intermediate CD14 + and CD16 + monocytes, which constitute almost 50% of the total cell population [38]. In our study, CD14 + and CD16 + monocytes comprised approximately 13% of the total cell ratio; however, this was increased to 31% and 57% in severe and fatal cases, respectively. Interestingly, the CD14 + and CD16 + monocytes expressed the macrophage marker FCN1, suggesting monocyte differentiation into macrophages upon SARS-CoV-2 infection. Moreover, we noted that the expression of HLA-DRA/DPA1 was absent in CD16 + monocytes isolated from severe and fatal cases. Such downregulation may hamper antigen presentation to CD4 + T cells and is often associated with a sepsis-like condition [39]. We also observed that immature transitional B cells in moderate patients activated CD4 + T-cell differentiation (Fig. 4A), and convalescent COVID-19 patients exhibited a virus-specific CD4 + T-cell response [40]. This is an indication that moderate cases may have undergone proper switching from the innate to adaptive response, whereas this switching did not take place in severe cases.
To date, most severe and fatal cases occur in older adults, and the clinical symptoms in children are mostly mild [41], raising concerns of immunosenescence. Indeed, naïve CD4 + T cells and CD8 + T cells decrease as we age [42], whereas naïve CD4 + T cells obtained from gestational age 18-22 weeks differentiate into the FoxP3+, CD25 + Treg phenotype [43]. This may explain the high fatality rate in older people and supports the idea, in line with our findings, that the depletion of T cells is linked to COVID-19 severity.
We also investigated lung epithelial cells and their immune connection using cross-sectional datasets obtained from lung or BALF. A dramatic increase of airway secretory cells and M1-like macrophages was evident. In severe and fatal cases, airway secretory cells were devoid of SFTPs. This indicates a surfactant dysfunction-like disorder that may induce breathing difficulties, leading to ARDS, which often requires mechanical ventilation. Indeed, surfactant therapy is effective in reversing the impaired oxygenation in COVID-19 patients with ARDS [44]. Macrophages accounted for 50% of the total cell population in severe and fatal cases in our analysis, which is in line with previous observations in which post-mortem lungs from Middle East respiratory syndrome (MERS) and SARS-CoV-1 patients exhibited a predominant presence of macrophages [45]. Macrophages exert their effect as innate immune cells by triggering the adaptive immune response, initiating the phagocytosis process [46,47]. However, in our data analysis, macrophages from severe and fatal cases exhibited a complete decline in phagocytosis. This suggests not only quantitative changes but also qualitative changes in the immune landscape of COVID-19 patients. Notably, CD163+ macrophages have been reported as intermediate macrophages and are increased in severe COVID-19 patients [48,49]. This is in agreement with our analysis because M1 macrophages that exhibited high CD163 expression in severe COVID-19 cases accounted for 20% of the total cell population. We found that macrophages obtained from severe and fatal COVID-19 cases expressed various neutrophil and monocyte attractant chemokines, such as CCL2, CCL3, CCL4, CCL7, CCL8, CXCL8, CXCL9, CXCL10, and CXCL11. Patients requiring mechanical ventilation exhibited an extensive occurrence of these chemokines [50][51][52]. These chemokines may conceivably attract more inflammatory cells to local sites in the lung and prolong local inflammation, thereby initiating pulmonary dysfunction [53]. Thus, these chemokines can be used as prognostic biomarkers of severity, and the early detection of these markers may help in designing effective therapeutic interventions.
To the best of our knowledge, this is the first cross-sectional study using single-cell RNA-seq datasets from COVID-19 patients with various degrees of severity. We identified that the immunological cellular landscape in blood was altered in severe and fatal cases. Notably, T-cell and B-cell subsets were reduced, and the number of monocytes was increased (see graphical abstract). This resulted in an aberrant peripheral adaptive immune response. We also found that macrophages in lung acquired an inflammatory phenotype, with a high abundance of CD163+ M1-like macrophages. Our data revealed a qualitative alteration because these immune cells adopted a pro-inflammatory phenotype and secreted an extensive set of chemokines that could continue to recruit additional inflammatory immune cells to local sites. Despite these distinct findings, our study had some limitations. Owing to the limited number of severe and fatal cases in children and young adults, we were unable to draw direct comparisons between T-cell depletion and aging. Because the number of T cells may be the determining factor for successful viral clearance, future studies should focus on the relationship between T-cell numbers and disease severity among children, young adults, and those over 65 years of age. This will shed light on the differential outcomes between children and older adults with SARS-CoV-2 infection.

Data and code availability
The datasets and code used in this study are publicly available and comprehensively described in the methods section. Any other necessary information will be provided upon reasonable request.

Declaration of competing interest
The authors declare that no conflicts of interest exist.