Integrated single-cell analysis unveils diverging immune features of COVID-19, influenza, and other community-acquired pneumonia

The exact immunopathophysiology of community-acquired pneumonia (CAP) caused by SARS-CoV-2 (COVID-19) remains clouded by a general lack of relevant disease controls. The scarcity of single-cell investigations in the broader population of patients with CAP renders it difficult to distinguish immune features unique to COVID-19 from the common characteristics of a dysregulated host response to pneumonia. We performed integrated single-cell transcriptomic and proteomic analyses in peripheral blood mononuclear cells from a matched cohort of eight patients with COVID-19, eight patients with CAP caused by Influenza A or other pathogens, and four non-infectious control subjects. Using this balanced, multi-omics approach, we describe shared and diverging transcriptional and phenotypic patterns—including increased levels of type I interferon-stimulated natural killer cells in COVID-19, cytotoxic CD8 T EMRA cells in both COVID-19 and influenza, and distinctive monocyte compositions between all groups—and thereby expand our understanding of the peripheral immune response in different etiologies of pneumonia.


Introduction
The devastating impact of COVID-19 on both a global and interpersonal level is unprecedented in living memory. Patients with severe COVID-19 -caused by the SARS-CoV-2 virusdevelop bilateral pneumonia, profound systemic inflammation, and organ failure reminiscent of sepsis [1][2][3] . Within one year of its discovery, the world has seen hundreds of thousands excess deaths 4 .
Progression to life-threatening disease is mediated by a dysfunctional and excessive host immune response to SARS-CoV-2 5 . Distinct patterns are beginning to emerge in the characteristics of this immune response in hospitalized patients: lymphopenia 1,6 , a role for activated and exhausted T cells 7,8 , hyperinflammatory monocytes with reduced antigen presenting capacity 9,10 , delayed or dysfunctional interferon responses [11][12][13] , and expansion of plasmablasts and suppressive immature neutrophils in severe diseas 9,14 .
Thus far, most reports on deep immunophenotyping of patients with COVID-19 have been variable (both within and between studies) in important determinants of the immune response, such as disease severity, time of sampling, and age and sex of the participants. Key interventions in intensive care units, such as mechanical ventilation and vasopressors, exacerbate this problem through rapid and profound immunomodulatory effects 15 . These factors make it difficult to separate the true signal of COVID-19 immunopathophysiology from the noise of methodological heterogeneity and the potential distortions of confounding and bias.
Crucially, despite early observations that many of the immune disturbances, clinical symptoms and organ dysfunctions in COVID-19 can also occur in other infections, most studies lack disease controls 3,16 . The direct comparison of COVID-19 with another, well matched infectious disease state is urgently needed to distinguish truly specific immune features from the common characteristics of a dysregulated host response to infection. However, while several studies 4 reported on the immunological response during COVID-19 pneumonia at the single cell level, such investigations are currently lacking for the broader population of community-acquired pneumonia (CAP) 9,[11][12][13]17,18 .
Here, we performed CITE-Seq (Cellular Indexing of Transcriptomes and Epitopes by Sequencing 19 ) on peripheral blood mononuclear cells (PBMCs) from age-, sex-and disease severity-matched patients with CAP caused by either SARS-CoV-2, Influenza A, or other pathogens admitted to a non-intensive care ward. By integrating single cell RNA-sequencing with highly multiplexed surface protein marker detection -akin to classical flow cytometrywe create a high-resolution snapshot of cellular phenotypes and functional states, offering insight into the peripheral immune features of different etiologies of pneumonia.

Patient characteristics and clinical outcomes
We profiled PBMCs from eight patients hospitalized for CAP caused by SARS-CoV-2 (henceforth referred to as COVID-19), eight patients hospitalized for CAP caused by either Influenza A or other pathogens (all sampled before the start of the COVID-19 pandemic) and four non-infectious control subjects visiting the outpatient clinic without signs of acute infection (Table 1 for summary data, Supplementary Table 1 for individual patient data). We matched the groups for age, sex, and -the patients -for disease severity using the Modified Early Warning Score 20,21 . Samples from patients were obtained within 48 hours of admission to the general hospital ward, none of the patients received systemic corticosteroid therapy prior to sampling. Patients with COVID-19 reported a longer duration of symptoms prior to hospital admission. Lymphocyte counts were reduced in both patient groups but not significantly different, whereas neutrophil counts were significantly higher in patients with non-COVID-19 CAP. All patients classified as COVID-19 had a positive PCR for SARS-CoV-2. In the patients without COVID-19, three had a positive PCR for Influenza A -one of whom also had a positive blood culture for Streptococcus pneumoniae -and one patient had a positive sputum culture for Haemophilus influenzae. In subsequent analyses, we refer to the three patients infected with Influenza A as CAP-flu, and the remaining five patients as CAP-other. None of the patients were admitted to the intensive care unit during their hospital stay, one patient with COVID-19 died in the hospital as a result of the disease.

Highly cytotoxic CD8 EMRA-like T cells and type I interferon-stimulated NK cells characterize COVID-19 patients
An overview of the experimental setup is depicted in Fig. 1a. Post quality control we analyzed 16,192 cells from all profiled samples. Throughout this study, we infer disease-specific effects 6 in part by examining the transcriptional states of cell clusters that are proportionally expanded within a disease group. We evaluated proportional differences in cell clusters between groups with the Pearson's residual of the chi-squared test, in which comparisons with more than two residuals of difference were considered biologically relevant. All gene expression analyses were corrected for multiple testing using the Benjamini-Hochberg (BH) method, with significance defined throughout as an adjusted p <0.05. When compared with age-and sex-matched non-infectious control subjects, patients with COVID-19 exhibited stark differences in the proportional composition of cell clusters, including a decrease in monocytes and memory B cells, and an increase in naive B cells (Figs. 1c,d). While some lymphocyte clusters were relatively decreased, we found a striking proportional increase in a specific T cell cluster and NK cells in patients with COVID-19. This expanded T cell cluster mainly expressed low levels of surface CD27 and variable levels of CD45RA ( Supplementary Fig. 1d), resembling the terminally differentiated effector memory re-expressing CD45RA (EMRA) phenotype as defined by classical flow cytometry (CCR7-CD27-CD45RA+ 22 ). In fact, these cells (that we will henceforth refer to as CD8 EMRA-like T cells) formed a cluster almost entirely composed of cells from patients with COVID-19 ( Fig.   1c and Supplementary Fig. 1e). As the CD8 EMRA-like T cell and NK cell clusters showed the most prominent proportional increase in COVID-19 when compared with non-infectious controls (Fig. 1D), we next sought to uncover the identity and transcriptional state of these clusters.
In all T and NK cell clusters, we examined the expression of canonical genes and the top DEGs derived from comparing the CD8 EMRA-like and CD8 EM (effector memory) clusters ( Fig. 1e for selected genes, Supplementary Fig. 2a for the full list of DEGs). At the transcript level, the EMRA-like T cells expressed low levels of the canonical markers IL7R (encoding for CD127) and CCR7. The CD8 EMRA-like T cells exhibited remarkably high expression of genes related to cytotoxicity and killing -seemingly on par with NK cells -such as GZMB, GZMH, PRF1, and NKG7. These cells were also marked by the high expression of ITGB1 (CD29), which has recently been described as a marker of T cells with increased cytotoxic potential 23 .
We next examined the top DEGs between the CD8 EMRA-like T cell and NK cell clusters. The NK cell cluster was enriched in genes corresponding to type I and, to a lesser extent, type II interferon signaling pathways (Supplementary Figs. 2b,c). We then compared the expression levels of genes within the NK cell cluster between patients with COVID-19 and control subjects ( Fig. 1f) and discovered marked upregulation of pathways related to a type I interferon response and other antiviral responses in patients with COVID-19 (Biological Process Gene Ontology pathway analysis; Fig. 1g). Transposing this NK cell cluster-derived type I interferon response signature to the other cell clusters revealed a broad pattern of type I interferon responses across all clusters, statistically significant in several clusters, including NK cells (two-sided Wilcoxon rank-sum test p <0.05; Fig. 1h). Taken together, when compared with matched non-infectious   8   controls, patients with COVID-19 exhibited a marked proportional increase in CD8 EMRAlike T cells and type I interferon-stimulated NK cells, both with  Patients with CAP-other had higher levels of the CD8 EM and naïve T cell clusters (Fig. 2c).
Both CAP-flu and CAP-other patients showed a decrease in memory B cells when compared with control subjects (Fig. 2c).
To elucidate the transcriptional states of the differentially expanded T cell subsets between CAP-flu and CAP-other patients, we examined the top DEGs between CD8 EM T and CD8 EMRA-like cells (Fig. 2d): among the top upregulated genes in the CD8 EM-cluster we identified GZMK and genes related to MHC class II (HLD-DRB1 and CD74). CD8 EMRA-like T cells were characterized by genes related to cytotoxicity and activation signals, such as GNLY, GZMB, TYROBP, and FGFBP, which were also highly expressed by NK cells. We observed To further explore the myeloid cell clusters that were differentially expanded between CAP-flu and CAP-other, we assessed the surface marker expression of CD14 and CD16 ( Supplementary   Fig. 4e). In line with their transcriptome, classical monocytes were mostly CD14+ and CD16-, whereas non-classical monocytes were CD14 dim and CD16+. Intermediate monocytes and DCs showed variable CD14 expression, and mostly low CD16 expression ( Supplementary Fig. 4e).
Analysis of protein expression identified differential expression patterns of CD11b, CD11c and CD33, which further consolidated the identity of these clusters (Fig. 2e) 24 . Differential gene expression analysis directly comparing classical with non-classical monocytes and Gene Ontology pathways corresponding to the upregulated genes are depicted in Supplementary Figs.

4f-h.
As cell numbers within the classical monocyte cluster were higher among patients with CAPflu ( Fig. 2c), we examined DEGs within this cluster between CAP-flu and control subjects (Fig.   2f). Classical monocytes from patients with CAP-flu displayed upregulation of a variety of genes involved in proinflammatory processes, such as EGR-1 25 , FKB5 26 , and AREG 27 . Classical monocytes also transcribed several genes encoding for the S100 protein family, such as S100A4, S100A8, S100A9 and S100A12. Strikingly, we observed a concurrent substantially reduced expression of genes related to MHC class II (Fig. 2f), a quintessential feature of sepsisinduced immune suppression that has been associated with secondary infections and long-term

Differences in transcriptional signature between COVID-19 and CAP-flu in T and NK cells
To ascertain differences in the T and NK cell compositions between COVID-19, CAP-flu and CAP-other, we further examined clusters within these metaclusters ( To infer functional differences in T cell subsets, we looked at the top DEGs between the two EMRA-like T cell clusters (Fig. 4c). We noted increased expression of genes related to cytotoxicity, activation and inflammation -such as GZMB, GNLY, TYROBP, HOPX, and CCL5 -in the CD8 EMRA-like 2 cluster that was expanded in CAP-flu. The contrast in expression levels of these genes was especially apparent in comparison with the CD8 EM T cell cluster predominant in CAP-other, indicating that patients with a known viral infection harbor more activated CD8 T cells.
In line with high GNLY expression in the EMRA-like 2 T cell cluster expanded in CAP-flu, we

Distinctive monocyte subsets from patients with COVID-19, CAP-flu and CAP-other
Next, we focused on the differences in monocyte clusters between COVID-19, CAP-flu and CAP-other ( Deciphering the immune response of COVID-19 from the vast amounts of data generated thus far poses a unique challenge. Whereas our knowledge of the pathophysiological basis of other diseases is often derived from slowly-accumulated data generated in controlled settings, including in vitro and animal models, the scientific community now has to distill the true signal of disease immunopathophysiology largely from rapidly expanding and highly diverse clinical studies utilizing a variety of methods. A multi-dimensional snapshot obtained via -omics technologies is only useful if its target -and thus to whom the immunological findings may generalize -is clearly defined. We aimed to capture an undistorted view of the immune disturbances in hospitalized patients with COVID-19 (and other etiologies of CAP) by only including patients in a non-intensive care ward within 48 hours of hospital admission. With a matched and restricted design, we attempted to limit confounding by important determinants of the immune response, such as disease severity, age and sex, and invasive therapeutic interventions.

15
Despite overall lymphopenia in our population of patients with COVID-19, consistently described in earlier studies 5 , we report a proportional increase in cells that resemble CD8 EMRA T cells (referred to as the CD8 EMRA-like T cell clusters) with a pronounced cytotoxic transcriptional signature. Importantly, CAP-flu -but not CAP-other -was associated with a similar CD8 EMRA T cell response, suggesting that this is induced by viral infection in general rather than by SARS-CoV-2 specifically. CD8 EMRA T cells effectively kill virus-infected cells, have a varying proliferation capacity, and (together with CD8 EM T cells) represent the predominant phenotype in circulating CD8 T cells specific for respiratory viruses in adults 22,32,33 . Other studies in COVID-19, either overall or on antigen-specific CD8 T cells, similarly reported on CD8 T cells that were highly activated 8,11,34,35 , had increased cytotoxic potential 11,35 , and exhibited characteristics of EMRA (or sometimes EM) cells 8,34,36,37 . These T cells may be clonally expanded in moderate, but not severe, disease 11,38 . Taken together, these findings may point towards an important role for cytotoxic T cells with EMRA-like features in controlling the virus both in influenza and COVID-19 pneumonia, although it would be informative to contrast these findings with the T cell response in infections that do not require hospitalization.
Whereas T cell lymphopenia has been a ubiquitous finding, the effect of COVID-19 on NK cell numbers is more ambiguous, with cytopenia often limited to the most severe cases 11,13,35,39 . NK cells are innate lymphoid cells generally critical in the early stages of controlling viral infections, but may also contribute to immunopathology 40 . We report a proportional increase in highly cytotoxic NK cells with a type I interferon transcriptional signature in COVID-19.
Patients with CAP-flu -but not with CAP-other -also displayed an expansion of NK cells when compared with non-infectious control subjects, but this was less pronounced when compared directly with COVID-19. While the cytotoxic and interferon-responsive transcriptional features are in line with earlier findings, an expansion of NK cells has not been described in previous COVID-19 single-cell reports 11,13,18 . However, Wilk et al. reported that CD56 dim NK cells -key in the cytotoxic antiviral response -were maintained in moderate disease, whereas the CD56 bright NK cells (prone to produce pro-inflammatory cytokines such as IFN-γ) were depleted in both moderate and severe disease 13 . COVID-19 has been hypothesized to either induce hyperresponsive NK cells that exacerbate lung-destructive inflammation through production of IFN-γ; or to hamper effective viral clearance by NK cells, such as through suppressing type I interferon responses and reducing circulating cell numbers 40 . In influenza, depletion of NK cells has been implicated only in severe disease, with normal or even increased NK cell counts after mild disease or vaccination 41,42 . While our results partly deviate from previous reports, they could indicate a relatively adequate NK cell response in this population of non-intensive care unit patients with COVID-19 and CAP-flu.
Monocytes are key players in the innate immune response to pathogens, and have been reported to respond divergently to different types of RNA viruses 43  In the NK cells and (to a lesser extent) non-classical monocytes of patients with COVID-19, we found remarkably high expression of interferon-induced transmembrane protein 3 (IFITM3). IFITM3 plays an essential role in restricting viral replication in CAP-flu 49 , highlighted in a seminal paper demonstrating that, when infected with an influenza virus of limited virulence, mice lacking IFITM3 developed severe pulmonary inflammation 50 . The exact role of IFITM3 in COVID-19 still needs to be elucidated, but a recent study found explicit upregulation of IFITM3 in SARS-CoV-2 infected epithelial cells 51 , and a polymorphism of IFITM3 has been associated with disease severity and hospitalization 52,53 . Our results underline a potential role of IFITM3 in COVID-19, although mechanistic evidence will be decisive.
Immune suppression is a hallmark of the immune dysregulation that characterizes sepsis 28,29 .
Interestingly, we found a pronounced reduction of HLA-DR expression on classical monocytes in CAP-flu when compared with controls at both the transcriptional and the protein level. When directly comparing the three disease groups, HLA-DR protein expression on classical monocytes in COVID-19 was on par with CAP-flu, and both were lower than the expression in CAP-other. Immune suppression, as part of a severely dysregulated host response, has been Strengths of this study include the application of the novel CITE-seq technique, directly integrating proteomic surface marker and transcriptomic data on a single-cell level. This enabled us to readily validate transcriptome-derived clusters and certain disease-related transcriptional patterns at the protein level. Furthermore, the design of this study (in terms of matching and sampling within 48 hours after hospital admission) and the inclusion of disease controls strengthen the observations. Our investigation has several limitations. Experimentally, the total number of analyzed cells was at the lower end of standard practice, which was mainly due to our multiplexed and integrated design. Differences in patient characteristics, such as body mass index and duration of symptoms prior to admission, were largely representative of expected differences between the patient populations 5 . The causative organism in CAP is typically only identified in approximately 50% of patients 57,58 . While in this respect our (non-COVID-19) CAP cohort is representative of a general CAP population, patients with CAPother likely are a more heterogeneous group than patients with a pneumonia caused by one pathogen. Finally, one subject in our cohort was co-infected with both Influenza A and S. pneumoniae. We nevertheless decided to include this patient to create a realistic representation of patients with CAP-flu, as up to 30% of laboratory-confirmed pneumonia cases can involve viral and bacterial co-infections (most commonly Influenza and S. pneumoniae) 58,59 .
Collectively, this investigation provides insight into the peripheral immune features of CAP, including COVID-19, at the single-cell level in patients admitted to a general hospital ward. By contrasting multiple commonly encountered forms of CAP in a matched design that limits the influence of major confounding factors, we provide a framework for the roles of T cells, NK cells and monocytes in the immunopathophysiology of CAP. This knowledge could guide future mechanistic studies seeking pathogen-specific interventions. 20

Subjects and sample collection
All individual subject data, such as age, sex and comorbidities can be found in Supplementary   Table 1

PBMC isolation and storage
For all patients and controls, heparin anticoagulated whole-blood was processed within 4 hours of sampling. PBMCs were separated by density gradient centrifugation using Ficoll-Paque Plus medium (GE Healthcare Life science, Little Chalfont, UK) and washed twice, first with cold PBS and then with cold PBS supplemented with 0.5% sterile endotoxin-free bovine serum albumin (BSA) (Divbio Science Europe, Breda, the Netherlands). PBMCs were resuspended in PBS containing 0.5% BSA and 2 mM EDTA and the number of PBMCs was determined using a Coulter Counter (Beckman Coulter, Woerden, Netherlands). PBMCs were resuspended in IMDM medium containing 20% filter-sterilized fetal calf serum and pen/strep, after which an equal part of the same medium containing 20% DMSO was slowly added while continuously stirring and working on ice. 3-5 million PBMCs were viably stored in 1.8 ml cryogenic vials (Corning #430388), which were slowly brought to -80ºC. After 24-72 hours, the cells were transferred to liquid nitrogen storage until further analysis.

Sorting and staining
After PBMCs were thawed, Fixable Viability Dye kit (eBioscience, San Diego, CA, USA) was used to assess cell viability by FACS analysis (> 90% in all samples). 500,000 viable singletevents were sorted per sample using a Sony SH800 Cell Sorter (Sony Biotechnology, San Jose, CA, USA). The sorted-cells were incubated with Fc blocker (CD16/CD32, eBiosciences) for 10 minutes, after which each sample was incubated with TotalSeq Hashtag antibody tags to enable multiplexing and subsequent deconvoluting. After Hashtag staining the cells were pooled into four pools of five samples, each sample contributing equally to their respective pool. Cells were counted manually by light microscopy and Neubauer chambers. Each pooled sample was then incubated for 30 minutes with our TotalSeq oligo-conjugated antibody panel for later surface protein marker quantification.

Single-cell library generation
Libraries were generated using the Chromium Single Cell 5′ Library & Gel Bead Kit v1.1 (10X genomics, Pleasanton, CA) following the manufacturer's instructions. In short, pooled cells were loaded aiming the capture of 10.000 cells per pool. Libraries were generated according to standard protocol (Chromium Next GEM Single Cell V(D)J Reagent Kits v1.1 rev E). The amplified mRNA and antibody-derived tags were divided by size and sequenced separately.

Sequencing
Samples were sequenced using a Hiseq4000 150PE mode. Each position from the 10x chip was loaded into 1 HiSeq 4000 lane. All the four antibody derived libraries were pooled together and sequenced in 1 HiSeq4000 lane.

mRNA, hashtags and antibody alignment
All the libraries were aligned using CellRanger 3.1 (10x genomics). For the hashtags and antibody tags we inputted their sequences as provided by Biolegend (San Diego, California, USA) and tag structure as informed on the Cellranger website.

Cell deconvolution
To deconvolute the cells belonging to each sample we used the R package Seurat (v3) 61 . The outputs derived from CellRanger were used to create two separate objects (one with the transcriptome alignment and one with the antibody plus hashtags (HTO) alignment). Initial objects were created using the function "Read10X". We filtered both objects based on the cell barcode to keep only cells which were identified in both the transcriptome and in the antibody alignments. After this cell filtering, we used the function "CreateSeuratObject" to create a transcriptome-based Seurat object. The antibody derived data was filtered to maintain only the hashtag counts; later it was appended as a specific assay using the "CreateAssayObject" function. Antibody reads were normalized using the CLR method. For cell demultiplexing we used the function "HTODemux" with the parameters "init=9", "nsamples=10000" and "positivie quantile" ranging between 0.999 and 0.9999999 per each sample, in order to maximize the number of singlets detected. Individual single cells were finally filtered based on their assigned "HTO_classification.global"= "Singlet".

Antibody quantification and normalization
Antibody data was normalized using the Seurat function "NormalizeData"with the parameters "normalization.method" = "CLR" and "margin"="2", to indicate a normalization across cells.

Quality control
Cells were filtered based on number of features (nfeature_RNA), number of genes (nCount_RNA) and percentage of mitochondrial reads (percent.mt) using Seurat

Clustering and visualization
Principal components for each set of cells as shown in individual Figures were identified using the "RunPCA" function. Cells were then further processed for clustering and visualization using the Seurat functions "FindNeighbors", "FindClusters"and "RunUMAP". The number of principal components used as input were determined by using the "ElbowPlot" function and identifying the number of the PCs which explain most of the data variance.

Quantification and statistical analysis
All statistical analyses were performed using R version 3.6.3. Enrichment analysis were done by first generating contingency tables with the cell distributions per cluster. Chi-squares for the contingency tables were calculated ("chisq <-chisq.test(table)", the residuals rounded "round(chisq$residuals, 4)" and correlation plots generated "corrplot(chisq$residuals, is.cor = FALSE, col=rev(brewer.pal(n=8, name="RdBu")), tl.cex = 1, cl.pos="r", cl.ratio=1, cl.length=10)" with the graphical parameters shown adjusted per cluster. Comparisons with more than two residuals of difference at least were considered biologically relevant. In the box 25 and whisker plots data was represented with a median line and a box indicating the interquartile range, with individual data points shown (Figs. 1h, 2h, and 4h). Significance was defined as p <0.05. Statistical significance was determined using either the two-sided Wilcoxon rank-sum test (Fig. 1h) 3 Modified Early Warning AT-II = angiotensin II; CAP = community-acquired pneumonia; CURB-65 = confusion, blood urea nitrogen, respiratory rate, blood pressure, age 65 or older; COPD = chronic obstructive pulmonary disease; qSOFA = quick sequential organ failure assessment score.