High-resolution single-cell atlas reveals diversity and plasticity of tissue-resident neutrophils in non-small cell lung cancer

Summary Non-small cell lung cancer (NSCLC) is characterized by molecular heterogeneity with diverse immune cell infiltration patterns, which has been linked to therapy sensitivity and resistance. However, full understanding of how immune cell phenotypes vary across different patient subgroups is lacking. Here, we dissect the NSCLC tumor microenvironment at high resolution by integrating 1,283,972 single cells from 556 samples and 318 patients across 29 datasets, including our dataset capturing cells with low mRNA content. We stratify patients into immune-deserted, B cell, T cell, and myeloid cell subtypes. Using bulk samples with genomic and clinical information, we identify cellular components associated with tumor histology and genotypes. We then focus on the analysis of tissue-resident neutrophils (TRNs) and uncover distinct subpopulations that acquire new functional properties in the tissue microenvironment, providing evidence for the plasticity of TRNs. Finally, we show that a TRN-derived gene signature is associated with anti-programmed cell death ligand 1 (PD-L1) treatment failure.


In brief
Salcher et al. integrate non-small cell lung cancer (NSCLC) single-cell datasets, refine patient stratification, and reveal histological and genotypic associations with tumor microenvironment (TME) composition. Single-cell sequencing of cells with low mRNA content identifies tissue-resident neutrophil (TRN) subpopulations with non-canonical functional properties. TRN gene signature is associated with immune-checkpoint inhibitor treatment failure.
INTRODUCTION Non-small cell lung cancer (NSCLC) is a highly aggressive and heterogenous disease with diverse histological subtypes and distinct mutational signatures, 1 which accounts for an annual global death rate of 1.8 million patients. 2 The technical advances in single-cell RNA sequencing (scRNA-seq) technologies enabled the dissection of the complex NSCLC tumor microenvironment (TME) in different stages, and numerous scRNA-seq NSCLC studies have identified a hitherto underestimated TME heterogeneity in early and advanced disease. 3-13 Furthermore, these studies highlighted the importance of small cell populations in governing essential biological pathways such as immune cell activation or trafficking by tumor endothelial cells. 6 However, a major limitation of these studies is the limited number of analyzed patient samples per study. Moreover, the lack of genomic data as well as long-term follow-up data prevents comprehensive dissection of the biological heterogeneity and its potential contribution to therapy resistance and survival outcome.
Technical and methodological variations between the different studies result in significant inconsistencies and knowledge gaps. As such, not all cell types (e.g., neutrophilic granulocytes) have been portrayed in the same depth and extension yet, posing an unmet need to characterize these populations as well. In NSCLC, it is well accepted that next to cancer cells, leukocytes  Article compose the majority of cells within the TME. 12,14 Particularly since immunotherapy is routinely used in clinical practice, indepth characterization of the cancer immune cell compartment has been intensively pushed forward, and diverse cellular subsets have been profiled. 5,9,15 Previous compositional analyses by flow cytometry as well as histological work ups suggested that neutrophils compose a significant proportion of all tumorresident leukocytes, with an estimated abundance ranging from 8% to 20%. 14,16,17 Intriguingly, when looking at the scRNA-seq studies in NSCLC published over the last years, neutrophils are clearly underrepresented. This discrepancy is most likely based on technical issues rather than on biological phenomena, but its clarification is of immense importance for our fundamental immunological understanding of NSCLC and for potential translational clinical investigations. This notion is underscored by pre-clinical data suggesting that neutrophils are essential mediators of both pro-and anti-tumor inflammatory pathways (reviewed in Shaul and Fridlender 18 ), including the potential of neutrophils to limit lymphocyte trafficking into malignant tumors, thereby limiting efficacy of programmed cell death 1 (PD-1) inhibition. 19 Correlative studies in patients with NSCLC have linked the neutrophil:lymphocyte ratio with clinical outcome and response to immunotherapy. [19][20][21] Additionally, pre-clinical evidence strongly supports the use of neutrophil-depleting agents (e.g., CXCR2 antagonists) as an adjunction to immunecheckpoint inhibitors. 19 To overcome the above-mentioned hurdles, we compiled major publicly available datasets into a comprehensive NSCLC scRNA-seq atlas covering 232 patients with NSCLC and 86 non-cancer controls. Additionally, given the scarcity of neutrophil single-cell data, we complemented the atlas by analyzing samples from 17 patients with NSCLC using a platform that captures cells with very low transcript count and carried out deep characterization of tissue-resident neutrophils (TRNs) including both tumor-associated neutrophils (TANs) and normal adjacent tissue-associated neutrophils (NANs).

RESULTS
Generation of a core large-scale NSCLC single-cell atlas We first developed a core NSCLC atlas by compiling scRNA-seq data from 19 studies and 21 datasets comprising 505 samples from 298 patients ( Figure 1A). This comprehensive NSCLC single-cell atlas integrates expert-curated, quality-assured, and pre-analyzed transcriptomic data from publicly available studies as well as our own dataset (UKIM-V) in early and advanced stage NSCLC of any histology (see STAR methods; Figures S1A-S1K). Important study characteristics are summarized in Table S1. In total, the core atlas includes transcriptomic data from 212 patients with NSCLC and 86 control individuals, comprising 196 tumor samples and 168 non-tumor control samples. Of the 212 patients with NSCLC, 156 were histopathologically annotated as lung adenocarcinoma (LUAD), 41 as lung squamous cell car-cinoma (LUSC), and 15 were not otherwise specified (NSCLC NOS). NSCLC samples include tissue of the primary tumor (n = 176) or metastasis (n = 45) that were obtained either by surgical resection or by computed tomography-and bronchoscopy-guided biopsies. We clustered the disease stages of the patients with NSCLC as early (UICC stage I-II) versus advanced (UICC III-IV) diseases, as not all studies provided sufficient information on tumor stages. Among the control samples, 89 were derived from distant non-malignant tissue of patients with lung tumors (annotated as normal_adjacent), of which 65 have a patient-matched tumor sample. Further, 10 samples were derived from non-tumor-affected lymph nodes of patients with NSCLC (annotated as normal) and 79 samples from patients without evident lung cancer history (annotated as normal). Of the control patients, 18 had a history of chronic obstructive pulmonary disease (COPD). Overall, the core atlas integrates 898,422 single cells, which are annotated to 12 coarse cell-type identities and 44 major cell subtypes or cell states (e.g., dividing cells) based on previously established canonical single-cell signatures (Figure S1A) including 169,223 epithelial cells, 670,409 immune cells, and 58,790 stromal and endothelial cells ( Figure 1B). We also annotated important CD8 + T cell subclusters (terminally exhausted, activated, effector memory, naive, natural killer [NK]like, dividing) using previously reported marker genes 22 (Figure S1J). The cell-type composition for each dataset, the tissue of origin, and the patients within the core atlas are shown in Figures S1B and S1C.
Neutrophils are underrepresented in most scRNA-seq studies In both tumor and normal lung tissue of all core atlas samples, the neutrophil cluster (FCGR3B, CSF3R, CXCR2, and G0S2) comprised 8,468 cells with overt low mRNA counts. Overall, neutrophils account for only 1.5% of all atlas cells ( Figure S1E). Remarkably, 78% of all atlas neutrophils derive from the UKIM-V dataset ( Figures 1C and S1B), in which neutrophils compose 12% of all cells and 18% of all leucocytes, respectively. The remaining neutrophil data originate mainly from four other datasets, 4,5,10,12 in all of which neutrophils comprise less than 6% of all leucocytes. Particularly in those studies using the droplet-based 10x Chromium platform, neutrophils are completely absent or only rarely depicted ( Figure S1F). Our comparative flow cytometry analysis demonstrated that neutrophils account for 10%-20% of all leucocytes in NSCLC tumor and normal adjacent tissues (n = 63) ( Figure S1G), which is in accordance with previous flow cytometry and histology data. 14,16,17 Thus, low neutrophil abundance seen in previous scRNA-seq datasets suggests an underrepresentation, most likely due to technical issues. Neutrophils are fragile, short-lived cells (circulatory half-life of 7-10 h in humans 18 ), are particularly sensitive to handling procedures, and express an exceptionally low amount of mRNA molecules. 4 Comparative analysis of scRNA-seq platforms indicated that the BD Rhapsody workflow captures a notably high number of mRNA molecules per cell and may thus be especially suitable to depict low-mRNA-content cells ( Figure S1H). As a consequence, neutrophils represented a major cell cluster in the UKIM-V dataset generated with BD Rhapsody, whereas the low-mRNA-content neutrophil cluster could not be appropriately detected in the datasets generated with 10x Chromium, as also described recently, 28 and only to a very limited extend when applying other platforms ( Figure 1C).
scRNA-seq of low-mRNA-content cells Due to high clinical relevance of neutrophils and the need for their better in-depth transcriptomic characterization, we used the advantage of the BD Rhapsody platform in depicting cells with low mRNA content for further analysis. As our initial dataset included only three patients, we next performed scRNA-seq of an additional 14 patients with NSCLC to increase the statistical weight of our cohort (UKIM-V cohort). In total, our dataset contains tumor and adjacent normal lung tissue from 17 patients (6 male, 11 female) undergoing lobectomy for treatment-naive NSCLC (12 LUAD, 5 LUSC). Cells were freshly isolated, processed, and sequenced as described in detail in the STAR methods. The UKIM-V dataset (1 and 2) comprises 122,902 cells that cluster into all main lung cell types defined by the expression of specific marker genes ( Figure 1D). Neutrophils are characterized as a cell cluster with exceptionally low mRNA content and thus a relatively low number of detected transcripts, but due to the relatively high number of mRNA molecules captured per cell (unique molecular identifier [UMI] counts in epithelial cells: 8,938), we could readily depict these cells. The 15,190 neutro-phils identified in the UKIM-V dataset were derived from control lung (n = 6,378) and corresponding tumor tissue (n = 8,812).
Extension of the core single-cell atlas by transfer learning To combine the strength of the core atlas with our own data including neutrophils, we used the recently developed transferlearning method scArches, 29 which enables the extension of the core atlas using additional current, as well as future, datasets. We mapped our second UKIM-V dataset as well as one recently published dataset comprising 288,157 cells 11 onto the atlas ( Figure 1E). The cell-type annotations were transferred from the core atlas to the two new datasets on the basis of transcriptomic similarity in the batch-corrected joint embedding (see STAR methods). This extended atlas now integrates 29 datasets from 19 studies and comprises 1,283,972 cells, 44 cell types, 556 samples, and 318 patients, resulting in 1.75 billion expression values. The overall cell-type composition of the extended atlas is shown in Figure S1I, and the patient numbers per cell type are shown in Table S2. All subsequent analyses were carried out on the extended atlas dataset unless otherwise specified (Table S3).
Identification of changes in cell-type compositions in distinct histological or genetic tumor types and tumor stages is of utmost importance as it can highlight hetero-cellular interactions and possibly enable association(s) with therapy response. However, detecting shifts in cell-type composition using scRNA-seq data is challenging due to the inherent bias present in cell-type compositions and low sample sizes. We therefore adopted a Bayesian model for identifying changes in cell-type compositions while controlling for the false discovery rate (scCODA tool 30 ). scCODA requires setting a reference cell type that is assumed to be constant between conditions. When comparing cellular composition in LUSC and LUAD in primary tumor tissue with cancer cells as reference cell type, we found a significantly higher proportion of neutrophils in LUSC, whereas macrophages, CD4 + T cells, alveolar cells type 2, and transitional club/AT2 were more abundant in LUAD ( Figure 1F). The analysis of the cell-type compositions in early-stage compared with advanced-stage NSCLC tumors showed a higher abundance of cDC2 in the early stage ( Figure S1K). To validate the findings, we carried out orthogonal validation using an external cohort of 110 patients with NSCLC (55 LUAD and 55 LUSC) and immunohistochemistry staining for neutrophils, CD4 + T cells, and macrophages. The validation results confirmed the findings ( Figure 1G) and further support histology-specific TME characteristics.
Single-cell composition of the TME reveals distinct NSCLC tumor immune phenotypes Next, we stratified patients with NSCLC based on infiltration patterns of their TME using the extended atlas. Unsupervised clustering on batch-corrected cell-type fractions revealed four distinct tumor immune phenotypes ( Figure 2A): (1) immune-deserted (ID) tumors (i.e., no significant immune cell infiltration but a high cancer cell fraction); (2) the subtype of tumors with B cell dominance (B; B cell, plasma cell, mast cells); (3) the subtype of tumors with myeloid dominance (M; macrophage/monocyte); and (4) the subtype of tumors with T cell dominance (T; CD8 + , CD4 + , T regulatory cells). The affiliation of UKIM-V ll OPEN ACCESS Article patients to myeloid cell and T cell subtypes was validated using flow cytometry ( Figures S2A and S2B). Across the strata, most patients of the B cell, myeloid cell, and T cell subtypes showed LUAD histology, while half of the patients with LUSC were over-represented in the ID subtype ( Figure S2C). Neutrophils were excluded from patient stratification since they are underrepresented in most datasets. Using logistic regression, we did not find any association of the different patient strata to tumor stages (early versus advanced) or sex.
To identify tumor-cell-based TME imprinting characteristics, we next analyzed differentially enriched pathways 31 in the cancer cells of each of the four immune phenotypes. The ID subtype showed significant downregulation (false discovery rate [FDR] <0.1) of the androgen pathway ( Figure 2B). Previously, androgen receptor signaling has been shown to suppress programmed cell death ligand 1 (PD-L1) transcription in hepatocellular carcinoma (HCC) cells and may thus exert immune stimulatory effects. 32 Analysis of differentially expressed transcription factors 33,34 in the cancer cells of each subtype showed a significant downregulation of FOXO4 in the ID subtype ( Figure S2D). As previously reported, FOXO transcription factors are essential mediators of immune cell homeostasis, 35 and FOXO4 downregulation could thus promote an ID phenotype.
We then applied the tool CytoSig 36 to define enriched cytokine signaling signatures in cancer cells of each immune phenotype ( Figure 2C). CytoSig analyzes defined cytokine signatures that are differentially expressed when a cell is exposed to a specific cytokine (that is name giving for the respective cytokine signature). As expected, most signatures are downregulated in the ID group; solely, the signature of the tolerogenic cytokine interleukin-4 (IL-4), a modulator of T regulatory cell-mediated immune suppression, 37 was significantly elevated ( Figure 2C). We found a significant upregulation of interferon type I-III signatures in the myeloid subgroup, suggesting a particularly important role of macrophages/monocytes in interferon signaling in the TME ( Figure 2C).
Analysis of cell-cell communication reveals heterocellular crosstalk in the TME Using the CellPhoneDB ligand-receptor complexes database, we next assessed differences in the hetero-cellular crosstalk of cancer cells toward diverse immune cells among the two major histotypes LUAD and LUSC by analyzing the top 10 differentially expressed cancer cell ligands ( Figure 3A; the top 30 ligands are shown in Figure S3A). Overall, in both histologies, cancer cellular interactions were directed to diverse immune cell subtypes with different targets. In LUAD, we found a prominent upregulation of the KDR-VEGFA axis from cancer cells toward neutrophils, macrophages/monocytes, mast cells, and classical dendritic cells (cDCs), potentially implicating immunosuppressive signaling by cancer cells in this histotype. 38 Other major LUAD pathways involve the immunosuppressive macrophage scavenger receptor MARCO 39 as well as ADGRE5-CD55 signaling, associated with migration and invasion. 40 Conversely, in LUSC cancer cells, there is upregulation of pro-migratory SPP1 signaling 41 that has previously been reported as upregulated in lung cancer tissue particularly of squamous histology, 42 as well as an upregulation of Jagged1 (JAG1), which induces NOTCH, thereby promoting tumor progression and regulating the tumor immune microenvironment via, e.g., neutrophil recruitment. 43 Next, we investigated the crosstalk of cancer cells to immune cells within the patient immune subtypes T cell, B cell, myeloid cell, and ID by analyzing differentially expressed cancer cell ligands ( Figure 3B). While downregulated in the ID subgroup, B, M, and lesser T cell subgroups showed upregulated signaling of several chemokines (CXCL9/10/11, CCL3/13/18) to their cognate receptors on T and myeloid cell subsets, suggesting that cancer-cell-secreted chemokine gradients contribute to immune infiltration. 44 Integration of bulk RNA-seq data reveals genotypeimmune phenotype associations scRNA-seq provides an unprecedented view on the cellular heterogeneity in the TME. However, the majority of the scRNA-seq studies lack both cancer genotype information and survival data. The TCGA reference dataset includes this information together with bulk RNA-seq data. Using the recently published computational method SCISSOR, 45 we evaluated the association of atlas-derived cell-type transcriptomic signatures with genotype and survival data from the TCGA reference dataset including 1,026 patients (UICC I-IV, LUAD, and LUSC). In a previous pancancer study using bulk RNA-seq data, we have shown that genomic features including mutational load, tumor heterogeneity, and specific driver genes determine immune phenotypes. 46 Here, the high resolution of the single-cell NSCLC atlas enabled an indepth analysis of these determinants. EGFR, TP53, KRAS, and STK11 mutations showed distinct immune infiltrates ( Figures 4A-4D, S4A, and S4B). For example, cDC1 and cDC2 showed opposite infiltration patterns in patients with LUAD with mutations of either EGRF (high cDC infiltration, as reported previously 47 ) or KRAS and STK11 (low cDC infiltration) ( Figures 4A-4D). Conversely, TP53-and STK11-mutated genotypes were associated with CD8 + T infiltration, which is not seen in EGFR-or KRAS-mutated tumors ( Figures 4C, 4D, S4A, and S4B). High CD8 + T cell infiltration in TP53-mutated LUAD has also been described previously. 47 Hence, our single-cell view of the TME provides further evidence for the link between the genetic makeup of the tumor, the histology, and the respective immune contexture. 48 Given the importance of the driver genes in terms of treatment decisions, we confirmed our findings by orthogonal validation of the genotype-immunophenotype associations using two external cohorts (n = 19 and n = 37) ( Figures S4C and S4D).
We next analyzed the cell-type transcriptomic signatures and their association with survival in 1,026 patients from the TCGA cohort. Overall, patients with NSCLC with B cell-rich tumors showed a prominent association with improved survival, whereas neutrophils were the strongest negative survival predictor (Figure 4E) and were, together with monocytes, the only immune cell types that were negative predictors in both LUAD and LUSC ( Figures 4E, S4E, and S4F). To support our finding, we used an independent method based on deconvolution using bulk RNA-seq data and confirmed that B cells are indeed associated with better prognosis, albeit significantly only for LUAD ( Figures 4F, S4G, and S4H), which has also been proposed in multiple previous studies (reviewed in Patel et al. 49 ). Finally, the analysis of the CD8 + T cell subtypes showed that naive CD8 + T cells were the strongest predictor for improved survival ( Figure 4G).
TRNs acquire new properties in the TME One unique feature of the large-scale NSCLC atlas we assembled is the enrichment with single-cell expression profiles from neutrophils generated using samples from 17 patients with NSCLC, so we focused on the deep characterization of these cells. TANs are known as a very heterogenous cell population with both anti-and pro-tumorigenic properties. 50 Likely due to technical reasons, the characterization of NANs lags even further behind that of TANs. To overcome this insufficient TRN characterization, we here characterized the transcriptomic signatures of TANs and NANs in NSCLC using the extended atlas ( Figures 5A and 5B). Neutrophils were more abundant in patients with LUSC compared with those with LUAD ( Figure S5A    In a recent study, remarkable neutrophil adaptability to different tissue environments was shown, 51 suggesting that while transient, TRNs acquire new properties and function within tissue. The neutrophil phenotype differs in dependence on spatial-, temporal-, and disease-specific clues 52 as well as during the evolution from bone-marrow-resident immature (CXCR4 high , CXCR2 low , CD16 low , CD62L low , MME low ), to circulating/mature (CXCR2 high , CD16 high , CD62L high ), to aged/senescent neutrophils (CXCR4 high ). 53-55 However, none of these markers are specific for a certain maturation state. Relative to TANs, matched NANs in our dataset showed high expression of established neutrophil maturity markers (SELL, PTSG2, CXCR2, CXCR1, FCGR3B, MME) as well as canonical neutrophil markers (S100A8, S100A9, S100A12) ( Figures 5D and S5B). While downregulation of these markers in TANs suggests immaturity, we could not identify a clear expression pattern of previously suggested immaturity signatures. 55,56 Notably, CXCR1 and CXCR2 have previously been reported as TAN markers in NSCLC; 4,12 however, our analysis revealed elevated CXCR1/ CXCR2 expression in NANs ( Figure 5D). Conversely, low expression of SELL (CD62L) and CXCR2 (both downregulated in aged neutrophils 57 ) ( Figure 5D) and high expression of the known neutrophil activation markers CD83 (an inhibitory immune checkpoint), 58,59 the atypical chemokine receptor CCRL2, 60 ICAM1 (CD54), 57 and C15orf48 (a mitochondrial transcript upregulated during inflammation) 61 as well as several cytokines (CCL3, CCL4L2, CCL4, CXCL2) ( Figures 5D and S5B) support an aged/chronically activated/exhausted TAN phenotype. 62 Neutrophils support the pro-angiogenic switch in cancer via release of VEGF and other pro-angiogenic factors (reviewed in Ozel et al. 63 ). Our atlas provided evidence that neutrophils represent a major source for VEGFA expression within the NSCLC TME ( Figure 5E). A highly differentially expressed TAN marker of major interest is lectin-type oxidized LDL receptor 1 (LOX-1) encoded by the OLR1 gene that is known as main receptor for oxidized low-density lipoprotein (LDL) 64 ( Figures 5D and S5B). OLR1 has been described as a putative marker to distinguish normal peripheral blood neutrophils (LOX-1 À ) from polymorphonuclear myeloid-derived suppressor cells (PMN-MDSCs), 65 respectively. However, concordant to our results, OLR1 expression by TANs has been previously described, 12 and our comparative analysis to matched NANs underlined the tumor specificity of this marker. Moreover, peroxisome proliferator-activated receptor gamma (PPARG), a nuclear receptor and direct transcriptional regulator of OLR1, 66 was elevated in TANs (Figure 5F). Concordantly, flow cytometry analysis of tissue from patients with NSCLC (n = 7) confirmed elevated LOX-1 (OLR1) expression in TANs ( Figure 5G). We could further validate the transcriptomic TAN signature at the protein level by flow cytometry, including elevated expression of CD83 (n = 7) and CD54 (ICAM1) (n = 6) and lower expression of CD181 (CXCR1), CD62L (SELL) (n = 8), and CD16 (n = 35) ( Figure 5G).
We additionally performed multiplex immunofluorescence staining of paraffin-embedded NSCLC tumor tissue and patient-matched normal adjacent lung tissue. Co-staining of LOX-1 and CXCR2 suggested LOX-1 as neutrophil marker ( Figure S5C) (of note, CXCR2 also marks MDSCs 67 ). We found infiltration of LOX-1 + cells in tumor tissue but not in adjacent normal-lung tissue ( Figure 5H), underlining the cancer-tissue specificity of this marker.

Plasticity and non-canonical functional properties of TRNs
Previous studies have proposed transcriptomic subclusters of neutrophils in NSCLC. 4 However, a distinct subclassification and in-depth characterization of TRN including TANs and NANs in NSCLC has not been described so far. We applied unsupervised Leiden clustering on all atlas neutrophils (n = 19,166), separating four TAN subsets (TAN-1 to TAN-4) and three NAN subsets (NAN-1 to NAN-3) ( Figure 6A) that are backed by multiple datasets and multiple patients ( Figure S6A). Marker gene selection revealed an extensive phenotypic heterogeneity among the clusters and allowed identification of marker genes for each subcluster, of which the top 5 are given in Figure 6B. The TAN signature genes described above (OLR1, VEGFA, CXCR4, CD83) showed relative homogenous expression among all TAN subclusters ( Figure S6B). Overall, NAN clusters showed a predominance in LUAD and TAN clusters in LUSC tumors ( Figures S6C and S6D).
Specific NAN-1 genes included the alarmin S100A12, a known marker of activated proinflammatory neutrophils, 68   NETosis co-factor PADI4, 69 as well as pro-angiogenic markers (PROK2, MMP9). The NAN-2 cluster is pretty similar to the NAN-1 cluster but showed reduced expression of some NAN-1-specific genes (e.g., S100A12, PADI4, MMP9). NAN-3 shows strong expression of interferon-stimulated genes (GBP1, GBP5, IFIT2) ( Figure 6B). TAN-1 shows high expression of interleukin 1 receptor antagonist (IL1RN), a known marker of activated neutrophils that negatively regulates IL-8 secretion to control excessive neutrophil inflammatory activity, 70 as well as of the potent NF-kB activator RIPK2 71 and CD44, regulating cell recruitment and adhesion 72 ( Figure 6B). These finding support the concept that the plasticity of neutrophils is profoundly shaped by the NSCLC TME that attracts and activates neutrophils.
The TAN-2 subcluster was characterized by the expression of the major histocompatibility complex (MHC) class II genes HLA-DRA, CD74, HLA-DMB, and HLA-DRB1, indicating a phenotype with an immunogenic antigen-presenting feature. Both CD74 and HLA-DRA are also expressed in the other TAN clusters, albeit at lower levels ( Figure 6B). To validate the scRNA-seq findings, we analyzed samples from an additional 11 patients using flow cytometry, confirming the antigen-presenting phenotype as seen by an upregulation of HLA-DR on TANs in NSCLC compared with neutrophils from normal adjacent lung tissue ( Figure 6C). Of note, the transition to HLA-DR + neutrophils was accompanied by a shift toward the identified TAN signature (elevated expression of CD83 and LOX-1 and lower expression of CD181 [CXCR1], CD62L [SELL], and CD16) in neutrophils derived from NSCLC tumor tissue ( Figure S6E).
The TAN-3 subcluster was characterized by a high expression of proinflammatory cytokines (C15orf48, CCL3, CCL4, CSTB) as well as galectin 3 (LGALS3), which is associated with neutrophil activation and emigration ( Figure 6B). Finally, TAN-4 showed high expression of ribosomal genes (such as RPS12, RPL3, RPN2, RPL23) ( Figure 6B) similar to a neutrophil cluster identified in patients with severe COVID-19. 56 This may suggest the highly plastic phenotype of TAN-4 eventually transitioning to other cell phenotypes, as described previously for tumor endothelial cells. 6 The transcriptional profiles of the neutrophil subsets indicate their remarkable phenotypic plasticity. We therefore performed RNA velocity analysis (see STAR methods) using only the UKIM-V dataset (which includes treatment-naive patients with NSCLC only) since the method requires raw sequencing data. The analysis indicates a transition from NAN-3 to both NAN-2 and NAN-1 (whose transcriptomic signatures are similar). NAN-3 and NAN-2 transition to TAN-2 and TAN-1, respectively, with TAN-1 and TAN-2 transitioning into all TAN subtypes (Figures 6D and partition-based graph abstraction 73 in 6E). Interestingly, this TRN evolution seems to follow a one-directory path with TAN-3 as final transition, although at this point we do not know whether TAN-3 could further transit to other cell types. These observations support the hypothesis that TAN phenotypes are substantially modulated by local cues encountered in the TME. 18 We next investigated the cellular interactions of TRN subsets with CD8 + T cells and cancer cells by analyzing differentially expressed TRN ligands (FDR <0.01, abs(log2FC) >1) of each subset, revealing distinct signaling of NANs versus TANs ( Figure 6F). In all TAN subsets, we found VEGFA signaling toward cancer cells, again underlining their important proangiogenic role, as well as SPP1 signaling, which has been associated with an immunosuppressive TME 74 and pro-migratory 75 effects. CD274 (PD-L1) to PDCD1 (PD-1) signaling is significantly upregulated in TAN-2 while being downregulated in NAN-1, proposing CD8 + T cell-inhibitory effects of TAN that accord well with previous observations of impeded immunotherapy responses in neutrophil-rich tumors. 19 Conversely, NANs showed prominent interactions involving genes of the tumor-necrosis family (TNFSF13B, TNFSF10, LTB) that have been previously associated with neutrophil activation. 76,77 TRN gene signature is associated with immunecheckpoint inhibitor treatment failure Our deconvolution of the diversity of TRNs at the single-cell level prompted us to relate this information to patient prognosis and response to both chemotherapy and therapy with immunecheckpoint inhibitors (ICIs). Using a previously proposed approach for finding specific marker genes for cell-type estimation from bulk RNA-seq samples 78 (see STAR methods), we derived a signature of genes that are highly specific for TANs (n = 18) or NANs (n = 20) and are expressed only at a very low level in other cells ( Figure 6G). We additionally defined a TRN signature as the union of TAN and NAN signature genes (n = 38) ( Table S4). The expression of the TRN signature genes was heterogeneous between the different TRN subsets ( Figure 6H). In order to analyze the prognostic and predictive value of the TRN signature, we used bulk RNA-seq data from pre-treatment tumors from POPLAR 79 and OAK, 80 two randomized clinical trials of anti-PD-L1 antibody (atezolizumab) versus chemotherapy (docetaxel) in patients with NSCLC, representing the largest transcriptional collection in these settings. 81 In total, there were 891 patients, of which 439 were treated with atezolizumab (316 LUAD and 123 LUSC) and 452 with docetaxel (313 LUAD and 139 LUSC). The TRN gene signature was associated with anti-PD-L1 therapy failure in these NSCLC cohorts ( Figure 6I). Analysis of the survival data from these cohorts showed that the prognostic benefit of the TRN signature was significant for the anti-PD-L1 arm ( Figure 6J) but not for the docetaxel arm (Figure 6K). The prognostic value for the anti-PD-L1 arm was stronger for LUSC ( Figure S6F) compared with LUAD ( Figure S6G). The signatures for the different subsets TRN, namely NANs and TANs, were both predictive for the anti-PD-L1 arm of the NSCLC cohorts ( Figures S6H and S6I). The results for the signatures for other cell types (Table S5) are shown in Figure S6J.

DISCUSSION
We built a large-scale atlas of single-cell transcriptomes of NSCLC through integration of 29 datasets spanning over 1,280,000 cells from 556 samples and 318 individuals representing 1.75 billion expression values. Reduction of dataset-specific batch effects due to variation in experimental design and used platforms while retaining biological information resulted in a high-quality reference atlas for NSCLC, offering superior coverage of histological and clinical variables and thereby providing a unique resource for dissecting the cellular diversity in the TME and generating hypotheses. We leveraged the information content of the NSCLC atlas by sequencing additional patient samples using a platform suitable ll OPEN ACCESS Article to depict low-mRNA-content cells, which enabled us to comprehensively characterize the diversity and plasticity of TRNs.
First, we provide a high-resolution view of the TME in NSCLC with 44 major cell types/states and show different cell-type composition patterns in LUAD and LUSC, including more precise functional transcriptomic classification of malignant epithelial cells in both histotypes. The single-cell composition of the TME in NSCLC enabled refined tumor classification and patient stratification into four immune phenotypes: ID, myeloid, B cell, and T cell subtypes. These findings may have important implications for improving cancer immunotherapy in NSCLC. For example, combination therapies that target both myeloid cells and lymphoid cells could represent an immunotherapeutic strategy to treat myeloid subtypes, as shown recently in a melanoma mouse model. 82 Similarly, given the heterogeneity of the intratumoral B cells and the importance of tertiary lymphoid structures, 83 further analysis of the B cell subtype might open promising therapeutic avenues by additional refined B cell targeting.
Second, integration of bulk RNA-seq data from the TCGA NSCLC cohort uncovered cell subsets associated with alterations in major driver genes, such as EGFR, KRAS, STK11, and TP53, in both LUAD and LUSC subtypes, providing further evidence that genetic aberrations in cancer cells dictate the immune contexture of tumors. 48 We validated the findings from the computational analysis using two independent cohorts and immunofluorescence/immunohistochemistry assays, confirming previously published reports on genotype/immunophenotype dependencies. 47,84,85 This knowledge could be exploited to derive rationale for personalized therapeutic combination strategies based on the underlying genetic tumor profile.
Third, we provide in-depth characterization of TRNs including both TANs and NANs in human NSCLC. Our dissection of the diversity of TANs suggest that the conflicting reports can be attributed to the different TRN subsets. Of particular interest for cancer immunotherapy is the TAN phenotype with an immunogenic antigen-presenting feature. This observation implies acquisition of antigen-presenting-like properties by neutrophils at the tumor site, as previously reported. 86 Such conversion of neutrophils to antigenpresenting cells may elicit anti-tumor immunity and has recently been shown in a murine model. 87 Identification of targets that can block the transition of the antigen-presenting TAN-2 subset into TAN-1 and TAN-3 or reverse the final phenotypes into TAN-2 phenotype is an important goal for future studies.
Finally, we report that the TRN-derived gene signature has a predictive and prognostic effect of the TRN signals for immunotherapy-treated patients with NSCLC. Using transcriptomic data for patients with NSCLC (n = 439) from two randomized clinical trial cohorts treated with a single anti-PD-L1 antibody (atezolizumab), we provide evidence for the association of TRNs with therapy failure. Although not statistically significant for the chemotherapy arm, the similarity of the graphs for both drugs fits the general paradigm that neutrophil infiltration is associated with worse anti-tumor outcomes.
Beyond these biological insights, the results from this study have also important implications. Specifically, the diversity and plasticity of TRNs shown here further underscore the necessity to reevaluate the rationale for targeting neutrophils to overcome ICI therapy resistance in combination therapies using CXCR1 and CXCR2 antagonists and other inhibitors. 88 As shown here, TANs can acquire antigen-presenting properties, and such conversion of abundant neutrophils to antigen-presenting cells could overcome the limitations of the low abundance of crosspresenting DCs. 87 We advocate that rigorous approaches are required to analyze the impact of the TRN diversity and plasticity on tumor immunity in NSCLC and possibly in other cancers.
Our study, however, has several limitations. First, NSCLCs show a great intratumor heterogeneity, and the sampling location (e.g., tumor core versus tumor margin) may affect the cellular composition, particularly the in case of biopsies being compared with tissue pieces from lobe resections. With the exception of one study, 3 all studies incorporated in our atlas applied single regional sampling without annotation of the exact sampling area. Thus, our analyses do not take into account this variable, and we therefore advocate that future studies should include information about the sampling location. Second, in many cases, mRNA is not a definite proof of the extent to which a protein is expressed, and information on both RNA and protein expression is necessary for getting a complete picture of gene regulation and single-cell heterogeneity. In neutrophils, this particularly regards granule protein expression that varies throughout granulopoiesis, not always strictly correlating with mRNA expression. 89 This could partly explain the above-mentioned conflicting literature results of TAN phenotypes. And third, albeit the association of the TRN signature with anti-PD-L1 treatment failure was analyzed using transcriptomic data for patients with NSCLC, prospective studies are required to show that the TRN signature indeed represents a bona fide anti-PD-L1 therapy outcome-predicting marker rather than being a negative prognostic marker.
In conclusion, we provide a NSCLC atlas with single-cell resolution as well as a web portal that enables interactive exploration of the dataset through cell-x-gene (https://luca.icbi.at) that allows visualization of metadata and gene expression. The biological insights we present here and future discoveries arising from the exploitation of the high-resolution NSCLC atlas could provide the basis for developing combination therapies for patients with NSCLC who are not sufficiently responding to immune-checkpoint blockers.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:  protocol (sample-tag staining at room temperature for 20 min and washing by centrifugation at 400 g for 5 min). Single-cell isolation in microwells (cell load: 20 min incubation at room temperature) with subsequent cell-lysis and capturing of poly-adenylated mRNA molecules with barcoded, magnetic capture-beads was performed according to the manufacturer's instructions. Beads were magnetically retrieved from the microwells, pooled into a single tube before reverse transcription. Unique molecular identifiers (UMIs) were added to the cDNA molecules during cDNA synthesis. Whole transcriptome amplification (WTA) and sample-tag sequencing libraries were generated according to the BD Rhapsody single-cell whole-transcriptome amplification workflow. The quantity and quality of the sequencing libraries was analyzed with the Qubit dsDNA HS (High Sensitivity) assay kit (Invitrogen) and the 4200 TapeStation (Agilent) system. Libraries were sequenced on the Novaseq 6000 system (Illumina) targeting a sequencing depth of 50.000 reads/cell.  Table S7. After washing and addition of 5 mL 7-AAD, the cells were measured on a FACSymphony A5 flow cytometer (BD Biosciences). Data were analyzed using FlowJo v10.7 software. For details of the gating strategy see Figure S7.  Table S7). The staining procedure was performed using an automated staining system (BOND-RX, Leica Biosystems). All markers were sequentially applied and paired with respective Opal fluorophores (Table S7). To visualize cell nuclei, the tissue was stained with 4',6-diamidino-2-phenylindole (spectral DAPI, Akoya Biosciences). Stained slides were scanned using Mantra 2 Quantitative Pathology Workstation (Akoya Biosciences) and representative images from each tissue were acquired with the Mantra Snap software v1.0.4. Spectral unmixing, multispectral image analysis and cell phenotyping was carried out using the inForm Tissue Analysis Software v2.4.10 (Akoya Biosciences). In short, DAPI staining was used to segment cells. The perinuclear area (the defined 4-pixel area around nuclei) was therefore defined to be cell cytoplasm. Thereafter, the total cell area was evaluated for nucleic/cytoplasmic/membrane marker expression. The inForm buildin algorithm for cell phenotyping was used to define the intensity threshold for the positivity of each marker individually and each cell was characterized/phenotyped by presence/absence of the marker.

Immunohistochemistry
Staining of the validation cohort was approved by the Internal Review Board of the University of Luebeck (file number 16-277). Tissue microarrays (TMA) were constructed from formalin-fixed paraffin-embedded (FFPE) tumor blocks originating from surgical samples. In short, for TMA construction each sample was represented in triplicates of 0.6 mm diameter cores. A tumor sample was incorporated in further analysis if at least one core was evaluable. The validation cohort included 55 chemo-naive LUAD and LUSC, respectively, with no history of previous malignancies or history of receiving chemotherapy or radiotherapy. Immunohistochemistry (IHC) staining was performed according to the manufacturer's instructions, using the Ventana Discovery (Ventana Medical System) automated staining system. Slides were incubated with a primary antibody against CD4 (CD4 SP35, Ventana, RTU) to detect CD4 + lymphocytes and with a primary antibody against CD68 (KP1, Ventana, RTU) to detect macrophages (Table S7). They were further stained with chloroacetate-esterase (Naphthol-AS-D-chloracetate, Serva) to highlight neutrophils. To evaluate the immune cell infiltration, CD4 + lymphocytes, macrophages and neutrophils were counted in three high power fields (HPFs) per core, meaning that up to nine HPFs per case were assessed. Two experienced pathologists (SP and CK) performed an independent evaluation of the slides.
Staining of CXCR2 to detect neutrophils was performed using the rabbit antibody against CXCR2 (EPR22301-103, Abcam) diluted in a Primary Antibody Diluent Buffer (Primary antibody diluent, Leica Biosystems). The staining protocol included a standard antigen retrieval step with CC1/pH9 buffer (Discovery CC1, Ventana), incubation with the primary antibody for 0.5 h at room temperature (RT). Antibody staining was detected by DAB. Neutrophils were counted in 5 HPFs.

Generation of the core atlas
The data for the core atlas was from previously published NSCLC studies, 3-8,10,12,15,90,91 our own data (n = 3), and 7 studies for control purpose. [91][92][93][94][95][96][97] The selected studies were published between July 2018 and May 2021 and the incorporated datasets were generated with six different sequencing platforms, including the most commonly applied 10x Chromium (10x Genomics) as well as Smart-seq2, 98 GEXSCOPE (Singleron), inDrop 99 and Drop-Seq. 100 We further integrated our own data generated with the microwellbased BD Rhapsody scRNA-seq platform (BD Biosciences). We specifically selected studies using comparable protocols for sample processing and data generation, such as sequencing of whole cells. We did not exclude studies that applied flow cytometry-based cell-sorting prior to sequencing, as these incorporate relevant information on rare cell types. 6,9,15 From non-NSCLC studies we exclusively included those parts of the published data that were relevant for our atlas: from the Madissoon dataset 92 we only included lung samples, from Adams, 95 Reyfman, 93 Habermann, 96 Vieira Braga, 94 and Mayr 90 datasets we only used the control samples (including normal lung tissue of tumor patients, which we termed ''normal_adjacent'' or lung tissue of organ donors without history of pulmonary disease). From the Adams et al. dataset we also included data from patients (n = 18) with chronic obstructive pulmonary disease (COPD) as chronic inflammatory pulmonary disease cohort with an increased lung cancer risk.
Preprocessing and quality control of scRNA-seq data We distinguish between studies (i.e. a scientific publication) and datasets (i.e. scRNA-seq samples that were generated using the same sample preparation and the same experimental platform). Each study may contain one or multiple datasets. Demultiplexed FASTQ files of the UKIM-V datasets were merged and processed using the Seven Bridges Genomics cloud server with the BD Rhapsody WTA Analysis Pipeline. Samples from the studies Chen_Zhang_2020, Guo_Zhang_2018, He_Fan_2021, Lambrechts_ Thienpont_2018 and Maynard_Bivona_2020 were obtained as raw fastq files from the identifiers specified in the key resources table. Smart-seq2 data were processed using the nf-core RNA-seq pipeline 101,102 with the GRCh38 reference genome and GENCODE v33 annotations. 10x datasets were processed with cellranger v5.0.0 (10x Genomics) and the GRCh38-2020-A reference database as provided by 10x Genomics. All other datasets were obtained as count tables from their respective identifiers. All datasets were loaded into AnnData containers 103 with consistent structure. Quality control was performed with scanpy 104 by thresholding the number of detected genes, counts and the fraction of mitochondrial reads. Thresholds were determined per dataset by visual inspection of the distributions and are listed in Table S8.
Integration of scRNA-seq datasets Individual datasets were merged into a single AnnData object. Since genome annotations partly differed between the datasets, we re-mapped gene identifiers on the latest version of HGNC gene symbols using the https://mygene.info API. 105 In case of duplicate gene symbols, the one with the maximum read count was retained. If gene symbols were missing from a dataset, the values were filled with zeros. Gene symbols that were missing in more than 5 datasets (25%) were excluded altogether.
We integrated the datasets using the scANVI algorithm, 106,107 as it has been demonstrated to be one of the top-performing methods for atlas-level integration and to scale to >1M cells. 108 Since scANVI requires cell-type annotations for at least one of the input datasets, we manually annotated two ''seed'' datasets based on unsupervised clustering as described below. We chose Lambrechts_Thienpont_2018_6653 and Maynard_Bivona_2020 as seed datasets as they were not experimentally enriched for specific cell-types and were sequenced on two platforms with very different characteristics (10x and Smart-seq2). Raw counts were used as input for scANVI. The Smart-seq2 counts were scaled by the gene length as recommended on the scvi-tools website. The scANVI model was initialized with a pre-trained scVI model, 109 as recommended in the scvi-tools tutorial. The scVI model was trained on the 6000 most highly variable genes as determined with scanpy's 104 pp.highly_variable_genes with parameters flavor=''seurat_v3 00 and batch_key=''dataset''. Each sample was considered as an individual batch for both scVI and scANVI. Other than that the algorithms were run with default parameters.

Doublet-detection
For droplet-based scRNA-seq datasets we ran the SOLO algorithm 110 to computationally detect multiplets. We chose SOLO over other doublet detection methods as it is readily integrated into scvi-tools, 107 and was found to be one of the top-performing methods in an independent benchmark. 111 We used the SOLO implementation from scvi-tools and initialized SOLO with a pre-trained scVI model.

Unsupervised clustering and cell-type annotation
We computed UMAP embeddings 112 and unsupervised Leiden-clustering 113 with scanpy, 104 based on a cell-cell neighborhood graph derived from scANVI latent space. Coarse, lineage-specific clusters were iteratively sub-clustered to identify cell-types at a more fine-grained resolution. Cell type clusters were annotated based on previously reported marker genes 92,114,115 (Figure S1A). CD8 + T cell subclusters were annotated based on gene sets from Oliveira et al. 22 Integrating additional datasets Two datasets, Leader_Merad_2021 and UKIM-V-2, were added after the completion and annotation of the core atlas. The datasets (''query'') were projected onto the atlas (''reference'') using scArches 29 as implemented in scvi-tools. 107 scVI and scANVI models were re-trained on the fully annotated, doublet-filtered core atlas, with the parameters recommended for scArches: use_layer_ norm=''both'', use_batch_norm=''none'', encode_covariates=True, dropout_rate=0.2, and n_layers=2. Gene-symbols of the query datasets were re-mapped as described above and missing gene symbols filled with zeros. For each query dataset, scArches yielded an embedding in the same latent space as the core atlas. Based on the joint latent space, a neighborhood graph and UMAP embedding were computed for the ''extended'' atlas. Cell-types were annotated automatically, based on a majority vote of nearest ll OPEN ACCESS Article neighbors. To this end, let C be the pairwise weighted connectivity matrix of the scanpy neighborhood graph computed on the scArches embedding. Then, the transitive connectivity matrix C 0 (i.e., including connections to neighbors of neighbors) is defined as C 0 = C$C where the dot operator refers to the matrix product. Let further Q be the set of all query cells, R the set of all reference cells, and T the set of all cell-types. Then, for every cell q˛Q the cell-type is determined as argmax t˛T X r˛R sðt; rÞ C0 qr where the indicator function sðt; rÞ is 1 if cell r is of type t and 0 otherwise. The transitive connectivity matrix C 0 was chosen over C to increase robustness by increasing the number of neighbors, and to ensure that every cell from the query has connection to a cell in the reference.
Comparing cell-type abundances Since comparing cell-type fractions between groups is challenging due to different characteristics of the datasets and the inherent compositional nature of cell-type fractions, we applied the scCODA 30 model, which addresses this issue. We were interested in the differences between conditions (LUAD vs. LUSC). To this end, we ran the scCODA model with the formula $ condition + tumor_stage + dataset with 500,000 iterations using ''cancer cells'' as the reference cell-type, where tumor_stage is a binary vector classifying datasets into early (stages I-II) and advanced (stages III-IV), and dataset is a categorical vector encoding the different datasets. For the comparison, we excluded the Guo_Zhang_2018 dataset, which only contains T cells. The final result shows credible effects with a false-discovery-rate (FDR) of 0.1.

Patient stratification
We stratified patients into immune phenotypes based on immune cell-type fractions. We selected all patients with primary tumor samples and excluded the Guo_Zhang_2018 dataset, because it contains only T cells. Neutrophil fractions were excluded, since they are not appropriately captured in the majority of datasets. Cell-type fractions of primary tumor samples were loaded into a patient 3 cell-type AnnData container. Dataset-specific batch-effects were removed using a linear model as implemented in scanpy.pp.regress_out. Patients were clustered using graph-based Leiden clustering with the ''correlation'' distance metric for computing the neighborhood graph. Patient clusters were labeled according to their predominant cell-types. In addition to the histological subtypes based on the annotation of the original datasets, we annotated tumor types based on the transcriptomics data according to the most abundant cancer cell cluster.

RNA velocity analysis
We performed RNA-velocity analysis on the UKIM-V dataset using velocyto.py 116 and scvelo. 117 BAM files as generated by the BD Rhapsody WTA analysis pipeline were preprocessed with samtools 118 to make them compatible with velocyto.py (see preprocessing/bd_rhapsody/velocyto.nf in our git repository for more details). Loom files generated by velocyto.py were loaded into scvelo to estimate and visualize RNA velocities according to the scvelo tutorial. Partition-based graph abstraction (PAGA, 73 ) was computed based on the RNA velocity graph, using neutrophil subclusters as grouping variable and the option minium_spanning_ tree=False. The result was visualized as a graph showing the transition confidences as directed edges.

Differential gene expression testing
We used DESeq2 119 on pseudo-bulk samples for differential expression testing which has been demonstrated to perform well and properly correct for false discoveries. 120 For each cell-type and patient, we summed up transcript counts for each gene. Pseudo-bulk samples consisting of fewer than 10 cells were discarded. We compared primary tumor samples from LUAD vs. LUSC (condition), primary tumor samples from the patient subtypes M/B/T/desert (group), NANs vs. TANs (cell_type_tan_nan), and Neutrophil clusters (NAN1-3, TAN1-4), including the dataset as a covariate. For comparisons between multiple groups, we used contrasts with sum-tozero coding. p-values were adjusted for multiple hypothesis testing with independent hypothesis weighting (IHW). 121 Pathway, TF and cytokine signaling signatures We performed pathway, transcription factor (TF), and cytokine signaling analysis on primary tumor samples with PROGENy, 31,34 DoROthEA 33,34 and CytoSig, 36 respectively. Scores were computed using the dorothea-py and progeny-py packages. The top 1,000 target genes of the progeny model were used, as recommended for single-cell data. For dorothea, only regulons of the highest confidence levels ''A'' and ''B'' were used. The cytosig signature matrix was obtained from the data2intelligence/CytoSig GitHub repository and used with the scoring function implemented in the progeny-py package. The methods were run with the options num_perm=0, center=True, norm=True scale=True, and min_size=5. No permutations were used, as we perform statistics in a separate step at the level of biological replicates. Pathway-, transcription factor-, and cytosig scores were then compared between condition (LUAD vs. LUSC) and patient group (T vs. B vs. M vs. deserted) using an ordinary least-squares (OLS) linear model, as implemented in the statsmodels package. 122 Scores were aggregated into pseudobulk samples by computing the mean of each variable for each patient and cell-type. Samples consisting of less than 10 cells were discarded. For each variable, we fitted a model with the formulas $ condition + dataset + tumor_stage or $ group + dataset + condition + tumor_stage, respectively. Coefficients were obtained from the linear model and p-values calculated with the f-test. p-values were adjusted for multiple testing with the Benjamini-Hochberg procedure.

Cellphonedb analysis
We used the cellphonedb (CPDB) database 123 as obtained from omnipathdb 124 to investigate differences in cell-to-cell communication in primary tumor samples. The original CPDB algorithm performs statistical comparisons based on a permutation test which is designed to find differences between cell-types. For our study, on the other hand, we were interested in differences between conditions, using patients as biological replicates. Therefore, we followed an approach similar to the degs_analysis mode recently added to CPDB v3 125 : For each cell-type of interest, we considered the list of significantly differentially expressed signaling molecules in CPDB (ligands or receptors, for outgoing and incoming interactions, respectively). For each of those differentially expressed signaling molecules and for each cell-type, we determined interaction partners that are potentially affected by that change, as those that are expressed in at least 10% of the cells in a certain cell-type. Differentially expressed signaling molecules were determined with DESeq2 as described above. The fraction of cells expressing a signaling molecule was computed as the mean of fractions per patient, to avoid biases due to different cell-counts per patient.

SCISSOR analysis
We used SCISSOR 126 to associate phenotypic data from bulk RNA-seq experiments with our single-cell data. TCGA mutation and gene expression data was obtained from the GDC portal, survival data from. 127 SCISSOR was run on primary tumor cells of each patient individually according to the SCISSOR tutorial using mutation data (logistic regression) and overall survival (cox-regression) as dependent variables. A grid search for the alpha-parameter was performed in 2 À i=2 with i˛½24; 23; :::; 2 and a cutoff parameter of 0.3. 21 of 176 samples with low overall cell count failed during SCISSOR's Seurat-preprocessing step and were excluded from the subsequent analysis. For each patient and cell-type, we computed the fraction of scissor + cells (i.e. positively associated with a mutation or worse survival), scissor-cells (i.e. negatively associated), and neutral cells and added a pseudo-count of 0.01. A sample was excluded from a cell-type if it contributed % 10 cells. For each cell-type, we computed the log2-ratio of scissor+ and scissor-cells as the mean fraction of scissor + cells vs. the mean fraction of scissor-cells. Significant differences were determined by comparing the fractions of scissor+ and scissor-cells with a paired wilcoxon test with zero_method=''zsplit'' as implemented in the scipy package. p-values were Benjamini-Hochberg-adjusted and considered significant at an FDR <0.01.

TRN clusters
For an unbiased discovery of TRN subtypes, we performed unsupervised clustering of all cells annotated as neutrophils. The neighborhood graph was computed with scanpy.pp.neighbors with n_neighbors=30 based on the scANVI latent space. Clusters were determined with scanpy.tl.leiden with resolution=0.75. Two subclusters dominated by cells from normal adjacent tissue were labeled normal-associated neutrophils (NAN) 1, 2, and 3, whereas four subclusters of cells from primary tumor samples were labeled tumorassociated neutrophils (TAN) 1, 2, 3 and 4.

TRN signatures
Gene signatures for TRN and TRN clusters were determined based on fold-change (FC), specific fold-change (sFC), and area under the receiver operator characteristics curve (AUROC), applying an approach previously used to find cell-type-specific marker genes. 78 We have previously shown the resulting gene signatures to be highly specific for their respective cell-types. 128 To avoid marker genes being biased towards samples contributing more cells than on average we aggregated single cells to pseudo-bulk samples 120 by patient before deriving marker genes. For each set of marker genes derived, pseudo-bulk samples were generated by summing up raw counts for each patient and cell-type of interest. The resulting samples were normalized to counts per million (CPM) and log2-transformed with scanpy. pp.log1p(adata, base=2). Pseudo-bulk samples consisting of fewer than 10 cells were discarded. For each gene and cell-type, FC and sFC were computed as described in. 78 AUROC was computed using roc_auc_score as implemented in scikit-learn. For identifying marker genes for the 7 neutrophil subclusters, we applied a permissive cutoff of sFC >1 and FC > 1.5 and ranked genes by AUROC. For the TAN and NAN signature used to compute signature scores in bulk RNA-seq data, we empirically determined optimal cut-offs by grid search and cross-validation: First, the single-cell input data were randomly split by patients into 80% training data and 20% independent test set. On the training data, five-fold cross validation was performed. On the training set of each fold, metrics were computed as described above and all possible combinations of sFC˛{0.5, 0.6, ., 2.9}, FC {0.5, 0.6, ., 2.9} and AUROC˛{0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.96, 0.97} were tested, resulting in a total of 5,000 possible signatures. On the test set of each fold, a pseudo-bulk sample per patient (mixing all cell-types) was generated and the true fraction of the cell-type of interest calculated. The quality of each signature was evaluated as the Pearson correlation between the signature score (see section ''Signature scoring in bulk RNA-seq samples'' below) and the true cell-type fraction. The cut-off with the highest average correlation across the five folds was chosen as optimal. Finally, the signature was re-calculated on the entire training set using the optimal cut-off, and a final Pearson correlation determined on the independent test set. We defined a TRN signature to capture Neutrophils independent of their subtype as the union of the TAN and the NAN signature genes.
Signature scoring in bulk RNA-seq samples Signature scores in scRNA-seq data were computed using scanpy.tl.score_genes. Bulk RNA-seq primary tumor samples samples of TCGA LUAD and LUSC were retrieved as TPM from the GDC portal. Bulk RNA-seq samples from NSCLC patients treated with atezolizumab (anti-PD-L1) or docetaxel (chemotherapy) from the POPLAR 79 and OAK 80 trials were retrieved using the accession numbers reported in. 81 Similar to an approach previously described, 11 enrichment scores for our TRN signatures were calculated ll OPEN ACCESS

Article
Cancer Cell 40, 1503-1520.e1-e8, December 12, 2022 e7 as follows: For all signature genes, z-scores were computed across all samples from a dataset. The final signature score was defined as the mean of the z-scores of the signature genes for each sample. Associations of the TRN signature with response to immunotherapy or chemotherapy in the POPLAR and OAK datasets was tested using logistic regression in R with the formula response $ signature_score + tumor_type + dataset, where tumor_type represents LUAD and LUSC encoded as a binary vector.

Survival analysis
Survival analysis was performed using CoxPH-regression as implemented in the R package survival. Kaplan-Meyer plots were created using the R package survminer, showing the top 25% vs. bottom 25% of samples stratified by signature score. B cell fractions in TCGA samples were estimated using EPIC 129 as implemented in immunedeconv, as we have previously shown EPIC to be one of the best performing methods on B cells. 128 Cox-regression was performed on B cell fractions (TCGA data) with the formula survival $ signature_score + ajcc_stage + age, where ajcc_stage is a categorical vector with tumor stages I-IV. For neutrophil fractions (POPLAR + OAK data) the formula survival $ signature_score + dataset + treatment was used. For comparisons comprising the entire NSCLC cohort (i.e. both LUAD and LUSC), tumor_type was included as an additional covariate.

QUANTIFICATION AND STATISTICAL ANALYSIS
Statistical analysis was performed using the statsmodels library in Python (scRNA-seq data) or GraphPad Prism (flow cytometry and imaging data) using a linear model, t-test or wilcoxon test as appropriate. Single cell-data were aggregated into pseudobulk samples by biological replicates. Compositional analysis of cell-type fractions was performed using scCODA; survival analysis using CoxPH regression in R. P-values for untargeted analyses (DE genes, TFs, or pathways) were FDR-adjusted. Significance levels and more details on the statistical tests are indicated in the figure captions.
Positive stained cells per 1000 cells are given. CXCR2 expression was analyzed by immunohistochemistry and quantified per high power field in patients with EGFR mutation compared to EGFR wt patients. Analyses related to the corresponding SCISSOR analyses are shown. The horizontal line represents the median, whiskers extend to the inter-quartile range (Wilcoxon test, ***p<0.001).
(D) Immunohistochemistry staining of neutrophils (ASD + cells) quantified per high power field in patients with EGFR mutation (n=11) compared to EGFR wt patients (n=26). The horizontal line represents the median, whiskers extend to the inter-quartile range (Wilcoxon test, *p<0.05).
(E-F) Association of cellular composition with overall survival for LUSC and LUAD patients.
(G-H) Kaplan-Meyer plot of LUAD and LUSC patients with high (top 25%) and low (bottom 25%) B cell fractions of TCGA lung cancer patients as determined by deconvolution with EPIC. P-value has been determined using CoxPH-regression using tumor stage and age as covariates.
P-values are derived from a t-test and not adjusted for multiple testing. In all boxplots, the central line denotes the median. Boxes represent the interquartile range (IQR) of the data, whiskers extend to the most extreme data points within 1.5 times the IQR.
(D) Neutrophils subcluster fractions in LUAD versus LUSC. Each dot refers to a patient with at least 10 neutrophils. P-values are derived from a t-test and not adjusted for multiple testing. In all boxplots, the central line denotes the median. Boxes represent the interquartile range (IQR) of the data, whiskers extend to the most extreme data points within 1.5 times the IQR.
(E) Flow cytometry analysis demonstrating the correlation between HLA-DR expression and CD83, LOX-1, CD181, CD62L or CD16 expression, respectively. Representative analysis of neutrophils derived from NSCLC normal-adjacent tissue (blue) and tumor tissue (red) are shown. (H) Kaplan-Meyer plot comparing patients treated with atezolizumab with high (top 25%) and low (bottom 25%) NAN signature scores. P-value has been determined using CoxPH-regression including cohort and histology as covariates.
(I) Kaplan-Meyer plot comparing patients treated with atezolizumab with high (top 25%) and low (bottom 25%) TAN signature scores. P-value has been determined using CoxPH-regression including cohort and histology as covariates.
(J) Predictive value of cell-type signatures in bulk RNA-seq data from the OAK 80 and POPLAR 79 cohorts of NSCLC patients treated with atezolizumab (anti-PD-L1). The bar charts show cell-type signatures that are associated with worse (log hazard ratio > 0) or better (log hazard ratio < 0) survival at an FDR < 0.1. The hazard ratio and p-values have been determined using CoxPH regression including cohort and histology as covariates.