Human and Mouse Transcriptome Profiling Identifies Cross-Species Homology in Pulmonary and Lymph Node Mononuclear Phagocytes

SUMMARY The mononuclear phagocyte (MP) system consists of macrophages, monocytes, and dendritic cells (DCs). MP subtypes play distinct functional roles in steady-state and inflammatory conditions. Although murine MPs are well characterized, their pulmonary and lymph node (LN) human homologs remain poorly understood. To address this gap, we have created a gene expression compendium across 24 distinct human and murine lung and LN MPs, along with human blood and murine spleen MPs, to serve as validation datasets. In-depth RNA sequencing identifies corresponding human-mouse MP subtypes and determines marker genes shared and divergent across species. Unexpectedly, only 13%–23% of the top 1,000 marker genes(i.e., genes not shared across species-specific MP subtypes) overlap in corresponding human-mouse MP counterparts. Lastly, CD88 in both species helps distinguish monocytes/macrophages from DCs. Our cross-species expression compendium serves as a resource for future translational studies to investigate beforehand whether pursuing specific MP subtypes or genes will prove fruitful.


INTRODUCTION
Mice are the most commonly used organisms to study human diseases; they breed rapidly and can be genetically modified, accelerating the pace of discovery. A number of similarities with humans suggest that treatments developed in mice may apply to humans. For example, both species have an immune system that contains mononuclear phagocytes (MPs), granulocytes, innate lymphoid cells, and lymphocytes (Masopust et al., 2017). Immune checkpoint blockades, a Nobel prize-winning discovery, were first observed in the immune system of mice and then translated for therapeutic use in humans (Alt-mann, 2018); and the development of granulocyte-macrophage colony-stimulating factor (GM-CSF) therapy for pulmonary alveolar proteinosis (PAP) was influenced by observations in mice, in which the deficiency of GM-CSF results in PAP (Trapnell et al., 2009).
In spite of these encouraging examples, numerous failures in drug development across multiple diseases make it clear that ro dent-to-human translation remains an important challenge (Begley and Ellis, 2012). One key obstacle is the inability to identify functional cellular and molecular homologs in humans and mice. Often, the targets identified in mice do not function similarly in humans. Conversely, targets identified in humans are frequently reverse translated into the study of inappropriate murine targets based on naive assumptions about homology. Therefore, assessing the transcriptome alignment of cross-species humanmouse MPs is crucial.
If equivalent classes of human and murine MP subtypes and their transcriptional markers can be established, studying how murine MPs interact with one another to maintain homeostasis and resolve inflammation could help us to decipher the mechanisms that go awry in human diseases. Murine MPs have been extensively characterized and demonstrate a clear division of labor during innate and adaptive immunity Desch et al., 2014;Gibbings et al., 2017;Atif et al., 2015Atif et al., , 2018bKim et al., 2014;McCubbrey et al., 2018;Tussiwand et al., 2015). Pulmonary murine MPs consist of alveolar macrophages (AMs), tissue trafficking monocytes, dendritic cell (DC) subtypes (Tussiwand et al., 2015;Murphy, 2013) and three interstitial macrophages (IMs). Two IMs, IM1 (CD206 hi MHCII + ) and IM2 (CD206 hi MHCII lo ), display classical macrophage characteristics. The third IM, IM3 (CD206 lo MHCII + ), displays macrophage properties, but has a higher rate of turnover and expresses pro-inflammatory, monocytic, and DC genes (Gibbings et al., 2017;Schyns et al., 2019;Chakarov et al., 2019). Little is known about the functional contribution of IMs to homeostasis, inflammation, or disease, partly due to difficulties in their identification (Ural et al., 2020;Lim et al., 2018). Improving our knowledge of the IM transcriptome is important since they appear to maintain their transcriptional signature across multiple organs, as do DCs (Gibbings et al., 2017;Tamoutounour et al., 2012Tamoutounour et al., ,, 2013Epelman et al., 2014;Plantinga et al., 2013).
DCs are potent antigen-presenting cells that link innate and adaptive immunity. In the periphery, DCs can acquire pathogens, traffic through lymphatic vessels to draining LNs, and present exogenous antigens to cognate T cells, precipitating adaptive immunity (Vermaelen et al., 2001;Jakubzick et al., 2006Jakubzick et al., , 2008Desch et al., 2011;Atif et al., 2018b). In mice, there are two main DC types, Irf8 + Batf3 + DC1 and Irf4 + DC2. These DCs express distinct transcriptional factors that regulate their development, antigen acquisition, processing, and presentation capabilities. Although many of the functional cell types and properties of pulmonary MPs have been defined in mice, an assumption made is that similar functional cell types exist in the human lung.
Here, we created a gene expression compendium by in-depth RNA-seq to establish the nondiseased transcriptional profile of 15 distinct human and 9 distinct murine MPs, defined by our group and others (Desch et al., 2016;Gibbings and Jakubzick, 2018a;Yu et al., 2016;Yu and Tighe, 2018). The primary focus is pulmonary MPs, especially the understudied LN MPs. However, we also include human BL and murine spleen MPs as control tissues to validate the analytical quality of cross-species pulmonary MP analysis, as homology has been more extensively characterized. Interestingly, the overlapping percentage of the top 1,000 marker genes defined for the homologous pairs, human BL and murine splenic MPs, were similar to those observed for lung and LN MP cross-species counterpart pairs. Overall, we(1) developed a strategy to determine the cross-species counterpart for each MP subtype, (2) compared MP subtypes' global expression profiles, (3) contrasted specific sets of MP subtype marker genes, and (4) demonstrated how the compendium and subtype analysis can be used to develop new cell surface markers to distinguish macrophage/monocytes versus DC lineages.
In the mouse, we isolated 9 distinct murine MPs from 4 tissue locations: bronchoalveolar space, lung tissue and draining LNs, and spleen ( Figures 1B and S2). We isolated 4 lung macrophage populations (AM, IM1, IM2, and IM3) and 2 LN migratory DCs (CD103 + DC1 and CD11b + DC2), distinguished from lymphoid resident DCs by high major histocompatibility complex class II (MHC class II) expression ( Figure 1B). In addition, 2 splenic DC populations (CD8 + DC1 and CD11b + CD4 + DC2) and Ly6C hi monocytes were sorted. Since mice lack detectable DCs in circulation, splenic DCs were included in this study as the comparators to human BL DCs (Haniffa et al., 2012;Bachem et al., 2010;Schlitzer et al., 2013;Reynolds and Haniffa, 2015;Ingersoll et al., 2010). The spleen lacks afferent lymphatics exclusively representing a source of resident, lymphoid DCs, in contrast to tissue experienced, migratory DCs. The total number of murine MP samples sorted and analyzed by RNA-seq was 27, with 3 replicates per MP subtype.

Confirming Human MP Subtypes Based on Defined Transcriptional Profiles
To validate our designations of sorted human MPs as either macrophages, monocytes, or DCs, we examined the expression of classical macrophage/monocyte and DC genes ( Figure  2) (Scott et al., 2014;Wu et al., 2016;Satpathy et al., 2012). As anticipated, most of the macrophage/monocyte genes such as MAFB, Fcg receptors, lysosomal proteases, and associated membrane proteins were most strongly expressed in human MP subtypes putatively labeled as macrophages or monocytes ( Figure 2) (Tussiwand et al., 2012;Satpathy et al., 2012;Wu et al., 2016). Conversely, classical DC genes such as HLA-DR, CD1C, CCR7, CD40, CD80, LAMP3, IRF4, IRF8, FLT3, and ZBTB46 were strongly expressed in human pulmonary and LN MPs, putatively labeled as DCs ( Figure 2) (Wu et al., 2016;Satpa thy et al., 2012). In fact, pulmonary and LN DCs showed a higher relative expression of classical DC genes than did BL DCs. For example, Figure 2 shows a high expression of CCR7, a key DC signature gene, in pulmonary and LN DCs. DCs require murine MP subtypes, we aligned human and murine MP expression data, borrowing methods originally developed for single-cell RNA-seq (scRNA-seq) to account for speciesspecific effects on gene expression (Butler et al., 2018). Canonical correlation analysis (CCA), followed by alignment and t-distributed stochastic neighbor embedding (t-SNE) were applied to quantile-normalized human and murine MP data ( Figures S3 and S4) (Bolstad et al., 2003). The unbiased analysis used 885 genes shared between human and mouse from among the top 2,000 most highly variable autosomal genes in either species, capturing similarities among global gene expression profiles without knowledge of MP subtype identity.
The combined human-mouse t-SNE plots illustrate that cell populations with similar profiles included a mix of human and mouse cells and multiple tissues of origin, indicating that species and tissue type were not the main sources of variation in gene expression ( Figure  3A). Crucially, the transcriptional profiles were instead clustered by MP cell type, with AMs, monocytes, and DCs forming tight clusters, followed by IMs ( Figure 3A). To facilitate comparisons, samples were further grouped into six groups, labeled with an indication of their lineage ( Figure 3B). Group 1 contains AMs from BAL. The IM group (group 2) separates from AMs, as previously observed in mice (Gibbings et al., 2017). Group 3 (monocyte [Mono]) is a pure cluster of monocytes across species, sampled from lung, BL, and spleen. Despite the fact that AM and IM share the expression of a number of macrophage core signature genes, their overall gene expression is strongly influenced by their local environment, alveolus versus interstitium (Gautier et al., 2012;Gibbings et al., 2017). Group 2 also includes CD14 + monocytes from the LN. This likely reflects their shared origin as monocyte-derived populations that matured in the lung environment.
The DCs are distributed across three distinct groups: group 4 LN DCs, group 5 DC2, and group 6 DC1 ( Figure 3B). LN DCs in group 4 form a distinct cluster, clearly separated from the other DCs, suggesting a maturation or functional program distinct from BL or splenic DCs. Group 5 DC2 contains the human lung/blood DCs and murine splenic CD4 + DCs, suggesting an overall DC2 (inter-feron regulatory factor 4 [Irf4] dependent) lineage. Surprisingly, group 5 also contains human lung CD1c + IMs and murine lung IM3s. These subtypes were isolated using classical IM cell surface markers, such as CD64, CD36, and MerTK, yet comparison of their global expression patterns suggests that this IM subtype in humans and mice also exhibit some DC programming. This draws a parallel to the Langerhans cell in skin, which has both macrophage and DC-like functions and expression of macrophage and DC-specific transcription factors, Mafb and Zbtb46 (Wu et al., 2016).
Lastly, group 6 DC1 contains human BL Clec9a + DCs and murine splenic CD8 + DCs, the expected human-to-mouse counterparts of the Irf8, Batf3-dependent DC1 lineage (Hildner et al., 2008). Overall, the unbiased computational analysis of global gene expression data demonstrates that across species, the majority of MP subtypes group by cell type-AM, IM, Mono, or DC lineage-across species.

Marker Gene Analysis Identifies the Closest Cross-Species Homology per MP Subtype
We develop a novel, robust strategy to determine the maximally corresponding MP subtype between human and mouse. Other methods to establish homologous MP populations have relied on global correlation (Travaglini et al., 2020) or marker gene set enrichment , which can be affected by the set of genes used in the calculations. For instance, genes with low expression levels and housekeeping genes can bias global correlations so that cell types appear similar, failing to discriminate ( Figure S5A). Strong correlations across multiple cell types can occur even when using only highly variable genes and may disagree with the global correlations ( Figure S5B). Marker gene sets often overlap for closely related cell types, especially when not accounting for the level of uniqueness of expression for that cell type. Thus, enrichment analysis based on intersections of unranked marker gene lists can obscure the dominant functional homolog ( Figure S6A). If the closest homolog is chosen as the cell type with the largest overlapping marker set, the closest homolog using the top 50 marker genes may be different from the closest homolog using the top 1,000 ( Figure S6B). Ideally, the best homolog would be the cell type that continues to have the largest overlap, even as the size of the (ordered) marker set increases. Our approach captures this intuition to determine the maximally corresponding MP subtype between species.
Our approach prioritizes potential marker genes for each MP subtype within each species. Marker genes should be highly and preferentially expressed within a single subtype over all other subtypes within a species. In total, 15 marker gene sets for human and 9 marker gene sets for mouse are evaluated from a dataset consisting of 11,959 genes across 93 samples. Each gene is ranked for each subtype by a score that multiplies two factors: (1) the level of expression in a given subtype relative to the median expression over all subtypes and (2) how differentially expressed the gene is in that subtype over all other subtypes (Rosenthal, 1978).
To achieve a high score, marker genes for each of the 15 human and 9 murine subtypes must be differentially expressed with respect to each remaining subtype within their species, in contrast to other marker gene identification methods that consider the subtype versus the union of all of the samples from the other subtypes. Note that as a result, core signature genes at the level of macrophages, monocytes, or DCs will not be considered highly ranked, since those genes are shared among the multiple related subtypes, and thus receive a low score for differential expression. In this way, we take advantage of the granularity at which the subtypes were isolated to find the maximally corresponding homolog based on marker genes essentially unique to that subtype.
Once marker genes are ranked, we can examine the degree of overlap of marker sets between species to determine the closest cross-species match. Our method takes into account the level of uniqueness of expression for that cell type and prioritizes cell types with strong overlap regardless of the size of the marker set considered. We develop a method based on correspondence-at-the-top (CAT) plots, previously used to compare expression results across experimental platforms (Irizarry et al., 2005). CAT plots visualize the intersection percentage of two ranked lists as the list size progressively increases ( Figure 4A). Using an MP subtype in one species as the reference, each MP subtype in the opposing species is compared to the reference to determine the percentage of overlap as the marker gene set size increases-in other words, comparing the top 5 genes, the top 10 genes, and so forth (examples illustrated in Figure 4A; all human and mouse MPs in Figures S7 and S8). As seen in the CAT plots, generally the curves stabilize and one cell type dominates as the best cross-species match within the top 1,000 genes (vertical dashed line).
One option for choosing the closest match between species would be the cell type with the maximum match at the top 1,000 genes. However, marker sets are ordered by degree of unique expression for the cell type, so maximizing at any one list size would not distinguish between a pair that agrees quickly in the ranking and maintains agreement up to that list size (i.e., strong, robust agreement among the most unique marker genes) from a pair that achieves high agreement only later in the ranking (i.e., includes mainly marker genes appearing in multiple cell types or more lowly expressed genes). To be robust to marker set size and determine which subtype quickly and consistently dominates, we instead chose to summarize the correspondence of each subtype pair across species as the mean percentage overlap of the corresponding marker gene lists for the pair across all top list sizes from 1 to 1,000, referred to henceforth as mean CAT overlap ( Figure 4B).
Overall, the mean CAT overlap ranges between 2% and 23%. The closest cross-species homolog for a given MP subtype is then chosen as that subtype in the opposing species with the highest mean CAT overlap, which ranges from 13% to 23%. A potential factor contributing to the low percentages is the fact that our ranked marker genes are selected to be those expressed exclusively in a single cell type. Core signature genes appearing in multiple cell types are discarded. Including these would substantially increase the overall percentage overlap but would be less meaningful because these genes are not diagnostic of one cell type. Of note, the clustering of MP subtypes seen in the previous t-SNE plot analysis of global expression is echoed in the CAT plot analysis of marker gene sets ( Figures  3B and 4B). This suggests that despite the low overlap by our strict marker gene definition, the same biological correspondences are found as in the unbiased analysis.

Human MPs Correspond to Expected Mouse MPs Except CD1c + IMs and LN Monocytes
Most of the human MPs align best with a single murine cell type. AMs correspond across species, with a 13% mean CAT overlap (Figures 4A and 4B, group 1 AM). Human BL and lung monocytes correspond directly to murine splenic monocytes with 21% mean CAT overlap (group 3 Mono). Human BL Clec9a + DCs are most similar to murine splenic CD8 + DCs, with 17% mean CAT overlap (group 6 DC1); unexpectedly, these Clec9a + DCs are not strongly aligned with murine LN CD103 + DCs (7% mean CAT overlap), as would be hypothesized based upon the shared Irf8, Batf3-dependent lineage of CD8 + splenic DCs and CD103 + LN DCs in the mouse. Instead, LN DCs are highly correspondent across species (group 4 LN DCs), suggesting that DC maturation state or tissue imprinting may have a greater effect on gene expression.
There are a few human MPs that share homology with more than one mouse MP subtype, however. Human IM HLADR and IM CD36 are the most similar to murine IM1 and IM2, with 13%-16% mean CAT overlap ( Figure 4B, Group-2 IM, and Figure S7), demonstrating correspondence across IMs, as expected. However, human IM CD1c is most similar to murine splenic CD4 + DC (20% mean CAT overlap) and less similar to its expected counterpart murine IM3 (12% mean CAT overlap) ( Figure 4B, group 5 DC2, and Figure S7). Note, however, inversely the murine IM3 is most similar to human IM CD1c, as anticipated ( Figure 4B and Figure S8).
Human LN monocytes correspond highly with murine IM1 and IM2 (21% mean CAT overlap) instead of splenic monocytes (13% CAT mean overlap, 3 rd highest) (Figures 4A and 4B, group 2 IM). This was also evident in the t-SNE plots ( Figure 3B), reflecting the shared characteristics of differentiated monocytes in tissue.

Murine MPs Match Expected Human MPs, Yet Exhibit Evidence of Dual Expression Programs
We also assessed the inverse correspondence, taking each mouse MP and identifying the most similar human MP (Figure 4 and Figure S8). Murine AMs ( Figure 4B, group 1 AM) are highly similar to human AMs (13% mean CAT overlap), especially when comparing the top 200 genes, thus sharing the most unique AM marker genes ( Figure 4A). However, among the top 200-1,000 genes, human IM CD1c and IM HLADR (group 2 IM) begin to overlap increasingly (13%-14% mean CAT overlap), suggesting that murine AMs share dual characteristics with human AMs and IMs. Murine LN DCs and splenic monocytes were most similar to human LN DCs and lung/BL monocytes, respectively (group 4 LN DCs ~20% mean CAT overlap and group 3 mono 21% mean CAT overlap), as anticipated.
Interestingly, where human BL Clec9a + DCs align most strongly with their expected counterpart mouse splenic CD8 + DCs, the inverse is slightly different, depending on the number of top genes analyzed (Figure 4, group 6 DC1). Averaging the percentage of overlap over lists of size 1-1,000, murine CD8 + DCs correlate best with the human lung CD206 − DCs (group 5 DC2, 21% mean CAT overlap) as opposed to Clec9a + DCs (17% mean CAT overlap). Similar to the results for murine AMs, murine splenic CD8 + DCs do match their expected homolog, Clec9a + DCs, best when considering only the first top 1-50 ranked genes ( Figure 4A). This suggests that the transcriptional program of mouse splenic DCs shares similarities with both human BL Clec9a DCs and lung CD206 DCs.
Overall, CAT plot analysis of marker gene sets highlights cross-species correspondences and defines the best overall counterpart for human and murine MPs within our MP subtypes ( Figure 4B). In cases in which the predicted counterparts do not match expectations, the analysis gave evidence that the marker genes of the expected homolog did appear early in the ranking(i.e., among the top 50 marker genes), suggesting that multiple MP expression programs exist in some subtypes when patterns across a larger set of top marker genes (i.e., 1,000) are analyzed.

Conserved and Distinct Genes Expressed in Well-Defined Human-Mouse Homologs
Expression data for the top marker gene sets are first examined in three well-defined humanmouse homologs: AMs, monocytes, and DC1 ( Figure 5). The functional properties of genes mentioned below are outlined in Table S1. Expression of the top 50 marker genes in the reference MP subtype is shown in the opposing species (Figures 5, 6, and 7). Note that the best match suggested visually among expression of just the top 50 genes may differ from the assigned best match maximizing mean CAT overlap computed from the top 1,000 genes, thus motivating the decision to highlight the top 3 closest subtypes of the opposing species.
As anticipated, signature genes that define these well-defined cell types are present, such as FABP4, MARCO, and PPARG for AMs ( Figure 5B, group 1 AM); S100A8, S100A9, and CD14 for monocytes ( Figure 5C, group 3 Mono); and XCR1, CADM1, and CLEC9A, along with a recently identified gene, WDFY4, required for cross-presentation for human BL Clec9a + DCs and murine splenic CD8 + DCs ( Figure 5D, group 6 DC1) (Theisen et al., 2018). Although we outline genes known to be shared across human-murine MPs, it is important to note that many genes are not shared. For example, differences in gene expression for human-mouse AMs include SERPING1, MME, IL17RB, and C1Q (not shown), which are solely expressed in human AMs and not mouse AMs; whereas SPP1, MMP8, and CARD11 are solely expressed in mouse AMs and not human AMs ( Figure 5B). Human and mouse DC1, blood Clec9a + , and splenic CD8 + DCs differentially express IDO1 or IDO, CD207 (langerin), and CCR7 ( Figure 5D).

Conserved and Distinct Genes Expressed in Less-Well-Defined Human-Mouse MPs
Since this method of analysis supports previously identified human-mouse MP homologs, we next assessed marker genes in less-well-defined homologs: pulmonary IMs, LN monocytes, and pulmonary/LN DCs. Human IM CD36 and IM HLADR are best matched with murine IM1, closely followed by murine IM2 (group 2 IM; Figure 4B). The two murine IMs have a classical IM signature and share gene expression for FOLR2, SEPP1, LYVE1, APOE, C1Q (not shown), CCL7, and CCL2 ( Figure 6A). An example gene not shared is SPP1, which is highly expressed in human IMs (and murine AMs), but not in murine IMs ( Figure 6A).
However, human IM CD1c matches best with murine splenic CD4 + DCs ( Figure 6A), based on their shared non-classical DC genes (i.e., cytoskeletal genes, CCR6, and cell-cycle genes CD1K1 and CDC20) ( Figure 6A). Similar to the alignment of monocytes and their implied state of maturation, human lung DCs align best with murine splenic CD4 + DCs compared to LN DCs ( Figure 7B and Figures  3 and 7C, group 5 DC2), expressing genes such as CCR6, CD80, CD207, and AXL.
Consequently, human LN DCs align best with murine LN DCs (Figures 3 and 7C, group 4 LN DCs) and express classical DC genes such as ZEB1, CCL22, CCL17, and CCR7, but not the human classical DC gene, LAMP3 ( Figure 7C). Overall, these in-depth correspondences allow one to investigate genes shared and divergent in both well-defined and less welldefined homologous pairs.

A Cell Surface Marker That Allows a Clear Distinction between Macrophages/Monocytes and DCs in Human and Mouse
The rich expression compendium created in this study can be mined to identify new cell surface markers to isolate MP subtypes. In mice, the co-expression of F4/80 (EMR1) Ly6C, MerTK, and CD64 has helped distinguish DCs from macrophage and monocytes (Atif et al., 2018a;Yu et al., 2016;Misharin et al., 2013;Gautier et al., 2012). However, human monocytes do not have a Ly6C homolog; monocytes and macrophages express low levels of EMR1 and show variable expression of MerTK (perhaps due to the lack of a reliable antibody).
Analysis of our RNA-seq data identifies CD88 as a viable cell surface marker that can distinguish macrophages and monocytes from DCs in the tissue of humans and mice (Dutertre et al., 2019;Nakano et al., 2015). To validate this cross-species marker, histogram plots demonstrated that human and mouse AMs, IMs, and tissue monocytes express higher levels of CD88 compared to DCs ( Figure 7D). All in all, CD88 is a commercially available cell surface marker that helps distinguishes MPs in humans and mice (Desch et al., 2016;Gautier et al., 2012).

DISCUSSION
The aim of this study was to investigate the homology between human and murine mononuclear phagocytes to aid in the translation of animal studies to human clinical uses. The dataset and analyses we provide are intended to improve the design of animal models mimicking human disease. Toward this goal, we created a gene expression compendium using RNA-seq analysis of MP subtypes from human and mouse, and a combination of computational methods to determine the closest cross-species counterparts. Overall, this dataset allows us to explore the correspondence between human and murine MPs as well as to discover potential signature markers shared in common (or not) across human and murine MP subtypes.
As validation for our computational analysis, we analyze cell types derived from our human BL and splenic MPs, and compare the populations from several recent scRNA-seq studies profiling human BL MPs (Villani et al., 2017;Dutertre et al., 2019), splenic MPs (Brown et al., 2019), or lung MPs (Travaglini et al., 2020) (Figure S9). Examining our expression data for the top 20 marker genes defined per cluster by the original publications, human blood Clec9a + DCs consistently correspond to conventional DC1 populations and blood DC CD1c − cells correspond to plasmacytoid (pDC) populations. The monocyte subtypes express genes identified in monocyte clusters, while AMs and IMs express the top macrophage cluster genes. In general, our bulk RNA-seq data of purified cell populations aligns favorably with the assumed cluster labels of scRNA-seq data, with the added advantage of robust measurement of thousands of genes across more MP cell types than previously published, and especially in the understudied LN MPs.
The validity and utility of our analysis for the identification of human-mouse MP counterparts is demonstrated by its unbiased prediction of known homologs. As expected, cross-species groupings include (1) human and murine AMs (group 1), (2) human BL and murine splenic monocytes (group 3), and (3) human BL Clec9a + DCs and murine splenic CD8 + DCs (group 6). These pairs align best with each other when compared with all of the other MPs across species (Ingersoll et al., 2010;Caminschi et al., 2008). These three pairs provide a useful reference for as sessing the confidence with which our strategy can identify less-well-defined MP subtype homologs. In addition, high degrees of overlap between human BL and lung monocytes with mouse splenic monocytes is reassuring, as it demonstrates a negligible effect on the transcriptome of cells extracted from BL versus digested tissue.
We also observe a preserved general distinction between MP lineages in both human and mouse. For example, groups 1, 2, and 3 contain exclusively macrophages (AMs and IMs) and monocytes, while groups 4, 5, and 6 contain DCs (Figures 3 and 4B), with 2 exceptions. The first exception is human LN monocytes in the IM-enriched group 2, which as previously stated represents a more mature monocyte expressing both classical monocyte and differentiated macrophage genes. The second exception is human lung IM CD1c + , which aligns more closely with murine splenic CD4 + DCs than its expected counterpart mouse lung IM3; however, conversely, mouse IM3 aligns best with lung IM CD1c + ( Figure 4B).
Interestingly, human BL DCs and mouse splenic DCs cluster away from human and mouse LN DCs, perhaps due to tissue imprinting. One distinction we notice is the expression of CCR7, which is a hallmark feature of DCs. Human BL DCs lack CCR7, a receptor required for lymphatic migration and localization in the T cell zone of LNs (Randolph et al., 2005). This is possibly because the cells are immature and require additional endothelial-tissue interaction to upregulate CCR7. Another observation is that human DC subtypes differentially express CD62L, a transmigration molecule used by circulating monocytes and preDCs to enter tissue and LNs (Liu et al., 2009;Jakubzick et al., 2013;León and Ardavín, 2008). CD62L is highly expressed in human BL DCs, but not in tissue DCs, suggesting that once in tissue, this gene is no longer required. Thus, one possibility could be that human BL DCs may give rise to human tissue DCs. An alternative possibility is that human BL DCs and tissue DCs share the same precursor cell, which gives rise to DCs in different compartments. There is a precedent for this in mice: murine lymphoid resident DCs do not give rise to migratory tissue DCs but share the same precursor cell, where the former DCs precursor entered the LN via the high endothelial venule (HEV) and the latter DC migrated from the tissue into its draining LN via afferent lymphatics.
One surprising finding is the relatively small effect of DC lineage on gene expression patterns. In mice, there is a strong distinction between DC1 (Irf8, Batf3 dependent) and DC2 (Irf4 dependent) lineages, as seen for CD8 + (DC1) versus CD4 + (DC2) splenic DCs. The existence of two DC lineages in humans is supported by the alignment of BL Clec9a + DC with murine CD8 + splenic DC (group 6), compared with CD4 + splenic DCs. However, the separation of human lung or LN DCs based on DC lineage appears to be masked by the stronger effects of tissue imprinting, as evidenced by the distinct clustering of the LN DC (group 4) from lung DCs (group 5) and human blood/splenic DC1 (group 6). In addition, we originally separated LN DCs based on CD1a expression because we consistently observed CD1a hi and CD1a lo CD1c + lung-draining LN DCs in >50 individuals, making the assumption that they may be distinct cell types. However, it is now clear that although the sorted human LN DCs are distinct at the CD1a protein level, these populations are not transcriptionally distinct. Further investigation as to how to subset DEC205 + CD1c + lungdraining LN DCs using different cell surface markers is needed. More important, human LN DCs are understudied. Even though we are unable to decipher more than one LN DC cluster, there is a wealth of genetic information provided in this dataset for future follow-up. For instance, CCL17 and CCL22 expression is restricted to DC populations in lung and LNs, and is not expressed in human BL DCs (Alferink et al., 2003;Stutte et al., 2010;Rapp et al., 2019;Vulcano et al., 2001). This is also true in mice, in which these chemokines distinguish DCs from other immune cell types and are important for T cell recruitment and interactions (Rapp et al., 2019).
Overall, the overlap among the top 1,000 genes for corresponding human and mouse MPs of the same type is low (13%-28%; Figure S6B). This points to the need to identify speciesspecific marker genes and not assume that a gene identified in one species will be an adequate marker in another species. This applies to both translational (mouse to human) and reverse translational (human to mouse) studies. However, it is important to note that this does not mean that the species have completely different transcriptional profiles overall. To qualify as a marker gene for a given MP subtype, a gene had to be expressed primarily in just that one single subtype. Genes expressed in common across multiple closely related subtypes were not included in the overlap analyses. An alternative approach would be to detect the broader differences between monocytes/macrophages versus DC lineages overall, including all genes. Broader inclusion would likely yield a much higher percentage overlap across all MP subtypes, and estimating this overlap may be useful for some applications. However, it would require a different analysis and is beyond the scope of the present study, as our goal was to focus on the subset of genes likely to be useful as markers for particular MP subtypes.
Finally, in our preliminary investigation, we identified CD88 expression to be highly predictive of macrophage/monocytes compared to DCs in both human and mouse. Fortunately, there are commercially available antibodies against CD88 (and CD64, shown previously) that work well for separation of monocytes/macrophages from DCs in human and mouse (Desch et al., 2016;Nakano et al., 2015;Gautier et al., 2012;Jakubzick et al., 2013). In conclusion, this study provides a platform to aid in the design of future translational studies at the MP subtype level and at the gene level. Awareness of crossspecies homology, or lack thereof, can save a great amount of time and expense in medical research.

STAR★METHODS RESOURCE AVAILABILITY
Lead Contact-Further information and requests of resources and reagents should be directed to and will be fulfilled by Dr. Claudia Jakubzick (claudia.jakubzick@dartmouth.edu).

Materials Availability-This study did not generate new unique reagents.
Data and Code Availability-Transcriptome data from murine mononuclear phagocytes can be accessed from the NCBI Gene Expression Omnibus via accession numbers GSE132911 (monocyte and DCs) and GSE94135 (AMs and IMs). To protect donor anonymity human RNaseq data have not been deposited in a public repository. Requests for access to human sequencing data may be made to the lead contact.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Human subjects-We received de-identified human lungs that were not used for organtransplantation from the International Institute for the Advancement of Medicine (Edison, NJ, USA) and the University of Colorado Donor Alliance. We selected donors without a history of chronic lung disease and with reasonable lung function with a PaO2/FiO2 ratio of > 225, a clinical history and X-ray that did not indicate infection, and limited time on a ventilator. Lungs from 7 individuals were processed for RNA sequencing; the mean donor age was 35.8 years (SD 24.7 years), 3 were female and 4 were male. 4/7 donors were nonsmokers (defined as never having smoked) 2 were light smokers (less than a pack a year) and 1 was a former smoker. Additionally, blood was drawn from 3 healthy, non-smoking volunteers, all female with a mean age of 30 years (SD 4.6). Additional demographic information is shown in Table S2. 10 different mononuclear phagocyte (MP) populations were sorted from each lung donor, and 5 different populations were sorted from each blood donor, as outlined in Figure 1. Note, not every donor provided all 15 MP cell types. After data quality control analysis, this resulted in 73 total human MP samples. The Committee for the Protection of Human Subjects at National Jewish Health approved this research.
Animals-Naïve female C57BL/6 mice were obtained from Charles River Laboratories. Mice were group housed in a specific pathogen-free environment at National Jewish Health (Denver, CO), and used at 6-10 weeks of age, in accordance with protocols approved by the Institutional Animal Care and Use Committee. Each sample represents pooled cells from 5 mice; different cohorts of mice were used to isolate lungs, LLNs and spleens.

METHOD DETAILS
BAL, lung, lymph node, and PBMC cell preparation-Lungs were removed en bloc in the operating room and included the trachea, LNs and pulmonary vessels. Pulmonary arteries were perfused in the operating room with cold histidine-tryptophan-ketoglutarate (HTK) solution to preserve endothelial cell function and prevent intravascular clot formation. The lungs were submerged in HTK, and immediately shipped on ice. All lungs were processed within 24 hours of removal. The lungs were visually inspected for lesions or masses and were eliminated from the study if grossly abnormal. Peribronchial LNs were identified and removed. Procedure additionally described in Desch et al., 2016 and Bronchoalveolar lavage (BAL) was performed on the right middle lobe or lingula by completely filling the lobe three times with PBS and 5mM EDTA and then 3 times with PBS alone. After each instillation, lavage fluid was drained from the lung, collected, and pooled.
Lung tissue digestion was performed by inflating the lobe with 4U/mL elastase solution (Worthington Biochem) in PBS containing Ca 2+ , Mg2+, HEPES and dextrose. Airways were clamped to prevent leakage of enzyme and the tissue was incubated at 37°C for 40 minutes. Manual disruption of the tissue was performed, first with scissors and second with a food processor in buffer containing fetal bovine serum. The resulting tissue homogenate was filtered through a 100 mm nylon filter membrane to create a single-cell suspension. Density gradient centrifugation using 30% and 65% Percoll (Sigma Aldrich) was used to deplete red blood cells and dead cells.
Lymph nodes were identified and dissected from lung tissue, pooled, minced, and enzymatically digested with 2.5 mg/mL collagenase D (Roche/Sigma) for 30 minutes at 37°C. Digested tissue was collected and filtered through a 100 mm nylon filter membrane to create a single cell suspension.
Peripheral blood mononuclear cell (PBMC) isolation, blood was obtained by venipuncture from healthy adults as per IRB approved protocol. CPT tubes (BD) were used to separate PBMC from other blood components according to manufacturer's instructions.
Murine Lung, LLN, and spleen MP isolation-Mice were euthanized by CO 2 asphyxiation and lungs were perfused with cold PBS via the heart. Lungs were removed, minced finely with scissors and subjected to enzymatic digestion with 0.5mg/mL Liberase (Roche) for 30 minutes at 37°C as reported previously (Gibbings et al., 2017). Mediastinal lymph nodes or spleens were removed, minced, and digested by incubation with 2.5mg/mL Collagenase D for 30 minutes at 37°C. Single cell suspensions were prepared by manual homogenization and filtration through a 100 mm nylon filter membrane. Mouse lung macrophages were labeled with anti-mouse MerTK and enriched with anti-biotin microbeads as previously reported. Mouse lymph node and spleen DCs were enriched with anti-mouse CD11c microbeads and spleen monocytes were enriched using anti-mouse CD11b microbeads according to the manufacturer's protocol (Miltenyi Biotec).
Mononuclear phagocyte enrichment-Cells of interest were positively enriched prior to FACS sorting using Miltenyi microbeads according to manufacturer's protocols. Human blood monocytes and dendritic cells were enriched from PBMCs using biotin-conjugated anti-CD141 and anti-CD1c in combination with anti-biotin microbeads. Enrichment of human lung and lymph node cells was achieved using PE conjugated anti-human CD1c and biotin-conjugated anti-human CD11c antibodies in combination with anti-PE and anti-biotin microbeads.
FACS sorting and flow cytometry-Human BAL, enriched blood, lung, and lymph node cells were re-suspended in PBS with 2mM EDTA, 0.1% BSA and 10% pooled human serum, and stained with fluorescently conjugated antibodies as described in the key resources table. Dead cells were excluded by uptake of 4',6-diamidino-2-phenylindole (DAPI, Invitrogen/Thermo Fisher Scientific). Cells were either sorted using a FACS Aria Fusion (BD) as outlined in Figure 1 for RNaseq analysis or analyzed using an LSRII or LSR Fortessa (BD).
Human RNA-seq data-Isolated mononuclear phagocytes were immediately resuspended in RLT buffer for RNA extraction using Qiashredder columns and the RNeasy micro kit (QIAGEN) per manufacturer's instructions. Isolated total RNA was processed for next-generation sequencing (NGS) library construction as developed in the NJH Genomics Facility for analysis with a HiSeq 2500 (Illumina San Diego, CA, USA). A Clontech SMART-Seq® v4 Ultra® Low Input RNA Kit for Sequencing (Mountain View, CA, USA) and Nextera XT (Illumina San Diego, CA, USA) kit were used. Briefly, library construction started from isolation of total RNA species, followed by SMARTer 1st strand cDNA synthesis, full-length dscDNA amplification by LD-PCR, followed by purification and validation. After that, the samples were taken to the Nextera XT protocol where the sample is simultaneously fragmented and tagged with adapters followed by a limited cycle PCR that adds indexes. Once validated, the barcoded-pooled libraries were sequenced using 1×50bp chemistry on the HiSeq 2500 as routinely performed by the NJH Genomics Facility.
RNA-Seq data generation-For human samples, FASTQ files were generated with the Illumina bcl2fastq converter (version 2.17). Nextera TruSight adapters were trimmed, and degenerate bases at the 3'-end, as well as highly degenerate reads were removed using skewer (version 0.2.2), which by default removes reads with a remaining length of less than 18 nt. For the mouse samples, basecalling, barcode demultiplexing and adaptor trimming was performed using the Ion Torrent Suite software, and reads less than 30 nt in length were discarded. The quality of the reads was assessed using FastQC (version 0.11.5). The preprocessed sequence reads were mapped to the genome of the respective species (canonical chromosomes of release hg 19 for human and mm10 for mouse, respectively) using STAR (Dobin et al., 2013). Reads mapping uniquely to each gene of the Ensembl 75 annotation of the respective species were quantified with the featureCounts program from the subread software package (version 1.5.2, using parameters -s 0 -O-fracOverlap 0.5 (Liao et al., 2019).

QUANTIFICATION AND STATISTICAL ANALYSIS
Cross-species RNA-seq data-A cross-species dataset of RNA-seq reads was obtained for all autosomal genes with a 1:1 homology between human and mouse, as determined Ensembl BioMart version 75 (Homo sapiens genes, Homologs subsection, Mouse Orthologs). Sex chromosome genes were omitted to avoid a strong effect of donor gender. The resulting cross-species data consisted of 11,959 genes across 93 samples total. For visualization purposes, gene expression levels that had subject effects regressed out were obtained for both the human and mouse samples, as follows. The original counts were transformed to a continuous measure using the variance stabilizing transformation (VST) available from the DESeq2 R package version V1.20.0. (Love et al., 2014). Using these transformed values, linear mixed models were fit to each gene with the nlme R package version V3.1-137. These models included an intercept only for fixed effects along with a random intercept for each subject. The expression values with subject regressed out were obtained by adding the overall intercept with the residual error terms (thus excluding the random intercepts). For the small number of genes where the linear mixed models had a singular fit (i.e., the random intercept variance was estimated to be 0), the original expression values were retained. This was performed separately for the human and mouse samples.
To further account for differences in expression levels across species, the VST subjectregressed data were quantile-normalized using the preprocessCore package (version 1.40.0) in R version 3.4.2 ( Figure S3). The quantile-normalized data were used in all subsequent analyses, except the Z-score analysis described below.
Alignment and visualization of human-murine MP data-Human and murine quantile-normalized gene-expression data were aligned and jointly visualized using the Seurat pipeline, originally developed for single-cell sequencing and previously demonstrated to integrate human and murine single-cell data. The standard pipeline was applied here to the human and murine MP data using the Seurat 2.3.2 package for R version 3.4.2 (Satija et al., 2015). Briefly, the data were first normalized using the default methods in Seurat to yield natural-log transformed data. Then a set of 885 highly variable genes (hvg) was identified as shared among the 2000 most highly variable human genes and 2000 most highly variable mouse genes. The data were then scaled using the standard Z-score scaling before further dimensionality reduction. Canonical Correlation Analysis (CCA) was applied to the scaled hvg set to identify a lower dimensional space of CC vectors that maximized the shared correlational structure across human and mouse datasets. Using the MetageneBicorPlot function to display the correlation strength for each CC vector, the optimal number of 12 CC vectors was identified for use in the dynamic time warping alignment step. The resulting single, aligned, low-dimensional space representing the integrated human and mouse data was visualized in 2-D using t-Distributed Stochastic Neighbor Embedding (t-SNE). ( Figures  3, and S4).
Identifying cell-type specific marker genes-Marker genes were identified as those significantly overexpressed primarily in one MP subtype versus all others within a species. Overexpression was quantified using the quantile normalized VST counts. The median expression was computed for each gene in each subtype within a species and then scaled by the median of those expression values across all samples within the species. Scaling by the median expression helps distinguish between genes highly expressed primarily in a minority of subtypes from genes like housekeeping genes that are highly expressed in all subtypes.
Significance of overexpression in one MP subtype was determined by calculating a Z-score for each gene for each subtype of interest as follows. RNA-Seq counts from genes that had a human-mouse equivalent were filtered to include only genes with at least 3 samples having more than one read count per million aligned reads. Next, the counts were transformed to continuous values using the VST from the DESeq2 R package. For each gene, a linear mixed model using subtype for the fixed effects and a random intercept for subjects was fit using the lmerTest R package version 3.1-0 (Kuznetsova et al., 2017). An overall F-test, using Satterthwaite's degrees of freedom method, to determine which genes showed any evidence for differences in expression between the subtypes was performed, and then these p values were adjusted using the Benjamini-Hochberg method. For genes with adjusted p values ≤% 0.05 for the overall F-test, a meta-analysis approach was used to evaluate potential signature genes for each different MP subtype. For the subtype of interest, all pairwise comparisons were done using linear contrasts and t tests (again with Satterthwaite's degrees of freedom) using the linear mixed model described above. The raw p values were then combined using the Z-score method where p i was used if the effect size for that pairwise comparison was positive (i.e., the expression was higher in the subtype of interest) or max (p i , 1-p i ) was used if the expression was lower in the subtype of interest. The same process was repeated for the mouse samples. A higher positive Z-score indicates that the gene is expressed higher in the subtype of interest than the other subtypes, on average.
A final marker score for each candidate gene per subtype was computed as the product of the median scaled expression data and the Z-score using that subtype as the subtype of interest. In this way, genes that were both highly and preferentially expressed in a single subtype would be ranked more highly by the marker score. Note, this strategy discounts genes like housekeeping genes that may be highly expressed in all subtypes but not preferentially in a given subtype, as well as genes that could be marker genes for multiple cell types.
Identifying human-mouse MP analogs-A popular method for determining the corresponding MP subtype between species is to minimize distances or maximize correlations between sample groups using the quantile-normalized gene expression vectors, or in the lower dimensional projection defined by the aligned CC vectors. However, with so few replicates per sample group and a difference in gene variance structure across species, both these methods were found to be very sensitive to outliers and data scaling (data not shown). Moreover, determining the corresponding cell types by maximizing correlation between samples is highly dependent on the set of genes over which the correlation is computed ( Figure S5). Whether using all genes ( Figure S5A) or only the highly variable genes (hvg) to avoid flat or non-expressed genes dominating the correlation ( Figure S5B), all samples are nearly equally correlated, with little discrimination. Moreover, the pairings which maximize correlation using all genes do not always agree with the pairings using the hvg. Another alternative to determining homologous subtypes is to intersect marker gene lists and report as the best match that which has the largest marker gene overlap ( Figure S6). However, for the marker gene determination used here (positive Z-score and p.adj > 0.05), ~4,000-7,000 of the initial 11,959 genes were chosen as potential marker genes per cell type. Given the large marker gene set size relative to the total set of genes, comparing any two sets would necessary overlap of nearly 50% ( Figure S6A). Restricting the overlap analysis to fixed sized sets of the top ranked marker genes per candidate pair results in different 'best' matches depending on the size of the marker set ( Figure S6B). To compute an unbiased correspondence between human and murine MP, the data were analyzed instead based on a summary value of average overlap percentage when considering ordered marker set sizes from 1 to 1000 identified within each species. The set of marker genes for a reference MP subtype in one species was ranked in decreasing order of marker score and compared to the similarly ranked marker genes in the subtypes of the other species. The proportion of genes in common between the reference marker ranked list and each candidate cell type ranked list was plotted against the list size as a Correspondence-at-the-Top (CAT) plot. (Figures 4A, S7, and S8). Due to the ranking criterion, higher agreement at smaller list sizes, i.e., the top of the list, indicates two subtypes agree strongly on genes most highly and preferentially expressed in a single subtype per species, i.e., marker genes unique to the cell type. For each candidate subtype cross-species pair, the overlap percentage was averaged across all list sizes from 1 to 1000 ( Figure 4B). A high mean CAT percent agreement indicates that the pair consistently agrees on strong overlap, regardless of the marker set size (from 1 to 1000). The cross-species analog was chosen as that which maximized the mean CAT percentage agreement across all list sizes from 1 to 1000 ( Figure 4B).
Comparison to previous DC and monocyte signature gene sets-The signature gene sets defined by Villani et al., (Villani et al., 2017) for the six DC subsets and for four monocyte subsets. Genes not appearing in our dataset were first removed and the top 20 genes per subset were identified using the reported AUC value to rank the genes (AUC column, Tables 1, S2 and S5 in Villani et al. (2017) paper). Signature gene sets defined by Dutertre et al., (Dutertre et al., 2019) appearing in our dataset or occurring as multiples in their signatures were first removed and then the top 20 genes per subset were defined based on decreasing order of average log fold change (avg_logFC column, Table S2 in Dutertre et al. (2019) paper). The correspondence of cluster number to cluster identity was established using the signature genes highlighted for each cluster in Dutertre et al. Signature gene sets from Brown et al., (Brown et al., 2019); were created from the normalized count data available the Gene Expression Omnibus (GEO) accession GSE137710. Following the methods described in Brown et al., a score for each gene for each subset was computed as the product of the earth mover's distance (EMD) and the area under the ROC curve (AUC) using the gene to distinguish between cells within the cluster versus all cells outside the cluster. Genes not appearing in our dataset were removed and the top 20 genes per DC subset were defined based on decreasing order of the gene score. The top scoring genes were verified as the genes annotated in Brown et al. MP signature gene sets were defined by Travaglini et al. (2020). Genes not appearing in our dataset were first removed and the top 20 genes per subset were identified based on decreasing order of the average log fold change (avg_logFC column, Table S4 in Travaglini et al. (2020)    Expression of signature DC and macrophage transcripts from RNA-seq are shown as normalized values. Data first undergo a variance stabilizing transformation on read counts, in which the donor identity has been regressed out to mitigate any donor effect. The data for each gene are then min-max scaled into the range [0, 1] to emphasize which MP subtypes have the minimum or maximum expression for a given gene. Although the number of replicates ranges from 3-6, only the first 3 replicates are shown for each subtype.

Figure 3. Visualization of MP Subtypes across Species
Quantile-normalized Variance Stabalizing Transformed (VST) subject-regressed data from 3-6 donors per subtype for 15 human and 9 murine MP subtypes were subjected to the Seurat version 2.0 pipeline to visualize human and mouse data within a common 2dimensional (2D) space.
(A) Samples in the resulting 2D projection are colored by species, tissue, or cell type. (B) Samples are grouped into 6 clusters based on later shared marker gene analysis (see Figure 4). Shape indicates cell type, size indicates species, and color indicates group assignment. MP subtypes are denoted by the letters A through X. Note that each group contains related subtypes across species.

Figure 4. Correspondence of MPs across Species
Candidate marker genes were ranked by their score, which multiplies their expression levels, scaled relative to the median expression, by their marker gene Z score (see Method Details).
(A) For each reference subtype (graph title) from either human (top) or mouse (bottom, signified by mm), the proportion of agreement of ranked marker gene lists versus ranked marker gene lists for each candidate subtype in the other species was calculated for progressively larger list sizes and visualized with a correspondence-at-the-top (CAT) plot for a select set of MP subtypes. Percentage agreement is the percentage of genes identified as critical markers in the reference species that are also identified as critical markers for the comparison species up to that rank. For example, at rank 100 on the x axis, a value of 10% would indicate that of the marker genes in the top 100 in the reference species, 10% are in the top 100 of the comparison species. Each line shows a different cell population in the comparator species. The color of the vertical line indicates which subtype had the highest mean CAT overlap across all list sizes from 1 to 1,000 with respect to the reference. (B) The mean CAT overlap over list sizes 1-1,000 is shown for each subtype pair. Rows are scaled by the maximum and the minimum to emphasize the highest subtype correspondence for the human subtypes. Rows and columns are ordered to cluster similar subtypes. Clusters of MP subtypes are then annotated by group number, and these groupings are used for the coloring in Figure 3B. The expression values are shown for the top 50 marker genes in the reference cell type. The cell types are ordered by degree of overlap with the reference cell type, if within the same species, or by the mean CAT overlap shown in Figure 4B if in the other species. The data are scaled across the row by the gene minimum and maximum within each species to emphasize in which cell types the gene is most expressed. Genes among the top 50 in the reference that also appear in the top 1,000 ranked genes in the 1 st -, 2 nd -, or 3 rd -best match marker list appear toward the bottom and are annotated with *, **, or ***, respectively. Note that annotation by an asterisk does not mean that the gene must also appear in the top 50 marker genes in the other species, only that it appears in the top 1,000 genes for the top 3 matches. Also, visualizing the top 3 matches rather than just the best match can help reveal cases when the best match, as determined using 1,000 reference marker genes, may differ from the best match suggested by comparing expression of just the top 50 reference genes, in which case the CAT plots can inform the reason for the perceived discrepancy ( Figures S7 and S8).
(A) Example depiction of all of the data in all of the samples for a given reference and the top 3 closest matches according to Figure 4B. (B) Highest correspondence for human and mouse AMs. Three replicates for the reference and 3 closest subtypes of the opposing species are shown. (C) Reciprocal highest correspondence for human BL and mouse Spl monocytes. Three replicates for the reference and the 3 closest subtypes are shown. (D) Highest correspondence for human BL DC Clec9a + and mouse Spl CD8 + DC1. Three replicates for the reference and the 3 closest subtypes are shown.

Figure 6. Corresponding Human and Mouse IM Subtypes
Expression is shown for the first 50 ranked genes using the scaled data, as in Figure 5. Three replicates for the reference and the 3 closest subtypes of the opposing species are shown. Genes are annotated by whether they are also in the top 1,000 marker genes in the 3 highest corresponding subtypes with *, **, or ***, respectively. (A) Human IMs. (B) Mouse IMs.

Figure 7. Corresponding Human and Mouse Monocytes and DC Subtypes; CD88 Expression Distinguishes Monocytes/Macrophages from DCs
Expression is shown for the first 50 ranked genes using the scaled data as in Figure 5. Three replicates for the reference and the 3 closest subtypes of the opposing species are shown. Genes are annotated by whether they are also in the top 1,000 marker genes in the 3 highest corresponding subtypes with *, **, or ***, respectively.