An atlas of human proximal epididymis reveals cell-specific functions and distinct roles for CFTR

A coordinated process of maturation in the lumen of the epididymis is required for sperm to fertilize an egg. A single cell atlas of the human proximal epididymis reveals key cell-specific functions, regional specialization, and two roles for CFTR.


Introduction
The human epididymis has a pivotal role in male fertility. Immature sperm leaving the testis are exposed to a series of key environmental cues in the lumen of the duct that ensure their full maturation. These cues are provided in large part by cells in the epithelium of the epididymis, which secrete a complex mixture of ions, glycoproteins, peptides, and microRNAs (Belleannee et al, 2012a) that coordinate sperm maturation along the length of the genital ducts. Most insights into the functional specialization of the epididymis epithelium arise from studies on rodents (primarily mouse and rat) and larger mammals such as the pig (Jervis & Robaire, 2001;Robaire & Hinton, 2002;Dacheux et al, 2005;Dacheux et al, 2009;Breton et al, 2016). However, it is apparent that there are substantial differences between species, both in structure and in detailed functions. Knowledge of the human male genital ducts is less well advanced because of the difficulty of obtaining live tissues for research and the impossibility of performing functional studies in vivo. Anatomical observations show that, unlike in rodents, where the different functional zones of the epididymis (the initial segment, caput [head], corpus [body], and cauda [tail]) are separated by septa, the human duct has no such clear divisions, making functional analyses even more challenging. Over the past several years, we (Harris & Coleman, 1989;Pollard et al, 1991;Bischof et al, 2013;Browne et al, 2014, 2016a, 2016b, 2018, 2019;Leir et al, 2015) and others (Dube et al, 2007;Thimon et al, 2007;Cornwall, 2009;Belleannee et al, 2012a;Sullivan & Mieusset, 2016;Sullivan et al, 2019) have made a concerted effort to advance understanding of the human organ, to facilitate novel therapeutic approaches for male infertility and the development of targeted male contraceptives.
The human epididymis does not have an initial segment; rather, the efferent ducts (EDs) provide the conduit from the testis to the head of the epididymis (caput), where the key functions of sperm maturation are thought to occur. Based on their gene expression profiles and other data, the corpus and cauda regions probably have a more important role in sperm storage and in ensuring the sterility of more proximal regions of the duct (Thimon et al, 2007;Belleannee et al, 2012b;Browne et al, 2018, 2019). Because of its dominant role in male fertility, we focused on the proximal part of the duct and generated a detailed single-cell atlas of the human caput epididymis, which is described here.

Results
There is remarkable diversity in the structure of the epididymis from different donors as shown in Fig 1, making precise dissection of the caput tissue (in the absence of septa in humans) somewhat challenging. On the proximal side, our goal was to minimize the contribution of ED tissue and on the distal side to exclude corpus tissue. It was not possible to take prospective tissue sec[...] previously published bulk RNA data from the caput, corpus, and cauda tissue (Browne et al, 2016b). We recovered 1,876, 1,309, and 2,114 cells from donors aged 31, 57, and 32 years, respectively, that passed quality control on the 10X Genomics Chromium System pipeline, providing scRNA-seq data on a pool of 5,299 cells. The cell-UMI matrix was normalized and clustered using Seurat version 3 (Satija et al, 2015). The clustering of cells from the three donors is shown in the Uniform Manifold Approximation and Projection (UMAP) dimension reduction plot (Becht et al, 2019) in Fig S1E, where eight distinct clusters are identified. Individual clusters contain cells from each donor (AXH009, AXH012, and AXH014), although, as expected, the numerical contribution of cells varied by cluster (Table S1).
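As a quick consistency check on the pooled cell numbers quoted above, the short sketch below sums the per-donor counts and reports each donor's share of the pool. Only the three counts and donor ages come from the text; the variable names are illustrative, and the sketch is not part of the authors' analysis pipeline.

```python
# Cells passing quality control for each donor, keyed by donor age in years
# (counts and ages as quoted in the text; variable names are illustrative).
cells_by_donor_age = {31: 1876, 57: 1309, 32: 2114}

# Pool size across the three donors.
total_cells = sum(cells_by_donor_age.values())

# Fraction of the pool contributed by each donor.
fractions = {age: n / total_cells for age, n in cells_by_donor_age.items()}

print(f"pooled cells: {total_cells}")
for age, frac in fractions.items():
    print(f"donor aged {age}: {frac:.1%} of pooled cells")
```

The pooled total agrees with the 5,299 cells reported in the text.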

Identification of major and minor cell populations in the human caput epididymis
Our current understanding of the cell types in the caput epididymis epithelium is largely based on careful characterization of the rodent tissue (reviewed in Breton et al [2016]) and some additional data from human tissues (reviewed in Sullivan et al [2019]). We expected to find principal cells as the most abundant cell type, with contributions from basal cells, clear cells, stromal cells, and other minor populations (reviewed in Sullivan et al [2019]). Each cell type has a set of identifying protein markers, which we used to guide our initial analysis. First, we examined the list of differentially expressed genes (DEGs) for each of the eight clusters identified in Seurat v3 (Satija et al, 2015) (Table S2) and shown by the UMAP plot in Fig 2A. A heat map illustrating the top 10 DEGs in each of the eight clusters shows the strong identity of each cell type (Fig 2B). The largest groups of cells, in cluster 0 and cluster 1, have somewhat similar gene expression profiles based on their adjacent location on the UMAP plot, and both clusters express some predicted markers of principal cells. One classical marker of principal cells in the mammalian epididymis is the water channel aquaporin 9 (AQP9) (Da Silva et al, 2006;Hermo & Smith, 2011;Schimming et al, 2017), although our data suggest that its expression may be highest in a distinct population of principal cells (cluster 0) and much lower in cluster 1 cells (Fig S2A and B). Later, we present evidence that cluster 1 cells may encompass apical and narrow cells, rather than principal cells. The next most abundant cluster of cells (cluster 2) is primarily contributed by donors AXH012 and AXH014, and inspection of the DEG list suggests these may arise from ED cells contaminating the caput tissues. The diagnostic marker for the ED is villin (VIL1), which is apparently abundant in the ED epithelium but not seen in the caput epithelium in tissue sections (Sullivan et al, 2019).
Of note, not all cells in this cluster express villin (Fig 2C), but the fact that they cluster together suggests that the ED cell gene expression signature is substantially different from that of the other cell types in the caput tissue. Cluster 3 cells show markers of basal cells, which, as their name implies, are located at the base of the epididymis epithelium, although they may extend processes into the lumen of the duct (Shum et al, 2008). The most differentially expressed gene in this cluster is keratin 5 (Fig 2C). Cluster 4 encompasses cells with high expression of mesenchymal markers and may include stromal cells, fibroblasts, and muscle cells. Cluster 5 cells are spermatozoa based upon high expression of sperm proteins such as cilia and flagella associated protein 43 (CFAP43/WDR96) and the FOXJ1 transcription factor, which is diagnostic of ciliated cells. This cluster is primarily contributed by two of the tissue donors (AXH012 and AXH014). Cluster 6 cells are likely clear cells, which express multiple subunits of the V-ATPase complex (Fig 2C) that is critical for the acidification of the epididymis lumen. A low pH in the lumen of the rodent epididymis is thought to be necessary for maintaining sperm quiescence (Breton et al, 1996;Breton & Brown, 2013). Cluster 7 cells are clearly immune cells, with multiple HLA peptides differentially expressed, suggestive of B-cells, together with the Fc Fragment of IgE Receptor Ig (FCER1G) gene, which encodes the Fc ε receptor that is strongly associated with monocytes and macrophages. It is possible that this cluster also includes the halo cells described in other species (Serre & Robaire, 1999).

Regional and cell-specific distribution of function in the major caput epithelial cell populations
Principal cells are thought to be the most abundant cells throughout the length of the epididymis and are predicted to have some regional specialization of function between the caput, corpus, and cauda regions. We observed earlier that the major cell type in cultured epithelial cells derived from these regions had some characteristics of principal cells, but caput-derived cells had a substantially different morphology from the corpus and cauda cells (Leir et al, 2015). Hence, it is of interest here to observe a clear functional specialization of the predicted principal cells within the caput (clusters 0 and 1). The most differentially over-expressed genes in cluster 0 include those encoding multiple antimicrobial peptides, including β defensins (DEFB118 [Fig S2C and [...] β-defensin domain which is lacking from the A form (Ribeiro et al, 2015). Also abundant in cluster 0 cells are transcripts for cystatin 11 (CST11), a type 2 cysteine protease inhibitor (Hamil et al, 2002). CSTs have multiple roles in the mammalian reproductive tract, including as antimicrobial agents. Adhesion G protein-coupled receptor G2 (ADGRG2, also known as GPR64) and the serine protease inhibitor serpin family F member 2 (SERPINF2), which are both androgen-responsive genes in human epididymis epithelial cells (Yang et al, 2018), are also significantly over-expressed in cluster 0 cells. Also of note in this cluster are several lipocalins (LCN6, 8, and 12); LCN8 is reported to be an epididymis-specific lipocalin (Suzuki et al, 2007). In contrast, cluster 1 cells show differential expression of a different group of well characterized caput epididymis markers, including epididymal protein 3A and 3B (encoded by EDDM3A/3B) and cysteine-rich secretory protein 1 (CRISP1) (Figs 2C and 4A-F and K).
CRISP1 is known to be secreted by the epididymis epithelium into the lumen, where it binds to the sperm head and has a role in sperm-egg fusion (reviewed in Evans [2002]). Also abundant in cluster 1 cells are whey acidic protein four-disulfide core domain protein 8 and 9 (WFDC8/9) (WFDC8 is shown in Fig 4G, H, and K, with its regional distribution shown in Fig S3A-D). WFDC8/9 are epididymis protease inhibitors (Rajesh et al, 2011). Transcripts from the Paired Box gene 2 transcription factor gene (PAX2), which is known to direct a transcriptional network in urogenital cells including the epididymis cells (Browne et al, 2019), are also abundant in cluster 1. Similarly over-expressed in cluster 1 cells are a group of β defensins. However, these are encoded by different genes than those seen in cluster 0 cells and include DEFB132, DEFB129, and DEFB127, which map to the defensin gene cluster on chromosome 20p13, and DEFB110, DEFB134, and DEFB131, which are encoded by defensin gene clusters on three different chromosomes (6, 8, and 4, respectively).
We subclustered groups 0 and 1 alone using DIM = 10 and DIM = 30 in an attempt to reveal the differences in their identity, but this was not highly informative: a small cluster of cells, arising primarily from one tissue donor, is likely adipocytes based upon differential expression of adipogenesis regulatory factor (ADIRF) (data not shown). In an effort to determine whether the cluster 0 and cluster 1 cells represented principal cells with significantly different functions based upon their regional localization, or whether they identified different cell types in the caput epithelium, we used immunofluorescence to examine the expression pattern of proteins encoded by cluster-specific DEGs. Sequential panels in Fig [...] (Table S2), it is also expressed at a lower level in cluster 1 cells (Fig 3F), consistent with SPAG11B protein detection in nearly all surface cells in the caput epithelium. In contrast, CST11 protein, although also absent from the ED (Fig 3G), shows a gradient of expression along the caput. It is seen at low levels in the proximal caput close to the ED region (Fig 3G), although it is almost undetectable in the adjacent caput region (Fig 3H). CST11 is much more abundant in the middle of the caput (Fig 3I) and through the corpus (Fig 3J). However, some variation was noted between different tissue donors. CRISP1 shows a markedly different pattern of expression (Fig 4), where moderate expression levels are seen in the ED (Fig 4A) and caput epithelium through all regions (Fig 4B and C), although the protein is much more abundant in the corpus (Fig 4D). Careful inspection of the CRISP1 localization in the proximal and mid-caput shows much higher expression in what appear to be patches of surface/apical epithelial cells and in rarer narrow cells (marked by arrowheads [A and N, respectively] in Fig 4B).
Of note, the distribution of WFDC8 (cluster 1 marker) in Fig 4G coincides with the patches of the surface (apical) epithelial cells, whereas immunofluorescence detection of serine peptidase inhibitor Kazal type 13 (SPINK13), another DEG in cluster 1 cells, highlights narrow cells (Fig 4I, white arrowheads, Fig 4J). Higher resolution images of these cell types are shown in Fig S3E and F. These data suggest that cluster 0 may primarily encompass principal cells, whereas cluster 1 includes other cell types such as apical and narrow cells. Moreover, the results support a model whereby both the regional expression of markers along the caput within a single cell type/cluster and cell cluster-specific markers combine to integrate the regional functions of the epididymis.

The unique identity of efferent duct epithelial cells
Since there are no septa dividing the different functional regions of the human epididymis and our goal was to capture all cell types in the caput region, we were not surprised to detect some ED cells in our single cell preparations. These cells (cluster 2) were mainly contributed by two donor samples. We identified the ED cells primarily by the expression of villin (VIL1) (Fig 5A and E), which is expressed in the apical membranes of ED but not caput epididymis epithelium (Sullivan et al, 2019). Not all cells in cluster 2 express high levels of villin (Fig 5E), suggesting that only a subgroup of surface epithelial cells express the marker. However, the clustering suggests that other ED cell types are more similar to each other than to the caput-derived cells in other clusters (e.g., 0 and 1). Other abundant transcripts in ED cells include asparaginase and isoaspartyl peptidase 1 (ASRGL1), a gene associated with prostate cancer (Pudova et al, 2019), the estrogen receptor (ESR1, Fig 5C), and FXYD domain containing ion transport regulator 2 (FXYD2), which encodes the sodium/potassium-transporting ATPase subunit γ that is involved in renal transport (Sha et al, 2008) (Fig 5G). Using antibodies specific to the estrogen receptor (ESR1) and the androgen receptor (AR) in immunofluorescence, we confirmed that ESR1 levels were high in the ED epithelium (both nuclear and cytoplasmic) (Fig 5C) but absent from the caput epithelium, where AR was clearly localized to the nuclei (Fig S3G-I). By re-clustering cells in cluster 2 with a default DIM of 10, we defined three main cell types in the ED cell population (Fig S4A). The cell types within the human EDs are not well documented (reviewed in Hess [2002]). Based upon DEG lists (Fig S4B) and comparison with the transcriptional signatures of different cell types in the epididymis, these cells are predicted to be principal cell-like (Group 0), basal cell-like among others (Group 1), and apical and narrow cell-like (Group 2).
Reclustering with a DIM value of 24 identified four cell clusters, providing a separate identity to a small subset of basal cells (arrowed in Fig S4A), although their transcriptional signature alone did not enable definitive assignment of a differentiated cell type.

Basal cells may include the stem cells of the epididymis epithelium
A relatively small number of predicted basal cells were recovered in the scRNA-seq (cluster 3), which is perhaps not unexpected as they form a discrete single layer of cells at the base of the epithelium (Fig 6). In earlier work on the rat epididymis, basal cells accounted for fewer than 10% of cells (Trasler et al, 1988). Despite this low number of cells, their identity is highly distinctive. In addition to the classical marker of basal cells in many epithelia, keratin 5 (KRT5) (Leir et al, 2020) (Fig 6A and C), which is the most differentially expressed gene, the list of the top 25 DEGs includes many that encode proteins involved in the cytoskeleton and extracellular matrix (Table S2). Among these are claudin 1 (CLDN1) (Gregory et al, 2001), fibronectin (FN1), integrin alpha 2 (ITGA2), LIM domain and actin-binding protein 1 (LIMA1), dystonin (DST), and keratin 17 (KRT17) (Fig 6F). Also on the list are other proteins that may reflect functions of the basal cells that protrude from the base of the epithelium through to the lumen, as observed in other species (Shum et al, 2008) and illustrated in Fig 6A, yellow arrowhead. These may account for the high expression of FXYD3, which is thought to regulate ion pumps and channels (Crambert et al, 2005), and tumor-associated calcium signaling transducer 2 (TACSTD2), a cell surface receptor transducing calcium signals (Nakatsukasa et al, 2010). Also of note in cluster 3 are abundant transcripts from the TP63 gene (Fig 6A, D, and E), which encodes the p63 transcription factor, known as a marker of stem/progenitor cells in other epithelia such as the airway (Zuo et al, 2015). Immunolocalization of p63 in the caput epididymis clearly showed high abundance in only a subset of KRT5 positive basal cells (Fig 6A, purple arrow highlighting purple cells). This predicts that, as has been suggested elsewhere (Pinel et al, 2019), basal cells may be the source of stem cells for regeneration of the epididymis epithelium.

Interstitial cells of the caput epididymis
Careful dissection of the epididymis before single-cell sequencing requires the removal of substantial amounts of connective tissue and often adipose tissue deposits to reveal the tubular structure. Hence, it is not surprising that we recover the stromal cell types identified in cluster 4, although some of these may also be an integral part of the epididymis duct structure. Notable among DEGs in this group (Table S2) are vimentin (VIM) (Fig 2C), encoding the cytoskeletal intermediate filament protein found in non-epithelial cells such as mesenchymal cells, bone marrow stromal cell antigen 2 (BST2) and myosin light chain 9 (MYL9), a myosin regulatory subunit with a role in both smooth muscle and non-muscle cells. Other DEGs in this cluster include a number that encode proteins involved in actin filament polymerization and smooth muscle biology, for example, thymosin beta 4 X-linked (TMSB4X), gelsolin (GSN), and transgelin (TAGLN). Differential expression of matrix gla protein (MGP) and SPARC-like protein 1 (SPARCL1) also indicates that this cluster of cells may include blood vessel derivatives.

Spermatozoa
In our previous analysis of gene expression in the epididymis, we detected signatures that appeared to be from sperm (Browne et al, 2016b); hence, the identification of cluster 5 as sperm, based upon their DEGs (Table S2), was expected. At the top of the DEG list is calcyphosine (CAPS), which encodes a calcium-binding protein with a predicted role in regulating ion transport. The gene was also named epididymis secretory sperm binding protein, implicating a source exogenous to sperm. Also among the DEGs are adenylate kinase (AK1), which has a key role in energy metabolism, consistent with the motility requirements of sperm, and cation channel sperm associated auxiliary subunit delta (CATSPERD), a component of the CatSper complex involved in hyperactivation of sperm, which is required for sperm motility. Another DEG with a predicted role in sperm motility is tubulin polymerization promoting protein family member 3 (TPPP3), a regulator of microtubule dynamics and bundling. Other DEGs with a role in the structure of motile cilia include radial spoke head component 1 (RSPH1), the primary cilia formation (PIFO) gene, which has a role in cilia disassembly, and multiple coiled-coil domain-containing protein genes such as cilia and flagella-associated protein 53 (CFAP53, formerly CCDC11) and 43 (CFAP43, also known as WDR96). Consistent with the importance of cilia in these cells is the identification of the forkhead box J1 (FOXJ1) transcription factor as a DEG. Also relevant to sperm function is dynein light chain roadblock-type 2 (DYNLRB2), which encodes an accessory component of the cellular motor dynein that facilitates movement of cargo along intracellular microtubules. In addition to novel sperm-associated genes in cluster 5 cells, known markers of spermatogenesis were also identified as DEGs, such as meiosis-specific nuclear structural 1 (MNS1) and rhophilin-associated tail protein 1 like (ROPN1L).
A violin plot of gene expression by cluster for several key DEGs in cluster 5 is shown in Fig S5A.

Clear cells: the epididymis ionocyte
Clear cells in the epididymis are known to express high levels of the vacuolar ATPase (V-ATPase), which pumps hydrogen ions into the lumen of the duct, and thus have a pivotal role in maintenance of luminal pH in the mouse (Breton et al, 1996;Shum et al, 2009;Park et al, 2017). Cross talk between clear cells and principal cells, which express the sodium/hydrogen exchanger (NHE3) and the CFTR (Leir et al, 2015;Park et al, 2017), is thought to coordinate the luminal pH that is required for sperm quiescence. However, recent scRNA-seq of the lung epithelium (Montoro et al, 2018;Plasschaert et al, 2018) defined the rare high CFTR-expressing cells that were observed earlier in several epithelia (Trezise et al, 1993;Engelhardt et al, 1994;Ameen et al, 1995) as pulmonary ionocytes and showed them to contain multiple ion channels and high levels of the transcription factor FOXI1. With these observations in mind, we examined the DEGs in cluster 6 cells (Table S2) and found them to have characteristics of both clear cells and ionocytes. For example, at the top of the DEG list are several subunits of the vacuolar ATPase, including cytosolic ATPase H+ transporting V1 subunits G3, A, and B (ATP6V1G3 [Fig 7A and C], ATP6V1A, and ATP6V1B1) and transmembrane ATPase H+ transporting V0 subunits D2 (ATP6V0D2) and A4 (ATP6V0A4). This suggests cluster 6 includes clear cells. However, also among the most significant DEGs is FOXI1 (Fig 7G), and, although of lower significance, both CFTR (Fig 7F) and the gene encoding the α subunit of the epithelial sodium channel (SCNN1A) are also among the DEGs. These data are consistent with cluster 6 cells being equivalent to the "ionocytes" of the epididymis. Hence, clear cells and ionocytes are likely the same cell type in the male genital duct. Of note, there are many other DEGs in cluster 6, some with known functions, including several transcription factors, and many with functions previously not associated with differentiated clear cells.
Both the scRNA-seq data (Fig 7E) and further immunofluorescence imaging (data not shown) demonstrate that only a subset of V-ATPase expressing cells also show CFTR protein. This was confirmed at the RNA level in the accompanying feature scatter plot (Fig S5B and C). It is possible that the histological definition of a clear cell encompasses more than one cell type, which could account for the diversity in this cluster. An extensive functional analysis is warranted to resolve the precise function of these cells in epididymis epithelial biology.

Immune cells in the epididymis
We showed data above that demonstrated a key contribution of genes and pathways of innate immunity to principal cells in groups 0 and 1 in particular. Now considering adaptive immunity, we identify most immune cells in cluster 7. At the top of the DEG list (Table S2) [...] (Serre & Robaire, 1999), which are a low abundance cell type that likely contribute to the pool in cluster 7. A feature dot plot showing key marker gene expression changes across clusters 0-7 is shown in Fig 8.

CFTR and its role in the epididymis
CFTR, a small conductance anion channel, has a pivotal role in normal epididymal fluid transport. Loss-of-function mutations in CFTR are associated with absence of the vas deferens and epididymis abnormalities in cystic fibrosis (CF) (Landing et al, 1969;Holsclaw et al, 1971), and abundant CFTR expression is seen in the genital duct epithelium of humans and many other species (Harris & Coleman, 1989;Harris et al, 1991;Pollard et al, 1991;Bertog et al, 2000;Leung et al, 2001). Very high levels of CFTR expression are seen only in a subset of cells in cluster 6, clear cells/epididymis ionocytes (Fig 7B and F), while CFTR is also a DEG in some cells within the ED cluster 2 (Fig 5B and F). However, of note, the cluster 2 cells likely include more than one cell type, since re-clustering based on expression of villin and CFTR identified 222 cells that expressed only VIL1, 150 only CFTR, and 150 cells both VIL1 and CFTR (Fig S4C). Few cells in the principal (cluster 0), apical/narrow (cluster 1), basal (cluster 3), and stromal (cluster 4) cell clusters, or among sperm (cluster 5), show high expression of CFTR, although most cells in these clusters express low levels of the gene. This observation is in contrast to data from cultured HEE caput cells, which were thought to be most similar to principal cells, but where CFTR is more abundant than in tissue-resident principal cells (Leir et al, 2015;Browne et al, 2016b).
These differences may result from the culture conditions, including lack of relevant cell:cell cross-talk and altered substrate and culture media, causing a loss or change of cell identity.
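The grouping of ED cells by villin and CFTR expression described above can be illustrated with a minimal sketch. It assumes that a cell counts as "expressing" a gene when its normalized value exceeds a threshold; the text does not state the criterion the authors used, so the threshold, the function name, and the toy values below are all hypothetical and are not data from the study.

```python
def classify_coexpression(vil1, cftr, threshold=0.0):
    """Count cells expressing only VIL1, only CFTR, both, or neither.

    `vil1` and `cftr` are per-cell expression values in the same cell order.
    A gene is called "expressed" when its value is above `threshold`
    (an assumed criterion, chosen for illustration only).
    """
    counts = {"VIL1 only": 0, "CFTR only": 0, "both": 0, "neither": 0}
    for v, c in zip(vil1, cftr):
        vil1_on = v > threshold
        cftr_on = c > threshold
        if vil1_on and cftr_on:
            counts["both"] += 1
        elif vil1_on:
            counts["VIL1 only"] += 1
        elif cftr_on:
            counts["CFTR only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Toy expression values for six hypothetical cells (not data from the study).
toy_vil1 = [2.1, 0.0, 1.3, 0.0, 0.7, 0.0]
toy_cftr = [0.0, 1.8, 0.9, 0.0, 0.0, 2.4]

counts = classify_coexpression(toy_vil1, toy_cftr)
print(counts)
```

Applied to the cluster 2 expression matrix, a tally of this kind would yield the three groups reported in Fig S4C, although the exact numbers depend on the expression cutoff chosen.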

Discussion
The generation of single cell sequencing data from human tissues is transforming our understanding of biological mechanisms. This is particularly true for poorly studied organs and tissues that are difficult to obtain. One such tissue is the human male genital duct, the epididymis and vas deferens, which have a pivotal role in sperm maturation and hence maintenance of the species. Most models of epididymis function are based upon other mammals, which show substantial anatomic and functional diversity. Here, we construct a single cell atlas of the human proximal epididymis, which reveals detailed molecular characterization of both common and rare cell types, and hence may advance our understanding of mechanisms of male fertility.
The key functions of the epididymis are performed by the cells lining the lumen of the duct, which maintain a low pH environment necessary for sperm quiescence. These cells also secrete a wide spectrum of proteins, peptides, and RNAs that provide the necessary cues for normal sperm maturation and prevent damage by external stimuli such as infections. However, the precise identity of the cellular origin of many of these key components is not clear. Our data show a marked regional gradient of function along the proximal epididymis, which is supported by specific groups of individual epithelial cell types. Principal cells are known to be the majority cell type in the epididymis epithelium and predominate in most epididymis cell culture models. It has been suggested that principal cells in the caput, corpus, and cauda epididymis and the vas deferens have different functions (Jervis & Robaire, 2001;Cornwall, 2009;Domeniconi et al, 2016), although supporting evidence is derived largely from bulk microarray or RNA-seq studies and studies of individual cellular processes. Here, we show by scRNA-seq and accompanying immunocytochemistry of tissue sections, that although there is a profound regional distribution of expression of specific genes along the length of the proximal epididymis epithelium, this may not correlate with the localization of unique cell populations. Recent scRNA-seq data from the mouse genital duct deposited on BioRxiv (Shi et al, 2020 Preprint;Rinaldi et al, 2020 Preprint) are not directly comparable with our analysis, as they include all regions of the duct from the initial segment through the cauda epididymis or vas deferens, respectively, and hence identify many more cell populations.
First of note are the two most abundant groups of cells (clusters 0 and 1), which we initially identified as putative principal cells in the caput epithelium. Cluster 0 cells have a major function in producing antimicrobial peptides, including β defensins expressed from diverse gene clusters, and other components of the innate immune system. The β defensin family members DEFB118 and DEFB119 are among the most differentially expressed genes in group 0 cells, as are SPAG11A and SPAG11B, which have structural similarities to the β defensins. The expression of SPAG11B is seen in principal cells along the whole length of the caput by immunofluorescence. In addition to the antimicrobial activity of the encoded proteins/peptides, they are thought to have other functions and may be directly involved in sperm maturation (Pujianto et al, 2013). SPAG11B is widely distributed in the epididymis fluid in rodents where it also coats sperm; hence, it is of particular interest to identify its source in principal cells throughout the human caput epididymis epithelium. Other antimicrobial DEGs in cluster 0 cells are the cystatin cysteine protease inhibitors CST11 and CST3. However, unlike SPAG11B, CST11 protein shows a marked regional distribution by immunofluorescence, with highest levels in the mid-to-distal caput epithelium and lower levels in the proximal portion. CST11 is the predominant cystatin in the male reproductive tract of Macaca mulatta monkeys (Hamil et al, 2002), and like some other SPAG genes (Ribeiro et al, 2017), it is known to be regulated by androgens. Although the AR is not a DEG in cluster 0 cells, we show by immunofluorescence that AR is an abundant protein in the nuclei of epithelial cells lining the caput. The lipocalins 6, 8, and 12 are also differentially expressed in cluster 0 cells, and these proteins have a known role in binding small hydrophobic ligands in their cup-shaped binding pocket (calyx) and transporting them to target cells.
LCN8 is one of several epididymis-restricted lipocalins (Suzuki et al, 2007), suggesting that these may have a key role in sperm maturation. Cluster 1 cells have a less distinctive antimicrobial/defense gene expression signature than those in cluster 0, although other members of the β defensin gene family are among the DEGs (DEFB129, DEFB13, and DEFB112). Predominant transcripts in these cells were initially thought to reflect the secreted proteome of principal cells in the epididymis. CRISP1 is an androgen-responsive, abundantly secreted protein in the epididymis fluid, where it coats spermatozoa and is involved in both capacitation and fertilization (Ernesto et al, 2015). Hence, it was of interest to identify CRISP1 as a prominent DEG in cluster 1 cells, suggesting these may be the main source of this protein in the epididymis. However, our immunofluorescence data suggest that CRISP1 is localized in surface patches resembling apical cells and in a few narrow cells in the caput epithelium, unlike the distribution of principal cells. This is in marked contrast to the very abundant expression of CRISP1 in most cells in the corpus epithelium and in luminal secretions. To further investigate the possibility that cluster 1 cells encompassed a population of apical and narrow cells, we looked for markers of these two cell types in the DEG list. Apical and narrow cells were carefully examined in earlier work in the rat epididymis (Adamali & Hermo, 1996). Cathepsin D (CTSD) was highly expressed in both cell types, whereas a subunit of glutathione S-transferase (Yf, GST-P) was abundant only in apical cells. In contrast, β-hexosaminidase A was expressed at high levels in narrow cells, which were also the site of carbonic anhydrase II (CA2) expression, suggesting a role in modifying the luminal pH.
Among DEGs in cluster 1, we identified the genes encoding the beta subunit of hexosaminidase (HEXB) and carbonic anhydrase 8 (CA8), indicative of narrow cells, and also glutathione S-transferase Mu3 (GSTM3), suggesting the presence of apical cells. However, the presence of these markers in cluster 1 cells does not necessarily imply these cells have identical functions to the apical and narrow cells defined in the rodent initial segment and intermediate zone.
The abundance of these putative apical and narrow cell types in the scRNA-seq data is somewhat surprising given the expected predominance of principal cells in the caput epithelium: together they contribute 1,369 cells, compared with 2,192 principal cells, in the combined analysis (Table S1). In earlier work in the rat epididymis (Trasler et al, 1988), principal cells were thought to account for about 75% of the total number. Whether our observation of a smaller contribution of principal cells in the human epididymis is due to species differences, or merely reflects cell recovery bias or the higher resolution of the scRNA-seq protocol, will become clear with more datasets from other species. It is of interest that these two cell types cluster closely together on the UMAP plot (Fig 2A), suggesting a more similar transcriptional signature in comparison with other cell types in the epithelium. Other secreted proteins encoded by cluster 1 cells are EDDM3A and EDDM3B, which are poorly characterized but are also thought to play a key role in sperm maturation (Kirchhoff et al, 1994). Of note, deletion of these genes has been implicated in some cases of idiopathic azoospermia (Damyanova et al, 2013;Dong et al, 2015). Also notable in cluster 1 cells are DEGs from another gene family involved in the innate immune response, WFDC8, WFDC9, WFDC2, and WFDC11, which all map to chr20q13.12 and some of which are also androgen dependent.
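The proportions behind this comparison can be checked with a few lines of arithmetic. The sketch below assumes that the 5,299 post-filtering transcriptomes reported in the Methods are the appropriate denominator for the combined analysis; the helper name `percent` is purely illustrative.

```python
# Cell counts quoted in the text (Table S1) and in the Methods.
principal = 2192        # principal cells, combined analysis
apical_narrow = 1369    # putative apical and narrow cells together
total_cells = 5299      # assumed denominator: all transcriptomes after filtering


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


principal_pct = percent(principal, total_cells)
apical_narrow_pct = percent(apical_narrow, total_cells)
principal_vs_pair_pct = percent(principal, principal + apical_narrow)

print(principal_pct)          # 41.4
print(apical_narrow_pct)      # 25.8
print(principal_vs_pair_pct)  # 61.6
```

Under that assumption, principal cells make up roughly 41% of recovered cells, well below the ~75% reported for the rat.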
Another aspect of the biology of the epididymis which our single cell atlas may help resolve is the identity of stem cells in the genital duct epithelium. Here, we can learn from the stem cells defined in other epithelia, based upon their capacity to generate organoids from a single cell. In the intestine, Lgr5+ cells in the crypt alone can generate functional organoids, which reproduce many key intestinal functions (Clevers, 2016). In the airway, TP63-positive basal cells are thought to be the stem cell population (Zuo et al, 2015), although there remains some controversy about this. Epididymis organoids were generated from single cells in several species (Mandon et al, 2015;Pinel et al, 2019;Leir et al, 2020), confirming the existence of stem cells within the epithelial cell population. These organoids are spherical structures, with a layer of basal cells on the outside and additional epithelial cells on the inside facing a luminal space. The scRNA-seq data presented here show that TP63 is a highly significant DEG among the basal cells in cluster 3, with a subpopulation having very high expression levels. This observation was confirmed by immunofluorescence, where p63 was abundant in a subset of keratin 5-expressing basal cells. With the exception of a few individual cells with TP63 transcripts in clusters 1, 2, and 4, this marker is not evident in other cell populations in the epididymis, suggesting that the epididymis stem cells reside in the basal cell compartment.
Finally, it is highly relevant in terms of our understanding of CF pathology and how this disease causes male infertility to revisit the cells that express CFTR in the epididymis. CF impairs the function of many epithelia in different organs. It is notable that many surface epithelial cells in the digestive tract, both in the pancreatic ducts and the intestinal crypts, express abundant CFTR. However, in the lung, CFTR transcripts are at very low levels in most cells in the surface epithelium, with the exception of the CFTR "high" cells now identified as "pulmonary ionocytes" (Montoro et al, 2018;Plasschaert et al, 2018). Earlier elegant immunofluorescence data suggested that ciliated cells in the surface epithelium of the airways were the main site of CFTR protein (Kreda et al, 2005), so the scRNA-seq results were not consistent with those data. We previously showed abundant CFTR mRNA and protein in cultured human caput epididymis epithelial (HEE) cells (Harris & Coleman, 1989;Leir et al, 2015;Browne et al, 2016b), although not in corpus and cauda HEE cells. We also found much lower levels of CFTR mRNA in caput tissue (Browne et al, 2016b), suggesting that the cells expressing CFTR were of low abundance in the intact tissue or that in vivo CFTR expression was indeed lower in the same cell types. As for observations in the airway, our scRNA-seq data are similarly not consistent with recent immunofluorescence data on the human epididymis, where CFTR protein was shown to be high in principal cells along the duct (Sharma & Hanukoglu, 2019). It is possible that lack of specificity of the anti-CFTR antibodies may underlie these differences. Here, we show that in the caput epididymis, most principal cells, which are the most abundant surface epithelial cell type, express little or no CFTR mRNA. In contrast, clear cells (cluster 6), which express high levels of genes encoding the V-ATPase hydrogen pump along with ENaC, the epithelial sodium channel, are also the primary location of abundant CFTR.
These observations are consistent with the epididymis clear cell having an equivalent role to the "pulmonary ionocyte" in the airway epithelium. In the caput epididymis, we suggest that the primary role of CFTR in these sites is rapid coordination of the luminal environment that is required for normal sperm maturation. In contrast, in the EDs, where a population of surface epithelial cells, probably principal cell-like, express abundant CFTR, the protein may be involved in the main function of the ED in water reabsorption. Accordingly, the high levels of CFTR in cultured HEE cells could either reflect the adaptation of principal cells to the submerged culture environment, or be in part due to minor contamination with ED cells. Either way, the dual role of CFTR in the proximal male genital ducts provides at least two mechanisms whereby loss of CFTR could lead to epididymis abnormalities, absence of the vas deferens, and associated infertility in CF males.

Materials and Methods
Tissue
Human epididymis tissue was obtained with Institutional Review Board (IRB) approval from consented patients undergoing inguinal radical orchiectomy for a clinical diagnosis of testicular cancer. These are normal epididymis tissues and not pathological specimens because none of the epididymides had extension of the testicular cancer and donors were not receiving hormone therapy. Data included in this article were derived from seven donors aged 24 (2), 31, 32, 38, 47, and 57 years.

Single cell isolation
Connective tissue and fat were removed from epididymis tissue within ~2 h of surgery and the tissue was separated into EDs, caput, and corpus according to anatomical features (Fig 1A). Of note, there was substantial divergence in the morphology of tissue from different donors, making an exact delineation of regions difficult. Caput tissue was cut into 1-2 mm pieces and digested with collagenase (2 mg/ml collagenase + 150 μg/ml DNAse I, both from Worthington Biochemical Corp.) at 37°C for 2 h with constant shaking. Digesting tissue was then agitated by pipetting and allowed to settle for 4 min, and the supernatant was collected and stored on ice as cell suspension 1. Fresh collagenase solution was then added to the tissue, which was digested for a further 2 h with shaking at 37°C, with subsequent agitation and settling as before. The supernatant (cell suspension 2) was then pooled with cell suspension 1 and tissue debris was removed by passage through a 100-μm cell strainer (Pluristrainer). After centrifugation (300g), the cells were washed once in PBS and then digested with Accutase (STEMCELL Technologies) for 20-30 min. At this point, the cells were resuspended by gentle pipetting and examined by phase-contrast microscopy; if cell clumps were still present, the Accutase digestion was repeated by adding fresh Accutase after centrifugation. Once single-cell digestion was complete, the cell pellet was treated with ammonium chloride solution (310 mM NH4Cl, 23.8 mM NaHCO3, and 0.2 mM EDTA) for 3 min to lyse red blood cells, followed by addition of 2.5× volume of PBS + 2% FBS and centrifugation at 300g for 5 min. Epithelial cells collected were washed twice in Hank's Balanced Salt Solution + 2% FBS with centrifugation at 200g, then resuspended in Hank's Balanced Salt Solution and passed through Flowmi cell strainers (40 μm; Bel-Art) to prepare single cells. Cells were counted and then used for scRNA-seq.

scRNA-seq and analysis
A total of 2,500-3,000 cells from each of three donor samples were used for scRNA-seq using the 10x Genomics Chromium Single Cell 3′ Reagent Kit (v2). After Tapestation quality control, libraries were sequenced on a NovaSeq 6000 sequencer (~300 million reads). Library reads were aligned to the hg19 genome package v1.0 using Cell Ranger 3.1.0, then the cell-UMI matrix was exported into Seurat v3. The matrix was subsequently filtered with min.cells = 3, min.features = 200. The median mitochondrial read ratio was 0.0212 and the third quartile was 0.0324, well below the recommended 0.05. The three biological replicates were merged, yielding 5,299 single-cell transcriptomes after filtering, and corrected for batch effect using Seurat v3 integration with 30 dimensions and 20,000 anchor features. Cell neighbors were then found using 10 dimensions, followed by unsupervised clustering at a resolution of 0.06. Thirty PCA dimensions were reduced using UMAP. The Seurat v3 package was also used to perform differential gene expression analysis and generate plots, including violin plots, ridge plots, and feature plots. Both FindAllMarkers and heat map generation used myAUC statistics methods. Cell Ranger 3.1.0 output data were also analyzed and visualized in the Loupe Cell Browser v3.1.1. Sequence data are deposited at GEO:GSE148963.
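The filtering and mitochondrial quality-control step described above can be illustrated with a minimal sketch in plain Python. It is not the Seurat code used in this study. It assumes that cells are filtered on detected features before genes are filtered on the number of expressing cells, and that mitochondrial genes are recognized by an "MT-" prefix. The function names, the toy counts, and the lowered thresholds are all hypothetical.

```python
from statistics import median


def filter_matrix(cells, min_cells=3, min_features=200):
    """Filter a {barcode: {gene: UMI count}} mapping.

    Cells with fewer than `min_features` detected genes are dropped first;
    genes detected in fewer than `min_cells` of the remaining cells are
    then removed (assumed order of operations).
    """
    # Keep only genes with a non-zero count in each cell.
    detected = {
        barcode: {gene: n for gene, n in counts.items() if n > 0}
        for barcode, counts in cells.items()
    }
    kept_cells = {
        barcode: counts
        for barcode, counts in detected.items()
        if len(counts) >= min_features
    }
    # Count in how many retained cells each gene is detected.
    cells_per_gene = {}
    for counts in kept_cells.values():
        for gene in counts:
            cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}
    return {
        barcode: {g: n for g, n in counts.items() if g in kept_genes}
        for barcode, counts in kept_cells.items()
    }


def mito_fraction(counts, prefix="MT-"):
    """Fraction of a cell's UMIs that come from mitochondrial genes."""
    total = sum(counts.values())
    mito = sum(n for gene, n in counts.items() if gene.startswith(prefix))
    return mito / total if total else 0.0


# Hypothetical toy data; thresholds are lowered so the example stays small.
toy = {
    "c1": {"MT-CO1": 1, "CFTR": 4, "KRT5": 5},
    "c2": {"MT-CO1": 2, "CFTR": 8, "KRT5": 0, "TP63": 10},
    "c3": {"CFTR": 1},
    "c4": {"MT-ND1": 0, "CFTR": 3, "KRT5": 7, "LCN8": 2},
}
filtered = filter_matrix(toy, min_cells=2, min_features=2)
fractions = [mito_fraction(counts) for counts in filtered.values()]
print(sorted(filtered))   # barcodes that pass the filter
print(median(fractions))  # median mitochondrial fraction
```

With the thresholds reported here (min.cells = 3, min.features = 200), the same logic would be applied to the full cell-UMI matrix, and the median and third quartile of the per-cell fractions compared against the 0.05 cutoff.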

Supplemental material
The supplemental material includes six figures showing data pertinent to the results and discussion section; also two tables with details of scRNA cell counts (1) and DEG lists for each cell cluster identified in the scRNA-seq analysis by the Seurat pipeline (2).

Ethical approval
Procedures were performed according to the Case Western Reserve University Research Committee IRB protocol #2017-2099. Informed consent was obtained from all tissue donors.