A genetic, genomic, and computational resource for exploring neural circuit function

The anatomy of many neural circuits is being characterized with increasing resolution, but their molecular properties remain mostly unknown. Here, we characterize gene expression patterns in distinct neural cell types of the Drosophila visual system using genetic lines to access individual cell types, the TAPIN-seq method to measure their transcriptomes, and a probabilistic method to interpret these measurements. We used these tools to build a resource of high-resolution transcriptomes for 100 driver lines covering 67 cell types, available at http://www.opticlobe.com. Combining these transcriptomes with recently reported connectomes helps characterize how information is transmitted and processed across a range of scales, from individual synapses to circuit pathways. We describe examples that include identifying neurotransmitters, including cases of apparent co-release, generating functional hypotheses based on receptor expression, as well as identifying strong commonalities between different cell types.


Introduction
The anatomy of neural circuits is being characterized with increasing resolution and throughput, in part following a dramatic increase in the size of circuits amenable to detailed electron microscopy reconstruction (Swanson and Lichtman, 2016) and the development of genetic tools to access individual cell types (Luo et al., 2018). These efforts reveal anatomy at unprecedented detail, but not the molecular properties of cells. In principle, the genes expressed in each cell of a neural circuit should serve as a molecular proxy for cell physiology. However, most genomic efforts have focused on surveying neuronal diversity rather than characterizing circuit function (Ecker et al., 2017). To develop a resource exploring molecular correlates of circuit function, here we use an approach that genetically targets cell types within a wellcharacterized brain region to measure high-quality transcriptomes that can be integrated with connectomes.
Drosophila affords an ideal system to study neural circuits in detail, as both excellent genetic tools and high resolution connectomes are available. Here we focus on the repeating columnar circuits of the visual system, found in the optic lobes, a widely used model for studying circuit development and function with an extensive genetic toolbox and well-described anatomy ( Figure 1A; Nériec and Desplan, 2016;Silies et al., 2014;Apitz and Salecker, 2014). This network begins with photoreceptor neurons and contains several layers of connected neurons which process incoming luminance signals into multiple parallel streams of visual information ( Figure 1B). Many of its cellular components have been described by light microscopy, including classical Golgi studies (Fischbach and Dittrich, 1989) and recent analyses using genetic methods (Morante and Desplan, 2008;Otsuna and Ito, 2006;Nern et al., 2015;Wu et al., 2016). Electron microscopy reconstruction work has characterized the synaptic connections of many optic lobe neurons (Meinertzhagen and O'Neil, 1991;Meinertzhagen and Sorra, 2001;Rivera-Alba et al., 2011;Takemura et al., 2013;Takemura et al., 2015;Takemura et al., 2017;Shinomiya et al., 2019). Comparative studies have also explored the evolution of this ancient brain structure (Strausfeld, 2009). Despite this wealth of information, many of its fundamental properties remain unknown, including the neurotransmitters used at many of its synapses.
Measuring the genes expressed in specific cells of the brain is challenging due to its compact and complex organization. RNA sequencing (RNA-seq) addresses this challenge by profiling either single cells or genetically labeled populations of cells (Ecker et al., 2017). The latter approach requires genetic tools to access individual cell types but provides more direct access to cells of interests than sampling of unmarked single cells, especially for sparse cell types. Profiling identified cell types provides a direct link to previous work on the anatomy and physiology of those cell types. Cell type-specific drivers also facilitate follow-up experiments, for example evaluating the role of individual genes in individual cells. In Drosophila, large collections of GAL4 driver lines (Jenett et al., 2012;Tirian and Dickson, 2017) and the possibility to further refine these patterns with intersectional methods such as split-GAL4 (Luan et al., 2006;Dionne et al., 2018) enable genetic access to many neuronal populations (see, for example, Tuthill et al., 2013;Aso et al., 2014;Wu et al., 2016). We therefore chose the genetic, rather than single cell, approach to build a genomics resource to explore circuit function.
We previously developed an Isolation of Nuclei Tagged in a specific Cell Type (INTACT) method (Deal and Henikoff, 2010) to measure transcriptomes and epigenomes of genetically-marked neuronal populations in Drosophila (Henry et al., 2012) and mouse (Mo et al., 2015). Here, we develop a tandem affinity purification of INTACT nuclei (TAPIN) method with increased specificity, sensitivity, and throughput. By combining this method with an extensive set of new driver lines with predominant expression in specific cell types and a new probabilistic method to interpret transcript abundance, we build a resource of high-quality transcriptomes for one hundred driver lines. We selected drivers that expressed in cell types constituting the lamina (Fischbach and Dittrich, 1989;Tuthill et al., 2013;Edwards et al., 2012) as well as the major cell types of the circuits that compute the direction of visual motion (Mauss et al., 2017) (Figure 1C). We further included neuronal populations in two central brain regions, the mushroom body and central complex, primarily to serve as informative outgroups.
By profiling these driver lines, we develop an expression catalog for 67 Drosophila cell types as well as several broader cell populations. Through validation experiments and comparisons to the literature we demonstrate that this resource is useful both for identifying individual genes expressed in specific cell types and for reveal-ing broader patterns such as the expression of all members of a gene family across many cell types. As an example, we describe the expression of neurotransmitters and their receptors and use this information to interpret synaptic connectivity. For example, we unexpectedly found that the R8 photoreceptors express acetylcholine in addition to histamine and show that this apparent cotransmitter phenotype is further supported by differential expression of neurotransmitter receptors in R8 postsynaptic partners. Our results demonstrate that combining expression and connectomes leads to specific testable hypotheses about circuit mechanisms that are inaccessible to either approach alone.

Genetic tools for labeling the visual system
To enable transcriptome analyses of defined cell populations, we first assembled a collection of genetic drivers to access them. For this study, we combined drivers from existing collections for cell types in the lamina (Tuthill et al., 2013), the mushroom body (Aso et al., 2014), and the lobula (Wu et al., 2016) with new driver lines for many additional optic lobe cell types and also some neurons of the central complex (Wolff and Rubin, 2018;T. Wolff, personal communication). Nearly all of these drivers were generated using an intersectional method, split-GAL4, to refine expression patterns of GAL4 driver lines. To characterize new driver lines, we imaged expression patterns across the entire fly brain to determine overall driver specificity ( Figures 1D, 1-S1) and examined anatomical features such as layer patterns in higher resolution images to identify specific cell types (Table S1, Figure 1-S2). For most lines, we further confirmed the identity of labeled cells by examining the morphology of individual cells using stochastic labeling (Figure 1-S2). Although we noted that a few patterns also include some additional contaminating cells (Table S1), these driver lines are the most specific tools currently available to access individual cell types in the optic lobe.

Purifying nuclei with TAPIN
Next, we employed an improved INTACT method to measure nuclear transcriptomes in genetically defined cell populations (Henry et al., 2012), and we also developed a new variant of the method that permits higher throughput with increased purity and sensitivity. In both approaches, nuclei are purified using a nuclear tag whose expression is driven in a cell population of interest by either a standard or split GAL4 driver (Figure 2A). The new variant protocol, tandem affinity purification of INTACT nuclei (TAPIN), uses a bacterial protease (IdeZ) to specifically cleave antibodies in the hinge region separating their Fc and antigen binding F(ab') 2 fragments (Figures 2B,

FIGURE 1
Figure 1: Genetic tools to access cell types in the visual system. A. Major brain regions profiled in this study (brain image from Jenett et al., 2012). The optic lobes have a repetitive structure of~750 retinotopically arranged visual columns of similar cellular composition. B,C Examples of single cells in the optic lobe. B. Left, subregions of the fly visual system. Right, examples of layers and neuropil patterns of various classes of visual system neurons. C. We profiled cell types arborizing in the lamina (blue), medulla (purple) and lobula complex (green) of the visual system. Many cells contribute to multiple neuropiles so other groupings are possible. Note, some cell types are present at one cell per column, while others are less numerous with cells that each contribute to several columns. For example, the main synaptic region of the first optic lobe layer, the lamina, contains processes of some 13,000 cells but these belong to only 17 main cell types: 14 neuronal and 3 glial (top row). A small number of additional neurons (lamina tangential cells, Lat) project to a region just distal to the main lamina neuropile. D. Representative expression patterns of driver lines that target specific cell types. Each image is a maximum intensity projection of a whole brain confocal stack (only one optic lobe is shown). In each image the brain is counter-stained (magenta) with a neuropil marker and both the targeted cell type and the driver are indicated in the lower left and right corner, respectively. Additional images (focusing on drivers first described in this study) are shown in Figures 1-S1 and 1-S2. Imaging parameters and brightness and contrast were adjusted individually for each image. For genotypes and image details see Table S5.
2-S1B). Treating protein A magnetic bead-bound nuclei with this protease generates both nucleus-F(ab') 2 and bead-Fc complexes. Soluble nucleus-F(ab') 2 is then recaptured on protein G magnetic beads, removing nonspecifically bound material from the first capture. INTACT successfully profiled many of the abundant cell types in the optic lobe (> 1000 cells per brain), but failed for sparser cell types and those whose nuclei were difficult to purify by differential centrifugation (photoreceptors, glia, T4, T5). We solved these problems with TAPIN, which does not purify nuclei prior to bead capture. The greatest advantage of TAPIN is its ability to purify nuclei from sparse cell types (< 50 cells/brain) (Table S1). INTACT is not suitable for these lines because of loss during differential centrifugation. This difficulty cannot be overcome by processing more brains per experiment because differential centrifugation is difficult to scale. TAPIN solves this problem by running a first capture on crude extracts generated from hundreds to thousands of fly heads. The substantial background in this first capture is reduced 5-to 6-fold in a second capture with only a modest decline in both the yield of nuclei and amplified cDNA ( Figure 2C).

Measuring transcriptomes with INTACT-and TAPIN-seq
We applied INTACT and TAPIN to the cell populations defined by the genetic drivers we described above (Table S2). Most drivers express in a single anatomically defined cell type or a small group of related cell types. Others target more heterogeneous cell populations sharing a common property (e.g., driver lines aimed at recapitulating the expression of a neurotransmitter marker). Altogether, we built 250 RNA-seq libraries from 242 samples of purified nuclei (46 using INTACT and 196 using TAPIN) and 8 manually dissected samples (Table S2). We estimated relative transcript abundance in each library using kallisto (Bray et al., 2016). Libraries built from more nuclei yielded more cDNA ( Figure 2D), allowed more genes to be detected ( Figure 2E), had more estimated transcripts (Figure 2-S1C), more reproducible transcript abundance ( Figure 2F), and less bias in coverage across gene bodies (Figure 2-S1D,E). We focused on 203 libraries that had at least 8,500 genes detected, 3µg cDNA yield, and 0.85 Pearson's correlation of transcript abundances in two biological replicates. These 203 libraries consist of at least two biological replicates built from 100 drivers that covered 67 cell types (53 visual system, 7 mushroom body, 5 central complex, 2 muscle), 6 broader cell populations (ChAT, Gad1, VGlut, Kdm2, Crz, and NPF), and 2 manually dissected tissues (the lamina and remainder of the optic lobe) (METH-ODS). We provide the read and abundance data for the remaining sub-optimal libraries (47 libraries covering 24 cell types) in the event they may be informative, but we do not consider these to be of sufficient quality and do not consider them further here. We did not sort the sex of flies when preparing TAPIN-seq libraries, as we did not observe large differences in male and female expression profiles (Figure 2-S1F).
We were encouraged by the clear enrichment of previously identified markers in cell types where they were expected. For example, we recovered transcription factors (TFs) previously found in the developing monopolar interneurons and inner photoreceptors (Tan et al., 2015; Figure 2G). We further confirmed our measurements by comparing TAPIN-seq results for twelve cell types that were also recently profiled by FACS-seq (Konstantinides et al., 2018;Figure 2-S2A,B) and found concordant expression of cell type-enriched genes. This concordance also argues against major differences between nuclear and cytoplasmic transcriptomes. In combination with the technical quality of our libraries, this confirmation by independent gene expression measurements validated our approach, and also motivated us to explore how to best interpret a large dataset of relative abundances.

Interpreting transcript abundance with mixture modeling
Deriving biological insights from a matrix of transcript abundances is not straightforward. While a cell's expression of a gene can be used to infer a specific functional property of that cell, the level of expression that is needed to establish confidence in such an inference is much less clear. For example, expressing the vesicular acetylcholine transporter (VAChT) implies that a neuron is cholinergic. However, VAChT transcript abundance exhibits a wide distribution and it is not clear, a priori, what level is necessary to conclude that a cell is cholinergic ( Figure 3A).
We used mixture modeling to address this challenge by describing the expression levels of each gene as arising from a mixture of two log-normal distributions representing binary 'on' and 'off' states ( Figure 3A; METHODS). Genes can of course express in more than two states, but we show through extensive validation that this simplifying assumption is a useful one. Modelling VAChT expression in the high-quality TAPIN/INTACT-seq libraries unambiguously inferred VAChT states for all drivers (Figure 3B). We also found that the model was useful for addressing transcript-carryover, evident in our data (as well as published bulk and single cell studies in the fly (Davie et al., 2018) and mouse (Siegert et al., 2012;Macosko et al., 2015)) as photoreceptor transcripts detected in non-photoreceptor cells (Figure 3-S1A). For example, the model correctly inferred that only R1-6 photoreceptors expressed the primary rhodopsin ninaE, although ninaE abundance in other cells reached as high as 2,702 TPM (the mushroom body cell type PAM_1) (Figure 3-S1B,C). We used this method to transform our catalog of transcript abundances to probabilities of expression ( Figure  IdeZ cleavage Capture 1 Input 0 1 2 3 4 5 6 7 nuclear yield (thousands) cDNA yield (ug) INTACT TAPIN   VAChT abundance (TPM+1) # drivers on off

Number of expressing cells
Dm4 Tm4 levels and dynamic ranges between on and off states ( Figure 3-S1D,E). To further simplify these probabilities, we discretized them into on (p ≥ 0.8) and off (p ≤ 0.2) states, and otherwise considered them to be ambiguous (0.2 < p < 0.8). The expression states inferred for replicates had a median 95% concordance (Figure 3-S1F). We combined information from replicates to infer expression at the driver and cell type levels (METHODS). We found many genes that express in all cell types, and many that express in only one, with a range in between ( Figure 3D,E). As expected, given their roles in specifying identity, homeobox transcription factors (TF) expressed more specifically than transcription factors in general ( Figure 3E). Neuropeptides also expressed specifically, while genes with the more general function of synaptic vesicle endocytosis were broadly expressed. We explore these functional properties in more detail later ( Figure 44).

Evaluating accuracy of TAPIN-seq measurements
To validate our TAPIN-seq measurements, we first compared our inferred expression states to FlyBase curated reports of protein expression (n=197 data points of gene/cell pairs; 4 negative points, 193 positive points; n=22 cells; n=69 genes, Table S3) and found 93% concordance (183 matches; 14 mismatches from six genes; 0 mismatches for negative benchmark points; Figure 3-S1G). The benchmark mismatches fell into three categories: expression levels near the transition between inferred on and off components (veli, verm, para; Figure 3-S1H-J), genes with a wide dynamic range of expression (Syx, Rab11; Figure 3-S1K,L), and genes with undetected transcript but previously detected protein (Myo61F; Figure 3-S1M). The first two categories likely arise from imprecision in the model's fitted components and its representation of transcript abundance as bimodal, rather than continuous. The third category (conflicting transcript and protein levels) could reflect either technical issues (low sensitivity in our measurements, or false positives in the prior work due to antibody crossreaction) or biological complexities (e.g., long-lived transcripts, subcellular localization).
To further evaluate our results for genes expressed across a wide range of levels, we compared the model output to protein expression patterns for two transcription factors: Forkhead (fkh) and Ets65A. We visualized each protein using a C-terminal GFP tag; the tagged proteins were expressed from BAC transgenes with large flanking sequences to ensure a near native genomic context (Kudron et al., 2018). From the transcript data, we inferred fkh gene expression in 14 cell types across a 35-fold range of abundance (60 to 2,103 TPM). Of 28 cell types that we visualized at the protein level, fkh was detected in all but one that we expected from TAPINseq ( Figures 3F,G and 3-S2A). The sole exception, Tm4, has a fkh abundance (60 TPM) near the border between the inferred off and on states ( Figure 3F). However, we did detect protein in Dm9, which had a near identical raw transcript abundance (61 TPM). Similarly evaluating Ets65A expression identified two mismatches out of 11 tested cells (Figures 3H,. Ets65a protein was not detected in Tm20 (70 TPM) and epithelial glia (161 TPM), while it was weakly detected in Dm3 (77 TPM). These results further support the accuracy of TAPIN-seq and our statistical model even for genes with a wide dynamic range. The agreement between our transcript on/off calls and protein expression encouraged us to use the discretized on/off calls for all further analyses; the unprocessed relative abundances in TPM are reserved for deeper analysis when needed.

Examining the relation between cell types using transcriptomes
To study the relation between cell types, we built a dendrogram based on the expression states we inferred for the whole transcriptome and estimated the support for each branch point with bootstrap resampling ( Figure 4A). The broad groupings were well supported and mostly intuitive: muscle were outgroups, followed by a mushroom body cell type (PAM_4), the glia, the photoreceptors, and the remaining neurons. Several fine groupings of anatomically closely related neurons were also well supported (e.g., Kenyon cells; C2, C3; Lawf1, Lawf2; T4, T5; LPLC1, LPLC2). However, mid-level branchings were not well supported, indicating the lack of a simple hierarchical relationship. Neurons were generally grouped by region: central complex, mushroom body, and optic lobe. One surprise was the grouping of Tm20 and Dm1, away from all other optic lobe cell types. Upon closer examination, the identity of genes expressed exclusively in these two lines (lz, Pdh, bw) suggest that this grouping is driven by shared pigment cell contamination in the GAL4-tagged patterns of these driver lines. Similarly, the unusual position of PAM_4 is likely due to some unidentified non-neuronal cells in the driver. These are examples of imperfections in the GAL4 driver lines. While they can lead to some false positives for the main target cell types, they can also provide additional information. For example, analyzing the overlap between Dm1 and Tm20 allowed us to infer marker genes expressed in the pigment cell population.

Identifying genes that mark cell types and groups
We next identified genes that marked cell groups in the tree, using three criteria: genes that expressed in all the cells within a group, at most two cells outside this group, and with transcript abundance higher than all cells outside the group (For simplicity, we will hereon refer to 7 of 45 Pm4  T4  T5  Tm1  Tm2  Tm3  Tm4  Tm9  Tm20  Tm29  TmY3  TmY5a  LC4   Heatmap of marker genes enriched in photoreceptors, glia, muscle, and pigment cells. C. Distribution of expression breadth for genes in "terminal" FlyBase gene groups with more than 10 members in our expression probability matrix. The least-and most-broadly expressed gene groups are labeled, along with the DPR-interacting, beat and DPR family of extracellular proteins. D. TfAP-2 transcription factor distinguishes closely related cell types T4 and T5. E,F. TfAP-2 protein is specifically expressed in T4 and not in T5, confirming this detection of differential expression levels. GFP-tagged Tfap-2 (mainly nuclear, in green; see Table S5 and Methods) is shown together with a membrane marker (magenta) expressed in T4 (E) or T5 (F) cells. G. Comparison of genes with differential expression in two driver lines for T5 neurons expressing in different subtypes, identify genes that differentially label layers of the lobula plate (corresponding to different subtypes of T5 cells). H. Confirming the TAPIN-seq identification, klg protein (detected using a GFP tag (green); see Table S5 and Methods) is expressed in T4/T5 cells with the expected layer specificity (layers 3 and 4) in the lobula plate (LP). A neuropil marker is shown in magenta.

DPR-interacting proteins
cell type as just cell). We used these criteria to identify markers for photoreceptors (n=108), glia (n=60), and muscle (n = 76) ( Figure 4B, Table S4). These genes included many known as well as new markers. For example, genes enriched in photoreceptors include signaling components (Arr2, Galphaq) and transporters (trpl, Eaat2) with known physiological roles as well as uncharacterized orphan transporters (e.g., CG8468). We also identified 18 markers for pigment cells using the Tm20 and Dm1 profiles. In addition to the three types of lamina glia we profiled, several other glia types are present in both the lamina and the medulla. Genes expressed exclusively in the dissected samples (lamina, remainder of optic lobe) and not in the TAPIN libraries identified marker genes for optic lobe cells that we did not directly profile, such as medulla glia. Indeed, the genes identified in this way included several known markers for astrocytes (alrm, wun2, Obp44a) (Huang et al., 2015). We examined the breadth of expression of different functional groups of genes, as defined by FlyBase gene group curation. HOX-like homeobox TFs were among the most specifically expressed group, while groups of core cellular machinery (e.g., beta importins, mitochondrial complexes) were among the most broadly expressed groups ( Figure 4C). Some groups included both broadly and very specifically expressed genes. For example, among cell adhesion molecules, we noted an interesting distribution for three gene groups proposed to be involved in protein-protein interactions that underlie synaptic connectivity (Özkan et al., 2013;Tan et al., 2015;Carrillo et al., 2015). While the 11 DPRinteracting proteins (DIP) were among the most specifically expressed genes (expressed in a median of 6 cells), beat (median, 25.5 cells) and DPR (median, 51 cells) genes were more broadly expressed (Figure 4-S1A-D). As physical interactions among these and other extracellular proteins have been systematically characterized (Özkan et al., 2013), we combined their expression and interaction patterns to estimate the number of potential interaction between cells in the lamina (Figure 4-S1E), many of which are in actual contact (Figure 4-S1F). Except for a clear paucity of interacting protein pairs expressed by glia, these global expression-based patterns did not correlate well with connectivity in the lamina. However, we found that every pair of lamina cells expressed tens of interacting protein pairs, highlighting the broad potential for cell-cell interactions not only in the developing (Tan et al., 2015) but also adult optic lobe.

Transcriptomes can distinguish closely related cell types and subtypes
To ask whether we could identify genes distinguishing closely related cell types, we examined T4 and T5. These cells had similar transcriptomes and were neighbors in the phylogenetic tree, but we found one transcription factor, TfAP-2, that was expressed nearly two orders of magnitude higher in T4 (390 TPM) than T5 (6 TPM) ( Figure 4D). We confirmed this pattern at the protein level ( Figure 4E,F).
T4 and T5 cells can each be further divided into four subtypes that preferentially respond to motion in one of four cardinal directions and differ in anatomical details such as the lobula plate layer to which they project axons. While our split-GAL4 lines do not isolate single T4/T5 subtypes, the T5_d1 and T5_d2 drivers show differences in subtype expression (Figure 1-S1B,B',C,C'). Comparing the transcriptomes of these two drivers confirmed previously described markers (Con, bi, dac;Apitz and Salecker, 2018) that distinguish T4/T5 cells of lobula plate layers 1/2 and 3/4, and indicated additional genes, including a transcription factor (dysf) and cell adhesion molecules (klg, Dscam3) with selective expression in these subtypes ( Figure 4G). As a further confirmation of this finding, we verified that a tagged klg protein showed layer-specific expression in the lobula-plate consistent with these T4/T5 subtypes ( Figure 4H).

Reference bulk transcriptomes help interpret single cell transcriptomes
Single cell RNA-seq (scRNA-seq) was recently used to map the optic lobe (Konstantinides et al., 2018) and brain (Davie et al., 2018). Despite its routine use, interpreting scRNA-seq measurements -clustering single cell transcriptomes and labeling these clusters as known cell types -remains challenging. For example, the 52 single cell clusters found in the optic lobe (23 labeled as known cell types, including 7 types of glia; Konstantinides et al., 2018) and the 87 clusters found in the whole brain (41 labeled, also including 7 glia; Davie et al., 2018) far under-estimate the expected diversity of cell types -over one hundred anatomically distinct neuronal cell types have been described in the optic lobe alone (Fischbach and Dittrich, 1989;Morante and Desplan, 2008;Otsuna and Ito, 2006;Nern et al., 2015;Wu et al., 2016). Furthermore, comparing the number of single cells in each optic lobe cluster to the true abundance of each cell type (as established by neuroanatomical studies) reveals that the single cell map does not proportionally represent abundance (ranging from~5 times fewer Dm8/Tm5c cells to~7 times more Dm12 cells than expected in the optic lobe map; Figure 5A). The whole brain map, measured using the more sensitive 10X scRNA-seq platform rather than Drop-seq used for the optic lobe map (Svensson et al., 2017), showed similar cell type abundances ( Figure 5B). Without detailed neuroanatomy to serve as ground-truth, this similarity could be interpreted as reproducibility across platforms. Instead, our results suggest caution when interpreting cell type frequencies from scRNA-seq maps, as they can be skewed by experimental artifacts such as cell type-specific differences in RNA isolation yields, computational over-clustering, or inaccurate cell type labeling. Given the known number of cell types and their frequencies, it is clear that interpreting single cell measurements is challenging.
Comparison with our data reveals additional challenges in mapping scRNA-seq clusters to known cell types. To compare our data to the brain map, we used non-negative least squares regression to model each TAPIN-seq transcriptome as a linear weighted sum of single cell cluster transcriptomes, assuming that large regression coefficients reflect matching cell types (Davie et al., 2018;Cao et al., 2019). We interpreted the results of this comparison against an ideal scenario where single cell clusters were perfectly resolved and accurately labeled, and assuming that the driver lines used for TAPIN-seq profiling had minimal expression outside of the main target cell type. In this scenario, we would expect a unique cluster matching each of our TAPINseq profiles of cell type-specific driver lines, as well as many unmatched clusters, reflecting cell types that we did not profile. However, we observed few one-to-one matching clusters for our TAPIN-seq profiles (e.g., T1, Tm1, Dm8, Dm9, Pm3), several one-to-many matches (e.g., photoreceptor cluster #53 matching our R1-6, R7, R8-Rh5, and R8-Rh6 profiles; also, L1-5 cluster #20, Lawf1/2 cluster #58, and T4/T5 cluster #24), clusters with no TAPIN-seq matches (e.g., clusters #7, 15, 23), as well as TAPIN-seq cell types without a matching cluster (e.g., Dm4, Dm11, Mi4, Tm2, Tm20, LPLC1) ( Figure 5C). The matches confirmed several clusters labeled as single cell types (e.g., Tm1, Mi1) or multiple cell types (e.g., photoreceptors, L1-5, Lawf1/2, T4/5, Kenyon cells) and also suggested possible labels for previously unlabeled clusters (e.g., Dm8 cluster #52, Dm9 cluster #74, pigment cell cluster #76) and alternative labels for previously labeled clusters (e.g., TmY5a TAPIN matches the TmY14 cluster #11; Lai TAPIN matches Dm8/Tm5c cluster #39). We observed similar results when analyzing the optic lobe map, with few apparent single cell -TAPIN-seq matches (Figure 5-S1). Although this result could arise from major errors in our TAPIN-seq profiles, this possibility is unlikely given our earlier validation results and the concordance between our TAPIN-seq profiles and cell type-enriched genes identified from independent FACS-seq measurements (Figure 2-S2).
As a separate comparison of the bulk and single cell profiles, we examined the bulk expression of genes marking each single cell cluster ( Figure 5-S2). Confirming the regression results, this analysis also found few one-toone matches in which single cell cluster markers were enriched in only a single TAPIN-seq profile. Instead, most cluster markers were either enriched in multiple bulk cell types (over-clustering), or were not enriched in our data (cell types we did not profile by TAPIN-seq). As before, many TAPIN-seq profiles were not enriched for cluster markers, reflecting cell types that were either missing or clustered with other cell types in the single cell map.
We further explored specific examples where the TAPIN-seq data offered new insight into the single cell maps by suggesting alternative labels or labeling unannotated clusters. The single cell clusters labeled as TmY14 matched the TmY5a TAPIN profile ( Figure 5C). The cluster was originally labeled based on the expression of a single transcription factor, knot (kn), as determined using a kn-GAL4 reporter. We also observed kn expression in our TmY5a measurements and further confirmed its expression in both TmY14 and TmY5a cells using the kn-GAL4 reporter line, suggesting that the cluster likely includes not only TmY14 cells, but also TmY5a and other kn-expressing cells ( Figure 5-S3). Similarly, we found that the Dm2 cluster (optic lobe cluster #55), which was labeled based on a Dm2 FACS-seq profile, matched our Mi15 profile ( Figure 5-S2A). This observation is concordant with previous reports that the line used to FACS sort Dm2 also expresses in Mi15 (Supplementary Figure 2 in Takemura et al., 2013, Table S4 in Nern et al., 2015). Finally, we found that the Dm8/Tm5c cluster (brain cluster #39) matches our Lai TAPIN-seq profile, while unlabeled clusters match our Dm8 and Dm9 TAPIN-seq profiles ( Figure 5D). Our measurements also suggest labels for other previously unannotated clusters, such as brain cluster #76 which likely reflects pigment cell, as demonstrated by enrichment of its marker genes in both Dm1 and Tm20 profiles -both measured with lines that also express in pigment cells (Figure 5-S2B). As expected, the genes marking this cluster include known pigment cell markers (e.g., Pdh, rdhB). Altogether, our results demonstrate that cell type-identified data, such as bulk transcriptomes, can help interpret single cell RNAseq measurements.

Profiles reveal neurotransmitter output for most neuron types
The proteins that synthesize and transport neurotransmitters are well known, enabling us to use their expression to predict neurotransmitter phenotype. We used histamine decarboxylase (Hdc), glutamate decarboxylase (Gad1), the vesicular acetylcholine transporter (VAChT), and the vesicular glutamate transporter (VGlut) to identify potential histaminergic, GABAergic, cholinergic, and glutamatergic cell types, respectively (Figure 6A). Our model unambiguously inferred expression states for these genes and indicated a single transmitter (from this group) for nearly all neurons we profiled. A second cholinergic marker, choline acetyltransferase (ChaT), matched VAChT expression almost perfectly (the two genes also share an exon). The sole exception, apparent expression of ChAT but not VAChT in R7 photoreceptors, likely results from a subset of dorsal rim R8 cells labeled by the R7 driver line (further discussed below, also see Table S1).
Besides these four neurotransmitters that we identified by one or two marker genes, we also identified candidate dopaminergic neurons based on the combined expression of tyrosine hydroxylase (ple), dopa decarboxy-  TmY5a  TmY3  Tm29  Tm20  Tm9  Tm4  Tm3  Tm2  Tm1  T5 Figure 5: TAPIN-seq complements single cell RNA-seq profiling. A,B. We evaluated whether single cell RNA-seq of the optic lobe (A, Konstantinides et al., 2018) and brain (B, Davie et al., 2018) proportionally represent cell types found in the optic lobe. By comparing the single cell cluster sizes to the true abundance of each cell type (estimated as described in the methods) we found that the scRNA-seq map can both under-and over-estimate the abundance of each cell type (assuming accurate cell type labels), or that the cell type is incorrectly assigned (i.e. contains different or additional cell types). To estimate the true cell count, we made use of known anatomy (for example, several cell types are known to be present exactly once in each of the 2 x~750 medulla columns per brain) or relied on published counts. In addition, we performed some new counts. (See methods for details.) Observed/expected ratio = ( (size of cluster labeled as cell type X / size of cluster labeled as T1) / (true abundance of cell type X / true abundance of T1)). C. We used non-negative least squares regression to model each TAPIN-seq profile as a linear weighted sum of single cell clusters in the whole brain scRNA-seq map. The heatmap represents the regression coefficients of each single cell cluster (rows) contributing to the TAPIN-seq profile of each cell type, normalized within rows. D. We evaluated expression of genes that mark selected single cell clusters (Davie et al., 2018) in our TAPIN-seq profiles of visual system neurons. (see Figure 5-S2 for the complete heatmap). lase (ddc), vesicular monoamine transporter (Vmat) and dopamine transporter (DAT). While DAT, ple, and ddc were also expressed individually in several cell types that did not express Vmat, only known dopaminergic cell types and one medulla neuron (Mi15) expressed this combination ( Figure 6A).
One neuronal cell type, T1, expressed none of the neurotransmitter markers VGlut, VAChT, Vmat, and Gad1 ( Figure 6A). Although T1 does express most panneuronal genes, it does not express bruchpilot (brp), a key component of presynaptic active zones. Consistent with this result, EM reconstruction has identified very few T1 presynaptic specializations .
Transmitters for nearly half of our cell types have been previously proposed and generally agree with our results. For example, VAChT/ChAT expression in Kenyon cells supports recent reports showing they are cholinergic (Barnstedt et al., 2016;Crocker et al., 2016). Fluorescence in situ hybridization and immunolabeling guided by our measurements confirmed the expression of ChAT, Gad1, and VGlut in Mi1, Mi4, and Mi9, respectively (Long et al., 2017;Takemura et al., 2017). However, we see considerable differences between our assignments and some previous work that used reporter transgenes (Varija Raghu et al., 2011;Raghu and Borst, 2011;Raghu et al., 2013), which we generally attribute to unfaithful transgene expression patterns. We believe our assignments to be more reliable, however they are not without problems. For example, one assignment inferred by our model that seems unlikely and is not supported by other available data is the presence of Gad1 in Mi9, which was not detected in the FISH or antibody experiments mentioned above. Given the presence of some contaminating Mi4 cells in at least one Mi9 driver and the lower Gad1 abundance (mean 276 TPM in Mi9; 2165 TPM in Mi4; 1870 mean TPM in predicted GABAergic cells), we attribute the Mi9 Gad1 signal to contaminating contributions from other GABAergic cells such as Mi4.

Transcriptional regulation of neurotransmitter output
We next tried to identify transcriptional regulators of neurotransmitter output, by searching for TF genes expressed in strong correlation with transmitter phenotype. However, we only found such TFs for histaminergic out-put ( Figure 6-S1A), which in our dataset is only represented by photoreceptor neurons. This observation agrees with work on neuronal identity showing that single TFs rarely encode transmitter identity, but rather different TF and TF combinations are used to specify the same neurotransmitter output (Hobert, 2016). We thus expanded our search to TFs whose expression was informative about transmitter phenotype (i.e., cells expressing TF A are likely to produce neurotransmitter B; even if not all cells producing neurotransmitter B express TF A; Figure 6-S1A). This search identified candidate TFs for nearly all neurotransmitter types. For example, the 19 neuronal types (including the broad chat-GAL4 line) expressing apterous (ap) are cholinergic. Its worm ortholog, ttx-3, regulates the cholinergic phenotype of the AIY neuron (Wenick and Hobert, 2004). Several other TFs we identified also have worm or mouse orthologs implicated in neuronal identity (Figure 6-S1B). Several TFs appeared to identify a transmitter phenotype within a group of cell types but not across the entire dataset. For example, Lim3 distinguishes the GABAergic Dm10 from the other Dm cell types in our dataset and is also expressed in several other GABAergic cells (Mi4, Pm3, Pm4) but was also detected in the cholinergic LC4 and the glutamatergic TmY5a and Tm29. We confirmed the differential Lim3 protein expression in Dm10 and Dm12 cells ( Figure 6-S1C). Several of the transcription factors that we found to be informative of neurotransmitter output were also implicated by single cell RNA-seq data, including ap (cholinergic), tj (glutamatergic), and Lim3 (GABAergic) (Konstantinides et al., 2018). Our data also indicate exceptions to these patterns (i.e., neurons expressing tj and Lim3 but with a different neurotransmitter phenotype; Figure 6-S1A). These observations indicate that neuronal features are likely regulated in a contextdependent and combinatorial manner, and that transcriptomes can identify putative regulators.

Co-expression of canonical small molecule transmitters with non-canonical transmitters is widespread
Co-release of multiple neurotransmitters can enhance the signaling capacity of neurons and neural circuits. For example, the same cell type might release different transmitters under distinct conditions or use them to elicit distinct responses in different target cells. In addition to Mi9

Nos
Nos C3  (discussed above as being likely due to contamination), we observed two cases of potential co-transmission involving the canonical small molecule neurotransmitters. Both Mi15 drivers express dopaminergic and cholinergic markers, and both R8 drivers expressed cholinergic and histaminergic markers. We confirmed expression of Vmat protein in Mi15 ( Figure 6B), the first identified columnar dopaminergic cell type within the optic lobe, and further below we confirm the unexpected VAChT expression in R8 ( Figure 8A). Evidence for co-transmission involving additional molecules, such as neuropeptides or nitric oxide, appears frequently in our data set. Nitric oxide is a widely conserved signaling molecule that can act on many kinds of cells, including neurons (Lowenstein and Snyder, 1992). We observed very specific expression of its synthesizing enzyme, nitric oxide synthase (Nos), in the lamina (C2, C3, and Lawf2) and medulla (Mi4, Pm4, Tm4 and Mi15). To further validate these results, we confirmed Nos expression at the protein level in C3 neurons (Figure 6C). Nitric oxide can be released extra-synaptically, potentially enabling signaling between neurons that are not synaptic partners.
Several neuropeptides and their receptors were also expressed in distinct patterns suggesting widespread yet specific peptidergic signaling in the visual system (Figure 6D). For example, AstA was observed in just one cell type (Pm3; confirmed at the protein level; Figure 6E), while AstC was expressed in several cell types, and pigment-dispersing factor (Pdf) was expressed in none of the optic lobe cells we profiled. The receptors for all three of these neuropeptides were more broadly expressed ( Figure 6D). The broad expression of Pdf receptor (Pdfr) is consistent with the extensive arborization previously observed for Pdf-expressing neurons at the surface of the medulla.
While we focused on genes with well-known functions, our expression patterns also suggest new functions for poorly characterized genes ( Figure 6A). For example, photoreceptors specifically expressed CG8468, an orphan transporter in the solute carrier 16 (SLC16) family of monocarboxylate transporters. This gene might represent a candidate vesicular or plasma membrane transporter of histamine, which remains unidentified in any species. We also observed photoreceptor-specific expression of CG45782 (lovit), a member of the SLC45 sucrose transporter family recently reported as a putative histamine transporter (Xu and Wang, 2019).

Broad and patterned expression of neurotransmitter receptors
Since the functional consequences of the release of a neurotransmitter depend on which receptors for this transmitter are expressed in the receiving cell, measuring the expression of both neurotransmitter input and output genes is necessary to assign potential synaptic signs to connectomes. For example, glutamatergic transmission in Drosophila may be either inhibitory or excitatory, depending on the receptors.
In general, neurotransmitter receptors are broadly expressed, qualifying each cell type to detect multiple neurotransmitters ( Figure 7A). Patterns for individual receptors (or receptor subunits) varied widely. Some receptors, such as the GluClalpha glutamate-gated chloride channel, thought to be the main mediator of inhibitory glutamatergic transmission in flies, were expressed in most but not all cell types ( Figure 7A,B). Expression of others was much more restricted, such as the EKAR glutamate receptor subunit detected only in photoreceptor neurons, consistent with previous work (Hu et al., 2015). Nearly all cells expressed receptors for acetylcholine, GABA, and glutamate, as expected from the combination of predicted transmitter phenotypes and connectomics data. Receptors for neuromodulators such as serotonin, dopamine, octopamine, and neuropeptides in general were also widespread ( Figure 6D). For example, octopamine receptors were expressed in broad, yet gene-and cell-type specific patterns, consistent with widespread octopaminergic modulation of visual processing (for example, Arenz et al., 2017;Strother et al., 2018;Tuthill et al., 2014). We confirmed Oamb expression at the protein level in specific lamina neurons and glia, including Lawf2 cells previously shown to be octopamine sensitive (Tuthill et al., 2014) (Figure 7C).

Combining transcriptomes and connectomes
A principal goal of our work is to provide a foundation for combining molecular data such as neurotransmitter and receptor expression patterns with anatomical or functional connectivity data. One application of expression information is to constrain mechanistic models of neural circuits such as the extensively studied motion detection circuit in the fly eye (reviewed in Mauss et al., 2017;Figure 7-S1A). The combined availability of expression and connectomics data for many cell types in a brain region also makes it possible to systematically identify and further explore unusual patterns of receptor or transmitter expression; for example, cell types in which an otherwise widely expressed receptor is absent or cells with unusual combinations of receptor subunits. Below we discuss three examples, focused on potential signs of synaptic transmission, of how such patterns can lead to specific and unexpected hypotheses about circuit function. As we illustrate, combining expression data with synapselevel anatomy permits analyses which are inaccessible to either approach alone. GluCl-GFP anti-repo (glia) Pm4  T4  T5  Tm1  Tm2  Tm3  Tm4  Tm9  Tm20  Tm29  TmY3  TmY5a   LC4

Presynaptic cholinergic markers and absence of histamine receptors in some postsynaptic targets: R8 photoreceptors may signal via two fast transmitters
Fly photoreceptors have long been known to release histamine (Hardie, 1987;Sarthy, 1991). Our data indicate that inner (color vision) R8 photoreceptors also express the cholinergic markers ChAT and VAChT, suggesting an unexpected additional cholinergic phenotype ( Figure 6A). To confirm these results, we visualized a tagged VAChT protein (VAChT-HA; Pankova and Borst, 2017), expressed from the endogenous locus, selectively in photoreceptor cells. These experiments showed VAChT-HA labeling in medulla terminals of R8 but not R7 cells ( Figure 8A,A',A",B), including the specialized polarized light-responsive R8-cells in the dorsal rim of the medulla. The latter express the rhodopsin Rh3 (which is otherwise expressed in R7s; Fortini and Rubin, 1990), consistent with the presence of ChAT and VAChT transcripts in the R7 driver line (for which the model inferred expression for VAChT but not ChAT). We asked whether the apparent co-transmitter phenotype of R8 neurons was reflected in the expression of neurotransmitter receptors in their different postsynaptic partners. Postsynaptic partners of R8 cells identified by electron microscopy reconstructions include seven cell types in our dataset: Dm9, Mi1, Mi4, Mi15, R7, L1 and Tm20 ( Figure 8C) (Takemura et al., 2013;Takemura et al., 2015). All of these express one or more nAChR subunits ( Figure 7A). By contrast, expression of the histamine-gated chloride channels HisCl1 and ort, which mediate histaminergic transmission by photoreceptors (Pantazis et al., 2008), was more selective (Figure 8C,D): L1, Tm20 and Dm9 express ort, consistent with previous reports (Gao et al., 2008), while HisCl1 transcripts were detected in the R7 as well as R8 driver lines, in agreement with another recent report (Schnaitmann et al., 2018;Tan et al., 2015). However, we did not find evidence of expression of ort or HisCl1 in Mi4, Mi1 and Mi15, further supporting R8 signaling via a transmitter other than histamine.
We were interested in whether release of ACh and histamine might occur at spatially distinct locations. Insect synapses often consist of multiple postsynaptic sites apposed to the same presynapse. For cells that release more than one transmitter, two general distributions of postsynaptic processes at such multicomponent synapses are possible ( Figure 8E). Postsynaptic cells with different receptors could be grouped at different sites based on receptor expression ( Figure 8E-left) or occur together at the same locations ( Figure 8E-right). To distinguish these possibilities for R8 cells, we used EM reconstruction data (Takemura et al., 2013) to map the predicted expression of histamine receptors in postsynaptic cells at the single synapse level for all presynaptic sites of one reconstructed R8 cell ( Figure 8F). The resulting pattern indicates that processes of cell types with and without histamine receptor expression are often located near the same R8 presynapse ( Figure 8F), whereas this is not the case for a reconstructed R7 cell ( Figure 8G). This is consistent with the VAChT-HA labeling observed throughout the medulla terminals of R8s ( Figure 8A). This spatial pattern is compatible with either co-release of histamine and ACh or independently regulated release from different vesicles at the same sites.
A combined cholinergic and histaminergic phenotype has been reported for a small group of extraretinal photoreceptors (the Hofbauer-Buchner eyelet) located near the lamina (Yasuyama and Meinertzhagen, 1999) but was unexpected for R-cells of the compound eye. Establishing the functional significance of potential acetylcholine release by R8 cells will require further experiments. However, we note that double mutants lacking both histamine receptors are not completely blind (Gao et al., 2008), consistent with histamine-independent transmission by photoreceptor neurons. In addition, a very recent study suggests a role of cholinergic R8 signaling in the entrainment of the fly's circadian rhythm to lightdark cycles (Alejevski et al., 2019), perhaps similar to that of ACh-release from the Hofbauer-Buchner eyelet (Schlichting et al., 2016).

Potentially excitatory GABA-A receptors in lamina monopolar cells
Fast GABAergic transmission via GABA-A receptors is a major source of inhibition in the nervous system. However, some GABA-A subunit combinations could mediate depolarizing GABA-signaling: in vitro assays indicate that homomeric Rdl or heteromeric Rdl/Lcch3 receptors are typical GABA-gated chloride channels (Zhang et al., 1995), while Lcch3/Grd form GABA-gated cation channels (Gisselmann et al., 2004). However, the in vivo significance of this difference is unknown. Rdl and Lcch3 were expressed in nearly all neurons in our dataset (Figures 7A,9A,B), consistent with the general inhibitory nature of GABA signaling. By contrast, Grd and another predicted GABA-A receptor subunit, CG8916, were expressed in a minority of cell types ( Figures 7A, 9A,B). Photoreceptor neurons, for which no major GABAergic inputs have been identified by connectomics, expressed none of the four transcripts ( Figure 9B). Lamina monopolar L1 and L2 were the only neurons other than photoreceptors that did not express significant levels of Rdl. However, both express Grd, Lcch3 and also CG8916. Together with the in vitro findings mentioned above, this result suggests that some or all GABA-A receptors in L1 and L2 may be cation rather than chloride channels. Remarkably, lamina monopolar cells in the housefly Musca, which are thought to have very similar functional properties to those in Drosophila, depolarize in response to GABA (Hardie, 1987) but hyperpolarize in response to histamine (via ort-containing chloride channels). Thus our data identify a potential link between in vivo elec-    Grd Rdl Lcch3 trophysiology, in vitro receptor properties and cell type differences in GABA-A subunit (Rdl or Grd) expression. We next asked whether depolarizing GABA-signaling to L1 and L2 was plausible given their synaptic connectivity. Based on synapse counts and our transmitter data, the main GABAergic inputs to L1 and L2 are C2 and C3 neurons (Meinertzhagen and O'Neil, 1991;Rivera-Alba et al., 2011;Takemura et al., 2013;Takemura et al., 2015). Conversely, L1 is the main input to both C2 and C3 cells, followed by the cholinergic L1 targets L5 and Mi1. These strong connections (illustrated for C2 in Figure 9C) indicate that the effective sign of GABA input to L1 and L2 is almost certainly of functional significance. In the illustrated circuit ( Figure 9C), L1 cells hyperpolarize in response to luminance increases (as histamine from photoreceptors opens ort chloride channels). The resulting reduced secretion of glutamate is thought to depolarize L1 targets such as Mi1 (via closing of GluClalpha channels). One plausible, though speculative, scenario, is that, similar to Mi1, C2 cells also depolarize in response to light. In this case, GABA-gated cation channels in L1 (formed by Grd and Lcch3) would enable negative feedback (counter-acting) from C2 to L1, which for example could return the membrane potential closer to resting levels -speeding up the response to subsequent luminance changes. By contrast, opening of conventional GABA-A receptors (GABA-gated chloride channels) in L1 would resemble a light response (opening of histamine-gated chloride channels), and thus provide positive (reinforcing) feedback in this case. The latter possibility appears less consistent with the transient nature of the L1 (and L2) response to light ( (Järvilehto and Zettler, 1971;Laughlin and Hardie, 1978)). Distinguishing these and other possibilities will of course require future experimental work.
Similar to the findings for histamine receptors described above ( Figure 8F), again using connectivity data for the medulla, we observed that cells with different GABA-A profiles can be postsynaptic at the same synapse ( Figure 9D). In addition to L1 and L2, Grd expression indicated several other candidates for cells with unusual GABA responses ( Figures 7A, 9B). In these neurons (e.g., Dm8 or Mi4), Rdl and Grd were detected together, raising questions such as whether their subcellular distribution is synapse-specific or whether these subunits might co-assemble into channels with yet unexplored properties.

Diverse patterns of glutamate receptor expression in the targets of a single local interneuron type in the lamina
The diverse expression of glutamate receptor subunits, which can mediate both inhibitory and excitatory signaling, was particularly striking in the lamina ( Figures  7A, 9E,F). Notable patterns include the photoreceptorspecific expression of EKAR, the predicted absence of GluClalpha, otherwise broadly expressed in neurons, from some cell types, including photoreceptors (Figures 7A, 9F) and its strong expression in epithelial glia. CG3822, a Kainate-type receptor subunit recently reported to function in presynaptic homeostatic control at the neuromuscular junction (Kiragasi et al., 2017), was strongly enriched in the lamina intrinsic Lai cells. Since Lai neurons are the only known source of vesicular glutamate release in the lamina, CG3822 function in Lai must also be pre-or perhaps extrasynaptic. T1 and L3, while not expressing VGlut, might also influence glutamate levels in the lamina via the Eaat1 plasma membrane glutamate transporter. The strong expression of this transporter in T1 rather than glia is another unusual feature of glutamatergic signaling in the lamina and may be a clue to the enigmatic function of T1 cells (Tuthill et al., 2013). These examples further highlight how a transmitter released by one neuron type, here Lai, is predicted to have very different effects on target cells due to the receptors they express.

Comparisons of cell shape, synaptic connectivity and receptor expression reveal multiple similarities between local interneurons Lai in the lamina and Dm9 in the medulla
The combination of highly specific EKAR expression and the unusual absence of GluClalpha in photoreceptors prompted us to further explore cellular sources and potential functions for glutamatergic signaling to photoreceptor neurons.
Photoreceptor neurons function over an extremely wide range of light levels, from moonlight to bright sunlight. One mechanism proposed to enable this behavior is a depolarizing feedback signal from photoreceptor targets that increases photoreceptor output under low light conditions, but reduces output at higher light intensities (Zheng et al., 2009). As Lai cells express ort, and thus, like other ort-expressing photoreceptor targets, are thought to hyperpolarize in response to light, increased glutamate release from Lai could provide such light-dependent feedback via EKAR in R-cells. This scenario is consistent with reduced photoreceptor responses at low light intensities after reduction of Lai output or EKAR function (Hu et al., 2015). EKAR is also expressed in R7 and R8 cells, which project to the medulla and are not postsynaptic to Lai. We therefore asked whether there might be a medulla counterpart of Lai neurons.
Synaptic connectivity data identify Dm9 as a strong candidate for such a role: Dm9 is both a major pre-and postsynaptic partner of R7 and R8; it is the only identified R7/R8 target with these properties (other known R7 or R8 targets appear to form few if any feedback synapses on these cells). Remarkably, the overall anatomy of Dm9 cells is also very similar to Lai ( Figure 9G,H): Both Lai and Dm9 cells span multiple visual columns but the precise number and distribution of columns innervated by each individual cell is variable. Finally, Lai and Dm9 share key molecular properties: for example, both are glutamatergic and express ort histamine receptors. Based on connectivity and gene expression ( Figure 9F,J), Dm9 cells are predicted to receive hyperpolarizing R7 and R8 input via ort and excitatory input from the photoreceptor targets L3 and Dm8. Thus, similar to Lai ( Figure 9G,I), Dm9 appears qualified to increase photoreceptor output in the medulla under low light conditions, similar to a proposed function of Lai in the lamina.
One notable difference between Lai and Dm9 is that in contrast to Lai, Dm9 cells receive input from photoreceptor neurons with different spectral tuning. This input involves direct (R7, R8) and indirect pathways (R7 via Dm8, R1-6 via L3) ( Figure 9J). This integration of multiple spectral inputs could support a role of Dm9 in color processing. Indeed, the anatomical and predicted functional properties of Dm9 match those of an as yet unidentified ort expressing cell type proposed to contribute to color opponent signaling between R7 and R8 cells (Schnaitmann et al., 2018).

Discussion
We present an approach to characterize the function of neural circuits by combining genetic tools to access their component cells, TAPIN-seq to measure their transcriptomes, and a probabilistic model to interpret these measurements (Figures 1 to 3). We used this approach to establish an extensive resource of the genes expressed in 67 Drosophila cell types, including 53 in the visual system, covering photoreceptors, lamina, and components of the motion detection circuit (Figure 4) and systematically compare our results to single cell RNA-seq (Figure 5). Our approach enables an extensive analysis of neurotransmission in the Drosophila visual system, including the neurotransmitters sent and received across the network as well as transcription factors that potentially regulate neurotransmitter identity (Figures 6 and 7). We also provide specific examples of integrating transcriptomes and connectomes to illuminate circuit function (Figures 8 and 9).
Many recent studies have explored gene expression in neurons. However, only a few of these were aimed at neurons in genetically tractable organisms and brain regions for which detailed anatomical data, especially at the level of synaptic connections, are available. Previous work in the mouse retina has used both genetic (Siegert et al., 2012) and single cell approaches (Macosko et al., 2015) to characterize transcriptional regulators as well as classify cell types. More recent work in Drosophila used single cell RNA-sequencing to characterize heterogeneity in olfactory projection neurons , the midbrain (Croset et al., 2018), the optic lobe (Konstantinides et al., 2018), and the whole brain (Davie et al., 2018). The expression patterns of many genes have also been mapped in C. elegans neurons, whose con-nectivity has long been known, although these studies typically focus on individual genes rather than genomewide catalogs (Hobert, 2016). The unique combination of an extensive genetic toolbox to access individual cell types in the Drosophila visual system and systematic efforts to map its connectivity, make it well suited for exploring whether a comprehensive catalog of gene expression is useful for understanding circuit function. Towards this end, we profiled a diverse array of cell types including all of the neuronal cell types that populate the lamina and a subset of cell types in the medulla and lobula complex including those known to play a central role in the detection of motion. We also analyzed a number of cell types residing in deeper brain structures such as the mushroom body and central complex.
Our approach requires genetic driver lines to obtain transcriptomes of specific cell populations. The recent availability of large collections of reagents for split-GAL4 intersections (Dionne et al., 2018;Tirian and Dickson, 2017) make it possible to obtain such lines for virtually any cell type of interest. This expanding genetic toolbox works well with our TAPIN-seq method to profile transcriptomes.
In some cases, available driver lines, including some used in this study, may label some additional cell types. While drivers with even higher specificity could be obtained through testing of additional split-GAL4 intersections or perhaps triple intersections (Dolan et al., 2017), we did not find the contributions of small numbers of 'offtarget' cells to be a major limitation for many applications of expression data. In general, the transcriptomes support the high specificity of the intersectional lines we used to access visual system cells (Figure 1). For example, we found specific expression of known marker genes (Figures 2H, 4B) and also that most neurons only express genes for a single neurotransmitter type ( Figure 6A). The availability of these genetic tools also makes it possible to validate our transcriptome measurements in a way that is otherwise difficult for single cell RNA-seq studies. Driver lines also permit repeated access to the same cell type in multiple animals at defined time points, enabling the study of behavioral or circadian conditions in individual cell types without having to sequence the whole brain or dissected brain regions.
Modifying the one-step affinity capture in the original INTACT method to a two-step capture in TAPIN-seq increased its specificity, sensitivity, and throughput without the need for time-consuming and labor-intensive centrifugation steps (Figure 1). We initially tried improving the original INTACT method by using density gradient centrifugation to purify nuclei prior to the bead capture step, but this was cumbersome, low throughput, and ineffective for cell types with few nuclei per brain. In addition, for reasons that remain unclear, both photoreceptors and T4 cells consistently yielded few nuclei with this approach. Even with TAPIN, the libraries obtained with some sparser driver lines did not meet the quality con-20 of 45 trol standards we applied. We suspect that the quality of these sub-optimal libraries can be improved by starting with more flies, which is simplified by TAPIN-seq's ability to use frozen material, enabling the collection of many flies on multiple days at defined time points. In contrast, manual or FACS sorting of dissociated cells is more challenging to scale up, because these more labor-intensive tissue procurement schemes cannot be simplified in the same way. It is also worth noting that our tandem affinity purification approach can improve the specificity of any immunopurification method that uses a capture antibody that is cleavable by IdeZ (all IgG subclasses), without requiring expression of a traditional TAP tag (Rigaut et al., 1999).
TAPIN-seq complements single-cell RNA-seq studies of neurons in several ways (Ecker et al., 2017;Konstantinides et al., 2018). First, our high-resolution transcriptomes will serve as a reference for interpreting singlecell measurements. In particular, comparing our expression catalog to recent single cell maps of the optic lobe and whole brain highlights the challenges in interpreting single cell measurements. Several cell types that we profiled don't appear as clusters in the single cell map, while others are grouped into the same cluster. The wellestablished neuroanatomy of the optic lobe makes it an ideal setting to evaluate the accuracy of single cell RNAseq measurements and raises a broader caution when interpreting scRNA-seq surveys of less well-characterized tissues (e.g., the Human Cell Atlas effort; Regev et al., 2017): the composition of cell types (or states or clusters) observed by scRNA-seq can deviate significantly from their true abundance and requires validation with independent methods. Having both deep bulk transcriptomes and single cell maps of the same tissue also provides an opportunity for developing new analytical tools that can harness available cell type-identified information while clustering single cell data. Second, combining our approach with single-cell profiling could more efficiently profile heterogeneity within a brain region or genetically defined cell population. Finally, the complementarity between bulk and single-cell measurements extends to other genomic features that can be measured in TAPIN-seq purified nuclei, including accessible chromatin and modified histones. We expect this combination of genomic tools to help decipher the transcriptional and epigenetic regulation of neuronal expression programs.
Transcriptome measurements can be of limited utility because it is challenging to interpret relative transcript abundance. In this study we developed a probabilistic mixture modeling approach to classify relative abundances into binary on and off states. This model was a useful guide for interpreting our measurements; most genes are readily described with the two-state model, although the expression of some genes is not (e.g., Rab11; Figure 3-S1L). Even for specific genes whose expression is more continuous than bimodal, the results still offer a useful family-wide summary of expression patterns.
For example, DPR family members are more broadly expressed than DIP genes ( Figure 4C; Figure 4-S1D), an observation supported by two recent studies using different methods (Cosmanescu et al., 2018;Venkatasubramanian et al., 2019). Despite our model's utility, it is important to remember the many potential sources of error (minor cell types in driver line patterns, transcript carry over during TAPIN, biases in RNA-seq library construction and sequencing, etc.) that can affect measurements of relative transcript abundance and the resulting model inferences. Having observed most discrepancies between our modeling results and protein-level expression near the boundary between on and off states, it is prudent to treat these cases more carefully.
Our resource provides additional foundation for systematic functional and molecular studies of the Drosophila visual system. We illustrated how the resource can characterize neurotransmission in the network, particularly when combined with connectome information detailing connectivity between cell types as well as the grouping of post-synaptic partner cell types. We determined neurotransmitters used by every cell we profiled and found two likely cases of co-transmission ( Figure 6A). The expression patterns of the major fastacting transmitters histamine, acetylcholine, glutamate and GABA were comparatively simple: Nearly all neuronal cell types in our catalogue appear to express exactly one of these four transmitters. However, the transcriptomes suggest that many cells also have the potential to release specific neuropeptides, other chemical messengers such as nitric oxide, or form gap junctions with other cells.
While selected transmitter markers (e.g., Gad1 or VGlut) could also be assigned to cell types using methods such as immunolabeling or FISH, these approaches are not practical for comprehensive sampling of markers across these different modes of cell-cell communication. This is particularly clear when the expression patterns of neurotransmitter receptors are also considered ( Figure 7A). Our results suggest that, for canonical small molecule transmitters, neurotransmitter output space is tightly tuned while input space is not: neurons typically send just one type of signal but can receive many (Figures 6 and 7). The expression patterns of neurotransmitter receptors provide further context for determining circuit mechanisms (Figures 7-S1, 8 and 9). Our results also implicate transcription factors involved in regulating neurotransmitter phenotype, including several that appear to have conserved roles in specifying neuronal identity in other species (Figure 6-S1).
The availability of connectivity data for many neurons in the visual system allowed us to interpret neurotransmitter use and receptor distribution in the context of circuit architecture (Takemura et al., 2013;Takemura et al., 2015;Rivera-Alba et al., 2011). For example, our data show expression of both histaminergic and cholinergic markers in R8 photoreceptors. We further find that some major synaptic targets of R8 cells, as identified by electron microscopy, do not express known receptors for histamine ( Figure 8F). Given that R7 and R8 cells were previously thought to be exclusively histaminergic both results are unexpected and individually might appear difficult to explain (Gao et al., 2008). However, in combination, these findings make a strong case for a dual histaminergic and cholinergic transmitter phenotype of R8 cells. In contrast, R7 only expresses the histaminergic marker Hdc, and all of its targets express a histamine receptor ( Figure 8G).
Finally, our approach especially complements ongoing efforts to map circuit connectivity, which is complete for C. elegans, and is becoming accessible on a whole brain level for Drosophila (Zheng et al., 2018), and for portions of the mouse brain such as the retina. Methods to obtain and interpret serial electron micrographs, array tomography and other methods for mapping connectivity are rapidly progressing (Swanson and Lichtman, 2016;Micheva and Smith, 2007;Kebschull et al., 2016). All told, we are entering a period in neuroscience where connectomics will become pivotal. We expect that genomic approaches, such as the methods for data collection and analysis that we describe here, will enhance these efforts by using transcriptomes to provide, at high-throughput, a molecular proxy for physiological features that are otherwise inaccessible to connectomic methods.

Contact for reagent and resource sharing
Further information and requests for resources and reagents should be directed to the Lead Contact, Gilbert L. Henry (henry@cshl.edu). A detailed description of split-GAL4 hemidrivers (https://bdsc.indiana.edu/ stocks/gal4/split_intro.html) and cell-type specific split-GAl4 lines is also available (https://www.janelia. org/split-GAL4).

Experimental models and subject details
Flies were reared on standard cornmeal/molasses food at 25 • C. For profiling experiments adults, 4-7 days of age, were entrained to a 12:12 light:dark cycle and anesthetized by CO 2 at ZT8 -ZT12. Samples can be stored indefinitely at -80 • C after flash freezing in liquid N 2 . We used female flies for all anatomical characterizations.

Anatomical analyses
Details of individual genotypes and labeling methods used in the characterization of the driver lines and other anatomical experiments are summarized in Table S5.
Details of the driver lines are provided in Table S1. For the naming of RNA-seq samples, we identified all drivers with a main cell type or cell types (e.g. Mi9_d1). Most of these cell types have been described in detail and were identified based on prior descriptions (see references in Table S1; Takemura et al., 2013;Gao et al., 2008;Nern et al., 2015;Fischbach and Dittrich, 1989;Tuthill et al., 2013;Aso et al., 2014;Wu et al., 2016;Wolff et al., 2015;Wolff and Rubin, 2018;Edwards et al., 2012;Helfrich-Förster et al., 2007;Panser et al., 2016;Mauss et al., 2015). The driver names do not attempt to include additional cells present in some drivers. A few of our cell types are strictly groups of related cell types (for example, the muscle cells or, at a different level of a cell type hierarchy, the T4 and T5 cells, with four subtypes each, or R7 photoreceptor neurons, which include R7s of pale and yellow ommatidia).

Generation and characterization of new driver lines
Split-GAL4 and GAL4 driver lines (Table S1) were used to express UNC84-2XGFP in defined cell populations. Previously published driver lines were from the following studies (see Table S1 for details; Tuthill et al., 2013;Diao et al., 2015;Aso et al., 2014;Wu et al., 2016;Strother et al., 2017;Park et al., 2003;Rulifson et al., 2002;Tayler et al., 2012;Taghert et al., 2001;Brand and Perrimon, 1993;Wu et al., 2003;Park et al., 2000;Sweeney et al., 1995;Wolff and Rubin, 2018;von Reyn et al., 2017). New split-GAL4 lines were generated as in previous work (Tuthill et al., 2013;Wu et al., 2016). Briefly, we first identified GAL4 lines with expression in the cell type of interest by screening images of the expression patterns of large collections of such lines (Jenett et al., 2012;Tirian and Dickson, 2017). Typically, several candidate combinations of AD-and DBD-hemidrivers were tested to identify lines with sufficient specificity.
To characterize new driver lines, we examined both overall expression pattern in the brain and optic lobe and, for most lines, confirmed the identity of the main cell type or types using MultiColor FlpOut (MCFO)-labeled single cells . Since details of the expression patterns of GAL4 or split-GAL4 driver lines can depend on the particular UAS reporter used, we re-imaged 20 drivers with the TAPIN nuclear marker used for the profiling experiments (Figure 2A). In general, the distribution of labeled nuclei in these images appeared to match the expression patterns and specificity expected from the driver line's original characterization using a membrane marker. As expected, a small number of off-target cells were detectable (often more weakly labeled) in many driver lines.

Validation experiments
For validation experiments, we examined expression patterns of tagged proteins expressed in a near native genomic context using either large BAC-transgenes or modifications of the endogenous loci Diao et al., 2015;Kudron et al., 2018;Lee et al., 2018).
We classified fkh-GFP and Ets65A-GFP as expressed or not expressed by comparing nuclear GFP signal in cells of interest (identified using a split-GAL4 driver) to background labeling in surrounding cells. Because of considerable differences in the GFP signal for different cell types, confocal settings and post-imaging adjustments were done individually for different cell types for these experiments.
For other experiments, brains of female flies were dissected in insect cell culture medium (Schneider's Insect Medium, Sigma Aldrich, #S0146) and fixed with 2% PFA (w/v) (prepared from a 20% stock solution, Electron Microscopy Sciences: 15713) also in cell culture medium for 1 h at room temperature. Brains were washed with 0.5% (v/v) TX-100 (Sigma Aldrich: X100) in PBS and incubated in PBT-NGS (5% Goat Serum [ThermoFisher: 16210-064] in PBT) for at least 30 min. Incubations with primary antibodies and subsequently, after additional PBT washes, secondary antibodies, were in PBT-NGS at 4 • C overnight. After additional washes with PBT and then PBS, brains were mounted in SlowFadeGold (ThermoFisher: S36937) and imaged on a Zeiss LSM 710 confocal microscope using 20x 0.8 NA, 40x NA 1.3 or 63x 1.4 NA objectives. A few specimens were mounted in DPX following the protocol described in Nern et al., 2015. For experiments using only native fluorescence, brains were fixed as above and mounted and imaged after the initial post-fixation washes.
Primary antibodies used in each experiment are indicated in Table S5.

Image processing
Image analyses and processing were mainly done using Fiji (http://fiji.sc) and Vaa3D (Peng et al., 2010). Brightness and contrast were adjusted separately for individual images and channels. Figure panels were assembled using Adobe Indesign. This included selection of fields of view and adjustments of image size. Some images were rotated or mirrored. In some panels with rotated images, empty space outside the original image was filled in with zero pixels. Most of the images in Fig BDSC_67516 pJFRC12-10XUAS-IVS-myr::GFP in attP2 BDSC_32197 pJFRC19-13XLexAop2-IVS-myr::GFP in su(Hw)attP8 BDSC_32211 pJFRC21-10XUAS-IVS-mCD8::RFP in attP18 ure 1-S1C,C' and Figure 1-S2 show resampled views that were generated from three dimensional image stacks using the Neuronannotator mode of Vaa3D and exported as TIFF format screenshots.
In total we built 266 RNA-seq libraries, including 46 INTACT-seq, 196 TAPIN-seq, 8 total RNA libraries from dissected tissues, and 16 control libraries that we used to characterize each INTACT/TAPIN-seq step ( Figure 2C, Table S1).

RNA-seq data processing
We trimmed five nucleotides from the 5' end of reads using seqtk (https://github.com/lh3/seqtk) to remove potential contaminating adapter sequence from the Nu-Gen Ovation kit. We estimated the abundance of annotated genes using kallisto (v0.43.1; Bray et al., 2016) to pseudo-align trimmed reads to the fly transcriptome (cDNA and ncRNA transcript sequences from ENSEMBL release 91, based on FlyBase release 2017_04), ERCC spike-ins, and the INTACT construct sequences GAL4-DBD, p65-AD, and UNC84_2XGFP. ERCC, INTACT tag constructs, and rRNA genes were removed from the abundance tables and the estimated abundances of the remaining genes were renormalized to one million total transcripts. The ERCC spike-ins and nuclear yield values allowed us to convert relative transcript abundance (in Transcripts Per Million, TPM) to absolute abundance (Figure 3-S1G). However, we only used relative abundance for our analyses. We also aligned the trimmed reads to the genome using STAR (v2.5.3c; Dobin et al., 2013) and evaluated gene body coverage bias using Picard (v 1.9.1; http://broadinstitute.github. io/picard).
We used three criteria to quantify the quality of each library: the number of genes detected, the pearson correlation between transcript abundances measured in replicates, and the cDNA yield. We used only high-quality libraries (at least 8,500 genes detected, 3µg cDNA yield, and 0.85 Pearson's correlation of transcript abundances in two biological replicates) as input to the model described below.

Comparison to published single cell and FACS-seq datasets
We obtained genes reported to mark the single cell clusters in a recent scRNA-seq study of the optic lobe (Konstantinides et al., 2018). We also obtained the cluster assignments for each single cell in this dataset from the SCope database (Davie et al., 2018), using Seurat clustering resolution 4.0, as reported by the authors. We analyzed FACS-sorted RNA-seq samples reported by Konstantinides et al. by downloading the raw sequencing reads from the NCBI Sequence Read Archive (https:// www.ncbi.nlm.nih.gov/sra) and estimating transcript abundance using kallisto and the same transcriptome index as above.
To compare our bulk TAPIN-seq profiles to published single cell datasets, we used non-negative least squares regression to model each bulk TAPIN-seq profile as a linear weighted sum of single cell clusters and a profile-specific residual (TAPIN~cluster). We began with single-cell expression matrices extracted from the loom files deposited in the SCope database, using the SCopeLoomR package (Davie et al., 2018). After normalizing the transcriptome of each cell from transcript counts to counts per million, we calculated the mean transcriptome for each single cell cluster. We then selected genes that were enriched in either single cell clusters or TAPIN-seq cell types, using the following criteria (adapted from Cao et al., 2019): union of the top-50 genes that were most enriched relative to the average of all other clusters (or TAPIN-seq cell types) and the top-50 genes that were most enriched relative to the maximum level in all other clusters (or TAPIN-seq cell types). We then performed NNLS regression using the Lawson-Hanson implementation (Lawson and Hanson, 1974) available through the nnls R package (Mullen and van Stokkum, 2012). To visualize the results, we created heatmaps of the regression coefficients, normalizing the values for each single cell cluster across TAPIN-seq cell types.
We also performed the NNLS regression in the opposite direction (cluster~TAPIN, explaining single cell clusters as combination of TAPIN-seq profiles), as we thought this direction would more naturally describe mixed clusters composed of multiple cell types (e.g., all photoreceptors, or all monopolar cells). In practice however, this regression assigned coefficients of exactly zero to several TAPIN-seq profiles -as expected given the power of NNLS to recover sparse solutions (Slawski and Hein, 2013) and the collinearity amongst TAPIN-seq profiles (e.g., highly correlated expression amongst photoreceptor subtypes). In contrast, the TAPIN~cluster regression correctly matched mixed clusters with the corresponding TAPIN-seq profiles.

Inferring expression state from transcript abundance
We begin with a catalog of S RNA-seq samples generated from nuclei isolated from cell type cell(s) and the estimated abundance (in TPM), E gs of transcripts from gene g in each sample s. We consider only proteincoding genes with at least 10 TPM abundance in at least one sample (n=12,377 of 13,931 total coding genes).
To interpret E gs , we assume that all genes express in either an 'on' or an 'off' state. Our goal is to infer from these abundances the probability that each gene is ex-pressed in each cell type, P (z gc = on). Depending on the cell types in our catalog, we will observe some genes in both on and off states (bimodal), while others are exclusively off (unimodal-off) or on (unimodal-on). We deal with these scenarios in turn below.
Assuming that a gene is bimodal, we model its expression as arising from a mixture of two gene-specific lognormal distributions describing expression in cells where the gene is off, P (E g |z = off), and those where the gene is on, P (E g |z = on), combined with a mixing weight, π g . We use the same standard deviation for both on and off distributions to ensure a monotonic relationship between transcript abundance and the posterior probability of the on state. If we use different standard deviations for each component distribution, the wider one would become more probable than the narrower one at both low and high expression levels.
We estimate the posterior probability of the on state (assuming bimodal expression): We treated each replicate sample of the same driver as an independent probe of the same underlying driverline expression state. To combine replicates of the same driver we sum over their likelihoods: Similarly, to combine samples from the same cell type we sum over their likelihoods: We estimated parameters for each gene-specific mixture model by maximizing the likelihood for the observed sample-level data: (1 − π g )P (E gs |z = off, µ gz , σ g )) Because we assume independence of genes, we separately optimized the model parameters for each gene. To model the possibility that a gene is unimodally expressed across the cell types we analyzed, we also model the data using a single log-normal distribution, estimating the distribution parameters µ and σ and estimating the data likelihood as: Deciding whether a gene is bimodally or unimodally expressed is an example of the model selection problem in statistics. To compare the quality of the unimodal and bimodal models for each gene, we used a recently developed approach to leave-one-out cross-validation that uses Pareto-smoothed importance sampling (PSIS-LOO; Vehtari et al., 2016). Specifically, we performed 10-fold cross validation, by randomly holding out 1/10 of the samples as a "test" set (requiring that at least one replicate of each driver exist in the remaining "training" set), fitting the models using only the training data, and then evaluating the likelihood of the test data using the fitted parameters. Each of the ten cross-validation fits, i, returns an ensemble of S=500 draws from the posterior distribution of the model parameters. We estimated the expected log pointwise predictive density (elpd) of each cross-validation fit by evaluating the likelihood of each held-out dataset i using each parameter draw s : We then combined the pointwise log-likelihoods for each cross-validation fit to calculate a single estimate for each model: To compare the unimodal (u) and bimodal (b) models, we calculated the difference in elpd as well as its standard error: We then picked the model with the higher elpd, unless the difference in elpd was within two multiples of its standard error (abs(∆ elpd) ≤ 2 · se(∆ elpd)), corresponding approximately to the half-width of a 95% confidence interval in a normal sampling distribution) in which case we considered the two models' performance to be indistinguishable and chose the simpler unimodal model.
If we decide a gene is unimodal, we must still decide if it is expressed or not. To model the expression state of unimodal genes, we created two separate log-normal distributions of abundances of confidently bimodal genes (∆ elpd > 10) using samples where they were either estimated to be on according to the bimodal model (p(z gs = on|bimodal) > 0.9) and where they were estimated to be off (p(z gs = on|bimodal) < 0.1), combined with a mixing weight, π, set to the fraction of datapoints that were estimated to be 'on' according to the bimodal model.
We estimate the posterior probability of the on state assuming unimodal expression as: To build the final matrix of P (z gs = on) calls, we used bimodal estimates for genes where the bimodal model was a better fit than the unimodal model, and the unimodal estimates for the remaining genes.
We did not include the transcriptomes of the dissected samples in the mixture models because we were concerned that their cellular heterogeneity would violate our assumption of binary gene expression in each sample. That is, genes expressed in a subset of the cells of a dissected sample would give rise to transcript abundance intermediate between the off and on states, and thus make it more difficult to accurately infer the component distributions. However, in some cases the dissected samples could be useful for interpreting transcript levels in the cells that we profiled, by providing examples that extend the observed dynamic range. For example, in the case of a gene expressed in a dissected tissue, but not in the cells that we specifically profiled, the dissected levels would add "on" examples that would make it easier to interpret the levels in the cell types as "off". To use the dissected samples to better model dynamic range, we added two "dummy" samples to each model: the minimum and maximum observed level across both the cell catalog and the dissected samples. This choice allowed us to use the dissected levels if they in fact outflanked the cell type-specific levels, while not confusing the model with intermediate abundance levels. Once the models were fit, we could use the inferred parameters to estimate expression probabilities for samples that were not used in the model fit. For example, we estimated the probabilities of expression in the dissected samples to search for genes expressed exclusively in the dissected samples and not in the anatomically defined cell type libraries, indicating potential markers for cells that we did not specifically profile.
We implemented all models using RStan (Stan Development Team, 2017;Carpenter et al., 2017) to infer the posterior distribution of unknown parameters using hamiltonian Markov chain Monte Carlo. We used the same weak prior (N(7,5)) for the mean log-expression levels of both on and off components, allowing us to use Stan's positive_ordered data type to describe the location of the two components.

Evaluating model accuracy
To evaluate the accuracy of the mixture modeling approach we created a benchmark set of expression data extracted from FlyBase. Specifically, we queried the FlyBase website (http://flybase.org) for genes expressed in the optic lobe or the photoreceptor. The resulting benchmark set included 193 positive and 4 negative expression datapoints. We quantified the model's accuracy on this benchmark in two ways. First, we quantified concordance between the benchmark expression state and our model's inferred state. Second, we computed the cumulative distribution function of the inferred probabilities of expression for the positive benchmark datapoints.

Expression-based tree of cell types
To study cell relationships, we used phylogenetic treebuilding to compare their expression profiles. We first selected a subset of genes with on-component means of at least exp(3)~21 TPM and difference between on and off components of at least exp(1.5)~4.5 fold. We then encoded the expression profile of each cell as a "sequence" of expression states, where each position represents a gene, and the character indicates the gene is expressed ('A', P (z gc = on) > 0.8), not expressed ('C', P (z gc = on) < 0.2), or its expression is uncertain ('N'; 0.2 < P (z gc = on) < 0.8). We computed the Hamming distance between pairs of expression 'sequences' considering only unambiguous positions, using the dist.dna() routine in the ape R package (Paradis et al., 2004). We then used the minimum evolution approach to estimate the 'phylogeny' of the cells, using the balanced weighting scheme (Desper and Gascuel, 2002), as implemented in the ape fastme.bal() routine. We then built trees from 1000 bootstrapped replicates and quantified the support for each branch on the original tree. We visualized the tree using the phytools R package (Revell, 2012).

Identifying marker genes
We identified marker genes specifically enriched in individual cell types and groups of cells (photoreceptor, glia, muscle, neuron) by searching for genes inferred to be almost exclusively expressed in a single cell type or cell group (p(on) ≥ 0.9 for all cells within a group, and at most two cells outside a group) and with transcript abundance higher than all cells outside the group.

Evaluating expression patterns for genes with different functions
We used FlyBase Gene Groups (release 2018_02) to assign functions to genes, and considered the most terminal groups in the hierarchy that had at least 10 genes.

Mapping receptor expression onto synapses
To map receptor expression onto synaptic connectivity, we first obtained synapse pairs from Takemura et al., to identify synaptic targets of R8 (cell #111), R7 (cell #205), and C2 (cell #214) cells in the medulla (Takemura et al., 2013). When multiple instances of a cell type were available in the synaptic table, we chose the one with the greatest number of synaptic partners. For target cell types that we profiled with TAPIN/INTACT-seq, we discretized their expression as either on (p(on) ≥ 0.8) or off (p(on) < 0.8). For cell types that we did not profile, we classified them as unknown receptor expression.

Data and software availability
All raw and processed transcriptome data is available from NCBI GEO (accession GSE116969). The shell scripts used to process the raw RNA-seq data, and the R and Stan programs that implement the mixture model as well as generate all figures and tables in this paper are available at github (http://github.com/fredpdavis/ opticlobe). The cell type-level expression table can be explored interactively at http://www.opticlobe.com.  T4a  T4b  T4c  T4d   T5a  T5b  T5c  T5d FIGURE 1 -Supplemental 1 Figure 1-S1: Whole brain expression patterns of new driver lines generated in this study. A. Maximum intensity projection of confocal stacks taken from whole fly brains (only one optic lobe is shown). Expression patterns of the driver lines (myristoylated-GFP) are in green and a neuropil marker is in magenta. Imaging parameters and brightness and contrast were individually adjusted for each sample. B, B'. T4 and T5 cells comprise four subtypes (a,b,c,d) each of which project to specific layers of the lobula plate (B'). C, C'. Individual T4 and T5 driver lines label combinations of subtypes but show preferential expression in some subtypes. Subtypes were identified by their projections to specific layers in the lobula plate (C,C'). For example, T5_d2 mainly labels lobula plate layers one and two, indicating expression in T5a and T5b. Each of the lower panels is a higher magnification view of the lobula plate region (C'). In (C) both the driver and the split identifier are indicated in the lower left and right corner respectively.    We used an INTACT-seq variant, that we originally developed for mouse (Mo et al., 2015), that purifies nuclei by differential centrifugation. B. TAPIN-seq replaces the space-and time-intensive centrifugation with a two-step capture enabled by antibody hinge cleavage with the bacterial protease IdeZ. Both protein A and protein G bind the Fc region, while only protein G is able to bind F(ab')2. C. Libraries built from more nuclei have more transcript molecules (estimated using synthetic spike-ins). D. Nearly all libraries showed relatively unbiased positional coverage across gene bodies. E. The maximum bias in positional coverage observed in each library was inversely correlated with cDNA yield, although with large variance in bias for lower yield libraries. F. T4.T5 transcriptomes of female (y-axis) and male (x-axis) flies are well correlated, but also recover known sex-specific genes including RNA on X 1 (roX1) and roX2 (Amrein and Axel, 1997) and yolk protein 1 (Yp1) and Yp3 (Belote et al., 1985). G. Estimated transcript abundances were reproducible as evaluated by Pearson correlation (of log-transformed transcript abundance) between biological replicates (black), alternative drivers for the same cell type (  A. TAPIN-seq expression of marker genes identified from an independent FACS-seq dataset covering 12 cell types we also profiled (Konstantinides et al., 2018). We defined marker genes for each cell type as the top-10 most highly expressed genes relative to the mean of all cell types, requiring at least 4x higher abundance than the mean and a relative abundance of at least 50 TPM. B. We more broadly compared the TAPIN-seq and FACS-seq datasets by first identifying cell type-enriched genes within each dataset (at least two-fold higher than mean expression; at least 50 TPM in one sample) and then quantifying the degree of overlap between datasets using the overlap coefficient: 100 * (# genes enriched in both TAPIN-seq AND FACS-seq) / minimum(# genes enriched in TAPIN-seq, # genes enriched in FACS-seq )).     Pm4  T4  T5  Tm1  Tm2  Tm3  Tm4  Tm9  Tm20  Tm29  TmY3  TmY5a   LC4   4-S1: TAPIN-seq profiles identify genes enriched in cell types and groups. A. Cell adhesion molecules specifically expressed across our transcriptome catalog. B-D. The expression pattern of all beat, DIP, and Dpr family members depicted as heatmaps of probabilities of expression (left), heatmaps of relative transcript abundance (middle), or cumulative density curves of normalized expression level (right). The density curves, each depicting a bimodally expressed gene in the gene family, show expression levels that were normalized using the mean expression levels of the modeled off an on states for each gene; normalized level = (logEµ of f ) / (µon − µ of f ). The density curves illustrate that DIP genes are sparsely expressed, followed by beat and Dpr genes. Of the three families, the Dpr genes exhibit transcript abundance that appears more continuous between the estimated off and on states rather than discretely bimodal. E,F. The number of interacting pairs of extracellular protein pairs (Özkan et al., 2013) expressed by pairs of cells in the lamina (E) is not sufficient to predict the synaptic connectivity of these cells (F; data from Rivera-Alba et al., 2011). To match our expression data, we summed the synapse counts for the individual R1-R6 photoreceptors originally reported by Rivera-Alba et al., 2011. For the same reason, we also duplicated the subtype-unidentified Lawf synapse counts as separate Lawf1 and Lawf2 entries in the connectome matrix.    3  4  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  25  26  27  28  29  30  31  32  33  37  38  39  42  43  44  45  46  47  48  49  51  52  53  54  55  56  57  58   C2  C3  L1  L2  L3  L4 Pm4  T4  T5  Tm1  Tm2  Tm3  Tm4  Tm9  Tm20  Tm29  TmY3  TmY5a   LC4  LC6  LC10a  LC10b  LC16  LLPC1  LPC1  LPLC1  LPLC2  LPi−34   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  single cell cluster ID TAPIN-seq cell type Figure 5-S2: TAPIN-seq expression of genes marking single cell clusters. A. We evaluated expression of marker genes for each optic lobe single cell cluster (as reported in Konstantinides et al., 2018) in our TAPIN-seq profiles of visual system neurons. If a single cell cluster marker corresponds to one of our identified cell types, we expect to see its marker genes highly enriched in the corresponding cell type's expression. Note that some of the single cell clusters with the best apparent cell type matches (e.g., cluster 15/TmY5a, cluster 55/Mi15) were originally reported with a different annotation. B. Expression of marker genes for each brain single cell cluster (as reported in Davie et al., 2018), as in A.   Orthologs * * * * Figure 6-S1: Transcriptional regulators of neurotransmitter identity. A. Transcription factors whose expression is predictive of neurotransmitter phenotype (i.e., P(neurotransmitter output | transcription factor expressed)). The ten most predictive transcription factors are shown for each neurotransmitter output marker. B. Summary of orthologous transcription factors in worm and mouse and their association with specific neurotransmitter types. C. The Gad1-associated gene Lim3 does not express in cholinergic Dm12 neurons, but does in the GABA-ergic Dm10 neurons. Double labeling using LexA-markers for Dm12 and Dm10 (green) with a Lim3 protein-trap-GAL4 driving RFP (magenta). This example highlights a case where Lim3 expression identifies a cell type in a group of similar cells (the Dm cells profiled in this study) that is GABAergic (all other Dms in this group are glutamatergic.) Figure 7-S1: Patterns of neurotransmitter receptor expression complement connectomics. A. Transcriptomes reveal the neurotransmitters in core cell types of the ON and OFF components of the motion detection pathway. The ON and OFF motion detection pathways supply inputs to directionally sensitive T4 and T5 neurons, respectively (Takemura et al., 2017;Shinomiya et al., 2019). Our results show that all of the inputs to T5 (Tm1, Tm2, Tm4, and Tm9) are cholinergic, whereas the inputs to T4 are a mixture of GABAergic (C3, Mi4), cholinergic (Mi1, Tm3), and glutamatergic (Mi9), suggesting different input signs. Discovering the functional signs of inputs to the directionally selective neurons is an essential step in understanding the mechanism of this long-studied neuronal computation (Strother et al., 2017). In addition, our data reveals aspects of the motion pathway that have not yet been functionally examined, such as the identification of other signaling components (see B). B. Examples of the expression of genes involved in neuropeptide, non-canonical small molecule (nitric oxide), or gap junction communication in the cell types in (A).