The Neuroscience Multi-Omic Archive: a BRAIN Initiative resource for single-cell transcriptomic and epigenomic data from the mammalian brain

Abstract Scalable technologies to sequence the transcriptomes and epigenomes of single cells are transforming our understanding of cell types and cell states. The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative Cell Census Network (BICCN) is applying these technologies at unprecedented scale to map the cell types in the mammalian brain. In an effort to increase data FAIRness (Findable, Accessible, Interoperable, Reusable), the NIH has established repositories to make data generated by the BICCN and related BRAIN Initiative projects accessible to the broader research community. Here, we describe the Neuroscience Multi-Omic Archive (NeMO Archive; nemoarchive.org), which serves as the primary repository for genomics data from the BRAIN Initiative. Working closely with other BRAIN Initiative researchers, we have organized these data into a continually expanding, curated repository, which contains transcriptomic and epigenomic data from over 50 million brain cells, including single-cell genomic data from all of the major regions of the adult and prenatal human and mouse brains, as well as substantial single-cell genomic data from non-human primates. We make available several tools for accessing these data, including a searchable web portal, a cloud-computing interface for large-scale data processing (implemented on Terra, terra.bio), and a visualization and analysis platform, NeMO Analytics (nemoanalytics.org).


INTRODUCTION
Circuits in the mammalian brain are comprised of billions of neurons, connected via trillions of synapses. Neuroscientists have long recognized that the structural and functional properties of brain circuits arise, in part, from the diverse anatomical, physiological and molecular characteristics of their composite neuronal and non-neuronal cells. Surprisingly, however, precise definitions for the myriad subtypes of brain cells have remained elusive (1,2).
Single-cell genomics has emerged as a powerful and scalable technology to more rigorously define the brain's cell types (3–5). Single-cell and single-nucleus RNA sequencing (scRNA-seq) have been applied to sequence the transcriptomes of millions of cells in mammalian brain regions, identifying hundreds or even thousands of transcriptionally distinct cell types (1–6). Multimodal technologies to co-assay a cell's transcriptome along with its morphology, physiological characteristics, or spatial location have made it possible to relate these transcriptomic cell types to classically defined anatomical and functional cell types (7,8). Single-cell epigenomic technologies characterize the cell type-specific gene regulatory mechanisms underlying cell type identity (9,10).
The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative is applying single-cell multi-omic techniques at unprecedented scale to map the cell types in the mammalian brain (11,12). Since 2017, the BRAIN Initiative Cell Census Network (BICCN) has worked to generate an open-access reference atlas, integrating molecular, spatial, morphological, connectomic and functional data to describe the cell types in mouse, human and non-human primate brains (11). In its first phase, the BICCN produced detailed atlases for the cell types in the primary motor cortex (12–14), as well as for the cortex's prenatal development (15,16). In its continuing work, the BICCN is nearing the completion of draft cell type atlases for nearly all brain regions in mice and humans. These efforts are expected to be expanded over the next several years through BICCN's successor, the BRAIN Initiative Cell Atlas Network, focusing especially on resources for the brains of humans and non-human primates.
An integral goal of the BRAIN Initiative is to make these brain cell resources available for use by the broader research community, consistent with data FAIRness (Findable, Accessible, Interoperable, Reusable) (17). Toward this goal, the National Institutes of Health have funded the creation of data archives and tools for the web-based analysis and visualization of BRAIN Initiative data. Here, we describe the Neuroscience Multi-Omic Archive (NeMO Archive), which serves as the primary genomic data repository for the BRAIN Initiative. The NeMO Archive provides access to an expansive collection of single-cell genomic data from the mammalian brain, including a variety of tools for searching, downloading, and analyzing these data. A companion website, NeMO Analytics, provides additional, biologist-friendly tools for analysis and visualization.

Data deposition and data processing backend
The NeMO Archive receives submissions of genomic data from BRAIN Initiative researchers, as well as other authorized contributors. Data submission begins with the upload of a standardized manifest file that submitters create based on a provided template. The manifest outlines all files that will be part of the submission and includes required metadata. Once the manifest is validated for completeness and adherence to controlled vocabularies, submitters receive a directory name and command with which to submit data directly to the NeMO Archive using the IBM Aspera data transfer tool. NeMO servers contain three 'incoming' or landing areas, the public incoming area for data with no restrictions that are slated for immediate public release, the restricted incoming area for data requiring consented access (including controlled access human data), and the embargo incoming area for data to be held in embargo before publication. Submitters notify NeMO that their upload has completed through the submission of a tagged file, at which point automated processes pick up the submission for initial quality assurance, including detection of all expected files and checksum comparisons to ensure that files were not corrupted during data transfer. We note that no BICCN data are subject to an embargo.
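The quality-assurance step described above (detecting all expected files and comparing checksums) can be illustrated with a minimal sketch. It assumes that the submitter's manifest declares an MD5 digest per file; the hash algorithm, the `expected` mapping and the function names are assumptions made for illustration and are not taken from the NeMO backend.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_submission(expected: dict[str, str], landing_dir: Path) -> dict[str, str]:
    """Compare uploaded files against the checksums declared by the submitter.

    `expected` maps a file name to its declared MD5 (a hypothetical layout).
    Returns one status per file: "ok", "missing" or "corrupted".
    """
    report = {}
    for name, declared in expected.items():
        path = landing_dir / name
        if not path.is_file():
            report[name] = "missing"      # expected file never arrived
        elif md5sum(path) != declared.lower():
            report[name] = "corrupted"    # content changed during transfer
        else:
            report[name] = "ok"
    return report
```

A submission would pass this check only if every entry in the report is "ok"; any other status would be returned to the submitter for re-upload.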

Metadata
Data must be presented together with detailed metadata in order to be useful to the wider neuroscience research community (17). Toward this end, we have worked with the BICCN infrastructure working group and the larger neuroscience community to ensure that we capture all essential and relevant metadata in the archive, including information about the data source, organism, the experimental assays used, and the sequencing technologies and bioinformatics tools used to generate the data and derived results. We are working to implement the use of standard ontologies and controlled vocabularies, including NCBI Taxonomy (18), OBI (19), EDAM (20) and Uberon (21), to capture information about organism, experimental assays, file types and formats, and anatomy, respectively.

Identifiers, datasets and releases
An important element of the FAIR principles is to ensure that data are findable, both for human operators as well as algorithms. We assign a stable local unique identifier to each subject, sample and file asset at NeMO. This ensures that we can track each of these assets uniquely over the lifetime of the asset. We have registered NeMO as a source of such identifiers with identifiers.org to ensure NeMO identifiers can be resolved by the identifier resolver service. In addition to assigning local identifiers to individual assets, we also assign local as well as global unique identifiers to asset collections, which might include data used in a paper or a NeMO quarterly data release. These asset collections are released as BDBags (22), a structured method to create a collection of objects, and each of these is assigned a digital object identifier (DOI) issued through DataCite (23). FAIR principles require that an identifier can be used to find additional information about the object that can be consumed by both people and machines. In this spirit, each NeMO asset has a landing page which describes the asset for humans, as well as a structured JSON object describing the asset for machines.

Consensus data processing
Nucleic Acids Research, 2023, Vol. 51, Database issue D1077
Consensus processing of BICCN data was performed using workflows implemented on Terra, a cloud-native workbench developed by Broad, Verily and Microsoft (Terra.bio). These data include single cell transcriptomic data generated using 10× Genomics and SMART-seq technologies, single cell methylation data (snmC-seq), and single cell chromatin accessibility data (snATAC-seq). Single-cell and single-nucleus 10× Genomics transcriptomic read-level data were processed to generate exon/intron counts using the Optimus pipeline (24) (RRID:SCR_018908). SMART-seq data were processed with the Smart-seq2 Single Nucleus Multi-Sample Pipeline (25) (RRID:SCR_021312) in batches of 100, optimized to maximize the number of samples run on a single virtual machine within 24 h and minimize cost. For consistency, the same genome build and gene annotation were used for 10× and SMART-seq processing. snATAC-seq reads were aligned using the scATAC Pipeline (26) (RRID:SCR_018919), which utilizes SNAP-ATAC (27) and summarizes read counts per 5000 bp. snmC-seq data were processed using the CEMBA pipeline (28,29) (RRID:SCR_021219), which utilizes bowtie2 for alignment and produces counts for every CpG site in the genome.
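As an illustration of the batching of SMART-seq samples described above, the following sketch splits a list of sample identifiers into groups of at most 100. The helper name and the sample identifiers are hypothetical; how each batch is then submitted to Terra is not shown and is not specified here.

```python
from typing import Iterator, Sequence


def batches(samples: Sequence[str], size: int = 100) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` samples."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


# Hypothetical sample identifiers, for illustration only.
sample_ids = [f"sample_{i:04d}" for i in range(250)]
batch_sizes = [len(batch) for batch in batches(sample_ids)]
print(batch_sizes)  # [100, 100, 50]
```

The final batch is simply smaller when the number of samples is not a multiple of the batch size, so no sample is dropped or padded.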

NeMO portal
To facilitate the exploration of data acquired by the NeMO Archive, the project has provisioned a web application called the NeMO Portal, available at https://portal.nemoarchive.org. The portal allows users to explore the data in various usage patterns. For example, the portal allows users to filter the existing data using facets along the left side of the interface. This is known as 'faceted search' and is commonly used in well-known e-commerce websites such as amazon.com. By using faceted search, we present a familiar and accepted usage pattern to end users of the portal to allow them to intuitively narrow down the available data to only the datasets and files that they are interested in. Facets include organism (e.g. mouse, human, marmoset), brain region, sequencing technique, and modality type (transcriptomics, epigenomics), among others. The site is dynamic in nature such that when different facets are selected or de-selected, updated charts and visualizations about the filtered data are automatically re-generated and presented to the end users so that they may better comprehend how the facets affect their selections.
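The logic of faceted search can be sketched in a few lines. In this minimal example a record is kept when, for every selected facet, its value is among the chosen values, and the per-value counts that drive the charts are recomputed on the filtered records. The records, field names and function names are hypothetical and do not reflect the portal's actual data model or implementation.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
SAMPLES = [
    {"organism": "mouse", "modality": "transcriptomics", "technique": "10x"},
    {"organism": "mouse", "modality": "epigenomics", "technique": "snATAC-seq"},
    {"organism": "human", "modality": "transcriptomics", "technique": "10x"},
    {"organism": "marmoset", "modality": "transcriptomics", "technique": "SMART-seq"},
]


def apply_facets(records, selected):
    """Keep records whose value for every selected facet is among the chosen values."""
    return [
        rec for rec in records
        if all(rec.get(facet) in values for facet, values in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(rec[facet] for rec in records)


# Selecting one modality narrows the records; the counts are then refreshed.
hits = apply_facets(SAMPLES, {"modality": {"transcriptomics"}})
print(facet_counts(hits, "organism"))
```

Values within one facet combine as alternatives, whereas different facets combine as joint requirements, which is the usual convention for faceted interfaces and is assumed here.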
The portal also provides an 'Advanced Search' interface, which allows users to directly enter a query using a simple query language. If a syntax error is detected, the portal's interface reports where in the query the error is located for easier correction. Charts and graphs update dynamically as the query is refined. The use of the aforementioned facets is also translated into an advanced search query statement.
When users have completed their exploration of the data and found datasets or files they are interested in, the portal allows them to add them (individually or in bulk) to a data cart. Again, the usage is similar to common patterns seen elsewhere on the web for greater familiarity. The cart has features that allow the user to download a file, called a manifest, which contains metadata and the network locations of the files of interest. To download these files, additional ancillary tools associated with the portal have been developed, such as the portal client (https://github.com/IGS/portal_client), which allows the files in the manifests to be downloaded. This approach is taken because it is not feasible to dynamically download the user's selections directly to the browser, as the total volume can easily reach tens of terabytes in size. The NeMO portal has also been integrated with the Terra platform to allow manifests to be exported into a Terra workspace for processing on the cloud, avoiding the need for local data download.

NeMO Analytics
The NeMO Analytics portal is implemented as an instance of the Gene Expression Analysis Resource (gEAR) software (30) for visualization and analysis of transcriptomic and epigenomic data. A strength of the portal is its ability to display multiple multi-omic datasets in a single page, effectively allowing a page (termed 'profile') to fully support a manuscript or include a collection of thematically related datasets. Direct access to the data is often offered by custom URLs presented in manuscript figure legends and text. Finally, data analysis tools are attached to each dataset, allowing biologists with limited informatics skills to explore the data further (31). Data from BICCN were obtained from data generating labs via the NeMO Archive. Non-BICCN data were obtained from public repositories, including the Gene Expression Omnibus, the UCSC Cell Browser (cells.ucsc.edu) and GEMMA (32). Data were uploaded into the portal via the Data Uploader. For sn/scRNA-seq, Patch-seq, and MERFISH experiments, we uploaded read counts and associated metadata. For scATAC-seq and scMethyl-seq experiments, we created TrackHubs and linked them to the portal. Custom visualizations for each dataset were created using the Data Curator tool and assembled into Profiles, enabling several datasets to be viewed side-by-side.

User guides
We have made available several user's guides providing step-by-step guides to using our resources. The NeMO Archive Documentation provides comprehensive step-by-step guides from the perspectives of both data submitters and users of the site: https://github.com/nemoarchive/documentation, mirrored at nemoarchive.org/resources. The NeMO Analytics user's guide includes tutorials and videos describing how to use the gEAR software that underlies the platform: https://nemoanalytics.org/manual.html. The Terra user's guides provide extensive documentation for working in the Terra platform: https://terra.bio/resources/getting-started/.

A resource of single-cell multi-omic data from the brains of humans, non-human primates and mice
The NeMO Archive has been continuously ingesting single-cell transcriptomic and epigenomic data from BRAIN Initiative researchers since August 2018. As of 10 August 2022, the archive contains 383.1 terabytes of data in 1 216 873 files from 289 707 biological samples (Figure 1A). Summaries of all the BICCN projects contributing data to the NeMO Archive are available at biccn.org. Briefly, the overarching goal of these projects is to describe the diversity of cell types in the mammalian brain, and the resulting atlases will serve as references for the cell types in all of the major brain regions in mice and humans. Single-cell transcriptomics, single-cell epigenomics, and spatial transcriptomics technologies were each applied systematically across these brain regions, which will enable an integrated multi-modal description for each cell type. In total, these data describe the transcriptomes or epigenomes of ∼57 452 822 cells.
While a subset of these data have been presented in peer-reviewed publications, the majority of the data are being released to the research community prior to publication by the BICCN consortium in an effort to accelerate discovery. Data producers within BICCN have submitted, at a minimum, raw data consisting of FASTQ files, as well as metadata describing the species, genotype, brain region, technique, investigator, grant number, and related information about each sample. The NeMO Archive also provides access to processed data such as read counts and cell type assignments from BICCN publications. These cell annotations are expected to become far more complete as the projects progress.
BRAIN Initiative researchers utilized several distinct single-cell technologies to profile brain cell types (Figure 1B). Single-cell and single-nucleus transcriptomic data (sc/snRNA-seq) were generated from 10 887 604 whole cells and 18 154 968 nuclei, respectively. Of these, 27 921 806 cells/nuclei were sequenced with variations of the 10× Genomics 3′ Gene Expression technology (33), which captures cells via droplet microfluidics and generates sequence from the 3′ ends of RNA molecules. sc/snRNA-seq was generated from an additional 834 367 cells with Drop-Seq, which is also a droplet-based method (3). Full-length transcriptomes were generated from 286 418 cells and nuclei with SMART-seq (34). Spatial transcriptomic data were generated from 21 802 314 spatial positions in the brain at 10 μm resolution using Slide-seqV2 (35) (henceforth, spatial positions are included among summaries of cell counts as single cells, consistent with the near-single cell resolution of these data). Single-cell epigenomic data include open chromatin profiling of 4 021 312 cells with the Assay for Transposase-Accessible Chromatin (scATAC-seq) (36), as well as single-cell profiling of DNA cytosine methylation (snmC-seq2) (37) in 539 969 cells. Single-cell ATAC and RNA multi-omes were generated from 1 161 948 cells using 10× Genomics Multi-ome, SHARE-seq (38), and SNARE-seq (39). A subset of the SMART-seq and snMethyl-seq data were generated as part of multi-modal experiments, including imaging of neuronal morphology and projection patterns, retrograde tracing, and electrophysiology (Patch-seq).
Data in the NeMO Archive were derived from the brains of 20 mammalian species (Figure 1C). The majority of the data are from laboratory mouse strains (40 232 712 cells). These are primarily from 8-week-old mice of the C57BL/6 strain and its derivatives expressing cell type-specific Cre reporters, and a smaller number of samples describe pre- and postnatal mouse brain development. Samples from mice are annotated to 194 distinct sub-regions and cell populations labeled by Cre reporters, spanning all of the major structures of the forebrain, midbrain, and hindbrain, as defined in the Allen mouse brain common coordinate framework (40). 10 503 719 cells are from the human brain. These data include extensive atlases for both the adult and developing human brain and are annotated to 269 distinct subregions and cell populations. Adult samples were derived from donors who were 18-68 years old at time of death with no history of psychiatric or neurological disorders. Developmental samples span the full range of human prenatal development from 4 to 34 gestational weeks, as well as early postnatal development. The BICCN has also generated extensive single-cell genomic resources for the brains of non-human primates and other mammals. These include atlases for many of the brain regions of marmosets (1 693 284 cells) and rhesus macaques (579 187 cells), as well as surveys of specific forebrain structures in 16 additional species spanning several mammalian clades.
Each sample and file in the NeMO Archive is assigned a unique identifier and indexed by its metadata (Materials and Methods). We provide multiple interfaces to the data (Figure 2), including direct access to the data via http, sftp, and Google Cloud Platform buckets; a web portal enabling users to search for datasets and download them (nemoarchive.org), a cloud-computing interface enabling users to perform data processing at scale (using pipelines implemented in Terra, terra.bio), and web-based visualization and analysis tools implemented at a companion website, NeMO Analytics (nemoanalytics.org). These interfaces are described in detail below.

Consensus processing of multi-omic data with the BICCN cloud-computing environment
To facilitate joint analyses of BICCN data, we have developed consensus pipelines for large-scale processing of single-cell genomic data submitted to the NeMO Archive.
These pipelines are implemented on Terra in partnership with BICCN's Analysis Working Group and written in Workflow Description Language (WDL), a workflow processing language designed to be easy for humans to read and write. WDL provides the flexibility to combine multiple computing languages in a single workflow and execute this workflow on a variety of local or cloud platforms. Several BICCN pipelines are available, including Optimus, a pipeline for processing 10× Genomics scRNA-seq and snRNA-seq data; a full transcript (SMART-seq) scRNA-seq and snRNA-seq data processing pipeline (Smart-seq pipeline); and a pipeline for snATAC-seq utilizing SNAP-ATAC (27). Each cloud-native pipeline was rigorously tested by consortium members to ensure outputs replicated expert in-house pipelines and optimized to improve performance and cost in the cloud environment. For example, the cost of SMART-seq processing was decreased 50-fold by switching from HISAT2 (41) to STAR (42) for genome alignment, using dedicated virtual machines with pre-loaded reference genomes, and optimizing the number of samples to process within a 24-h period to maximize compute resources. Additionally, engineers from Terra and NeMO utilized the APIs to submit hundreds of jobs to Terra via the command line, facilitating large-scale processing and removing the need to use manual user interfaces. Each of these pipelines is publicly available on Terra and listed in the BICCN portal (https://biccn.org/tools/biccnpipelines).
As of 31 July 2022, we have completed consensus processing using these pipelines for the adult mouse brain atlas, consisting of 1475 10× Genomics sc/snRNA-seq samples (∼13 000 000 cells); 213 410 SMART-seq samples (213 410 cells); 138 single-nucleus methyl-cytosine sequencing (snmC-seq) and 185 epi-retro seq samples (168 098 and 19 357 cells, respectively); and 96 snATAC-seq samples (1 094 579 cells). In addition, we make available read counts from the adult and prenatal human brain atlases, processed through equivalent pipelines with CellRanger. All of these processed data are available in the NeMO portal (see below).

Accessing BICCN data through the NeMO portal
NeMO has deployed a data portal where users can search for and download data. The home page of the NeMO data portal (Figure 3A) gives the data summary including the number of grants or studies represented at NeMO, the file count summarized by the data modality for each of the grants, and quick links to featured datasets. The home page also serves as the launchpad to access other functions of the portal including accessing additional information about each of the grants/studies and accessing the data search and discovery page.
The filtering interface organizes the facets in different categories and displays the items within a category. In addition, the interface displays the count of objects that are tagged with that facet. The filters can be applied to 'Samples' or 'Files' as represented by the two tabs in the top left corner. Sample-level filters are based on characteristics such as the project or study, the organism, the anatomical region, data modality, and the sequencing technique. File-level filters include characteristics such as file data type or file format. Besides the filters that are visible, users can access additional filters by clicking on the 'Add a filter' button to access other metadata fields that can be used to filter the data. As filters are applied, the summary information presented as charts on the right is automatically updated. In addition, the query that is being built through filtering is displayed in the query box at the top of the screen. The charts themselves are interactive and users can click on the charts to add additional filters. Users can view any of the charts as a table by clicking on the table icon on the top right of the chart panel. In addition to the faceted search, the portal also has an advanced search interface that can be accessed by clicking on the 'Advanced' button on the query box (see Figure 3C). Here, users can use the type-ahead feature of the portal (as the user starts typing, all fields beginning with the typed letters are displayed for the user to choose from) to filter data based on all the available metadata fields. For advanced users familiar with the metadata this can be a more efficient search interface. Another advantage of the advanced interface is that users can copy and save the text query search for subsequent use.
Once the user has narrowed down the dataset, the user can reuse the 'Samples' or 'Files' tab (Figure 3C) to select one or more of the objects to add to the 'Data Cart'. If a sample is selected, all files associated with that sample are added to the data cart. If a file is selected, only the selected file is added to the cart. The data cart page gives a summary of the data including the number of files, samples associated with these files, and the cumulative size of the data. In addition, the interface displays a list of the files in a table.
Here the user can click on the hyperlink associated with a file or sample to get additional information about the data resource (Figure 3D) or use the 'Download' button to either download the data or move the data to other linked resources such as the Terra cloud-computing environment. When the user chooses to download the data, the user must download a manifest file, which stores the information necessary to download the actual data to the local computer. A separate download tool is necessary to download the actual data. The user can also choose to download the metadata associated with the files in the data cart. If the user wishes to make use of the data with pipelines at Terra, no data are actually moved; rather, only the information necessary for Terra to access the data stored in the NeMO cloud is transferred to Terra.
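The cart behaviour described above (selecting a sample adds all of its files, selecting a file adds only that file, and the cart page reports file count, sample count and cumulative size) can be modelled with a short sketch. The class names, field names and identifiers are hypothetical and are not the portal's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    file_id: str      # hypothetical identifier
    sample_id: str    # sample the file belongs to
    size_bytes: int


@dataclass
class DataCart:
    files: dict = field(default_factory=dict)  # file_id -> FileRecord

    def add_file(self, record: FileRecord) -> None:
        """Selecting a file adds only that file (re-adding is a no-op)."""
        self.files[record.file_id] = record

    def add_sample(self, sample_id: str, catalog: list) -> None:
        """Selecting a sample adds every file associated with it."""
        for record in catalog:
            if record.sample_id == sample_id:
                self.add_file(record)

    def summary(self) -> dict:
        """Number of files, number of samples, and cumulative size."""
        records = list(self.files.values())
        return {
            "files": len(records),
            "samples": len({r.sample_id for r in records}),
            "total_bytes": sum(r.size_bytes for r in records),
        }


# Illustrative catalog: two files for sample S1 and two for sample S2.
catalog = [
    FileRecord("f1", "S1", 1_000),
    FileRecord("f2", "S1", 2_000),
    FileRecord("f3", "S2", 4_000),
    FileRecord("f4", "S2", 8_000),
]
cart = DataCart()
cart.add_sample("S1", catalog)   # adds f1 and f2
cart.add_file(catalog[2])        # adds f3 only
print(cart.summary())
```

Keying the cart on the file identifier means that adding a sample after one of its files has already been selected does not count that file twice.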

Visualization and analysis of multi-omic brain data with NeMO Analytics
It can be challenging for biologists not skilled in bioinformatics to fully utilize NeMO Archive data for visualization and analysis. To address this, we have initiated NeMO Analytics (nemoanalytics.org), a portal designed to allow a broad range of neuroscientists to fully benefit from these data without requiring any expertise in programming. NeMO Analytics gene expression tools are powered by gEAR (umgear.org) (30) and consist of the following six components: an Expression Browser allows the visualization of gene expression patterns (one gene at a time) across multiple datasets or views; the Curator Tool makes views of datasets customizable by the user, supporting data presentations as bar plots, line plots, x-y scatter plots (e.g. tSNE and UMAP plots of scRNA-seq data), violin plots, and SVG images; the Comparison Tool tests pairwise contrasts of all of the genes in two groups (e.g. two cell types); the Single Cell RNA-seq Workbench enables on-the-fly cell type clustering and marker gene detection via a Scanpy workflow (43); epigenomic data can be visualized in the context of a linear genome browser via integration with EpiViz (44); the Dataset Manager allows the selection of groups of datasets to visualize side-by-side.
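To give a sense of what a pairwise contrast between two groups involves, the sketch below computes a per-gene log2 fold change between the mean expression in two groups of cells. This is a deliberately simplified illustration and not the gEAR Comparison Tool's implementation; the function name, the pseudocount and the example values are assumptions, and no statistical test is performed.

```python
import math
from statistics import mean


def log2_fold_changes(group_a, group_b, pseudocount=1.0):
    """Per-gene log2 ratio of mean expression in group_a over group_b.

    Each group maps a gene symbol to a list of expression values, one per cell.
    Genes missing from either group are skipped. The pseudocount avoids
    division by zero for genes with no expression.
    """
    shared = group_a.keys() & group_b.keys()
    return {
        gene: math.log2((mean(group_a[gene]) + pseudocount)
                        / (mean(group_b[gene]) + pseudocount))
        for gene in shared
    }


# Made-up expression values for two hypothetical cell types.
type_1 = {"GeneX": [3.0, 3.0, 3.0], "GeneY": [0.0, 0.0], "GeneZ": [5.0]}
type_2 = {"GeneX": [1.0, 1.0, 1.0], "GeneY": [0.0, 0.0]}
print(log2_fold_changes(type_1, type_2))
```

A positive value indicates higher mean expression in the first group; in practice a contrast would also report a measure of statistical support for each gene.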
We produced seven multi-dataset profiles enabling users to interact with the primary motor cortex datasets described in publications by the BICCN (12–14) (Figure 4). The mouse transcriptomic, epigenomic, and 'integrated' profiles provide visualizations of these data types from mouse primary motor cortex (14). The cross-species and merged-species profiles provide visualizations of transcriptomic data from integrated analyses of human, marmoset, and mouse primary motor cortex (13). The Patch-seq profile displays gene expression in the context of physiological cell types from multi-modal transcriptomic and electrophysiological profiling of primary motor cortex neurons (12). The Spatial profile displays MERFISH single-cell resolution in situ transcriptomics data on cortical sections through the primary motor cortex (12). In addition, we produced three Cortical Development profiles displaying single-cell transcriptomic data from the developing human cortex produced by the BICCN (16,45–47), alongside related resources from the developing brains of humans, non-human primates, and mice, as well as cortical organoids.
A long-term goal for the BRAIN Initiative is to integrate cell type atlases with research on brain disorders. Toward this end, we have assembled several profiles with single-cell and traditional transcriptomics and epigenomics experiments related to Alzheimer's disease (AD). These NeMO-AD profiles feature >50 AD-related RNA-seq and scRNA-seq datasets, including data from post-mortem brain tissue of AD cases vs. controls, as well as from animal models. Using these profiles, one can assess AD-related transcriptional changes across disease progression, in several brain regions, and in specific cell types. We have also created a NeMO-AD profile for the visualization of data describing microglial cell states, since AD is associated with brain inflammation mediated by these immune cells.
In addition to these pre-designed profiles, NeMO Analytics enables users to develop their own profiles from datasets in the NeMO Archive or by uploading any dataset of interest. As of August 2022, 713 datasets are publicly available in the NeMO Analytics platform. These datasets can be analyzed using the Single-Cell RNA-seq Workbench to explore the molecular markers in each cell type. Users can then add their analyses to a profile or download the data to perform more detailed analysis. The Dataset Uploader allows users to create visualizations and analyses of any scRNA-seq, RNA-seq, or epigenomic dataset of interest. These datasets are initially private to the user. At their discretion, users can share their custom datasets and profiles with other users or make them fully public. The latter feature has proven useful in rapidly producing companion websites for both BICCN (12–14) and non-BICCN (48,49) publications.

DISCUSSION
The single-cell transcriptomic and epigenomic data generated by the BICCN and housed in the NeMO Archive are valuable resources to the research community. We anticipate that many investigators will use these data as references to assist in the annotation of the cell types in their own single-cell genomic data. Computational biologists and computer scientists will utilize the data to further annotate the cell types and their molecular profiles using novel approaches. For many biologists, the most valuable aspect of the resource will be the ability to query individual genes of interest using NeMO Analytics. Our goal is to make the data as widely available as possible to support all of these use cases and others.
New features will be regularly added to the NeMO Portal and to NeMO Analytics with an eye toward maximizing the utility of these resources. One such opportunity will be to incorporate additional annotations and metadata, most importantly annotations of cell types. Currently, the NeMO Archive hosts fully-analyzed data from selected BICCN publications (12–14), including information about provenance that is consistent with the recently proposed minSCe standards (50). We will continue to ingest these data as the analyses mature. As annotated data become more prevalent, we anticipate enabling users to search the Archive for specific cell types of interest. However, these initial efforts by the BICCN are unlikely to erase the broader issue that there is no widely agreed upon catalog of brain cell types and their molecular markers. Consequently, cell type annotations are likely to change over time as new information becomes available. Also, there is no widely accepted convention for naming brain cell types, so similar cell types may be named differently across studies. Ongoing efforts from the BICCN and other groups aim to develop standards for how to define cell types for the brain based on single-cell data, extending existing standards in that area (51). The NeMO Archive is committed to building an infrastructure to support these efforts.
A related challenge is the need to integrate transcriptomic and epigenomic cell types with functional and morphological data. Despite the relatively low throughput of Patch-seq and related techniques, BICCN researchers have generated multi-modal data from thousands of neurons. In addition to the NeMO Archive, the BRAIN Initiative is supporting archives for imaging-based data (https://www.brainimagelibrary.org) and physiological data (https://dandiarchive.org/), and the Brain Cell Data Center (biccn.org) is tasked with multi-modal integration. We are working closely with these teams to ensure that users can find all the data from each cell and to support the development of multi-modal cell type models.
There are also important opportunities to more fully integrate data archives with additional analytical tools that are also part of the BRAIN Initiative data ecosystem. Several visualization and analysis tools have been produced, including our NeMO Analytics platform, as well as the Broad Single Cell Portal (singlecell.broadinstitute.org), the UCSC Cell Browser (cells.ucsc.edu), the Allen Institute's Cell Cards resources (https://celltypes.brain-map.org/), and others. Each of these platforms hosts complementary datasets and has complementary features, so it will be valuable to build the infrastructure to enable users to easily move back and forth between them while querying genes or cell types of interest. Similarly, there is a need for infrastructure to make it possible for the visualization tools to pull annotated data directly from the archives in an automated fashion, which is essential if the visualization tools are to provide access to all of the datasets that are becoming available through the archives.
Finally, we are excited to support the integration of BRAIN Initiative resources with efforts in the broader research community to further annotate cell types and reveal their dynamic changes across conditions. Our archive should not exist in isolation from other important studies that are hosted by other funded resources. The research community will be best served if these resources operate in a data landscape where many brain-related projects are federated and these data are made more FAIR to users. Notably, the NeMO Archive will be the data repository for the NIH-funded Single Cell Opioid Response in the Context of HIV (SCORCH) program, providing an opportunity for us to pilot this resource integration 'in-house'. We are actively developing partnerships with other consortia and researchers developing data of these types to extend the ecosystem more broadly.

DATA AVAILABILITY
Data resources described in this manuscript are available in the NeMO Archive (nemoarchive.org) and NeMO Analytics (nemoanalytics.org).