Protocol for single-nucleus ATAC sequencing and bioinformatic analysis in frozen human brain tissue

Summary Single-nucleus ATAC sequencing (snATAC-seq) employs a hyperactive Tn5 transposase to gain precise information about the cis-regulatory elements in specific cell types. However, the standard protocol of snATAC-seq is not optimized for all tissues, including the brain. Here, we present a modified protocol for single-nuclei isolation from postmortem frozen human brain tissue, followed by snATAC-seq library preparation and sequencing. We also describe an integrated bioinformatics analysis pipeline using an R package (ArchRtoSignac) to robustly analyze snATAC-seq data. For complete details on the use and execution of this protocol, please refer to Morabito et al. (2021).


SUMMARY
Single-nucleus ATAC sequencing (snATAC-seq) employs a hyperactive Tn5 transposase to gain precise information about the cis-regulatory elements in specific cell types. However, the standard protocol of snATAC-seq is not optimized for all tissues, including the brain. Here, we present a modified protocol for singlenuclei isolation from postmortem frozen human brain tissue, followed by snATAC-seq library preparation and sequencing. We also describe an integrated bioinformatics analysis pipeline using an R package (ArchRtoSignac) to robustly analyze snATAC-seq data. For complete details on the use and execution of this protocol, please refer to Morabito et al. (2021).

BEFORE YOU BEGIN
Assay for transposase-accessible chromatin coupled with high-throughput sequencing (ATAC-seq) is a simple, fast, and sensitive method for mapping chromatin accessibility genome-wide and is therefore considered as an alternative to other methods like DNase-seq or MNase-seq (Buenrostro et al., 2015). ATAC-seq works well across many cell types and species and requires only 50,000 cells or less to capture regulatory elements (Shashikant and Ettensohn, 2019). However, bulk measurements of chromatin accessibility limit the precise understanding of how specific cell-types contribute to overall disease etiology (Fang et al., 2021). Recent innovations in single-cell genomics have enabled scalable profiling of chromatin with cellular or subcellular resolution using single nucleus ATAC-seq (snATAC-seq) (Rai et al., 2020;Ziffra et al., 2021). The typical bulk ATAC or snA-TAC-seq protocol requires nuclei extracted from fresh cells and tissues, due to the significant disruptive effect of freezing on chromatin integrity and structure. However, intensive effort has been made to introduce modifications in the snATAC-seq protocol to allow nuclei extraction from frozen tissues, including frozen mammalian brains. The current protocol provides a simple yet highly reproducible pipeline for isolation of single nuclei from frozen postmortem human brains followed by library preparation for snATAC-seq and downstream bioinformatics analysis. The tissue dissection protocol and modified single nucleus isolation protocol together give rise to enough intact nuclei from the frozen brain to achieve targeted nuclei recovery for further processing. Additionally, our integrated bioinformatic protocol using both ArchR and Signac pipelines (Granja et al., 2021;Stuart et al., 2021) provides an approach, specifically an R package (ArchRtoSignac), to convert an ArchRProject to a Signac SeuratObject. This conversion R package keeps the 501 bp fixed-width open chromatin regions in the peak matrix generated by ArchR for chromatin accessibility analysis and provides the option to access additional advanced secondary analyses through Signac, in addition to the functions in the ArchR pipeline. Due to the inherent sparsity of snATAC-seq data and the heterogeneity of human samples, a minimum of 2000-3000 nuclei per sample with four biological replicate samples per group is highly recommended for the analysis of open chromatin changes. In summary, the modified wet lab protocol is robust, simple, and capable of generating high-quality chromatin accessibility data from frozen human tissue. The integrated dry lab protocol provides a data format conversion to take advantage of two popular snATAC-seq analysis pipelines, ArchR and Signac, which is more robust than using the two pipelines separately.

Institutional permissions
Human postmortem brain samples (prefrontal cortex, PFC) were obtained from UCI MIND's Alzheimer's Disease Research Center (ADRC) tissue repository under the Institutional Review Board (IRB) of UCI. All participants, or participants' legal representative, provided written informed consent for the study.

Part one: Wet lab preparation
Timing: $1 h (for preparationof frozen human brain tissue dissection) Timing: $1 h (for preparationof nuclei isolation) Timing: $2 h(forlibrary preparation) 1. Frozen human brain tissue dissection. Please check materials and equipment section (Materials needed for frozen human brain dissection) for a complete list of required reagents and consumables.
a. Clean the dissection box with 10% bleach (contact time 10 min) followed by freshly prepared 70% ethanol. Place the dissection box on absorbent pads. b. Prepare the À80 C box where the 1.5 mL tubes (nuclease-free and LoBind tubes e.g., 1.5 mL Eppendorf tubes, Catalog No. 022431021) will be kept by labeling it and putting it on dry ice outside the dissection box. c. Mark and prechill all the 1.5 mL tubes. d. Prechill nuclease-free spatula, petri-dishes, and tissue forceps (preferably with teeth on the tips of the forceps for more secure hold) in dry ice. e. Weigh a few 1.5 mL Eppendorf tubes to get an idea of the weight. 2. Nuclei isolation from frozen human brain tissue for snATAC-seq.
a. Set the centrifuge temperature at 4 C. b. Mark all the 1.5 mL and 15 mL tubes. c. Thaw/equilibrate the reagents to room temperature (20 C-25 C) or specific thawing temperature (e.g., incubate digitonin at 65 C to dissolve precipitates). d. Prepare fresh Wash Buffer (13), 13 Lysis Buffer, Lysis Dilution Buffer, 0.13 Lysis Buffer, Sucrose Cushion Buffer I and Diluted Nuclei Buffer (Please see the 'buffers for single nucleus isolation' section for required reagents and buffer recipes). Keep the buffers on ice before use. Do not store the buffers overnight (>6 h). e. Prewet 70 mm Cell Strainer with sterile and nuclease-free 13 PBS. 3. Library preparation: Please refer to the detailed kits instruction for exact guidelines and preparation. However, few common preparations are listed below. a. Thaw/equilibrate the buffers, gel beads, primers to room temperature (20 C-25 C) or specific thawing temperature (Please refer to Chromium Next GEM Single Cell ATAC Reagent Kits v1/ v1.1 user manual).
Note: A few reagents, including cleanup buffer, require special thawing conditions like a 10 min incubation at 65 C with max speed ($1400 rpm) on a ThermoMixer.
b. Place enzymes on ice 5-10 min before adding to the reaction mixture. c. Set temperatures and cycle numbers in the PCR machines (always double-check before incubating the reaction mixture). d. Mark all the tubes and place those on ice if needed. e. Properly mix the beads (SPRI and Dynabeads) before use. f. Prepare glycerol solution (for <8 samples). g. Thaw, vortex, and centrifuge gel beads before loading into the chip. h. Prepare 80% ethanol (fresh) before use. i. Prepare reagents/working solutions (as required) for quantifying DNA (by Qubit Fluorometer) and measuring average fragment length (by TapeStation).
Part two: Dry lab preparation Installing cellranger-atac on a local PC or high-performance computing cluster.
We can install cellranger-atac software on a local desktop/Linux environment or on a high-performance computing cluster. Full instructions for downloading and installing the software and reference data package can be found on the 103 Genomics webpage.

4.
A proper reference is needed for read alignment. For example, if a human sample reference is needed for snATAC-seq, please download and use the human genome reference (GRCh38) dataset from cellranger: refdata-cellranger-arc-GRCh38-2020-A-2.0.0.tar.gz. 5. Please unzip it for the following use: Installing ArchRtoSignac package for data format conversion.
We made an R package with several wrapper functions to make the object conversion from ArchRProject to SeuratObject simplified.
6. Install the ArchRProject to Signac SeuratObject conversion function by running the following code: >tar -xvf refdata-cellranger-arc-GRCh38-2020-A-2. 0 Note: To analyze snATAC-seq data, ArchR and Signac packages are required, and they are required for the use of ArchRtoSignac package. However, due to the complexity of the programming environment, we suggest constructing one isolated conda environment with ArchR, Signac and other dependencies to avoid package conflicts for snATAC-seq analysis. After installing ArchRtoSignac package, we provide a quick way to check if all required packages are installed and load those available packages automatically. Please refer to the ArchRtoSignac GitHub page for detailed instructions for installation and construction of an isolated conda environment.

KEY RESOURCES TABLE
# Packages we need for ArchRtoSignac > packages <-c("ArchR","Seurat", "Signac","stringr") > loadinglibrary(packages) # For example: If the required R package is not available, a ''Package not found'' message will remind the researcher to install the package. Otherwise, the R package will be loaded.
Buffers for single nucleus isolation CRITICAL: Please prepare fresh buffers before use and keep the buffers on wet ice before adding to the tissue/nuclei. Do not store the buffers more than 6 h (360 min) on ice.
Note: We need 1.5 mL Wash Buffer /sample (Please make 2 mL/sample).
Note: Required amount of 0.13 Lysis Buffer is 0.5 mL/Sample.
Note: Required amount of Sucrose Cushion Buffer I is 2.8 mL/sample (Please make 3 mL/sample).
Note: Required amount of Diluted Nuclei Buffer depends on the number of isolated nuclei (Please make 1 mL/sample).

Wash buffer
Component (

STEP-BY-STEP METHOD DETAILS
Part one: Wet lab protocol Timing: 7-10 min for each sample dissection. Additionally, an interval of 3-4 min between each sample is required to prechill the accessories Timing: 1.5-2 hfor nuclei isolation The objective of part one is to isolate nuclei from frozen human brain samples and process it for single-nucleus ATAC-seq library preparation and sequencing.

Frozen human brain tissue dissection:
Note: Required amount of tissue is 50(G5) mg flash frozen brain tissue.
Note: This step is required to aliquot the exact amount of tissue for the assay.
CRITICAL: Do not thaw the tissue at any point. Use enough dry ice to fully cover the tubes containing tissue.
a. Fill the dissection box with dry ice, making sure that the surface is completely covered and full of dry ice as shown in Figure 1.
Note: Wait for at least 15-20 min after the initial filling for the chamber to cool down completely.
b. Keep 50-100 mg of Drierite absorbent (with indicator; 8 mesh) within the box to absorb excess moisture. c. Dissect each sample carefully in the dissection box on a petri-dish. The top and bottom of each dish can be used for separate samples. d. Aliquot tissue in the prechilled 1.5 mL tube and then quickly weigh on the scale.
Note: Required amount of tissue per sample is 50(+/-5) mg (around 0.5 cm size) flash frozen brain tissue.
e. Immediately place the samples on dry ice and finally in the storage box (already on dry ice). f. Wipe the spatula and forceps with 70% ethanol. Discard the petri-dish in biohazard bins. g. At the end of the day, leave the dry ice in the box, and the next morning, clean the box with 10% bleach (contact time 10 min) and then 70% ethanol.
Pause point: Store the tissue samples at À80 C or proceed to the next step (step 2.a).

Figure 1. Dissection box setup
The precleaned box is placed on an absorbent pad and the inner surface of the insulated box is fully covered with dry ice. All required things including a spatula, petri-dishes, labeled 1.5 mL tubes and forceps are precooled in dry ice. 50-100 mg of Drierite absorbent is also placed within the box to reduce the moisture level.
Note: This step is required for preparing a single nucleus suspension (of optimum concentration and quality) from the desired brain tissue.
Note: Reagents needed include freshly prepared Wash Buffer (13), freshly prepared 13 Lysis Buffer, Lysis Dilution Buffer, freshly prepared 0.13 Lysis Buffer, Sucrose Cushion Buffer I, Diluted Nuclei Buffer (See the 'buffers for single nucleus isolation' section for required reagents and buffer recipes).
CRITICAL: Set the centrifuge temperature at 4 C before starting the experiment (before step 2.a).
CRITICAL: Do not thaw the tissue before adding the Lysis Buffer.
CRITICAL: Make fresh Wash Buffer and Lysis Buffer. Do not store.
CRITICAL: Do not start with more than 8 samples.
CRITICAL: Do not freeze isolated nuclei following nuclei isolation.
a. Add 500 mL chilled 0.13 Lysis Buffer to the frozen brain tissue and immediately homogenize 15 times using a pellet pestle (Fisherbrandä Pellet Pestleä Cordless Motor with RNase-Free Disposable Pellet Pestles, Catalog No.12-141-364).
CRITICAL: Avoid bubble formation during this step.
b. Incubate the lysate for 5 min on ice. c. Pipette mix 10 times. You can use a wide-bore pipette tip if needed. d. Incubate the lysate for 10 min on ice. e. Add 500 mL Wash Buffer (chilled) to each tube and pipette mix 5-7 times. f. Pass the suspension through a 70 mm Cell Strainer (prewet the strainer with 13 nuclease-free and sterile PBS) into a 15 mL marked and precooled tube. g. Add 1.8 mL Sucrose Cushion Buffer I to the 15 mL tube and pipette mix 10 times. h. Prepare two sucrose gradients by adding 500 mL Sucrose Cushion Buffer I to two new marked 2 mL tubes. i. The nuclei suspension (2.8 mL) of each sample will be divided into 2 tubes (2 mL tubes with Sucrose Cushion Buffer I) in this step. j. Carefully layer 1.4 mL of nuclei suspension (half of the total volume, i.e., 2.8 mL) to the top of each 2 mL tube containing Sucrose Cushion Buffer I. Do not mix. k. Centrifuge the tubes (containing nuclei suspension and Sucrose Cushion Buffer I) at 13000 g for 45 min at 4 C. l. Check the myelin content (a white sticky layer as shown in Figure 2A) at the top and sidewall of the tube and carefully remove the myelin (if needed), using 1 mL pipette tips or a sterile and nuclease free spatula, without disturbing the pellet. m. Carefully remove the supernatant leaving 100 mL in each tube and resuspend the nuclei pellet by gentle pipetting ($10 times). Do not vortex the cell suspension for mixing. n. Add 500 mL Wash Buffer and gently pipette mix 8-10 times or until nuclei are completely resuspended. o. Carefully pass the resuspended nuclei through a 40 mm Flowmi Cell Strainer into a new 1.5 mL tube. p. Combine the 2 aliquots of the same sample at this point. Total volume (in the new 1.5 mL tube) after combining both the filtered cell suspension (of same sample) should be $1200 mL.

OPEN ACCESS
q. Determine the nuclei concentration using a cell counter. Take 10 mL of the nuclei suspension and 1 mL of DAPI or trypan blue (please check the amount to be added) for the counting. r. Centrifuge the tube with cell suspension (from step n) at 500 g for 5 min at 4 C. s. Remove the supernatant without disturbing the pellet. t. Resuspend the pellet (final nuclei pellet) in chilled Diluted Nuclei Buffer.
Note: Resuspend the pellet in nuclei buffer using the concentration table provided in the kit (Chromium Next GEM Single Cell ATAC Reagent Kits v1/v1.1) manual.

Dilution of Nuclei Stock:
Note: Total no. of nuclei loaded onto the Chromium chip is 10,000-15,000.
Note: Total volume needed = 5 mL (for 103 Genomics kit). Please double check the specific kit user manual to ensure correct volume for your kit.
Note: Recovery efficiency factor = 1.53 (The recovery efficiency factor is determined empirically by 103 Genomics; for scATAC reagent kit, the value is 1.53).
Note: Targeted Nuclei Recovery = Total number of nuclei desired for the experiment, and is different from the actual number of nuclei loaded.  Note: Check the kit's instruction for the total volume.

Volume of Nuclei
iii. Incubate the tubes in a thermal cycler (please refer to the user guide for the incubation temperatures and time required). b. GEM Generation & Barcoding.
Note: Objective of this step includes generation of GEMs (by combining barcoded Gel Beads, transposed nuclei, Master Mix, and Partitioning Oil on a Chromium Next GEM Chip H), thermal cycling of the GEMs (for production of 103 barcoded single-stranded DNA) and breaking the GEMs for recovering the pooled fractions. Pause point: Store at 15 C for up to 18 h or at À20 C for up to 1 week or proceed to the next step.
c. Post GEM Incubation Cleanup.
Note: This step is important for removing leftover biochemical reagents and unused barcodes from the post GEM reaction mixture by using Dynabeads and Solid Phase Reversible Immobilization (SPRI) beads, respectively.

Protocol:
i. Add Recovery Agent to each sample (DO NOT pipette mix).
ii. Carefully remove and discard the Recovery Agent/Partitioning Oil (pink) from the bottom of the tube. iii. Prepare Dynabeads Cleanup Mix, add to the sample and incubate for a specific period as per the instruction. iv. Using a magnetic separator, remove the supernatant and perform two ethanol (freshly prepared 80% ethanol) washes. v. Immediately add Elution Solution and incubate for the time listed in the kit manual.

OPEN ACCESS
vi. Using a magnetic separator, separate the supernatant and transfer it to another tube. vii. Add SPRIselect reagent to each sample and incubate. viii. Using the magnetic separator, remove the supernatant and perform two ethanol (freshly prepared 80% ethanol) washes. ix. After removing from the magnet, immediately add Elution Solution and incubate for the time listed in the kit manual. x. Using a magnetic separator, separate the supernatant and transfer it to another tube.
Pause point: Store at 4 C for up to 72 h or at À20 C for up to 2 weeks or proceed to the next step.

d. Library Construction:
Note: The objective of this step is to construct the final libraries containing the P5 and P7 sequences (used in Illumina bridge amplification).
i. Prepare Sample Index PCR Mix (as per the user guide) and add it to the sample.
ii. Add individual Single Index primers to each well and record assignment.
iii. Incubate in a thermal cycler (please refer to the user guide for the incubation temperatures and time required). iv. After incubation add the SPRIselect reagent to each sample and incubate again for 5 min. v. Using a magnetic separator, transfer the supernatant to a new strip tube. vi. Again, add SPRIselect reagent to each sample and incubate again for 5 min. vii. Using the magnetic separator, remove the supernatant and perform two ethanol (freshly prepared 80% ethanol) washes. viii. After removing from the magnet, immediately add Buffer EB, pipette mix and incubate for the time listed in the kit manual. ix. Using a magnetic separator, transfer the supernatant to a new strip tube. (The sample is now ready for post library construction quality control and sequencing).
x. Measure the DNA concentration (ng/mL) of the sample with Qubit and determine the average fragment size.
Note: We use Agilent TapeStation High Sensitivity D1000 ScreenTape or Agilent Bioanalyzer High Sensitivity DNA chip to determine the average fragment size.
xi. Calculate the dsDNA library concentration (from ng/mL to nM) using the following formula: xii. Normalize the samples to an appropriate concentration for library pooling before sequencing.
Note: Library loading concentration differs with the instrument, consult sequencing instrument manual for additional details.
A representative information sheet is shown below for better understanding.
xiii. Pool libraries and dilute as necessary (to the appropriate concentration in nM/pM, e.g., final loading concentration for NovaSeq TM is 300 nM). Note: Check the instrument manual for the final pool volume and concentration.
Pause point: Store at 4 C for up to 72 h or at À20 C for long-term storage.

Sequencing:
Note: Verify the compatibility of the sequencer with the kit and proceed accordingly.
Note: A representative information sheet for sequencing a sample (using the above protocol and library produced by 103 Genomics kit) is shown below for a better understanding.

Part two: Data analysis section
Timing: $2-3 h per samplefor generating chromatin accessibility counts matrix using cellranger-atac. Runtime varies based on the size of the input files and the available computational resources. We suggest running each sample in parallel if a high-performance computing cluster is accessible Timing: $20-30 min per samplesfor creating ArrowFiles. Using multithreading will significantly reduce the processing time Timing:$20-30 minfor snATAC-seq data preparation and object conversion Timing: $2.5 hfor pseudo-bulk replicates and calling peaks Timing: less than 20 min, depending on the sample sizesfor conversion of the ArchRProject to SeuratObject and further processing Timing: 3-5 min per sampleto add a gene score/activity matrix to SeuratObject Timing: $100 minfor Signac cell-type identification with reference mapping Timing: $ 4-5 working daysfor advanced secondary analysis The objective of part two is to align the sequencing data for snATAC-seq to the reference genome, obtain chromatin accessibility count matrix and process it for downstream analysis including identification of differential accessibility regions and other advanced secondary analyses.
Note: There are two main approaches to process sequencing data produced from Chromium snATAC libraries: 103 Genomics Cloud Analysis or cellranger-atac software. Please refer to 103 Genomics' instructions for the Cloud analysis. Note: Here we demonstrate how to use the cellranger-atac count pipeline to filter reads, align to a reference genome, count cell barcodes, identify transposase cut sites, detect chromatin accessibility peaks, generate chromatin accessibility counts matrix, and perform preliminary clustering analysis.
Note: Before proceeding, please refer to the Fastq input data format.
Note: Please see the following example of sample format from our dataset: a. Input and output files' path for cellranger-atac: b. Path to reference annotation: c. Use cellranger-atac count to generate single-nucleus accessibility counts: 7. Creating ArrowFiles and an ArchRProject from cellranger-atac output and constructing TileMatrix using ArchR pipeline. We create ArrowFiles for each sample to reduce memory usage in R. An ArrowFile is simply a path to access information related to the individual sample, and ArchR will update the ArrowFile with additional information. Note: Here is a good time to refer to the ArchR website.
ArchR accepts fragments files from cellranger-atac's output to create ArrowFiles and an ArchRProject.
a. Construct the input file path of fragments samples. For the >createArrowFiles() function in the next step, we will provide the sample fragment files, saved in the format of 'fragments.tsv.gz', for the required arguments, inputFiles and sampleNames. In our tutorial, sequencing outputs have a pattern as ''Sample-'' followed by a number, such as Sample-100, Sample-101, and Sample-10.
b. Create ArrowFiles. ArchR by default generates a 'TileMatrix' of genome-wide 501-bp bins that makes quantifying accessible peaks easier as their length is fixed. We use the default 501-bp size, since ArchR suggests that the majority of peaks in snATAC-seq are less than 500-bp (Granja et al., 2021). Based on our previous testing, we have found it provides enough resolution for cell-type identification. However, the size of the tile matrix can be adjusted in one of the two options: 1) Reconstruct the tile matrix by adjusting the tile size when creating ArrowFiles. For example, if we want to create a TileMatrix with only 200-bp, we can adjust >creatArrowFiles(TileMatParams = list(tileSize = 200)). 2) Add the new TileMatrix to ArchRProject by computing counts for new fixed-width tile using >addTileMatrix(input = ArchRProject, tileSize = 200). ArchR also provides a 'GeneScoreMatrix' to store predicted gene expression based on weighting insertion counts in tiles near a gene promoter.
c. Per-cell quality control. A stringent quality control (QC) is vital to minimize the contribution of low-quality data. In the previous step, we set lenient cutoffs for the minimum number of mapped ATAC-seq fragments required per cell (minFrags) and the minimum numeric transcription start site enrichment score (minTSS) so that we can initially assess the quality of the data. demonstrate how to determine the filtering cutoff in step 7h after creating the ArchRProject using > ArchRProject(). d. Inferring snATAC-seq doublets with ArchR. During the cell barcoding process, the nanodroplet can encapsulate more than one cell, and due to this technical mistake, they are considered as a single nucleus.
e. Create an ArchRProject for your study. ArchRProject is the data structure to hold the data and to perform all the downstream analysis.
Pause point: Saving and loading ArchRProject.
Note: If you want to pause at any time, please save your R session in '.rda' or save your ArchRProject using the following command.
Note: ArchRProject can be reloaded by using the following command: f. Add metadata to ArchRProject. We can supply sample information (metadata), such as batch number, gender, etc, to annotate the cells. Sample IDs in the metadata file should match those in the ArchRProject. h. Additional quality control removal of cell outliers. TSS enrichment score is calculated based on fragment ratios centered at TSS to fragments in TSS flanking regions. In Figure 4, many cells in Sample 1 (panel A) have a relatively lower TSS enrichment score and a lower number of fragments compared to Samples 2 and 3 (panels B-C). If a significant number of nuclei from a sample have a low TSS enrichment score, it indicates poor quality of the snATAC-seq experiment. Additionally, the total number of peak fragments (nFrag) reflects the sequencing depth and cellular complexity. Nuclei with very few reads can be excluded due to low sequencing depth. Meanwhile, nuclei with an extremely high number of fragments may represent doublets, if not filtered by >filterDoublet(), or other artifacts.
8. snATAC-seq data preparation and object conversion. Due to the sparsity of the snATAC-seq data, standard dimensionality reduction methods cannot be used. Instead, a layered dimensionality reduction approach called latent semantic indexing (LSI) is used with snATAC-seq data. LSI uses combined steps of term frequency-inverse document frequency (TF-IDF), a normalization method that can be used to process sparse data, and singular value decomposition (SVD), a dimensionality reduction method, to assess how important a peak is to a sample. a. Dimensionality reduction with snATAC-seq.

OPEN ACCESS
Optional: If strong batch effect differences are observed when plotting with >plotEmbedding() from ArchR, batch effect correction with Harmony can be implemented by using >addHarmony() in ArchR. The name of the column in cellColData, a matrix containing the data associated with each nucleus, can be used to group cells together for Harmony batch correction, such as 'Batch' or 'Sample'. b. Single nucleus clustering. Clustering is standard practice for both snRNA-seq and snATACseq for cell-type grouping and to illustrate transcriptomic and epigenomic signatures of each cell type. Most single-nucleus clustering methods compute the nearest neighbor to identify communities of cell clusters. We can cluster cells in ArchR using >addClusters().
Note: Please be aware that a change is needed for the argument, reducedDims, in both functions (addHarmony and addClusters) if a different dimensionality reduction was applied, e.g., 'IterativeLSI' or 'Harmony'.
c. Single-nucleus cluster embedding. Embeddings such as Uniform Manifold Approximation and Projection (UMAP) or t-distributed stochastic neighbor embedding (t-SNE) are used to visualize nuclei in a low-dimensional space. The primary difference between UMAP and t-SNE is the interpretation of distance between the cluster communities. Both UMAP and t-SNE maintain the local cluster community structure. However, UMAP also preserves most of the global structure in data, which means clusters near each other in UMAP are informative, compared to their location in t-SNE. >addUMAP()computes the UMAP embedding and adds it to ArchRProject. Code for visualization is shown later in step 9, where we present two options for cell-type identification. d. Pseudo-bulk replicates and calling peaks. ArchR combines a group of similar nuclei to form a pseudo-bulk sample. The pseudo-bulk replicates overcome the sparsity problem of the snA-TAC-seq data for peak calling, which identifies the area in a genome that has been enriched.
With the pseudo-bulk replicates generated, we can call peaks to identify accessible regions in chromatin. A peak set containing annotation for each peak is created and saved to Arch-RProject. i. Create pseudo-bulk replicates by grouping nuclei by cluster to overcome the binary (open or close) phenomenon of chromatin accessibility.
ii. Call peaks with MACS version 2. Model-based Analysis of ChIP-Seq (MACS) version 2 currently is the default peak caller of the ENCODE ATAC-seq pipeline. ArchR pipeline uses MACS2 with the function >addReproduciblePeakSet() to produce peaks with a 501-bp fixed-width.
Optional: In addition to assigning cell identity through gene scores, ArchR allows us to align cells from a snATAC-seq dataset with cells from a snRNA-seq dataset by using >FindTrans-ferAnchors() function from the Seurat package.
e. Object conversion from ArchR to Signac. Both ArchR and Signac are popular snATAC-seq analysis packages with a comparable set of features. Some software functions are constantly under development and likely to change over time. Users can use only the Signac or ArchR pipeline during the whole analysis process. However, in this protocol, we provide an option to help with data formatting from ArchRProject to Signac SeuratObject, retaining the fixedwidth peak matrix for its advantage in computation, as peak length does not need to be normalized in the downstream analysis. After data conversion, researchers can access additional analysis pipelines and other customized functions for the analysis, such as label transfer, co-accessibility analysis, and trajectory analysis. i. Load our object conversion package ArchRtoSignac. ii. Set cellranger-atac directory (with fragments files).
iii. Get the gene annotation, which includes information related to genomic locations and their associated annotations, from Ensembl database in GRanges Object for the SeuratObject, using ArchRtoSignac GitHub wrapper function >getAnnotation().
Since the function utilizes an in-house function >GetGRangesFromEnsDb() from the Signac package to get gene annotation, the reference provided must be an Ensemble reference genome. >getAnnotation()also changes the annotation to UCSC genome style, which is used by both ArchR and Signac.
Note: These arguments will be used when running this function: Reference: An Ensembl genome reference used for the function GetGRangesFromEnsDb to extract gene annotations from EnsDb (for example, EnsDb.Hsapiens.v86).
seqStyle: Changes the sequence style of the annotation extracted from EnsDb to 'UCSC' since Signac maps to hg38.
refversion: The assembly release and versions of the UCSC genome reference (for example, 'hg38').
iv. Obtain the fixed-width peak matrix from ArchRProject and changing the row names of the peak matrix to their matched chromosome range using ArchRtoSignac GitHub wrapper function >getPeakMatrix().
v. Make a list of the samples.
vi. Convert the ArchRProject to SeuratObject by creating a list of Seurat objects for each sample with their corresponding peak matrix and then merging objects from each sample with the ArchRtoSignac wrapper function >ArchR2Signac() Note: These arguments will be used when running this function: ArchRProject: Input ArchRProject. samples: A unique sample list for all processed samples from ArchRProject. fragments_dir: A path to the cellranger-ATAC output. The directory contains all samples' folders before '/outs/fragments.tsv.gz'.
pm: peak matrix recently acquired from ArchRProject by using getPeakMatrix() function in the previous step.
reference: The same Ensembl genome reference used previously for getting the annotation.
seqStyle: The same sequence style used for getting the annotation. refversion: The same assembly release and versions of UCSC genome reference used for getting the annotation.
vii. Check if the cells are in the right order before adding other information to SeuratObject.
Note: This is extremely important for downstream analysis.
Note: reducedDims: The argument for the reduction dimension can be transferred from ArchRProject to Signac SeuratObject ix. Add gene score or gene activity matrix to SeuratObject. Gene activity matrix from Signac and gene score matrix from ArchR similarly construct expression predictions for genes by assessing their chromatin accessibility. Under the default setting, Signac calculates gene activity by summing the fragments intersecting with gene body and promoter regions. The gene score matrix in ArchR implements a comparable approach as in Signac. However, it also considers the activity of possible distal regulatory regions using an exponential weighting model, where peaks further away from the gene have lower priority. Different models' performances will vary differently for datasets, so there is not a universal best fit approach to predict gene activity (Granja et al., 2021). There are two different options to add a gene score/activity matrix: 1) Transfer the ArchR gene score matrix to the SeuratObject. 2) Add a new gene activity matrix using Signac functions to the SeuratObject.
Pause point: Saving and loading SeuratProject.
Note: Here is a good time to pause and save your SeuratObject before moving forward. The output directory can be set and created as follows.
Note: >saveRDS() saves your SeuratObject in .rds format with Sys.Date() in the file name, and the SeuratObject can be loaded through >readRDS(). 9. Signac cell-type identification. Option 1: Signac manual cell-type identification with chromatin accessibility.
Using the chromatin accessibility of each cell-type at canonical cell-type marker genes is a standard practice to decipher the cell clusters' identities. This involves manual inspection of coverage plots showing the accessibility of given genomic regions (e.g., cell-type marker genes).
Option 2: Signac cell-type identification with reference mapping.
Note: The second profiling option utilizes other datasets as a reference and maps snATACseq data onto it. We believe this method provides some advantages. Firstly, due to the sparsity of the snATAC-seq data, using a reference from a well-annotated high-quality dataset, a published resource, or an atlas would help us better interpret our data. Ideally, the reference dataset should be sequenced from the same or similar biological sample for accurate prediction. Additionally, we can integrate snATAC-seq with snRNA-seq and analyze them together, especially if unique information related subclusters are present.
>MapQuery() transfers anchor information from a reference dataset, integrates the two datasets, and projects the query data into the coordinates of a provided reference UMAP. a. Find a set of anchors between the reference and query object by using >FindTransferAnchors(). Please refer to the function source page regarding the suggested selection for dimensional reduction arguments, reduction and reference.reduction.
Note: Switching the default assay of the query dataset is needed when the query and reference datasets' default assays do not share features. We use a snRNA-seq reference dataset in our example, so we need to use gene activity score (RNA assay) in our query snATACseq dataset for reference mapping. b. Rerun the mapping process by >RunUMAP() on the selected reference dataset with the argument return.model = TRUE to compute UMAP and store the UMAP model to run >Map-Query() later. Our reference, a snRNA-seq dataset, is integrated and analyzed using Liger, which relies on integrative non-negative matrix factorization (inmf). We specified the reference reduction based on this previous integration process, so please adjust the reduction argument accordingly.
c. For the transferring reduction, please refer to >FindTransferAnchors() function source page. We chose 'cca', a suggested reduction for the projection from snRNA-seq to snA-TAC-seq. Additionally, 'lsiproject' is advised for the projection from snATAC-seq to snA-TAC-seq, and 'pcaproject' is for the projection from snRNA-seq to snRNA-seq. d. Map the snATAC-seq dataset onto the provided reference using >MapQuery(), enabling cell-type label transferring from reference dataset to the query dataset. Figure 5 shows UMAPs of mapped query nuclei from snATAC-seq to a reference dataset as an example. All extracted nuclei that passed QC are shown in Figure 5A. However, due to the sparsity of the snATAC-seq data, you should choose to filter predicted cells with a desired cutoff to limit the number of false positives after using >MapQuery() for reference mapping. After adjusting the predicted.cluster.score parameter as shown in Figures 5B-5D, nuclei in ATAC-seq with high predicted cluster scores can be preserved for future analysis. Additionally, please be careful to not remove too many nuclei. e. Remove nuclei with low predicted cluster scores.
Note: Results (Figures 6 and 7) of step 9 Signac cell-type identification are in the expected outcomes section.
Note: One of the standard practices in a snATAC-seq study is finding differentially accessible regions (DAR) between comparison groups by cell type. We can use the same function as in Seurat >FindMarkers() to perform a differential accessibility test. Due to the sparsity of snATAC-seq data, we would suggest lowering the minimum fraction of cells, min.pct. Furthermore, test.use can be changed as desired; however, Signac suggests using a logistic regression framework (Stuart et al., 2021), which can account for latent variables, such as technical and/or biological variables.
a. Find the differentially accessible peaks.
11. Advance secondary analysis. a. Pseudotime trajectory analysis using Monocle3. During development, disease progression, or in response to environmental or drug stimuli, cells may transition between different states, which likely have different genetic or epigenetic # For example latent.vars = c("Batch","Age","Sex","PMI"), used only when test.use is one of 'LR', 'negbinom', 'poisson', or 'MAST' ) signatures, due to genes being silenced or newly activated. Monocle3, originally created for snRNA-seq data, can detect gene expression dynamics and trajectory over time within cell types. We can, however, utilize Monocle3 to perform trajectory analysis for snATAC-seq trajectory with the help of its extension Cicero, where single-nucleus chromatin accessibility changes place each cell in a predicted position in trajectory (Pliner et al., 2018;Cao et al., 2019). Afterward, differential accessibility analysis can be performed to determine changes in chromatin accessibility across the cell states. Depending on the goals of your studies, we can perform trajectory analysis on cell lineages or a cell-type (for example, microglia), and we can do this by subsetting the cell clusters. In order to use Monocle3 for trajectory analysis, the SeuratObject needs to be converted to a cellDataset object using >as.cell_data_set().
i. >as.cell_data_set() is a SeuratWrapper function. Install SeuratWrappers and its dependencies first before using the conversion function.
ii. Convert the group of nuclei you wish to study from snATAC-seq SeuratObject to cellDataset object.
iii. In order to compute pseudotime estimates for a trajectory, a start node or root needs to be identified, which we can do with >get_earliest_principal_node(). We can also select cells through the interactive function in Monocle3 using >order_cells() without specifying the root nodes or root cells. Additionally, we can use pre-selected cells if the start is known by supplying the parameter, >root_cells = "a vector of preselected cells", in >order_cells().
Note: Please refer to Monocle3's documentation for the helper function used for identification of root principal points.
iv. Calculate the measurement of cell progression in pseudotime.
b. Cis-regulatory network analysis using Cicero. One important piece of information from snATAC-seq data is the linkage of candidate cis-regulatory elements to their potential target genes, and this provides an opportunity to investigate common and specific regulatory mechanisms across different cell types. This linkage is known as co-accessibility, the correlation of accessibility between two peaks. Cicero computes the correlation between all pairs of peak sites within 500 kb with a distant-dependent penalty and therefore provides co-accessibility scores for pairs of peaks in snATAC-seq data (Pliner et al., 2018). Additionally, Cicero groups highly co-accessible groups of peaks into ''cis co-accessibility networks'' (CCANs), which are modules of highly co-accessible sites. Co-accessible peaks in CCANs may be physically close to form a chromatin hub, where they interact with a common set of transcriptional factors in a loop (Pliner et al., 2018). i. We use Cicero, which is an extension of Monocle3, previously used for trajectory analysis, to perform cis-regulatory network analysis, and therefore, the SeuratObject needs to be converted to a cellDataset object using >as.cell_data_set() as in trajectory analysis. However, it then needs to be further made into a Cicero object by using >make_cicero_cds().
ii. Select the genomic region of interest from the SeuratObject, and convert the chromosome sizes to a data frame before running Cicero to find co-accessibility.
Note: Please refer to the Cicero function in Signac and Cicero for Monocle3 for more detailed information and visualization.
c. DNA sequence motif and transcription factor footprinting analysis. After identification of differentially accessible peaks, we may also want to determine the binding activities of transcription factors, especially to those at accessible chromatin sites. Signac provides two separate and complementary approaches for performing motif analysis: one by finding the overrepresented motifs in the set of differentially accessible peaks we found in Part Two step 10, and another one by computing per-cell motif activity scores using chromVAR (Schep et al., 2017) and then identify differentially active motifs using >FindMarkers(). i. Identify overrepresented motifs.
ii. Compute and identify differential motif activities.

OPEN ACCESS
We can then examine the footprints of the overrepresented motifs or motifs with differential activity by using >Footprint(). Transcription factor footprinting allows us to predict the precise binding location of a TF at a particular locus. d. Genomic regions enrichment of annotations analysis. Coding regions with corresponding gene names are usually well documented with their biological functions; however, non-coding regions comparatively lack this information, especially at distal binding sites. GREAT maps cis-regulatory elements to neighboring genes within 5 kb upstream and 1 kb downstream of the TSS to predict their functions based on functional annotation databases and therefore associates cis-regulatory elements to not only their proximal binding events. GREAT (McLean et al., 2010) is available online and can be used in R with >library(rGREAT). Please be aware that the default reference is hg19 in rGREAT version 4, and the reference genomes can be changed through species = "hg38" or "mm10" in >submitGreatJob().
Note: Please refer to the rGREAT tutorial for more details.

EXPECTED OUTCOMES
The tissue dissection protocol and modified single nucleus isolation protocol should give rise to enough intact nuclei from the frozen brain to achieve targeted nuclei recovery for further processing including library generation (Please refer to Figure 2B). The kit-based library generation process is highly reproducible and provides good quality libraries (Please refer to Figure 3) for sequencing. Figure 6Ashows identified snATAC-seq clusters from the ArchR pipeline after running step 8.b that are then embedded into a UMAP by running step 8.c and plotted as demonstrated in step 9 Option 1. Clusters C1, C2, C3, and C4 have significant chromatin accessibility at the promoter/TSS and gene >job = submitGreatJob(bed, species = "hg38") Figure 3. Representative trace A representative TapeStation profile from 50 mg of frozen human brain tissue (PFC). 2 mL of DNA from each sample (from step 4.d.ix) was mixed with 2 mL of High Sensitivity D1000 Sample Buffer followed by vortex and brief centrifugation. The ladder and the samples were then loaded into the TapeStation instrument to run. body of CSF1R, a canonical cell-type marker for microglia, as shown in Figure 6B. This result indicates that this group of cell clusters is microglial. Researchers should be able to clearly distinguish cell-type clusters after using known cell-type markers for identification. Figure 7Ashows a UMAP of the reference snRNA-seq dataset and projected cell-types in snATACseq query dataset with a prediction score larger than 0.90. Researchers should be able to clearly distinguish cell-type clusters after reference mapping and filtering outlined in Part Two step 9 Option 2 a-e. Figure 7B shows snATAC-seq pseudo-bulk chromatin accessibility profiles of canonical cell-type marker genes (SYNPR for neurons, MOBP for oligodendrocytes, GFAP for astrocytes and CSF1R for microglia) as validation of the accuracy of MapQuery for label transfer and cell-type prediction. (B and C) In Samples 2 and 3, most of the nuclei have good TSS enrichment scores and fragments numbers. However, further filtering is still needed to remove outliers that are low quality (potential doublets or artifacts).

LIMITATIONS
Part one: Wet lab The authors do not recommend substituting buffers or buffer components/reagents by other providers. Additionally, the protocol is not optimized for low (<45 mg) amount of human brain tissue. Please start with 45-55 mg of tissue.
Part two: Dry lab Due to the sparsity of snATAC-seq data, calculations constructing the gene activity or gene score matrix cannot accurately predict gene expression level by only assessing chromatin accessibility. Improvements in expression prediction algorithms for snATAC-seq data will help predict gene expression levels, which will, in turn, help in cell-type identification.
The performance and accuracy of reference mapping for the cell-type identifications are dependent on the selection of the reference dataset (listed in step 9 Option 2 a-e). Having a well-annotated dataset collected from the same or similar brain regions for the reference dataset will significantly improve the prediction of cell-type for the clusters. TROUBLESHOOTING Problem 1 Abnormal low No. of nuclei as shown in Figure 8A.

Potential solution
Start with R50 mg of tissue. Grind the sample to a fine powder (flour-like consistency) with a mortar and pestle in liquid nitrogen before adding the Lysis Buffer. Ensure the sample is kept frozen while grinding.  Problem 2 Excessive debris in isolated nuclei as shown in Figure 8B.

Potential solution
If the resulting nuclei suspension shows debris, the filtration (through a 40 mm filter) of the nuclei suspension should be repeated.

Problem 3
Low DNA concentration measured by Qubit or unusual profile in TapeStation High Sensitivity D1000 ScreenTape or D5000 ScreenTape (as shown in Figure 8C).

Potential solution
Try to start with a good-quality sample. Measure RIN or perform DNA genotyping to make sure the samples are good in quality. However, avoid pooling human samples.
Make sure all the steps were performed as per the protocol. Read the whole protocol, including the kit protocols, (with special attention to buffer and Master Mix making) before starting the experiments.
Please double check all the incubation temperatures and cycle numbers before incubation.
In addition, carefully read the troubleshooting part from the specific kit user manual.

Problem 4
Failed QC or not enough nuclei for downstream analysis as shown in Figure 9.

Potential solution
Try to start with an experimental design including multiple technical and biological replicates.
Repeat the web lab procedure on additional samples if possible.

Problem 5
Parallel processing or multithreading error in ArchR.
Errors related to parallel processing can happen when running ArchR. However, this problem is much more related to the computer environment than ArchR. Please refer to the ArchR GitHub issue page if similar problems occur.

Potential solution
Set the thread as 1. Using a single thread could solve most of the problems related to parallel processing. However, the disadvantage is that it may require much more time to process the same function compared to using parallel processing.
Change the thread usage of the whole session to 1. Figure 9. Examples of per-cell quality control from step7.c Each dot is a representation of a single nucleus. (A) An example of poor quality snATAC-seq data that has not enough nuclei to pass the per-cell quality control for statistically meaningful analysis. Extremely high median fragments count in nuclei and relatively low median TSS enrichment further confirm the poor quality of the data. (B) An example of good quality snATAC-seq data with enough nuclei and reasonable median fragments count and good median TSS enrichment for downstream analysis.