In vivo, genome-wide profiling of endogenously tagged chromatin-binding proteins with spatial and temporal resolution using NanoDam in Drosophila


SUMMARY
NanoDam is a technique for genome-wide profiling of the binding targets of any endogenously tagged chromatin-binding protein in vivo, without the need for antibodies, crosslinking, or immunoprecipitation. Here, we explain the procedure for NanoDam experiments in Drosophila, starting from a genetic cross, to the generation of sequencing libraries and, finally, bioinformatic analysis. This protocol can be readily adapted for use in other model systems after simple modifications. For complete details on the use and execution of this protocol, please refer to Tang et al. (2022).

BEFORE YOU BEGIN
Profiling the interaction of proteins with chromatin in vivo is an essential step in understanding how gene regulation affects biological functions in different cell types. Although chromatin immunoprecipitation (ChIP) is commonly used, it depends upon the availability of specific antibodies. Furthermore, cell-type specificity has to be achieved through careful cell isolation (via fluorescence-activated cell sorting (FACS)). As an alternative approach, van Steensel and Henikoff (2000) developed DNA adenine methyltransferase identification (DamID). DamID uses a Dam methylase (from E. coli) fused to a protein of interest to methylate GATC sites neighboring the protein of interest's binding sites. Building upon this, targeted DamID (TaDa) (Southall et al., 2013) was developed to place DamID under the control of the GAL4 system (Brand and Perrimon, 1993), which enables in vivo profiling and cell-type specificity without cell isolation. With TaDa, transgenes are first generated (i.e., the protein of interest fused to Dam methylase under UAS control) and the Dam fusion protein is expressed from a bicistronic mRNA, enabling low-level expression and circumventing Dam-associated toxicity. To simplify TaDa and avoid the need for transgenic constructs, we developed NanoDam (Tang et al., 2022). NanoDam profiles the binding targets of endogenous proteins in vivo, with both spatial and temporal specificity. NanoDam makes use of a nanobody, a recombinant single-domain antibody originally found in camelid species. Nanobodies are distinguished from conventional antibodies by their small size and their higher stability and solubility (Muyldermans, 2001). The nanobody is used to target Dam methylase to a chromatin-binding protein tagged with GFP, or any other tag recognized by a specific nanobody.
Restricting the expression of NanoDam with the GAL4 system enables the genome-wide binding profile of any tagged factor to be investigated in a defined subset of its endogenous expression pattern after a single genetic cross.
The steps below describe how to perform a NanoDam experiment in Drosophila, with the use of the GAL4 system for cell-type-specific expression and a temperature-sensitive GAL80 (tubGAL80ts) to control the timing of GAL4 expression. However, steps from genomic DNA extraction onwards can be applied to other organisms, allowing NanoDam experiments to be applied to other model systems (e.g., organoids, cell cultures). The transgenic tools for spatially and temporally specific induction of NanoDam will vary depending on the organism and technology available. Endogenously tagged proteins can be sourced either from stock centers (in the case of Drosophila, for example) or generated via CRISPR/Cas9 genome editing.

Institutional permissions
Ensure that all experiments performed adhere to the relevant regulatory standards or national guidelines and permission has been acquired from the relevant institutions.

NanoDam induction in vivo and tissue isolation
Timing: variable; dependent on tissue of interest and developmental stage
This is the profiling step of the experiment. NanoDam is expressed in the cell types of interest by a GAL4 driver. If the endogenously tagged protein of interest is also present in these cell types, NanoDam will methylate the neighboring GATC sequences depending on where the protein binds in the genome.

OPEN ACCESS
1. Set up a genetic cross of GAL4>UAS-NanoDam with the endogenously tagged protein of interest (see Figure 1 for an example).
a. For the control, set up a cross with GAL4>UAS-NanoDam in the absence of the tagged protein (ideally using a fly line with the same genetic background as the experimental condition).
b. Use the temperature-sensitive tubGAL80ts to restrict the expression of the GAL4 to the time frame needed.
2. Induce NanoDam for a minimum of 10 h prior to the timepoint of interest by shifting from 18°C (GAL4 expression inhibited) to 29°C.
Note: Typical induction times for NanoDam range from 10 h to 14 h. These times depend on the expression of the protein and cell types of interest.
3. Isolate the tissue of interest by dissection or other appropriate methods. If dissection is not required, collect whole animals (e.g., larvae) and place them into a 1.5 mL Eppendorf tube.
a. Dissect tissues in PBS.
b. Remove excess PBS with a pipette and store tissue until required.
Pause point: Tissue can be stored at -20°C or at -80°C.
Note: First-instar and third-instar whole larvae have also been tried with successful results (in the context of profiling binding in neural stem cells and avoiding dissection). However, this depends on specific experimental setup, such as the specificity of the GAL4 driver (for the cell types of interest), the protein of interest being profiled and the numbers of cells per organism. If cells of interest are rare, tissue can be stored long-term and processed once sufficient amounts have been collected. Replicates can also be stored and processed simultaneously. Aim for 3 biological replicates per condition (including NanoDam alone).

Sera-mag bead preparation
Timing: 30 min
This preparation step for Sera-mag beads was modified from Rohland and Reich (2012). These beads will later be used for multiple rounds of DNA purification.
Optional: Using homebrew Sera-mag beads is a cost-effective way to perform this protocol. However, this step can be skipped if commercial alternatives are used (see materials and equipment below).

In a 15 mL Falcon tube add:
CRITICAL: Ensure the 3 g of PEG-8000 is weighed accurately as this will impact the size of DNA fragments that will be isolated.
Note: Rohland and Reich (2012) diluted Sera-mag beads in 18% PEG-8000, but for the purposes of NanoDam, dilution in 20% PEG-8000 is better and gives comparable results to Agencourt Ampure XP beads (commercial alternative).

dsAdR stock for adaptation ligation buffer
Timing: 1-2 h
(For step 16) Make a 50 μM AdR stock by annealing DamID adaptors (Figure 2).
10. Take 50 μL AdRt (100 μM in dH2O) and 50 μL AdRb (100 μM in dH2O).
11. Incubate in a removable metal heating block at 95°C for 2 min.
12. Remove the heating block and allow it to cool to 21°C-24°C.
13. Store the dsAdR stock at -20°C for up to 6 months.
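The stock concentration follows from simple dilution arithmetic: mixing equal volumes of two complementary oligos at the same concentration halves the concentration of each strand. A minimal sketch of that calculation is given below; the function name is illustrative, and the code assumes complete annealing, which the protocol does not quantify.

```python
def annealed_duplex_conc(top_conc, top_vol, bottom_conc, bottom_vol):
    """Estimate the duplex concentration after mixing two complementary oligos.

    Concentrations and volumes may be in any units, as long as both strands
    use the same ones. Assumption: annealing is complete, so the duplex
    concentration is limited by the less abundant strand.
    """
    total_vol = top_vol + bottom_vol
    top_final = top_conc * top_vol / total_vol
    bottom_final = bottom_conc * bottom_vol / total_vol
    return min(top_final, bottom_final)


# Equal volumes of equally concentrated strands give half the input concentration.
print(annealed_duplex_conc(100, 50, 100, 50))
```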

Buffer and reagent preparation
There are several homebrew buffers in this protocol that can be made in advance and stored at -20°C before use. Making the buffers before beginning greatly increases the efficiency and ease of following this protocol. (See the materials and equipment section below for details on the reagents needed for each buffer.)

Alternatives: Agencourt Ampure XP beads (Beckman Coulter, Cat#A63880) are an alternative to the Sera-mag beads for the DNA clean-up steps. Although more expensive, Agencourt Ampure XP beads are ready to use, while Sera-mag beads need to be diluted in PEG buffer (see above) prior to use.
Other types of equipment that are required but do not have to be from a specific manufacturer (equipment used is listed in the key resources table (KRT)):
Benchtop centrifuge with spinning capability at 20,000 g.
DNA analyzer for size and quality control of samples.
DNA fluorometer.
Magnetic rack (96-well or one that can fit 0.2 mL PCR tubes).
PCR machine.
Pestle and electric drill (depending on tissue type).
Sonicator.
Temperature-controlled metal heat block (up to 95°C).

STEP-BY-STEP METHOD DETAILS
Extraction of genomic DNA

Timing: 1 h; enzyme incubation times variable (1 h-16 h)
The aim of this step is to extract and isolate genomic DNA from the samples, using the QIAamp DNA Micro Kit. Depending on the type of tissue or amount of material being processed, there are two methods for initial tissue homogenization.
1. Pre-heat a heat block to 56°C.
2. Perform initial tissue homogenization via one of two options:
a. AL Buffer protocol: recommended for whole Drosophila embryos, whole larvae, whole adult Drosophila heads (with additional mechanical homogenization) and tissue culture cells (no additional mechanical homogenization).
i. Take the samples (stored without buffer) from -80°C and add 180 μL 1× PBS to the Eppendorf tubes.
Optional: For tissue containing gut or any tissue with high concentrations of nucleases and/or proteases, add 145 μL of 1× PBS and 40 μL 500 mM EDTA (50 mM final concentration) to the Eppendorf tube instead.
ii. Add 20 μL RNase (12.5 mg/mL stock solution) and gently mix.
Optional: If samples (whole embryos, whole larvae, adult heads) require mechanical homogenization, use a sterilized pestle (washed in 100% ethanol) attached to an electric drill.
iii. Add 20 μL Proteinase K (from QIAamp DNA Micro Kit), mix gently (by pipetting up and down or flicking the tube gently), then leave for 1 min at 21°C-24°C.
iv. Add 200 μL Buffer AL, gently invert-mix roughly 50 times and incubate at 56°C for 10 min or 16 h, until the sample is completely lysed and digested.
CRITICAL: Make sure the sample is completely digested.
v. Cool to 21°C-24°C, add 200 μL 100% ethanol and mix by gently inverting.
b. ATL Buffer protocol: recommended for small volumes (<10 μL) of dissected tissue or cut larvae.
i. Add 20 μL of 500 mM EDTA (50 mM final concentration) and 20 μL of Proteinase K to 180 μL of ATL Buffer and mix by vortexing.
ii. Take the samples (stored without buffer) from -80°C and add the mixture from the previous step. Mix gently by inverting the tube.
iii. Incubate at 56°C until completely digested, gently inverting the tube occasionally to mix.
Note: Depending on the type of tissue, digestion can take from 1 h to 16 h.
Optional: If the sample is not properly digested after a 16 h incubation, add another 180 μL Buffer ATL + 20 μL Proteinase K to the sample and incubate for several more hours. Note that the volumes of RNase, Buffer AL and ethanol added in the subsequent steps will be doubled.
iv. Add 20 μL RNase (12.5 mg/mL stock solution), mix by inverting the tube and incubate at 21°C-24°C for 2 min.
v. In a separate tube, mix 200 μL Buffer AL and 200 μL 100% ethanol by vortexing (total volume of 400 μL needed per sample).
vi. Add 400 μL of the Buffer AL/100% ethanol mix to each sample and mix well by gently inverting and flicking the tubes.
Note: If a precipitate develops during this step, reheat the sample to 50°C-56°C for 1 min before mixing.
3. Add all of the solution from either step 2a.(v.) or step 2b.(vi.) to a spin column (QIAamp DNA Micro Kit).
4. Spin (>6,000 g) at 21°C-24°C for 1 min; discard the flow-through and collecting tube.
5. Add 500 μL AW1 solution and spin (>6,000 g) for 1 min; discard the flow-through and collecting tube.
6. Add 500 μL AW2 solution and spin (>6,000 g) for 1 min; discard the flow-through and collecting tube.
7. Transfer the column to a new tube and spin at 20,000 g for 3 min to dry the column.
Note: Spinning at maximum speed of a standard benchtop centrifuge is recommended. A Qiavac vacuum can be used for steps 3-6 if there are many samples to be processed, but the drying step must be done using a centrifuge.
8. Transfer the column to a new 1.5 mL Eppendorf tube, add 50 μL of Buffer AE and leave at 21°C-24°C for a minimum of 10 min. Spin (>6,000 g) for 1 min and keep the flow-through (elution).
9. Run 1 μL of the elution on a 0.8% agarose gel to check sample quality. The genomic DNA should be a single band at the top of the gel and not a smear, which could indicate DNA shearing. Troubleshooting 1 and Troubleshooting 2.
Optional: Step 9 can be done while proceeding with the next major step. Set aside 1 μL of elution for quality checking and use the remainder for the next step.

Isolation of methylated DNA
Timing: 1-2 days
At this stage, the sample is digested with DpnI, which cuts only at adenine-methylated GATC sites. After enzyme digestion, the genomic DNA is cleaned up using the QIAGEN PCR purification kit. Adaptors are then ligated to the DNA so that the methylated fragments can serve as templates for PCR amplification. To ensure that only methylated regions of the DNA are amplified, the sample is then digested with DpnII, which cuts at unmethylated GATC sites.
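Because both enzymes act on GATC motifs, the positions of these motifs determine which fragments can be recovered. The sketch below is a simplified illustration (not the gatc.track.maker.pl script used later in the analysis) of how GATC sites could be located in a sequence and the spacing between them summarized; the function names and the toy sequence are hypothetical.

```python
import statistics


def gatc_positions(sequence, motif="GATC"):
    """Return the 0-based start positions of every motif occurrence."""
    sequence = sequence.upper()
    positions = []
    index = sequence.find(motif)
    while index != -1:
        positions.append(index)
        index = sequence.find(motif, index + 1)
    return positions


def median_spacing(positions):
    """Median distance between consecutive sites, or None with fewer than two sites."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.median(gaps) if gaps else None


toy_sequence = "AAGATCTTTTGATCGGGATC"  # made-up example sequence
sites = gatc_positions(toy_sequence)
print(sites, median_spacing(sites))
```

GATC is its own reverse complement, so scanning one strand is sufficient to find all sites.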

10. Transfer 43.5 μL of elution to a new 1.5 mL Eppendorf tube.
11. In a separate tube, prepare a master mix with 5 μL of NEB CutSmart buffer and 1.5 μL of DpnI enzyme per sample, flick to mix and spin down.
12. Add 6.5 μL of the mix from above to the elution, mixing by very gently flicking the tube or pipetting up and down with a P1000 pipette. Digest the mixture for 2 h-16 h at 37°C.
CRITICAL: Do not vortex this mixture as this can lead to shearing of genomic DNA.
13. Clean up the digested DNA according to the instructions in the QIAGEN PCR purification kit.
a. Elute in a final volume of 32 μL of dH2O. Pipette the H2O directly onto the filter in the spin column and leave for 5 min at 21°C-24°C before spinning.
Pause point: DNA can be stored for up to 6 months at -20°C.
14. Measure the DNA concentration using a Qubit fluorometer or NanoDrop. Dilute samples to a maximum of 750 ng in 15 μL of dH2O. If the amount of DNA is lower than 750 ng, use 15 μL of the undiluted elution.
Note: It is not unusual to have very low yields of DNA (minimum 5 ng) at this stage, as uncut genomic DNA (which should compose the majority of the sample initially) will be discarded during the purification steps. The yield of DNA is dependent upon the starting material, cell type and number of cells that are profiled by NanoDam.
Optional: The diluted elution or any unused DNA can be stored at -20°C (as spare sample or for troubleshooting if necessary).
15. Transfer 15 μL of sample to 0.2 mL PCR tubes.
16. Add 4 μL of pre-made adaptor ligation buffer and 1 μL of T4 DNA ligase to the sample, mixing gently. Adaptor ligation buffer and T4 DNA ligase can be premixed in a master mix.
17. Using a PCR machine, incubate the ligation reaction for 2 h at 16°C, then 10 min at 65°C to inactivate the T4 DNA ligase.
18. Add 19 μL of pre-made DpnII digestion buffer and 1 μL of DpnII enzyme. These two can be premixed in a master mix before adding to the sample.
19. Digest at 37°C for a minimum of 2 h and a maximum of 16 h (overnight).
20. Heat-inactivate the enzyme by incubating the mixture at 65°C for 20 min.
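The input adjustment in step 14 is simple arithmetic. As a minimal sketch (the function and parameter names are illustrative and not part of the protocol), the sample and water volumes for a capped DNA amount could be computed as follows, with the concentration expressed per unit volume in the same mass and volume units as the step:

```python
def ligation_input(conc_per_vol, max_mass=750.0, final_vol=15.0):
    """Return (sample_vol, water_vol) so that final_vol holds at most max_mass of DNA.

    conc_per_vol is the measured DNA concentration (mass per unit volume),
    in the same mass and volume units as max_mass and final_vol.
    """
    if conc_per_vol <= 0:
        raise ValueError("concentration must be positive")
    if conc_per_vol * final_vol <= max_mass:
        # Too dilute to reach the cap: use the undiluted elution.
        return final_vol, 0.0
    sample_vol = max_mass / conc_per_vol
    return sample_vol, final_vol - sample_vol


print(ligation_input(100.0))  # concentrated sample: part sample, part water
print(ligation_input(10.0))   # dilute sample: undiluted elution only
```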

Timing: 40 min
This cleaning step greatly improves the efficiency of NanoDam-PCR through the removal of the buffer solution from previous steps. This step is recommended but optional if there are time constraints to the experiment. Clean-up is done using Sera-mag beads.
Pause point: PCR-amplified DNA can be stored for 16 h at 4°C or for up to 6 months at -20°C.

Sonication and quality checks
Timing: 16 h
The DNA from the previous step is purified and the adaptors used for PCR amplification are removed. The quality checks are performed prior to the preparation of the libraries for next-generation sequencing.
33. Transfer the 50 μL sample to a 1.5 mL Eppendorf tube and purify the DNA following the instructions of the QIAGEN PCR purification kit. Elute in 32 μL of dH2O, leaving it for 5 min before the final spin.
34. Run 1 μL of the elution on a 0.8% agarose gel for a quality check. A smear between 400 bp and 2 kb is expected.
35. Measure DNA concentration using Quantus/Qubit/NanoDrop. Troubleshooting 3.
36. Dilute samples to 2 μg DNA (or less) in 90 μL of dH2O in a 1.5 mL or 0.2 mL sonication tube.
CRITICAL: Using tubes designed for sonication is important for consistent results.

PCR cycling conditions
Steps Temperature Time Cycles
Note: AlwI digestion (which removes the dsAdR primer) can be carried out either before or after sonication. AlwI cannot be heat-inactivated but will not affect the steps downstream.
40. Transfer 70 μL of each sample to 8-well PCR strips for library preparation.
Pause point: Sonicated DNA can be stored for up to 6 months at -20°C.

Sequencing library preparation
Timing: 3-4 h

Computational analysis and visualization of NanoDam data
The workflow for analyzing NanoDam data builds upon the existing damidseq_pipeline (Marshall and Brand, 2015) and takes into account cross-comparisons of multiple replicates. It is composed of a suite of Python scripts (collectively called damMer) which generates normalized binding tracks (*.bedgraph format) and identifies statistically significant and reproducible peaks (across replicates) (*.bed format). Data analyzed by this pipeline can then be used for other downstream analyses, such as ChIPseeker or principal component analysis, and can be visualized using the Integrative Genomics Viewer (IGV) (Robinson et al., 2011) (Figure 3).
Here we summarize the main steps of using this pipeline for a NanoDam experiment to generate binding tracks and peak sets. Additional steps for assessing the quality of binding data, including comparisons of replicates and the complexity of libraries generated from the experiments, are also discussed. Please note that additional technical notes to supplement the NanoDam data analysis are provided in the corresponding GitHub repository: https://github.com/AHBrand-Lab/NanoDam_analysis. All references to Python scripts (i.e., *.py) and R markdowns (i.e., *.Rmd) refer to code that was deposited in this repository.
damMer applies the statistical framework of the damidseq_pipeline in an automated manner to all possible pairs of the provided NanoDam-tagged protein and NanoDam-alone samples, averaging across all pairs. As these individual tasks benefit greatly from parallelization, the suite utilizes the workload manager slurm (https://slurm.schedmd.com/documentation.html). All tasks (e.g., copying fastq files, running damidseq_pipeline, averaging, quantile normalization) are submitted as individual jobs to slurm, which automatically schedules these jobs.
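The "all possible pairs" design can be illustrated with a short sketch. The code below only enumerates the comparisons and builds names following the *_vs_* pattern mentioned in this protocol; it is not taken from damMer.py, and the sample names and the exact naming scheme are assumptions for illustration.

```python
from itertools import product


def pairwise_comparisons(experimental, controls):
    """Pair every NanoDam-tagged protein sample with every NanoDam-alone sample."""
    return [f"{exp}_vs_{ctrl}" for exp, ctrl in product(experimental, controls)]


# Hypothetical sample names: three replicates per condition.
tagged = ["tagged_1", "tagged_2", "tagged_3"]
dam_alone = ["NanoDam-alone_1", "NanoDam-alone_2", "NanoDam-alone_3"]

comparisons = pairwise_comparisons(tagged, dam_alone)
print(len(comparisons))
for name in comparisons:
    print(name)
```

With three replicates per condition this yields nine comparisons, which is why submitting each one as an independent job parallelizes well.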
55. Install or download the following packages and scripts in preparation for running damMer:
a. To run damMer.py:
i. samtools.  http://bowtie-bio.sourceforge.net/bowtie2/index.shtml or generated with the bowtie2-build command based on the genome sequence in *.fasta format).
iii. Prepare a file specifying all genomic NanoDam-methylation sites (i.e., GATC-motifs), which can be generated by the gatc.track.maker.pl script (*.gff format).
iv. To avoid leaving samples out, provide all NanoDam-tagged protein and NanoDam-alone sequence files as shell arrays.
Note: damMer.py will generate folders for every pairwise comparison (i.e., *_vs_*, Figure 4), copy the required fastq files with trimmed reads into them, validate the file formats, generate shell scripts to run the damidseq_pipeline in all folders and submit them as jobs to slurm while ensuring that all jobs are running. A separate log-file with all information provided to and by damMer.py is also generated, which enables users to retrace all parameters and arguments used to run this script. Similarly, all shell scripts submitted to slurm are kept.
CRITICAL: damMer_tracks.py will adjust names for all files generated by damMer.py, therefore it is not recommended to manually change filenames in the folder created by damMer.py.
b. damMer_tracks.py: generates tracks of averaged, normalized binding intensities for the NanoDam experiment across the entire genome using tracks generated across all pairwise comparisons of NanoDam-tagged protein and NanoDam-alone samples (Figure 4).
i. Include all tracks generated by all pairwise comparisons by specifying the folders (i.e., *_vs_*) in which they are located when running damMer_tracks.py.
ii. Prepare a file with the corresponding genome sizes (*.tsv format) (e.g., dm.chrom.sizes from https://www.encodeproject.org/files/dm6.chrom.sizes/).
Note: damMer_tracks.py creates a folder ending in *_tracks (beginning of folder name defined via '-o' argument) that includes copies of all individual tracks from all folders generated by damMer.py (i.e., *_vs_*), as well as quantile normalized and averaged versions in *.bedgraph and *.bw (bigwig) format.
Optional: Alternatively, all individual tracks can also be read into R via the functions import.bedgraph() or import.bw() into a combined matrix and quantile normalized via preprocessCore::normalize.quantiles() (see genomewide_correlation.Rmd).
Note: damMer_tracks.py also submits jobs for peak calling with MACS2 based on bam-files acquired from damMer.py (i.e., *-ext300.bam). For each pairwise comparison, peaks will be called for the bam-file corresponding to the tagged protein of interest compared to its NanoDam-alone control. As NanoDam-dependent methylation signals are not as locally confined as ChIP-seq signal, we always obtain broad peaks (i.e., MACS2 argument --broad).
To obtain both normalized tracks for the binding intensities of the tagged protein of interest and tracks specifying putative open chromatin sites, damMer.py makes use of damidseq_pipeline_vR.1.pl and damMer_tracks.py creates a second folder, named *_DamOnly_tracks. damidseq_pipeline_vR.1.pl is a modified version of the initial damidseq_pipeline.pl that also generates unnormalized (i.e., not compared to a control) NanoDam-alone tracks in line with the ideas of CATaDa (Aughey et al., 2018). Copies of these NanoDam-only tracks are gathered, quantile normalized to each other and averaged in the *_DamOnly_tracks folder by damMer_tracks.py.
c. damMer_peaks.py: identifies sets of statistically significant peaks and sets of reproducible peaks for a defined list of FDR-thresholds (i.e., reproducible peaks that occur in at least 50% of all pairwise comparisons).
i. This script will use the folder names (and file names) generated by damMer.py (i.e., *_vs_*) and create *_peaks and *_DamOnly_peaks folders to store the final output peak files (code chunk 4).
Note: All *.broadPeak files with peaks identified in individual pairwise comparisons will be gathered in a new *_peaks folder by damMer_peaks.py. The peaks in this set of files will be thresholded multiple times according to a defined list of 41 FDR cut-offs (i.e., -log10(FDR) = 0, 1, 2, ..., 5, 10, ..., 100, 125, ..., 1900, 2000). All peaks left after thresholding with a particular FDR-value (-log10(FDR_peak) ≥ FDR_threshold) will be combined into a single file, sorted and merged to obtain a consensus set of peaks for each FDR, leaving the user with a set of 41 files corresponding to the FDR-values (i.e., *.mergePeak file format). In addition, all peaks are filtered by their appearance throughout the set of *.broadPeak files belonging to the pairwise comparisons, and only peaks occurring in at least 50% of all files will be kept as the reproducible set (i.e., *.reproPeak file format).
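The thresholding, merging and reproducibility filtering described in this note can be sketched in a few lines. The code below is an illustration of the logic only, not the implementation in damMer_peaks.py: the in-memory peak representation, the function names and the exact way overlap is counted are assumptions made for this example.

```python
# A peak is (chromosome, start, end, score), where score stands for -log10(FDR).

def merge_intervals(intervals):
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def consensus_peaks(peak_sets, threshold):
    """Pool peaks passing the threshold from all comparisons, then merge them."""
    kept = [(c, s, e) for peaks in peak_sets
            for c, s, e, score in peaks if score >= threshold]
    return merge_intervals(kept)


def reproducible_peaks(peak_sets, threshold, min_fraction=0.5):
    """Keep consensus peaks supported by at least min_fraction of the comparisons."""
    result = []
    for chrom, start, end in consensus_peaks(peak_sets, threshold):
        support = sum(
            any(c == chrom and s < end and e > start and score >= threshold
                for c, s, e, score in peaks)
            for peaks in peak_sets
        )
        if support / len(peak_sets) >= min_fraction:
            result.append((chrom, start, end))
    return result


# Three hypothetical pairwise comparisons.
comparisons = [
    [("chr2L", 100, 200, 12.0), ("chr2L", 500, 600, 3.0)],
    [("chr2L", 150, 250, 20.0)],
    [("chr2L", 900, 950, 15.0)],
]
print(consensus_peaks(comparisons, threshold=10))
print(reproducible_peaks(comparisons, threshold=10))
```

In this toy example the peak at 900-950 passes the threshold but is found in only one of three comparisons, so it appears in the consensus set but not in the reproducible set.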
58. Download the latest version of the Integrative Genomics Viewer (IGV) and load the desired binding tracks (files in *.bedgraph or *.bw format) and significant peaks (files in *.bed format corresponding to the desired FDR).
Note: Other visualization methods can be used (e.g., UCSC genome browser) though IGV is the standard.

Analysis of NanoDam binding data quality
While the necessary quality check of the sequencing libraries accounts for low quality reads and nucleotides, as well as adapter contamination, the sensitivity of the assay and reproducibility of the experiment have to be determined separately. In order to detect problems occurring during NanoDam induction and methylation, library preparation and sequencing that may impact the entire library, genome-wide correlation analysis, signal enrichment analysis, and an assessment of the complexity of all sequenced libraries are recommended.
59. Perform genome-wide correlation analysis on unnormalized (i.e., without normalization against a NanoDam-alone control) sequencing libraries.
a. Use the *-ext300.bam files (generated by damMer.py) located in its individual output folders (one per pairwise comparison).
b. Map and bin the reads via bamCoverage from the deeptools suite (Ramírez et al., 2016) (Code chunk 5).
c. The expected outputs are *.bedgraph files, which can be read into R; genome-wide correlation analysis can then be executed by following the workflow of genomewide_correlation.Rmd.
60. Perform correlation analysis on the normalized data derived from pairwise comparisons of NanoDam-tagged protein and control NanoDam-alone samples.
a. Use the normalized *.bedgraph files generated from damMer.py and damMer_tracks.py, stored in the *_tracks folder.
b. Process these files following the workflow of genomewide_correlation.Rmd.
(Daley and Smith, 2013) to calculate the alignment complexity of libraries: reads with unique sequencing information are plotted as a function of all reads across a gradually increasing number of reads included in the library (i.e., sequencing depth).
i. Use the c-curve function on aligned reads in *.bam file format (Code chunk 6) or using the *-ext300.bam files in the individual folders for pairwise comparisons.
ii. Plot the results of the output *.txt files in R as outlined in signal_enrichment.Rmd (section [5.0]).
b. Perform cumulative enrichment analysis (Diaz et al., 2012) and generate a fingerprint plot, which determines how well the signal from the NanoDam-tagged protein samples can be differentiated from the background read distribution in the NanoDam-alone control samples.
62. Assess whether signal (i.e., tracks) is enriched on binding sites deemed statistically significant (i.e., peaks) by following the workflow of signal_enrichment.Rmd in R:
a. Use the normalized, averaged tracks from NanoDam-tagged protein experiments (from damMer_tracks.py) and significant, reproducible peaks (*.reproPeak files from damMer_peaks.py).
b. Extract the binding signal over the peaks using the extract_matrix() function and plot the results.
Note: For these analyses it is recommended to use other NanoDam-tagged protein datasets with the same FDR-threshold as a comparison and negative controls derived from comparing NanoDam-alone samples to each other (e.g., NanoDam-alone_2_vs_NanoDam-alone_1, ..., NanoDam-alone_3_vs_NanoDam-alone_4). This requires damMer to be run with NanoDam-alone samples as both experimental (-e argument) and control (-c argument) samples. When starting damMer_tracks.py and damMer_peaks.py, the folders (i.e., *_vs_*) in which the same NanoDam-alone sample is both experimental and control (e.g., NanoDam-alone_1_vs_NanoDam-alone_1) can be left out (Code chunk 7).

EXPECTED OUTCOMES
By the end of the protocol, next-generation sequencing data should be obtained from the NanoDam profiling experiment. The damMer suite of Python scripts should generate genome-wide binding tracks of the NanoDam experiment in *.bedgraph format, normalized and averaged across all replicates and all possible pairwise comparisons. Statistically significant binding can be determined by the identification of peaks, which are stored in *.bed format. Loading these files into IGV enables visualization of tagged protein binding across the whole genome (Figure 5).
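To make "normalized and averaged" concrete, the sketch below shows a generic quantile normalization of several tracks followed by per-bin averaging. It is a simplified stand-in, not the code used by damMer_tracks.py or preprocessCore: tracks are assumed to be equal-length lists of per-bin scores, ties are broken by position rather than averaged, and all names are illustrative.

```python
def quantile_normalize(tracks):
    """Give every track the same value distribution (the mean of the sorted tracks)."""
    n_bins = len(tracks[0])
    if any(len(track) != n_bins for track in tracks):
        raise ValueError("all tracks must cover the same bins")
    sorted_tracks = [sorted(track) for track in tracks]
    # Mean value at each rank across tracks.
    rank_means = [sum(t[i] for t in sorted_tracks) / len(tracks) for i in range(n_bins)]
    normalized = []
    for track in tracks:
        order = sorted(range(n_bins), key=lambda i: track[i])
        out = [0.0] * n_bins
        for rank, bin_index in enumerate(order):
            out[bin_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def average_track(tracks):
    """Average the tracks bin by bin."""
    return [sum(values) / len(values) for values in zip(*tracks)]


# Three hypothetical tracks over four bins.
tracks = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
normalized = quantile_normalize(tracks)
print(normalized)
print(average_track(normalized))
```

After normalization every track contains the same set of values, so differences between tracks reflect where the signal lies rather than differences in overall scale.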
Further downstream analysis can be performed after running damMer. The ChIPseeker package (Yu et al., 2015) can be used to compare potential differential binding under different experimental conditions and to examine the binding distributions across the whole genome (preferential binding on promoters, exons, introns etc.).
Quality check of NanoDam data (1): Genome-wide correlation of replicates
At the end of steps 59 and 60, Pearson correlation coefficients for the correlation of all samples against each other are calculated. It is recommended to keep individual libraries with a Pearson correlation coefficient of ≥0.9 compared to libraries of the same type (i.e., among NanoDam-tagged protein samples or NanoDam-alone samples) and pairwise comparisons with a Pearson correlation coefficient of ≥0.8 among the other comparisons of the same type (e.g., NanoDam-tagged protein_vs_NanoDam-alone comparisons derived from the same experimental setup) (Figure 6).
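The correlation check itself is straightforward once the binned tracks are loaded. The sketch below is a plain-Python illustration, separate from the genomewide_correlation.Rmd workflow: it computes Pearson coefficients for all pairs of libraries and lists the pairs that fall below a chosen cut-off. The library names, toy values and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(tracks):
    """Map each pair of library names to its Pearson coefficient."""
    return {(a, b): pearson(tracks[a], tracks[b]) for a, b in combinations(tracks, 2)}


def pairs_below(correlations, cutoff):
    """Return the library pairs whose correlation is below the cut-off."""
    return [pair for pair, r in correlations.items() if r < cutoff]


# Toy binned signal for three libraries of the same type.
libraries = {
    "rep1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rep2": [1.1, 2.1, 2.9, 4.2, 4.8],
    "rep3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
correlations = pairwise_correlations(libraries)
print(pairs_below(correlations, cutoff=0.9))
```

A library that appears in every low-scoring pair, like rep3 here, is the natural candidate to exclude.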

Quality check of NanoDam data (2): Complexity analysis
Libraries of NanoDam-alone and NanoDam-tagged protein samples should show the highest possible slopes in the complexity curves. At the same time, the curves for the two sample types should be as far apart from each other as possible, with the NanoDam-tagged protein curves showing a lower slope than the NanoDam-alone curves. This indicates an enrichment of reads at binding sites of the tagged protein profiled by NanoDam (Figure 7A).
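The idea behind a complexity curve can be shown with a toy example: count how many distinct reads have been seen as more reads are included. This sketch walks through the reads in the given order and is only meant to convey the concept; dedicated tools subsample the library rather than reading it in order, and the read representation used here (one string per aligned read) is an assumption.

```python
def complexity_curve(reads, step):
    """Return (total_reads, distinct_reads) pairs at regular read-count intervals."""
    if step < 1:
        raise ValueError("step must be at least 1")
    seen = set()
    curve = []
    for count, read in enumerate(reads, start=1):
        seen.add(read)
        if count % step == 0 or count == len(reads):
            curve.append((count, len(seen)))
    return curve


# Hypothetical reads identified by their alignment position.
reads = ["chr2L:100", "chr2L:250", "chr2L:100", "chr3R:40", "chr3R:40", "chr3R:40"]
print(complexity_curve(reads, step=2))
```

A curve that flattens early means that additional sequencing mostly returns reads that were already seen, which corresponds to a lower slope.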
To further validate this enrichment, and thus the protein-binding signal across the genome, the fingerprint plots show the overall distribution of reads in a library across the genome. For this purpose, reads are quantified along the binned genome and the bins are ranked according to the sum of reads falling into them. By plotting the cumulative fraction of reads along an increasing bin rank, the coverage of the genome by the reads of a library can be evaluated in one curve. The closer the elbow of a library's fingerprint curve is to the lower right corner of the plot, the more concentrated the library's reads are in a few bins. Similar to the complexity curves, both sample types, NanoDam-tagged protein and NanoDam-alone, should have the widest possible distance from one another (Figure 7B). This ensures a high signal enrichment (i.e., signal-to-noise ratio). For both complexity curves (step 61a) and fingerprint plots (step 61b), an overlap of the curves for NanoDam-tagged protein and NanoDam-alone samples would mean an absence of specific binding sites.
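The ranking and cumulative sum behind a fingerprint curve can be written out directly. The sketch below assumes per-bin read counts are already available as a list and is meant only to illustrate the calculation described above, not to replace the tools used in the analysis workflow; the function name and toy counts are hypothetical.

```python
def fingerprint_curve(bin_counts):
    """Cumulative fraction of reads along bins ranked from lowest to highest count."""
    ranked = sorted(bin_counts)
    total = sum(ranked)
    if total == 0:
        raise ValueError("library contains no reads")
    curve = []
    running = 0
    for count in ranked:
        running += count
        curve.append(running / total)
    return curve


# Evenly spread reads give a straight diagonal.
print(fingerprint_curve([5, 5, 5, 5]))
# Reads concentrated in one bin keep the curve flat until the last bin.
print(fingerprint_curve([0, 0, 0, 10]))
```

The second curve stays at zero until the final bin, which is the extreme case of an elbow in the lower right corner.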
Quality check of NanoDam data (3): Signal enrichment analysis and selecting FDR values
Whether a NanoDam experiment was successful can be examined by quantifying the enrichment of signal (i.e., tracks) on significant binding sites (i.e., peaks). The lower the number of reads mapping stochastically to the genome (i.e., noise, due to random DNA shearing during library preparation, for example), and the higher the specificity of the reads to accumulate at binding sites (due to high affinity of the protein of interest for its binding motif), the higher the signal-to-noise ratio or signal enrichment. The closer the individual tracks from the pairwise comparisons are, the more reproducible the results are amongst individual libraries and experiments. In parallel, a high signal in the center of the peaks indicates a high signal enrichment. The curves for the negative controls should show an insignificant enrichment compared to the enrichment of the experimental pairwise comparisons. However, slight signal enrichment over peak centers is expected as this indicates stochastic methylation in open chromatin sites and presence of noise.
An analysis of signal enrichment on significant peaks can also be used to determine which FDR-threshold (see damMer_tracks.py and damMer_peaks.py) should be used for defining the set of significant peaks. Dam-derived and, by extension, NanoDam-derived methylation can be separated into direct (primary) and indirect (secondary) methylation sites (Redolfi et al., 2019). Indirect or secondary methylation sites are open chromatin regions without bound NanoDam-tagged protein that are in close topological contact with the primary, direct sites bound by the tagged protein. High quality data will include both types of methylation sites. However, the more significant peaks are considered in signal enrichment analysis, the more secondary sites with a reduced signal enrichment will be included as well. A more stringent FDR cut-off is recommended to focus on direct binding sites of the tagged protein of interest.

LIMITATIONS
NanoDam relies on the interaction of a GFP-tagged protein with DNA. It is therefore important that the GFP tag does not impair the function of the protein, and specifically its interaction with DNA. When producing endogenously tagged cells or organisms, the function of the GFP-tagged protein should be assayed, and alternative tagging strategies (for example, N-terminal instead of C-terminal tagging) can be used to circumvent any problems.
Another limitation of NanoDam is its resolution compared to ChIP-Seq. As in TaDa, it depends on the frequency of GATC sites in the genome (with median spacing of ~190 bp in Drosophila).

(B) Every curve represents either a NanoDam-alone or a NanoDam-tagged protein library. In both analyses, the further the NanoDam-tagged protein sample curves are away from the corresponding NanoDam-alone curves, the stronger is the specific signal enrichment at the actual binding sites of the NanoDam-tagged protein.
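To make the dependence of resolution on GATC frequency concrete, the following Python sketch computes the distances between consecutive GATC sites in a sequence. It is a minimal illustration using a made-up toy sequence; the function names are hypothetical, and running it on a real genome would require reading each chromosome from a FASTA file, which is not shown here.

```python
from statistics import median


def gatc_positions(sequence, motif="GATC"):
    """Return the 0-based start positions of every motif occurrence.

    GATC is palindromic, so scanning one strand finds the sites on both.
    """
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def gatc_spacings(sequence):
    """Distances (in bp) between the starts of consecutive GATC sites."""
    positions = gatc_positions(sequence)
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence (not real genomic DNA): three GATC sites
toy_sequence = "GATC" + "A" * 10 + "GATC" + "A" * 20 + "GATC"
spacings = gatc_spacings(toy_sequence)
print(spacings, median(spacings))
```

Because methylation can only be detected at GATC sites, a binding event can be localized no more precisely than the GATC fragment that contains it, so the spacing distribution sets a lower bound on the achievable resolution.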

The minimum number of cells required for a successful NanoDam experiment remains to be determined. Nevertheless, TaDa has been able to provide reliable data with as few as 10,000 cells, a figure that could be extrapolated to NanoDam. The minimum proportion of cells expressing the protein of interest required to obtain a proper NanoDam signal may depend on several factors, such as the binding affinity or potential cofactors of the protein of interest, the accessibility of target sites, and the nuclear concentration of the protein. Determining the optima for every tagged protein of interest is potentially feasible for small screens or targeted approaches, but impractical for assaying a multitude of factors across various conditions.

TROUBLESHOOTING
Problem 1
Genomic DNA shearing.
If a smear of DNA rather than a single high molecular weight band is observed in the agarose gel, it is likely that the DNA has been sheared during extraction. The sample should then be discarded, because sheared DNA can ligate to the DamID adapter as if it had been methylated (Figure 8).

Potential solution
Be gentler when extracting DNA, avoid vortexing and vigorous pipetting.

Problem 2
Low amounts of genomic DNA.
If the DNA is barely visible, it is likely that there is insufficient material.

Potential solution
Use more starting material or homogenize the sample more thoroughly.

Problem 3
Low PCR yield.
If the DamID PCR yields low amounts of DNA, it might be that the starting material was insufficient or that the methylation signal is low (Figure 9).

Potential solution
Use more starting material or change the NanoDam induction time. Allowing NanoDam to methylate for a longer period may help. When increasing the induction time, apply the same change to your Dam-only control.

Problem 4
DNA fragment sizes bigger than expected.
After sonication, a smear around 300-400 bp should be obtained. Because DNA concentrations are relatively low at this stage, we recommend using the Tapestation to visualize the distribution of fragments (see below). Any deviation from the expected size range would suggest that the number of sonication cycles needs to be optimized (Figure 10).

Potential solution
Increase number of sonication cycles.

Problem 5
No DNA after preparation of sequencing library.
If all control points have been achieved up to this point, obtaining no DNA after library preparation would suggest that there has been a mistake in the preparation of the reactions.

Potential solution
Repeat library preparation procedure with fresh reagents.

Problem 6
Secondary peak in sequencing library.
This is probably due to exhaustion of the PCR, which results in concatemers (Figure 11).

Potential solution
Reduce the input DNA quantity or the number of cycles. If this secondary peak is seen in the Tapestation genomic plots, reduce the number of PCR cycles. Six cycles are recommended, as this issue is generally seen when using 8 or more cycles.

Problem 7
Library has adapter contamination.
If a peak around 120-130 bp is observed, it suggests the presence of contaminating adapter dimers, which would reduce the sequencing yield.

Potential solution
Repeat the bead cleanup step with 0.9× volumes of beads.

Problem 8
Low number of reads/low percentage of reads mapped to genome.
Adaptor dimers or adaptor concatemers might still be present in the sequencing library. Failure to remove the initial DamID adaptors with AlwI could also be the cause, as could contamination with foreign DNA.

Potential solution
Use fresh AlwI enzyme and buffer. Make sure to keep pipettes and workspace clean and use filter tips when processing samples.

RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Andrea Brand (a.brand@gurdon.cam.ac.uk).

Materials availability
Plasmids and fly stocks generated in this study are available upon request.

Data and code availability
NanoDam data have been deposited at GEO and are publicly available as of the date of publication. Accession numbers are listed in the key resources table. All original code has been deposited at GitHub and is publicly available as of the date of publication. DOIs are listed in the key resources table. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

ACKNOWLEDGMENTS
We acknowledge Kay Harnish at the Gurdon Institute for performing Illumina sequencing.