Non-denaturing bisulfite treatment and single-molecule real-time sequencing reveals d-loop footprints, their length, position and distribution

Displacement loops (D-loops) are signature intermediates formed during homologous recombination. Numerous factors regulate D-loop formation and disruption, thereby influencing crucial aspects of DNA repair, including donor choice and the possibility of a crossover outcome. While D-loop detection methods exist, it is currently unfeasible to assess the relationship between D-loop editors and D-loop characteristics such as length and position. Here, we developed a novel in vitro assay to characterize the length and position of individual D-loop with base-pair resolution and deep coverage, while also revealing their distribution in a population. Non-denaturing bisulfite treatment modifies the cytosines on the displaced strand of the D-loop to uracil, leaving a permanent signature for the displaced strand. Subsequent single-molecule real-time sequencing uncovers the cytosine conversion patch as a D-loop footprint, revealing D-loop characteristics at unprecedented resolution. The D-loop Mapping Assay is widely applicable with different substrates and donor types and can be used to study factors that influence D-loop properties.


INTRODUCTION
flap may become a loading pad for D-loop disruption enzymes. Helicases and/or topoisomerases such as Sgs1-Top3-Rmi1, Mph1, and Srs2 reverse D-loops with different specificities (Fasching et al., 2015;Liu et al., 2017;Piazza et al., 2019;Prakash et al., 2009;Putnam et al., 2009). Internal D-loops also increase the possibility of a second invasion near the 3′-end, either in the same donor (Wright et al., 2018) or a different donor DNA (multi-invasions) and the formation of potential genomic rearrangements . Second, the presence of mismatches within hDNA enhances the D-loop disruption in a process termed heteroduplex rejection (Chakraborty et al., 2016;Honda et al., 2014). Thus, D-loops formed ectopically at heterologous sites would have a higher likelihood of reversal than at fully homologous donors, influencing the choice of the donor (Chakraborty et al., 2016). Third, the length of a D-loop might influence D-loop stability and likely their susceptibility to disruption enzymes. Long extended D-loops are also more likely to have second-end capture, leading to dHJ formation and the possibility of a crossover outcome (Wright et al., 2018). In summary, D-loop properties that influence D-loop reversibility regulate donor choice, the possibility of forming dHJ, and a crossover outcome, as well as multi-invasions and resultant genomic rearrangements Piazza et al., 2017). Thus, these features underline the importance of studying D-loop characteristics.
Furthermore, several interlinked factors influence D-loop characteristics. First is the resection length, where longer resected tracts permit the formation of longer D-loops. However, in vivo hypo-resection is usually rare, as evidenced by the highly processive resection machinery (Mimitou & Symington, 2011). As resection tract length increases, it allows for more possible internal invasions, influencing D-loop position, length, and potentially also donor choice. Internal D-loops could be disrupted to try end-invasion again, the hDNA may be extended or migrated towards the 3′-end, or the 3′-flap may be cleaved or involved in a second invasion at a different site (Wright et al., 2018). Second, Rad51 filament properties such as length, position, composition (inclusion of accessory factors) and stability, may influence Rad54 translocation activity with subsequent effects on D-loop characteristics. Third, the length of homology between the invading and donor DNA decides the longest possible D-loop length. Subsequently, long homology lengths are associated with increased crossovers (Inbar et al., 2000). Fourth, the topology at donor site or chromatin remodelers altering DNA topology may influence the position and length of a D-loop that is highly favored in negatively supercoiled regions (Wright et al., 2018). Lastly, the extent of DNA synthesis may or may not directly reflect the length of the extended D-loop. The nascent Dloop can either be extended past the initial point of invasion with DNA synthesis or be migrated along with DNA synthesis. Thus, the gene conversion tracts (Guo et al., 2017;Neuwirth et al., 2007) that reflect the length of recombination-associated DNA synthesis do not necessarily mirror the length of the D-loop. Yet gene conversion tract lengths positively correlate with a crossover outcome (Guo et al., 2017). Thus, factors regulating D-loop characteristics also influence the outcome and fidelity of the repair.
In summary, D-loop properties, as well as factors that influence D-loop properties, impact the repair outcome. Thus, it is crucial that an assay that permits the characterization of D-loops at the single-molecule level to be developed.

Current Methods For D-Loop Detection and Their Limitations
Since D-loops are dynamic in nature, it has been challenging to develop an experimental method to study D-loops. Recent advancements in the study of nascent and extended D-loop dynamics includes the development of the D-loop capture and the D-loop extension assays, as well as modifications to a widely-used assay called the D-loop assay Piazza et al., 2019;Wright & Heyer, 2014). Though D-loop assays typically utilize a short, ~ 100 nt ssDNA as the broken strand, Wright & Heyer showed that D-loops formed using physiological length ssDNA substrates with a 5′-duplex DNA may better recapitulate D-loop dynamics in vivo (Wright & Heyer, 2014). Wright and Heyer also developed a restriction-digestion based assay to map the location of D-loops across the region of homology (Wright & Heyer, 2014). However, the assay is limited by the presence and distribution of unique restriction sites across the homologous region.
Moreover, the method relies on the use of a supercoiled duplex donor to restrict D-loop length.
Additionally, the distance between two restriction sites should be greater than the maximum possible D-loop length in a supercoiled donor to distinguish between the D-loops formed at the two sites. Hence, while the restriction-digestion mapping assay broadly reveals D-loop distribution in a population of in vitro D-loops, it cannot define the length of individual D-loops.
In vivo detection of D-loops has been even more challenging, especially somatic D-loops. Meiotic single-end invasions (SEIs) that likely represent a particular form of extended D-loop destined to becoming a crossover are detected using 2D-gel electrophoresis (Hunter & Kleckner, 2001).
Recently, using a proximity-ligation-based method, Piazza and colleagues demonstrated the ability to quantitatively measure somatic nascent D-loop formation in vivo . The sensitivity of the assay relies on preserving the D-loops through psoralen-crosslinking. However, the crosslinking efficiency limits the assay by prohibiting the detection of D-loops shorter than the crosslinking density of ~500 bp. Thus, the assay falls short in the ability to distinguish a change in D-loop signal caused by a change in D-loop length or an alteration in D-loop quantity among a population of cells. Other assays either detect extended D-loops, a downstream product of nascent D-loop (Hicks et al., 2011;, or measure the extent of DNA synthesis via gene conversion tracts (Guo et al., 2017;Neuwirth et al., 2007), and thus, do not reflect the dynamic nascent D-loop properties. Thus, there is an unmet need for a method allowing the study of D-loop properties, such as its length, position, and distribution.
Here, we developed a novel high-throughput assay to detect for the first time the length and position of D-loops formed in vitro at the single-molecule level and base-pair resolution.
Subsequent analysis of individual D-loops reveals the distribution of D-loop lengths and position in a population of D-loops. Lack of enrichment or a crosslinking step permits unbiased D-loop analysis. The assay is based on non-denaturing bisulfite sequencing, adapted from the approach used to map R-loops that are formed by RNA invasion in DNA (Malig et al., 2020;Yu et al., 2003). The method allows easy multiplexing of several D-loop samples with high coverage. The method is accompanied by computational analysis and data visualization pipeline for D-loop profiling. The method is widely applicable to study (i) D-loops formed with a wide range of ssDNA substrates and donor DNA, (ii) factors altering D-loop properties (iii) D-loop disruption factors and their preferences for D-loop features. We also discuss the limitations and possible refinements of this method.
In vitro D-loop formation: D-loop reactions were performed as described in Wright & Heyer, 2014 with the following modifications. The reactions used a linearized or supercoiled pBSphix1200 plasmid donors. All D-loop reactions were carried out at 30°C. Homologous ssDNA was added at 3 nM with the 3 kb donor dsDNA also present at 3 nM (molecules). Rad51 was saturating with respect to the invading ssDNA (1 Rad51 to 3 nts ssDNA), and RPA was added at one heterotrimer to 25 nt ssDNA, while Rad54 was added at 18 nM monomers. 1x buffer was used as described in Wright & Heyer, 2014. The order of addition was: Rad51 + ssDNA, 10 min incubation; then RPA, 10 min incubation; and finally, Rad54 + linear dsDNA, 15 min incubation. In case a supercoiled dsDNA was used instead of a linear dsDNA donor, the reaction was carried on for 10 min to achieve maximum D-loop formation and prevent D-loop disruption based on Wright & Heyer, 2014. The reactions had a final volume of 25 µl. Reactions were stopped with 2 mg/ml Proteinase K (2.5 μl of 20 mg/ml Proteinase K) and 10 mM EDTA (0.5 μl of 0.5 M EDTA). The reaction was then split, such that 9 μl of the reaction was added to a tube containing 0.2 % SDS (0.2 μl of 10 % SDS) and 1x DNA loading dye for gel visualization (2.8 μl of 6x dye). This fraction of the reaction was deproteinized by incubating at room temperature (RT) for 1-2 hr. Subsequently, the samples were separated on a 0.8 % TBE agarose gel at 70 V for ~3 hr. The gel was stained with SYBR Gold Stain for 30 min at RT, before visualization. The remainder of the D-loop reaction (19 μl) was incubated at room temperature (RT) for 30 min to allow deproteinization, before proceeding to bisulfite treatment for the D-loop Mapping Assay. Note that SDS was avoided in this fraction of the D-loop sample to prevent branch migration of D-loops (Allers & Lichten, 2000). The reaction volume ensures that >50 ng of dsDNA was incorporated into the bisulfite reaction.

Non-denaturing Bisulfite treatment
Once deproteinized, the D-loops were treated with 85 μl of sodium bisulfite and 35 μl of DNA protecting reagent from the Qiagen 'Epitect Bisulfite Kit' at RT for 3 hr. RT was used for the treatment instead of the recommended higher temperatures to avoid D-loop disruption and migration as well as to avoid DNA denaturation. The bisulfite reaction was stopped by adding 310 μl BL buffer spiked with 3.1 μl of carrier RNA, following the kit instructions. The RNA was added since the dsDNA concentration is <100 ng preventing non-specific binding to the column. The bisulfite reaction was completed using the kit and finally eluted in 18 μl of EB buffer.

Two-step PCR
For the first round of PCR, primers were designed complementary to the non-homologous regions of the dsDNA donor, more than 500 nt upstream and downstream of the homology. Both the bisulfite-modified and unmodified DNA was amplified unbiasedly using primers lacking a cytosine in the forward primer or guanine in the reverse primer ( Table S1). The primers contained a 'universal' primer sequence at their 5′-end (UNI+Donor-PB-F, UNI+Phix-PB-R). Reactions were performed using ThermoFischer PhusionU DNA polymerase (ThermoFisher PN F555S) to produce high-fidelity, long-range single-band products, despite the presence of uracil in the bisulfite-treated DNA. 0.5 μl of eluted bisulfite DNA was used as input. PCR conditions: 98°C 2 min, [98°C 30s, 55°C 30s, 72°C 3 min]x35 cycles, 72°C 10 min, store at 10°C. Reactions were purified by 0.7x Sera-Mag SpeedBead Carboxylate-Modified Magnetic particles (Hydrophobic) (PN 65152105050250) and eluted in 50 μl of 1x TE pH 7.5. For the second round of PCR, the primers contained the universal primer sequence and a symmetric barcode sequence at the 5′-end.

AMPure Bead Purification
AMPure PB beads from PacBio (PN 100-265-900) were used. 0.7x volume, or as specified, of AMPure beads, were added to the DNA. The DNA was allowed to bind to the beads by shaking on a VWR vortex mixer at speed-2 at room temperature. After quick centrifugation, the tube was placed on a magnetic bead rack. 30-60 sec was given to allow bead separation towards the magnet, and then the supernatant was carefully removed. The beads were then washed with freshly prepared 70% ethanol. Ethanol was added without disturbing the beads, incubated for 30 sec, and pipetted out. The wash was repeated. Finally, the tube was briefly centrifuged and placed on the magnetic rack to remove any residual ethanol. After a 30-60 sec air-dry, the beads were resuspended in 1x TE pH 7.5. The beads were shaken on the vortex for 2-3 min to allow elution of DNA, before placing them back on the rack to pipette out the DNA. DNA concentration was measured by NanoDrop.

Amplicon Pooling
In order to multiplex the sequencing, PCR amplicons from various D-loop reactions were pooled before library preparation. Barcoded amplicons that were AMPure purified from the second PCR reaction were pooled in equimolar concentration. Ideally, for a Sequel-I or Sequel-II run, the amplicons were pooled to have a final concentration of 2-3 μg DNA. We usually pooled approximately 40-60 samples (depending on the PacBio system used for sequencing, see below).
The pooled amplicons were concentrated using a 0.7x AMPure bead wash and eluted in 37 μl of EB buffer (obtained from SMRTbell Template Prep Kit 1.0). The DNA concentration was measured by Nanodrop to have at least 2 μg DNA.

SMRTbell Library Preparation
The PacBio Sequel or RS-II system was used to achieve long-read, single-molecule resolution sequencing of D-loop footprints. SMRTbell Template Prep Kit 1.0 (PN 100-259-100) from PacBio was used for preparing the libraries. Libraries were built as per the "Procedure & Checklist -2 kb Template Preparation and Sequencing" protocol (PN 001-143-835-08) from PacBio with some modifications. The DNA repair reaction was carried out for 30 min at 37 °C. Ligation was done for 1 hr instead of 15 min at 25°C. Finally, the DNA library was eluted in 12 μl of Epitect Bisulfite buffer. SMRTbell libraries were quantified by Qubit or NanoDrop, and their size was crossvalidated using gel electrophoresis or Agilent Genomic's 2100 Bioanalyzer. Libraries were sequenced on a PacBio RS-II, Sequel-I or -II instrument with 10-hr movie times and a V2 primer.
Note that depending on the PacBio sequencing instrument used, the quality and quantity of output varied. For RS-II runs, only 8-10 samples were multiplexed and generated 500-1,000 reads per sample. For Sequel-I runs, ~50 different samples were multiplexed, resulting in ~3,000-5,000 reads per sample. For Sequel-II runs, ~50 samples were pooled as well but resulted in >20,000 reads per sample. The Sequel-II instrument may permit multiplexing ~200 samples without loss of data or undersampling.

Computational Data Processing
Circular Consensus Sequence (CCS) generation: Default parameters of SMRT link v6.0 with ccs 3.0.0. commit 1035f6f for (PacBio RS II and Sequel I) or v8.0 (for Sequel II) were used for processing subreads into CCS reads. As part of default parameters, a minimum subread quality of at least 90% was processed into CCS with a minimum pass filter of 3.

Gargamel computational pipeline
The Gargamel pipeline (available at https://github.com/srhartono/footLoop_sh) allows users to map reads, assign strands, call single-molecule D-loop footprints as peaks of C to T conversion, perform clustering on peaks, and visualize the data, similar to the pipeline described in (Malig et al., 2020).
Read mapping: Reads were mapped to the dsDNA reference sequence using Bismark v0.20.0 (Krueger & Andrews, 2011) (part of the footLoop.pl pipeline). The reference sequence included a 100 bp buffer off the beginning and end positions. Bismark default settings were used except for a slightly relaxed minimum score threshold (--rdg 2,1 --rfg 2,1 --score_min L,0,-0.8) and bowtie 2.2.6 (--bowtie2) was used. Truncated reads shorter than 50% of their expected length were discarded. After removing truncated reads, the median length of reads was 2,500 bp, which is the expected size. Altogether, the stringent requirements imposed for circular consensus, duplicate removal, mapping, and size, ensured the selection of very high-quality reads.
Strand assignment: For each read, strandedness was assigned based on conversion patterns. Reads with insufficient conversions (C-to-T < 6 and G-to-A < 6) could not be assigned and were named 'unknown'. Likewise, if the number of C-to-T conversions was within +/-10% of G-to-A conversions on a read, then the strandedness was considered unknown. Such 'unknown' strand reads represented less than ~5% of the total pool.
Otherwise, reads with predominant C to T or G to A conversions were assigned as non-template (top) or template (bottom) strand, respectively. Ambiguous regions carrying indels due to PacBio sequencing errors were masked (including a 5 bp buffer around the indel) so as not to distort the conversion frequency calculation. Bottom strand reads were more frequently observed (>2-fold) than the top strand (or the displaced strand of a D-loop). This is likely due to a PCR bias in amplification of uracil containing DNA and/or nicking of the displaced single-stranded DNA on a D-loop, which is often associated with the bisulfite treatment.
Peak calling: A threshold-based sliding window method was used to call tracts of C to T conversion referred to as D-loop peak. Unless otherwise specified, the windows spanned 50 consecutive cytosines and were moved across each read one cytosine at a time. (To mitigate sequence biases, we defined the window size based on a number of consecutive cytosines rather than a fixed length of nucleotides.) For each window, we calculated the C to T conversion frequency and called a window D-loop positive if a minimum of 40% cytosines were converted.
Positive windows were further extended if neighboring windows also satisfied the 40% threshold.
Upon encountering a window with conversion frequency less than the threshold, the peak was terminated, and its boundaries recorded.
We tested a combination of window sizes (20, 30, and 50 cytosines) and conversion thresholds (25%, 30%, and 40%); the results were qualitatively similar except that more positive peaks were recovered for less stringent conditions. A window size of 50 cytosines and minimal C to T conversion of 40% permitted a good combination of specificity and sensitivity.
Clustering and Data Visualization: All D-loop peaks were clustered based on their location from 5′ to 3′ end. Reads from each strand that contained a D-loop peak were visualized as a separate heatmap. The heatmap depicts position on cytosine in reference sequence as yellow vertical lines.
Each horizontal line in the heatmap depicts a different read molecule. The color of cytosine for each read is changed to green for every C-to-T modification. For a stretch of C-to-T conversions that cross the peak threshold, the color is changed to red, representing a footprint. A separate heatmap is formed for reads that contain a footprint, and for reads devoid of a footprint. Reads representing the top and bottom strand are also separately visualized.

Optimizing Peak Threshold and determining bisulfite conversion efficiency
We were used to amplify the spiked ssDNA during the PCR-1 step of DMA.

Analysis of D-loop footprints
Quantitation of D-loops from reads: D-loops were quantified by calculating the percentage of reads from each strand (top or bottom) containing a D-loop footprint.

Setting up the D-Loop Mapping Assay with several layers of internal control
To establish and optimize the D-loop Mapping Assay (DMA), we started by using a size-restricted D-loop sample formed in vitro. To do this, we used a supercoiled plasmid as a dsDNA donor to perform an in vitro D-loop reaction ( Figure 1A) (Wright & Heyer, 2014). For every 10.4 bp of hDNA generated in the D-loop, one negative supercoil from the dsDNA is consumed. Plasmids purified from E. coli have a mean supercoiling density of σ = -0.07 supercoil/turn (Champion & Higgins, 2007), which we verified independently (Solinger et al., 2002). Thus 3 kbp plasmids such as those used in this study should contain ~20 ± 5 negative supercoils, meaning that they can accommodate a maximum of ~210 ± 50 bp hDNA, even if the length of the invading homologous ssDNA is longer. Longer D-loops would cause the introduction of compensatory positive supercoils, which is energetically unfavorable. By contrast, fully relaxed plasmids carrying Dloops ~210 bp long are expected to be in the lowest possible energy state and thus favored at equilibrium (Wright et al., 2018). Thus, such an experimental setup allowed us to establish the DMA by predefining the expected lengths of the D-loops.
Once formed, the D-loop reactions were stopped and subjected to bisulfite treatment at room temperature to allow cytosine-to-uracil conversions on the single-stranded displaced strand. Using primers that are non-biased to conversion and outside the region of homology, the DNA was PCR amplified, purified, and subjected to single-molecule real-time sequencing ( Figure 1B). The sequencing reads were then processed, mapped to the reference sequence, and analyzed to reveal cytosine conversion regions as D-loop footprints. The cytosine conversion regions were defined as D-loop footprints only when they crossed a pre-specified peak threshold. The peak threshold was implemented to filter out any background cytosine conversions from spontaneous DNA breathing (for details, see Materials and Methods).
Negative controls at each step of the analysis further enhanced the strength of this well-defined in vitro design ( Figure 1C) should not harbor any footprints. This represents a further internal control within each sample.
Lastly, an experimental negative control can be generated by using a D-loop reaction performed in the absence of invading ssDNA or the Rad51 recombinase. Without ssDNA or Rad51, no strand invasion footprint is expected. Thus, these predicted positions and/or length of D-loops will provide confidence that the footprints derived solely from D-loops.

D-loop footprints are strand-specific, and D-loop levels detected by DMA correlate with gelbased quantitation
We implemented the controlled experimental set-up described above and analyzed in vitro D-loops homologies: ds98-607, 21% gel assay and 12% DMA; ds98-197-78ss, 12% gel assay and 7% DMA. Thus, for all three substrates, consistently, ~60% of the D-loops visualized on the gel were detected by the DMA assay. While Figure 2D depicts the total reads analyzed for each substrate, the average number of reads with a D-loop footprint across 3 replicates is presented in Table S2, and data from each replicate is available in Source data 1. In summary, despite slightly lower Dloop levels detected by DMA, the relative D-loop levels were comparable across the two orthogonal D-loop detection assays ( Figure 2E).

Position of D-loops formed by a supercoiled donor is restricted within the homology window, while their distribution reveals an enrichment at the 3′-end of homology
Next, we looked at the position of individual D-loops relative to the dsDNA sequence. Most of the D-loop footprints were confined within the 931 nt region of homology (depicted by a blue box in the heatmap) for the ds98-931 substrate (Figure 2A). As we reduced the length of homology shared between the ssDNA substrate and the dsDNA donor, the position of the D-loop footprints changed to reflect this differential homology length (Figure 2A-C). Even with a ds98-197-78ss substrate having only 197 nt homology to a 3,000 bp donor DNA, all the D-loop footprints were confined within the 197 nt homology window. Thus, the D-loop footprints were specific to homology, irrespective of the length of homology.
Next, we looked at the distribution of the D-loop position within the homology window for each substrate type. As evident from the visual appearance of D-loop footprints clustered based on their position on the heatmap, the D-loops seemed enriched at the 3′-end of homology (Figure 2A-C).
We quantified the distribution of D-loop position as the fraction of D-loops found within 100 nt bins non-exclusively. The distribution of D-loops across the homology length for ds98-931 and ds98-607 substrate is shown in Figures 3A and 3B, respectively. Similar to the results from a restriction-digest based assay (Wright & Heyer, 2014), there was an enrichment of D-loops near the 3′-end of ssDNA, while the 5′-end of homology had the lowest D-loop presence. This observation supports the model that Rad51 filament grows preferentially in a 5′-to-3′ direction (Qiu et al., 2013;Spirek et al., 2018), making it more likely for the D-loops to assemble at the 3′end of homology than at the 5′-end. Only D-loops in which the 3′-OH end is incorporated into the hDNA are competent for D-loop extension (Li & Heyer, 2009). For the ds98-197-78ss substrate, due to the small window size of only 197 nt, the distribution of D-loop position is less informative, but is shown in Figure 3C.
Note that a slight drop in the frequency of D-loops was seen in the last ~100 nt of homology near

Length of D-loops with a supercoiled donor is restricted by the supercoiling density of the donor
To determine the length of individual D-loops in nucleotides, we measured the length of the footprints. The precise margins of a D-loop footprint are influenced by a combination of the peak threshold used to define a footprint and the bisulfite conversion efficiency (see later discussion on peak thresholds).
As stated earlier, with a supercoiling density of ~20 ± 5 in the donor DNA, D-loops are expected to have a maximum length of ~210 ± 50 nt, irrespective of the homology length. We observed that D-loop footprints have an average length of 259 ± 93 nt and 231 ± 60 nt distributed across the region of homology with the ds98-931 and ds98-607 substrates, respectively ( Figure 3D). With the 197 nt homology substrate (ds98-197-78ss), the average D-loop length was 206 ± 48 nt ( Figure   3D). Thus, when using a super-coiled donor, D-loop length is tightly regulated by the supercoiling density of dsDNA irrespective of the size of homology between the substrate and the donor.
Note that these average lengths are slightly higher than the maximum expected length of ~210 ± 50 nt for ds98-931 and ds98-607 substrates. This slightly higher average length is due to a small fraction (< 5-10%) of footprints that were much longer than expected (> 350 nt) with a supercoiled donor (Figure 3D, S1B). These longer D-loops were nevertheless still within the maximum possible length based on the region of homology that is shared between the ssDNA substrate and the supercoiled duplex donor. Such long D-loop footprints may either be due to a small fraction of nicked (relaxed) plasmids in the reaction or due to branch migration of D-loops during the bisulfite treatment. Since the D-loops formed on supercoiled donors are very stable at room temperature (Wright & Heyer, 2014), branch migration appears to be the less likey explanation. Additionally, separation of the D-loop reactions on a gel revealed that 5-10% of the supercoiled plasmid was nicked/relaxed ( Figure S1A-B). Nicked plasmids would have unrestricted topology, allowing the formation of D-loops that span the entire region of homology. Thus, we conclude that these longer D-loops are the result of nicked dsDNA molecules.
In summary, the lack of footprints outside the region of homology, their absence on the bottom strand, their dependence on the formation of a D-loop, and their uniform length restricted by the supercoiling density of the donor provide high confidence that the footprints represent D-loops.

3′-end and depict a wide range of D-loop lengths across the region of homology
After establishing the assay using a supercoiled donor, a linearized plasmid, free of topological constraints, was used as a donor to map the D-loops in a topologically uninhibited manner. Again, a range of ssDNA substrates was used with varying lengths of homology to the donor (ds98-915,  Table S3. Lastly, the substrate with a non-homologous 3′-flap was used to test if 3′-heterology affects the enrichment of D-loops at the 3′-end of homology. As expected, the D-loop footprints for each of these substrate types were specifically seen only on the top strand, as evident in the cumulative read data ( Figure 4B). A total of ~800-3,700 (10-18%) D-loops were observed among ~ 4000 -40,000 top-strand reads ( Figure 4B, Table S4). Again, while the bottom strand comprised ~ 13,000-100,000 reads, it had < 0.1% of reads with a footprint.
(Here, a 10-fold difference in the total number of reads analyzed was due to upgrade in Pacific The distribution of D-loops was also unaltered by changes in the homology sequence between the ds98-931 and ds98-915 substrates (Figure 4D and S3D). This suggests that differences in cytosine distribution across the homology do not significantly alter D-loop distribution, as long as the % CG is not very low. In these cases, the % CG were >40%. Thus, these observations further support the 5′-to-3′ directionality of the Rad51 filament growth (Qiu et al., 2013;Spirek et al., 2018).
Interestingly, there was no optimal D-loop length. Instead, a broad range of D-loop lengths was observed, independent of the substrate type ( Figure 4E and 4F). The lengths of the D-loops ranged from ~100 nt to a maximum spanning the entire length of homology. D-loops as long as 900 nt were seen with the long substrates (ds98-915, ds98-931, and ds98-915-78ss) at a frequency of ~ 5% of the total D-loops ( Figure 4E). The minimum D-loop size of 100 nt is limited by the peak threshold (of t40w50) used to define a D-loop footprint. An average D-loop length of ~390 nt was noted with the 900 nt homology series substrates. There was a slight, yet significant difference in the D-loop lengths between the ds98-915 and ds98-931, with a difference in mean of 60 nt. This difference might be due to the small variation in homology length or the variability in cytosine distribution within the two homologous sequences. However, there was no statistical difference in D-loop lengths between the ds98-915 and ds98-915-78ss substrates, suggesting that the 78 nt 3′flap does not alter D-loop lengths. So, the difference in mean between D-loops with the ds98-915 and ds98-931 substrates might be due to variation in cytosine content or distribution. Overall, > 40% of footprints were longer than the average length of 400 nt for each of these three substrates.
Thus, the differences in the homology sequence or the presence of a non-homologous 3′-flap did not significantly alter the distribution of D-loop lengths.
However, unsurprisingly, the average D-loop length decreased from ~390 nt to ~320 nt, when the homology length was reduced to 607 nt using the ds98-607 substrate ( Figure 4F). The maximum D-loop length for the ds98-607 substrate was 600 nt, restricted only by the homology length. About 6% of the D-loops were longer than 500 nt, traversing across the region of homology. Similarly, the D-loops formed with the ds98-197-78ss substrate had an average length of 176 nt, and ~ 73% spanned the entire region of homology. The minimum D-loops length observed was ~100 nt limited by the peak threshold of t40w50. In summary, our results demonstrate that homology length defines the maximum observable D-loop length, and in turn, influences the average D-loop length.
Formation of multi-invasions where a single substrate forms D-loops simultaneously in multiple donor molecules is reported both in vitro and in vivo (Wright & Heyer, 2014;Piazza et al., 2017).
However, multi-invasions cannot be detected by this assay, since each donor molecule is separately analyzed. Instead, in some cases (~ 2-5%), multiple D-loop footprints are seen within a single read ( Figure S3A-C), representing a single donor. This may be due to multiple invasions of one or more substrates into a single donor at two different sites ( Figure S3F). Multiple invasions from a single substrate at two sites within a donor may arise due to the formation of two Rad51 filaments separated by a gap. However, Rad54 may be able to translocate through such a gap, making these kinds of multiple invasions a rarity. Alternatively, two substrates invading a single donor may also give rise to dual footprints. Conversely, two D-loop footprints on a single read may also arise due to inefficient cytosine conversions across a long D-loop, possibly due to the continued presence of RPA on the displaced strand. Unless the two footprints are separated by > 200 nt (~50 cytosines), it is currently hard to distinguish between the two possibilities.

RAD51 and RAD54 also form D-loops of varying lengths similar to Rad51 and Rad54 and depict an enrichment of D-loops at the 3′-end
We also similarly tested the D-loop mapping assay on D-loops formed using the human recombination proteins RAD51, RAD54, and RPA. The rationale for using human proteins was to test whether the D-loop length and distribution was an inherent property of the yeast Rad51 recombinase. In vitro D-loop reactions with human proteins were performed using the ds98-931 substrate and a supercoiled donor ( Figure S4A) or a linearized donor ( Figure S4C). With both donor types, the D-loop footprints were again specifically seen within the homologous region ( Figure S4B and S4D) and on the top strand ( Table S5). The distribution of D-loops was similar to the D-loops produced by the yeast proteins. An enrichment of D-loops was seen at the 3′-end of homology, irrespective of the donor type ( Figure S4E). Average D-loop length seen with a supercoiled donor was 252 ± 78 nt as expected, while with a linear donor, the average length was 320 ± 140 nt ( Figure S4F). The average D-loop length from a linear donor was slightly lower than the 410 nt seen with yeast proteins. Conformingly, only 21% of the D-loops were longer than 400 nt with human recombinant proteins compared to > 40% seen with yeast recombinants. Only one D-loop spanned the entire length of homology. Thus, compared to yeast proteins, human proteins seemed to be less efficient under the conditions used in forming longer D-loops. As evident from observation of the D-loops on an agarose gel ( Figure S4G and S4H), the efficiency of D-loop formation was also much lower, with only 3% using a linear donor. This could be due to the poor stability of the RAD51 filament. It may also suggest a role for RAD51 accessory factors in stabilizing the RAD51 filament and altering its properties to promote more efficient D-loop formation (San Filippo et al., 2008). Concomitantly, the percentage of D-loops observed on a gel correlated well with the frequency of D-loop footprint ( Figure S4H). Thus, the D-loop Mapping Assay can be broadly used to test the effect of recombinant proteins across various species on Dloop properties.

Temperature conditions for bisulfite treatment
The bisulfite conversion is most efficient at 70°C for 30 min (Yi et al., 2017). Such high temperatures would be deleterious for our purpose as it would cause D-loop dissolution (Wright & Heyer, 2014)  A 4-hr incubation at 37°C showed cytosine conversion patches even outside the homologous region and on the bottom strand (Figure S5A), indicating enhanced DNA breathing at 37°C (note that the number of reads analyzed was lower as the lower throughput RS-II sequencing platform was used). Incubation at 37°C is also known to enhance D-loop migration and dissolution (Wright & Heyer, 2014). Bisulfite treatment at 37°C for 30 min and at RT for 3 hr showed better recovery of D-loops compared to other treatment conditions (Figure S5A-C), depicting a higher fraction of reads with a D-loop footprint. To minimize potential D-loop migration, we opted for a 3 hr incubation at RT as the optimal condition for the bisulfite treatment.

Defining a peak threshold -t40w50 is an optimum peak threshold
A footprint is called based on a predefined peak threshold, that can be altered in the analysis pipeline. A peak threshold is a minimum conversion frequency (t) required within a minimum window size (w) of consecutive cytosines to call a footprint. The 't' parameter in a peak threshold should be lower than the average conversion efficiency of bisulfite treatment to minimize falsenegative footprints. Conversely, the 'w' parameter is set to eliminate false positives from spontaneous DNA breathing.
A single-stranded DNA (pBSKS-circular DNA) was spiked into an in vitro D-loop reaction sample to measure the efficiency of cytosine conversion. 58% of the 420 cytosines were converted on average in each individual read, and 100% of the 21 reads had cytosine conversion ( Figure   S6A and S6B). The cumulative cytosine conversion efficiency was thus, 58%. This finding implies that choosing a 't' value lower than 58% should prevent false elimination of true footprints.
Note that the conversion efficiency will always be less than 100% since non-denaturing conditions are applied.
To optimize the 'w' parameter, we examined spontaneous DNA breathing in the dsDNA (uninvaded) sample along with an in vitro D-loop sample. Footprints arising from spontaneous DNA breathing would be present on both the top and bottom strands of the dsDNA, as well as in the regions lacking homology to the ssDNA substrate. We tested two window sizes of 20 and 50 consecutive cytosines with a range of 't' parameters, 25%, 30%, 40%, and 60%, hereby denoted as t25w20, for example ( Figure 5A, S6B, and S6C). For a D-loop sample, with peak thresholds of t40w50 and t60w20, almost no footprints were observed on the bottom strand or in the flanking non-homologous regions (Figure 5A, S6C). With t40w20, 1.4% of bottom strand reads had such non-specific footprints. Lowering the conversion frequency (t) to 25% significantly increased footprints on the bottom strand, along with a concomitant increase in the percent of footprints detected on the top strand. 11.6% and 8.8% of bottom strand reads had a footprint with t25w20 and t25w50, respectively, compared to ~40% top strand footprints (Figure 5A, S6C). We conclude that at least a t40 conversion threshold is required to eliminate the majority of footprints arising from spontaneous DNA breathing. To further eliminate false positives among the D-loop footprints, we decided on the next least stringent condition of t40w50 with no detectable nonspecific footprints. While t60w20 also eliminated background footprints, the frequency 't' requirement was higher than the average conversion frequency seen. Hence, most of the genuine top-strand D-loop footprints were lost, reducing the percentage of D-loops to 8% ( Figure 5A).
Thus, with respect to specificity, the t40w50 peak threshold is the most optimal for detecting Dloop footprints.
Interestingly, while the t40w50 threshold eliminates the majority of bottom strand footprints, at the t25w20 threshold, a large number of bottom strand reads showed two footprints within a read ( Figure S6B). These two short footprints within a single read may have arisen at the two junctions of a D-loop, where the bottom strand may be single-stranded ( Figure 5B). RPA may bind at the 3′ junction of a D-loop (Sneeden et al., 2013), indicating that at least 25 nt (the binding site size of RPA) single-stranded region may be present on the bottom strand. However, it is currently not possible to determine if they truly represent D-loop junctions, or simply arise from breathing of the DNA. The footprints outside the region of homology on the bottom strand surely arise from the breathing of the DNA.

Effect of Different Peak Thresholds on D-Loop Length and Distribution
While optimization of peak threshold helped minimize false positives and false negatives, we wondered if it had any effect on the overall D-loop features. To address the effect of peak thresholds on D-loops called, we examined D-loop lengths and distribution under the various peak threshold conditions with the ds98-931 substrate and a linear donor.
In terms of the D-loop lengths, as one may expect, more short-length footprints were called with the low stringency conditions. With t25w50, 22% of the footprints were shorter than 250 nt, compared to 5% with a t40w50 threshold. Consequently, the average D-loop length with a t25w50 threshold was 357 nt compared to 410 nt with a t40w50 threshold ( Figure 5C). Additionally, lowering the window size from w50 to w20 further lowered the average D-loop length. The average length was 220 nt for t40w20, with 59% of the D-loops being shorter than 200 nt. Thus, with lower window size, more of the short D-loop footprints were captured along with some footprints potentially arising from spontaneous DNA breathing, leading to an overall reduction in average D-loop length.
Note that genuine-D-loops shorter than 250 bp would be most affected by the window size. This is because a window size of 20 would result in a minimum D-loop size of 80-100 bp, considering a CG distribution of >40%. While using a window of 50, the minimum D-loop size would be 120-200 bp. Hence, for w = 50, D-loops that were originally 180 bp would be slightly overestimated in size as 200-250 bp, assuming that the cytosine conversion frequency permits it. On the other hand, longer D-loops will be unaffected by the window size. The length of genuine D-loops will also be unaffected by a change in conversion frequency below the average bisulfite conversion frequency.
Lastly, changes in peak threshold did not affect the overall distribution of D-loops within the homology window ( Figure 5D). Thus, despite the detection of short non-specific footprints at lower thresholds, there was no bias in the position of these non-specific footprints, meaning that there was no overall change in their distribution.

Advantages and Limitations of the D-loop Mapping Assay
The D-loop Mapping Assay provides the following advantages: simultaneously. This provides speedy and cost-efficient high-throughput analysis.
The D-loop Mapping assay is limited in the following ways: 1) The minimum detectable D-loop length is limited by the peak threshold used for the analysis of DMA. The peak threshold is used to ensure confidence in that a conversion patch is derived from a genuine D-loop and not spontaneous breathing of the duplex DNA. D-loops shorter than 120-200 nt are undetectable with a threshold window of 50.
2) There is lower confidence in the precise length of those D-loops whose length approaches or is lower than the threshold window. This is because the length may be slightly overestimated due to the requirement of a minimum threshold window. However, D-loop with a length longer than the threshold window is not affected.
3) As with any D-loop assay, some unstable D-loops may be lost during the bisulfite treatment.
Some D-loop molecules may also be lost due to the nicking of ssDNA during bisulfite treatment.
Despite this, due to the lack of experimental bias, D-loops can be compared across different samples.
4) Some long D-loops may be misidentified as short D-loops due to insufficient bisulfite conversion within the treatment time period. However, optimizing the conversion threshold by using a spiked ssDNA control helps minimize the possibility of underestimated D-loop lengths or false negatives. Using the novel DMA, we have shown that without a topological barrier, D-loop length can vary widely, in some cases stretching as long as the homology permits. These differences in D-loop length may either reflect differential Rad51 filament lengths or variations in the processivity of Rad54 in forming hDNA. Moreover, the D-loops are enriched at the 3′-end of homology, similar to previous observations (Wright & Heyer, 2014). This enrichment confirms the directionality of the Rad51 filament in a 5′-to-3′ direction (Qiu et al., 2013).

DATA AVAILABILITY
The computational pipeline for mapping reads, strand assignment, peak calling, clustering, data

5.
(  Table summarizing the total number of reads containing a footprint as 'peak' and the total number of reads analyzed as 'total' for each strand. '% D-loops/% Peak' indicate the percentage of reads containing a footprint. The data represents a cumulation from > 3 independent replicates. The percentage of D-loops is calculated by dividing the number of reads containing a footprint by the total number of reads for that strand. (E) 39 (6.5%) reads with footprint from 597 top-strand reads