Dam Assisted Fluorescent Tagging of Chromatin Accessibility (DAFCA) for Optical Genome Mapping in Nanochannel Arrays

Proteins and enzymes in the cell nucleus require physical access to their DNA target sites in order to perform genomic tasks such as gene activation and transcription. Hence, chromatin accessibility is a central regulator of gene expression, and its genomic profile holds essential information on the cell type and state. We utilized the E. coli Dam methyltransferase in combination with a fluorescent cofactor analogue to generate fluorescent tags in accessible DNA regions within the cell nucleus. The accessible portions of the genome are then detected by single-molecule optical genome mapping in nanochannel arrays. This method allowed us to characterize long-range structural variations and their associated chromatin structure. We demonstrate whole-genome, allele-specific chromatin accessibility maps built from long DNA molecules extended in silicon nanochannels.


Chromatin accessibility signals along aggregated genetic features
Gene bodies were defined as spanning from the transcription start site (TSS) to the transcription end site (TES) annotated by GENCODE v34 [1]. Promoters were defined as the region from 1000 bp upstream to 500 bp downstream of the GENCODE TSS. General predicted enhancers, mapped to their target genes by JEME, were adapted from Cao et al. [2]. Enhancer coordinates were converted from the hg19 to the hg38 human genome build using UCSC liftOver [3]. Enhancers overlapping ambiguous genomic regions were discarded, as were pairs of enhancers and gene targets that overlap or lie in close proximity (up to 5 kbp) [4].
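The interval definitions above can be made concrete with a short sketch. It assumes 0-based, half-open coordinates and treats the TSS as a single base; as an assumption not stated in the text, the 5 kbp proximity filter is measured between the enhancer and the gene body. The names `Interval`, `promoter`, `gap_bp` and `keep_pair` are hypothetical and are not taken from the DAFCA scripts.

```python
from dataclasses import dataclass

# Promoter window around the TSS, as defined in the text.
UPSTREAM_BP = 1000
DOWNSTREAM_BP = 500
# Enhancer-target pairs that overlap or lie within this distance are discarded.
PROXIMITY_BP = 5000


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def promoter(chrom: str, tss: int, strand: str) -> Interval:
    """Promoter = 1000 bp upstream to 500 bp downstream of the TSS, strand-aware."""
    if strand == "+":
        start, end = tss - UPSTREAM_BP, tss + DOWNSTREAM_BP
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher genomic coordinates.
        start, end = tss - DOWNSTREAM_BP, tss + UPSTREAM_BP
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return Interval(chrom, max(0, start), end)


def gap_bp(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals on the same chromosome (0 if they overlap)."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def keep_pair(enhancer: Interval, gene_body: Interval) -> bool:
    """Keep an enhancer-gene pair only if they are more than 5 kbp apart (assumed gene-body distance)."""
    return gap_bp(enhancer, gene_body) > PROXIMITY_BP
```

Because "up to 5 kbp" is read as inclusive, a pair exactly 5,000 bp apart is discarded in this sketch.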
In the evening, 0.5 mL of this cell culture was transferred into LB medium (300 mL) containing ampicillin (100 mg/L), and incubation was continued with shaking at 37°C overnight.

Conversion of OGM data to global chromatin accessibility maps
All scripts mentioned in this work are available from https://github.com/ebensteinLab/DaFCA and https://github.com/ebensteinLab/EcoDAM. The data output from the Bionano Genomics "Saphyr Molecule Detect" is provided in the proprietary CIP and BNX formats. The CIP file contains the indexing information needed to retrieve the continuous intensity profile of a specific molecule in a BNX file. It is a tab-delimited ASCII text file that is not compatible with commonly used bioinformatic tools. Therefore, the intensity profiles of the red channel (EcoDam accessibility labeling) for the filtered molecules were converted by a series of custom Python scripts. The first, readCIP.py, converts the data into a numerical profileTrace.txt file containing the molecule IDs and their intensity profiles.
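The exact layout of profileTrace.txt is not described here, so the following is only a sketch under an assumed layout: one molecule per line, the molecule ID in the first tab-separated field, followed by its intensity values. The functions `write_profile_trace` and `read_profile_trace` are hypothetical helpers, not the readCIP.py implementation, and they do not parse the proprietary CIP/BNX files themselves.

```python
import io
from typing import Dict, List, TextIO


def write_profile_trace(profiles: Dict[str, List[float]], out: TextIO) -> None:
    """Write one molecule per line: its ID, then its intensity values, tab-separated (assumed layout)."""
    for mol_id, trace in profiles.items():
        fields = [mol_id] + [repr(float(v)) for v in trace]  # repr keeps full float precision
        out.write("\t".join(fields) + "\n")


def read_profile_trace(handle: TextIO) -> Dict[str, List[float]]:
    """Parse the assumed layout back into {molecule ID: intensity profile}."""
    profiles: Dict[str, List[float]] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        mol_id, *values = line.split("\t")
        profiles[mol_id] = [float(v) for v in values]
    return profiles


if __name__ == "__main__":
    # Round-trip two toy molecules through an in-memory file.
    buf = io.StringIO()
    write_profile_trace({"1": [0.0, 12.5, 3.0], "42": [7.25]}, buf)
    buf.seek(0)
    print(read_profile_trace(buf))
```

A plain tab-delimited numeric layout like this can be loaded directly by general-purpose tools, which is the motivation for the conversion step.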
Each chromosome was then analyzed independently using the automation script exexute_tasks, which reads each script's specific arguments from a YML configuration file.
Using the script CheckProfileAlignmentA.py, we extracted the data from the profileTrace.txt file and combined it with the alignment data (XMAP format) to account for the intensity of the red channel. After applying the whole pipeline to both the chromatin and the naked-DNA samples, each was merged into a whole-genome file containing all 23 chromosomes (22 autosomes and chrX) of the female GM12878 cell line. In the final step, the naked-DNA data were used to normalize the chromatin data.
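Combining a profile with its alignment amounts to placing each profile sample on reference coordinates. The sketch below does this by linear interpolation between the aligned query and reference start/end positions (field names modeled on the XMAP columns QryStartPos, QryEndPos, RefStartPos and RefEndPos), averages intensities in fixed reference bins, and normalizes chromatin by naked DNA as a per-bin ratio with a pseudocount. The pixel-to-bp step `BP_PER_PIXEL`, the binning and the ratio normalization are assumptions for illustration only; the text does not state how CheckProfileAlignmentA.py or the normalization are implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

BP_PER_PIXEL = 500.0  # hypothetical profile sampling step; the real value is instrument-specific


@dataclass
class Alignment:
    """Subset of alignment fields, named after XMAP columns (positions in bp)."""
    qry_start: float
    qry_end: float  # smaller than qry_start for reverse-orientation alignments
    ref_start: float
    ref_end: float


def to_reference(pixel: int, aln: Alignment) -> float:
    """Linearly map a profile pixel to a reference coordinate; works for both orientations."""
    q = pixel * BP_PER_PIXEL
    scale = (aln.ref_end - aln.ref_start) / (aln.qry_end - aln.qry_start)
    return aln.ref_start + (q - aln.qry_start) * scale


def bin_profiles(molecules: Sequence[Tuple[Sequence[float], Alignment]], bin_bp: int) -> Dict[int, float]:
    """Average red-channel intensity per reference bin over all aligned molecules."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for trace, aln in molecules:
        lo, hi = sorted((aln.ref_start, aln.ref_end))
        for pixel, value in enumerate(trace):
            pos = to_reference(pixel, aln)
            if lo <= pos <= hi:  # keep only the aligned part of the molecule
                b = int(pos // bin_bp)
                sums[b] += value
                counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


def normalize(chromatin: Dict[int, float], naked: Dict[int, float], pseudo: float = 1.0) -> Dict[int, float]:
    """Assumed normalization: chromatin / naked ratio per bin, with a pseudocount against division by zero."""
    return {b: (chromatin[b] + pseudo) / (naked[b] + pseudo) for b in chromatin if b in naked}
```

Bins covered by only one of the two samples are dropped by `normalize`; a real pipeline might instead flag them or require a minimum coverage.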

Data smoothing
Data were smoothed using a Gaussian sliding window. The window size and standard deviation (STD) were chosen to yield the best correlation with the theoretical profile expected from the genomic distribution of Dam sites. We screened different combinations of STD and window size and calculated the correlation coefficient (CC) between the smoothed theoretical data and the naked-DNA control data. This was done iteratively: one parameter was fixed while the other was varied, until the pair with the best CC value was found (Fig. S1A,B).
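The parameter screen can be sketched as a coordinate search: smooth the theoretical Dam-site profile with a truncated, renormalized Gaussian kernel, score it against the observed naked-DNA profile by Pearson correlation, and alternately scan one parameter while the other is held fixed. Window size and STD are in samples here, so the bp values must first be divided by the profile's bin size (for example, with a hypothetical 100 bp bin, a 4,000 bp window and a 2,000 bp STD become 40 and 20 samples). The function names and the stopping rule are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_smooth(signal: Sequence[float], window: int, std: float) -> List[float]:
    """Gaussian smoothing; the kernel spans window // 2 samples on each side, `std` in samples."""
    half = window // 2
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / std) ** 2) for k in offsets]
    out = []
    for i in range(len(signal)):
        acc = wsum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(signal):  # truncate the kernel at the edges and renormalize
                acc += w * signal[j]
                wsum += w
        out.append(acc / wsum)
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coordinate_search(theory: Sequence[float], observed: Sequence[float],
                      windows: Sequence[int], stds: Sequence[float],
                      max_rounds: int = 10) -> Tuple[int, float, float]:
    """Alternately fix one parameter and scan the other until the best CC stops improving."""
    def cc(w: int, s: float) -> float:
        return pearson(gaussian_smooth(theory, w, s), observed)

    w, s = windows[0], stds[0]
    best = cc(w, s)
    for _ in range(max_rounds):
        w = max(windows, key=lambda cand: cc(cand, s))  # scan window size, STD fixed
        s = max(stds, key=lambda cand: cc(w, cand))     # scan STD, window size fixed
        new = cc(w, s)
        if new <= best:
            break  # a full round brought no improvement
        best = new
    return w, s, best
```

Because each scan includes the current value, the CC never decreases between rounds, so the loop stops once a full round brings no gain. Like any coordinate search, it can settle on a local optimum of the parameter grid.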
To reduce computational cost, the CC was not calculated over the whole genome but on a representative region (chr19:132,000-155,000). The optimal parameters were found to be a window size of 4,000 bp and an STD of 2,000 bp. Indeed, applying these values over the whole genome generated theoretical peaks that closely resemble our experimental peaks (Fig. S1C).

Correlation between DAFCA replicates
To assess assay reproducibility, we performed the experiment on two biological replicates of the GM12878 cell line and calculated their Pearson correlation coefficient across chromosome 10. We divided the genome into non-overlapping windows of 500 bp, 5 kbp and 50 kbp and calculated