Using Chromatin Accessibility to Delineate Therapeutic Subtypes in Pancreatic Cancer Patient-Derived Cell Lines

Summary Disrupted chromatin regulatory processes contribute to the development of cancer, in particular pancreatic ductal adenocarcinoma. The assay for transposase accessible chromatin with high-throughput sequencing (ATAC-seq) is typically used to study chromatin organization. Here, we present a revised ATAC-seq protocol to study chromatin accessibility in adherent patient-derived cell lines. We provide details on how to calculate the library molarity using Agilent’s Bioanalyzer and an analysis pipeline for peak calling and transcription factor mapping. For complete details on the use and execution of this protocol, please refer to Brunton et al. (2020).

HIGHLIGHTS ATAC-seq protocol to measure chromatin accessibility in patient-derived cell lines Includes detailed size selection library clean-up step Instructions on how to calculate the library molarity using Agilent Bioanalyzer A comprehensive data analysis pipeline is provided 1. Harvest PDCLs using trypsin and count cells a. Typically, 100,000 cells worked well for most PDCLs tested. To optimize this for different cell types a range of cell numbers should be tested first to establish optimal cell: transposase ratio. b. Pipette cells using a 200 mL pipette tip to produce homogenous single cell suspension. 2. Centrifuge 100,000 cells for 5 min at 600 3 g, 4 C. REAGENT  3. Remove and discard supernatant. 4. Add 50 mL ice-cold PBS to wash pellet, do not disturb pellet, and centrifuge for 5 min at 600 3 g, 4 C. 5. Remove and discard supernatant. 6. Add 50 mL ice-cold ATAC-seq lysis buffer to pellet. Very gently dislodge the pellet. Centrifuge immediately for 10 min at 600 3 g, 4 C.
CRITICAL: It is crucial to treat the cell pellet gently during the lysis step to prevent overlysis and preserve the native chromatin configuration to be analyzed.
CRITICAL: To proceed as quickly as possible to the transposition reaction, make the transposition master mix (described in step 9) while the pellet is spinning for 10 min at 600 3 g, 4 C (step 6) 7. Discard the supernatant and immediately continue to the transposition reaction.

Transposition Reaction and Purification
Timing: $45 min Cells are incubated with the Tn5 transposase enzyme to insert oligonucleotide tags within open regions of DNA.
8. Make sure the cell pellet is placed on ice. 9. To make the transposition reaction mix, combine the following: 25 mL TD (23 reaction buffer from Nextera kit) 4.7 mL TDE1 (Nextera Tn5 Transposase from Nextera kit) 22.3 mL nuclease-free H 2 O 10. Resuspend nuclei pellet in the transposition reaction mix. 11. Incubate the transposition reaction at 37 C for 30 min at 400 rpm on an Eppendorf mixer. 12. Immediately following transposition, purify the DNA using the Qiagen MiniElute PCR purification kit. 13. Elute transposed DNA in 10 mL DNase-free water.
Pause Point: Purified DNA can be stored at À20 C for up to 72 h.

PCR Amplification
Timing: 30-60 min Transposed DNA fragments are amplified by PCR using custom primers which contain the designated index (barcode) for each library.
Note: Each library must contain a unique index to enable sample identification once the libraries have been pooled for sequencing.
14. To amplify transposed DNA fragments, combine the following in a 0.2 mL PCR tube: 10 mL transposed DNA 10 mL nuclease-free H 2 O 2.5 mL 25 mM PCR Primer 1 2.5 mL 25 mM Barcoded PCR primer 2 25 mL NEBNext High-Fidelity 23 PCR master mix 15. Perform initial amplification as follows: ll OPEN ACCESS CRITICAL: To allow extension of both ends of the primer after transposition the first 5 min extension at 72 C is required.
Note: The final number of PCR cycles is determined using qPCR to allow library amplification to be stopped prior to saturation. This helps to reduce GC and size biases.
16. To run a qPCR side reaction, combine the following qPCR compatible consumables: 5 mL of previously PCR-amplified DNA 4.2 mL H 2 O 0.4 mL 25 mM primer 1 0.4 mL 25 mM primer 2 5 mL 23 SYBR Select master Mix CRITICAL: Use white qPCR plates to run the qPCR reaction. White plates help increase sensitivity and reduce variability in the qPCR assay.
17. Using a qPCR instrument, cycle as follows: 18. To calculate the additional number of cycles required, plot linear Rn versus cycle and determine the additional number that corresponds to one-third of the maximin florescent intensity ( Figure 1).
19. Run the remaining 45 mL PCR reaction to the cycle number determined by qPCR.
Step Temperature Time

Double Size Selection
Timing: 30-60 min The first objective is to remove large DNA fragments which are >1,000 bp. Large fragments are difficult to accurately quantify and result in reduced clustering efficiencies when sequencing. The second objective is to remove smaller DNA fragments which are <100 bp. Smaller fragments are preferentially sequenced and consequently could become over-represented in the sequencing result. This step will also remove excess adapter sequenced.
Note: This protocol is adapted from the SPRI Based Size Selection manual CRITICAL: Vortex magnetic beads and bring beads and DNA to room temperature before starting size selection assay. To calculate the additional number of cycles required, plot linear Rn versus cycle and determine the additional number that corresponds to one-third of the maximum fluorescent intensity. For example, as shown for sample A replicate 1, the ''end point'' RFU value after 20 cycles is 3,900 which is divided by 3 to give ''one third'' RFU value of 1,300. A line is then drawn at 1,300 on the y-axis until it meets the amplification curve of that sample. This point is then joined to the x-axis to establish the number of additional cycles required for the library amplification. Take forward replicate libraries which require the minimum number of cycles for library amplification. For example: 50 mL sample 3 0.553 ratio = 27.5 mL of SPRIselect to add to library.

Reagent
25. Add beads to sample and pipette 10 times and incubate for 5 min at RT.
CRITICAL: Insufficient mixing of samples and SPRIselect will lead to inconsistent size selection results.
Note: The yields of ATAC-seq libraries are typically lower as compared to other library protocols, e.g., RNA-seq or WGS. To ensure that all of the desired DNA fragments are bound to the beads the incubation time can be extended. However, extending the time of DNA and bead incubation can increase binding of unwanted size fragments.
26. Add samples to an appropriate magnetic rack or plate such as DynaMag-2 Magnet. Wait for the beads to attached to the magnet and for the sample to go clear. Bead attachment times can vary depending on the initial sample size, the volume of beads added, and the strength of the magnet. 27. Take the supernatant and keep -the large unwanted fragments are attached to the beads.
CRITICAL: Be careful not to pipette any of the beads when removing the supernatant as this will contaminate the library prep with large DNA fragments.
Note: Steps 28-32 are for Small Size Selection (>100 bp) Note: Increasing the ratio of SPRIselect volume to sample volume will decrease the efficiency of binding larger fragments.
28. To select for DNA fragments that are 100 bp and above use a final volume of 1.83 SPRIselect based on the initial reaction volume. The second SPRIselect volume is determined after accounting for the first volume of 0.553 SPRIselect added.
29. Add SPRIselect to sample, mix well, and incubate for 5 min at RT. 30. Add sample to magnetic rack. 31. Make fresh 80% ethanol. 32. Once beads have attached to the magnet and the sample has gone clear, remove, and discard the supernatant. The library is now attached to the beads.
CRITICAL: Be careful not to remove beads during this step because the desired library is now associated with the beads. Significant bead loss will result in reduced yield.
33. With the reaction vessel still on the magnet, add 500 mL of 80% ethanol and incubate at RT for 1 min. Remove and discard the ethanol supernatant.
CRITICAL: Take care not to remove any beads during this step. CRITICAL: Allow all ethanol to evaporate but DO NOT let the beads dry out.
The bead supernatant will begin to crack if allowed to dry out completely -avoid this.
35. Remove sample from magnetic rack and resuspend beads in 23 mL nuclease-free H 2 O. Mix well.
Note: Elution volume should be large enough so that the liquid level is high enough for the beads to settle to the magnet.
36. Leave samples to stand off the magnetic rack for 1 min. 37. Add samples back to the magnetic rack and allow beads to attach to the magnet. 38. Once clear, take 20 mL of eluted DNA and store in a DNA-low binding Eppendorf at 4 C.

Estimate of Library Yield Using the Qubit dsDNA HS Assay Kits
Timing: 15 min To accurately quantify the ATAC-seq library we recommend using Agilent Bioanalyzer high sensitivity DNA chips. These chips are extremely sensitive and work best when 1 ng/mL library is loaded. To establish the ATAC-seq library concentration for loading onto the bioanalyzer chip, first use the Qubit dsDNA HS Assay Kits to obtain a rough estimate of library yield.
The Qubit dsDNA HS (High sensitivity) Assay Kits include concentrated assay reagent, dilution buffer and prediluted DNA standards. The assay is accurate for sample concentrations from 10 pg/mL to 100 ng/mL. This assay is prepared at room temperature (RT) and is stable for 3 h.

Timing: 2 h
The Bioanalyzer high sensitivity DNA kits can quantify libraries from just a few amplification cycles thereby reducing the amplification bias of shorter DNA fragments and improving the quality of sequencing data. The kit also offers improved sensitivity for checking the size and quantity of DNA samples of limited abundance. The chip can detect DNA fragments in the range of 50-7,000 bp. This sizing range is optimal for analyzing the full spectrum of DNA fragments generated in ATAC-seq libraries.
Quantitative range is 5-500 pg/mL Make sure gel matrix reagents are at RT for at least 30 min before adding to chip.
Typically, 500 pg/mL loaded onto the chip gives an accurate quantification. 52. Run HS DNA chip on Bioanalyzer and quantify library molarity using the bioanalyzer software: 2100 Expert Software (Figures 2 and 3).

EXPECTED OUTCOMES
A successful ATAC-seq library will typically have a molarity in the region of 4-30 nM (Figure 3).
Successful libraries typically require fewer than 12 total cycles of PCR amplification. More than 12 cycles can impact the library complexity and result in over-amplification of the same DNA fragments. Library complexity can be improved by making biological replicates and selecting the best libraries, i.e., libraries that require the minimum number of amplification cycles (Figure 1) or by optimizing the cell input.
Successful double size selection results will contain a selection window ranging from 100-1,000 bp ( Figure 4). The full fragment size distribution should be maintained.

QUANTIFICATION AND STATISTICAL ANALYSIS Library Sequencing
Timing: 48 h ATAC-seq and the Nextera workflows are designed for sequencing using Illumina high-throughput sequencing instruments. When sequencing ATAC-seq libraries, Nextera-based sequencing primers and reagents must be used Figure 5). For inferring differences in open chromatin with human samples and transcription factor mapping we used >100 million mapped reads.

A 4 nM library pool of 16-20 libraries were clustered and sequenced over 8 lanes of an Illumina
HiSeq 4000 flow cell. 5 mL of the 4 nM library pool was used for each lane. Clustering was done onto the flow cell by using a HiSeq 3000/4000 cluster kit and an Illumina cBot2.
CRITICAL: To achieve the best quality of sequence avoid generating air bubbles when clustering reagents are mixed.  Note: If a different sequencing platform will be used for sequencing of the library pool the molarity of the pool and the number of libraries in the pool may have to be adjusted.
Note: Data yield can be impacted by the large fraction of mitochondrial reads.

Sequence Read Mapping and Peak Calling
Timing: 4-8 h Stand-alone bioinformatic analysis pipelines for ATAC-seq read mapping, quality control, peak calling, and differential analysis include: (i) the bcbio-nextgen toolkit (via the ChIP/ATAC-seq pipeline) https://bcbio-nextgen.readthedocs.io/en/latest/contents/atac.html; or (ii) the nfcore-atacseq pipeline https://nf-co.re/atacseq. These pipelines are capable of running tasks across multiple compute infrastructures. Importantly, both pipelines incorporate QC analysis steps to determine whether your ATAC-seq experiment is

OPEN ACCESS
suitable for further downstream analysis. In addition to alignment level QC and estimation of library complexity, ATAC-seq specific QC reports are generated using ataqv https://github.com/ ParkerLab/ataqv. ataqv produces a web-based visualization and analysis report including a number of basic QC metrics.
The authors recommend that practitioners follow ATAC-seq standards used by ENCODE https:// www.encodeproject.org/atac-seq/ and refer to the following guide which provides a detailed overview of the ATAC-seq analysis workflow and important QC metrics https://yiweiniu.github.io/blog/ 2019/03/ATAC-seq-data-analysis-from-FASTQ-to-peaks.
As a general rule, experiments should have two or more biological replicates with each replicate having 50 million non-duplicate, non-mitochondrial aligned reads when using paired-ended sequencing. Mitochondrial reads will typically have high coverage and should be removed. Duplicate reads which likely arise from PCR artifacts should also be removed. The percentage of mapped reads should be greater than 80%.
ATAC-seq specific quality metrices can also be evaluated using ATACseqQC: a Bioconductor package for post-alignment quality assessment of ATAC-seq data Ou et al., 2018. Successful mapping of ATAC-seq libraries will be in the region of 60%-90%. https://bioconductor.org/packages/release/bioc/html/ATACseqQC.html

Set up a YAML file detailing the samples and the pipeline to use
Note: This pipeline only requires a YAML file. Installation of the bcbio-nextgen toolkit, setup of the YAML file and running of the pipeline is well documented in the bcbio-nextgen manual (https://bcbio-nextgen.readthedocs.io/en/latest/index.html).
4. Remove any samples with a low number of mapped reads (we used a threshold of <100 million mapped reads from the pipeline QC report) to exclude samples from the downstream analysis 5. Check output peak files from macs2 component of bcbio-nextgen pipeline for successful peak calling Note: The number of peaks called will vary by sample but we would expect a minimum of 10,000 peaks called for most samples. Differential Peak Analysis and TF Mapping Peaks associated with specific conditions are identified by differential peak analysis. Enrichment of condition-specific transcription factors is calculated by identifying transcription factor binding motifs in the center of condition-specific peaks.
Note: Bioconductor package DiffBind (https://bioconductor.org/packages/release/bioc/ html/DiffBind.html) is used for the condition-specific peak identification and HOMER (http://homer.ucsd.edu/homer/motif/) is used for the TF mapping. Manuals are available for both software detailing all options available for each step.
6. Create a sample sheet detailing all samples. This can be an R data frame or a CSV/XLSX file and must include sample ID, condition, replicate (if applicable), location of BAM file and location of macs2 peak file (see DiffBind vignette and user manual for further options). 7. Read in the peaksets for all samples using the dba() function. The only input required for this function is the sample sheet from step 6 (either (a) a data frame containing the sample sheet or (b) the name and location of CSV/XLSX file to load). library(DiffBind) a) dba.object <-dba(sampleSheet = sample.sheet) b) dba.object <-dba(sampleSheet = ''path/to/sample_sheet.csv'') 8. Count reads for the full set of peaks called across all samples (dba.count()). Re-center peaks around the most enriched point of the peak and set to a standard width of 500 bp (250 bp upand 250 bp downstream of peak center) using the option ''summits = 250''. dba.object <-dba.count(dba.object, summits = 250, score = DBA_SCORE_TMM_READS_FULL) Note: Various exploratory plots can be generated to visually inspect the data. For example, correlation heatmaps can be generated after both steps 7 and 8 to produce a clustering of the samples (plot(dba.object)) and a PCA plot can be generated after step 8 (dba.plotPCA(dba.object, DBA_CONDITION, label = DBA_ID)).
12. Produce BED files (or HOMER peak files) of the genomic locations for the open peaks in each condition (see HOMER manual for details of input file format: http://homer.ucsd.edu/homer/ ngs/peakMotifs.html). 13. Perform TF motif enrichment for each condition using the findMotifsGenome.pl script from the HOMER software. The only required user input is the peak file. Other required arguments of the function are: genome (e.g., hg19) output directory size (optional: user background peak file -BED file or HOMER peak file. -bg background/BED file) findMotifsGenome.pl <peak/BED file> <genome> <output directory> -size # [options] Note: ''size'' was set to 200 to focus motif detection on the 200 bp around the center of each peak.
Note: findMotifsGenome.pl can compare identified motifs in the input BED file to a randomly produced background or a user supplied background file. We found that comparison to a random background consistently reported AP-1/JUN factors as the most enriched motifs. Comparison of open peaks to closed peaks in one condition produced condition-specific results.

Promoter Mapping and GO Enrichment Analysis
Downstream functional analysis of normalized ATAC-seq data was performed using Bioconductor packages ChIPseeker, and clusterProfiler.

LIMITATIONS
Both genomic DNA and mitochondrial DNA are processed in ATCA-seq libraries. To ensure enough sequencing depth to allow for TF mapping and removal of contaminating mitochondrial reads, faster sequencing platforms such as NextSeq 500 mid output flow cells (130 million) reads provided ll OPEN ACCESS STAR Protocols 1, 100079, September 18, 2020 insufficient sequencing depth and problems with over-clustering. Therefore, sequencing platforms such as NovaSeq 6000 or NextSeq 2000 are recommended. The Illumina HiSeq system is also suitable for sequencing ATAC-seq libraries but has recently been discontinued.

Potential Solution
Most problems arise from not determining the appropriate number of cells to use in the tagmentation reaction. A low cell to transposase enzyme ratio can result in over-tagmentation, which produces a library mainly represented by small DNA fragments ( Figure 6).
Over-tagmented libraries can be identified by a DNA trace that has one distinct band of sub-mononuleosomal size and has lost banding patterns at higher molecular weight bands representing mono-nucleosomal and di-nucleosomal sizes.
Under-tagmentation occur when too many cells have been added and results in a library mainly composed of large DNA fragments, which can be identified by one distinct band at the higher molecular weight marker on a DNA gel.
To resolve the problem of under-or over-tagmentation first try a series of cell inputs for the transposition reaction. Typically, a range between 50,000-200,000 cells worked well for PDCLs tested. Different cell types may require less or more cells for this reaction.

Problem
Persisting small or large DNA fragments after the two-part size selection library clean-up (visible by bioanlayzer electropherogram).

Potential Solution
Unwanted DNA fragments can persist after the SPRIselect clean-up step which can be a consequence of either bead contamination or incorrect incubation time with the beads.
For large DNA fragment contamination, be careful not to pipette any of the beads that are attached to the magnet when removing the library supernatant. If beads detach into the library supernatant, place the Eppendorf tube back on the magnetic rack and wait for all beads to reattach to the magnet. Then carefully remove the library supernatant.
If bead contamination is not the problem, extending the time of DNA and bead incubation can increase binding of DNA fragments. However, this should be optimized to ensure that DNA fragments of interest do not also bind. Typically, an incubation time in the region of 5-15 min is recommended.
For small DNA fragment contamination, the ratio of SPRIselect can be optimized to exclude smaller DNA fragments. For example, a ratio between 1.8 -1.2 SPRIselect will bind fragments in the region of 100-1,000 bp, however this may vary depending on cell type.

Potential Solution
A DNA smear can be caused by violent pipetting during the lysis step. During the cell preparation and transposition steps (step 4-13) treat the cell pellet very gently. Lightly dislodge the pellet by flicking the Eppendorf tube rather than by pipetting. If pipetting is required, carefully resuspend the pellet once. Do not aggressively resuspend the pellet in lysis buffer by pipetting multiple times.
Additionally, if the level of detergent is too high for your specific cell type the nucleus might rupture, therefore the detergent concentration might require optimization for different cell types.

Problem
Low DNA yield.

Potential Solution
Significant library yield can be lost during the two-part size selection library clean-up. Library incubation times with beads must be optimized to ensure the correct sizes are captured (as described above).
During the SPRIselect bead ethanol wash step (step 33), it is crucial not to remove beads during this step because the desired library is now associated with the beads.
Increasing the cell input should also increase the DNA yield and should be optimized for different cell types.

Lead Contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Dr Peter J. Bailey (Peter.Bailey.2@glasgow.ac.uk).

Materials Availability
The patient-derived cell lines were provided by the Australian Pancreatic Cancer Genome Initiative (APGI, www.pancreaticcancer.net.au) and the Garvin Institute of Medical Research (Sydney, Australia).
Data and Code Availability ATAC-seq sequencing data from patient-derived cell lines can be found at BioProject ID: PRJNA630992. Original data for all datasets in this paper are available at Mendeley Data DOI:10. 17632/74s7crj7xj.1. All software packages used are publicly available through commercial vendors.