Defining the location of promoter-associated R-loops at near-nucleotide resolution using bisDRIP-seq

R-loops are features of chromatin consisting of a strand of DNA hybridized to RNA, as well as the expelled complementary DNA strand. R-loops are enriched at promoters, where they have recently been shown to have important roles in modifying gene expression. However, the location of promoter-associated R-loops and the genomic domains they perturb to modify gene expression remain unclear. To resolve this issue, we developed a bisulfite-based approach, bisDRIP-seq, to map R-loops across the genome at near-nucleotide resolution in MCF-7 cells. We found that the location of promoter-associated R-loops depends on the presence of introns. In intron-containing genes, R-loops are bounded between the transcription start site and the first exon-intron junction. In intronless genes, the 3' boundary displays gene-specific heterogeneity. Moreover, intronless genes are often associated with promoter-associated R-loop formation. Together, these studies provide a high-resolution map of R-loops and identify gene structure as a critical determinant of R-loop formation.


Since bisDRIP-seq is a new technique, we lacked sufficient information to determine the appropriate number of samples. As such, information regarding appropriate sample size is not included in this submission.

Statistical reporting
• Statistical analysis methods should be described and justified
• Raw data should be presented in figures whenever informative to do so (typically when N per group is less than 10)
• For each experiment, you should identify the statistical tests used, exact values of N, definitions of center, methods of multiple test correction, and dispersion and precision measures (e.g., mean, median, SD, SEM, confidence intervals) and, for the major substantive results, a measure of effect size (e.g., Pearson's r, Cohen's d)
• Report exact p-values wherever possible alongside the summary statistics and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.
Please outline where this information can be found within the submission (e.g., page numbers or figure legends), or explain why this information doesn't apply to your submission: The number of control-treated bisDRIP-seq experiments performed is mentioned on pages 5 and 22, as well as the figure legends of figures 1-6. The number of triptolide-treated bisDRIP-seq experiments performed is mentioned on pages 22 and 27 and the figure legends of figures 2-5.
We clarify the nature of our replicates (biological vs technical) on page 22.
Determination of which Gencode transcription start sites to use in our full TSS list is described on pages 36-37. For promoter ranking, we excluded inactive promoters and removed promoters that had a bisDRIP-seq score of zero in one or more samples, as described on pages 40-41. For later metaplot analysis, inactive promoters were excluded for sense R-loop analysis, as described on pages 12 and 37. For intron-exon metaplot analysis, inclusion of exon-intron junctions, and of the associated transcription start sites, is described on pages 38-39.
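The promoter-ranking exclusions described above can be illustrated with a minimal sketch. This is not the analysis code used for the submission: the function name `keep_for_ranking`, the promoter identifiers, and the scores are all hypothetical, and the sketch assumes that each promoter has one bisDRIP-seq score per sample and that activity has already been decided elsewhere.

```python
def keep_for_ranking(scores_by_promoter, active_promoters):
    """Return the promoters that pass both exclusion rules.

    scores_by_promoter: dict mapping a promoter ID to a list of
        per-sample bisDRIP-seq scores (hypothetical layout).
    active_promoters: set of promoter IDs considered active.
    """
    kept = {}
    for promoter, scores in scores_by_promoter.items():
        if promoter not in active_promoters:
            continue  # rule 1: exclude inactive promoters
        if any(score == 0 for score in scores):
            continue  # rule 2: exclude promoters with a zero score in any sample
        kept[promoter] = scores
    return kept


# Hypothetical example data, for illustration only.
scores = {
    "promoter_A": [4.0, 2.5, 3.1],
    "promoter_B": [1.2, 0.0, 0.8],  # zero score in one sample
    "promoter_C": [5.5, 6.0, 4.9],  # not in the active set
}
active = {"promoter_A", "promoter_B"}

kept = keep_for_ranking(scores, active)
```

With these made-up inputs only `promoter_A` remains, since `promoter_B` has a zero score in one sample and `promoter_C` is not marked active.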
A private link to high-throughput sequence data is provided on page 43.
• Figure 3A: Wilcoxon signed-rank test, n = 13 samples, non-parametric test. Multiple-hypothesis test correction was performed using a Bonferroni approach, and initial p-values were multiplied by the total number of hypotheses (2001 nucleotide positions).
• Figure 3-figure supplement 2B: Wilcoxon signed-rank test, n = 3020 promoter regions, non-parametric test.
We also describe statistical tests on:
• page 7: Spearman's test using asymptotic t approximation, n = 78218 promoter regions, non-parametric test.
• pages 8-9: Spearman's test using asymptotic t approximation, n = 78218 promoter regions, non-parametric test.
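The Bonferroni correction stated for Figure 3A (each initial p-value multiplied by the number of hypotheses, 2001 nucleotide positions) can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis: the function name `bonferroni` and the raw p-values are hypothetical, and capping adjusted values at 1.0 is a common convention that the text does not mention.

```python
def bonferroni(p_values, n_hypotheses):
    """Multiply each raw p-value by the number of hypotheses tested.

    Adjusted values are capped at 1.0 (an assumption; the source only
    states that p-values were multiplied by the number of hypotheses).
    """
    return [min(1.0, p * n_hypotheses) for p in p_values]


N_POSITIONS = 2001  # nucleotide positions tested, as stated for Figure 3A

# Hypothetical raw p-values from per-position tests, for illustration only.
raw_p_values = [1e-6, 2.5e-5, 0.01]

adjusted = bonferroni(raw_p_values, N_POSITIONS)
significant = [p < 0.05 for p in adjusted]
```

With this many positions the correction is strict: a raw p-value must fall below roughly 0.05 / 2001 (about 2.5e-5) for the adjusted value to stay under 0.05.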
Additional data files ("source data")
• We encourage you to upload relevant additional data files, such as numerical data that are represented as a graph in a figure, or as a summary table
• Where provided, these should be in the most useful format, and they can be uploaded as "Source data" files linked to a main figure or table
• Include model definition files including the full list of parameters used
• Include code used for data analysis (e.g., R, MATLAB)
• Avoid stating that data files are "available upon request"
Please indicate the figures or tables for which source data files have been provided:
eLife Sciences Publications, Ltd is a limited liability non-profit non-stock corporation incorporated in the State of Delaware, USA, with company number 5030732, and is registered in the UK with company number FC030576 and branch number BR015634 at the address 1st Floor, 24 Hills Road, Cambridge CB2 1JP | August 2014
(1) Figure 1B is based on Source_data_file_1.xls
(2) In Figure 4, the genomic locations of exon-intron junctions are included in Source_data_file_2.txt
(3) Figure 5A and 5B and Table 1