DNA sequence encodes the position of DNA supercoils

The three-dimensional organization of DNA is increasingly understood to play a decisive role in vital cellular processes. Many studies focus on the role of DNA-packaging proteins, crowding, and confinement in arranging chromatin, but structural information might also be directly encoded in bare DNA itself. Here, we visualize plectonemes (extended intertwined DNA structures formed upon supercoiling) on individual DNA molecules. Remarkably, our experiments show that the DNA sequence directly encodes the structure of supercoiled DNA by pinning plectonemes at specific sequences. We develop a physical model that predicts that sequence-dependent intrinsic curvature is the key determinant of pinning strength and demonstrate this simple model provides very good agreement with the data. Analysis of several prokaryotic genomes indicates that plectonemes localize directly upstream of promoters, which we experimentally confirm for selected promotor sequences. Our findings reveal a hidden code in the genome that helps to spatially organize the chromosomal DNA.


Introduction
Control of DNA supercoiling is of vital importance to cells. Torsional strain imposed by DNA-processing enzymes induces supercoiling of DNA, which triggers large structural rearrangements through the formation of plectonemes (Vinograd et al., 1965). Recent biochemical studies suggest that supercoiling plays an important role in the regulation of gene expression in both prokaryotes (Le et al., 2013) and eukaryotes (Naughton et al., 2013;Pasi and Lavery, 2016). In order to tailor the degree of supercoiling around specific genes, chromatin is organized into independent topological domains with varying degrees of torsional strain (Naughton et al., 2013;Sinden and Pettijohn, 1981). Domains that contain highly transcribed genes are generally underwound whereas inactive genes are overwound (Kouzine et al., 2013). Furthermore, transcription of a gene transiently alters the local supercoiling (Kouzine et al., 2013;Naughton et al., 2013;Peter et al., 2004), while, in turn, torsional strain influences the rate of transcription (Chong et al., 2014;Liu and Wang, 1987;Ma et al., 2013).
For many years, the effect of DNA supercoiling on various cellular processes has mainly been understood as a torsional stress that enzymes should overcome or exploit for their function. More recently, supercoiling has been acknowledged as a key component of the spatial architecture of the genome (de Wit and de Laat, 2012;Dekker et al., 2013;Ding et al., 2014;Neuman, 2010). Here, bound proteins are typically viewed as the primary determinant of sequence-specific tertiary structures while intrinsic mechanical features of the DNA are often ignored. However, the DNA sequence influences its local mechanical properties such as bending stiffness, curvature, and duplex stability, which in turn alter the energetics of plectoneme formation at specific sequences (Dittmore et al., 2017;Irobalieva et al., 2015;Matek et al., 2015). Unfortunately, the relative importance of these factors that influence the precise tertiary structure of supercoiled DNA have remained unclear (Dekker and Heard, 2015). Various indications that the plectonemic structure of DNA can be influenced by the sequence were obtained from biochemical and structural studies (Kremer et al., 1993;Laundon and Griffith, 1988;Pfannschmidt and Langowski, 1998;Tsen and Levene, 1997) as well as from work performed in silico (Eslami-Mossallam et al., 2016;Pasi and Lavery, 2016;Wang et al., 2017). These studies suggested that plectonemes may get localized to highly curved or flexible segments of DNA. However, this examined only a handful of specific sequences such as phased poly(A)-tracts and a particular high-curvature sequence rich in poly(A)-tracts, making it difficult to determine if curvature, long poly(A)-tracts, or some other DNA feature drives the sequencestructure relationship.
Here, we study how DNA sequence governs the structure of supercoiled DNA by use of a recently developed single-molecule technique termed ISD (intercalation-induced supercoiling of DNA) (Ganji et al., 2016b), which uses intercalating dyes to induce supercoiling as well as to observe the resultant tertiary structures in many DNA molecules in parallel. Plectonemes are directly observable as intensity maxima along the DNA, from which their position along DNA can be extracted (see Figure 1a and Figure 1-figure supplement 1). We find a strong relationship between sequence and plectoneme localization. By examining many different sequences, we systematically rule out several possible mechanisms of the observed sequence dependence. Using a model built on basic physics, we show that the local intrinsic curvature determines the relative plectoneme stability at different sequences. Application of this model to sequenced genomes reveals a clear biological relevance, as we identify a class of plectonemic hot spots that localize upstream of prokaryotic promoters. Subsequently, we confirm that these sequences pin plectonemes in our singlemolecule assay, testifying to the predictive power of our model. We also discuss several eukaryotic genomes where plectonemes are localized near promoters with a spacing consistent with nucleosome positioning. Taken together, our experimental results and our physical model show a clear sequence-supercoiling relationship and indicate that genomic DNA encodes information for positioning of plectonemes, likely to regulate gene expression and contribute to the three-dimensional spatial ordering of the genome.

Results
Single-molecule visualization of individual plectonemes along supercoiled DNA To study the behavior of individual plectonemes on various DNA sequences, we prepared 20 kblong DNA molecules of which the end regions (~500 bp) were labelled with multiple biotins for surface immobilization (Figure 1-figure supplement 1a-b). The DNA molecule were flowed into streptavidin-coated sample chamber at a constant flow rate to obtain stretched double-tethered DNA molecules ( Figure 1a and Figure 1-figure supplement 1a). We then induced supercoiling by adding an intercalating dye, Sytox Orange (SxO), into the chamber and imaged individual plectonemes formed on the supercoiled DNA molecules. Notably, SxO does not have any considerable effect on the mechanical properties of DNA under our experimental conditions (Ganji et al., 2016b). Consistent with previous studies (Ganji et al., 2016b;van Loenhout et al., 2012), we observed dynamic spots along the supercoiled DNA molecule (highlighted with arrows in Figure 1b-top left and Video 1). These spots disappeared when DNA torsionally relaxed upon photo-induced nicking (Figure 1b-bottom left) (Ganji et al., 2016b), confirming that the spots were plectonemes induced by the supercoiling. Interestingly, the time-averaged fluorescence intensities of the supercoiled DNA were not homogeneously distributed along the molecule (Figure 1b DNA sequence favors plectoneme localization at certain spots along supercoiled DNA Upon observing the inhomogeneous fluorescence distribution along the supercoiled DNA, we sought to understand if the average plectoneme position is dependent on the underlying DNA sequence. We prepared two DNA samples; the first contained a uniform distribution of AT-bases while the second contained a strongly heterogeneous distribution of AT-bases (Figure 1c, template1 and template2, respectively). In order to quantitatively analyze the plectoneme distribution, we counted the average number of plectonemes over time at each position on the DNA molecules and built a position-dependent probability density function of the plectoneme occurrence (from now onwards called plectoneme density; see Materials and methods for details). The plectoneme density is normalized to its average value across the DNA such that a density value above one indicates that the region is a favorable position for plectonemes relative to other regions within the DNA molecule. For both DNA samples, we observed a strongly position-dependent plectoneme density ( Figure 1d). Strikingly, the plectoneme densities ( Figure 1d) were very different for the two DNA samples. This difference demonstrates that plectoneme positioning is directed by the underlying DNA sequence. Note that we did not observe any position dependence in the intensity profiles when the DNA was torsionally relaxed, indicating that the interaction of dye is not responsible for the dependence (Figure 1-figure supplement 2a).
The plectoneme kinetics showed a similar sequence dependence, as the number of events for nucleation and termination of plectonemes was also found to be position dependent with very different profiles for each DNA samples (Figure 1-figure supplement 2b). Importantly, at each position of the DNA, the number of nucleation and termination events were the same, showing that the system was at equilibrium. Because the aim of our study is to examine the sequence-structure relationship in supercoiled DNA, which is an equilibrium property, we focus on analyzing the plectoneme density profiles for a variety of sequences.
Systematic examination of plectoneme pinning at various putative DNA sequences We first considered a number of potential links between DNA sequence and plectoneme density. Note that in particular the sharply bent apical tips of plectonemes ( Figure 1A) create an energy barrier to plectoneme formation. This barrier could be reduced if the DNA was able to locally melt or kink, if a specific region of DNA was more flexible than others, or if the DNA sequence was intrinsically curved already before the plectoneme formed. Because all of these properties (duplex stability, flexibility, and curvature) are influenced by the AT-content, we first examined the relationship between AT-content and the measured plectoneme densities in Figure 1c-d. Indeed, the plectoneme density showed a weak correlation with the local AT-percentage (R = 0.33, Figure 2-figure supplement 1a).
In order to unambiguously link changes in plectoneme density to specific sequences of arbitrary size, we developed an assay where we inserted various short DNA segments carrying particular sequences of interest in the middle of the homogeneous template1 ( Figure 2a and Figure 2-figure supplement 1b). This allowed us to easily determine the influence of the inserted sequence on plectoneme formation by measuring changes in the plectoneme density at the insert relative to the rest of the DNA strand. We examined three different AT-rich inserts: seqA, seqB, and seqC with~60%,~65%, and~60% AT, respectively ( Figure 2a). Interestingly, all three samples showed a peak in the plectoneme density at the position of insertion, further supporting the idea that AT-rich sequences are preferred positions for plectonemes ( Figure 2b). Furthermore, when we shortened or lengthened one AT-rich sequence (seqA), we found that the probability of plectoneme pinning (i.e. the area under the peak) scaled with the length of the AT-rich fragment (Figure 2-figure supplement 1b-e). Overall, these results suggest that plectoneme preferentially form at AT-rich regions.
However, it is clear that AT-content alone cannot be the only factor that sets the plectoneme pinning. For example, the right-end of template1 exhibits a region that pins plectonemes strongly (Figure 1d-top, arrow), even though this region is not particularly AT-rich ( Figure 1c). When we inserted a 1 kb copy of this pinning region into the middle of template1 (Figure 2c, 'seqCopy'), we observed an additional peak in plectoneme density (Figure 2d, green). Given that this region had the same total AT-content as the surrounding DNA, we hypothesized that the particular distribution of A and T bases may be more important than the total AT-content alone. In particular, poly(A)tracts influence the local mechanical properties of DNA and might be responsible for the plectoneme pinning, as suggested by early studies (Kremer et al., 1993;Pfannschmidt and Langowski, 1998;Tsen and Levene, 1997). To test this, we removed all poly(A) tracts of length four or higher by replacing alternative A-bases with G or C-bases in seqCopy (Figure 2c, 'A-G mutation'). Upon this change, the peak in the plectoneme density indeed disappeared (Figure 2d, blue). However, when we instead disrupted the poly(A)!4 tracts by replacing them with alternating AT-stretches Video 1. A representative real-time fluorescence image of a supercoiled DNA molecule that shows dynamic bright spots upon plectoneme formation. At 20 s after acquisition, the DNA was torsionally relaxed due to photo-induced nicking. DOI: https://doi.org/10.7554/eLife.36557.005 ( Figure 2c, 'A-T mutation'), we, surprisingly, did observe strong pinning (Figure 2d, red), establishing that plectoneme pinning does not strictly require poly(A)-tracts either. Hence, instead of poly(A)tracts, it could be possible that stretches consisting of either A and T ('poly(A/T)-tracts') induce the plectoneme pinning. To test this hypothesis, we re-examined the seqB construct to test if long stretches of 'weak' bases (i.e. A or T) were the source of pinning. Here, we broke up all poly(A/T)!4 tracts (i.e. all linear stretches with a random mixture of A or T bases but no G or C bases) by shuffling bases within the seqB insert while keeping the overall AT-content the same. This eliminated plectoneme-pinning, consistent with the idea that poly(A/T) tracts were the cause (  TAT TAT TAT TAT TAT TAT TAT TAT TAT TAT TAT TAT TAT   template 1 seqB seqA seqC presence of poly(A/T) stretches, but instead is dependent on the relative positions of these stretches. Taken together, this systematic exploration of various sequences showed that although pinning correlates with AT-content, we cannot attribute this correlation to AT-content alone, to poly(A)tracts, or to poly(A/T)-tracts. Our data instead suggest that plectoneme pinning depends on a local mechanical property arising from the combined effect of the entire base sequences in a local region, and our shuffled poly(A/T) constructs suggest this property must be measured over distances greater than tens of nucleotides. Among the three mechanical properties we first considered, duplex stability, flexibility, and curvature, the duplex stability is unlikely to be a determinant factor for the plectoneme pinning because duplex stability is mostly determined by the overall AT/GC percentage rather than the specific distribution of bases in the local region.
Intrinsic local DNA curvature determines the pinning of supercoiled plectonemes To obtain a more fundamental understanding of the sequence specificity underlying the plectoneme pinning, we developed a novel physical model based on intrinsic curvature and flexibility for estimating the plectoneme energetics (see Materials and methods for details). Notably, the major energy cost for making a plectoneme is spent in inducing a strong bend within the DNA in the plectoneme tip region. Our model estimates the energy cost associated with bending the DNA into the highly curved (~240˚arc) plectoneme tip (Marko and Neukirch, 2012). For example, at 3pN of tension (characteristic for our stretched DNA molecules), the estimated size of the bent tip is 73 bp, and the energy required to bend it by 240˚is very sizeable,~18 k BT (Figure 3a-b). However, if a sequence has a high local intrinsic curvature or flexibility, this energy cost decreases significantly. For example, an intrinsic curvature of 60˚between the two ends of a 73 bp segment would lower the bending energy by a sizable amount,~8 k B T. Hence, we expect that this energy difference drives plectoneme tips to pin at specific sequences. We calculated local intrinsic curvatures at each segment along a relaxed DNA molecule using published dinucleotide parameters for tilt/roll/twist ( Figure 3a and supplementary file 1) (Balasubramanian et al., 2009). The local flexibility of the DNA was estimated by summing the dinucleotide covariance matrices for tilt and roll (Lankas et al., 2003) over the length of the loop. Using this approach, we estimate the bending energy of a plectoneme tip centered at each nucleotide along a given sequence ( Figure 3b). The predicted energy landscape is found to be rough with a standard deviation of about~1 k B T, in agreement with a previous experimental estimate based on plectoneme diffusion rates (van Loenhout et al., 2012). We then used these bending energies to assign Boltzmann-weighted probabilities, P B ¼ exp À

Eloop kBT
, for plectoneme tips centered at each base on a DNA sequence. This provided theoretically estimated plectoneme densities as a function of DNA sequence. Note that we obtained these profiles without any adjustable fitting parameters as the tilt/roll/twist and flexibility values were determined by dinucleotide parameters adopted from published literature. Although both intrinsic curvature and flexibility were included, the model predicts that the flexibility is unimportant and that intrinsic curvature clearly is the dominant factor in positioning plectonemes ( Figure 3c). The predicted plectoneme densities (Figure 3d and Figure 3-figure supplement 1) are generally found to be in very good agreement with the measured plectoneme densities. For example, the non-intuitive mutant sequences tested above (A-G and A-T mutations) are faithfully predicted by the model (Figure 2d and Figure 3d). More generally, we find that the model qualitatively represented the experimental data for the large majority of the sequences that were tested (Figure 3-figure supplement 1). The simplicity of the model and the lack of fitting parameters make this agreement all the more striking. Only occasionally, we find that the model is too conservative, that is while it performs well in avoiding false positives, it suffers from some false negatives (Figure 3-figure supplement 1, SeqA, SeqB, and SeqC), possibly because of an insufficient accuracy in the dinucleotide parameters that we adopted from the literature. For example, different dinucleotide parameter sets from the currently available literature produce variations in the model predictions (Figure 3-figure supplement 2). Alternative explanations for the false negatives are also possible, for example that the local curvature is influenced by interactions spanning beyond nearest-neighbor nucleotides, or some unknown DNA sequences that stabilize twist rather than strand writhing or that are prone to base-flipping even in the positive supercoiling regime.
Position ( Schematics showing the energy required to bend a rigid elastic rod as a simple model for the tip of a DNA plectoneme. (c) Plectoneme density prediction based on intrinsic curvature and/or flexibility for seqCopy. Predicted plectoneme densities calculated based on either DNA flexibility (blue), only curvature (red), or both (black). Combining flexibility and curvature did not significantly improve the prediction comparing to that solely based on DNA curvature. (d) Predicted plectoneme densities for the DNA constructs carrying a copy of the end peak and its mutations, as in Figure 2b. Note the excellent correspondence to the experimental data in Figure 2b.
(e-f) Predicted (e) and measured (f) plectoneme density of a synthetic sequence (250 bp) that is designed to strongly pin a plectoneme. Raw data from the model are shown in black and its Gaussian-smoothed (FWHM = 1600 bp) is Figure 3 continued on next page As a test of the predictive power of our model, we designed a 250 bp-long sequence ('curved250') for which our model a priori predicted a high local curvature and strong plectoneme pinning (Figure 3e). When we subsequently synthesized and measured this construct, we indeed observed a pronounced peak in the plectoneme density (Figure 3f, blue). By contrast, when we constructed a 500 bp-long flat sequence without strongly curved regions ('flat500'), the model predicted no such peak, which again was verified experimentally (Figure 3f, black). These data demonstrate that the model can be used to identify potential plectoneme pinning sites in silico. Perhaps most strikingly, we found that a single highly curved DNA sequence of only 75 bp length was able to pin plectonemes (Figure 3g), consistent with the approximated tip loop size in our physical model (~73 bp). As a negative control, we did not observe any such pinning when we inserted a 75 bp-long flat DNA sequence (Figure 3h).
Finally, we wanted to verify that the intrinsic curvature, and not the GC/AT content, is the major determinant for plectoneme formation. Given that the earlier examples in Figure 2f clearly showed that some but not all AT-rich sequences can pin plectonemes, we designed some specifically GCrich (i.e., AT-poor) sequences that should pin plectonemes. Because of the distribution of wedge angles available, GC-rich sequences tend to produce less intrinsic curvature over >10 bp sequences. To generate plectoneme pinning at a GC-rich sequence, we therefore inserted 8 repeats of a 75 bplong GC-rich (~60%) insert in the middle of the flat500 sequence. As predicted by the model, the experimental data for this GC-rich curved sequence showed plectoneme pinning (Figure 3i), once more confirming that intrinsic curvature and not AT/GC content is the major determinant for plectoneme pinning.

Transcription start sites localize plectonemes in prokaryotic genomes
Given the success of our physical model for predicting plectoneme localization, it is of interest to examine if the model identifies areas of high plectoneme density in genomic DNA that might directly relate to biological functions. Given that our model associates plectoneme pinning with high curvature, we were particularly interested to see what patterns might associate with specific genomic regions. For example, in prokaryotes, curved DNA has been observed to localize upstream of transcription start sites (TSS) (Kanhere and Bansal, 2005;Olivares-Zavaleta et al., 2006;Pérez-Martín et al., 1994). In eukaryotes, curvature is associated with the nucleosome positioning sequences found near promoters (Tompitak et al., 2017). However, given that our model requires highly curved DNA over long lengths of~73 bp to induce plectoneme pinning, it was a priori unclear if the local curvature identified at promoter sites is sufficient to strongly influence the plectoneme density.
We first used the model to calculate the plectoneme density profile for the entire E. coli genome, revealing plectonemic hot spots spread throughout the genomic DNA (Figure 4a). Interestingly, we find that a substantial fraction of these hot spots are localized~100 nucleotides upstream of all the transcription start sites (TSS) associated with confirmed genes in the RegulonDB database (Figure 4b, red) (Gama-Castro et al., 2016). We then performed a similar analysis of several other prokaryotic genomes (Figure 4b) (Cortes et al., 2013;Irla et al., 2015;Papenfort et al., 2015;Zhou et al., 2015). We consistently observe a peak upstream of the TSS, but the size of the peak varied substantially between species, indicating that different organisms rely on sequence-dependent plectoneme positioning to different extents. In one organism (C. crescentus), the signal was shown in blue in the left panel. Plectoneme densities measured from individual DNA molecules carrying the synthetic sequence (thin grey lines) and their averages (thick blue line) are shown in the right panel. (n = 37, and 21 for curved250, and flat500, respectively) (g-h) Model-predicted (upper panels) and experimentally measured (bottom panels) plectoneme densities of 75 bp-long highly curved (g) and flat (h) inserts. (i) Model-predicted (upper panels) and experimentally measured (bottom panels) plectoneme densities of curved GC-rich sequences. (n = 36, 26, 21, 20, 52, and 29 for curve75-1, curve75-2, flat75-1, flat75-2, GCcurve1, and GCcurve2, respectively). DOI: https://doi.org/10.7554/eLife.36557.008 The following figure supplements are available for figure 3: too weak to detect at all. To experimentally confirm that these sequences represent plectonemic hot spots, we inserted two of these putative plectoneme-pinning sites from E. coli into template1. Gratifyingly, we indeed observed a strong pinning effect for these sequences in our single-molecule assay (Figure 4c-d).
Finally, we extended our analysis to eukaryotic organisms. Again we found plectonemic hotspots that were spread throughout the genome (Figure 4e). When averaging near the TSS (Dreos et al., 2017), we found a diverse range of plectoneme positioning signals (Figure 4f). While one organism (S. cervisiae) showed no detectable plectoneme positioning, most organisms showed both peaks and valleys indicating plectonemes were enriched but also depleted at different regions around the promoter. The features showed a weak periodicity consistent with the reported nucleosome repeat lengths (~150-260 bp) (Jiang and Pugh, 2009).

Discussion
In this study, we reported direct experimental observations as well as a novel basic physical model for the sequence-structure relationship of supercoiled DNA. Our single-molecule ISD technique allowed a systematic analysis of sequences that strongly affect plectoneme formation. To explain the underlying mechanism, we developed a physical model that predicts the probability of plectoneme  pinning, based solely on the intrinsic curvature and the flexibility of a local region of the DNA. In the positive supercoiling regime (where no partial duplex melting is expected for the physiological range of tensions and torques), we identified the intrinsic curvature over a~70 bp range as the primary factor that determines plectoneme pinning, while the flexibility alters the mechanics only minimally. Examining full genomes, we found that plectonemes are enriched at promoter sequences in E. coli and other prokaryotes, which suggests a role of genetically encoded supercoils in cellular function. Our findings reveal how a previously unrecognized 'hidden code' of intrinsic curvature governs the localization of local DNA supercoils, and hence the organization of the three-dimensional structure of the genome.
For a long time, researchers wondered whether DNA sequence may influence the plectonemic structure of supercoiled DNA. Structural and biochemical approaches identified special sequence patterns such as poly(A)-tracts that indicated plectoneme pinning (Kremer et al., 1993;Laundon and Griffith, 1988;Pfannschmidt and Langowski, 1998;Tsen and Levene, 1997). These early studies suggested that highly curved DNA can pin plectonemes, but the evidence was anecdotal and restricted to a handful of example sequences and it was not possible to establish a general rule for sequence-dependent plectoneme formation. Our high-throughput ISD assay, however, generated ample experimental data that enabled a comprehensive understanding of the underlying mechanism of the sequence-dependent plectoneme pinning.
Our physical modeling reveals that intrinsic curvature is the key structuring factor for determining the three-dimensional structure of supercoiled DNA. In contrast, although perhaps counter-intuitive, we found that the local flexibility is hardly relevant for plectoneme localization. Although highly flexible mismatched single-stranded regions have been shown to be able to act as a preferential position for plectoneme formation (Dittmore et al., 2017;Ganji et al., 2016b), the variations in the flexibility of duplex DNA due to sequence differences seem to produce very minor changes in the pinning probability.
Remarkably, although only the energy required to form the limited tip-loop region of~73 bp is considered in our modeling, the model is capable of strikingly good qualitative predictions. In occasional cases, the model failed to reproduce the experimental results, giving some false negative predictions. A full statistical mechanical modeling of the plectonemic structures distributed across the DNA molecule should further improve the predictive power and accuracy, but will require significant computational resources and time.
Significant intrinsic curvatures are encoded in genomic DNA, as evident in our scans of both prokaryotic and eukaryotic genomes, which indicates its biological relevance. In support of this idea, an in silico study indeed suggested that curved prokaryotic promoters may control gene expression (Gabrielian et al., 1999). Moreover, early in vivo studies showed that curved DNA upstream to the promoter site affects gene expression levels (Collis et al., 1989;McAllister and Achberger, 1989). These in vivo studies suggested that curved DNA facilitates binding of RNA polymerase, an idea that is further supported by sharply bent DNA structures found around bound RNAP (Rees et al., 1993;Tahirov et al., 2002;ten Heggeler and Wahli, 1985;Yin and Steitz, 2002). In addition to this direct interaction of RNA polymerase and curved DNA, our results suggest an indirect effect, as the same curved DNA can easily pin a plectoneme that can further regulate the transcription initiation and elongation by structural re-arrangement of the promotor and coding regions.
Our analysis of prokaryotic genomes indicates that promoter sequences have evolved local regions with highly curved DNA that promote the localization of DNA plectonemes at these sites. There may be multiple reasons for this. For one, it may help to expose these DNA regions to the outer edge of the dense nucleoid, making them accessible to RNAP, transcription factors, and topoisomerases. Plectonemes may also play a role in the bursting dynamics of gene expression, since each RNAP alters the supercoiling density within a topological domain as it transcribes (Chong et al., 2014;Kouzine et al., 2013), adding or removing nearby plectonemes (Liu and Wang, 1987). In addition, by bringing distant regions of DNA close together, plectonemes may influence specific promoter-enhancer interactions to regulate gene expression (Benedetti et al., 2014). Finally, plectoneme tips may help RNA polymerase to initiate transcription, since the formation of an open complex also requires bending of the DNA (ten Heggeler-Bordier et al., 1992), a mechanism that was proposed as a universal method of regulating gene expression across all organisms (Travers and Muskhelishvili, 2007). The ability of our model to predict how mutations in the promoter sequence alter the plectoneme density opens up a new way to test these hypotheses.
Our analysis of eukaryotic genomes showed a greater diversity of behavior. The spacing of the peaks suggests that plectonemes may play a role in positioning nucleosomes, consistent with proposals that nucleosome positioning may rely on sequence-dependent signals near promoters (Travers et al., 2010). It is also broadly consistent with the universal topological model of plectoneme-RNAP interaction at promoters (Travers and Muskhelishvili, 2007), which proposes that the plectoneme tip forming upstream of the TSS in eukaryotes is positioned by nearby nucleosomes. The plectoneme signal encoded by intrinsic curvature could therefore indirectly position the promoter plectoneme tip by helping to organize these nearby nucleosomes.
In our study, we investigated the sequence-dependent behavior of plectonemes in a positively supercoiled state, although the technique can be extended to study negative supercoiling as well. For negative supercoils, plectoneme pinning can be influenced by both sequence-induced local curvature and local melting, which are hard to disentangle. Furthermore, although theoretical methods have been developed for the sequence dependence of the duplex stability of negatively supercoiled DNA (Benham, 1990;Benham, 1992), torsion-induced melting has been shown to exhibit complicated properties (Vlijm et al., 2015). The model that we have developed for positive supercoils should not be very sensitive to the handedness of supercoiling, since the dinucleotide curvature parameters are not strongly perturbed at these torques. We therefore expect the model to also capture curvature-dependent effects on pinning of negative plectonemes too.
The above findings demonstrate that DNA contains a previously hidden 'code' that determines the local intrinsic curvature and consequently governs the locations of plectonemes. These plectonemes can organize DNA within topological domains, providing fine-scale control of the threedimensional structure of the genome (Le et al., 2013). The model and assay described here make it possible both to predict how changes to the DNA sequence will alter the distribution of plectonemes and to investigate the DNA supercoiling behavior at specific sequences empirically. Using these tools, it will be interesting to explore how changes in this plectoneme code affect levels of gene expression and other vital cellular processes.

Preparation of DNA molecules of different sequences
Full sequences of all DNA molecules are given in Supplementary file 2. All DNA molecules except 'template 2' in Figure 1 were prepared by ligating four or five DNA fragments, respectively: 1) 'Cy5biotin handle', 2) '8.4 kb fragment', [3) 'Sequence of Interest',] 4) '11.2 kb fragment', and 5) 'biotin handle' (Figure 1-figure supplement 1b). The 'Cy5-biotin handle' and 'biotin handle' were prepared by PCR methods in the presence of Cy5-modified and/or biotinylated dUTP (aminoallyl-dUTP-Cy5 and biotin-16-dUTP, Jena Bioscience). The '8kb-fragment' and '11 kb fragment' were prepared by PCR on Unmethylated Lambda DNA (Promega). These fragments were cloned into pCR-XL using the TOPO XL PCR cloning kit (Invitrogen) generating pCR-XL-11.2 and pCR-XL-8.4 (Ganji et al., 2016b). The fragments were PCR amplified and then digested with BsaI restriction enzyme, respectively (Supplementary file 3). The 'Sequence of Interest' was made by PCR on different templates. Template two in Figure 1C-black and 1e was made from a digested fragment of an engineered plasmid pSuperCos-l1,2 with XhoI and NotI-HF (van Loenhout et al., 2012). The digested fragment was further ligated with biotinylated PCR fragments on XhoI side and a biotinylated-Cy5 PCR fragment on the NotI-HF (Supplementary file 4). All the DNA samples were gel-purified before use.

Dual-color epifluorescence microscopy
Details of our experimental setup are described previously (Ganji et al., 2016a;Ganji et al., 2016b). Briefly, a custom-made epifluorescence microscopy equipped with two lasers (532 nm, Samba, Cobolt and 640 nm, MLD, Cobolt) and an EMCCD camera (Ixon 897, Andor) is used to image fluorescently labeled DNA molecules. For the wide-field, epifluorescence-mode illumination on the sample surface, the two laser beams were collimated and focused at the back-focal plane of an objective lens (60x UPLSAPO, NA 1.2, water immersion, Olympus). Back scattered laser light was filtered by using a dichroic mirror (Di01-R405/488/543/635, Semrock) and the fluorescence signal was spectrally separated by a dichroic mirror (FF635-Di02, Semrock) for the SxO channel and Cy5 channel. Two band-pass filters (FF01-731/137, Semrock, for SxO) and FF01-571/72, Semrock, for Cy5) were employed at each fluorescence channel for further spectral filtering. Finally, the fluorescence was imaged on the CCD camera by using a tube lens (f = 200 mm). All the measurements were performed at room temperature.

Intercalation-induced supercoiling of DNA (ISD)
A quartz slide and a coverslip were coated with polyethlyleneglycol (PEG) to suppress nonspecific binding of DNA and SxO. 2% of the PEG molecules were biotinylated for the DNA immobilization. The quartz slide and coverslip were sandwiched with a double-sided tape such that a 100 mm gap between the slide and coverslip forms a shallow sample chamber with flow control. Two holes serving as the inlet and outlet of the flow were placed on the slide glass. Typically, a sample chamber holds 10 ml of solution.
Before DNA immobilization, we incubated the biotinylated PEG surface with 0.1 mg/ml streptavidin for 1 min. After washing unbound streptavidin by flowing 100 ml of buffer A (40 mM TrisHCl pH 8.0, 20 mM NaCl, and 0.2 mM EDTA), we flowed the end-biotinylated DNA diluted in buffer A into the sample chamber at a flow rate of 50 ml/min. The concentration of the DNA (typically~10 pM) was empirically chosen to have an optimal surface density for single DNA observation. Immediately after the flow, we further flowed 200 ml of buffer A at the same flow rate, resulting in stretched, doubly tethered DNA molecules (Figure 1a and Figure 1-figure supplement 1a) of which end-to-end extension can be adjusted by the flow rate. We obtained the DNA lengths of around 60-70% of its contour length (Figure 1-figure supplement 2a), which corresponds to a force range of 2-4 pN (Ganji et al., 2016b). We noted that SxO does not exhibit any sequence preference when binding to relaxed DNA, allowing us to back out the amount of DNA localized within a diffraction-limited spot from the total fluorescence intensity.
After immobilization of DNA, we flowed in 30 nM SxO (S11368, Thermo Fisher) in an imaging buffer consisting of 40 mM Tris-HCl, pH 8.0, 20 mM NaCl, 0.4 mM EDTA, 2 mM trolox, 40 mg/ml glucose oxidase, 17 mg/ml catalase, and 5% (w/v) D-dextrose. Fluorescence images were taken at 100 ms exposure time for each frame. The 640 nm laser was used for illuminated for the first 10 frames (for Cy5 localization), followed by continuous 532 nm laser illumination afterwards. From our previous study, we noted that SxO locally unwinds DNA and extends the contour length (Figure 1-figure supplement 1a), but does not otherwise affect the mechanical properties of the DNA (Ganji et al., 2016b). Based on the same previous work and assuming that each intercalating dye reduces the twist at the local dinucleotide to zero, we estimate that roughly 1 SxO is bound on every 26 base-pairs of DNA. We note that the numbers of plectoneme nucleation and termination events along supercoiled DNA were equal (Figure 1-figure supplement 2b), which is characteristic of a system at equilibrium. Furthermore, we verified that increasing the NaCl concentration from 20 mM to 150 mM NaCl did not result in any significant difference in the observed plectoneme density results, indicating that the plectoneme density is not dependent on the ionic strength ( Figure 2figure supplement 1f).

Data analysis
Analysis of the data was carried out using custom-written Matlab routines, as explained in our previous report (Ganji et al., 2016b). Briefly, we averaged the first ten fluorescence images to determine the end positions of individual DNA molecules. We identify the direction of the DNA molecules by 640 nm illumination at the same field of view, which identifies the Cy5-labelled DNA end. Then, the fluorescence intensity of the DNA at each position along the length was determined by summing up 11 neighboring pixels perpendicular to the DNA at that position. The median value of the pixels surrounding the molecule was used to correct the background of the image. The resultant DNA intensity was normalized to the total intensity sum of the DNA for each frame to compensate for photobleaching of SxO. We recorded more than 300 frames, each taken with a 100 msec exposure time, and built an intensity kymograph by aligning the normalized intensity profiles in time. Supercoiled DNA intensity profiles, that is single lines in the intensity kymograph, were converted to DNA-density profiles by comparing the intensity profile of supercoiled DNA to that of the corresponding relaxed DNA. Specifically, the ratio between the cumulative intensities of all the pixels in the right and the left-hand sides of each position of the DNA was first determined. To find the genomic position (i.e. base pair position) of the peak, we compared this ratio with that obtained after torsional relaxation of the molecule of which the pixel position is the same with the genomic position under the given constant tension (Ganji et al., 2016b). The torsionally relaxed intensity profile was obtained after the plectoneme measurements by increasing the excitation laser power that yielded a photo-induced nick of the DNA.
The position of a plectoneme is identified by applying a threshold algorithm to the DNA density profile. A median of the entire DNA density kymograph was used as the background DNA density. The threshold for plectoneme detection was set at 25% above the background DNA density. Peaks that sustain at least three consecutive time frames (i.e.,!300 ms) were selected as plectonemes. After identifying all the plectonemes, the probability of finding a plectoneme at each position (250 bp-long segment) along the DNA in base-pair space was calculated by counting the total number of plectonemes at each position (segment) divided by the total observation time. The probability density is then further normalized to its mean value across the DNA molecule to build a plectoneme density. Note that the plectoneme density represents the relative propensity of plectoneme formation at different regions within a DNA molecule, which is insensitive to the length of the DNA as well as the linking number. Typically, more than 20 DNA molecules were measured for each DNA sample and the averaged plectoneme densities were calculated with a weight given by the observation time of each molecule. The analysis code written in Matlab (The MathWorks, Inc.) is freely available from GitHub (Kim, 2018; copy archived at https://github.com/elifesciences-publications/Plectoneme_ analysis).

Plectoneme tip-loop size estimation and bending energetics
An important component of our model is to determine the energy involved in bending the DNA at the plectoneme tip. We first estimate the mean size of a plectoneme tip-loop from the energy stored in an elastic polymer with the same bulk features of DNA. For the simplest case, we first consider a circular loop (360˚) formed in DNA under tension. The work associated with shortening the end-to-end length of DNA to accommodate the loop is where F is the tension across the polymer, r is the base pair rise (0.334 nm for dsDNA), and N is the number of base pairs. The bending energy is where k B is the Boltzmann constant, A is the bulk persistence length (50 nm for dsDNA). Hence, we obtain an expression for the total energy: Taking the derivative of E total with respect to N and setting it to zero gives the formula: Here, the values of the constants are: So, at 3 pN we get: If the loop at the end of the plectoneme is held at the same length but only bent to form a partial circle, the work needed to accommodate the loop will remain the same but the bending energy will be lower, scaling quadratically with the overall bend angle. For a plectoneme tip, a 240˚loop is sufficient to match the angle of the DNA in the stem of the plectoneme. The preferred length of a 240l oop is therefore: where: Physical model predicting the plectoneme density A full model must explicitly account for the fact that DNA is not a homogeneous polymer. Instead, each DNA sequence has (1) intrinsic curvature and (2) a variable flexibility. Both 1 and 2 depend on the dinucleotide sequences at each location. Note also that we can bend the DNA along any vector normal to the path of the DNA, which describes a circle spanning the full 360˚surrounding the DNA strand. We must therefore specify the direction of bending f when calculating the bend energy, and we define f = f B to be the bend direction that aligns with the intrinsic curvature. The intrinsic curvature can be estimated from the dinucleotide content of the DNA (Figure 3a). Several studies have attempted to measure the optimal set of dinucleotide parameters (i.e. tilt, roll, and twist) that most closely predict actual DNA conformations (Balasubramanian et al., 2009;Bolshoy et al., 1991;Morozov et al., 2009;Olson et al., 1998). We find that the parameter set by Balasubramanian et al., produces the closest match to our experimental data when plugged into our model (Balasubramanian et al., 2009). Using these parameters (see Supplementary file 1), we first calculate the winding ground state path traced out by the entire DNA strand. We then determine the intrinsic curvature, (N,i), across a given stretch of N nucleotides centered at position i on the DNA by comparing tangent vectors at the start and end of that stretch. Tangent vectors are calculated over an 11 bp window (one helical turn,~3.7 nm). Note that the intrinsic curvature, defined by (N,i), also determines the preferred bend direction f B .
The flexibility of the DNA also varies with position. The flexibility of the tilt and roll angles between neighboring dinucleotide has been estimated by MD simulations (Lankas et al., 2003). Using these numbers, we can add the roll-tilt covariance matrices for a series of nucleotides (each rotated by the twist angle) to calculate the local flexibility of a given stretch of DNA. The flexibility also depends on the direction of bending. The summed covariance matrix allows us to estimate a local persistence length A(N,I,f).
By combining the local bend angle (N,i) and the local persistence length A(N,I,f), we are now able to calculate the energy needed to bend a given stretch of DNA to 240˚. When the DNA is bent in the preferred curvature direction, this bending energy becomes: More generally, we can bend the DNA in any direction, in which case the bending energy can be calculated using the law of cosines: N; i ð Þ 240 8 > : The first formula is the special case when f = f B . Because both A(N,i, f) and (N,i) are sequence dependent, the loop size and bend direction that minimizes the free energy will also be sequence dependent. Rather than trying to find the parameters that give a maximum likelihood at each position along the template, we find that it is more efficient to calculate the relative probabilities of loops spanning a range of sizes and bend directions. We first calculate the energy associated with each loop using: We then assign each of these bending conformations a Boltzmann weight: Finally, we sum over all the different bending conformations to get the total weight assigned to the formation of a plectoneme at a specific location i on the template: Because the direction f is a continuous variable and the length of the loop can range strongly, there are a very large number of bending conformations to sum over. However, because of the exponential dependence on energy, only conformations near the maximum likelihood value in phase space will contribute significantly to the sum. For an isotropic DNA molecule, the maximum likelihood should occur at N = 73 and f = f B . We therefore sum over parameter values that span this point in phase space. Our final model sums over eight bending directions (i.e. at every 45˚, starting from f = f B ) and calculates loop sizes over a range from 40 bp to 120 bp at 8 bp increments. We verified that the predictions of the model were stable if we increased the range of the loop sizes considered or increased the density of points sampled in phase space, implying that the range of values used was sufficient.
The following previously published datasets were used: