Long repeating (TTAGGG)n single stranded DNA self-condenses into compact beaded filaments stabilized by G-quadruplex formation

Conformations adopted by long stretches of single stranded DNA (ssDNA) are central to the architecture of replication forks, R loops, and other structures generated during DNA metabolism in vivo. This is particularly so if the ssDNA consists of short nucleotide repeats. Studies of such conformations have been hampered by the lack of defined substrates greater than ~150 nt and the absence of high-resolution biophysical approaches. Here we describe the generation of very long ssDNA consisting of the mammalian telomeric repeat (5'-TTAGGG-3')n as well as the interrogation of its structure by electron microscopy (EM) and single molecule magnetic tweezers (smMT). This repeat is of particular interest as it contains a run of 3 contiguous guanine residues


INTRODUCTION
DNA replication, transcription, recombination and repair all involve the generation of single stranded DNA (ssDNA) segments (1-3). In some cases, such as resection of DNA ends during repair or creation of an uncoupled replication fork, these ssDNA segments could span several hundred nucleotides or more (4). Mixed sequence ssDNA segments are assumed to behave as random coils with some (weak) secondary structure due to local base pairing (5,6). However, if the ssDNA is long and consists of arrays of nucleotide repeats that can fold into macroscopic structures, this could profoundly affect subsequent events of DNA metabolism. One well known example is the trinucleotide repeats linked to the triplet repeat diseases, in which it is thought that as the replication fork passes these repeats, the ssDNA can fold into long hairpin structures and generate repeat expansions. The telomeric hexanucleotide repeat (5'-TTAGGG-3') found in all mammals and many eukaryotes (7) provides another example of a repeating sequence with the potential to fold into large macroscopic structures when present in a ss state. The telomeric repeat is a member of a subset of repetitive DNAs, found widely across nature, that contain runs of 3 to 4 contiguous guanine bases and that, as ssDNA, have the ability to form G-quadruplexes (8). Other examples include the replication origin of Kaposi's Sarcoma-Associated Herpesvirus (KSHV), which contains 26 and 22 short repeats each with 3-4 G's (9), as well as the C9orf72 gene locus, in which expansion of 5'-GGGGCC-3' repeats from 2-24 in the genome to sometimes several thousand copies is the cause of the most common form of familial amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (10). G-quadruplexes form when runs of two or more guanine residues in a DNA stack on each other in a square planar structure stabilized by Hoogsteen base pairing (8,11,12). They are further stabilized by certain cations, in particular K+ (13,14).
The identification of telomeric G-quadruplexes in vitro was first reported in 1989 (15,16).
The existence of G-quadruplexes in vivo was initially recognized by immunohistochemistry (IHC) in the telomeric DNA of the ciliate Stylonychia lemnae (17,18) and later throughout the human genome (19). Only a portion of the human cellular G-quadruplex IHC signal was detected at telomeres, consistent with the presence of multiple and likely different repeat sequences throughout the genome capable of forming G-quadruplexes (19).
There has been growing interest in the potential role of G-quadruplex formation at the ssDNA termini of mammalian and many other eukaryotic telomeres, in which the G-rich strand extends beyond the double strand DNA (dsDNA)-ssDNA junction as a 3'-ss overhang. In humans, this ssDNA extension is ~150-300 nt in length (7), and studies have pointed to the formation of G-quadruplexes in the overhang and their role in inhibiting telomerase as well as regulating binding and action by the shelterins (20).
A 24 nt ssDNA consisting of (5'-TTAGGG-3')4 has been crystallized as a model for the telomeric overhang. In the resulting structure the DNA is arranged into a flat disk 0.6 nm in height and 4 nm on a side, with the DNA adopting a parallel G-quadruplex conformation and with TTA triplets looping out in a propeller fashion (21). However, NMR studies and other solution approaches have argued that the 24 nt units exist in hybrid or anti-parallel conformations, suggesting that the parallel arrangement may be promoted by crystallization (22). Potential condensation of G-quadruplex folds into larger structures has been proposed but not experimentally demonstrated (23).
Telomeric repeats also exist in extremely long nucleic acid chains in the form of G-rich transcripts of telomeres termed TERRA (5'-UUAGGG-3')n. In human cells TERRA molecules of nearly 9000 nt in length have been observed (24,25). We previously demonstrated that TERRA is arranged into chains of 24 nt particles joined by 3 nt linkers and provided evidence that these particles are stabilized by G-quadruplex folds (26). Thus, a 24 nt particle stabilized by G-quadruplex formation may be shared by both the telomeric G-rich ssDNA and RNA forms. Moreover some TERRA RNA remains present in telomeric R-loops following transcription in many organisms (27). Telomeric R-loops have been shown to promote homologous recombination (HR) and are prevalent in a significant number of cell lines and tumors exhibiting the Alternative Lengthening of Telomeres (ALT) phenotype (27).
It is important that we understand the kinds of macroscopic folding that ssDNA segments anywhere from 200 to 2000 nt or more in length can adopt at sites of transcription, recombination, or uncoupled replication forks. To do so requires generating very long ssDNA substrates so that any effect of DNA ends on the internal folding is minimized. In this study, we generated very long ssDNA (up to 20,000 nt) composed of the G-strand telomeric repeat (5'-TTAGGG-3'), the C-strand complement repeat (5'-CCCTAA-3') and a mutant G-strand repeat (5'-TTAGTG-3'). We then interrogated their structure using EM and single molecule magnetic tweezers (smMT) force extension analysis, two methods capable of providing structural information about very large nucleic acid molecules. We demonstrate that the G-strand telomeric ssDNA spontaneously condenses into a thick bead-like filament as imaged by EM and resolved by step-wise elongation following force extension analysis by smMT. Moreover, we show that switchable ATP-dependent RAD51 ssDNA binding could be used to probe the structure of these higher-order G-strand configurations.

Preparation of Long G-Strand Telomeric Single Stranded DNA.
Tandem copies of ssDNA containing the human G-strand repeat (5'-TTAGGG-3')n, the complementary C-strand repeat (5'-CCCTAA-3')n, and a mutant repeat disrupting the three-G run (5'-TTAGTG-3')n were generated using a φ29 DNA polymerase rolling circle replication scheme from a mini-circle template as previously described (Supplementary Figure S1a) (28). The size distribution of the ssDNAs was determined by alkaline gel electrophoresis and displayed a broad distribution from 100 nt to >20,000 nt (Supplementary Figure S1b).

G-strand Single Stranded DNA Contains G-Quadruplex Structures.
Numerous studies have indicated that telomeric G-strand ssDNA folds into G-quadruplex structures (15,16,29). CD analysis is frequently used to demonstrate the presence of G-quadruplex structures within DNA (30). However, CD studies require high molar concentrations of a single DNA species, and the long ssDNAs prepared by the rolling circle method contain a mixture of many different lengths even following alkaline agarose gel isolation and purification, resulting in very low molar concentrations of any individual species.
The spectra obtained and further discussion of the limitations are presented in the Supplementary Material and Supplementary Figure S2.
As an alternative we utilized structure-specific G-quadruplex Fluorescence Intercalator Displacement (G4-FID) analysis to probe the underlying structure of the long ssDNA generated by rolling circle replication (Figure 1a; (31)). The G4-FID assay was developed from similar assays of double stranded DNA (dsDNA). When thiazole orange (TO) binds DNA, it produces a strong fluorescence signal with high quantum yield, while it remains non-fluorescent in the absence of DNA. Displacement from DNA may be monitored as a decrease in fluorescence. This assay is now used widely to determine the selectivity of a candidate compound through its ability to displace TO from quadruplex DNA (31-33).
We used the 3,6,9-trisubstituted acridine ligand BRACO-19, which was designed by molecular modeling as a G-quadruplex binding-specific candidate molecule. Previous work has shown that BRACO-19 inhibits telomerase activity, which results in telomere shortening and end-to-end chromosomal fusions in cancer cells as a consequence of quadruplex stabilization at the ssDNA overhang (34,35). Indeed, a crystal structure of the human telomeric G-quadruplex and BRACO-19 has been solved (36) and resembles the native unimolecular and bimolecular human telomeric quadruplex structures (21) bound by TMPyP4, a well-known porphyrin based G-quadruplex binder (37). BRACO-19 was used earlier by another group as a candidate G-quadruplex binder for G4-FID assays (33). We introduced increasing concentrations of BRACO-19 to long G-strand ssDNA, which resulted in reduced TO fluorescence and increased the calculated TO displacement (Figure 1b, blue triangle; EC50 = 0.44 µM). Moreover, the TO displacement by BRACO-19 at equivalent concentrations of the smaller G-quadruplex forming oligonucleotides (5'-TTAGGG-3')4 (G24; Figure 1b, black square; EC50 = 0.43 µM) and (5'-TTAGGG-3')20 (G120; Figure 1b, red circle; EC50 = 0.45 µM) appeared nearly identical to that of the long G-strand DNA. Importantly, we found that BRACO-19 displaced TO from the G-strand ssDNA (Figure 1c, black square; EC50 = 0.42 µM) but not the complementary C-strand (Figure 1c, red circle; EC50 = 7.3 µM) or the mutant G-strand ssDNA that disrupts the G-quadruplex repeat (Figure 1c, blue triangle; EC50 = 7.2 µM). These observations are consistent with the conclusion that the long G-strand ssDNA forms G-quadruplex structures similar to well-defined G-quadruplex forming oligonucleotides that are known to bind the G-quadruplex specific ligand BRACO-19. In contrast, long C-strand and mutant G-strand ssDNA, which lack the G-quadruplex structure, do not bind BRACO-19.
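To illustrate how a TO displacement curve and an EC50-like midpoint can be derived from fluorescence readings, the following minimal sketch assumes the commonly used definition, displacement (%) = 100 × (1 − F/F0), and estimates the 50% point by linear interpolation between titration points. The fluorescence values, function names and interpolation scheme are hypothetical illustrations and are not the data or the fitting procedure behind Figure 1.

```python
def to_displacement(f, f0):
    """Percent TO displaced, assuming displacement = 100 * (1 - F/F0)."""
    return 100.0 * (1.0 - f / f0)


def midpoint_concentration(concs, displacements, level=50.0):
    """Linearly interpolate the ligand concentration giving `level` % displacement."""
    points = list(zip(concs, displacements))
    for (c_lo, d_lo), (c_hi, d_hi) in zip(points, points[1:]):
        if d_lo <= level <= d_hi:
            if d_hi == d_lo:          # flat segment sitting exactly at the level
                return c_lo
            return c_lo + (level - d_lo) * (c_hi - c_lo) / (d_hi - d_lo)
    return None  # the chosen level was not reached within the titration range


# Hypothetical titration: ligand concentration (uM) and TO fluorescence (a.u.)
concs = [0.0, 0.2, 0.4, 0.6, 1.0]
fluor = [1000.0, 750.0, 520.0, 400.0, 250.0]

disp = [to_displacement(f, fluor[0]) for f in fluor]
ec50 = midpoint_concentration(concs, disp)
print([round(d, 1) for d in disp], ec50)
```

In practice a dose-response model would normally be fitted to replicate titrations; the interpolation here only shows where the reported kind of midpoint comes from.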

G-Strand Single-Stranded DNA Forms Bead-like Structures.
The G-strand, C-strand, and mutant G-strand ssDNAs were prepared for EM by direct adsorption to thin carbon supports followed by tungsten metal shadow casting. The C-strand DNA appeared thin and lacked any discernible organization, typical of phage ssDNAs prepared by these methods (Figure 2a). The mutant G-strand DNA (Figure 2b) was also disorganized, although these molecules appeared to display more compaction than the C-strand DNA. We determined the shadowed width of the C-strand after tungsten metal shadow casting (1.8 ± 0.6 nm, N = 268) and found that it appeared approximately half the width of M13 dsDNA (2.9 ± 0.5 nm, s.e.m., N = 125) (Supplementary Note S2). The results suggest that in these preparations both the C-strand ssDNA and M13 RF dsDNA were coated with 0.9 nm of tungsten metal, which accounts for their known widths of ~1 nm and 2 nm, respectively.
In the EM images the G-strand ssDNA was remarkably different and consisted of thick condensed fibers with a bead-like appearance that were often linked by thin filaments similar in thickness to the C-strand DNA (Figure 2c,d, Supplementary Figure S3). Measuring the diameter of these bead-like particles from multiple preparations revealed two size classes (Figure 3a). The larger bead-like particles (9.4 ± 1.9 nm, s.d.; Figure 2c,d, large arrowheads), when corrected for tungsten metal deposition, had an estimated mean diameter of 8.5 nm, while the smaller bead-like particles (6.1 ± 1.5 nm, s.d.; Figure 2c,d, small arrowheads), when corrected for tungsten metal deposition, had an estimated mean diameter of 5.2 nm. In numerous cases the larger particles appeared conjoined in a way that suggested that an association of two smaller bead-like particles might generate the larger ones (see also Supplementary Figure S3; Supplementary Note S2). In all of the fields of the G-strand ssDNA some molecules were less organized, suggesting either that other conformations may be possible or that the process of forming the beaded structures may be incomplete.
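The tungsten correction applied to these measurements is a simple subtraction, which the short sketch below reproduces from the values quoted in the text. It assumes a single uniform 0.9 nm metal contribution for every specimen; the variable names are illustrative only.

```python
TUNGSTEN_NM = 0.9  # metal contribution inferred in the text from ssDNA and dsDNA widths

# Measured (shadowed) mean widths or diameters in nm, as quoted in the text
measured_nm = {
    "C-strand ssDNA": 1.8,
    "M13 dsDNA": 2.9,
    "large bead": 9.4,
    "small bead": 6.1,
}

# Subtract the assumed uniform tungsten layer from each measurement
corrected_nm = {name: round(width - TUNGSTEN_NM, 1) for name, width in measured_nm.items()}

for name, width in corrected_nm.items():
    print(f"{name}: {width} nm after correction")
```

The corrected ssDNA and dsDNA values fall close to the nominal ~1 nm and 2 nm widths, which is the consistency check used to justify applying the same correction to the beads.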
The G-strand ssDNA was subjected to alkaline agarose gel electrophoresis and a region corresponding to 1-1.5 knt was isolated, neutralized and purified. This size-selected G-strand DNA (ave. 1.25 knt) was prepared for EM in the presence of K+ cation, and the length and number of beads were determined for molecules that appeared fully saturated with mostly larger bead-like particles and no significant linker DNA (Figure 3b, left). Compared to the calculated length of 875 nm for a random sequence 1.25 knt ssDNA (0.7 nm/nt for ssDNA) (38), the mean length of the saturated G-strand ssDNA (69.5 ± 18.9 nm, s.d.; N = 91; Figure 3b, right) appeared compacted 12.6-fold. By dividing the length of each individual molecule by its number of beads we determined the mean length per bead from the binned values (13.7 ± 1.9 nm/bead, s.d.; N = 91; Figure 3c), equivalent to 5 beads per 1250 nt. Together, these observations suggest that there are approximately 250 nt of G-strand ssDNA per large bead particle.
In the EM experiments described above, the DNA was adsorbed to the thin carbon foils in a buffer containing 2.5 mM spermidine (Experimental Procedures) followed by dehydration through water-ethanol solutions and air-drying. While this approach has been found to faithfully reveal structures in DNA in past studies, and provides the highest density of material on the substrate, we also mounted the DNA using a new method being developed in this laboratory. Here the DNA is adsorbed to the supports in a low salt buffer containing no divalent metals or spermidine, followed by very rapid freezing, freeze-drying and finally rotary metal shadow casting. While the images were not as crisp due to residual salt remaining after freeze-drying, the basic beaded appearance of the ssDNA was the same as when prepared by the procedure used in the experiments above (Supplementary Figure S4).
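The compaction estimate above follows from a few lines of arithmetic, reproduced here as a minimal sketch using only the values quoted in the text (0.7 nm/nt, 1250 nt, 69.5 nm mean length, 13.7 nm per bead). The variable names are illustrative.

```python
NM_PER_NT = 0.7          # rise per nucleotide assumed for extended ssDNA
length_nt = 1250         # average size of the gel-selected G-strand ssDNA
mean_length_nm = 69.5    # mean EM length of bead-saturated molecules
nm_per_bead = 13.7       # mean length occupied by one large bead

contour_nm = NM_PER_NT * length_nt             # expected length if fully extended
fold_compaction = contour_nm / mean_length_nm  # how much shorter the beaded form is
beads_per_molecule = mean_length_nm / nm_per_bead
nt_per_bead = length_nt / beads_per_molecule

print(f"contour length: {contour_nm:.0f} nm")
print(f"compaction: {fold_compaction:.1f}-fold")
print(f"beads per molecule: {beads_per_molecule:.1f}")
print(f"nt per bead: {nt_per_bead:.0f}")
```

Rounding the bead count to 5 gives the ~250 nt per large bead stated in the text; carrying the unrounded value gives a slightly smaller figure.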

The G-Strand Single-Stranded DNA is Resistant to Extension Force.
To further examine the properties of the G-strand ssDNA we developed a smMT system (39-41) in which one end of the G-strand ssDNA was attached to an engineered flow cell surface via a biotin-NeutrAvidin linkage, while the other end of the DNA was attached to a superparamagnetic (SPM) bead via a digoxigenin-anti-digoxigenin linkage (42). Controlled positioning of permanent neodymium magnets allowed the application of a well-defined force that results in DNA extension, which was monitored by the location of the SPM bead relative to the flow-cell surface (39,41,43).
Alkaline agarose gel electrophoresis was used to size-select ~5 knt length C-strand, G-strand and mutant G-strand ssDNA in which the ligated rolling-circle contained biotin and the 5'-thymine residue of the primer oligonucleotide contained a digoxigenin (Supplementary Figure S1). The C-strand and mutant G-strand ssDNAs displayed a force extension curve consistent with a freely jointed chain model that is characteristic for ssDNA (Figure 4a, black squares with red curve; Supplementary Figure S5) (44).
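For orientation, the freely jointed chain (FJC) model predicts an extension x(F) = L[coth(Fb/kT) − kT/(Fb)] for a chain of contour length L and Kuhn length b. The sketch below evaluates this expression for a ~5 knt ssDNA. The Kuhn length of 1.5 nm, the thermal energy of 4.11 pN·nm (about 25 °C) and the 0.7 nm/nt contour length are assumed illustrative values, not parameters fitted to the data in Figure 4a.

```python
import math

KT_PN_NM = 4.11  # thermal energy near room temperature, in pN*nm (assumed)


def fjc_extension(force_pn, contour_nm, kuhn_nm=1.5):
    """Mean end-to-end extension (nm) of a freely jointed chain at a given force (pN)."""
    if force_pn <= 0:
        return 0.0
    x = force_pn * kuhn_nm / KT_PN_NM
    # Langevin function: coth(x) - 1/x
    return contour_nm * (1.0 / math.tanh(x) - 1.0 / x)


contour_nm = 5000 * 0.7  # ~5 knt ssDNA at an assumed 0.7 nm per nucleotide
curve = [(f, fjc_extension(f, contour_nm)) for f in (0.5, 1.0, 2.0, 5.0, 10.0, 20.0)]

for force, ext in curve:
    print(f"{force:5.1f} pN -> {ext:7.1f} nm")
```

The curve rises smoothly and monotonically toward the contour length, which is the behavior a disordered ssDNA is expected to show and the baseline against which a stalled or sigmoidal curve stands out.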
In contrast, a 5 kb dsDNA displayed a force extension that is consistent with a worm-like chain model as previously described (Figure 4a, red squares with blue curve; Supplementary Figure S5) (45). Remarkably, the G-strand ssDNA extension stalled at ~600 nm, resulting in a force-extension curve with a sigmoidal character (Figure 4a, blue squares with turquoise curve; Supplementary Figure S5). These results are inconsistent with a mechanical extension caused by the natural ordering of a disordered polymer molecule under force (39), and rather imply a compacted ordered DNA structure that is relatively resistant to forces below 12 pN (Figure 4a; Supplementary Figure S5). Interestingly, at forces above 5 pN the G-strand ssDNA extension proceeded in discrete steps (Figure 4a). A large positive change in the second derivative of the extension versus time plot (Figure 4c) was used to identify 434 steps from 62 individual DNA molecules (Figure 4d). When binned to the resolution of the smMT instrument (10 nm) we observed a distribution of steps that appeared to contain multiple embedded peaks (Figure 4d). Coefficient of determination (46) and root-mean-squared goodness of fit (46) analysis appeared to indicate six embedded Gaussian peaks that, when smoothed, describe the binned histogram data as corresponding to six step-release sizes (Figure 4d; Supplementary Figure S6).
Interestingly, we observed a weak positive trend between the step size and the introduced force, with the largest steps occurring at higher forces (Supplementary Figure S7). It is important to note that even at very slow load rates the release of folded G-quadruplex structure only begins to occur at forces above 20 pN and peaks at 35-55 pN, depending on the force spectroscopy method (47,48). Together, these results seem to imply that the application of magnetic force results in the step-wise release of higher-order structure(s) within the G-strand ssDNA, while maintaining the intrinsic array of G-quadruplex folds.
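The idea of locating steps from a large positive second derivative and then binning them at the 10 nm instrument resolution can be sketched as follows. This is a simplified illustration on a noise-free synthetic trace; the threshold, the trace values and the function names are hypothetical, and real recordings would need smoothing and a noise-aware threshold.

```python
from collections import Counter


def detect_steps(extension_nm, threshold_nm):
    """Return (index, step size) pairs where the discrete second derivative spikes upward."""
    steps = []
    for i in range(1, len(extension_nm) - 1):
        second = extension_nm[i + 1] - 2 * extension_nm[i] + extension_nm[i - 1]
        if second > threshold_nm:
            # The jump occurs between sample i and sample i + 1
            steps.append((i + 1, extension_nm[i + 1] - extension_nm[i]))
    return steps


def bin_steps(step_sizes_nm, bin_nm=10):
    """Histogram step sizes into bins of width `bin_nm` (keyed by lower bin edge)."""
    return Counter(int(size // bin_nm) * bin_nm for size in step_sizes_nm)


# Synthetic extension-versus-time trace (nm) with two abrupt steps of 30 nm and 60 nm
trace = [100.0, 100.0, 100.0, 130.0, 130.0, 130.0, 190.0, 190.0]

steps = detect_steps(trace, threshold_nm=10.0)
histogram = bin_steps(size for _, size in steps)
print(steps, dict(histogram))
```

Pooling the binned step sizes from many molecules yields the kind of histogram to which multiple Gaussian peaks can then be fitted.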

Modeling
Several models have been proposed for the structure of the 3'-ssDNA overhang at the terminus of mammalian telomeres, which are largely based on parallel or hybrid G-quadruplex structures and result in somewhat different outcomes (23). Multimeric G-quadruplex structures are largely based on beads-on-a-string models (29), where multiple G-quadruplex units formed from (5'-TTAGGG-3')4-20 are arranged into individual thermodynamically unique beads-on-a-string that do not interact with one another, can move freely and are only restricted by their TTA linker. These models are founded on studies of limited length DNAs (120 nt or less) and do not address the possibility of a variation in the length of intervening repeat ssDNA or interactions between G-quadruplex units via stacking interactions of the G-tetrad cores (23). Perhaps more importantly, there are several types of G-quadruplex folds as well as suggestions that these different folds may exist together under identical conditions (49). Nevertheless, once folded all G-quadruplex structures appear exceedingly stable (50).
To understand the distribution of step sizes in the force extension analysis we performed G-quadruplex folding simulations with variable lengths of G-strand ssDNA (Supplementary Figure S8). In these simulations, a G-quadruplex may fold randomly along the length of an ssDNA from any four contiguous telomeric repeats. The type of G-quadruplex fold is irrelevant for these simulations, although we assume that once formed these folds are stable and no subsequent remodeling may occur. These random folding events result in an array of G-quadruplex structures along the G-strand ssDNA that are separated by 0, 1, 2, or 3 repeats with distinct probabilities (Supplementary Figure S8).
We determined that a G-quadruplex embedded in a 1 knt or 6 knt G-strand ssDNA is on average separated from the adjacent G-quadruplex by one complete telomeric repeat (Supplementary Figure S8). When subjected to smMT at forces below those that disrupt the G-quartet structure, a single G-quadruplex may be pulled from two different corners, which adds different lengths depending on the type of fold (51). This corner-to-corner extension would account for 0.6-1.5 nm, which together with a free TTA tail (2.1 nm) and on average one telomeric repeat (4.2 nm) totals 6.9-7.8 nm. The projected force extension peaks appear to be tetramer multiples of the ~7.3 nm unit (calculated: 29 nm, 58 nm, 87 nm, 117 nm and 146 nm; observed: 33 nm, 55 nm, 77 nm, 117 nm and 142 nm; Figure 4d). The extra 99 nm (or 77 nm) extension peaks might reflect intermediate extensions and/or over-fitting of multiple Gaussian curves.
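A simulation of this kind can be sketched as random sequential placement of non-overlapping four-repeat blocks on a lattice of telomeric repeats, continuing until no four contiguous free repeats remain. The code below is a minimal illustration under that assumption and is not the authors' implementation; the function names, the random seed and the use of 6 nt per repeat to convert lengths are choices made here.

```python
import random
from collections import Counter


def fold_quadruplexes(n_repeats, rng):
    """Randomly place non-overlapping 4-repeat G-quadruplexes until none can fit."""
    occupied = [False] * n_repeats
    starts = []
    while True:
        # Every start position whose four repeats are all still unfolded
        free = [i for i in range(n_repeats - 3) if not any(occupied[i:i + 4])]
        if not free:
            break
        start = rng.choice(free)
        for j in range(start, start + 4):
            occupied[j] = True
        starts.append(start)
    return sorted(starts)


def gap_counts(starts):
    """Count the unfolded repeats left between neighboring G-quadruplexes."""
    return Counter(b - (a + 4) for a, b in zip(starts, starts[1:]))


rng = random.Random(0)          # fixed seed so the run is repeatable
n_repeats = 6000 // 6           # a 6 knt G-strand ssDNA, 6 nt per repeat
starts = fold_quadruplexes(n_repeats, rng)
gaps = gap_counts(starts)
mean_gap = sum(g * c for g, c in gaps.items()) / sum(gaps.values())

print(dict(sorted(gaps.items())), round(mean_gap, 2))
```

Because folding stops only when no four adjacent repeats are free, neighboring G-quadruplexes can never be separated by more than three repeats, which is why only gaps of 0, 1, 2 or 3 appear.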
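The unit length and its tetramer multiples can be checked with a few lines of arithmetic. The sketch below uses the 0.7 nm/nt rise and the 0.6-1.5 nm corner-to-corner range quoted in the text; the variable names are illustrative, and the comparison with the observed peaks is only a side-by-side listing, not a fit.

```python
NM_PER_NT = 0.7

tta_tail_nm = 3 * NM_PER_NT          # free TTA tail
spacer_repeat_nm = 6 * NM_PER_NT     # one intervening TTAGGG repeat
corner_to_corner_nm = (0.6, 1.5)     # range depends on the type of fold

# Shortest and longest extension contributed by one G-quadruplex unit
unit_range_nm = tuple(c + tta_tail_nm + spacer_repeat_nm for c in corner_to_corner_nm)
unit_nm = 7.3                        # approximate unit used in the text

# Extension expected when 4, 8, 12, 16 or 20 units are released together
predicted_nm = [4 * k * unit_nm for k in range(1, 6)]
observed_nm = [33, 55, 77, 117, 142]  # peaks listed in the text

print("unit range (nm):", [round(u, 1) for u in unit_range_nm])
for pred, obs in zip(predicted_nm, observed_nm):
    print(f"predicted {pred:6.1f} nm   observed {obs} nm")
```

The unrounded multiples are 29.2, 58.4, 87.6, 116.8 and 146.0 nm, so the calculated values quoted in the text agree with them to within about 1 nm.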
A plausible model for the higher-order structures exploits the possibility that random condensation along G-strand DNA results in G-quadruplexes that are separated by, on average, one additional G-strand repeat (Figure 5a). These intervening repeats contain guanine nucleotides that could form Hoogsteen base pairs with nearby intervening G-repeats that are not folded into a G-quadruplex (11,14). It is our hypothesis that four intervening repeats condense into a G-residue pseudo-fold, which is the basis for stable higher-order packing (Figure 5b, dark blue box).
In this scenario four intervening repeats could come from any combination of 1-3 repeats between two embedded G-quadruplexes linked with the additional necessary nearby intervening repeats, such that a total of four repeats are condensed into a G-residue pseudo-fold. The extension of these G-residue pseudo-fold higher-order structures under the force conditions utilized in our studies would leave the bona fide G-quadruplexes intact while stretching the terminal TTA linkers as well as the intervening G-strand repeats by, on average, ~29 nm (Figure 5c). Such a G-residue pseudo-fold formed between intervening repeats could account for the increased force required to extend the G-strand ssDNA, which is significantly more than for random-coil ssDNA but less than for a true G-quadruplex fold (Figure 4d). Importantly, this model predicts that the G-residue pseudo-folds formed between multiple intervening repeats are more exposed than the G-strand ssDNA folded into a compact G-quadruplex.

RAD51 systematically alters higher order G-quadruplex associations
To further probe the structural characteristics, we examined the force extension properties of G-strand ssDNA in the presence of the central HR component RAD51.
Unlike strong ssDNA binding proteins that singularly disrupt G-quadruplex folds formed in ssDNA (52), the RAD51 protein displays a switchable weak ssDNA binding in the absence of ATP (or in the presence of ADP) that is transformed into a robust nucleoprotein filament in the presence of ATP, which firmly coats a ssDNA (53). This cycle of RAD51 ATP binding and hydrolysis has been connected to nucleoprotein filament reordering and turnover during the HR homology search (54) and may play a role in ALT telomere maintenance (55).
Importantly, the formation of an ATP-bound RAD51 nucleoprotein filament appears capable of generating sufficient binding forces to provoke nucleosome disassembly in model chromatin substrates (56).
We first determined that multiple cycles of magnetic extension and release of a single G-strand ssDNA resulted in largely identical force extension curves (Figure 6a;  Supplementary Figure S9a).
This observation strongly suggests that the G-strand ssDNA spontaneously re-folds into comparable stable higher order structures following release from magnetic extension. When RAD51 is included in the force extension analysis without ATP (-ATP) the curves gradually flatten following multiple extension and release cycles, demonstrating significantly more extension at low forces (Figure 6b; Supplementary Figure S9b).
Combined with the G-quadruplex modeling (see Modeling above and Supplementary Figure S8), we interpret these results to imply that RAD51 increasingly binds to intervening ssDNA repeat(s) embedded between G-quadruplex folds, eventually inhibiting spontaneous condensation into the G-residue pseudo-folds that constitute the higher order structures. We note that the weak and reversible ssDNA binding activity of RAD51 (-ATP) does not appear to fully inhibit higher order structure re-folding and results in a DNA-protein complex that does not exhibit ssDNA or dsDNA force extension properties (Figure 6b; Supplementary Figure S9b). In contrast, RAD51 in the presence of ATP (+ATP) appears to rapidly form a stable nucleoprotein filament that extends longer than a similar length dsDNA at low forces and displays no intermediate force extension curves regardless of the number of extensions (Figure 6c; Supplementary Figure S9c). The longer extension of the RAD51 nucleoprotein filament is consistent with historical studies that demonstrated that a helical ATP-bound RAD51 nucleoprotein filament extends an ssDNA to 1.5x the length of a corresponding dsDNA (57,58).
To confirm the hypothesis that RAD51 forms a stable filament that fully extends the G-strand ssDNA we examined the RAD51 nucleoprotein filament in the presence of ATP by EM following negative staining. We observed fields of extended RAD51 nucleoprotein filaments with a helical pitch of 10.7 nm (n = 59 measurements) in agreement with the previous studies (Figure 7). Similar filaments were not present when G-strand ssDNA was incubated with RAD51 in the absence of ATP (not shown).
We conclude that RAD51 forms a stable nucleoprotein filament in the presence of ATP that fully extends the G-strand ssDNA, unraveling both the embedded G-quadruplex folds and the higher-order G-residue pseudo-fold structures.

DISCUSSION
Extending the structural studies of short (~20-150 nt) G-strand repeat ssDNAs to the analysis of much longer G-strand repeat ssDNAs has been hampered by the lack of well-defined substrates. To overcome this technical issue, we further developed a rolling circle replication system as previously described (59,60) and adapted it for generating long ssDNA containing distinct repeating nucleotide sequences. This method allowed us to produce G-strand, C-strand and mutated G-strand ssDNAs containing different forms of the human telomeric repeat sequence ranging in size from 100 to >20,000 nt (Supplementary Figure S1).
G4-FID analysis confirmed the presence of G-quadruplex structures within the long human G-strand ssDNA (5'-TTAGGG-3')n, but not in the C-strand ssDNA (5'-CCCTAA-3')n or a mutant G-strand ssDNA (5'-TTAGTG-3')n that interrupts the contiguous G-triplet within the telomere repeat. Using EM we showed that very long human telomeric G-strand ssDNA spontaneously condenses into chains of large discrete bead-like particles with diameters of ~5 nm and ~8 nm.
These bead-like particles compact the ssDNA more than 12-fold in length. Using size-selected DNA we determined that the larger beads contain ~250 nt and inferred that the smaller beads contain half this amount (~125 nt).
Unlike EM, which requires a final dehydration step, smMT studies may be performed in solution at physiological ionic strength and with predominantly intracellular K+ cation. Force extension analysis indicated several discrete G-strand ssDNA elongation step sizes. Because it requires extension forces >20 pN to unravel a folded 24 nt G-quadruplex (47,48), we interpreted these results to indicate that the extension steps represented fundamental units of higher-order interaction structures. The weak positive correlation between step size and extension force (Supplementary Figure S7) suggests that the higher-order structures are intrinsically more stable the larger they become. In EM fields of the compacted G-strand ssDNA we noted that some particles appeared less organized than others. This might suggest that in addition to the principal structures other arrangements may be possible that might contribute to the force-extension "noise" that extends outside the baseline 10 nm resolution of the smMT instrument. Taken together these observations represent the first physical evidence for intra-molecular higher-order structures formed by G-strand ssDNA containing embedded G-quadruplexes. These findings appear broadly applicable to the long repetitive blocks of G-rich sequences present in many genomes including the human genome. We calculated that the major tetramer and octamer higher order structures observed as initial step sizes by smMT would on average contain ~120 nt (four G-quadruplexes plus four extra repeat loops) and ~240 nt (eight G-quadruplexes and eight extra repeat loops), respectively. The DNA content of the octamer higher order structure is remarkably consistent with the calculated amount of ssDNA associated with the large EM beads (247 nt), and by extension with the DNA content of the smaller beads that appear approximately half the size.
These two initial smMT extension steps account for 63% of the total events, which appears consistent with the two major bead sizes observed by EM analysis.
We developed a model based on the stochastic formation of G-quadruplexes, followed by condensation of intervening G-repeats (Figure 5). An inter-molecular side-by-side pairing of a G-repeat ssDNA, termed G-wires, has been described that utilized small oligonucleotides (5'-GGGGTTGGGG-3') (61). In theory, G-wires might provide a plausible structure for the association(s) of intervening telomeric ssDNA repeats. However, we note that the inter-molecular G-quadruplex folds predicted by G-wires should display comparable stability to force-extension as bona fide intra-molecular G-quadruplexes. A less robust interaction between the intervening repeat G-residue pseudo-folds appears more consistent with our studies, which showed that both magnetic force and the weak ssDNA binding activity of RAD51 (-ATP) were capable of disrupting the higher-order structures. For example, we interpret the observation that RAD51 (-ATP) increasingly inhibits condensation following multiple releases of magnetic force as consistent with increased shielding of intervening repeat ssDNA by weakly bound RAD51, supporting the intervening repeat G-residue pseudo-fold condensation model (Figure 5). We performed extensive nuclease digestion analysis of the long G-strand ssDNA exploiting multiple nucleases with the goal of isolating higher order G-strand structures (not shown). However, under the best conditions we observed a 24 nt band consistent with a protected G-quadruplex along with a smear of higher molecular weight ssDNA. We concluded that the higher-order G-strand structures are only partially resistant to nucleases, which also appears consistent with exposed features of intervening repeat G-residue interactions as proposed in our model (Figure 5).
We calculated the probability (P) that a higher-order G-strand structure might condense within various lengths of human telomere 3'-ssDNA by assuming that a minimum of four individual ssDNA G-strand repeats separated by at least three G-quadruplex folds are required to form a stable particle (Supplementary Figure S8e; Methods) (62). This analysis suggested that a significant fraction of 100 nt (P = 0.13) and 200 nt (P = 0.32) human telomeric G-strand ssDNA may stochastically fold into G-quadruplexes leaving at least four intervening G-repeats, which could condense into a G-residue pseudo-fold. This probability dramatically increased with increasing ssDNA length (Supplementary Figure S8e). Moreover, these estimates likely represent a minimum since we do not include the possibility that adjacent ssDNA repeats might nucleate into a G-residue pseudo-fold or consider any potential phasing by the telomere ssDNA-dsDNA junction. Taken as a whole, these studies suggest that in the absence of other factors, higher-order G-strand structures may form naturally on a 3'-ssDNA (63,64).
We consider the possibility that a higher order telomeric G-strand structure could serve as a molecular switch that might inhibit or facilitate the interactions of telomeric protein factors such as the shelterin protein complex or telomerase. Compaction into these large structures could aid in shielding the telomere from double strand break (DSB) sensing and repair factors prior to binding by telomeric protein components (65), or they might act as a roadblock for telomerase mediated telomere elongation (13). Cech and colleagues have examined a 144 nt (5'-TTAGGG-3')n ssDNA bound by hPOT1 and found highly compact DNA-protein particles, suggesting that POT1 is likely to disrupt any higher order G-strand condensation. Indeed, POT1 and its partner TPP1 appear to overcome G-quadruplex roadblocks during telomere metabolic processes (66).
The self-condensation of long segments of ssDNA containing runs of G's could have significant global consequences for replication, recombination and repair. As noted above, if the G-rich strand at the C9orf72 locus (C2G4)n assumed a higher order conformation, this could not only induce replication fork stalling, but in itself might lead to further expansion by mechanisms related to replication restart, recombination or repair.
Indeed, the expanded repeat has been shown to cause replication fork stalling in vitro (67). The KSHV replication origin contains ~260 bp of G-rich repeats on one strand. Were this G-rich strand to condense into a higher-order structure, this would likely generate a severe bend or knot at the KSHV replication origin with the C-rich strand present in a more open unstructured state. The net result would likely be a complex scaffold with very different protein binding properties than the linear dsDNA.
It will be of interest to generate other long ssDNAs including ones containing the G-rich C9orf72 repeat and the KSHV origin repeats and examine their architecture using the biophysical methods described here.

EXPERIMENTAL PROCEDURES

Rolling Circle Replication
Telomeric ssDNA circles were synthesized using oligonucleotides with either wild type G-strand, mutant G-strand, or C-strand sequences (see below) containing 20 telomeric repeats (IDT, Coralville, IA) as described earlier (28). In brief, oligonucleotides were ligated into circles using CircLigase™ (Epicentre Biotechnologies, Madison, WI) according to the manufacturer's recommendations, and any remaining linear oligonucleotides were digested with ExoI and ExoIII (New England Biolabs Inc., Ipswich, MA). Circular products were confirmed by 7% denaturing PAGE.
For size selection of long G-strand DNA we used alkaline agarose gel electrophoresis. In brief, 0.8% alkaline agarose gels were prepared in alkaline buffer (10X buffer contains 10 N NaOH and 0.5 M EDTA, pH 8.0). The sample DNA was digested with S1 nuclease and mixed with alkaline gel loading buffer.
After electrophoresis, the gel was soaked in neutralizing solution (containing 1M Tris-HCl and 1.5M NaCl), stained with SYBR® Green I Nucleic Acid Gel Stain (Invitrogen) followed by isolation of target DNA length using a QIAquick gel extraction kit (Qiagen).

G-Quadruplex Fluorescent Intercalator Displacement (G4-FID)
A constant temperature (20°C) SPEX Fluorolog-3 (Horiba) spectrofluorometer with thermostated cell holders (3 ml) was used to perform G4-FID studies in 10 mM Li-cacodylate buffer (pH 7.3) and 100 mM KCl. Briefly, 0.25 µM pre-folded DNA in Li-cacodylate-KCl buffer was mixed with thiazole orange (TO) (0.50 µM). Each ligand addition step (from 0.5 to 10 equivalents) was followed by a 3 min equilibration period, after which the fluorescence spectrum was recorded. The percent displacement was calculated from the fluorescence area (FA, 510-750 nm, λex = 501 nm), using the following equation: TO displacement (%) = 100 − [(FA/FA0) × 100], where FA0 is the fluorescence of TO bound to DNA without added ligand. The TO displacement (%) was plotted as a function of the concentration of added ligand.
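The displacement calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names, the trapezoidal integration of the spectrum, and the numerical values in the usage example are all assumptions made for demonstration.

```python
def integrate_area(wavelengths_nm, intensities, lo_nm=510.0, hi_nm=750.0):
    """Trapezoidal area under an emission spectrum between lo_nm and hi_nm.

    The 510-750 nm window follows the text; trapezoidal integration is an
    assumption (the instrument software may integrate differently).
    """
    points = sorted(
        (w, i) for w, i in zip(wavelengths_nm, intensities) if lo_nm <= w <= hi_nm
    )
    area = 0.0
    for (w0, i0), (w1, i1) in zip(points, points[1:]):
        area += 0.5 * (i0 + i1) * (w1 - w0)
    return area


def to_displacement_percent(fa, fa0):
    """TO displacement (%) = 100 - [(FA / FA0) x 100]."""
    if fa0 <= 0:
        raise ValueError("FA0 (TO bound to DNA, no ligand) must be positive")
    return 100.0 - (fa / fa0) * 100.0


if __name__ == "__main__":
    # Hypothetical fluorescence areas for a titration (arbitrary units).
    fa0 = 240.0
    titration = {0.5: 216.0, 1.0: 180.0, 5.0: 96.0, 10.0: 60.0}  # equivalents -> FA
    for equivalents, fa in titration.items():
        print(equivalents, to_displacement_percent(fa, fa0))
```

Plotting the returned percentages against ligand concentration gives the displacement curve described in the text.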

Electron Microscopy
DNA was adsorbed onto grid supports covered with a thin glow-discharge-treated carbon film for 30 sec to 1 min in the presence of the buffer present with the DNA, or a buffer containing 2.5 mM spermidine, 10 mM Tris-HCl, 50 mM KCl, 75 mM NaCl and 1 mM MgCl2 (pH 7.5). Samples were washed in water followed by a series of ethanol dehydration steps, air-drying, and rotary shadow casting with tungsten at 1 × 10⁻⁶ torr (68). Samples were visualized using a Tecnai 12 TEM (FEI Inc., Hillsboro, OR) at 40 kV and images were collected with a Gatan Orius CCD camera (Gatan Inc., Pleasanton, CA) with Digital Micrograph supporting software (Gatan Inc., Pleasanton, CA).
Dimensions of telomeric DNA filaments were measured from digital micrographs using Digital Micrograph (Gatan Inc., Pleasanton, CA) and ImageJ (NIH, Bethesda, MD). Negative staining of the G-rich ssDNA bound by human RAD51 (purified as previously described (56)) was carried out by incubating 50 ng of G-rich ssDNA with 1500 ng of RAD51 in a buffer containing 20 mM HEPES (pH 7.5), 10% glycerol, 0.5 mM DTT, and 2 mM MgCl2 for 30 min at 37°C in the presence or absence of 2.5 mM ATP. Drops of the sample were adsorbed to glow-discharged thin carbon supports for 3 min, followed by washing with 2% uranyl acetate, air-drying and imaging in a Tecnai 12 as above at 80 kV. Images for publication were arranged and contrast optimized using Adobe Photoshop CS5 (Adobe Systems, San Jose, CA).

Magnetic Tweezers Preparation
Flow cells were engineered with glass cover slides affixed with double-sided tape to an aluminum foundation that maximized SPM bead imaging. Telomeric C-strand, G-strand or mutant G-strand ssDNA (30 pM final) was mixed with bead formation buffer (50 mM Tris-Cl (pH 7.6), 100 mM KCl). The combined sample was boiled for 10 minutes followed by snap chilling on ice for 1-2 hours. Prior to attachment, the glass slides were treated with (3-Aminopropyl) triethoxysilane followed by a 1:100 mixture of Biotin-PEG SVA to mPEG-SVA (Invitrogen).
Neutravidin (500 µm, Invitrogen) was injected into the flow cell at a rate of 8 µl min⁻¹, followed by the ssDNA. Tosyl-activated M-280 SPM Dynabeads (ThermoFisher Scientific) were coated with anti-digoxigenin antibodies (Roche). Stock beads were removed and resuspended in 0.1 M Borate, pH 9.5 (Buffer A). Beads were then resuspended in a mixture of Buffer A and anti-dig at a 20 µg:1 mg antibody:bead ratio. The beads were incubated for 12-17 hours at 37°C with slow tilt rotation. 3 M Ammonium Sulphate in Buffer A (Buffer B) was mixed into the beads. The beads were then resuspended in 10 mM Na/K-Phosphate (pH 7.4), 140 mM NaCl/KCl (PBS) with 0.5% (w/v) Acetylated BSA (Sigma) (Buffer C). After incubation at 37°C for 1 hr, the coated beads were resuspended in PBS (pH 7.4) with 0.1% (w/v) Acetylated BSA (Buffer D). Buffer D was removed and then added to reach the desired bead concentration (2 × 10⁹ beads mL⁻¹).
Anti-digoxigenin coated beads were mixed with Running Buffer [50 mM Tris-Cl (pH 7.6), 100 mM KCl, 200 µg µL⁻¹ Acetylated BSA, 0.0025% Tween-20 (Amresco)], and injected at 8 µl min⁻¹, while agitating the system, into the flow cell containing the ssDNA that was bound to the surface via a biotin-neutravidin linkage. The bound ssDNA was washed extensively with Running Buffer to remove free SPM beads prior to analysis.
Force extension measurements used four 1 cm³ rare earth magnets (Neodymium, Magcraft). The SPM beads were imaged using a 530 nm LED lamp (Thorlabs) and a 100X Olympus oil immersion objective, and images were collected on a 1024 x 1024 pixel charge coupled device (CCD) camera (Grasshopper Express 1.0 MP Mono FireWire 1394b) at 70 ms per frame.
The human RAD51 protein was purified and stored as previously described (52). Force extension analysis was performed in Running Buffer additionally containing 2 mM MgCl2, 100 µM Dithiothreitol (DTT) and human RAD51 (500 nM) with or without 1 mM ATP (Roche).

Magnetic Tweezers Data Analysis
SPM bead analysis was performed with the 3D bead tracking software Video Spot Tracker (CISMM at UNC-CH). Displacement events were determined using the software Edge Detector (CISMM at UNC-CH) and MATLAB and Statistics Toolbox Release 2014b (The MathWorks, Inc., Natick, MA). Edge Detector was edited to avoid scoring negative displacements, as these can bias the values of large positive displacements.
Data were smoothed with the Savitzky-Golay filter 3 and an average window size of 20 points using Origin software (OriginLab, Northampton, MA).
Distributions for displacement events were binned to the resolution size of step size (~10 nm) and

Simulation of Multiple Quadruplex Formation
Define an n-dimensional array A, composed of a_i, i = 1, …, n, where a_i = 0 for all i.

Define Algorithm A:
Create n random numbers in an n-dimensional array U, composed of u_i, i = 1, …, n, where u_i ~ iid Uniform([0,1]). are also random numbers from different distributions that are dependent on n. By definition,

Calculating the Probability of Higher-Order G-strand Structure Formation
The likelihood that a human telomere G-strand repeat sequence, (TTAGGG)n, supports an environment compatible with higher-order structure formation can be estimated by first defining a condition where four consecutive repeats, (TTAGGG)4, fold into a G-quadruplex, followed by 1-3 repeats ((TTAGGG)1-3; termed a gap). This chain of G-strand repeats can be thought of as a "unit" containing a G-quadruplex fold with an adjacent gap, surrounded by G-quadruplex folds containing no gaps. Structural analysis has suggested that three gaps, each with an adjacent G-quadruplex (3 units), plus one additional gap of 1-3 G-strand repeats are minimally required to form a higher-order structure. Thus, we can reframe the probability of a nucleotide sequence having conditions compatible with higher-order structure formation as the probability that at least four successive gaps will occur within the G-strand sequence. If we term developing a single gap a "success", then Feller (62) described a simple method for determining the probability of at least r consecutive successes (gaps) in n Bernoulli trials (total gaps):

$$P(r, n) \approx 1 - \frac{1 - p x}{(r + 1 - r x)\, q} \cdot \frac{1}{x^{n+1}} \qquad \text{(eq. 1)}$$

where p is the probability of success of a single gap, q = 1 − p, and x is the real root of

$$1 - x + q\, p^{r} x^{r+1} = 0$$

other than 1/p. In our case, the success probability (p) depends on the total probability of gaps of 1-3 repeats occurring in a defined DNA length, and can be empirically determined from simulated quadruplex formation. As an example, for 120 nt of 3'-ssDNA: total probability = 0.28 (1 repeat) + 0.23 (2 repeats) + 0.14 (3 repeats) = 0.65. From simulation, we can also calculate the n total gaps in m 3'-ssDNA nucleotides from the average gap size (Ave) expanded to the unit size:

$$n = \frac{m}{6} \cdot \frac{1}{4 + \mathrm{Ave}}$$

where m/6 is the maximum number of G-repeats in m nucleotides of 3'-ssDNA. As an example, for 120 nt of 3'-ssDNA containing a hexameric repeat (TTAGGG): (120 ÷ 6) × 1 / [4 (repeats per G-quadruplex) + 1.15 (average repeats between G-quadruplexes)] = 3.88.
For simplicity, 3.88 is rounded to the nearest integer 4; effectively making the resulting probability (P) calculation a minimum.
From eq. 1 we can then determine the probability of at least 4 consecutive gaps in m nucleotides. As an example, for 120 nt: q p^4 = 0.35 × (0.65)^4 = 0.06; x = 1.10; leading to a probability P(r = 4, n ≈ 4) = 0.16. These results suggest that within a population of 120 nt 3'-ssDNA, 16% will stochastically condense into a higher-order G-strand structure. As expected, eq. 1 becomes more accurate as n increases. The probability of higher-order G-strand structure versus 3'-ssDNA length is plotted (Supplementary Figure S8e).
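The 120 nt worked example can be checked numerically with a short Python sketch. It assumes that eq. 1 is the standard form of Feller's run approximation, P ≈ 1 − (1 − px) / [(r + 1 − rx) q] · x^−(n+1). The function names are illustrative, the bisection root finder is one choice among many, and the bracket used for the root assumes p < r/(r + 1), which holds for p = 0.65 and r = 4. An exact dynamic-programming count is included only for comparison and is not part of the described method.

```python
def gaps_in_ssdna(m_nt, ave_gap, repeat_len=6, repeats_per_g4=4):
    """Estimated number of gaps (Bernoulli trials n) in m_nt nucleotides."""
    return (m_nt / repeat_len) / (repeats_per_g4 + ave_gap)


def feller_root(p, r, tol=1e-12):
    """Root x != 1/p of 1 - x + q p^r x^(r+1) = 0, found by bisection.

    Assumes 0 < p < r/(r+1), so that x lies between 1 and the minimum
    of the (convex) polynomial and 1/p is the larger root.
    """
    if not 0.0 < p < r / (r + 1.0):
        raise ValueError("this sketch requires 0 < p < r/(r+1)")
    q = 1.0 - p
    c = q * p ** r

    def f(x):
        return 1.0 - x + c * x ** (r + 1)

    lo = 1.0                                # f(lo) = q p^r > 0
    hi = ((r + 1) * c) ** (-1.0 / r)        # location of the minimum, f(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def feller_run_probability(p, r, n):
    """Approximate probability of at least one run of r successes in n trials."""
    q = 1.0 - p
    x = feller_root(p, r)
    no_run = (1.0 - p * x) / ((r + 1.0 - r * x) * q) / x ** (n + 1)
    return 1.0 - no_run


def exact_run_probability(p, r, n):
    """Exact probability of a run of r successes in n trials (for comparison)."""
    q = 1.0 - p
    state = [1.0] + [0.0] * (r - 1)  # state[k]: trailing run of length k, no hit yet
    hit = 0.0
    for _ in range(n):
        new = [0.0] * r
        new[0] = q * sum(state)          # a failure resets the run
        for k in range(r - 1):
            new[k + 1] = p * state[k]    # a success extends the run
        hit += p * state[r - 1]          # run reaches length r
        state = new
    return hit


if __name__ == "__main__":
    p = 0.28 + 0.23 + 0.14                  # gap probabilities quoted for 120 nt
    n = round(gaps_in_ssdna(120, 1.15))     # 3.88 in the text, rounded to 4
    print(feller_root(p, 4), feller_run_probability(p, 4, n))
    print(exact_run_probability(p, 4, n))
```

For small n the approximation and the exact count differ noticeably, and the difference shrinks as n grows, in line with the remark that eq. 1 becomes more accurate as n increases.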

SUPPORTING INFORMATION
Supporting Information is available online.

Figure 5. a) Illustration of a G-quartet as a black box, extended connector non-G-residue nucleotides between G-quadruplex folds (red) and non-folded G-rich repeats between G-quadruplex folds (blue dots). G-rich repeats may fold into several types of G-quadruplex structures (parallel, anti-parallel, hybrid, etc.) with non-G-residue nucleotide loops connecting the corners of the hypothetical box. In an array of G-quadruplex folds, regardless of the type of folded structure, the non-G-residue nucleotides of the last G-rich repeat may extend to the adjacent G-quadruplex. For the human telomeric repeat (TTAGGG)n the non-G-residue nucleotides TTA (red) comprise this extended connection, with on average at least one non-folded G-rich repeat (TTAGGG; blue dots). b) The results suggest that multiples of four non-folded intervening G-rich repeats (blue dots) may condense into G-residue pseudo-folds (blue box). The condensation into G-residue pseudo-folds is expected to be completely independent of the type of G-quartet fold(s) within the G-quadruplex array since they involve only the intervening G-rich repeats. c) Under magnetic force G-residue pseudo-folds may extend in multiples of ~29 nm.

Figure 6. The binding of RAD51 to G-strand ssDNA. a) Representative force-extension of a single G-strand ssDNA in which the applied force is released following extension and then re-applied for a total of 3 successive cycles. b) Representative force-extension of a single G-strand ssDNA in which the applied force is released following extension and then re-applied for a total of 3 successive cycles in the presence of human RAD51 and in the absence of ATP (-ATP). c) Representative force-extension of a single G-strand ssDNA in which the applied force is released following extension and then re-applied for a total of 3 successive cycles in the presence of ATP (+ATP).