i-Motif formation and spontaneous deletions in human cells

Abstract Concatemers of d(TCCC), first detected through their association with deletions at the RACK7 locus, are widespread throughout the human genome. Circular dichroism spectra show that d(GGGA)n sequences form G-quadruplexes when n > 3, while d(TCCC)n sequences form i-motif structures at neutral pH when n ≥ 7 in vitro. In the PC3 cell line, deletions are observed only when the d(TCCC)n variant is long enough to form significant levels of unresolved i-motif structure at neutral pH. The presence of an unresolved i-motif at a representative d(TCCC)n element at RACK7 was suggested by experiments showing that the region containing the d(TCCC)9 element was susceptible to bisulfite attack in native DNA and that the d(TCCC)9 oligo formed an i-motif structure at neutral pH. This in turn suggested that the i-motif present at this site in native DNA must be susceptible to bisulfite-mediated deamination even though it is a closed structure. Bisulfite deamination of the i-motif structure in the model oligodeoxynucleotide was confirmed using mass spectrometry analysis. We conclude that while G-quadruplex formation may contribute to spontaneous mutation at these sites, deletions actually require the potential for an i-motif to form and remain unresolved at neutral pH.


INTRODUCTION
Regions of GC Skew in eukaryotic genomes, where dG residues are generally confined to one strand and dC residues are confined to the other, are prone to the formation of several types of non-B DNA structure. Among the more common of these are regions that conform to a general sequence motif: d(G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+), that is, four runs of three or more dG residues separated by loops of one to seven nucleotides. This motif has been used to detect genomic sites with potential to fold into G-quadruplex structures on the G-rich strand in regions of GC Skew (1). Since G-quadruplex structures are strongly stabilized by the presence of potassium (2), they are favored under physiological conditions. Nevertheless, other alternative DNA structures such as triplexes (3) and i-motifs (4) can also form at sites conforming to this general sequence motif.
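As an illustration only, the general motif above can be written as a regular expression. The sketch below is not the implementation used in (1); the function name find_g4_candidates and the decision to report one candidate per start position are assumptions made here.

```python
import re

# Four runs of >= 3 G separated by loops of 1-7 nucleotides, as in the text.
# The lookahead allows candidates that start at neighbouring positions
# to be reported separately (an assumption of this sketch).
G4_MOTIF = re.compile(
    r"(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))"
)

def find_g4_candidates(seq):
    """Return (start, matched_sequence) for each candidate site in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in G4_MOTIF.finditer(seq)]

if __name__ == "__main__":
    # d(GGGA)4 contains four G runs and satisfies the motif; d(GGGA)3 does not.
    print(find_g4_candidates("GGGA" * 4))
    print(find_g4_candidates("GGGA" * 3))
```

Scanning the complementary strand would require applying the same pattern to the reverse complement, or an equivalent pattern written in dC.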
Genomic sequence comparisons across Eukarya show that sequences conforming to this motif have increased in frequency during the evolution of ever more complex eukaryotic organisms (5,6), suggesting that they are under positive selective pressure and may serve one or more biological functions. Since they are often present in promoter regions (7), there has been considerable interest in the possibility that they play roles in transcription and other epigenetic phenomena. Evidence in support of these ideas has been obtained in a number of studies. Triplex formation at the MYC promoter has been implicated in transcription regulation at this gene (3), and i-motif formation at the BCL2 gene has been shown to play a role in transcription at that locus (8). Moreover, formation of stable structures at these sequences has been shown to promote methylation in their vicinity (9) and disrupt the formation of repressive chromatin structures (10). G-quadruplexes, for example, may influence DNA methylation patterns via the direct inhibition of DNA methyltransferase I (11). It has also been shown that G-quadruplex formation on the G-rich strand in a region of GC Skew may indirectly influence DNA methylation patterning by stabilizing R-loop RNA:DNA hybrids on the C-rich DNA strand, since RNA:DNA hybrids have also been shown in vitro to inhibit DNA methyltransferase I (12,13).
In normal cells, G-quadruplex formation is likely to be transient, given the constellation of G-quadruplex resolvases known to be expressed in normal mammalian cells. Among these are the RecQ helicases encoded by the BLM (14) and WRN genes (15) and the helicase encoded by the Fanconi Anemia locus FANCJ (16). Further, during DNA replication in normal cells the Y-polymerase family genes (REV1, POLK and POLH) express polymerases capable of replication through G-quadruplex DNA. Rev1 actually disrupts G residues in the quadruplex and inserts a dC residue on the nascent strand (17), while Pol κ and Pol η also appear to function in replicating through G-quadruplex-forming sequences (18). Even so, replication through extended regions of G-quadruplex potential, like the d(GGC)n repeat at FRAXA in disease carriers, appears to be able to induce replication fork collapse, chromosome breaks and subsequent deletion- and insertion-prone repair (19).
In the most extensive study of quadruplex-linked replication impediments to date, the dog-1 mutation in Caenorhabditis elegans was shown to promote genome wide deletions confined to oligo dG sequences (20,21). The mutation at dog-1 knocks out the C. elegans homolog of human FANCJ, which is known to resolve G-quadruplex sequences (16), thus implicating unresolved G-quadruplex sequences as replication impediments at multiple genomic locations. The deletions in C. elegans were shown to occur 5´-to the oligo dG strand (i.e. the G-rich strand), and often deleted much of the oligo dG element itself at each site (20). Deletion frequency also increased with the length of the oligo dG element (20).
Interestingly, the oligo dC strand has been shown to form an i-motif quadruplex at neutral pH in vitro (22). These structures form optimally when the sequences conform to a (4n − 1) length rule for n > 2 (22). That is to say, the potential for i-motif structure at pH 7 in vitro is greatest at lengths of 11, 15, 19 and 23 dC residues, since the pHT (i.e. the pH at which half the population of a sequence adopts an i-motif) is 7.2 at sequence lengths conforming to this (4n − 1) rule. Data from the dog-1 experiments show that the deletion frequency appears to be suppressed at lengths of 15 and 19, but not at sequences of greater length (20). For short sequences of this type, this is consistent with evidence showing that G-quadruplex and i-motif formation at shorter sequences can be mutually exclusive due to steric hindrance, while other, often longer, sequences permit the formation of both structures offset from one another (23-25). Taken together, this suggests that i-motif formation in oligo dC at lengths of 15 and 19 suppresses G-quadruplex formation in C. elegans and thus hinders dog-1 induced deletions, while at longer d(G:C) sequences the formation of i-motifs does not prevent formation of G-quadruplexes, resulting in dog-1 induced deletions. For this interpretation to be correct, either C. elegans must retain a repair system that resolves i-motif structures, or i-motif structures must not form replication impediments in C. elegans.
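The (4n − 1) length rule quoted from (22) can be checked with a few lines of arithmetic. The sketch below simply enumerates the lengths named in the text; the function names and the default range n = 3 to 6 are assumptions chosen to reproduce the four lengths listed above.

```python
def optimal_dc_lengths(n_min=3, n_max=6):
    """Lengths 4n - 1 for integer n in [n_min, n_max] (the text requires n > 2)."""
    return [4 * n - 1 for n in range(n_min, n_max + 1)]

def follows_rule(length):
    """True if length equals 4n - 1 for some integer n > 2."""
    return length >= 11 and (length + 1) % 4 == 0

if __name__ == "__main__":
    print(optimal_dc_lengths())                # lengths named in the text
    print(follows_rule(15), follows_rule(16))  # a conforming and a non-conforming length
```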
In previous work, we devised a method designed to clone sequences from stalled replication forks at sites in DNA where DNA polymerase and CMG helicase uncoupling is present prior to DNA isolation (Supplementary Figure S1). Double strand fragments originating near a putative CMG helicase halt site were often recovered associated with a non-B structure forming element generally detected within 1800 bp of the cloned sequence (26). One of the sequences detected in that screen, d(TCCC)9, was shown to be in an open conformation in PC3 cells as judged by bisulfite accessibility in native chromosomal DNA, and was characterized by short deletions in the PC3 prostate tumor cell line (26). Since the sequence was capable of forming an i-motif at neutral pH in vitro, this raised the possibility that the i-motif structure may form a replication impediment that is not suppressed in human prostate cancer cells. The absence of a functional p53 gene in the PC3 cell line is consistent with the possibility that i-motif suppression by DNA repair systems could be defective in this repair-compromised cell line.
In this report we determined the tendency of the d(TCCC)n sequences to form i-motif structures and the tendency of the complementary d(GGGA)n sequences to form G-quadruplexes at neutral pH in vitro. Since the d(TCCC)n element occurs with different lengths at multiple chromosomal locations in the human genome, we compared these in vitro results with the frequency of deletions at sites ranging from d(TCCC)5 to d(TCCC)15 observed in the PC3 cell line. We found that G-quadruplex formation in vitro was spontaneous at physiological pH over the range tested, d(GGGA)4 to d(GGGA)14, while i-motif formation at neutral pH increased over the range d(TCCC)5 to d(TCCC)15. Deletion frequency in the PC3 cell line reached 100% of cloned representatives when the pHT observed in vitro for i-motif formation exceeded pH 7. Moreover, deletions were confined 5´ to the d(TCCC)n element (i.e. only on the C-rich strand) in each gene examined.

Cell culture
Cell culture methods and preparation of DNA from the PC3 cell line were as previously described (27).

DNA isolation
DNA isolation methods have been previously described (27).

PCR methods
All PCR primers were obtained from Integrated DNA Technologies (San Diego, CA).

Bisulfite mediated deamination of native genomic DNA
Genomic DNA isolated from the PC3 cell line was treated with bisulfite using the EZ methylation Kit (Zymo Research, Irvine, CA). For these experiments, DNA was added to the conversion reagent without M-dilution buffer (a denaturant) or heat treatment in order to preserve the native structure present in the isolated DNA. The reaction mixture was incubated in the dark at 37 °C for 16 h. Thereafter, the kit protocol, including the desulfonation step, was carried out in order to recover DNA for subsequent PCR amplification of the selected region in the RACK7 gene.

In vitro analysis of d(TCCC) n and d(GGGA) n oligodeoxynucleotides
Oligonucleotides. Oligodeoxynucleotides (ODNs) were purchased from IDT (San Diego, CA) or Eurogentec (Liege, Belgium) and were HPLC purified. Solid DNA samples were initially dissolved as a stock solution in purified water (1 mM); further dilutions were carried out in the respective sodium cacodylate buffer. Samples were thermally annealed in a heat block at 95 °C for 5 min and cooled slowly to room temperature overnight.
UV absorption spectroscopy. UV spectroscopy experiments were performed on a Cary 60 UV-Vis spectrometer (Agilent Technologies) and recorded using low volume masked quartz cuvettes (1 cm path length). ODNs were diluted to 2.5 µM in buffer at the desired pH. Samples (200 µl) were transferred to a cuvette and stoppered to reduce evaporation of the sample. The absorbance of the ODN was measured at 295 nm as the temperature of the sample was held for 10 min at 4 °C, heated to 95 °C at a rate of 0.5 °C per min, then held at 95 °C for 10 min before the process was reversed; each melting/annealing process was repeated three times. Data were recorded every 1 °C during both melting and annealing, and melting temperatures (Tm) and annealing temperatures (Ta) were determined using the first derivative method. This analysis was performed independently on each of the three melting curves collected for each ODN, and the values presented are the average and standard deviation of these. Thermal difference spectra (TDS) were calculated by subtracting the spectrum of the folded structure between 220 and 320 nm at 4 °C from that of the unfolded structure at 95 °C. The data were normalized and the maximum change in absorption was set to +1 as previously described (28).
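The first derivative method mentioned above can be sketched as follows. This is a minimal illustration rather than the analysis code used for the study: the function names, the central-difference scheme and the synthetic two-state curve (with made-up absorbance values) are all assumptions.

```python
import math
import statistics

def first_derivative_tm(temps, absorbances):
    """Return the temperature at which |dA/dT| is largest.

    Central differences are used at interior points; the two end points
    are skipped. temps must be strictly increasing.
    """
    best_t, best_slope = None, -1.0
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        if abs(slope) > best_slope:
            best_t, best_slope = temps[i], abs(slope)
    return best_t

def two_state_curve(temps, tm, width=3.0, folded=0.30, unfolded=0.10):
    """Synthetic 295 nm melting curve; all parameter values are hypothetical."""
    return [unfolded + (folded - unfolded) / (1.0 + math.exp((t - tm) / width))
            for t in temps]

def summarize(values):
    """Mean and sample standard deviation of replicate measurements."""
    return statistics.mean(values), statistics.stdev(values)

if __name__ == "__main__":
    temps = list(range(4, 96))           # one reading per 1 degree C, 4 to 95
    curve = two_state_curve(temps, tm=40.0)
    print(first_derivative_tm(temps, curve))
    print(summarize([40.0, 41.0, 42.0]))  # three hypothetical replicate Tm values
```

Real melting curves are noisy, so some smoothing before differentiation would normally be needed; none is applied here.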
Circular dichroism. CD spectra were recorded on a Jasco J-810 spectropolarimeter using a 1 mm path length quartz cuvette. ODNs were diluted to 10 µM (total volume: 200 µl) in buffer at pH increments of 0.25 pH units from 5.0 to 8.0. The scans were recorded at room temperature between 200 and 320 nm. Data pitch was set to 0.5 nm and measurements were taken at a scanning speed of 200 nm/min, response time of 1 s, bandwidth of 2 nm and 100 mdeg sensitivity; each spectrum was the average of three scans. Samples containing only buffer were also scanned according to these parameters to allow for blank subtraction. Transitional pH (pHT) for each i-motif was calculated from the inflection point of fitted ellipticity at 288 nm.
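One way to obtain an inflection point from ellipticity at 288 nm is to fit a two-state sigmoid and read off its midpoint. The sketch below does this by grid search; the sigmoid form, the slope grid, the 0.01 pH step and the function names are assumptions, and the fitting procedure actually used for the study is not specified in the text.

```python
def folded_fraction(ph, ph_t, slope):
    """Fraction folded for a two-state titration; equals 0.5 at ph == ph_t."""
    return 1.0 / (1.0 + 10.0 ** (slope * (ph - ph_t)))

def fit_ph_t(phs, signal, slopes=(0.5, 1.0, 2.0, 3.0, 4.0), step=0.01):
    """Grid-search the inflection point (pHT) of a sigmoid fitted to signal.

    For each trial (ph_t, slope) the folded and unfolded baselines are
    solved by linear least squares; the trial with the smallest squared
    error is returned.
    """
    lo = min(phs)
    n_steps = int(round((max(phs) - lo) / step))
    best_sse, best_ph_t = float("inf"), None
    for slope in slopes:
        for k in range(n_steps + 1):
            ph_t = lo + k * step
            f = [folded_fraction(p, ph_t, slope) for p in phs]
            u = [1.0 - x for x in f]
            suu = sum(x * x for x in u)
            sff = sum(x * x for x in f)
            suf = sum(a * b for a, b in zip(u, f))
            det = suu * sff - suf * suf
            if abs(det) < 1e-12:
                continue  # baselines cannot be separated for this trial
            sus = sum(a * s for a, s in zip(u, signal))
            sfs = sum(b * s for b, s in zip(f, signal))
            unfolded = (sus * sff - sfs * suf) / det
            folded = (sfs * suu - sus * suf) / det
            sse = sum((unfolded * a + folded * b - s) ** 2
                      for a, b, s in zip(u, f, signal))
            if sse < best_sse:
                best_sse, best_ph_t = sse, ph_t
    return best_ph_t

if __name__ == "__main__":
    # Synthetic titration in 0.25 pH steps from 5.0 to 8.0 (hypothetical values).
    phs = [5.0 + 0.25 * i for i in range(13)]
    signal = [2.0 + 8.0 * folded_fraction(p, 7.1, 2.0) for p in phs]
    print(round(fit_ph_t(phs, signal), 2))
```

A continuous optimizer would give finer resolution; the grid is used here only to keep the example dependency-free.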
Frontier orbital calculations. As previously described (29), model compounds were used in electronic structure calculations. Models of C, C+, C:G and C:C+ in single-stranded, Watson-Crick paired duplex or i-motif structures in DNA were 1-methyl-cytosine, 1-methyl-N3-protonated-cytosine, 1-methyl-cytosine Watson-Crick base paired with 1-methyl-guanine, and 1-methyl-cytosine base paired with 1-methyl-N3-protonated-cytosine, respectively. Electronic structure calculations for each of the above compounds and the HSO3− anion were carried out in Spartan'18 (Wavefunction, Irvine, CA). Preliminary geometries obtained with molecular mechanics were submitted for equilibrium geometry calculation with Parametric Method 3. The resulting geometry was submitted for ab initio equilibrium geometry calculation using the Hartree-Fock level of theory with a final geometry optimization using the 6-31G* basis set. All molecular orbital parameters and surfaces were computed with this basis set.
Mass spectrometry analysis of (TCCC)9. Bisulfite-mediated deamination of the (TCCC)9 oligodeoxynucleotide was carried out at 37 °C and pH 5.3, where more than 95% of this oligodeoxynucleotide adopts an i-motif structure (see Table 2), using the procedure described above for nondenatured genomic DNA. Bisulfite-treated DNA recovered from 0.5, 1 or 2 µg of input oligodeoxynucleotide was submitted to Novatia, LLC (Newtown, PA) for High Resolution Liquid Chromatography Mass Spectroscopic analysis. dC→dU conversions were detected as 1 Da increases over the reference mass of the (TCCC)9 oligodeoxynucleotide. Deconvolution reports like the representative report given in Supplementary Data Set 1 were used to determine average frequencies for each level of conversion from 1 dU/36mer to 27 dU/36mer.
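The bookkeeping behind the mass shift analysis can be sketched as follows. The per-conversion shift of 0.984 Da is the standard mass gain for deamination of cytosine to uracil (the text rounds it to 1 Da); the function names, the reference mass and the example distribution are hypothetical and are not taken from the deconvolution reports.

```python
DEAMINATION_SHIFT_DA = 0.984  # approximate mass gain for one dC -> dU conversion

def cytosine_sites(repeat="TCCC", n=9):
    """Return (oligo length, number of dC residues) for d(repeat)n."""
    seq = repeat * n
    return len(seq), seq.count("C")

def count_conversions(observed_mass, reference_mass, max_sites):
    """Nearest whole number of dC -> dU conversions implied by a mass shift."""
    n = round((observed_mass - reference_mass) / DEAMINATION_SHIFT_DA)
    if not 0 <= n <= max_sites:
        raise ValueError("mass shift outside the range explained by deamination")
    return n

def mean_conversions(distribution):
    """Weighted mean of conversions per oligo from {conversions: abundance}."""
    total = sum(distribution.values())
    return sum(k * w for k, w in distribution.items()) / total

if __name__ == "__main__":
    length, sites = cytosine_sites()
    print(length, sites)
    reference = 10000.0  # hypothetical reference mass in Da
    print(count_conversions(reference + 9 * DEAMINATION_SHIFT_DA, reference, sites))
    print(mean_conversions({8: 1, 9: 2, 10: 1}))  # hypothetical abundances
```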

Bisulfite treatment of native genomic DNA
At pH 5.3 and 37 °C isolated DNA is primarily in the B-form DNA conformation with Watson-Crick base pairs. Biologically formed non-B structures can be expected to persist under these conditions, provided that they are not maintained by constraints present in chromatin (30,31). Cytosine residues that are Watson-Crick base paired in the B-DNA duplex are not susceptible to attack by the bisulfite nucleophile, which has been shown experimentally (31,32) and is confirmed by the frontier orbital energy of the C:G base pair model (Figure 1), where the lowest unoccupied molecular orbital (LUMO) of the base pair electrophile (centered on the cytosine moiety at C6) is very high relative to the highest occupied molecular orbital (HOMO) energy of the HSO3− nucleophile. Consequently, for the regions comprising d(TCCC)5 and d(TCCC)9 depicted in Figure 3, bisulfite-mediated deamination in native DNA at pH 5.3 and 37 °C shows that these regions are in non-B structures on some chromosomes in PC3 cells. Deletions were also detected at the 5´ end of the d(TCCC)5 and d(TCCC)9 elements. Moreover, deamination and deletion at d(TCCC)9 were more extensive than at d(TCCC)5. Only two forms capable of reaction with the bisulfite nucleophile are possible at these sites: an unpaired and disordered loop, or an i-motif. Since pH 5.3 is near the pKa of N3 of cytosine, should dissociation of a biologically formed i-motif during DNA isolation result in an unpaired loop, both the protonated and unprotonated forms of deoxycytosine would be present. While the frontier orbital picture virtually precludes nucleophilic attack by the HSO3− anion on dC, given the high energy of its LUMO, the protonated dC+ residue is expected to be readily attacked, as shown both by experiment (32,33) and by the proximity of its LUMO energy to the energy of the HOMO of HSO3− (Figure 1).
Although one might expect an intact i-motif structure to resist bisulfite attack given the results with duplex DNA, the energy of the LUMO of the d(C:C+) base pair electrophile (centered over C6 of the dC+ moiety) in the i-motif (Figure 1) is also close to that of the nucleophile, and it should therefore be more readily attacked than either unprotonated dC or the C:G base pair. These considerations and the frontier orbital analysis above suggested that although the i-motif is a closed structure, it can be attacked by bisulfite. To directly confirm this possibility we treated d(TCCC)9 with bisulfite using the same procedure used for genomic DNA (i.e. pH 5.3 and 37 °C), where it is almost exclusively in the i-motif form (Table 2). High resolution liquid chromatography mass spectrometry analysis showed that significant conversion of dC to dU occurred during the bisulfite reaction even though the i-motif structure was the dominant form in solution, with a Tm of 66.7 °C (Table 2). In these experiments intact 36mers accounted for 92% of the deconvoluted mass spectrum. A representative deconvolution report is given as Supplementary Data Set 1. The data clearly show significant conversion, ranging from 1/27 to 27/27 cytosines converted. The frequency distribution for the recovered species was approximately random and peaked at about nine dC→dU conversions per 36mer (Figure 2). Unfortunately, sequence assignments are not possible with this method.
The sequencing results given in Figure 3 show aligned exemplars originating from individual chromosomes. In spite of the deletions observed in many of the sequences, both the d(TCCC)5 and d(TCCC)9 regions are generally complete. The C→T conversion frequency in d(TCCC)5 was 10.4%, while the conversion frequency in d(TCCC)9 was 37.4%. Further, deletions were confined to the 5´ end at both d(TCCC)5 and d(TCCC)9, with frequencies of 18.5% and 100% of the cloned sequences, respectively.

In vitro analysis of d(TCCC) n and d(GGGA) n model oligodeoxynucleotides
Given that the observed bisulfite-mediated deamination was more extensive at d(TCCC)9 than at d(TCCC)5, we asked if the potential for i-motif structure formation at neutral pH was a function of length, as seen in other i-motif forming sequences (22). Circular dichroism (CD) spectra exhibit well-characterized and easily distinguishable signatures for both the i-motif structure and the G-quadruplex structure for single-stranded oligodeoxynucleotide sequences. G-quadruplex structures form spontaneously at neutral pH; i-motif structure formation, by contrast, is enhanced at low pH because protonation of N3 of cytosine increases the formation of the C:C+ base pair in parallel DNA strands. Nevertheless, many i-motif structures are known to form effectively at neutral pH (4), suggesting that successive stacking interactions in longer i-motif forming sequences may also enhance i-motif formation.
We determined titration profiles for i-motif formation using CD spectra at different pHs for the d(TCCC)n element with n ranging from 4 to 15. Data for the full analyses for the length series d(TCCC)4 to d(TCCC)15 are given in Supplementary Figure S7A-I, and are summarized in Table 1. The input concentration of 10 µM oligodeoxynucleotide was chosen to permit detection at a concentration that favors unimolecular folding. The pHT (i.e. the transition pH at which equal concentrations of i-motif and single strands are present in solution) for each sequence was seen to increase with increasing length. The pHT values steadily increase with length up to a plateau, with pHT reaching neutral pH at lengths greater than n = 8 with corresponding average Tm values of 41 °C (Table 1 and Supplementary Figure S8). In contrast, the equivalent G-rich sequences formed stable G-quadruplexes over the whole length range. When we performed analyses at potassium cation concentrations mimicking physiological conditions (100 mM KCl), the Tm values were all >90 °C, or near the limit of detection. Reduction of the KCl levels to 20 mM still gave average melting temperatures of >70 °C for every sequence (Table 1, Supplementary Figures S5A-H and Supplementary Table S1). This gives us confidence that under physiological conditions these G-quadruplexes are highly stable, regardless of sequence length. Thermal difference spectra for G-quadruplex formation are given in Supplementary Figure S6.

Table 1. i-Motif data for the d(TCCC)n length series (column labels inferred from the Methods and from the values cited in the text)
Sequence     Tm (°C)       Ta (°C)       Tm − Ta (°C)   pHT
d(TCCC)4     15.6 ± 1.0    13.8 ± 1.2    1.8 ± 2.1      6.6 ± 0.01
d(TCCC)5     28.3 ± 0.6    26.0 ± 0.6    2.4 ± 1.0      6.7 ± 0.04
d(TCCC)6     31.4 ± 0.6    29.2 ± 0.0    2.2 ± 0.6      6.8 ± 0.02
d(TCCC)7     32.8 ± 1.0    29.1 ± 0.0    3.7 ± 1.0      6.8 ± 0.03
d(TCCC)8     40.7 ± 0.6    30.7 ± 0.6    10.0 ± 0.0     6.9 ± 0.01
d(TCCC)9     40.7 ± 0.6    30.3 ± 0.6    10.4 ± 0.6     7.1 ± 0.04
d(TCCC)12    40.2 ± 0.6    30.5 ± 0.0    9.8 ± 0.6      7.2 ± 0.08
d(TCCC)14    42.0 ± 4.0    30.4 ± 0.0    11.6 ± 4.0     7.1 ± 0.07
d(TCCC)15    40.2 ± 4.0    23.2 ± 2.0    16.0 ± 2.0     7.1 ± 0.08
The pH profiles for d(TCCC)5 and d(TCCC)9 are depicted in Figure 4. Both sequences have significant potential for the formation of i-motif structures at physiological pH, with pHT values of 6.7 for d(TCCC)5 and 7.1 for d(TCCC)9. Since the pH of the bisulfite reaction is very close to the pKa of the N3 of cytosine, protonated cytosines are expected to be present during the reaction should a biologically formed i-motif dissociate into a loop. However, the CD titration profiles of each sequence suggest that at pH 5.3 a biologically formed i-motif would be preserved as the dominant form in solution during the bisulfite reaction and would hence be the primary substrate for the deamination reaction (Figure 4). Further, we assessed the stability of each of these sequences (d(TCCC)5 and d(TCCC)9) at pH 5.3 and, as expected, the melting temperatures were high (Supplementary Figure S4A and B). Tm profiles were determined for each d(TCCC)n oligodeoxynucleotide at pH 6.7 and are given in Supplementary Figure S2A-I. Thermal difference spectra for i-motif formation are given in Supplementary Figure S3. At pH 6.5, the Tm value for d(TCCC)5 was found to be 28.3 ± 0.6 °C and that for d(TCCC)9 was 40.7 ± 0.6 °C (Table 1). However, at pH 5.3 the Tm value for d(TCCC)5 was found to be 50.2 ± 3.5 °C and that for d(TCCC)9 was 66.7 ± 2.0 °C (Table 2).

Figure 4. ... d(TCCC)5 and d(TCCC)9 repeats. CD spectra for each repeat over the titration range pH 5 to pH 8. In each titration, the characteristic i-motif signature emerges as pH is decreased, with the full signature emerging at pH 5 for d(TCCC)5 and at pH 5.5 for d(TCCC)9. Transition pH (pHT) values show that a significant fraction of each sequence would be present as i-motif near physiological pH.

Table 2 (surviving row; the first value is the Tm cited in the text, the remaining column labels are inferred). d(TCCC)9: Tm 66.7 ± 2.0 °C; Ta 62.0 ± 2.0 °C; Tm − Ta 4.6 ± 0.5 °C.
This indicates that, under the conditions of the bisulfite reaction (pH 5.3 and 37 °C), both regions would remain almost exclusively in the i-motif conformation in those chromosomes that had formed an i-motif biologically.

Deletion Frequencies in the PC3 cell line
The d(TCCC)n repetitive element occurs in multiple locations across the whole human genome. Of these, there are about 220 distinct locations where n ≥ 9. We used PCR to isolate fragments spanning sequences of various lengths from the genomic DNA of the PC3 cell line as previously described (26). Isolated fragments were cloned and sequenced, also as described previously (26). Figure 5 shows the results obtained for the HAFA5 gene (d(TCCC)5 element), the BCR gene (d(TCCC)7 element), the RACK7 gene (d(TCCC)9 element), the ABL1 gene (d(TCCC)11 element), and the PLA2G2A gene (d(TCCC)15 element). The sequences were aligned as previously described against the genomic sequence (GRCh38.p12 Primary Assembly) of the human genome (26). As depicted in Figure 5, deletions (highlighted in red) are uniformly associated with the 5´ end of the genomic sequence of the C-rich strand. Deletions are absent from cloned representatives of the HAFA5 site, with a single deletion recovered at the BCR site. However, the RACK7, ABL1 and PLA2G2A sites showed significant deletions at each cloned representative. The data are summarized in Figure 6. The potential formation of G-quadruplex structures is not correlated with deletion frequency at all, since in the absence of an i-motif (i.e. as a single-stranded oligodeoxynucleotide), G-quadruplex structures were observed to form in vitro at physiological pH at each length tested (Supplementary Figure S9). Moreover, these G-quadruplexes were thermally stable at every length tested (Supplementary Table S1). Thermal difference spectra for G-quadruplex formation are given in Supplementary Figure S6. Conversely, we observe a strong correlation between the potential for i-motif formation in vitro and the frequency of deletion in PC3, since 100% of the recovered clones showed a deletion when the pHT of the relevant i-motif forming sequence observed in vitro is above pH 7.0.
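A count of d(TCCC)n sites of a given minimum length could be obtained by scanning a reference sequence for uninterrupted repeats. The sketch below is an illustration under stated assumptions and is not the procedure used to obtain the figure of about 220 sites: the function name, the restriction to perfect repeats and the strand labels are all choices made here. Because d(GGGA)n is the reverse complement of d(TCCC)n, both patterns are searched on the supplied strand.

```python
import re

def find_tccc_tracts(seq, min_repeats=9):
    """Return sorted (start, repeat_count, strand) for each perfect tract.

    '+' marks d(TCCC)n on the supplied strand; '-' marks d(GGGA)n, i.e.
    d(TCCC)n on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for unit, strand in (("TCCC", "+"), ("GGGA", "-")):
        pattern = re.compile("(?:%s){%d,}" % (unit, min_repeats))
        for m in pattern.finditer(seq):
            hits.append((m.start(), (m.end() - m.start()) // len(unit), strand))
    return sorted(hits)

if __name__ == "__main__":
    # Toy sequence with one d(TCCC)9 tract and one d(GGGA)5 tract.
    toy = "ACGT" + "TCCC" * 9 + "ACGT" + "GGGA" * 5
    print(find_tccc_tracts(toy))                 # only the 9-repeat tract
    print(find_tccc_tracts(toy, min_repeats=5))  # both tracts
```

Genomic repeats are often interrupted by single-base variants, so a perfect-repeat scan of this kind would undercount sites relative to a more tolerant search.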

DISCUSSION
While G-quadruplex and i-motif structures can both form at neutral pH, as noted here for d(TCCC)n and as reported for dCn (22), i-motif structure formation at neutral pH can depend dramatically on sequence length and composition. Consequently, the correlation between deletion frequency and d(TCCC)n folding at neutral pH (Figure 6) provides strong evidence that i-motif structures are linked to deletions at d(TCCC)n elements in the human genome. This correlation suggests that i-motif structures form in vivo in the repair-compromised PC3 cell line. Further, the in vitro findings of pHT values near 7.1 and melting temperatures above 40 °C for each sequence length associated with a significant deletion frequency in PC3 (Supplementary Figures S2A-I, S7A-I and S8) indicate that the i-motif structure, once formed, is expected to maintain a structural disruption of duplex DNA at 37 °C and physiological pH at d(TCCC)n with n ≥ 9. Even so, we can only speculate on the process that leads to the extrusion of G-quadruplexes and i-motifs biologically. The roles in transcription suggested by the available evidence would require complex biological processes that elicit these structures (7,8,11,34). Such systems might alter local topology in order to extrude quadruplexes, and it is known that transcription itself can extrude a G-quadruplex at a site of RNA:DNA hybrids during R-Loop formation (35). However, i-motif structures have not been found associated with R-Loops, since RNA:DNA hybrids appear to form exclusively on the C-rich strand in regions of GC-Skew.
That one or both forms are in fact extruded so as to elicit replication stress is strongly suggested by the data in C. elegans (20,21,36) and our own data on the human PC3 cell line (26). In C. elegans dog-1 mutant worms, unresolved G-quadruplex structures cause replication stress and replication fork stalling and collapse (37). Since replication fork stalling can lead to CMG helicase uncoupling (38), the two strands will be separated once a fork stalls and the helicase uncouples, and both strands can then form non-B structures as topological constraints outside the region are relieved. In previous work we used a cloning procedure (see Supplementary Figure S1) designed to clone sequences near uncoupled CMG halt sites (26). Clones isolated from PC3 cell line DNA using that procedure were most often within 5 kb of a G-quadruplex motif (26). The d(TCCC)9 site in RACK7 was identified with this cloning procedure, suggesting that the clone came from a site undergoing replication stress and CMG helicase uncoupling (26).
Our previous work on the RACK7 repeat in PC3 cells also showed that the d(TCCC)9 sequence at this site was heavily deaminated by bisulfite under non-denaturing conditions (26), demonstrating that it is not paired with its complementary strand (31). However, the data presented here suggest that the i-motif structure can even survive after DNA isolation, as has been shown for other non-B structures (30,31). Importantly, the frequency of deamination at the d(TCCC)5 element (about 10%) compared with that at d(TCCC)9 (37%) correlates with the relatively lower stability of d(TCCC)5 (pHT 6.7 and Tm = 28 °C) compared to d(TCCC)9 (pHT 7.1 and Tm = 41 °C) under physiological conditions. We considered the possibility that the region around d(TCCC)9 was simply single-stranded and that the i-motif might not be present, given that, if it were present, it might behave like native duplex DNA and resist nucleophilic attack by bisulfite. However, careful analysis of the features of the bisulfite reaction (Figure 1) and of the pH and melting profiles (Table 2 and Figure 4) suggested that the i-motif itself is the most likely initial substrate for nucleophilic attack by the bisulfite anion at pH 5.3 and 37 °C (Figure 3). This suggestion was confirmed by high resolution liquid chromatography mass spectrometry of the bisulfite-treated d(TCCC)9 36mer (Figure 2) at 37 °C and pH 5.3. After bisulfite treatment, between 1/27 and 27/27 dC→dU conversions were observed in intact 36mers in a roughly random distribution (Figure 2). While the method does not allow sequence assignments for the conversions, the approximate peak in the frequency distribution (Figure 2) at 9 conversions/36mer is consistent with the frequency of 10.8 conversions/36mer seen in bisulfite-treated genomic DNA (Figure 3).
Further, the roughly random distribution of frequencies is consistent with the frontier orbital picture (Figure 1) since in the initial stages of the reaction, cytosines exposed in a loop would be expected to be only marginally more reactive than those in a closed i-motif. As the reaction progressed, bisulfite addition to the i-motif structure is expected to destabilize the structure by neutralizing the charge on the protonated cytosine and introducing sp3 carbons at both C6 and C5. Such destabilization is expected to permit further bisulfite addition at exposed dC + residues in unpaired single-strands.
While the mass spectrometry data do not address the nature of the structure of the i-motif that forms in d(TCCC)9, for a sequence of this type with 27 dC residues, two forms seem plausible. In the first form the sequence can be viewed as two i-motif 16mers linked as follows: d(TCCC)4-d(TCCC)-d(TCCC)4 (Figure 7A). In the second a single 32mer forms the i-motif linked as follows: d(TCCC)-d(TCCC)8. Although our data do not prove that the structure is more like the second structure (Figure 7B), one expects the Tm for the tandem structure (Figure 7A) to be very close to that observed for d(TCCC)4 at neutral pH, i.e. 15.6 °C, as opposed to the observed value for d(TCCC)9 at neutral pH of 40.7 °C (Table 1A). In both models (Figure 7) a single dT residue is exposed in the loops of each i-motif. Additional stability in a longer structure, like that depicted in Figure 7B, may be achieved in part by the additional stacking energy and in part by the increase in pKa associated with additional base pairing (39,40).

Figure 8. ... (20,21,36). A: When an i-motif forms the leading strand impediment and survives after replication, deletions occur 5´ to the i-motif on the C-rich strand, often extending into the repeat sequence itself. B: When a G-quadruplex forms a transient leading strand impediment and an i-motif forms on the lagging strand, deletions again occur near the i-motif, often extending into the repeat sequence itself. Note that when only the G-quadruplex survives replication, as is the case in dog-1 mutant C. elegans, deletions will occur 5´ to the G-quadruplex on the G-rich strand (20,21,36), often extending into the repeat itself.
In addition to these findings, it is important to note that, independent of d(TCCC) repeat length, all of the deletions we observe mapped near the 5´ end of the C-rich strand of the d(TCCC)n element and often extended into the element itself (Figure 3). The opposite orientation for deletions has been clearly shown for dog-1 mutations in C. elegans (20,21). There, deletions map 5´ to the G-rich strand and the G-quadruplex provides the impetus for deletion. Hence, the orientation of the human deletions reported here implicates i-motif formation during or prior to DNA replication in the genesis of deletions at d(TCCC)n sequences.
By analogy with the multiple repair systems known to play roles in suppressing G-quadruplex formation, it seems likely that similar suppression systems for i-motif formation may exist, given the association of the i-motif with deletions as described here. However, the efficiency of such systems remains unknown. In this regard, it must be noted that the PC3 cell line has been in long-term cell culture; hence the deletions we see may also indicate a simple stochastic accumulation of deletions representing the overall frequency of i-motif occurrence and the efficiency of suppression at each replication cycle. It is interesting to note that if the in vitro pHT values for d(TCCC)n sequences for n ≥ 9 reflect a high rate of i-motif formation at these sites, and the overall efficiency of an i-motif suppression system translates to a probability of remaining unsuppressed in vivo during replication of only 0.05, then the probability that an i-motif would impair replication and induce DNA damage at least once in 25 replication cycles is 0.72.
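The figure of 0.72 follows from the complement rule, 1 − (1 − p)^N, with p = 0.05 and N = 25. The short check below assumes that replication cycles are independent, which the estimate above implies but does not state; the function name is ours.

```python
def prob_at_least_once(p_per_cycle, cycles):
    """Probability of at least one event in `cycles` independent trials."""
    return 1.0 - (1.0 - p_per_cycle) ** cycles

if __name__ == "__main__":
    # Values from the text: 0.05 per cycle over 25 replication cycles.
    print(round(prob_at_least_once(0.05, 25), 2))
```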
Models of the deletion process induced by the dog-1 mutation in C. elegans suggest that poorly suppressed G-quadruplexes provide a replication impediment that creates a gap beyond the impediment, which survives replication and leads to deletions in a subsequent replication cycle that map to the 5´ end of the impediment (21,37). Our findings in the human PC3 cell line for i-motif impediments formed at the d(TCCC)n elements are completely consistent with this model (Figure 8). The data presented here and in (26) would support CMG helicase uncoupling and uncoupled replication followed by leading strand repriming to produce a gap (41) near an i-motif leading strand impediment (Figure 8A). A transient G-quadruplex that led to CMG helicase uncoupling, uncoupled replication and the formation of an i-motif on the lagging strand would leave a gap in the lagging strand 3´ to the i-motif (Figure 8B). Such a gap would explain why bisulfite can attack the region 3´ to the d(TCCC)9 element in RACK7 (Figure 3). Given that the dog-1 mutation impairs G-quadruplex suppression and results in deletions mapping 5´ to the end of the G-quadruplex in the dGn elements in C. elegans, models like that shown in Figure 8A in which the leading strand impediment is an unresolved G-quadruplex are ruled out: they require that the deletions occur 5´ to the G-quadruplex impediment on the G-rich strand and 3´ on the C-rich strand, whereas all of the deletions we observe are 5´ to the i-motif/G-quadruplex forming sequence on the C-rich strand. Given the ability of the d(GGGA)n motifs to form G-quadruplexes in vitro over the range of lengths n > 3, we conclude that while G-quadruplex formation at these sites may be a necessary or contributing condition for these deletions, i-motif formation under physiological conditions is required for deletions to occur.
Finally, given the dog-1 precedent indicating that loss of a G-quadruplex resolvase results in deletions at sites that can form G-quadruplexes and i-motifs, it is reasonable to speculate that the failure of an unknown i-motif suppression system could result in the deletions observed here in the PC3 cell line. Given the absence of a functional p53 gene in the PC3 cell line, failure to suppress i-motif formation may be linked to p53; however, direct evidence for this suggestion is currently unavailable.