The Transition from Unfolded to Folded G-Quadruplex DNA Analyzed and Interpreted by Two-Dimensional Infrared Spectroscopy

A class of DNA folds/structures known collectively as G-quadruplexes (G4) commonly forms in guanine-rich areas of genomes. G4-DNA is thought to have a functional role in the regulation of gene transcription and telomerase-mediated telomere maintenance and is therefore a target for drugs. The details of the molecular interactions that cause stacking of the guanine-tetrads are not well understood, which limits a rational approach to the druggability of G4 sequences. To explore these interactions, we employed electron-vibration-vibration two-dimensional infrared (EVV 2DIR) spectroscopy to measure extended vibrational coupling spectra for a parallel-stranded G4-DNA formed by the Myc2345 nucleotide sequence. We also tracked the structural changes associated with G4-folding as a function of K⁺-ion concentration. To classify the structural elements that the folding process generates in terms of vibrational coupling characteristics, we used quantum-chemical calculations based on density functional theory (DFT) to predict the coupling spectra associated with given structures, which were compared against the experimental data. Overall, 102 coupling peaks were experimentally identified and followed during the folding process. Several phenomena were noted and associated with formation of the folded form. These include frequency shifting, changes in cross-peak intensity, and the appearance of new coupling peaks. We used these observations to propose a folding sequence for this particular type of G4 under our experimental conditions. Overall, the combination of experimental 2DIR data and DFT calculations suggests that guanine-quartets may already be present before the addition of K⁺-ions, but that these quartets are unstacked until K⁺-ions are added, at which point the full G4 structure is formed.

The fitting of the entire 2D spectra with a sum of 102 2D Gaussians (Figure S10) is not to be confused with the fitting of line cuts with a sum of 1D Gaussians.

The calculated EVV 2DIR spectrum of an isolated guanine (Figure S3a) suggests that the carbonyl stretch of single-strand guanine should give rise to a column of cross-peaks with intensities comparable to those of the G4 GCO II column in Figure 7b of the manuscript. As the 2D fits of the EVV 2DIR spectra of Myc2345 did not require a column of Gaussians between 1660 and 1673 cm⁻¹ to account for the presence of single-strand guanine, the G4 structure appears to dominate. One possible approach to estimating the abundance of single-strand versus G4 DNA is to fit a horizontal line cut in the guanine carbonyl stretch region with a peak between 1660 and 1673 cm⁻¹ for the single-strand guanine in addition to the G4 guanine peak. Figure S12 shows a horizontal line cut of the 100 mM K⁺ EVV 2DIR spectrum at ωβ − ωα = 1350 cm⁻¹ fitted with one 1D Gaussian at 1678 cm⁻¹ (G4) and another at 1660 cm⁻¹ (single strand). The maximum intensity of the single-strand Gaussian is 0.04 times that of the G4 Gaussian. Acknowledging the limited extent and suitability of the underlying data for this analysis, and assuming that the cross-sections of these two couplings are equal, this gives an estimated upper bound of 4% single-strand DNA in the 100 mM K⁺ sample.
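The intensity-ratio estimate described for Figure S12 can be illustrated with a minimal sketch. It is not the authors' fitting code: it assumes that both Gaussians share a single, hypothetical width (`FWHM`), that their centres are held fixed at the 1678 and 1660 cm⁻¹ values quoted in the text, and that only the two amplitudes are fitted, which reduces the problem to linear least squares. The line cut is synthetic, built with the 0.04 ratio from the text, so the example only shows that such a fit recovers the ratio it was given.

```python
import math


def gaussian(x, centre, fwhm):
    """Unit-height Gaussian with the given centre and full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def fit_two_amplitudes(xs, ys, centre_a, centre_b, fwhm):
    """Least-squares amplitudes of two Gaussians with fixed centres and width."""
    ga = [gaussian(x, centre_a, fwhm) for x in xs]
    gb = [gaussian(x, centre_b, fwhm) for x in xs]
    # Normal equations of the two-parameter linear problem.
    saa = sum(a * a for a in ga)
    sbb = sum(b * b for b in gb)
    sab = sum(a * b for a, b in zip(ga, gb))
    say = sum(a * y for a, y in zip(ga, ys))
    sby = sum(b * y for b, y in zip(gb, ys))
    det = saa * sbb - sab * sab
    amp_a = (say * sbb - sby * sab) / det
    amp_b = (sby * saa - say * sab) / det
    return amp_a, amp_b


G4_CENTRE = 1678.0  # cm^-1, G4 guanine carbonyl peak (from the text)
SS_CENTRE = 1660.0  # cm^-1, single-strand guanine peak (from the text)
FWHM = 12.0         # cm^-1, hypothetical shared width (assumption)

# Synthetic line cut, NOT measured data: G4 peak plus a 0.04x single-strand peak.
xs = [1640.0 + 0.5 * i for i in range(141)]
ys = [gaussian(x, G4_CENTRE, FWHM) + 0.04 * gaussian(x, SS_CENTRE, FWHM) for x in xs]

amp_g4, amp_ss = fit_two_amplitudes(xs, ys, G4_CENTRE, SS_CENTRE, FWHM)
ratio = amp_ss / amp_g4
print(f"single-strand / G4 intensity ratio: {ratio:.3f}")
```

Under the equal cross-section assumption stated in the text, this ratio is what is read as the upper bound on the single-strand abundance.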

Figure S3. Computational EVV 2DIR spectrum of an isolated guanine base. a, Computational EVV 2DIR spectrum of a guanine base. An isolated guanine base has one carbonyl stretch mode at 1758 cm⁻¹, which gives rise to a row and a column of cross-peaks. b, Depiction of the guanine carbonyl stretch mode in an isolated guanine base.

Figure S4. Sequence comparison of the G4-forming sequence in the human c-myc promoter region from different publications. a, Pu27, the 27-nt purine-rich wild-type sequence found in the NHE of the c-myc promoter. b, Myc2345 sequence used for the EVV 2DIR measurements in this work. c, Wild-type Myc2345 sequence as used in NMR structure PDB:7KBV (Ref. 1). d, MycL1 altered Myc2345 sequence used in CD measurements in Ref. 2. Bases in red signify those that differ from the Myc2345 sequence used in the EVV 2DIR measurements.

Figure S5. The G-quadruplex and G-quartet structures used for the calculation of the computational EVV 2DIR spectra. a-e, Structure of the G-quadruplex. Panel (b) shows a top-down view of the G-quadruplex. The terminal quartets (purple) are aligned with each other, and the central quartet (green) is rotated by ~45° with respect to the terminal quartets. Panels (d) and (e) present the terminal and central quartets of the G-quadruplex, respectively. f-g, Structure of the G-quartet from (f) a top-down view and (g) a side view.

Figure S6. Comparison of the structures of the G-quartet and the terminal quartets in a G4. Structure of (a-b) the G-quartet and (c) a terminal quartet of the G4, showing the intersection of the molecular planes of adjacent guanine bases. The guanine molecular planes intersect at an angle of 21° in the G-quartet and 19° in the terminal quartets of the G4.

Figure S7. Processing of the EVV 2DIR spectrum of the glass coverslip used to normalise the EVV 2DIR spectra of Myc2345. EVV 2DIR spectrum of the glass coverslip (a) raw, (b) after smoothing by averaging over nearest neighbours, (c) after interpolation along the vertical axis, and (d) after applying a low-pass filter to the Fourier transform.
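The three processing steps named in the Figure S7 caption can be sketched as below. This is an illustration under stated assumptions, not the processing code used for the figure: the caption does not specify the neighbourhood, the interpolation factor or the filter cutoff, so the sketch assumes a mean over each point and its four in-bounds neighbours, one linearly interpolated row between each pair of rows, and an ideal rectangular cutoff `keep` (a hypothetical parameter) applied separably along rows and columns. The input grid is synthetic.

```python
import cmath
import math


def smooth_nearest_neighbours(grid):
    """Replace each point by the mean of itself and its in-bounds 4-neighbours."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            cells = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            vals = [grid[i][j] for i, j in cells if 0 <= i < rows and 0 <= j < cols]
            new_row.append(sum(vals) / len(vals))
        out.append(new_row)
    return out


def interpolate_vertical(grid):
    """Insert one linearly interpolated row between each pair of adjacent rows."""
    out = [grid[0][:]]
    for upper, lower in zip(grid, grid[1:]):
        out.append([(u + l) / 2.0 for u, l in zip(upper, lower)])
        out.append(lower[:])
    return out


def low_pass_1d(signal, keep):
    """Zero every DFT component whose |frequency index| exceeds `keep`."""
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # The mask is symmetric in k and n - k, so the inverse transform is real.
    filtered = [s if min(k, n - k) <= keep else 0.0 for k, s in enumerate(spectrum)]
    return [
        sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def low_pass_2d(grid, keep):
    """Apply the 1D low-pass along rows, then along columns (separable mask)."""
    rows_done = [low_pass_1d(row, keep) for row in grid]
    cols_done = [low_pass_1d(list(col), keep) for col in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]


# Synthetic 8 x 8 "spectrum" (NOT measured data): flat level plus checkerboard noise.
raw = [[1.0 + 0.5 * (-1) ** (r + c) for c in range(8)] for r in range(8)]
smoothed = smooth_nearest_neighbours(raw)
interpolated = interpolate_vertical(smoothed)
processed = low_pass_2d(interpolated, keep=2)
```

The naive DFT keeps the sketch dependency-free; for grids of realistic size an FFT routine would be the practical choice.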

Figure S8. a-d, Fitting of line cuts of experimental EVV 2DIR spectra of 1 mM Myc2345 in the presence of 0, 5, 10 and 100 mM K⁺ at ωβ − ωα = 1675 cm⁻¹ with a sum of 1D Gaussians. The 100 mM K⁺ line cut (a) was first fitted with five 1D Gaussians with no parameter boundaries. The 10, 5 and 0 mM K⁺ line cuts (b-d) were subsequently fitted with five 1D Gaussians, with the widths and locations fixed to the values determined from the fit of the 100 mM K⁺ line cut and no boundaries on the intensities. These fits show peak 101 (green Gaussian feature) growing in as [K⁺] is increased.
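The second stage of the procedure in the Figure S8 caption, in which widths and locations are fixed and only the intensities are fitted, is a linear least-squares problem and can be sketched as follows. This is a hedged illustration rather than the authors' code: the five (centre, sigma) pairs in `SHAPES`, the wavenumber grid and the intensities used to build the line cuts are all hypothetical placeholders standing in for the result of the unconstrained 100 mM K⁺ fit, which is not reproduced here.

```python
import math


def gaussian(x, centre, sigma):
    """Unit-height Gaussian."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_intensities(xs, ys, shapes):
    """Fit only the intensities; centres and widths in `shapes` stay fixed."""
    basis = [[gaussian(x, c, s) for x in xs] for c, s in shapes]
    n = len(shapes)
    normal = [
        [sum(bi * bj for bi, bj in zip(basis[i], basis[j])) for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(b * y for b, y in zip(basis[i], ys)) for i in range(n)]
    return solve_linear(normal, rhs)


# Hypothetical (centre, sigma) pairs in cm^-1, standing in for the stage-1 fit.
SHAPES = [(1480.0, 8.0), (1530.0, 8.0), (1580.0, 8.0), (1630.0, 8.0), (1680.0, 8.0)]
GROWING = 2  # index of the peak that grows in with [K+] (illustrative choice)
xs = [1440.0 + 2.0 * i for i in range(141)]


def synthetic_cut(growing_intensity):
    """Build a noise-free synthetic line cut (NOT measured data)."""
    amps = [1.0, 0.6, growing_intensity, 0.8, 0.5]
    return [sum(a * gaussian(x, c, s) for a, (c, s) in zip(amps, SHAPES)) for x in xs]


fitted = {}
for k_mM, intensity in [(0, 0.0), (5, 0.1), (10, 0.3), (100, 0.5)]:
    fitted[k_mM] = fit_intensities(xs, synthetic_cut(intensity), SHAPES)

growth = [fitted[k][GROWING] for k in (0, 5, 10, 100)]
print("fitted intensity of the growing peak:", [round(g, 3) for g in growth])
```

Because the peak shapes are shared across all concentrations, any change in a fitted intensity reflects a change in the data rather than a shift in the peak position or width, which is the point of fixing those parameters.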

Figure S9. a-d, Fitting of line cuts of experimental EVV 2DIR spectra of 1 mM Myc2345 in the presence of 0, 5, 10 and 100 mM K⁺ at ωβ − ωα = 1675 cm⁻¹ with a sum of 1D Gaussians with the width constraint relaxed. The 100 mM K⁺ line cut (a) was first fitted with five 1D Gaussians with no parameter boundaries. The 10, 5 and 0 mM K⁺ line cuts (b-d) were subsequently fitted with five 1D Gaussians, with the locations fixed to the values determined from the fit of the 100 mM K⁺ line cut. The widths of the Gaussians were allowed to vary by ±2 cm⁻¹ from the values determined in that fit, and no boundaries were placed on the intensities. These fits show peak 101 (green Gaussian feature) growing in as [K⁺] is increased, and that this is not an artifact of constraining the widths of the Gaussians.

Figure S10. A comparison of raw and fitted data for 1 mM Myc2345 in the presence of 0, 5, 10 and 100 mM K⁺ in the regions where new peaks 101 (a) and 92 (b) are observed to appear on addition of K⁺ ions. The fitted data shown here are from the fitting of the entire 2D spectra with a sum of 102 2D Gaussians.

Figure S11. FTIR spectra of solutions of 1 mM Myc2345 in the presence of 0, 5, 10 and 100 mM K⁺, prepared as described in Materials and Methods, excluding gel formation. The guanine carbonyl stretch feature is observed to red-shift from 1675 cm⁻¹ (0 mM K⁺) to 1670 cm⁻¹ (100 mM K⁺).

Figure S12. The upper limit for the abundance of single-stranded DNA can be estimated from the difference in guanine carbonyl stretch frequencies between single-strand and complexed guanine. The calculated EVV 2DIR spectrum of an isolated guanine is shown in Figure S3a in the Supplementary Information.