Solution structure of the major G-quadruplex formed in the human VEGF promoter in K+: insights into loop interactions of the parallel G-quadruplexes

Vascular endothelial growth factor (VEGF) proximal promoter region contains a poly G/C-rich element that is essential for basal and inducible VEGF expression. The guanine-rich strand on this tract has been shown to form the DNA G-quadruplex structure, whose stabilization by small molecules can suppress VEGF expression. We report here the nuclear magnetic resonance structure of the major intramolecular G-quadruplex formed in this region in K+ solution using the 22mer VEGF promoter sequence with G-to-T mutations of two loop residues. Our results have unambiguously demonstrated that the major G-quadruplex formed in the VEGF promoter in K+ solution is a parallel-stranded structure with a 1:4:1 loop-size arrangement. A unique capping structure was shown to form in this 1:4:1 G-quadruplex. Parallel-stranded G-quadruplexes are commonly found in the human promoter sequences. The nuclear magnetic resonance structure of the major VEGF G-quadruplex shows that the 4-nt middle loop plays a central role for the specific capping structures and in stabilizing the most favored folding pattern. It is thus suggested that each parallel G-quadruplex likely adopts unique capping and loop structures by the specific middle loops and flanking segments, which together determine the overall structure and specific recognition sites of small molecules or proteins.
LAY SUMMARY: The human VEGF is a key regulator of angiogenesis and plays an important role in tumor survival, growth and metastasis. VEGF overexpression is frequently found in a wide range of human tumors; the VEGF pathway has become an attractive target for cancer therapeutics. DNA G-quadruplexes have been shown to form in the proximal promoter region of VEGF and are amenable to small molecule drug targeting for VEGF suppression. The detailed molecular structure of the major VEGF promoter G-quadruplex reported here will provide an important basis for structure-based rational development of small molecule drugs targeting the VEGF G-quadruplex for gene suppression.


INTRODUCTION
The human vascular endothelial growth factor (VEGF) is a pluripotent cytokine and a key regulator of angiogenesis. VEGF plays an important role in tumor survival, growth and metastasis (1,2). It binds to VEGF receptors on the surfaces of endothelial cells to promote the formation of new blood vessels, or angiogenesis, which can promote tumor growth by providing oxygen and nutrients as well as provide escape routes for disseminating tumor cells (3,4). VEGF overexpression is frequently found in a wide range of human tumors (5)(6)(7)(8) and can be induced by the loss or inactivation of tumor suppressor genes (9), the activation of oncogenes (10), external stimuli such as hypoxia and cytokines (11,12) and transcriptional upregulation, which is controlled by the cis-acting elements and transcription factors (5)(6)(7)(8)(9)(13). Anti-VEGF therapy has been actively pursued for cancer therapeutics in a variety of forms, including antibodies, ribozymes, immunotoxins and small molecule inhibitors (14)(15)(16)(17)(18)(19)(20)(21)(22)(23).
The G-quadruplexes formed in oncogene promoters have been shown to be potential targets for small molecule drugs (24)(25)(26). Most recently, the existence of DNA G-quadruplexes has been visualized on chromosomes in human cells using a G-quadruplex-specific antibody (27). In the VEGF promoter, a 39-bp polyG/polyC region located -88 to -50 bp relative to the transcription initiation site has been shown to be functionally significant in VEGF transcriptional activity, with multiple transcription factor binding sites, including three potential Sp1 binding sites (13). This region has been shown to be highly dynamic in conformation and can form DNA G-quadruplex secondary structure on the G-rich strand, as demonstrated by in vitro and plasmid footprinting with dimethyl sulfate (DMS), DNase I and S1 nuclease in K+ (28,29), and by in vivo DMS footprinting using A498 kidney cancer cells that overexpress VEGF (30). The formation of DNA G-quadruplex structure is clearly enhanced by G-quadruplex-interactive agents (28), which can repress VEGF expression in human tumor cells (31), suggesting that the VEGF G-quadruplex is amenable to small molecule drug targeting for VEGF suppression. A detailed molecular structure of the major VEGF promoter G-quadruplex will be important for structure-based rational development of small molecule drugs (32).
*To whom correspondence should be addressed. Tel: +1 520 626 5969; Fax: +1 520 626 6988; Email: yang@pharmacy.arizona.edu
We report here the nuclear magnetic resonance (NMR) structure of the major G-quadruplex formed in the human VEGF promoter in K+ solution. Our NMR study unequivocally demonstrated that the major intramolecular G-quadruplex formed in the VEGF promoter in K+ is a parallel-stranded structure with a 1:4:1 loop-size arrangement. We have found that the middle 4-nt loop interacts with the 5' flanking residues to form a specific capping structure, a salient feature, as this interaction is specific to the VEGF sequence and differs from those of other parallel-stranded structures. Together with the 5'-flanking segment, the 4-nt middle loop appears to play a central role in forming the specific capping structure that likely determines this most favored folding pattern. Parallel-stranded G-quadruplexes have been found to be common in the human promoter sequences. Significantly, our results indicate that each parallel structure is likely to adopt unique capping and loop structures by the specific flanking sequences and middle loops, which together determine the stability of the overall G-quadruplex structure and potential specific interactions with small molecules or proteins.

MATERIALS AND METHODS
The synthesis and purification of DNA oligonucleotides were performed as described earlier (33)(34)(35)(36)(37). Circular dichroism (CD) spectroscopic study of the oligonucleotides was performed on a Jasco J-810 spectropolarimeter (Jasco Inc., Easton, MD, USA) equipped with a thermoelectrically controlled cell holder as described previously (38). A quartz cell of 1 mm optical path length was used. A blank sample containing only buffer was used for the baseline correction. CD spectroscopic measurements were the averages of three scans collected between 200 and 350 nm. The scanning speed of the CD instrument was 100 nm/min, and the response time was 1 s. Tm values were measured by CD melting and annealing experiments performed at 265 nm in triplicate, with a heating or cooling rate of 2 °C/min, respectively.
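As an illustration of how a melting temperature might be read off a CD melting curve recorded at 265 nm, the sketch below normalizes the signal to a folded fraction and linearly interpolates the temperature at which that fraction crosses 0.5. This is a simplified midpoint estimate under a two-plateau assumption, not necessarily the procedure used in this study; the function name `midpoint_tm` and the data points are hypothetical.

```python
def midpoint_tm(temps, signal):
    """Estimate Tm as the temperature where the folded fraction crosses 0.5.

    temps  : temperatures in deg C, strictly increasing
    signal : CD signal at each temperature (e.g. ellipticity at 265 nm),
             assumed to fall from a folded plateau to an unfolded plateau
    """
    folded, unfolded = signal[0], signal[-1]
    if folded == unfolded:
        raise ValueError("flat curve: no transition")
    # Folded fraction: 1 at the first point, 0 at the last point.
    frac = [(s - unfolded) / (folded - unfolded) for s in signal]
    for i in range(len(frac) - 1):
        f0, f1 = frac[i], frac[i + 1]
        if f0 >= 0.5 > f1:
            # Linear interpolation between the two bracketing points.
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    raise ValueError("folded fraction never crosses 0.5")


# Synthetic, made-up curve for illustration only (not measured data).
temps = [60.0, 70.0, 80.0, 90.0]
signal = [10.0, 8.0, 2.0, 0.0]
tm = midpoint_tm(temps, signal)
```

With real data, the first and last points would need to sit on well-defined plateaus for this normalization to be meaningful; a baseline-corrected fit is the more robust choice when they do not.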
NMR experiments were performed on a Bruker DRX-600 MHz spectrometer as described earlier (33)(34)(35)(36)(37). Stoichiometric titration of the unfolded and folded strands as a function of total strand concentration from 0.01 to 0.1 mM was performed at 75 °C (melting point) (39). The guanine H1 imino protons, one-bond coupled to N1, and H8 protons, two-bond coupled to N7, can be unambiguously assigned by 1D 15N-edited heteronuclear multiple quantum coherence (HMQC) experiments (40). For this purpose, site-specifically labeled DNA was synthesized with 6% 15N-labeled guanine phosphoramidite (41). The 1D GE-JRSE HMQC experiments were used for measuring 15N-edited spectra (40) to identify guanine imino and H8 protons. The 1D variable temperature (VT) proton NMR experiments were performed from 1 °C to 80 °C. Homonuclear 2D-NMR experiments, namely double quantum filtered-correlation spectroscopy (DQF-COSY), total correlation spectroscopy (TOCSY) and nuclear Overhauser effect spectroscopy (NOESY), were collected at 5, 15 and 25 °C for complete proton resonance assignment in water and D2O solution. Contributions from J-modulation and zero-quantum coherence were suppressed using a z-gradient filter with a gradient strength of 20% and a duration of 1 ms. The NMR experiments on samples in water were performed using the jump-return spin-echo water suppression technique, with the maximum intensity tuned to 11 ppm (42). Relaxation delays were set to 2.5 s. The acquisition data points were set to 4096 × 512 (complex points). Peak assignments and integrations were achieved using the software Sparky (UCSF). Inter-proton distances involving nonexchangeable protons were estimated based on the nuclear Overhauser effect (NOE) cross-peak volumes at 50-300 ms mixing times, with the upper and lower boundaries assigned to ±20% of the estimated distances. Distance restraints for the unresolved cross-peaks were set with looser boundaries of ±30%. The cytosine base proton H6-H5 distance (2.45 Å) was used as a reference distance. The distances involving the unresolved protons, e.g., methyl protons, were assigned using pseudo-atom notation to make use of the pseudo-atom correction automatically computed by X-PLOR.
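The distance calibration described above can be sketched as follows. The sketch assumes the usual isolated spin-pair approximation, in which the cross-peak volume scales as the inverse sixth power of the distance; the text does not state the exact relation used, so this is an assumption. The reference distance (2.45 Å) and the ±20% and ±30% bounds come from the text, whereas the function names and the example volumes are hypothetical.

```python
R_REF = 2.45  # cytosine H6-H5 reference distance in angstroms (from the text)


def noe_distance(volume, ref_volume, r_ref=R_REF):
    """Distance estimate assuming cross-peak volume scales as r**-6."""
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("volumes must be positive")
    return r_ref * (ref_volume / volume) ** (1.0 / 6.0)


def restraint_bounds(distance, resolved=True):
    """Lower/upper bounds: +/-20% for resolved peaks, +/-30% otherwise."""
    tol = 0.20 if resolved else 0.30
    return distance * (1.0 - tol), distance * (1.0 + tol)


# Hypothetical volumes in arbitrary units: a peak 64 times weaker than the
# reference corresponds to twice the reference distance (64 ** (1/6) == 2).
d = noe_distance(volume=1.0, ref_volume=64.0)
lower, upper = restraint_bounds(d)
loose_lower, loose_upper = restraint_bounds(d, resolved=False)
```

Because of the sixth-root dependence, even a large error in a measured volume translates into a modest error in distance, which is one reason percentage bounds of this size are workable.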
The structure of Pu22-T12T13 was calculated using X-PLOR (43). Metric matrix distance geometry and simulated annealing calculations were carried out in X-PLOR (43) to embed and optimize 100 initial structures based on an arbitrary extended conformation of the single-stranded Pu22-T12T13 sequence to produce a family of 100 DG structures, as described previously (33,34). The experimentally obtained distance restraints and G-tetrad hydrogen bonding distance restraints were included during the calculations. All of the 100 molecules obtained from the distance geometry simulated annealing (DGSA) calculations were subjected to NOE-restrained simulated annealing refinement in X-PLOR (43) with a distance-dependent dielectric constant. A total of 407 NOE distance restraints were introduced into the NOE-restrained structure calculation with a force constant of 20 kcal mol⁻¹ Å⁻². Hydrogen bond restraints were applied to the G-tetrads, using a quadratic energy function with a force constant of 100 kcal mol⁻¹ Å⁻². A low-level planarity restraint (2 kcal mol⁻¹ Å⁻²) was also applied on the G-tetrad in the simulated annealing step of the structure calculation. The planarity restraints were removed in the final molecular dynamics simulation with energy minimization. Dihedral angle restraints were used to restrict the glycosidic torsion angle for the experimentally assigned anti conformation bases and for tetrad guanines. The 30 best molecules were selected based both on their minimal energy terms and number of NOE violations and were further subjected to NOE-restrained molecular dynamics calculations at 300 K for 25 ps. The coordinates saved at every 0.1 ps during the last 2.0 ps of NOE-restrained molecular dynamics calculations were averaged, and the resulting averaged structure was subjected to further minimization until the energy gradient of 0.1 kcal mol⁻¹ was achieved. The 10 best molecules were selected based both on their minimal energy terms and number of NOE violations, with a mean rms deviation of 1.10 Å for the family of 10 ensemble structures. For the G-quadruplex formed in the wild-type VEGF-Pu22 sequence, we took the G-quadruplex formed in Pu22-T12T13 as the starting structure and replaced T12 and T13 with the wild-type G12 and G13 residues. This structure was then subjected to energy minimization followed by unrestrained molecular dynamics simulation for 25 ps at 300 K.
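The trajectory averaging and the rms deviation quoted above can be illustrated with a minimal sketch. It assumes the frames are already superimposed on a common reference, which real structure-calculation software handles internally; the function names and the toy coordinates are hypothetical, and only the 0.1 ps sampling over the last 2.0 ps comes from the text.

```python
import math


def average_structure(frames):
    """Average coordinates over frames; each frame is a list of (x, y, z)."""
    n = len(frames)
    n_atoms = len(frames[0])
    return [
        tuple(sum(frame[a][k] for frame in frames) / n for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


# Sampling described in the text: one frame every 0.1 ps over the last 2.0 ps
# (roughly 20 frames; whether both end points are counted is not stated).
n_frames = round(2.0 / 0.1)

# Toy two-atom "trajectory" (invented numbers, not from the calculation).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
mean = average_structure(frames)
deviations = [rmsd(f, mean) for f in frames]
```

Averaging Cartesian coordinates can distort bond lengths and angles, which is consistent with the averaged structure being energy-minimized afterwards, as described in the text.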

RESULTS
The major G-quadruplex formed in the VEGF promoter in K+ solution adopts a parallel-stranded structure with a 1:4:1 loop-size arrangement
The G-rich strand of this VEGF proximal promoter region contains five guanine-runs. Using electrophoresis mobility shift assay (EMSA), DMS footprinting and DNA polymerase stop assay in K+ solution, it has been shown that the G-quadruplex formed in this region involves only the four successive G-runs at the 5' end (VEGF-Pu22, Figure 1A) (29,31), which contain four (G2-G5), three (G7-G9), five (G12-G16) and four (G18-G21) guanines, respectively. VEGF-Pu22 can form multiple loop isomers. The wild-type VEGF-Pu22 forms a clear predominant G-quadruplex structure in 95 mM K+ solution, as shown by a set of imino proton peaks at 10.5-12 ppm in 1H NMR, characteristic of G-tetrad guanines (Figure 1B). The CD spectrum of VEGF-Pu22 showed a positive peak at ~265 nm and a negative peak at 240 nm (Supplementary Figure S1), characteristic of a parallel-stranded G-quadruplex structure (38). We prepared the wild-type VEGF-Pu22 sequence with 6% site-specific incorporation of 15N-labeled guanine at each guanine of the five-guanine run G12-G16 (Figure 1A). The imino protons of G14, G15 and G16 were clearly detected in 1D 15N-edited HMQC experiments, whereas the imino proton of G12 was weak and the imino proton of G13 was missing (Figure 1C); the imino proton of G13 was not detected even at 2 °C (Supplementary Figure S2), indicating that the major conformation formed in the wild-type VEGF-Pu22 does not involve G12 and G13 in the G-tetrad formation. Thus, the folding topology of the major G-quadruplex formed in VEGF-Pu22 is a parallel G-quadruplex with a 1:4:1 loop-size arrangement (Figure 1D). This major VEGF G-quadruplex can be isolated by the sequence Pu22-T12T13, with G-to-T mutations at positions 12 and 13 (Figure 1A). Pu22-T12T13 gave rise to a well-resolved 1H NMR spectrum in 95 mM K+ solution (Figure 1B) and was used for NMR structure determination.
[Figure 1A sequence label: Pu22-T12T13A2A21, 5'-CAGGGCGGGCCTTGGGCGGGAT-3']
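The residue numbering and loop-size bookkeeping used in this section can be checked with a short sketch. It starts from the Pu22-T12T13A2A21 sequence printed with Figure 1A and reverts the four named substitutions to guanine to obtain a wild-type 22mer; that reconstruction is an inference from the construct name, not a sequence quoted from the text. The tetrad tracts assigned to the two loop isomers follow the description in the text, with the outer tracts of the minor isomer assumed to match those of the major one. All function names are hypothetical.

```python
import re

# Sequence read from the Figure 1A label for Pu22-T12T13A2A21.
MUTANT = "CAGGGCGGGCCTTGGGCGGGAT"


def revert_to_g(seq, positions):
    """Return seq with the given 1-based positions set back to guanine."""
    bases = list(seq)
    for pos in positions:
        bases[pos - 1] = "G"
    return "".join(bases)


def g_runs(seq, min_len=3):
    """1-based (start, end) of every run of at least min_len guanines."""
    pattern = "G{%d,}" % min_len
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq)]


def loop_sizes(tracts):
    """Loop lengths between consecutive tetrad tracts given as (start, end)."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(tracts, tracts[1:])]


# Undo the four substitutions named in the construct (A2, T12, T13, A21).
wild_type = revert_to_g(MUTANT, [2, 12, 13, 21])
runs = g_runs(wild_type)

# Tetrad tracts for the two loop isomers discussed in the text.
major = [(3, 5), (7, 9), (14, 16), (18, 20)]  # G12 and G13 left in the loop
minor = [(3, 5), (7, 9), (12, 14), (18, 20)]  # G12 and G13 in the core
major_loops = loop_sizes(major)
minor_loops = loop_sizes(minor)
```

The four G-runs recovered from the reconstructed sequence match the G2-G5, G7-G9, G12-G16 and G18-G21 runs named above, and shifting the third tract by two residues is all that separates the 1:4:1 arrangement from the 1:2:3 one.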
To determine the effect of loop and flanking residues, we have tested various modified VEGF sequences by 1H NMR (Figure 1B and Supplementary Figure S3). The spectrum of Pu22-T12 with G12-to-T mutation is almost the same as that of the wild-type VEGF-Pu22, indicating that G12 is involved in neither the tetrad formation nor the capping structure. The spectrum of Pu22-T12T13 is similar to that of the wild-type VEGF-Pu22, with the G7 imino proton down-field shifted, likely due to a smaller ring-current effect of T13 than that of G13 in the capping structure (see later in the text). The spectrum of Pu22-T12T13A2 showed a shifted G18 imino proton, likely caused by a different base pair conformation (T13:A2) of this modified sequence, whereas Pu22-T12T13A2A21 showed additionally shifted G20 and G16 imino protons, likely due to the mutated A21 base.
The less stable 1:2:3 loop isomer can also be isolated in a modified VEGF sequence in K+ solution
Our result is consistent with the previous DMS footprinting data, which show that the 1:4:1 loop isomer is the predominant G-quadruplex formed in the wild-type VEGF promoter sequence in K+ solution (29). It was suggested by DMS footprinting that a minor conformation, the 1:2:3 G-quadruplex (Supplementary Figure S4A), could also be formed (29). The 1:2:3 G-quadruplex needs G12 and G13 in the core G-tetrads and can be isolated using the Pu22-T15T16 sequence (Supplementary Figure S4A). Although the 15N-edited HMQC experiments of the wild-type sequence VEGF-Pu22 in K+ did not detect the formation of the 1:2:3 G-quadruplex, as the signals for the imino protons of G12 and G13 were either very weak or missing (Figure 1C), Pu22-T15T16 can form a single G-quadruplex in K+ (Supplementary Figure S4B). The 1D 1H spectrum of the wild-type sequence appears to show a minor species, likely to be the 1:2:3 loop isomer (Supplementary Figure S4B). It is possible that the HMQC experiment of the 6% 15N-G-labeled DNA is not sensitive enough to detect the low population of the 1:2:3 loop isomer. The melting temperature of the 1:4:1 G-quadruplex formed in Pu22-T12T13 is 77.3 °C, whereas the melting temperature of the 1:2:3 G-quadruplex is 73 °C (Table 1). The melting temperature of the wild-type VEGF-Pu22 is 77.9 °C (Table 1), which is close to that of the major conformation 1:4:1 G-quadruplex. The 4 °C difference in Tm may explain the major formation of the 1:4:1 G-quadruplex in the VEGF promoter sequence.
Complete NMR spectra assignment of the major VEGF promoter G-quadruplex
NMR experiments of Pu22-T12T13 were carried out in 95 mM K+ solution. We have also examined this sequence in the physiologically relevant 140 mM K+ concentration, which gave rise to the same NMR spectrum (Supplementary Figure S5). The guanine imino and H8 protons of Pu22-T12T13 were assigned using 15N-edited HMQC (Figure 2) (36,37). The absence of imino protons for G2 and G21 (Figure 2A) indicated that G2 and G21 were not involved in the G-tetrad formation. Notably, the imino protons of G14, G15 and G16 of Pu22-T12T13 (Figure 1B) are at almost the same positions as those of the wild-type VEGF-Pu22 (Figure 1C). The G-quadruplex formed in Pu22-T12T13 appears to be monomeric, as shown by the NMR stoichiometry titration experiment at the melting temperature (Supplementary Figure S6). Pu22-T12T13 forms a parallel-stranded intramolecular G-quadruplex with a 1:4:1 loop-size arrangement (Figure 1D). This folding topology was determined by NOE connectivities of guanine imino and H8 protons. In a G-tetrad plane with a Hoogsteen H-bond network, the NH1 of a guanine is in close spatial vicinity to the NH1s of the adjacent guanines and to the H8 of one of the adjacent guanines (Figure 1E). For example, the NOEs of G18H8/G14H1, G14H8/G7H1, G7H8/G3H1 and G3H8/G18H1 (Figure 3A) defined the tetrad plane of G3-G7-G14-G18. The other two tetrad planes, G4-G8-G15-G19 and G5-G9-G16-G20, were determined in a similar way.
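The cyclic H8/H1 connectivity argument used to define a tetrad plane can be sketched as a small graph walk. Each NOE is stored as a link from the guanine contributing H8 to the guanine contributing H1, and a tetrad is accepted only if following the links returns to the starting residue after exactly four steps. The four NOEs are those quoted in the text; the function name and the dictionary representation are hypothetical.

```python
def tetrad_from_noes(h8_to_h1, start):
    """Follow H8 -> H1 NOE links from start; return the ring if it closes in 4 steps."""
    ring = [start]
    current = start
    for _ in range(4):
        current = h8_to_h1.get(current)
        if current is None:
            return None  # chain of NOEs is broken
        if current == start:
            # Closed ring: accept only if exactly four guanines were visited.
            return ring if len(ring) == 4 else None
        ring.append(current)
    return None


# H8/H1 NOEs quoted in the text, stored as {residue with H8: residue with H1}.
noes = {"G18": "G14", "G14": "G7", "G7": "G3", "G3": "G18"}
ring = tetrad_from_noes(noes, "G3")
```

The order in which the ring is traversed reflects the direction of the Hoogsteen hydrogen-bond network within the plane, so the same walk applied to the other two tetrads would be expected to run in the same sense.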
Complete proton assignment of Pu22-T12T13 was accomplished by sequential assignment (Figure 3, Table 2). All of the residues appear to adopt the anti conformation based on the intensities of intra-residue H8-H1' cross peaks (Figure 3B). Critical inter-residue NOE interactions are summarized in Figure 4 and define the overall structure of the VEGF promoter G-quadruplex.
Complete spectral assignment (Supplementary Figure S7) was also accomplished for Pu22-T12T13A2A21 with additional G-to-A mutations at G2 and G21.
NOE-refined solution structure of the VEGF G-quadruplex shows unique capping structure involving both the 4-nt middle loop and the two flanking segments
Solution structures of the Pu22-T12T13 G-quadruplex were calculated using an NOE-restrained distance geometry (DGSA) and restrained molecular dynamics (RMD) approach (Figure 5, PDB ID 2m27), starting from an arbitrary extended single-stranded DNA. A total of 407 NOE distance restraints, including 145 inter-residue NOE interactions, were used in the NOE-restrained structure calculation (Supplementary Table S1). Dihedral restraints were used for the anti glycosidic torsion angle (χ) for loop residues. The stereo view of the 10 lowest energy structures is shown in Figure 5A. The structure statistics are listed in Supplementary Table S1.
Pu22-T12T13 forms a well-defined parallel-stranded G-quadruplex structure with three tetrads. The two 1-nt loops are located in the groove and adopt similar conformations, with an extended sugar backbone and the cytosine base sticking out to the solvent (Figure 5B). The 4-nt double-chain-reversal loop, C10-C11-T12-T13, interestingly, adopts a unique conformation (Figure 5B). The T13 base stacks over the G14 base and appears to be hydrogen-bonded with the G2 base of the 5' flanking segment (Figure 5B-iii). The hydrogen-bond interaction was supported by NMR, i.e., the G2 imino proton was detected at 2 °C at ~10.8 ppm (Supplementary Figure S2). The G2:T13 base pair appears to completely stack over the 5' G-tetrad (Figure 5B-iii) and thus would experience a strong ring-current effect. This is supported by the NMR data, i.e. a clear upfield shift of the sugar proton resonances of G2 and T13, e.g. G2 H1' (Figure 3B, Table 2). The other three residues, C10, C11 and T12, are located in the groove to connect the now four-layer structure (three G-tetrads plus one G:T base pair), with the C10 and C11 bases pointing out to the solvent. The T13, which is involved in the G2:T13 base pair capping structure, is a mutation from the wild-type G13. To examine the G-quadruplex formed in the wild-type sequence VEGF-Pu22, we took the G-quadruplex structure formed in Pu22-T12T13 and replaced T12 and T13 with the wild-type G12 and G13 residues. We carried out energy minimization followed by unrestrained molecular dynamics simulation for 25 ps at 300 K. Notably, a hydrogen-bonded G2:G13 base pair can be readily formed in the wild-type sequence to cap the VEGF G-quadruplex (Figure 5C). We have collected 2D NOESY data with a 50 ms mixing time for the wild-type sequence VEGF-Pu22. Similar to what was observed in the Pu22-T12T13 sequence, no syn conformation was observed for any nucleotide in the VEGF-Pu22 sequence (Supplementary Figure S8).

DISCUSSION
The NMR results in the present study unequivocally demonstrated that the major intramolecular G-quadruplex formed in the VEGF proximal promoter in K+ solution is a parallel-stranded structure with a 1:4:1 loop-size arrangement. The minor species, a 1:2:3 loop isomer (Supplementary Figure S4), could not be detected in the wild-type sequence VEGF-Pu22 by NMR, as the imino proton of G13, which is required for the core-tetrad of the 1:2:3 loop isomer (Supplementary Figure S4), was not detected, even at 2 °C (Figure 1C and Supplementary Figure S2). The Tm of the 1:2:3 loop isomer was shown to be 4 °C lower than that of the 1:4:1 loop isomer, which may explain the major formation of the 1:4:1 G-quadruplex in the VEGF promoter sequence.
Parallel-stranded structures have been found to be common in the human promoter G-quadruplexes, such as c-MYC (35,44,45), HIF-1α (46), c-KIT21 (47), RET (48) and hTERT (49,50). Importantly, all of these parallel-stranded promoter G-quadruplexes contain three tetrads and two 1-nt loops (first and third), but a variable-length middle loop (Figure 6) (26,32). We have previously determined the molecular structure of the major G-quadruplex formed in the c-MYC promoter, a three-tetrad parallel structure with a 1:2:1 loop-size arrangement (35), which shows that the 1-nt loop is highly favored in parallel-stranded G-quadruplexes because of the right-handed twist of the adjacent G-strands. Although the VEGF G-quadruplex also contains the 1-nt first and third loops, the middle loop of the VEGF G-quadruplex is 4 nt long. Significantly, unlike the 2-nt middle loop of the MYC G-quadruplex that stays in the groove, the 4-nt middle loop of the VEGF G-quadruplex stretches over the 5' tetrad to form a unique capping structure with the flanking segment. This capping structure was observed in the Pu22-T12T13 sequence with two G-to-T mutations at the 12 and 13 positions; a similar capping structure was also shown to form in the wild-type sequence VEGF-Pu22 using unrestrained molecular dynamics simulation. It is noted that, although the two capping structures in the wild-type and mutant sequences are similar, there appear to be differences in their respective conformations. For example, the G13:G2 capping structure is larger than the T13:G2 capping structure formed in the mutant sequence and would thus cover more of the top G-tetrad. In addition, the groove-located wild-type G12 residue is also likely to exert a stronger ring-current effect on G7 than the mutated T12 (Figure 5), which could explain the observed upfield shift of the G7 imino proton resonance in VEGF-Pu22 as compared with Pu22-T12T13 (Figure 1 and Supplementary Figure S4).
As such, the 4-nt middle loop of the VEGF G-quadruplex appears to play a critical role in forming the specific capping structure and stabilizing the most favored folding structure. This capping structure represents a unique, VEGF sequence-specific loop interaction and distinguishes the VEGF G-quadruplex from other parallel-stranded structures, such as the MYC G-quadruplex whose capping structures are formed solely by the flanking segments due to the short 2-nt middle loop (35). The specific capping structure of the VEGF promoter G-quadruplex may be recognized by small molecule or protein ligands, and the molecular structure described in this study could provide a starting point for structure-based rational design of quadruplex-interactive small molecules targeting VEGF. In conclusion, although parallel structures are common to the promoter G-quadruplexes, our study indicates that each G-quadruplex is likely to adopt unique capping structures by its specific variable middle loop and flanking segments, which together determine the overall structure and specific interactions with small molecules or proteins.

ACCESSION NUMBERS
PDB ID 2m27

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.