Short Inverted Repeats Are Hotspots for Genetic Instability: Relevance to Cancer Genomes

Analyses of chromosomal aberrations in human genetic disorders have revealed that inverted repeat sequences (IRs) often co-localize with endogenous chromosomal instability and breakage hotspots. Approximately 80% of all IRs in the human genome are short (<100 bp), yet the mutagenic potential of such short cruciform-forming sequences has not been characterized. Here, we find that short IRs are enriched at translocation breakpoints in human cancer and stimulate the formation of DNA double-strand breaks (DSBs) and deletions in mammalian and yeast cells. We provide evidence for replication-related mechanisms of IR-induced genetic instability and a novel XPF cleavage-based mechanism independent of DNA replication. These discoveries implicate short IRs as endogenous sources of DNA breakage involved in disease etiology and suggest that these repeats represent a feature of genome plasticity that may contribute to the evolution of the human genome by providing a means for diversity within the population.

Correspondence: karen.vasquez@austin.utexas.edu

In Brief
The significance of short inverted repeats (IRs) in the human genome is unclear. Lu et al. find that short IRs are enriched at translocation breakpoints and provide evidence for "replication-related" and "DNA-structure-specific cleavage" models of IR-induced DNA breaks and mutations in mammalian cells, suggesting a role in cancer-associated genomic instability.

INTRODUCTION
Genetic analyses of cancer-related genetic instability events have detected regions of the human genome that are hypersusceptible to breakage, which can lead to the deregulation of oncogenes and/or inactivation of tumor suppressors (Popescu, 2003). Interestingly, many such regions contain sequences that can adopt alternative conformations (i.e., non-B DNA), and several of these conformations have been shown to be sources of genetic instability (Kurahashi et al., 2004; Nasar et al., 2000), yet the underlying mechanisms are not clear.
This study fills a gap in our understanding of the role of short IRs in genomic instability in mammals by providing evidence that cruciforms formed at short IRs (≤30 bp) can stimulate DSBs by stalling DNA replication forks and/or by activating enzymes (i.e., ERCC1-XPF) that cleave the structures, causing deletions. These findings provide a mechanistic explanation for the co-localization between short IRs and human cancer breakpoints and support the hypothesis that non-B DNA is involved in genetic instability, disease etiology, and evolution.

Short IRs Adopt Cruciform Structures and Induce Genetic Instability
To determine the mutagenic potential of short IRs in mammalian cells, we inserted a 29-bp cruciform-forming IR or a 29-bp control sequence into a lacZ′ mutation-reporter gene on the vector pUCNIM (Figure 1A). Cruciform formation on the plasmid (pU+) was confirmed by T7 endonuclease I cleavage (Figure S1). pU+ and the control pUCON were introduced into mammalian COS-7 cells and screened for mutations 48 hr post-transfection. pU+ stimulated mutations ~3-fold above that of pUCON (9.2 × 10⁻³ versus 2.9 × 10⁻³; p < 0.003, chi-square test; Figure 1B), demonstrating that short IRs are mutagenic in mammalian cells. Restriction digestion and sequencing analyses revealed that >90% of IR-induced mutants consisted of large deletions (>200 bp) ablating the IR sequence (Figures 1C and S2), with some mutants suffering deletions of >2,000 bp (Figure S2A).
Most deletion mutants contained 1-6 bp of microhomology at the breakpoint junctions, suggesting that they were generated from DSBs processed by an error-prone microhomology-mediated end joining (MMEJ) mechanism (Figure S2B). The remaining ~10% of IR-induced mutants contained small insertions or deletions (indels) within the IR sequence (Figures 1C and 1D), suggesting a structure-specific cleavage model where the single-stranded cruciform tips represent endonucleolytic targets for a "center-break mechanism." No point mutations were identified. In contrast, the control (pUCON) mutants contained >20% single base substitutions, with the control insert remaining intact in 74% of the deletion clones (Figure 1C).
Similar results were obtained in replication-incompetent HeLa cell extracts. Here, IRs induced mutations >4-fold above background (4.3 × 10⁻⁴ for pU+ versus 0.9 × 10⁻⁴ for pUCON; p < 0.01, chi-square test; Figure 1E), similar to the levels obtained when DNA replication was enabled by supplementing the extracts with large T antigen (Figure S3). About 60% of the mutations were large deletions (>200 bp) with microhomologies at the breakpoint junctions, suggesting that they were largely products of error-prone DSB repair.
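Microhomology at a deletion junction can be scored by comparing the bases that flank the two breakpoints in the reference sequence. The Python sketch below illustrates one common way to do this; it is not the procedure used in this study, and the function name, the `max_len` cap, and the example sequence are all hypothetical.

```python
def junction_microhomology(ref, del_start, del_end, max_len=25):
    """Count microhomology (bp) at a deletion that removes ref[del_start:del_end].

    Bases are counted when the start of the deleted segment matches the
    sequence just after the deletion (forward), or when the end of the
    deleted segment matches the sequence just before it (backward).
    Both are summed because the exact junction position is ambiguous.
    """
    ref = ref.upper()
    forward = 0
    while (forward < max_len
           and del_start + forward < del_end
           and del_end + forward < len(ref)
           and ref[del_start + forward] == ref[del_end + forward]):
        forward += 1
    backward = 0
    while (backward < max_len
           and del_start - backward - 1 >= 0
           and del_end - backward - 1 >= del_start
           and ref[del_start - backward - 1] == ref[del_end - backward - 1]):
        backward += 1
    return forward + backward


# Hypothetical reference: two copies of TGA flank a G-rich segment.
reference = "AACC" + "TGA" + "GGGGGGGG" + "TGA" + "CCTT"
# Deleting the first TGA plus the G-rich segment leaves a 3-bp microhomology.
with_homology = junction_microhomology(reference, 4, 15)
# Deleting only the G-rich segment leaves none.
without_homology = junction_microhomology(reference, 7, 15)
```

A score in the 1-6 bp range, as reported for most deletion mutants here, is the signature usually attributed to MMEJ; a score of zero is more typical of blunt end joining.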

Short IRs Stimulate DSBs in Replication-Competent and -Incompetent Systems
To provide direct evidence that the IRs stimulated DSBs in mammalian cells, ligation-mediated PCR (LM-PCR) was performed (Wang and Vasquez, 2004) using an upstream primer (~200 bp from the IR) and a primer within the linker. The results revealed a breakpoint hotspot (BH1) at the IR, and another (BH2) ~60 bp upstream of the IR (Figure 2A). Sequencing of the PCR products mapped the locations of DSBs near the base of the predicted cruciform stem (1-10 bp upstream of the IR) and ~60 bp upstream of the IR (Figure 2C). In both regions breakpoints were clustered within a small area, consistent with DNA-structure-induced DSBs. By contrast, no distinct DSB hotspots were identified on the control plasmid.
Interestingly, the locations of the IR-specific DSBs in replication-incompetent HeLa extracts were similar to those generated in replication-competent COS-7 cells; both BH1 and BH2 were clearly present, while BH2 was diminished in the replication-competent in vitro system (Figure 2B). These results implicate both replication-independent and replication-related cleavage mechanisms for IR-induced DSBs.

Short IRs Stall DNA Replication Forks In Vivo
To explore plausible mechanisms of IR-induced genetic instability, we examined the ability of short IRs to impede DNA polymerase, as long IRs can stall DNA replication forks in yeast (Voineagu et al., 2008). We performed two-dimensional (2D) gel electrophoresis to separate replication intermediates recovered from mammalian COS-7 cells. After removing unreplicated DNA by DpnI digestion, replication intermediates consisting of the NdeI and BsaI fragments (containing the inserts) of control (pCON) and IR-containing (pS+) plasmids were analyzed. We detected the expected smooth Y-shaped replication arc on the control plasmid. In contrast, the arc generated from the IR-containing plasmid contained a bulge on the right arm of the Y-arc mapping to the IR site, and a corresponding weaker signal on the left arm, indicating replication fork stalling at the IR sequence (Figure 2D). In addition, the IR-containing samples (pS+) contained substantially more double Y-arc replication intermediates than the control samples, indicative of stalled replication forks at the IR colliding with replication forks progressing from the opposite direction. This result supports replication stalling-related breakage as a likely mechanism for instability in vivo, although stimulation of recombination by stalled replication could also lead to rearrangements without DSBs (Lambert et al., 2005).

The DNA Repair Complex ERCC1-XPF Cleaves Cruciforms and Is Involved in IR-Induced Genetic Instability
To identify gene products responsible for cleaving cruciforms in vivo, we inserted the IR or control sequences downstream of the selectable URA3 gene in a yeast artificial chromosome (YAC) and screened for loss of URA3 function in wild-type BY4742 and a number of gene-deficient yeast strains (Wang et al., 2009). The IR-induced loss of URA3 function was substantially diminished in the absence of RAD1, the yeast homolog of human XPF (6.5 × 10⁻⁵ versus 2.1 × 10⁻⁵, p = 0.0018), suggesting that Rad1p/XPF was required for cruciform-induced DSBs and subsequent genetic instability (Figure 3A).
Moreover, purified human recombinant ERCC1-XPF cleaved the cruciform structures in vitro, to yield a ~35-nt fragment (Figure 3C), supporting the genetic data. Interestingly, incision by ERCC1-XPF occurred on the 3′ side of the cruciform loop, rather than 5′ of the single-strand/double-strand junction, as predicted based on its cleavage pattern on a stem-loop, a well-established substrate (Figure 3D).

Short IRs Are Enriched at Translocation Breakpoints in Human Cancer Genomes
To support the biological significance of our findings, we investigated whether short IRs promote genetic instability in cancer. We developed a computer program to map such repeats within ±100 bp of 19,956 translocation breakpoints from sequenced cancer genomes (COSMIC at http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/). The average number of 7-30 bp IRs in the ±100 bp regions surrounding the breakpoints was significantly greater than in 20,000 random control sequences extracted from the reference human genome (mean ± SD 28.5 ± 4.7 versus 21.4 ± 5.0; p = 1.14 × 10⁻³¹; two-tailed t test), and their location peaked at the cancer breakpoint sites (Figure 4A). Moreover, there were up to five times more IR sequences containing stem lengths of 10-30 bp in cancer translocations than in the control sequences (Figure 4B). Thus, we conclude that short IRs induce chromosomal translocations in human cancer, likely through the ability to extrude into cruciforms.
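The kind of IR search described here can be sketched in Python as follows. This is a minimal illustration and not the authors' program: the function name, the greedy rule used to resolve overlapping hits, and the example sequence are assumptions. The stem (7-30 bases) and loop (0-7 bases) limits are those stated under Bioinformatic Analyses.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_inverted_repeats(seq, min_stem=7, max_stem=30, max_loop=7):
    """Return (start, stem_length, loop_length) for perfect IRs in seq.

    For every possible loop position and size, the stem is extended
    outward while the left-arm base pairs with the right-arm base.
    Overlapping hits are resolved greedily, longest stem first
    (an assumed reading of "only the longest match was output").
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for inner_left in range(n):
        for loop in range(max_loop + 1):
            left = inner_left
            right = inner_left + loop + 1
            stem = 0
            while (left >= 0 and right < n and stem < max_stem
                   and PAIR.get(seq[left]) == seq[right]):
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, stem, loop))

    # Keep the longest stems first and drop anything that overlaps them.
    hits.sort(key=lambda h: (-h[1], h[0], h[2]))
    kept, spans = [], []
    for start, stem, loop in hits:
        end = start + 2 * stem + loop
        if all(end <= s or start >= e for s, e in spans):
            kept.append((start, stem, loop))
            spans.append((start, end))
    return sorted(kept)


# Hypothetical test sequence: an 8-bp stem with a 3-base loop, with flanks.
example = "ACAC" + "GGATCCGA" + "TTT" + "TCGGATCC" + "ACAC"
example_hits = find_inverted_repeats(example)
```

Applied to each 200-bp window (±100 bp around a breakpoint or a random control site), the number of returned hits per window gives the quantity compared between the cancer and control sets. The scan is quadratic in the worst case, which is acceptable for windows of this size.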

DISCUSSION
Sequences with the capacity to adopt cruciforms are abundant in the human genome, with IRs ≥8 bp occurring approximately once every 5,600 bp (Schroth and Ho, 1995). Herein, we found that short IRs (stem length 10-30 bp) are significantly enriched within the 200 bp surrounding 19,956 translocation breakpoints in human cancer genomes. Toward a mechanistic understanding of this co-localization, we provide direct evidence that short IRs can induce genetic instability in mammalian cells. Short IRs stimulated DSBs, resulting in large deletions containing microhomologies at the breakpoint junctions in >90% of the mutants (Figure S2). These findings suggest that IR-induced DSBs are processed by MMEJ, a pathway implicated in chromosomal translocations in cancer (McVey and Lee, 2008). We have identified at least two plausible mechanisms by which IRs can induce DSBs and genetic instability: replication fork stalling and collapse, resulting in DNA replication-dependent mutagenesis; and replication-independent, structure-specific cleavage by ERCC1-XPF and, perhaps, additional nucleases. It has been shown that long IRs consisting of two inverted Alu sequences (~300 bp) slowed the progression of replication forks in bacteria, yeast, and mammals, promoting DNA strand breaks (Pearson et al., 1996; Voineagu et al., 2008). Here, we found that much shorter IRs of 14 bp can stall replication forks (Figure 2D), which can lead to DNA strand breaks. This provides direct evidence for a replication-related mechanism of short IR-induced genetic instability in vivo.
We also found that short IRs stimulated mutations in the absence of DNA replication, implicating a replication-independent mechanism of IR-induced mutagenesis. Moreover, IR-induced DSBs in replication-incompetent HeLa extracts differed from those on replicating templates (Figures 2A and 2B). These findings strengthen our earlier hypothesis that, in addition to replication-related mechanisms, there are replication-independent mechanisms of DNA-structure-induced genetic instability that rely on structure-specific endonuclease cleavage. The identification of mutants containing duplications of the spacer region between the IRs or deletions of half of the IR sequence (Figure 1D) provides further support for structure-specific cleavage of non-B DNA structures. These mutations may have arisen from a "center-break mechanism," where the cruciform tips were targets for endonucleases. These deletion events, together with the results from LM-PCR, suggest that repair nucleases may contribute to IR-induced mutagenesis by structure-specific cleavage in mammalian cells.
We found that lack of XPF in human cells or Rad1 in yeast significantly reduced IR-induced instability. Although ERCC1-XPF has been implicated in Holliday junction resolution (Agostinho et al., 2013; Oh et al., 2008) and can cleave at the 5′ end of a four-way structure (Agostinho et al., 2013) or 5′ to a stem-loop junction (Figures 3C and 3D), we found that cleavage occurred at the 3′ side of the cruciform tip (Figures 3C and 3D). Whether this behavior is due to local structural features or a property common to short cruciforms remains to be determined. In any case, ERCC1-XPF cleavage on both tips of a cruciform will lead to DSBs. The differences in the predominant cleavage sites identified by cleavage assays (Figure 3C) and LM-PCR analysis (Figures 2A and 2B) may have occurred because LM-PCR identifies the most upstream breakpoints after DSB processing.
Most IRs present in eukaryotic genomes are >60% A+T rich (Schroth and Ho, 1995). We also found that the cancer-associated IRs were more A+T rich than those of the control data set (0.79 ± 0.04 versus 0.72 ± 0.05; p = 2.39 × 10⁻⁴⁵) (Figure S4). We performed similar experiments with a plasmid containing the same repeat length (29 bp), but A+T rich, and observed similar results to those obtained with the G+C-rich IR. Moreover, we found that a very short IR (5 bp in each arm) was mutagenic (data not shown). Thus, the effect of IRs on replication and mutagenesis was for the most part dependent on the cruciform structure itself.
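The A+T fraction compared above is a simple per-sequence statistic. A minimal Python sketch follows; the function name, the decision to ignore ambiguous bases such as N, and the example sequences are assumptions made for illustration.

```python
def at_fraction(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


# Hypothetical IR sequences, one A+T rich and one G+C rich.
at_rich_example = at_fraction("AATTATATNAATT")
gc_rich_example = at_fraction("GGCCGCGCAT")
```

Averaging this value over all IRs found at each position of the ±100 bp windows yields a running profile of the kind summarized in Figure S4.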
Collectively, our data reveal both replication-related and -independent mechanisms for IR-induced mutagenesis. During replication, the duplex is separated into single strands and negative supercoiling is generated, which facilitates cruciform extrusion (Lilley, 1980). Our data support a model in which cruciforms can stall replication forks and/or can be recognized and processed in a structure-dependent fashion, thereby leading to DSBs (Figure 4C). The resulting DSBs are then repaired in an error-generating fashion by MMEJ. Our findings support the conclusion that IR-mediated rearrangements contribute to human disease; thus, a better understanding of these processes may lead to novel strategies to treat or prevent such diseases caused by genetic instability.

EXPERIMENTAL PROCEDURES

Plasmids
A 29-bp IR and a control sequence unable to adopt non-B DNA were cloned into the lacZ′ mutation-reporter vector pUCNIM at the EcoRI-SalI cassette (Figure 1A). The plasmids were named pU+ and pUCON, respectively. The four-way junction-specific T7 endonuclease I was used to determine cruciform formation.

Mutagenesis Assays
Plasmid DNA was transfected into COS-7, XPF-proficient or -deficient cells using AMAXA Nucleofector Kit V according to recommended protocols. After 48 hr, plasmids were recovered using a QIAGEN Miniprep kit and treated with DpnI to remove unreplicated plasmids. Mutants were identified through a DH5α blue-white screen, as described. Mutation frequencies in mammalian cells were adjusted by subtracting the bacterial mutation frequencies. Student's t tests were used to determine statistical significance if not otherwise described.
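The arithmetic behind a background-corrected mutation frequency and a fold induction can be sketched as below. This is an illustration only: the function names, colony counts, and bacterial background value are hypothetical, and only the two frequencies passed to `fold_change` (9.2 × 10⁻³ and 2.9 × 10⁻³) are taken from the Results.

```python
def mutation_frequency(mutant_colonies, total_colonies):
    """Mutant (white) colonies divided by all colonies screened."""
    if total_colonies <= 0:
        raise ValueError("total_colonies must be positive")
    return mutant_colonies / total_colonies


def background_corrected(cell_frequency, bacterial_frequency):
    """Subtract the frequency that arises in bacteria alone."""
    return cell_frequency - bacterial_frequency


def fold_change(test_frequency, control_frequency):
    """Ratio of the IR plasmid frequency to the control plasmid frequency."""
    if control_frequency <= 0:
        raise ValueError("control_frequency must be positive")
    return test_frequency / control_frequency


# Hypothetical counts: 46 white colonies among 5,000 screened.
raw = mutation_frequency(46, 5000)
# Hypothetical bacterial background of 1e-4.
corrected = background_corrected(raw, 1e-4)
# Frequencies reported for pU+ and pUCON in COS-7 cells.
induction = fold_change(9.2e-3, 2.9e-3)
```

With the reported COS-7 frequencies the ratio is about 3.2, consistent with the "~3-fold" induction stated in the Results.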

LM-PCR
LM-PCR was carried out as described (Wang and Vasquez, 2004). The region between the upstream primer and IR was ~200 bp. Amplified PCR products were separated on 1.8% agarose gels, bands purified using Promega Wizard SV Gel and PCR Clean-Up System, and cloned into the Promega pGEM-T vector for sequencing.

In Vitro Mutagenesis Assays
HeLa cell-free extracts from a CHIMERx DNA replication assay kit were used as a replication-incompetent system. Purified large T antigen was added to allow for replication in extracts. Fifty nanograms of plasmid DNA was incubated for 6 hr at 37°C and then cleaved with DpnI (from replicating extracts). IR-induced mutants were determined by blue-white screening and analyzed by restriction digestion and DNA sequencing.

Figure 4. Short IRs Are Associated with Translocations in Cancer Genomes
(A) Running average of IR motifs in the cancer (red) and control (black) data sets whose center loop positions are located at each base along the ±100 bp flanking cancer translocation breakpoints or the control sites. (B) Distribution of stem lengths for the IRs found in the cancer and control data sets. (C) Model of replication-related genetic instability where a cruciform impedes DNA replication (right) and replication-independent mechanisms of structure-specific cleavage (e.g., XPF) (left).

Two-Dimensional Gel Electrophoresis
The control and IR sequences, identical to those in the mutation reporter plasmids, were inserted into a different backbone and named pCON and pS+, according to their inserts. Plasmids were transfected into COS-7 cells, and after 24 hr replication intermediates were isolated and separated by 2D gel electrophoresis, as described (Krasilnikova and Mirkin, 2004). DNA was transferred to nitrocellulose membranes, and replication intermediates were identified with radiolabeled probes for the NdeI-BsaI inserts.

IR-induced Loss of URA3 in Yeast
Loss of URA3 function in yeast was assessed as described (Wang et al., 2009). Briefly, YACs containing the IR or control sequences were introduced into wild-type yeast strain BY4742, or mutant yeast strains. Yeast cells were plated on 5-fluoroorotic acid (FOA) selective media to screen for loss of URA3.

ERCC1-XPF Cleavage Assays
Sixty nanograms of purified human recombinant ERCC1-XPF or BSA was incubated with 4 × 10⁻⁸ M DNA substrates (Figure 3D; Supplemental Information) at 30°C for 1 hr in a buffer containing 50 mM Tris (pH 8.0), 750 mM MnCl₂, 500 mM DTT, and 20 mM NaCl. The cleavage products were resolved by electrophoresis on 12% denaturing polyacrylamide gels and visualized by a Typhoon PhosphorImager.

Bioinformatic Analyses
The cancer genome translocation breakpoint data set was obtained from COSMIC at http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/. Perfect IR motifs, stem length 7-30 and loop size 0-7 bases, were retrieved using custom scripts; for overlapping IRs, only the longest match was output. The bedtools random utility was used to generate the set of 20,000 nongap-matching 200-bp sequences.

Supplemental Information includes Supplemental Experimental

Figure S1. In order to confirm cruciform structure formation at the short IR sequence, plasmids pU+ and pUCON were first treated with T7 endonuclease I, which cleaves at the four-way junction between B-DNA and the cruciform stems, and then digested with ScaI to release ScaI-ScaI fragments containing the IR insert or the control sequence. The 1.2 kb ScaI-T7 and 1 kb T7-ScaI fragments indicate cruciform extrusion at the IR sequence in a subpopulation of plasmids. Thus, our data indicate that the short IR sequence can form a cruciform structure in vitro.

Figure S3. Plasmids pU+ and pUCON were incubated for 6 hr in HeLa cell extracts supplemented with SV40 large T antigen to support replication. The plasmids were then purified and subjected to blue-white screening to detect mutations that inactivated the lacZ′ reporter gene. The results indicate that pU+ induced mutations ~3-fold above that of the control plasmid, pUCON (4.9 × 10⁻⁴ versus 1.5 × 10⁻⁴; p < 0.02).

Figure S4. Running average of A+T fractions for the IRs along ±100 bp flanking cancer translocation breakpoints (red) or the randomly picked control sites (black). The results show that cancer-associated IRs were more A+T rich than those of the control set (mean ± SD, 0.79 ± 0.04 versus 0.72 ± 0.05; p = 2.39 × 10⁻⁴⁵).