Introduction

Duchenne muscular dystrophy (DMD, OMIM #310200) and Becker muscular dystrophy (BMD, OMIM #300376) are X-linked recessive neuromuscular disorders characterized by progressive muscle wasting and weakness due to degeneration of skeletal, smooth and cardiac muscles. The incidence of DMD and BMD is ~1 in 3500 and 1 in 20 000–30 000 live male births, respectively.1 DMD occurs earlier and is more severe than BMD. Patients affected by BMD have a broad spectrum of clinical symptoms ranging from mild to severe. No effective treatments exist for the severe forms of these disorders, both caused by mutations in the DMD gene (OMIM *300377).

The DMD gene, located at the Xp21 locus, spans more than 2 Mb and contains 79 relatively small exons (range 32–269 bp), whereas intron (intervening sequence, IVS) size ranges from 107 to 248 401 bp. It has at least seven different tissue-specific promoters and two polyadenylation sites; moreover, differential processing gives rise to a variety of DMD transcripts encoding a set of protein isoforms.1 The protein translated from the largest transcript is dystrophin, an important cytoskeletal protein that, in skeletal muscle, links the cytoskeleton of the muscle fibre to the underlying basal lamina. Alteration or loss of dystrophin allows excess calcium to cross the muscle cell membrane, which causes mitochondria to undergo a ‘permeability transition’, that is, the regulated formation of the permeability transition pore complex (or mitochondrial megachannel). This pore spans the outer and inner mitochondrial membranes and leads to an irreversible loss of matrix and intermembrane contents and to swelling of the mitochondria.2 The affected skeletal muscle therefore undergoes mitochondrial dysfunction, amplification of stress-induced reactive oxygen species production, sarcolemmal damage and cell death. These cellular alterations increase serum levels of creatine kinase and sometimes cause hypertransaminasemia, even in asymptomatic patients and as early as childhood.3, 4 Disease progression is characterized by widespread necrosis of muscle fibres, which are ultimately replaced by adipose and connective tissue.

Exon deletions in the human DMD gene are the most common pathogenic variants (50–70%) in DMD/BMD; 30% of patients have point mutations and about 5% have exon duplications.5 Gene variants that shift the DMD open-reading frame in the spliced mRNA or that introduce stop codons result in truncated, non-functional dystrophin and usually give rise to the DMD phenotype. Conversely, genetic variants that do not affect the open-reading frame, and are thus predicted to yield a semi-functional protein, have been associated with the less severe BMD phenotype.6 This reading-frame hypothesis, which arises from analysis of genomic DNA, applies to about 90% of DMD/BMD cases and is the basic principle underlying the most recent molecular strategies that aim to definitively restore the dystrophin reading frame in muscle cells of severely affected DMD patients.7
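
The reading-frame rule described above can be illustrated with a minimal sketch: a whole-exon deletion is predicted to be in-frame when the total length of the deleted exons is a multiple of 3. The exon lengths below are not taken from this article; they are assumed values commonly quoted for the full-length muscle transcript and should be verified against the reference sequence before use. The function name is hypothetical.

```python
# ASSUMPTION: exon lengths (bp) are commonly quoted values for the full-length
# muscle transcript; they are not given in this article and must be verified.
ASSUMED_EXON_LENGTHS = {45: 176, 46: 148, 47: 150, 48: 186, 49: 102, 50: 109, 51: 233}


def deletion_is_in_frame(first_exon, last_exon, exon_lengths=ASSUMED_EXON_LENGTHS):
    """Return True if deleting exons first_exon..last_exon removes a multiple of 3 bp."""
    deleted_bp = sum(exon_lengths[exon] for exon in range(first_exon, last_exon + 1))
    return deleted_bp % 3 == 0


if __name__ == "__main__":
    for first, last in [(45, 50), (45, 51), (48, 50), (49, 51), (51, 51)]:
        status = "in-frame" if deletion_is_in_frame(first, last) else "out-of-frame"
        print(f"deletion of exons {first}-{last}: predicted {status}")
```

This only encodes the prediction from genomic DNA; as noted above, the hypothesis holds for about 90% of cases, so the output is not a phenotype call.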

Notably, the evidence that intragenic deletions preferentially cluster in two areas of the DMD gene, that is, the 5′ (exons 2–20) and the central (exons 45–53) regions, suggests that some features of the DNA or of the chromatin structure within intronic sequences may predispose the DMD gene to specific breakage or recombination events. However, precise mapping of intronic deletion breakpoints revealed no clustering of breakpoints and no extensive homology between the sequences adjacent to each breakpoint.8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 Therefore, the molecular mechanisms generating these copy number variations remain poorly understood, although models based on locus-specific genomic architecture have been proposed.20 Moreover, because a relatively high frequency of rearrangements affects the central exons of DMD, precise mapping of the breakpoint junctions located within this deletion-rich region could guide more precise targeting in a potential genome editing-based therapy.

Here we report the mapping and sequence analysis of 17 breakpoints located in IVSs 50 and 51 of the DMD gene, together with the DNA sequence signatures at the junctions of each deletion. IVSs 50 and 51 are located within the central hotspot region and border the sequences encoding the hinge III segment (exons 50 and 51) of the dystrophin protein, whose absence or presence has been correlated with the severity of the BMD phenotype.21

Materials and methods

We analyzed 17 unrelated DMD/BMD patients (DMD, n=10; BMD, n=7) carrying deletions starting or ending within IVSs 50 and 51, whose DNA samples were still available after molecular diagnosis. Exon deletions were detected by quantitative fluorescence multiplex PCR and multiplex ligation-dependent probe amplification as described.5, 22, 23 Informed consent was obtained for each patient according to the procedure established by the local Bioethics Institutional Committee.

We amplified eight sequence-tagged sites (STSs; Table 1, 200–300 bp), distributed at contiguous intervals of roughly similar size within IVS 50 and IVS 51 of the DMD gene (GenBank NG_012232.1), in the 17 DMD/BMD male patients with deletions starting or ending in these two IVSs. Moreover, we amplified DMD-specific sequences in IVS 44 (248 401 bp), IVS 47 (54 222 bp) and IVS 48 (38 368 bp) as described elsewhere.12, 24 The absence or presence of these sequences in each patient was verified by 1% agarose gel electrophoresis, to define the intronic interval in which each breakpoint lies. Genomic regions spanning the deletion breakpoints were amplified by long PCR (Expand High Fidelity PCR System, Roche Molecular Systems Inc, Pleasanton, CA, USA), cloned into the pGEM-T plasmid (Promega, Fitchburg, WI, USA) and sequenced by primer walking on the ABI-Prism 377 Genetic Analyzer (Applied Biosystems Inc, Foster City, CA, USA). Structural features of intronic sequences were characterized by searching for inverted and tandem repeats using the EMBOSS software analysis package (programs Palindrome, http://www.bioinformatics.nl/cgi-bin/emboss/palindrome, and Etandem, http://emboss.bioinformatics.nl/cgi-bin/emboss/etandem, respectively), and for repetitive elements using the RepeatMasker program (www.repeatmasker.org).
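
The logic of narrowing a breakpoint by STS presence or absence can be sketched as follows. This is only an illustration of the reasoning, not the procedure used in the laboratory: the function name is hypothetical, and the STS positions in the example are invented placeholders rather than the coordinates of Table 1.

```python
def bracket_breakpoints(sts_results):
    """Locate the interval(s) in which a breakpoint must lie.

    sts_results: list of (position_bp, amplified) tuples, ordered 5'->3' along
    the intron; amplified is True when the STS PCR product is present.
    Returns a list of (left_position, right_position) pairs, one for each
    adjacent STS pair whose presence/absence status differs.
    """
    intervals = []
    for (pos_a, amp_a), (pos_b, amp_b) in zip(sts_results, sts_results[1:]):
        if amp_a != amp_b:
            intervals.append((pos_a, pos_b))
    return intervals


if __name__ == "__main__":
    # HYPOTHETICAL positions (bp from the intron start), for illustration only.
    example = [(5000, True), (15000, True), (25000, False), (35000, False)]
    print(bracket_breakpoints(example))
```

In this toy case the breakpoint would lie between the second and third STS, which is the interval that long PCR would then need to span.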

Table 1 Primer sequences to amplify STS in IVS 50 and IVS 51, and amplicon regions (gene reference NG_012232.1)

Results

We mapped deletion breakpoints in 17 DMD/BMD patients who carry deletions starting or ending within IVS 50 (45 782 bp long) or IVS 51 (44 211 bp long) of the DMD gene (Figures 1 and 2). First, we used sequence-tagged site (STS) walking to roughly map the residual intronic sequences in IVS 44, 47, 48, 50 and 51. Then, we amplified and sequenced the DNA fragments spanning all the deletion junctions (Figure 3, junctions J1–J17). This allowed us to determine precisely the size of each deletion (ranging from 2454 to 419 729 bp) and the residual sequences of each IVS (Table 2). Sequencing allowed us to localize, at the nucleotide level, 2 deletion startpoints in IVS 50, and 8 and 9 endpoints in IVSs 50 and 51, respectively. In these IVSs, breakpoints appear to be evenly distributed, without any significant clustering, even when compared with data reported by other authors.16, 17, 18, 19, 25 Interestingly, patients 76/343 and 171/676 (J7 and J8, respectively, Table 2), both carrying the exon 48–50 deletion, have probable startpoints (end of sequence homology) in IVS 47 and endpoints in IVS 50 that lie 12 and 14 nt apart, respectively (Figure 3, Table 2). Moreover, patients 128/544 and 95/416 (J16 and J17, respectively, Table 2), both carrying the deletion of exons 49–51, share the same startpoint in IVS 48 and have probable endpoints in IVS 51 that lie only 3 nt apart (Figure 3 and Table 2). Microhomologies of 1–5 bp are present in 58.8% (10/17) of the breakpoint junctions (Figure 3 and Table 2), and small insertions (1–22 bp) were found in 5 cases (J3, J7, J12, J13 and J15, Figure 3 and Table 2).
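
As an illustration of how junction microhomology can be scored, the sketch below counts the bases that are identical at the two breakpoints of a deletion, so that the exact breakpoint position is ambiguous by that many nucleotides. The function name, the 0-based half-open coordinate convention and the toy sequence are assumptions made for this example; this is not the analysis pipeline used for Figure 3.

```python
def junction_microhomology(ref, start, end):
    """Count microhomology at a deletion junction.

    ref: reference sequence; the deleted segment is ref[start:end]
    (0-based, half-open). Returns (left, right): the number of bases by which
    the deletion could be shifted 5' or 3' without changing the junction
    sequence. Their sum is the microhomology length.
    """
    ref = ref.upper()
    left = 0
    # Shifting 5': base before the deletion must equal last deleted base.
    while (start - 1 - left >= 0 and end - 1 - left >= start
           and ref[start - 1 - left] == ref[end - 1 - left]):
        left += 1
    right = 0
    # Shifting 3': first deleted base must equal first base after the deletion.
    while (end + right < len(ref) and start + right < end
           and ref[start + right] == ref[end + right]):
        right += 1
    return left, right


if __name__ == "__main__":
    # TOY sequence: 5' flank + deleted segment + 3' flank (illustrative only).
    toy = "TTGCT" + "CAGGTACCGA" + "CAGTT"
    print(junction_microhomology(toy, 5, 15))
```

In the toy sequence the deleted segment and the 3′ flank both begin with CAG, so the junction carries a 3 bp microhomology.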
Nucleotides 4 to 16 of the 20 bp sequence inserted within J12 (patient 859, deletion of exons 45–51) are identical to a sequence of the ZFX gene (OMIM *314980) at chromosome Xp22.11 (Chromosome X: 24,149,173–24,216,255), about 7 Mb away from the DMD gene; the longest (22 bp) inserted sequence is present in J13 (patient 123/530, deletion of exons 45–51) and is identical to a sequence located in IVS 44 of the DMD gene, about 1300 nt downstream of the deletion startpoint. Moreover, deletions of 1 (J1, J3, J4 and J15), 3 (J5), 11 (J8) and 56 (J15) nt were observed within 30 nt of the probable breakpoints (Figure 3 and Table 2). Genomic rearrangements can give rise to microdeletions, such as the 11 and 56 nt deletions we detected at the J8 and J15 breakpoints, which abolish a palindromic sequence (Figure 3) and lie at the 3′ end of a LINE/L1 repeat sequence (Table 2), respectively. Sequence features that might facilitate DNA breakage through exposure of single-stranded DNA, such as inverted repeats with a stem of 6 or more base pairs, are present near 12/17 breakpoints (J1, J2, J4, J7, J8, J9, J10, J11, J14, J15, J16 and J17), and short (6–8 nt) tandem repeats are present near the breakpoints in junctions J5, J6, J7, J8, J9 and J13 (Figure 3). The short deletion consensus sequence TGRRKM (ref. 26) is present in proximity to 10 deletion breakpoints (J2, J5, J7, J8, J10, J11, J12, J14, J15 and J17, Figure 3), and the immunoglobulin heavy chain class switch repeat TGAGC,27 which has been found to be over-represented in close proximity to many deletion breakpoints,28 is present at 3 deletion breakpoints (J7, J8 and J12, Figure 3). The TTTAAA sequence, which is known to be able to curve the DNA molecule,29 was found at or in close proximity to junctions J4, J10 and J14 (Figure 3). In particular, three TTTAAA sequences are present in junction J4: two flank the breakpoint site in IVS 47, and one is in IVS 50 (Figure 3).
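
A search for the short motifs mentioned above (TGRRKM, TGAGC and TTTAAA) can be sketched with IUPAC ambiguity codes expanded into a regular expression. This is a hedged illustration only: the function names are hypothetical, the search of both strands is a design choice of this sketch, and the example sequences are invented rather than taken from Figure 3.

```python
import re

# IUPAC codes expanded to regex character classes (subset sufficient here).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "K": "[GT]", "M": "[AC]", "S": "[CG]", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R",
              "K": "M", "M": "K", "S": "S", "W": "W", "N": "N"}


def reverse_complement(motif):
    """Reverse complement of a motif written with IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def find_motif(seq, motif):
    """Return sorted (start, strand) hits of an IUPAC motif on both strands.

    start is the 0-based position on the given (forward) sequence. A motif
    equal to its own reverse complement is reported once, on the '+' strand.
    """
    seq = seq.upper()
    searches = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:
        searches.append(("-", rc))
    hits = []
    for strand, pattern_motif in searches:
        regex = "".join(IUPAC[base] for base in pattern_motif)
        # Lookahead so that overlapping occurrences are all reported.
        for match in re.finditer("(?=(" + regex + "))", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Invented example sequences, for illustration only.
    print(find_motif("CCTGAGTACC", "TGRRKM"))
    print(find_motif("GGTTTAAACC", "TTTAAA"))
```

What counts as "in proximity" to a breakpoint is not defined by this sketch; a window around each junction would have to be chosen separately.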
Analysis of the IVS 50 and IVS 51 sequences of the DMD gene revealed a number of interspersed repeats, such as LINEs, LTRs and SINEs, as reported in Table 2; ~82% (14/17) of the sequenced breakpoints are flanked by at least one kind of repeat element (Table 2). However, no extensive homology was found in these cases.

Figure 1

Localization of deletion breakpoints in IVS 50. The sample identification numbers and the DMD exon deletions are indicated; symbols mark deletion startpoints and deletion endpoints.

Figure 2

Localization of deletion breakpoints in IVS 51. The sample identification numbers and the DMD exon deletions are indicated; symbols mark deletion startpoints and deletion endpoints.

Figure 3

DNA sequences spanning the 17 deletion junctions (J1–J17), with the corresponding normal intervening sequence (IVS) regions. Only the 5′–3′ strands are shown. Numbering is based on the dystrophin genomic reference sequence NG_012232.1. Capital letters indicate exonic sequence (J10). Microhomology regions across the deletion junction are in boxes. Filled arrowheads show the probable deletion breakpoints (end of sequence homology). Newly inserted nucleotides are in bold. The sequence TTTAAA is underlined. Arrows and dotted arrows indicate palindromic sequences with a stem of 6 or more base pairs and short tandem repeats, respectively; +++++ indicates the short deletion consensus, TGRRKM;26 **** indicates the immunoglobulin class switch repeat TGAGC.27 56 nt: 56 nt deletion in J15 (IVS 47, nt 51010–51065).

Table 2 Breakpoints and DNA sequence signatures at the junctions of 17 deletions in the DMD gene

Discussion

In this work we report the molecular and structural characterization of 17 deletion breakpoints that start or end in IVS 50 and IVS 51 of the DMD gene. Several mechanisms have been proposed to explain genomic rearrangements, such as non-allelic homologous recombination, non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), which involves short (2–20 bp) stretches of sequence, and L1 retrotransposition.20 In particular, NHEJ is one of the main mechanisms for repairing double-stranded DNA breaks by ligation of two DNA ends without the need for a homologous template; the two DNA ends associate in a manner that tolerates nucleotide loss or addition at the junction site. NHEJ typically uses short homologous DNA sequences to guide repair. Some features of the breakpoint sequences we analyzed could contribute to generating DMD gene rearrangements. Indeed, microhomologies of 1–5 bp, typical of NHEJ and MMEJ events, are present in 58.8% of our deletion junctions, comparable to previous findings for the DMD gene.17, 30, 31 Furthermore, the local DNA sequence environment may be involved in promoting gene deletions. Human sequences involved in deletion events often contain the short deletion consensus sequence, TGRRKM, generally associated with palindromic sequences at junctions.26 We found that several deletion consensus sequences, palindromic sequences and short tandem repeats flank the junctions we analyzed. These sequences may promote DNA instability by facilitating the formation of secondary structure intermediates (hairpin loops in single-stranded DNA), which predispose DNA to breakage and intragenic recombination.
Moreover, five TTTAAA sequences, known to be able to induce DNA bending, were found at or in close proximity to junctions J4, J10 and J14 (Figure 3).29 Given the expected frequency of this sequence (1/1420 bp) in the human genome,32 the probability that 5 such sequences occurred by chance near the breakpoints is extremely low, suggesting that curved DNA may be involved in one of the mechanisms of DMD gene deletion. Furthermore, RepeatMasker analysis revealed that 82.3% of breakpoint junctions aligned with one or more repeat elements, consistent with previous studies on DMD gene rearrangements, but much higher than the average frequency (35.6%) of repetitive sequences in the DMD gene.6, 12 Therefore, these highly homologous sequences (such as LINEs or SINEs) might be involved in DMD gene rearrangements. Interestingly, no such repeat element was identified around the nearly identical junctions J7 and J8 (Figure 3), which share various DNA motifs (inverted and palindromic repeats). This observation supports the concept that genetic instability may depend on short DNA motifs that mark the genomic architecture of such regions.
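
The two frequency arguments above can be made concrete with a rough back-of-the-envelope sketch. It rests on assumptions that are not in the text: junctions are treated as independent, the 35.6% repeat content is used as the per-junction chance of being flanked by a repeat, and the total window searched for TTTAAA (here 600 bp) is a purely hypothetical value, since the window size is not stated. The function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))


# Repeat elements: 14 of 17 junctions flanked, against a 35.6% background.
p_repeats = binomial_upper_tail(14, 17, 0.356)

# TTTAAA: 5 occurrences, background rate 1 per 1420 bp.
# ASSUMPTION: 600 bp of total sequence inspected around J4, J10 and J14.
ASSUMED_WINDOW_BP = 600
p_tttaaa = poisson_upper_tail(5, ASSUMED_WINDOW_BP / 1420)

if __name__ == "__main__":
    print(f"P(>=14 of 17 junctions flanked by repeats) = {p_repeats:.2e}")
    print(f"P(>=5 TTTAAA in {ASSUMED_WINDOW_BP} bp)     = {p_tttaaa:.2e}")
```

Both tail probabilities depend strongly on these modelling choices, in particular on the assumed window, so they should be read as orders of magnitude rather than as formal significance tests.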

Although the limited number of samples we analyzed may reduce the significance of our results, the sequence signature analysis we performed at the level of the breakpoint junctions of 17 DMD gene deletions helps to explain why rearrangements preferentially occur within particular DMD gene regions. Moreover, it increases the number of sequenced deletion breakpoints in IVSs 50 and 51, which are located within the hotspot deletion region and border the sequences encoding the hinge III segment (exons 50 and 51) of dystrophin. The absence or presence of hinge III has been correlated with the severity of the BMD phenotype; in fact, the loss of this region correlates with a slower disease progression.21 Therefore, removing DNA regions that encode this protein domain, or restoring the reading frame of out-of-frame deletions by mutation-specific gene therapy, including exon skipping or suppression of nonsense mutations during translation, may be considered a strategy to treat severe DMD forms.7, 33 Efficient genomic correction using the CRISPR–Cas9 system is a promising method to permanently correct DMD mutations at the genomic level in induced pluripotent stem cells of DMD patients.33 Indeed, genome editing by CRISPR–Cas9 is able to remove internal, but non-essential, regions of the mutated gene to restore the reading frame.6 Editing is driven by the Cas9 protein, which forms a complex with the 3′ end of a single guide RNA (sgRNA) molecule. The protein–RNA pair recognizes its genomic target by complementary base pairing between the 5′ end of the sgRNA sequence and a predefined 20 bp DNA sequence, known as the protospacer, which must be identified within the target genomic region. Alternatively, co-introducing Cas9/sgRNA and a donor DNA template into the cell can induce a homologous recombination event that allows the insertion of a specific DNA element at a target locus, based on the homology of the 5′ and 3′ arms of the donor template.33
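
Identifying candidate protospacers in a target region can be sketched as a simple scan for 20-nt sequences. One requirement not mentioned in the text is added here as an assumption of the example: the commonly used Streptococcus pyogenes Cas9 needs an NGG protospacer-adjacent motif (PAM) immediately 3′ of the protospacer. The function name and the toy sequence are hypothetical, and only the forward strand is scanned to keep the sketch short.

```python
import re


def find_protospacers(seq, pam_regex="[ACGT]GG", length=20):
    """List candidate protospacers on the forward strand of seq.

    Returns (start, protospacer, pam) tuples, where start is the 0-based
    position of the protospacer. ASSUMPTION: the default PAM pattern
    corresponds to NGG, as required by S. pyogenes Cas9.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence for illustration only; not a DMD sequence.
    toy = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
    for start, protospacer, pam in find_protospacers(toy):
        print(start, protospacer, pam)
```

A practical design would also scan the reverse strand and score off-target matches elsewhere in the genome, neither of which is attempted here.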

Notably, the reading frame could be efficiently restored by removing the sequences encoding exon 51 in our DMD patients with the out-of-frame deletions of exons 45–50 and 48–50 (J1–J8, Figure 3 and Table 2). Moreover, full-length dystrophin could be rescued by knocking in exon 51 in the two patients with the isolated deletion of exon 51 (J9 and J10, Table 2). In such cases, potential targets of genome editing lie within the residual sequences of IVS 50, exon 51 and IVS 51 of DMD and should spare repeated elements. Therefore, exact breakpoint mapping and information about the genomic architecture surrounding breakpoint junctions could guide more precise and specific targeting of the residual intronic sequences within the central DMD deletion-rich region in a potential genome editing-based therapy.
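
The requirement that editing targets spare repeated elements can be expressed as an interval-overlap filter, sketched below. The function names are hypothetical and all coordinates in the example are invented placeholders; in practice the repeat intervals would come from an annotation such as RepeatMasker output for the residual intronic sequence of each patient.

```python
def spares_repeats(target_start, target_end, repeat_intervals):
    """True if [target_start, target_end) overlaps none of the repeat intervals.

    All intervals are 0-based and half-open.
    """
    return all(target_end <= rep_start or target_start >= rep_end
               for rep_start, rep_end in repeat_intervals)


def filter_targets(candidates, repeat_intervals):
    """Keep only candidate (start, end) targets lying fully outside repeats."""
    return [(start, end) for start, end in candidates
            if spares_repeats(start, end, repeat_intervals)]


if __name__ == "__main__":
    # HYPOTHETICAL coordinates, for illustration only.
    repeats = [(100, 400), (900, 1200)]
    candidates = [(50, 73), (390, 413), (500, 523), (1199, 1222)]
    print(filter_targets(candidates, repeats))
```

Candidates that even partly overlap a repeat are discarded in this sketch, which is a conservative choice; a real design might tolerate partial overlap depending on off-target predictions.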