Uncovering the true features of dystrophin gene rearrangement and improving the molecular diagnosis of Duchenne and Becker muscular dystrophies

Summary
Duchenne and Becker muscular dystrophies (DMD/BMD) are caused by complex mutations in the dystrophin gene (DMD). Currently, there is no integrative method for the precise detection of all potential DMD variants, a gap we aimed to address using long-read sequencing. The captured long-read sequencing panel developed in this study was applied to 129 subjects, including 11 with previously unsolved cases. This method accurately detected DMD mutations ranging from single-nucleotide variations to structural variations. Furthermore, our findings revealed that continuous exon duplications/deletions in the DMD/BMD cohort may be attributed to complex segmental rearrangements and that noncontiguous duplications/deletions are generally attributable to intragenic inversions or interchromosomal translocations. Deep intronic mutations were confirmed to produce a pseudoexon. Moreover, variations in female carriers were precisely identified. The integrated and precise DMD gene screening method proposed in this study could improve the molecular diagnosis of DMD/BMD.


INTRODUCTION
Molecular genetic studies have shown that both Duchenne muscular dystrophy (DMD; OMIM: 310200) and Becker muscular dystrophy (BMD; OMIM: 300376) are caused by mutations in the dystrophin gene (DMD). 1 In male probands, hemizygous pathogenic variants in DMD are frequently identified. In contrast, females are usually asymptomatic carriers; however, some female carriers show mild skeletal muscle and/or cardiac symptoms of DMD/BMD and are referred to as manifesting carriers. In patients with DMD/BMD, 68-70% of mutations are deletions of one or more exons, and 8-11% are duplications within the DMD gene. 3,4 Diverse methods have been used to identify the causative mutations in patients with DMD/BMD; of these, multiplex ligation-dependent probe amplification (MLPA) is commonly used as the first-line method because it simultaneously screens all 79 exons of DMD for deletions and duplications. 5 However, single-exon deletions still require additional validation because of probe-binding problems. Moreover, MLPA is not suitable for detecting single-nucleotide variants (SNVs) or small indels. 6 High-throughput (short-read) sequencing technologies are widely used to call SNVs, microdeletions, and microduplications. 8,9 Nanochannel-based next-generation mapping enables sensitive detection of pathogenic structural variations (SVs) that can be missed by PCR-based techniques or chromosomal microarrays; however, its breakpoint resolution is limited by the density of endonuclease nicking sites.
10-15 Thus, RNA analysis of muscle tissue using RT-PCR or short-read sequencing could serve as a genetic diagnostic tool for these rare mutations; however, this approach requires an invasive muscle biopsy. Moreover, clinicians usually also need the underlying DNA mutation to be characterized for gene therapy and prenatal diagnosis. Therefore, we believe that a method that accurately detects the full spectrum of DMD mutation types from genomic DNA could benefit the molecular diagnosis and treatment of patients with DMD/BMD.
Long-read sequencing, represented by two mainstream platforms, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB) single-molecule real-time sequencing, continues to improve rapidly in throughput and decrease in cost. 16,17 Compared with conventional methods, long-read sequencing technologies offer more possibilities, especially for detecting SVs and complex rearrangements, including those involving duplications, 18,19 while SNVs and small indels can also be accurately identified with adequate read depth (>30×) and HiFi sequencing. 20 Additionally, targeted long-read sequencing is considered an efficient and cost-effective method for complex variant detection. 21 Therefore, long-read sequencing can serve as a unified methodology for detecting the full heterogeneous set of pathogenic mutations in the DMD gene. Accordingly, we designed a panel covering the complete DMD gene based on long-read sequencing technology. Furthermore, to uncover the true features of DMD mutations, we sequenced and analyzed 129 subjects from a DMD/BMD cohort using the ONT and PB platforms.

Patient information
Among the 129 subjects included in this study, 76 were males diagnosed with DMD, 16 were males with BMD, seven were males with intermediate muscular dystrophy (IMD), and two were symptomatic female carriers. Additionally, 24 asymptomatic female carriers and four normal subjects (one male and three females) were included. Of all subjects, 76 had mutations identified by MLPA, 33 had mutations detectable by conventional short-read sequencing methods such as whole-exome sequencing (WES) or a DMD gene panel, and 11 were specifically sampled for this study because genetic diagnosis using MLPA or NGS-based methods had failed. Additionally, nine relatives who had not previously been diagnosed using MLPA or short-read sequencing were enrolled. In this cohort, 70 subjects were sequenced using the ONT platform, 51 using the PB platform, and eight using both the ONT and PB platforms for parallel validation (Figure 1; Tables S1 and S2).
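The cohort is partitioned three ways (by diagnosis, by prior testing, and by sequencing platform), and each partition should total 129 subjects. The short bookkeeping sketch below checks this using only the counts stated above; the dictionary labels are our own shorthand, and it assumes, as the totals imply, that the three platform groups are mutually exclusive.

```python
# Subgroup counts as reported in the Patient information paragraph.
# The dictionary labels are our own shorthand, not terms from the study.
by_diagnosis = {
    "DMD males": 76,
    "BMD males": 16,
    "IMD males": 7,
    "symptomatic female carriers": 2,
    "asymptomatic female carriers": 24,
    "normal subjects (1 male, 3 females)": 4,
}
by_prior_testing = {
    "identified by MLPA": 76,
    "detectable by short-read sequencing (WES or panel)": 33,
    "unsolved by MLPA or NGS": 11,
    "previously undiagnosed relatives": 9,
}
# Assumption: the three platform groups are mutually exclusive.
by_platform = {"ONT": 70, "PB": 51, "ONT + PB": 8}


def n_subjects(groups: dict[str, int]) -> int:
    """Total number of subjects in one partition of the cohort."""
    return sum(groups.values())


def n_datasets(platforms: dict[str, int]) -> int:
    """Subjects sequenced on both platforms contribute one dataset per platform."""
    return platforms["ONT"] + platforms["PB"] + 2 * platforms["ONT + PB"]


# Female participants: symptomatic + asymptomatic carriers + 3 normal females.
n_female = (
    by_diagnosis["symptomatic female carriers"]
    + by_diagnosis["asymptomatic female carriers"]
    + 3
)

for name, groups in [
    ("diagnosis", by_diagnosis),
    ("prior testing", by_prior_testing),
    ("platform", by_platform),
]:
    print(f"{name}: {n_subjects(groups)} subjects")
print(f"long-read datasets: {n_datasets(by_platform)}")
print(f"female participants: {n_female}")
```

Each partition sums to 129; the 70 + 51 + 2 × 8 = 137 datasets match the sequencing data overview, and the 29 female participants match the number analyzed for SV haplotype detection.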

Sequencing data overview
A total of 137 independent long-read datasets were generated for the 129 subjects. On average, 118,433 and 1,261,799 reads were collected from each PB and ONT dataset, respectively. The PB platform had an average read length of 2,502 bp and covered 96.92% of the target region, whereas the ONT platform had an average read length of 1,945 bp and covered 99.99% of the target region. The average sequencing depths for PB and ONT were 168× and 382×, respectively.
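The two summary statistics above, mean depth and the percentage of the target region covered (breadth), can both be derived from per-base read depths over the capture target. The following is a minimal sketch, assuming a list of per-position depths such as the third column of `samtools depth -a -b targets.bed sample.bam` (the file names are hypothetical). The `min_depth` threshold that counts a position as covered is also an assumption, because the cutoff behind the reported coverage values is not stated here.

```python
from statistics import mean


def coverage_summary(depths: list[int], min_depth: int = 1) -> tuple[float, float]:
    """Summarize per-base read depths over a capture target.

    Returns (mean depth, breadth), where breadth is the fraction of target
    positions covered by at least `min_depth` reads. The threshold is an
    assumption; the study does not state the cutoff behind its coverage values.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    mean_depth = float(mean(depths))
    breadth = sum(1 for d in depths if d >= min_depth) / len(depths)
    return mean_depth, breadth


# Toy example: four target positions, one of them uncovered.
toy_depths = [0, 10, 20, 30]
print(coverage_summary(toy_depths))               # (15.0, 0.75)
print(coverage_summary(toy_depths, min_depth=30))  # (15.0, 0.25)
```

Because the two metrics are computed separately, a dataset can have a high mean depth but incomplete breadth, which helps explain how PB reached 168× mean depth while covering 96.92% of the target.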

Captured long-read sequencing provided a more efficient method for DMD gene mutation detection
In this study, the designed long-read sequencing panel accurately identified DMD gene variants and corrected earlier results, including mutations that previous methods had failed to detect. Captured long-read sequencing is particularly advantageous for detecting large-scale inversions and translocations; the SVs identified here ranged in size from 11 kb to 40 Mb. Translocations, duplications, and inversions showed a wider range of alterations in the DMD gene than deletions did (Tables 1 and 2). In addition, targeted long-read sequencing detected microindels and SNVs, including deep intronic variants, more accurately than short-read sequencing at the same 30× sequencing depth. In this study, 41 patients had SNVs or small indels within the DMD gene. Of these, five were deep intronic mutations predicted to alter splicing by creating new splice sites and pseudoexons. Moreover, in subject D116, who had an 11-kb deletion in intron 48 (chrX:31881062-31892406), RNA-seq of a muscle biopsy confirmed that the deletion altered the transcript, generating a 161-bp pseudoexon (Figure S1). All SNV mutations were correctly identified using the long-read sequencing platforms. Moreover, the software and analysis pipeline described in the STAR Methods accurately detected the SNV located within a homologous sequence in subject D130 in both the ONT and PB datasets, even though short-read WGS had failed in this region. Among the SNVs, nonsense and splice-site mutations were the most frequently detected types (Table 1).
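Whether a variant counts as "deep intronic" depends on its distance from the nearest exon-intron junction. The sketch below shows that positional test, assuming 1-based inclusive exon intervals and a cutoff of 100 bp, which is one common convention rather than a threshold stated in this study. The function name and the toy exon coordinates are hypothetical and are not real DMD coordinates.

```python
def classify_position(pos: int, exons: list[tuple[int, int]], deep_cutoff: int = 100) -> str:
    """Classify a genomic position relative to a gene's exons.

    `exons` holds 1-based, inclusive (start, end) intervals on one chromosome.
    `deep_cutoff` (bp from the nearest exon boundary) is an assumed convention.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    ordered = sorted(exons)
    if pos < ordered[0][0] or pos > ordered[-1][1]:
        return "outside exon span"
    if any(start <= pos <= end for start, end in ordered):
        return "exonic"
    # Intronic: measure the distance to the closest exon-intron junction.
    distance = min(min(abs(pos - start), abs(pos - end)) for start, end in ordered)
    return "deep intronic" if distance > deep_cutoff else "intronic, near exon"


# Hypothetical toy gene with three exons (not real DMD coordinates).
toy_exons = [(1000, 1200), (5000, 5150), (9000, 9100)]
for p in (1100, 1250, 3000, 500):
    print(p, classify_position(p, toy_exons))
```

This is only a positional label: deciding whether a deep intronic variant actually creates a pseudoexon still requires splicing prediction and, ideally, RNA evidence such as the muscle RNA-seq used for subject D116.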

Captured long-read sequencing revealed the precise characteristics of structural variation ligation in DMD
Using the targeted long-read sequencing method, 84 patients were found to have duplications and/or deletions in the DMD gene: 32 had duplications, 34 had deletions, nine had inversions, two had translocations, and seven had large discontinuous deletions and duplications (Table S2). The captured long-read sequencing results confirmed that all SV breakpoints fell within introns and that most junctions contained microinsertions. Additionally, a few subjects had a large (600-1,000 bp) insertion between the breakpoints (Table S3); however, the sequential ligation was not affected. Moreover, although most subjects showed head-to-tail ligation, we found a rare case (subject D127) in which an apparently continuous duplication of exons 19 to 44 resulted from a deletion extending from exon 18 into the upstream region (NC_000023.10:g.32521926_40387205del) combined with a duplication extending from exon 44 into the upstream region (NC_000023.10:g.32119087_40772185dup). A nonsense variant (NM_004006.3:c.5404C>T) was also identified in this subject. Thus, not all sequential duplications are as simple as they appear in MLPA results (Figure 2). In addition, of the nine subjects with DMD intragenic inversions, seven had previously been reported by MLPA to carry discontinuous deletions or duplications (Figure 3; Table 2). The junction reads produced by inversions and translocations were verified by Sanger sequencing (Figure S2). Together, these results show that the captured long-read sequencing panel reveals the characteristics of SV ligation in DMD.
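The simple interval variants reported in HGVS genomic notation, such as the two SVs in subject D127, can be turned into reference spans programmatically. Below is a minimal sketch that handles only the `accession:g.START_ENDdel|dup|inv` form. The names `parse_hgvs_interval` and `GenomicInterval` are hypothetical; full HGVS grammar is much richer, and a dedicated parser (for example, the biocommons `hgvs` package) would be the safer choice in practice.

```python
import re
from dataclasses import dataclass

# Simplified pattern: accession:g.START_END followed by del, dup, or inv.
# Inserted sequences (delins, ins) and uncertain positions are NOT supported.
HGVS_G_INTERVAL = re.compile(
    r"^(?P<ref>[A-Z]{2}_\d+\.\d+):g\.(?P<start>\d+)_(?P<end>\d+)(?P<kind>del|dup|inv)$"
)


@dataclass(frozen=True)
class GenomicInterval:
    reference: str
    start: int
    end: int
    kind: str

    @property
    def length(self) -> int:
        # HGVS genomic positions are 1-based and inclusive.
        return self.end - self.start + 1


def parse_hgvs_interval(text: str) -> GenomicInterval:
    match = HGVS_G_INTERVAL.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported HGVS expression: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    return GenomicInterval(match["ref"], start, end, match["kind"])


for expr in (
    "NC_000023.10:g.32521926_40387205del",
    "NC_000023.10:g.32119087_40772185dup",
):
    iv = parse_hgvs_interval(expr)
    print(f"{iv.kind}: {iv.length:,} bp")
```

For D127, this gives a 7,865,280-bp deletion and an 8,653,099-bp duplication. The reference spans alone do not show how the segments are joined; that information comes from the junction (split) reads.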

Captured long-read sequencing exhibited higher accuracy for structural variation haplotype detection in female carriers
For SV haplotype detection, samples from 29 female participants were analyzed. SV haplotypes could be identified by long-read sequencing regardless of sex. In addition, translocations were found in two female patients (Figure 3; Table 2; Table S1). Thus, compared with MLPA and short-read sequencing, this one-step captured long-read sequencing method achieved higher accuracy in screening female carriers (Tables S1 and S2).
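One reason that depth-based methods struggle with female carriers is that, at an X-linked locus, a heterozygous change shifts read depth much less than the corresponding change in a hemizygous male. The sketch below computes idealized depth ratios under the assumption of uniform capture and mapping, which real data do not satisfy. `expected_depth_ratio` is a hypothetical helper.

```python
from math import log2


def expected_depth_ratio(variant_copies: int, reference_copies: int) -> float:
    """Idealized read-depth ratio of a CNV region relative to its baseline."""
    if reference_copies <= 0:
        raise ValueError("reference_copies must be positive")
    return variant_copies / reference_copies


# X-linked locus: males carry one copy, females two.
scenarios = {
    "male hemizygous deletion": (0, 1),
    "male duplication": (2, 1),
    "female heterozygous deletion": (1, 2),
    "female heterozygous duplication": (3, 2),
}
for label, (copies, baseline) in scenarios.items():
    ratio = expected_depth_ratio(copies, baseline)
    log_ratio = log2(ratio) if ratio > 0 else float("-inf")
    print(f"{label}: ratio {ratio:.2f}, log2 {log_ratio:.2f}")
```

A heterozygous duplication in a female raises depth by only 1.5-fold (a log2 ratio of about 0.58), so capture variability can easily mask it. In contrast, a read spanning the SV junction is direct evidence of the rearranged allele, whatever the copy number of the normal allele.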

DISCUSSION
In this study, we evaluated a captured long-read sequencing panel and validated its performance in a cohort of 129 individuals. Compared with MLPA and short-read sequencing, the captured long-read sequencing method showed substantial advantages in uncovering the underlying large-scale rearrangements of DMD, especially deep intronic SVs, inversions, and translocations. In addition, the panel could capture the homologous region of DMD, which short-read sequencing could not achieve. Furthermore, at 30× sequencing depth, the panel precisely detected microindels and deep intronic SNVs. The method also identified SV breakpoint positions in the DMD gene more precisely. Both ONT and PB sequencing determined the exact breakpoints of deletions, duplications, and other complex rearrangements; however, PB yielded slightly better consensus calls than ONT owing to its more favorable sequencing-error profile. Consequently, PB is more accurate than ONT for SNV detection, although ONT accuracy can be improved by increasing the sequencing depth. ONT sequencing therefore has the potential to meet the requirements for DMD mutation detection.
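Why greater depth improves consensus accuracy can be illustrated with a simple binomial model. Assuming that each read errs independently at a site with the same probability, the consensus fails when at least half of the reads are wrong. This model is a simplification: systematic errors (for example, in homopolymer runs) are correlated across reads, so additional depth corrects them less effectively than the model suggests. `majority_error` is a hypothetical name, and the 10% per-read error rate in the usage loop is purely illustrative, not a measured rate for either platform.

```python
from math import comb


def majority_error(depth: int, per_read_error: float) -> float:
    """Probability that a simple majority vote over `depth` reads is wrong.

    Simplifying assumptions: each read errs independently with probability
    `per_read_error`, every erroneous read counts against the true base
    (conservative), and a tie at even depth counts as a failure.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if not 0.0 <= per_read_error <= 1.0:
        raise ValueError("per_read_error must be a probability")
    min_wrong = (depth + 1) // 2  # smallest error count that defeats the majority
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(min_wrong, depth + 1)
    )


for depth in (1, 5, 15, 31):
    print(depth, f"{majority_error(depth, 0.1):.2e}")
```

Under independent errors, the failure probability falls steeply as depth increases, which is the intuition behind raising ONT depth. Correlated errors weaken this effect, which is consistent with the platforms' differing consensus performance.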
Although previous studies have characterized the deletion and duplication patterns of DMD, accurately identifying the precise junctions of breakpoint sequences, especially those involving duplications, remains challenging. Duplications therefore require further analysis to achieve an accurate diagnosis and to facilitate antisense-mediated exon-skipping therapy. 22,23 Here, long-read sequencing resolved the entire sequence of the rearranged DMD regions, outperforming conventional methods. In particular, we found that not all duplications were tandem repeats, which directly reveals the nature of exon duplication in DMD.
MLPA is an effective method for detecting exon rearrangements in patients with DMD or BMD. These rearrangements include concurrent exon deletion and duplication as well as non-contiguous deletion or duplication of DMD exons. 25,26 Our long-read sequencing results suggest that complex SVs are mostly induced by inversions or translocations in unstable regions. Previous studies using chromosomal inversion and translocation detection methods have shown that an inverted duplication adjacent to a terminal deletion is a common rearrangement pattern in cancer and constitutional genomes. 27,28 Our junction sequencing analysis revealed that non-contiguous duplications were connected with duplications in either direct or inverted orientation. Complex deletion and duplication rearrangements may result from replication fork collapse. 29
In addition, captured long-read sequencing has substantially improved the detection of SVs and repeat expansions in numerous genetic diseases, in both males and female carriers. 30,31 In our study, all female carriers were sequenced, and SVs were accurately identified on individual alleles. Unlike short-read methods, which infer duplications and deletions (copy-number variations) from read-depth information, the long-read method exploits split-read mapping. Hence, it can identify copy-number variations in the heterozygous state more accurately. Therefore, captured long-read sequencing is a considerably more appropriate molecular diagnostic method than short-read sequencing for female carriers.
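The split-read principle can be sketched as follows. When one long read aligns as two segments, the chromosomes, strands, and reference order of those segments indicate the type of junction. This is a simplified illustration, assuming that each segment is summarized by its chromosome, 1-based reference interval, strand, and offset within the read. `Segment` and `classify_junction` are hypothetical names, not the study's pipeline (described in the STAR Methods). Real long-read SV callers also handle reads with more than two segments, inserted sequence between segments, and evidence clustering across reads. A strand switch alone cannot distinguish a simple inversion from an inverted duplication, which is why it is reported here only as an "inverted junction".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    ref_start: int    # 1-based, inclusive
    ref_end: int      # 1-based, inclusive
    strand: str       # "+" or "-"
    query_start: int  # offset of the segment within the read


def classify_junction(seg1: Segment, seg2: Segment) -> str:
    """Infer the junction type implied by two aligned segments of one read."""
    # Order segments as they occur along the read.
    a, b = sorted((seg1, seg2), key=lambda s: s.query_start)
    if a.chrom != b.chrom:
        return "translocation"
    if a.strand != b.strand:
        # Inversion or inverted duplication; one junction cannot tell them apart.
        return "inverted junction"
    if a.strand == "-":
        # On the reverse strand, read order runs opposite to reference order.
        a, b = b, a
    if b.ref_start > a.ref_end + 1:
        return "deletion"     # reference jumps forward: sequence is missing
    if b.ref_start <= a.ref_end:
        return "duplication"  # reference jumps backward: sequence is repeated
    return "contiguous"


read_segments = [
    (Segment("chrX", 1000, 2000, "+", 0), Segment("chrX", 5001, 6000, "+", 1001)),
    (Segment("chrX", 1000, 2000, "+", 0), Segment("chrX", 1500, 2500, "+", 1001)),
    (Segment("chrX", 1000, 2000, "+", 0), Segment("chrX", 5001, 6000, "-", 1001)),
    (Segment("chrX", 1000, 2000, "+", 0), Segment("chr5", 7000, 8000, "+", 1001)),
]
for s1, s2 in read_segments:
    print(classify_junction(s1, s2))
```

Because each classification comes from reads that physically span the junction, the signal does not depend on the depth shift that a heterozygous copy-number change produces.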
Both DMD and BMD are primarily caused by DMD mutations that result in dystrophin of abnormal size, quantity, or function. 32 However, muscle biopsies may occasionally be necessary to accurately identify cases that cannot be resolved through mRNA and dystrophin protein expression studies alone. 33 Specifically, an SNV in an exon or intron of DMD can produce a premature termination codon or a pseudoexon, resulting in DMD/BMD. 34,35 However, such mutations are generally investigated only when no SV is found in clinical genetic testing, which prolongs the time to diagnosis. Our one-step mutation detection platform will accelerate DMD gene testing and make diagnosis much more efficient, with the exception of variation that arises only at the transcript level. Furthermore, among emerging treatments for patients with DMD/BMD, antisense oligonucleotide-mediated exon skipping is one of the most promising therapeutic approaches and is expected to be applicable to most mutations, including deletions, duplications, and nonsense mutations in in-frame exons. 36 However, it is difficult to design antisense oligonucleotides for certain duplications and complex variants when their actual sequences are unknown. Similarly, gene-editing therapies require accurate and detailed information about the DMD defect to select appropriate candidates. Therefore, the new era of gene therapy for DMD/BMD demands more accurate mutation detection. The long-read sequencing panel used in this study offers a highly efficient and accurate approach for the molecular diagnosis of DMD/BMD. Additionally, these findings expand our understanding of DMD gene rearrangements and provide insights into the treatment of DMD/BMD.

Limitations of the study
In this study, we explored the captured long-read sequencing method for DMD mutation detection. However, some rare cases of DMD/BMD may be caused by transcript-level variation alone, which a genomic DNA-based method cannot detect. We also did not investigate the possible mechanisms underlying DMD rearrangements or mRNA variation; these will be the focus of future studies.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

Figure 1. Flow chart of captured long-read sequencing. The capture and library preparation steps for the entire genomic DMD gene are shown step-by-step for the PB and ONT platforms.

Figure 2. The captured long-read sequencing panel precisely uncovered the SVs of DMD. (A-G) Long-read sequencing revealed that the non-contiguous variants and a minority of the duplications detected by MLPA are definitively attributable to inversions (A-D); translocations between autosomes and the X chromosome involving the DMD gene are shown (E and F); an inversion occurring in an intronic region is indicated (G).

Table 1. Long-read sequencing allows efficient DMD gene mutation detection
MP, mutation pattern; L-Del, large sequential deletion; L-Dup, large sequential duplication; LDD, large discontinuous deletion and duplication; MF, mutation frequency; a*, number of positive detections; b#, total number of samples tested by the detection method; α, NGS panel; β, WES; γ, WGS; NA, the detection method was not used for this pattern; LRSP, long-read sequencing panel; SV, structural variation.

Table 2. Captured long-read sequencing corrected MLPA and short-read sequencing results
NA, the detection method was not used for the subject; SV, structural variation; None, no variant was detected; inv, inversion; delins, the HGVS-recommended nomenclature used here for translocations.

- RESOURCE AVAILABILITY
  - Lead contact
  - Materials availability
  - Data and code availability
- EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
  - Ethics declaration
  - Human participants
- METHOD DETAILS