Exonization of an Intronic LINE-1 Element Causing Becker Muscular Dystrophy as a Novel Mutational Mechanism in Dystrophin Gene

A broad mutational spectrum in the dystrophin (DMD) gene, from large deletions/duplications to point mutations, causes Duchenne/Becker muscular dystrophy (D/BMD). Comprehensive genotyping is particularly relevant considering the mutation-centered therapies for dystrophinopathies. We report the genetic characterization of a patient with disease onset at age 13 years, elevated creatine kinase levels and reduced dystrophin labeling, where multiplex ligation-dependent probe amplification (MLPA) and genomic sequencing failed to detect pathogenic variants. Bioinformatic, transcriptomic (real time PCR, RT-PCR), and genomic approaches (Southern blot, long-range PCR, and single molecule real-time sequencing) were used to characterize the mutation. An aberrant transcript was identified, containing a 103-nucleotide insertion between exons 51 and 52, with no similarity to the DMD gene. This corresponded to the partial exonization of a long interspersed nuclear element (LINE-1), disrupting the open reading frame. Further characterization identified a complete LINE-1 (~6 kb with typical hallmarks) deeply inserted in intron 51. Haplotyping and segregation analysis demonstrated that the mutation had a de novo origin. Besides underscoring the importance of mRNA studies in genetically unsolved cases, this is the first report of a disease-causing fully intronic LINE-1 element in DMD, adding to the diversity of mutational events that give rise to D/BMD.


Introduction
Duchenne or Becker muscular dystrophies (D/BMD), caused by pathogenic variants in the Dystrophin (DMD) gene, are among the most common inherited diseases of muscle, with an estimated prevalence of ~1/3800 live male births [1]. A broad mutational spectrum for D/BMD has been thoroughly described in the literature, ranging from large multi-exonic deletions/duplications to smaller single nucleotide variants [2]. More complex and rarer DMD mutations, such as large rearrangements and gene disruption mediated by retrotransposition activity, have also been reported [3,4]. The genetic heterogeneity, size, and complexity of the DMD gene demand expertise in a vast number of molecular techniques, besides the routinely used multiplex ligation-dependent probe amplification (MLPA) and genomic sequencing. Since dystrophinopathies are now amenable to therapy, the genetic characterization of these patients has gained relevance beyond clinical follow-up and genetic counselling purposes.
We previously reported the characterization of 308 dystrophinopathy patients, from 284 unrelated families, leading to the identification of 175 distinct mutations [5]. This 91% positivity rate (284 of 312 families) was achieved in a cohort with strict inclusion criteria. Since then, and considering all referrals with clinical suspicion of D/BMD, over 100 cases remain unsolved at the genetic level.
This report describes a wide combination of genetic studies performed on a patient presenting a mild BMD phenotype, where a unique mutational event involving the insertion of a long interspersed nuclear element 1 (LINE-1) was identified.

Patient Samples
Formal written informed consent for publication of this case report was obtained from the patient and other family members whose data is presented. The study was conducted in accordance with the Declaration of Helsinki, and with approval of the institutional (CHP) ethics committee (Code: 336-13(196-DEFI/285-CES); date of approval: 11 December 2013).

RNA Studies
Total RNA extracted from patient and control muscle samples with PerfectPure RNA Fibrous Tissue kit (5 PRIME) was converted to cDNA using the High Capacity RNA-to-cDNA Kit (Thermo Fisher Scientific, Waltham, MA, USA). DMD transcripts were amplified by PCR covering the entire coding region. Amplicons were purified with Illustra ExoStar (GE Healthcare, Little Chalfont, UK) and sequenced using BigDye™ Terminator Cycle Sequencing Kit V3.1 (Thermo Fisher Scientific). Reference sequence for variant description: NM_004006.2.

LINE-1 Characterization
To identify the 5′ insertion site, primers were designed against five candidate target regions within intron 51 (Supplementary Materials Data S1).
For the 3′ insertion site, a forward primer was designed against a conserved region of the LINE-1 3′UTR (L1-F: AAATTAGGTATTGATGGGACGTATT) and a reverse primer within intron 51 (51int-R: GAGAAGATGACAGTTAAATCAAAGC) (Supplementary Materials Data S1). Resultant amplicons were sequenced as described above. LINE-1 was genotyped by single molecule real-time sequencing (PacBio RS II system, Pacific Biosciences, San Francisco, CA, USA) using custom DNA libraries (Supplementary Materials Data S1). FASTA/Q files were mapped against a LINE-1 reference and a consensus sequence was obtained from BAM files using the Samtools mpileup command (Supplementary Materials Data S1). Sequence artifacts and ambiguous sites were clarified via long-range PCR followed by Sanger sequencing.
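The idea of deriving a consensus from piled-up reads can be illustrated with a minimal sketch. This is not the Samtools mpileup algorithm and not the pipeline used in this study (which is described in Supplementary Materials Data S1); the function name `column_consensus`, the `min_fraction` threshold, and the toy data are all hypothetical, chosen only to show how a position without a clear majority base ends up flagged as ambiguous and left for confirmation by another method.

```python
from collections import Counter


def column_consensus(columns, min_fraction=0.5, ambiguous="N"):
    """Build a consensus string from per-position lists of read bases.

    `columns` is a toy stand-in for a pileup: one list of observed bases
    per reference position. A base is called only if it is seen in more
    than `min_fraction` of the reads covering that position; otherwise
    the position is reported as ambiguous.
    """
    consensus = []
    for bases in columns:
        if not bases:
            # No read covers this position.
            consensus.append(ambiguous)
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) > min_fraction:
            consensus.append(base)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Hypothetical data: unanimous, clear majority, tie, and uncovered positions.
toy_pileup = [list("AAAA"), list("CCCT"), list("GGTT"), []]
print(column_consensus(toy_pileup))
```

Positions reported as `N` in such a scheme correspond conceptually to the ambiguous sites that were resolved here by long-range PCR followed by Sanger sequencing.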

Results
We report the genetic characterization of a 50-year-old male patient with clinical features compatible with BMD, namely, onset at 13 years of age with progressive proximal weakness of lower limbs, electromyography showing myopathic signs, and high creatine kinase levels. The patient was initially referred for DMD molecular testing in 2001, but multiplex PCR and Southern blot failed to detect pathogenic variants. The case was re-evaluated in the context of genetic counselling of the patient's daughter. A muscle biopsy performed in the patient revealed dystrophic features and irregular staining for dystrophin (Figure 1). Given that large deletions/duplications and point mutations were not detected by current routine genetic studies (MLPA and DMD genomic sequencing), complete DMD cDNA analysis of the muscle specimen was performed. An insertion of 103 nucleotides (not traceable in DMD) was identified between exons 51 and 52 (r.7542_7543ins(103), Figure 2A), which is predicted to shift the DMD open reading frame (ORF). Besides the predominant mutated transcript, a residual amount of wild-type transcript was detectable (Figure 2A). To identify the origin of the mutated sequence, a BLASTN query was performed against the human nucleotide collection. Identity of 95.1% (98/103 base pairs (bp)) was retrieved against two human LINE-1 sequences: L1.21 and L1.14 (GenBank accession numbers U93570 and U93566, respectively) (Supplementary Materials Figure S2). Comparative analysis also showed high similarity with other human genomic sequences containing LINE-1 elements. Further confirmation was obtained using CENSOR software, where 98/103 bp had 100% similarity with the consensus sequence of the human LINE-1 element (Supplementary Materials Figure S2).
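Two of the figures quoted above follow from simple arithmetic, sketched here for clarity. The function names are illustrative only; the inputs (a 103-nucleotide insertion and 98 matching positions out of 103) are taken from the text.

```python
INSERT_LENGTH = 103   # nucleotides inserted between exons 51 and 52
MATCHING_BASES = 98   # positions matching the LINE-1 sequences


def shifts_reading_frame(inserted_nt):
    """An insertion keeps the frame only if its length is a multiple of 3."""
    return inserted_nt % 3 != 0


def percent_identity(matches, aligned_length):
    """Identity as a percentage of aligned positions."""
    return 100.0 * matches / aligned_length


# 103 = 3 * 34 + 1, so one extra nucleotide displaces the downstream frame.
print(shifts_reading_frame(INSERT_LENGTH))
print(round(percent_identity(MATCHING_BASES, INSERT_LENGTH), 1))
```

Because 103 leaves a remainder of 1 when divided by 3, every codon downstream of the insertion is read out of frame, which is consistent with the predicted disruption of the ORF.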
A strategy was delineated to identify the genomic insertion site in the DMD gene. The first five nucleotides (AATTC), having no correspondence to the LINE-1 consensus sequence, were presumed to belong to intron 51 of DMD. This ~44 kb intron was scanned for the sequence AG/AATTC (AG being the canonical dinucleotide for acceptor splice-sites); a total of eight such sequences were found. Composite splice-site analysis narrowed this down to five potential sites: those presenting high splice-site scores and suitable (cryptic) branch-points located nearby (Supplementary Materials Figure S1). To identify the LINE-1 5′ insertion site by PCR, three forward oligonucleotides were designed to encompass these five regions of interest and a single reverse oligonucleotide annealing to the known inserted LINE-1 sequence (detected by cDNA analysis). PCR experiments and subsequent sequencing showed that the LINE-1 was inserted at position NM_004006.2:c.7542+8951_c.7542+8952 of intron 51 (Figure 2B,C). The 3′ end was then identified, containing a poly-A tail and a stretch of 9 bp (AAAGAATTC) consistent with a flanking target site duplication (TSD) (Figure 2D). Southern blot and hybridization were performed to estimate the size of the insertion. Results revealed a ~6 kb size increase, thus corresponding to a complete or almost complete LINE-1 element (Supplementary Materials Figure S3).
The patient's daughter was seen to be a carrier of the LINE-1 insertion mutation. Additional family members (the patient's healthy brother and two sisters presenting at-risk haplotypes) were also screened for the mutation using a LINE-1-specific PCR (Figure 3). Only the patient's daughter tested positive, suggesting a de novo event.
[Figure 3: ... haplotyping. This insertion is present in the patient (IV:1) and his daughter (V:1), and not detectable in the patient's sisters (IV:6 and IV:7, both carriers of the same at-risk haplotype). Interestingly, this family was initially thought to have an X-linked transmission, since one of the patient's maternal great-uncles (deceased) was suspected to have a neuromuscular disease (NMD). (B) LINE-1 specific PCR used to screen for additional carriers.]
Further genotyping confirmed that a full-length LINE-1 was present (sequence available in Supplementary Materials Data S2 and submitted to GenBank, accession number MF421743). L1Xplorer and RepeatMasker tools classified the element as a member of the L1HS subfamily, as it had all the typical hallmarks of these retrotransposons: a 5′-untranslated region (UTR), two non-overlapping ORFs (ORF1 and ORF2), a short 3′UTR and a poly-A tail (Figure 4).
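The scan of intron 51 for the AG/AATTC motif described in this section can be sketched as a plain substring search. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name `find_motif` is hypothetical, the sequence below is an invented toy string rather than DMD intron 51, and the subsequent splice-site and branch-point scoring is not reproduced.

```python
def find_motif(sequence, motif="AGAATTC"):
    """Return 0-based start positions of every occurrence of `motif`.

    The default motif is the acceptor dinucleotide AG followed by the
    five nucleotides AATTC. Overlapping occurrences are reported, and
    the search is case-insensitive.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping hits are not missed.
        start = sequence.find(motif, start + 1)
    return positions


# Invented toy sequence containing the motif twice.
toy_intron = "ttagaattcggcagaattcaa"
for pos in find_motif(toy_intron):
    # The candidate exonized sequence would begin right after the AG.
    print(f"motif at {pos}, candidate exon start at {pos + 2}")
```

A scan of this kind only lists candidate positions; in the study, the eight hits were then ranked by splice-site scores and nearby branch-points before primers were designed.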

Discussion
LINE-1s are the most abundant type of retrotransposable elements, accounting for nearly 17% of the human genome [10]. Typically, they are ~6 kb in length and exhibit characteristic components (Figure 4). ORF1 encodes an RNA-binding protein while ORF2 encodes a protein with endonuclease and reverse transcriptase activity. Although the transcriptional mechanism of LINE-1 is not fully understood, it has been proposed to involve target-site primed reverse transcription. The cDNA originated by this process recombines with genomic DNA, giving rise to characteristic signatures: a 7-20 bp direct repeat of the endonuclease target flanking the inserted LINE-1 (TSD) [11]. Only 80-100 LINE-1s in the human genome (0.1% of total) are believed to be capable of active retrotransposition [11].
Despite their importance in evolution and genome diversity, the insertion of a LINE-1 within a gene could have a deleterious effect, giving rise to disease. To date, only 30 such insertions have been reported, the majority located in exonic regions and causing frame-shifts or exon skipping [11]. In contrast, intronic LINE-1 insertions are rarely reported.
Regarding the DMD gene, only five pathogenic insertions have been described. Exonic disruptions, giving rise to a DMD phenotype, have been reported twice in exon 44 [12,13], and also in exons 48 and 67 [14,15]. A further pathogenic insertion was detected in two unrelated Japanese families with X-linked dilated cardiomyopathy, where a 5′-truncated form of a LINE-1 was integrated in the DMD 5′UTR, thought to affect the transcription or the stability of muscle transcripts [16]. A different repetitive element (Alu-like) was also reported to cause dilated cardiomyopathy, activating a cryptic acceptor splicing site in intron 11 of DMD [17]. The mutational event in our patient is completely distinct in two aspects: it is a deep-intronic insertion and a full LINE-1 sequence is present. This LINE-1 was classified as a member of the L1HS subfamily, responsible for the majority of the documented LINE-1 retrotransposition events. Our results showed its partial exonization at the cDNA level, due to the recognition of a cryptic 3′ splice-site located in intron 51 and a 5′ splice-site within this element (Figure 4). This presumably gives rise to a truncated polypeptide (p.Ala2515Asnfs*21). The presence of a residual wild-type transcript explains the patient's milder dystrophinopathy (BMD phenotype). This LINE-1 sequence has 100% identity with a LINE-1 element located in chromosome 2 (GenBank accession number AC216112), which could constitute its original source. It has a near-complete identity (only one bp difference) with another pathogenic LINE-1 (GenBank accession number AF149422) inserted in the hemoglobin-beta locus, a seemingly active retrotransposable element of the human genome [18].
Intronic LINE-1 insertions causing exonization have only been described in three other cases: chronic granulomatous disease (CYBB gene) [19], Chanarin-Dorfman syndrome (ABHD5 gene) [20], and familial retinoblastoma (RB1 gene) [21]. The rarity of LINE-1-mediated pathogenic insertions described in the literature and in variant databases is attributable mostly to their low activity throughout the genome and to the technical difficulties in their detection (especially for full-length insertions). Massively parallel sequencing technology also has considerable limitations here, particularly with short-read sequencing, so tailored bioinformatic analysis tools and strategies are required [22]. In the case of intronic LINE-1 insertions, detection may be hampered by the intron's length and by the fact that such insertions mainly affect transcriptional events (e.g., intronic retentions or exonization events). One possibility is therefore to conduct mRNA studies in yet uncharacterized D/BMD patients, as previously suggested [5,23].
It is known that repetitive transposable elements such as short interspersed nuclear elements (e.g., Alu sequences) or LINE sequences are frequent in DMD intronic regions. Besides being the underlying cause of some gross rearrangements, they may also influence gene expression by mediating alternative splicing. One can thus speculate that this would ultimately interfere with the efficacy of RNA-based therapies, especially those designed to restore the reading frame. To our knowledge, this is the first report of a deep-intronic insertion of a LINE-1 element in the DMD gene shown to cause disease. Besides its scientific relevance in expanding the mutational mechanisms underlying D/BMD, this finding also reinforces the need to develop comprehensive approaches to identify LINE-1 insertion profiles in the human genome.