Short Tandem Repeat Expansions and RNA-Mediated Pathogenesis in Myotonic Dystrophy

Short tandem repeat (STR), or microsatellite, expansions underlie more than 50 hereditary neurological, neuromuscular and other diseases, including myotonic dystrophy types 1 (DM1) and 2 (DM2). Current disease models for DM1 and DM2 propose a common pathomechanism, whereby transcription of the mutant DMPK (DM1) and CNBP (DM2) genes results in the synthesis of CUG and CCUG repeat expansion (CUGexp and CCUGexp) RNAs, respectively. These CUGexp and CCUGexp RNAs are toxic because they promote the assembly of ribonucleoprotein (RNP) complexes, or RNA foci, leading to sequestration of Muscleblind-like (MBNL) proteins in the nucleus and global dysregulation of the processing, localization and stability of MBNL target RNAs. STR expansion RNAs also form phase-separated gel-like droplets both in vitro and in transiently transfected cells, implicating multivalent RNA-RNA interactions as drivers of RNA foci formation. Importantly, the nucleation and growth of these nuclear foci and transcript misprocessing are reversible processes and thus amenable to therapeutic intervention. In this review, we provide an overview of potential DM1 and DM2 pathomechanisms, followed by a discussion of MBNL functions in RNA processing and how multivalent interactions between expanded STR RNAs and RNA-binding proteins (RBPs) promote RNA foci assembly.

Although DNA repeat expansions are the primary cause of the associated disorders, the downstream pathomechanisms underlying development of the disease phenotype remain unclear for most of these diseases. Possible STR expansion disease mechanisms include host gene haploinsufficiency, host gene transcript misprocessing [57,58], bidirectional gene transcription [59], repeat-mediated RBP sequestration [60], canonical translation of toxic polyglutamine and polyalanine proteins [61,62] and non-canonical repeat-associated non-AUG (RAN) translation [63]. These pathomolecular events may, and often do, co-occur in an expansion disease. For example, in C9orf72 amyotrophic lateral sclerosis and frontotemporal dementia (C9-ALS/FTD), the C9orf72 intron 1 GGGGCC expansion (GGGGCCexp) results in alternative first exon selection, intron 1 retention, altered sense and antisense transcription, sequestration of multiple repeat-binding proteins (e.g., HNRNPH1, RanGAP1) and RAN translation of dipeptide repeats (DPRs) [64].
In this review, we focus on two related multisystemic disorders, myotonic dystrophy (DM) type 1 (DM1) and type 2 (DM2), which have served as paradigms of RNA-mediated disease. Although DM1 is caused by a CTGexp in the DMPK 3' untranslated region (3'UTR) and DM2 by an intronic CCTGexp in CNBP, they share a number of pathological features, including skeletal muscle myotonia and weakness/wasting, heart conduction block, unusual dust-like ocular cataracts and cognitive dysfunction (Figure 2). Below, we evaluate different disease mechanisms for DM1 and DM2 in the context of other microsatellite expansion disorders, followed by a more detailed analysis of the pathogenic roles of toxic CUGexp and CCUGexp RNAs in RNA processing.

Is Myotonic Dystrophy Caused by Haploinsufficiency?
Loss-of-function mechanisms are well documented in Friedreich's ataxia (FRDA) and in a number of folate-sensitive fragile sites, including Fragile XA syndrome (FRAXA), where microsatellite expansions induce epigenetic changes that result in transcriptional repression (Figure 1) [65,66]. However, several observations argue against a haploinsufficiency, or host gene loss-of-function, model for either DM1 or DM2. First, no DMPK or CNBP coding region mutations have been reported to cause DM. Second, although both DM1 and DM2 are classified as myotonic dystrophies, DM1 is caused by DMPK CTGexp and DM2 by CNBP CCTGexp mutations, and these two genes are located on different chromosomes and encode proteins with very different functions. DMPK is a serine/threonine protein kinase, whereas CNBP is a CCHC-type zinc finger (ZnF) protein. DMPK is expressed as six major isoforms and is a member of the AGC kinase family [67,68], while CNBP has been implicated in both transcriptional and post-transcriptional regulation, and a recent study demonstrated that this ZnF protein binds to G-rich RNA elements to block G-quadruplex structures and enhance translation [69]. Third, the DMPK CTGexp mutation results in nuclear retention of DMPK mRNA and depletion of DMPK protein, but neither heterozygous nor homozygous Dmpk knockout mice recapitulate the major pathological features of DM1 [70][71][72].
In an effort to model DM2, several Cnbp knockout mouse models have been generated. Their phenotypes vary, possibly due to differences in strain background: homozygous knockouts show either embryonic lethality or sarcomere disorganization and muscle atrophy, while Cnbp+/− heterozygous knockouts develop later-onset muscle weakness/wasting [80,97,98]. Thus, Cnbp knockouts reproduce skeletal muscle features common to DM1 and DM2, although CNBP levels are not compromised in DM1. CNBP downregulation has also been noted in a zebrafish model of Treacher Collins syndrome, a craniofacial disorder caused by mutations in TCOF1, which encodes the treacle protein involved in rDNA transcription [99]. Interestingly, forebrain truncation and craniofacial defects have also been observed in Cnbp knockout mice [97]. Cumulatively, these findings argue that DMPK haploinsufficiency is not a major DM1 disease factor, whereas a potential role for CNBP loss-of-function in DM2 requires further study.

DM1 and DM2 Are RNA-mediated Disorders
In contrast to the dysregulation of host gene expression, transcription of microsatellite expansion mutations, independent of the host gene context, results in the synthesis of toxic STR RNAs. Thornton and colleagues provided evidence for CUGexp toxicity by generating and characterizing HSALR transgenic mice, in which a ~250-repeat CTG expansion was inserted into the 3'UTR of a human skeletal actin (HSA) transgene [100]. HSALR mice develop several manifestations of DM1 muscle pathology, including myotonia, with pathological severity dependent on the transgene expression level.
Both the RNA sequences and the structures of STR expansions have been implicated as pathogenic factors, and AU/GC composition influences the propensity of these expansions to form higher-order RNA structures. In silico prediction of the secondary structures formed by expanded microsatellites indicates that most AU-rich repeats are primarily single-stranded, unlike the more structured GC-rich STRs [58,101,102]. For instance, DM1 CUG and DM2 CCUG repeats form stable imperfect hairpins with one or two unpaired nucleotides, while the C9-ALS/FTD GGGGCC repeats form G-quadruplex structures [103][104][105]. In addition, the length of the inherited mutation increases during an affected individual's lifespan, particularly in post-mitotic cells, so that different cells express varying repeat lengths [106][107][108][109]. Due to this somatic mosaicism, DMPK mutant allele expansions are readily detectable in blood cells, but expansions in skeletal muscle may be much larger, reaching thousands of CTG repeats, with variable repeat lengths in different myonuclei [110,111]. Current evidence indicates that somatic expansion is one of the triggers of pathological onset; however, in some tissues there is no clear correlation between expansion size and disease severity [112]. Moreover, exceptionally large microsatellite expansions in noncoding regions are not always associated with greater disease severity. For instance, DM2 is caused by up to 11,000 CCTG repeats in CNBP intron 1, yet this disease is generally recognized as a less severe type of myotonic dystrophy, distinguished by relatively late onset and the lack of a congenital form [113].
Why are STR expansion RNAs toxic? In DM1 and DM2, CUGexp and CCUGexp RNAs adversely affect the activities of several developmentally regulated RNA splicing factors in human tissues, cells and animal models (Figure 3) [114,115]. For example, in DM1 skeletal muscle and heart, expression of CUGexp RNA leads to protein kinase C (PKC)-mediated CELF1 hyperphosphorylation [116,117]. Since CELF1 promotes fetal alternative splicing patterns, CUGexp expression and CELF1 overexpression contribute to the reversion to fetal splicing patterns in adult tissues [118,119]. These mis-splicing events have been linked to specific pathophysiological outcomes, including myotonia, muscle weakness/wasting, heart conduction block and insulin resistance. It is unclear whether CELF1 is upregulated in DM2 [79,117,120].
Figure 3. Models of DM1 and DM2 disease mechanisms. CTGexp and CCTGexp in the DMPK and CNBP genes produce pre-mRNA transcripts containing expanded CUG and CCUG repeats. DMPK pre-mRNA is correctly spliced, whereas CCUGexp triggers CNBP intron 1 retention. mRNAs with C(C)UGexp sequester Muscleblind-like (MBNL) and, in DM2, RBFOX alternative splicing factors. In addition, CUGexp increases the stability of the CUGBP Elav-Like Family Member 1 (CELF1) splicing factor through protein kinase C (PKC)-mediated hyperphosphorylation. These changes in the bioavailability of splicing factors cause an imbalance in alternative splicing and enhanced production of fetal mRNA isoforms in adult tissues. As a result, inappropriate protein expression patterns lead to a variety of DM symptoms.

MBNL Sequestration and Loss-of-function in DM1 and DM2
In addition to CELF upregulation, current pathomechanistic models propose a major role for the MBNL family of RNA processing factors in DM1 and DM2 onset and progression. MBNL proteins are ~37-43 kDa trans-acting factors implicated in the regulation of alternative pre-mRNA splicing, pre-mRNA 3′-end cleavage/polyadenylation, mRNA localization, mRNA stability and microRNA biogenesis, as well as circular RNA generation, during embryonic and postnatal development (Figure 4) [96,121-127]. Of the three mammalian MBNL/Mbnl paralogs, mouse Mbnl1 and Mbnl2 function primarily during the postnatal period to switch their RNA targets to adult expression patterns, although they also play essential roles in utero, since Mbnl1−/−; Mbnl2−/− double knockout mice are embryonic lethal. By contrast, Mbnl3 is expressed primarily during embryonic development and during adult tissue regeneration [128].
MBNL genes contain several alternatively spliced cassette exons important for RNA binding, splicing activity, nuclear localization and homotypic interactions (Figure 5A). MBNL proteins interact with their RNA targets via four zinc finger (ZnF) domains that bind GC steps mainly flanked by pyrimidines (Figure 5B) [130]. The consensus RNA binding site for MBNL proteins is YGCY (Y, pyrimidine), a motif repeated throughout both CUGexp and CCUGexp RNAs [126,131-133]. The RNA processing activity of MBNL proteins is modulated by both the number and the structural context of these binding motifs [132,134]. Expanded repeats in a single transcript may provide hundreds, or even thousands, of high-affinity MBNL binding sites (KD = ~4-300 nM, depending on RNA structure/length and GC dinucleotide spacing) [135][136][137][138][139]. Expression of CUGexp and CCUGexp RNAs results in sequestration of MBNL proteins in RNA foci (discussed in more detail below), their depletion from the nucleoplasmic pool, a shift toward more immature isoforms of MBNL targets and DM disease manifestations (Figure 3) [118]. In DM1, detection of >50 CTG repeats in blood is considered a molecular hallmark of the adult form of DM1, whereas >1000 CTGs greatly increases the risk of congenital DM1 (CDM) [140,141]. The possibility that CDM results from MBNL sequestration in utero has recently been tested using Mbnl conditional knockout mice. Interestingly, coordinate loss of Mbnl1, Mbnl2 and Mbnl3 expression in skeletal muscle is required to reproduce the congenital respiratory muscle developmental phenotype [129]. MBNL loss-of-function has also been implicated in other CTGexp diseases, including Fuchs endothelial corneal dystrophy (FECD) [142,143] and spinocerebellar ataxia type 8 (SCA8) [144]. Finally, MBNL proteins may be indirectly sequestered by other RBPs or by other types of repeats, including CGGexp associated with FXTAS and CAGexp linked to polyglutamine diseases [145][146][147].
Figure 4. MBNL functions in RNA biogenesis, localization and stability. MBNL regulates alternative splicing events (orange boxes), including cassette exons (e), alternative 5′ splice sites, alternative 3′ splice sites, mutually exclusive exons and intron retention (i) [121,129], and alternative 3′ end formation by alternative cleavage and polyadenylation (pA) [122]. MBNL also regulates microRNA (miRNA) biogenesis [96], circular RNA (circRNA) formation [123] and mRNA localization (horizontal arrow) [121], and increases mRNA stability (vertical arrow) [124]. All examples represent MBNL-mediated events, and representative target RNAs are indicated.
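To make the binding-site arithmetic concrete, the short Python sketch below scans a pure repeat tract for overlapping YGCY motifs (Y = C or U). It is illustrative only: the helper names `count_ygcy_sites` and `repeat_rna` are hypothetical, and the count is purely sequence-based, ignoring the hairpin structure and GC-step spacing that, as noted above, also modulate MBNL binding.

```python
import re

# Hypothetical helper (illustration only): YGCY is the MBNL consensus site,
# with Y = C or U. The zero-width lookahead lets overlapping sites be counted.
YGCY = re.compile(r"(?=([CU]GC[CU]))")


def count_ygcy_sites(rna: str) -> int:
    """Count (possibly overlapping) YGCY motifs in an RNA sequence.

    Sequence-only estimate: RNA secondary structure and site spacing,
    which also affect MBNL binding, are deliberately ignored.
    """
    return sum(1 for _ in YGCY.finditer(rna.upper()))


def repeat_rna(unit: str, n: int) -> str:
    """Build a pure repeat tract of n copies of `unit` (hypothetical helper)."""
    return unit * n


if __name__ == "__main__":
    for unit in ("CUG", "CCUG"):
        for n in (20, 1000):
            seq = repeat_rna(unit, n)
            print(f"({unit}){n}: {count_ygcy_sites(seq)} YGCY motifs")
```

Under this simple model, each CUG repeat junction contributes a UGCU motif and each CCUG junction a UGCC motif, so (CUG)n and (CCUG)n tracts each carry n - 1 motifs, which is how an expansion of ~1000 repeats can present on the order of a thousand potential sites. The lookahead matters: adjacent UGCU motifs in a CUG tract share a U, so a non-overlapping scan would miss roughly every other CUG site.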
While DM1 and DM2 are both classified as myotonic dystrophies characterized by myotonia, progressive myopathy and multiorgan involvement, they also have distinct clinical features [153]. DM2 is generally a later-onset disease with no congenital form. Moreover, mis-splicing is usually less severe in DM2 than in DM1, even though CNBP is expressed at much higher levels than DMPK (Figure 6) and MBNL proteins bind CCUGexp with higher affinity than CUGexp [112,137,138]. What explains this discrepancy? Recently, RBFOX was shown to bind to CCUGexp, but not CUGexp, RNAs and to accumulate in RNA foci, suggesting that MBNL-RBFOX competition for RNA binding sites selectively reduces MBNL sequestration in DM2 and promotes more adult-like splicing patterns [154]. Similar to CELF and MBNL, the RBFOX proteins are expressed in multiple tissues (Figure 6). DM2 also appears to have cell-specific molecular signatures that differ from DM1, including no detectable MBNL1 sequestration in RNA foci and no mis-splicing of MBNL1 RNA targets in DM2 iPSC-derived cardiomyocytes [155]. Additionally, retention of CNBP intron 1 and nuclear export of the incompletely processed mRNA occur in multiple DM2 tissues, suggesting that RAN translation of the intron 1 CCUGexp results in enhanced expression of aberrant tetrapeptide repeat proteins that are a major pathological factor in DM2 [58,156].
Figure 5. (A) … [135,148-150]. Exon enumeration is derived from previous studies [151]. ZnF1/2 and ZnF3/4, zinc finger domain pairs 1/2 and 3/4, respectively; NLS, nuclear localization signal. MBNL protein levels are shown as Low (bottom) and High (top), which trigger the indicated splicing events (black lines). (B) Structural model of MBNL1 zinc fingers (ZnF; blue) in complex with RNA (red) [152].

C(C)UGexp RNA-MBNL Interactions in RNA Foci Formation
Evidence that expanded STR RNAs form RNA foci was first reported just three years after the discovery of the DM1 mutation. Singer and colleagues reported unusual focal concentrations of expanded CUG repeats in DM1 nuclei, visualized by RNA fluorescence in situ hybridization (RNA-FISH) using a fluorescent probe complementary to the expanded repeats [157]. RNA foci were later found in additional noncoding STR expansion diseases [158]. Since their discovery, our view of RNA foci has evolved from insoluble RNA aggregates to dynamic RNP complexes. RNA foci belong to the class of membraneless RNP assemblies, which includes nucleoli, paraspeckles, nuclear speckles and Cajal bodies in the nucleus, as well as P bodies, stress granules and neuronal and germ granules in the cytoplasm [159].
In DM1 and DM2, RNA foci are formed by co-transcriptional recruitment of MBNL proteins to C(C)UGexp repeats (Figure 7A) [160]. The exact role of MBNL proteins in RNA foci assembly is not well understood, but the size of RNA foci visualized by RNA-FISH increases at higher MBNL concentrations, whereas MBNL depletion decreases foci size, suggesting a role for MBNL in promoting foci assembly and/or stability [135,160-162]. It is also possible that a single MBNL protein can interact with two RNA molecules, because these proteins have two tandem ZnF pairs (Figure 5A) and a single ZnF pair is sufficient to bind RNA (Figure 5B) [163]; ZnF1 and ZnF3 share the motif CX7CX6CX3H, and ZnF2 and ZnF4 share CX7CX4CX3H (X, any amino acid). RNA foci formation might also be enhanced by protein-protein interactions, since MBNL1 homotypic interactions are mediated by its C-terminal region, which includes the alternatively spliced exon 7 [137,150,164]. Additional proteins have been proposed to be sequestered within RNA foci (hnRNP H, H2, H3, F, A2/B1, K, L, DDX5, DDX17 and DHX9), but their role in foci formation is unclear [165]. These proteins could reside in foci through interactions with MBNL or C(C)UGexp RNA, or by recruitment to existing MBNL-RNAexp complexes. Some of these factors, including members of the DEAD-box family, could exacerbate or ameliorate the toxicity of these complexes by either promoting (e.g., DDX5) [166] or blocking (e.g., DDX6) [167] foci formation. Nevertheless, the impact of foci-associated proteins other than MBNL, including hnRNPs, on the DM pathomechanism remains unclear [154,165]. It is also intriguing that RNP foci may be enriched in additional nuclear RNAs, which raises the question of how the sequestration of non-C(C)UGexp RNAs might contribute to disease.
The dynamic nature of interactions between RBPs and expanded RNAs in foci has been reported for MBNL and RBFOX paralogs and for STR RNAs of various lengths [135,154,162,168]. Live-cell imaging has captured RNP foci dynamics, revealing that foci coalesce upon direct contact or divide into smaller units (Figure 7B). Foci number, shape and volume change over time and with cell state [135,162,169]. RNA foci may persist for minutes or hours, and although MBNL proteins are densely packed in foci, they dynamically move between RNA binding sites within foci and exchange with free MBNL proteins in the surrounding nucleoplasm (Figure 7C). Interestingly, a low level of expanded CUG RNA is readily saturated with MBNL proteins, which dynamically exchange with unbound MBNL in the nucleoplasmic pool, whereas higher CUG loads severely deplete this pool [135]. Thus, RNA foci assembly in vivo is likely modulated not only by expansion repeat length but also by the spatiotemporal pattern of DMPK and CNBP expression. For instance, DMPK is expressed at significantly higher levels during fetal muscle development than in mature tissues, and its expression varies considerably between tissues [129].
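The contrast between low and high CUG loads can be illustrated with a standard single-site binding equilibrium that accounts for depletion of both partners (the quadratic, or tight-binding, solution). This Python sketch is a deliberately simplified assumption, not a model taken from the studies cited: it treats every repeat-borne site as independent and identical, ignores foci assembly, site bridging by MBNL and compartmentalization, and uses hypothetical concentrations; the function name `free_protein` is also hypothetical. The KD is chosen from within the ~4-300 nM range cited for MBNL binding to expanded repeats.

```python
import math


def free_protein(p_total: float, s_total: float, kd: float) -> float:
    """Free protein at equilibrium for P + S <-> PS, with depletion of both.

    Simplifying assumption: all sites S are independent and identical.
    All concentrations must share one unit (e.g., nM).
    """
    b = p_total + s_total + kd
    # [PS] is the smaller (physical) root of [PS]^2 - b*[PS] + P_T*S_T = 0;
    # the discriminant is always >= 0 because b^2 - 4*P_T*S_T >= (P_T - S_T)^2.
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * s_total)) / 2.0
    return p_total - complex_conc


if __name__ == "__main__":
    mbnl_total = 1000.0  # hypothetical total nuclear MBNL, nM
    kd = 10.0            # hypothetical KD within the ~4-300 nM range, nM
    for sites in (100.0, 10000.0):  # hypothetical low vs high repeat-site loads, nM
        free = free_protein(mbnl_total, sites, kd)
        print(f"sites = {sites:g} nM: free MBNL = {free / mbnl_total:.1%} of total")
```

With these illustrative numbers, a 100 nM site load is nearly saturated (about 99% of sites occupied) while roughly 90% of MBNL remains free, whereas a 10,000 nM site load leaves well under 1% of MBNL free, qualitatively mirroring the saturation versus depletion behavior described above.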

Emerging Roles for RNA-RNA Multivalent Interactions in DM Disease
Early studies assessed the contribution of RNA-MBNL interactions to RNA foci formation [170]; however, multivalent RNA-RNA interactions between expanded RNAs also affect foci formation. In transfected cells, overexpressed CUG, CAG and GGGGCC expanded RNAs form phase-separated gel-like droplets displaying ATP-dependent dynamics, implicating RNA-RNA interactions as major drivers of foci assembly [171]. However, it is not clear whether such RNA-RNA interactions drive foci formation when CUGexp and CCUGexp RNAs are expressed at endogenous levels. For instance, DMPK transcripts are present at only one to a few dozen molecules per cell in DM1 patient-derived myoblasts, so RNA foci must be assembled from only a few transcripts [172,173].
Recently, a four-phase model was proposed whereby RNA-RNA, RNA-RBP and RBP-RBP interactions must reach an assembly threshold [159]. Based on this model, RNA foci assembly in DM1 and DM2 cells occurs when CUGexp and CCUGexp RNAs provide a scaffold for multivalent homotypic and heterotypic interactions that allow the formation of higher-order assemblies. In CDM, with high mutant DMPK loads and relatively low MBNL expression, foci might form primarily through multivalent inter- and intramolecular RNA interactions that are more prone to RNA gelation. In contrast, in mature DM1 skeletal muscle, CTG expansion size progressively increases while DMPK expression decreases and MBNL protein levels are relatively high.

Conclusions and Perspectives
Studies designed to elucidate the downstream pathways altered by the DMPK CTGexp and CNBP CCTGexp mutations in DM1 and DM2, respectively, have yielded key insights into the developmental regulation of alternative splicing and polyadenylation in the nucleus, as well as RNA localization in the cytoplasm [118,119]. Early studies of DM1 also led to the discovery of nuclear RNA foci [157], while an investigation of SCA8 and DM1 revealed RAN translation [174], and both of these pathomechanisms have subsequently been described for other repeat expansion diseases [175,176]. Nevertheless, many questions remain, including the possibility that a structure related to RNA foci may exist in unaffected cells and provide a critical additional step in nuclear RNA quality control.