Translation of dipeptide repeat proteins in C9ORF72 ALS/FTD through unique and redundant AUG initiation codons

A hexanucleotide repeat expansion in C9ORF72 is the most common genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). A hallmark of ALS/FTD pathology is the presence of dipeptide repeat (DPR) proteins, produced from both sense GGGGCC (poly-GA, poly-GP, poly-GR) and antisense CCCCGG (poly-PR, poly-PG, poly-PA) transcripts. Translation of sense DPRs, such as poly-GA and poly-GR, depends on non-canonical (non-AUG) initiation codons. Here, we provide evidence for canonical AUG-dependent translation of two antisense DPRs, poly-PR and poly-PG. A single AUG is required for synthesis of poly-PR, one of the most toxic DPRs. Unexpectedly, we found redundancy between three AUG codons necessary for poly-PG translation. Further, the eukaryotic translation initiation factor 2D (EIF2D), which was previously implicated in sense DPR synthesis, is not required for AUG-dependent poly-PR or poly-PG translation, suggesting that distinct translation initiation factors control DPR synthesis from sense and antisense transcripts. Our findings on DPR synthesis from the C9ORF72 locus may be broadly applicable to many other nucleotide repeat expansion disorders.

To design therapies that reduce DPR levels, it is valuable to identify initiation codons used in DPR translation. To date, the synthesis of sense DPRs has been a major focus in the ALS/FTD field, resulting in the identification of translation initiation codons for poly-GA and poly-GR (Green et al., 2017;Tabet et al., 2018;Boivin et al., 2020;Sonobe et al., 2018). As previously shown, noncanonical codons (CUG for poly-GA, AGG for poly-GR) initiate DPR synthesis from the sense strand (Green et al., 2017;Tabet et al., 2018;Boivin et al., 2020;Sonobe et al., 2018;van 't Spijker et al., 2022). Interestingly, studies in Drosophila and cultured cells showed that the presence of an expanded GGGGCC repeat alone, without flanking intronic sequences, can result in DPR production, suggesting an unconventional form of translation (Zu et al., 2013). However, deletion analysis of cis-regulatory elements upstream of the GGGGCC repeats and ribosome profiling revealed that translation initiation in the poly-GA and poly-GR frames does depend on flanking intronic sequences surrounding the repeats (van 't Spijker et al., 2022;Lampasona et al., 2021;Almeida et al., 2019). Moreover, a recent study proposed that a canonical AUG initiation codon is used for poly-PG synthesis from the antisense CCCCGG transcript (Boivin et al., 2020), suggesting conventional translation is involved in the synthesis of at least one DPR. However, the initiation codons for other DPRs (e.g., poly-PR, poly-PA) from the antisense transcript remain unknown. Hence, it is unclear which mode of translation is utilized for DPR synthesis from the antisense transcript.
Although both sense and antisense transcripts produce GP-containing dipeptides (sense: poly-GP, antisense: poly-PG), the antisense transcript seems to be the primary source of poly-PG/poly-GP inclusions in the brain of C9ORF72 ALS/FTD patients (Zu et al., 2013). Further, two recent ALS clinical trials that specifically targeted the production of DPRs from the sense transcript failed (Liu et al., 2022a;Tran et al., 2022;Krishnan et al., 2022). Therefore, studying the mechanisms responsible for DPR synthesis from the antisense transcript is important, and this is the focus of the present study.
Here, we employ cell-based models of C9ORF72 ALS/FTD to identify translation initiation codons for DPRs produced from the antisense transcript. Transfection into cultured cells of constructs carrying 35 CCCCGG repeats (preceded by 1000 bp of human intronic C9ORF72 sequence) leads to DPR production (poly-PR, poly-PG) and reduced cell survival. We find that a canonical AUG initiation codon located 273 base pairs (-273 bp) upstream of the CCCCGG repeats is necessary for poly-PR synthesis. Further, we provide evidence for redundancy in usage of canonical initiation codons for poly-PG synthesis. Although an AUG at -194 bp is the main start codon for poly-PG, two other AUG codons (at -212 bp and at -113 bp) can also function as alternative translation initiation sites. These findings suggest that DPR synthesis from the antisense transcript occurs via AUG-dependent translation, contrasting with the mode of DPR synthesis from the sense transcript, which depends on non-canonical start codons (CUG for poly-GA, AGG for poly-GR).
Finally, we show that the translation initiation factor eIF2D, which is necessary for CUG-dependent poly-GA synthesis from the sense transcript (Sonobe et al., 2021), is not involved in AUG-dependent antisense DPR (poly-PG, poly-PR) synthesis. Hence, distinct translation initiation sites and factors are employed for DPR synthesis from sense GGGGCC and antisense CCCCGG transcripts.
Figure 1. Poly-PR and poly-PG are translated from antisense CCCCGG repeats. (A) Schematic diagram of the constructs with 35 CCCCGG repeats preceded by 1000-bp-long intronic sequence from human C9ORF72, and then followed by nanoluciferase (nLuc). (B) HEK293 and (C) NSC34 cells were cotransfected with fLuc along with either ΔC9 or AS-C9 plasmids. The levels of luciferase activity were assessed by dual luciferase assays (mean ± s.e.m.). The experiments were repeated four times. One-way ANOVA with Tukey's multiple comparison test was performed. (D-E) HEK293 and NSC34 cells were transfected with either ΔC9 or AS-C9 plasmids. Cell lysates were processed for western blotting, and immunostained with antibodies to (D) poly-PR, …

Results
Transfection of constructs carrying 35 CCCCGG repeats leads to antisense DPR synthesis and reduced cell survival
To study DPR synthesis from the antisense transcript, we engineered three constructs with 35 CCCCGG repeats preceded by 1000-bp-long intronic sequence from human C9ORF72 (Figure 1A; Sonobe et al., 2021), and then followed by nanoluciferase (nLuc) in the frame of poly-PR, poly-PG, or poly-PA (see Materials and methods). 48 hr after transfection of poly-PR::nLuc or poly-PG::nLuc into HEK293 and NSC34 cells, robust expression of poly-PR and poly-PG was detected both in luciferase assays (Figure 1B-C) and by western blotting for poly-PR, poly-PG, and nLuc (Figure 1D-E, Figure 1-figure supplement 1, Figure 1-source data 1), suggesting the luciferase signal is an accurate readout for DPR production. Protein isolation of soluble and insoluble fractions showed that both DPRs (poly-PG and poly-PR) are predominantly detected in the soluble fraction under these experimental conditions (Figure 1-figure supplement 2). Further, production of poly-PR and poly-PG in transfected NSC34 cells was confirmed with immunofluorescence staining (Figure 1F-G). Finally, transfection of either poly-PR::nLuc or poly-PG::nLuc into NSC34 cells led to reduced cell survival (Figure 1H-I).
Consistent with a previous study (Boivin et al., 2020), we did not detect poly-PA with luciferase assays ( Figure 1B-C) and western blotting (Figure 1-figure supplement 3) upon poly-PA::nLuc transfection. We surmise that the initiation codon for poly-PA may lie outside the 1000 bp intronic sequence used in our construct, or that the specific regulatory machinery needed for poly-PA synthesis is lacking in the cellular context examined here (HEK293 and NSC34 cells). Altogether, our cell-based model of C9ORF72 (construct with 35 CCCCGG repeats and 1000 bp of human intron) produces two antisense DPRs (poly-PR, poly-PG) and displays reduced cell survival.
The online version of this article includes the following source data and figure supplement(s) for figure 1: Source data 1. Full raw unedited images of western blots shown in Figure 1.
… initiation codons for poly-PR (Figure 2A). We then mutated these codons either to CCC or the termination codon UAG (Figure 2A). Western blotting and luciferase assays showed that mutation of the CUG at -366 bp to CCC or UAG did not affect poly-PR expression (Figure 2B-G, Figure 2-source data 1). However, mutation of the AUG at -273 bp to CCC or UAG completely abolished poly-PR expression both in HEK293 and NSC34 cells, as shown by western blotting (Figure 2B-E), luciferase assays (Figure 2F-G), and immunofluorescence staining against poly-PR (Figure 2H). Importantly, the reduced survival of NSC34 cells upon poly-PR::nLuc transfection was partially rescued when the -273 bp AUG codon was mutated into the UAG termination codon, suggesting poly-PR production is toxic under these experimental conditions (Figure 2I). These results strongly suggest that the AUG at -273 bp is the start codon for translation of poly-PR, one of the most toxic DPRs in C9ORF72 ALS/FTD. This AUG is predicted to be included in the endogenous antisense CCCCGG transcript based on 5' Rapid Amplification of cDNA Ends (RACE) analysis on brain samples of C9ORF72 ALS/FTD patients (Zu et al., 2013).

Evidence for redundancy of AUG initiation codon usage in poly-PG translation
We next investigated poly-PG, which is less toxic than poly-PR (Wen et al., 2014; Lee et al., 2016; Mizielinska et al., 2014; Freibaum et al., 2015), and has been proposed as a biomarker for C9ORF72-ALS/FTD (Gendron et al., 2017; Lehmer et al., 2017). Using the same machine-learning algorithm (Gleason et al., 2022), we identified four putative initiation codons (AUG at -212 bp, AUG at -194 bp, CUG at -182 bp, AUG at -113 bp) (Figure 3A), all with relatively good Kozak sequences (gaaAUGa at -212 bp, aaaAUGc at -194 bp, gctCUGa at -182 bp, aggAUGc at -113 bp). Of note, a prior publication identified the AUG at -194 bp as an initiation codon (Boivin et al., 2020). Simultaneous mutation of all four of these codons to CCC completely blocked poly-PG expression (Figure 3B-D, Figure 3-source data 1), suggesting one or more of these codons is required. Next, we simultaneously mutated three codons to CCC, but left intact the AUG at -212 bp. We refer to this construct as '-212 AUG'. Upon transfection of -212 AUG, we observed poly-PG expression, suggesting poly-PG translation can start at the AUG at -212 bp. Intriguingly, when we followed a similar approach to mutate three codons to CCC but leave intact the AUG at -194 bp or at -113 bp, we also observed poly-PG production, but this time at the expected lower molecular weight (Figure 3B-D, Figure 3-source data 1). Of note, when we mutated all three AUG codons (-212 bp, -194 bp, -113 bp) to CCC but left intact the CUG at -182 bp, we observed no poly-PG expression (Figure 3B-D, Figure 3-source data 1). These results suggest that any of these three AUGs, but not the CUG at -182 bp, can function as a start codon for poly-PG, indicating redundancy in the translation initiation codon for poly-PG. We observed a strong (higher molecular weight) band and a fainter (lower molecular weight) band for poly-PG when the intact version of the poly-PG::NanoLuc plasmid was translated (Figure 3B, Figure 3-figure supplement 1, Figure 3-source data 1). The strong band is likely to result from translation initiation at the AUG at -194 bp, whereas the faint band is likely initiated at the AUG at -113 bp (Figure 3B). Hence, the AUG at -194 bp appears to be the main initiation codon for poly-PG synthesis from the antisense transcript of 35 CCCCGG repeats (Figure 3B), which is consistent with mass spectrometry results from a previous report (Boivin et al., 2020).
Interestingly, selective mutation of the AUG at -194 bp to CCC did not abolish poly-PG expression (Figure 4A-D, Figure 4-figure supplement 1). Instead, it led to the production of two poly-PG products: a high molecular weight product (strong band) resulting from use of the AUG at -212 bp, as well as a lower molecular weight product (faint band) resulting from use of the AUG at -113 bp (Figure 4B, Figure 4-source data 1). Altogether, these results suggest that the AUG at -194 bp is mainly used for poly-PG expression from antisense CCCCGG repeats. However, when this AUG is mutated, two other AUG codons (at -212 bp and -113 bp) can also function as translation initiation sites, again revealing redundancy in the start codon usage for poly-PG synthesis.
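The reading-frame relationships implied by the reported codon positions can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis, and it assumes that every position is measured in the same way relative to the CCCCGG repeats, so that codons sharing a reading frame differ by a multiple of 3 bp. The function names are illustrative only.

```python
# Candidate codons and their reported distances (bp upstream of the repeats).
POLY_PR_CODONS = [("CUG", 366), ("AUG", 273)]
POLY_PG_CODONS = [("AUG", 212), ("AUG", 194), ("CUG", 182), ("AUG", 113)]


def frame(position_bp: int) -> int:
    """Return the reading frame (0, 1, or 2) implied by an upstream distance.

    Assumption: all distances share the same reference point.
    """
    return position_bp % 3


def extra_codons(upstream_bp: int, downstream_bp: int) -> int:
    """Number of additional codons translated when initiation starts at the
    more upstream of two in-frame codons."""
    diff = upstream_bp - downstream_bp
    if diff < 0 or diff % 3 != 0:
        raise ValueError("positions are not in-frame or are in the wrong order")
    return diff // 3


if __name__ == "__main__":
    for name, codons in (("poly-PR", POLY_PR_CODONS), ("poly-PG", POLY_PG_CODONS)):
        frames = {f"{codon} at -{pos} bp": frame(pos) for codon, pos in codons}
        print(name, frames)
    # Codon-count differences between the three poly-PG AUGs.
    print(extra_codons(212, 194), extra_codons(194, 113))
```

Under this assumption, the four poly-PG candidates fall in one frame and the two poly-PR candidates in another. Initiation at -212 bp would add 6 codons relative to -194 bp, and initiation at -194 bp would add 27 codons relative to -113 bp, which fits the ordering of band sizes described above.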

Mutation of the -113 bp AUG abolishes poly-PG production
We further corroborated this redundant initiation of poly-PG translation by individually mutating each of the AUG codons to a termination UAG codon (Figure … Figure 5E). Finally, the reduced survival of NSC34 cells was not rescued upon transfection of the -113 UAG construct, suggesting poly-PG production is not toxic under these experimental conditions (Figure 5F).
Altogether, these findings strongly suggest that the AUG at -194 bp is primarily used for poly-PG translation, but the other two AUG codons at -212 bp and -113 bp can also function as translation initiation sites under certain experimental conditions.

EIF2D does not control poly-PR and poly-PG synthesis from the antisense transcript
Following the identification of AUG codons for translation initiation of poly-PR and poly-PG, we next sought to identify translation initiation factors necessary for synthesis of these antisense DPRs. We focused on EIF2D because we previously found it to be necessary for poly-GA synthesis from the sense GGGGCC transcript in Caenorhabditis elegans and cell-based models (HEK293 and NSC34 cell lines) (Sonobe et al., 2021). To this end, we generated an EIF2D knockout HEK293 line using CRISPR/Cas9 gene editing (see Materials and methods) (Figure 6A-C, Figure 6-source data 1). Next, we transfected the poly-PR::nLuc reporter construct into control and EIF2D knockout HEK293 cells. We found that knockout of EIF2D did not affect the expression levels of the poly-PR::nLuc reporter (Figure 6E). We obtained similar results upon knockdown of EIF2D with a short hairpin RNA (shRNA) (Figure 6H), again suggesting that eIF2D is not required for poly-PR synthesis from antisense CCCCGG transcripts. Lastly, knockout or knockdown (shRNA) of EIF2D in HEK293 cells transfected with poly-PG::nLuc did not decrease poly-PG expression based on a luciferase assay (Figure 6D and G). Hence, knockout or knockdown of EIF2D does not affect the production of two antisense DPRs (poly-PR, poly-PG). On the other hand, knockdown of EIF2D did reduce the levels of poly-GA (Figure 6I), a DPR generated from sense RNA. The poly-GA reduction is consistent with our previous observations in a C. elegans model of C9ORF72 ALS/FTD (Sonobe et al., 2021), albeit more modest, likely due to a technical reason (see legend of Figure 6I).

Knockdown of EIF2D in human iPSC-derived motor neurons
We next tested whether EIF2D is required for DPR synthesis in a cellular context that maintains the endogenous human C9ORF72 gene locus. We initially used one published iPSC line from a C9ORF72 carrier (line 26#6), as well as an isogenic control line (26Z90) which had CRISPR/Cas9-mediated deletion of expanded GGGGCC repeats (Lopez-Gonzalez et al., 2019). The iPSC lines were differentiated into motor neurons as previously described (Lopez-Gonzalez et al., 2016). Repeated transfection of a small interfering RNA (siRNA) against EIF2D (EIF2D-siRNA-1), but not of a control scrambled siRNA, resulted in robust downregulation of EIF2D mRNA as assessed by RT-PCR (Figure 7A) and eIF2D protein analysis (Figure 7-figure supplement 1). The mRNA levels of eIF2A, a related initiation factor, remained unaltered, suggesting specificity in the siRNA effect. Despite this knockdown, an immunoassay (conducted in a blinded manner) failed to show any differences in the steady-state … (Figure 7B), suggesting eIF2D is not necessary for poly-PG translation from the antisense transcript. We caution though that our immunoassay does not distinguish between poly-PG produced from the antisense transcript and poly-GP from the sense transcript (Figure 7B). Hence, a mild effect of EIF2D knockdown on poly-PG (from the antisense transcript) can potentially be masked by poly-GP (from the sense transcript). Of note, PG/GP inclusions in brain tissue of C9ORF72 ALS/FTD patients contain ~80% of poly-PG from the antisense transcript and ~20% of poly-GP from the sense transcript (Zu et al., 2013). However, other studies indicate that the exact contribution of sense poly-GP and antisense poly-PG in C9ORF72 ALS/FTD has not been resolved (Tran et al., 2022; Krishnan et al., 2022; Gendron et al., 2017). Hence, our data hint that eIF2D may not affect poly-PG synthesis from the antisense CCCCGG transcript.
Despite the lack of an effect on poly-PG/GP, we found that EIF2D knockdown reduced poly-GA synthesis from the sense GGGGCC transcript in neurons derived from iPSC line 26#6 (Figure 7B), critically extending previous observations made in C. elegans and cell-based models (Sonobe et al., 2021). Consistent with the latter study, EIF2D knockdown had no effect on poly-GR synthesis from the sense transcript based on an immunoassay that measures soluble poly-GR (Figure 7B). Altogether, these findings from one patient line (26#6) suggest that eIF2D is required for CUG start codon-dependent poly-GA synthesis from the sense transcript in human iPSC-derived neurons, but is dispensable for poly-GR (from sense transcript) and poly-PG synthesis, although our immunoassay cannot distinguish between poly-PG and poly-GP. However, when we repeated this experiment with two additional iPSC lines (27#11 and 40#3) from C9ORF72 carriers with two siRNAs (EIF2D-siRNA-1 and -2), we did not achieve robust EIF2D knockdown (Figure 7C-D). We note that the same siRNA (EIF2D-siRNA-1) led to robust EIF2D knockdown in the first patient line (26#6) (compare Figure 7A with Figure 7C, D). Hence, the issue of variable siRNA knockdown efficiency prevents us from drawing any general conclusions on the role of EIF2D in DPR synthesis in the context of motor neurons derived from different iPSC lines of C9ORF72 carriers (Figure 7B and E).
Figure 2 (legend, continued). … performed. (F) HEK293 and (G) NSC34 cells were cotransfected with the plasmids along with fLuc. The levels of luciferase activity were assessed by dual luciferase assays (mean ± s.e.m.). The experiments were repeated four times. One-way ANOVA with Tukey's multiple comparison test was performed. (H) NSC34 cells transfected with either ΔC9, poly-PR::nLuc, or -273 AUG->UAG plasmids were stained with 4′,6-diamidino-2-phenylindole [DAPI] (blue) and immunostained with a poly-PR antibody (green). Scale bars show 20 μm. (I) NSC34 cells were transfected with either ΔC9, wild type (WT), or -273 AUG->UAG plasmids. WST-8 assay was performed to assess the cell viability (mean ± s.e.m.). The experiments were repeated five times. One-way ANOVA with Tukey's multiple comparison test was performed. In ΔC9 and WT, the same datasets as Figure 1H were used.
The online version of this article includes the following source data for figure 2: Source data 1. Full raw unedited images of western blots shown in Figure 2.
Figure 3 (legend, continued). … indicated plasmids. The level of luciferase activity was assessed by dual luciferase assay (mean ± s.e.m.). The experiments were repeated four times. One-way ANOVA with Tukey's multiple comparison test was performed.
The online version of this article includes the following source data and figure supplement(s) for figure 3: Source data 1. Full raw unedited images of western blots shown in Figure 3.

Discussion
Here, we show that canonical AUG codons on the antisense CCCCGG transcript serve as translation initiation codons for two DPRs: poly-PR and poly-PG. This finding may inform the design of future therapies for ALS/FTD, especially since poly-PR is a highly toxic DPR and poly-PG is thought to be primarily translated from the antisense transcript (Zu et al., 2013). Our finding of canonical AUG codons serving as translation initiation codons for antisense DPRs (poly-PR, poly-PG) differs from the proposed mode of translation of sense DPRs (e.g., poly-GA, poly-GR). In the latter case, it is thought that repeat-associated non-AUG (RAN) translation of poly-GA and poly-GR occurs via non-canonical CUG and AGG initiation codons, respectively, located in the intronic sequence upstream of the GGGGCC repeats (Green et al., 2017; Tabet et al., 2018; Boivin et al., 2020; Sonobe et al., 2018; van 't Spijker et al., 2022; Sonobe et al., 2021). Interestingly, studies in Drosophila and cultured cells showed that the presence of an expanded GGGGCC repeat alone, without flanking sequences, can result in DPR production (Zu et al., 2013; Freibaum et al., 2015). Hence, our findings together with these previous studies suggest that DPR synthesis may involve at least three different modes of translation: (1) near-cognate start codon (e.g., CUG, AGG) dependent translation for poly-GA and poly-GR from sense GGGGCC transcripts, (2) canonical AUG-dependent translation for poly-PR and poly-PG synthesis from antisense CCCCGG transcripts, and (3) RAN translation mechanisms that solely utilize the repeat. It is conceivable that all three modes of translation may occur simultaneously in disease, and that the use of non-canonical and canonical initiation codons may be the primary contributor to DPR production.
A notable finding is the presence of redundancy in start codon usage for poly-PG synthesis. Our data suggest that the AUG at -194 bp is primarily used for poly-PG translation from antisense CCCCGG transcripts, consistent with a previous investigation (Boivin et al., 2020). However, when this AUG is mutated, two other canonical AUG codons (at -212 bp and -113 bp) can also function as translation initiation sites under the experimental conditions described herein. Although it is unclear whether such redundancy in DPR translation initiation occurs in the central nervous system of C9ORF72 ALS/FTD patients, these findings nevertheless suggest that targeting only one translation initiation site may be insufficient to prevent poly-PG synthesis. Redundancy in start codon usage may also apply to other DPRs, such as poly-PR synthesis from the antisense transcript. Although we identified an AUG at -273 bp as necessary for poly-PR synthesis, a previous study detected poly-PR when only 100 bp downstream of the GGGGCC repeats were included in an adeno-associated viral (AAV) vector. It is important to note that this intronic 100-bp-long sequence was placed next to a 589 bp regulatory element of the woodchuck hepatitis virus (WPRE), which contains several putative start codons. The AUG initiation codons we identified as necessary for either poly-PR or poly-PG synthesis are predicted to be included in the endogenous antisense CCCCGG transcript based on 5' RACE analysis on brain samples of C9ORF72 ALS/FTD patients (Zu et al., 2013). Nevertheless, endogenous mutagenesis of these codons, in the native genomic context of the C9ORF72 locus, is needed in the future to further test the validity of our findings.
Figure 5 (legend, continued). … (E) NSC34 cells transfected with indicated plasmids were stained with 4′,6-diamidino-2-phenylindole [DAPI] (blue) and immunostained with a poly-PG antibody (green). Scale bars show 20 μm. (F) NSC34 cells were transfected with indicated plasmids. WST-8 assay was performed to assess the cell viability (mean ± s.e.m.). The experiments were repeated five times. One-way ANOVA with Tukey's multiple comparison test was performed. In ΔC9 and wild type (WT), the same datasets as Figure 1I were used.
The online version of this article includes the following source data and figure supplement(s) for figure 5: Source data 1. Full raw unedited images of western blots shown in Figure 5.
Figure 6 (legend, continued). … The experiments were repeated three times. Unpaired t test was performed. The poly-GA reduction upon EIF2D shRNA is consistent with our previous observations (Sonobe et al., 2021), albeit more modest, likely due to a technical reason (a bicistronic construct containing 75 GGGGCC repeats was used in Sonobe et al., 2021).
The online version of this article includes the following source data for figure 6: Source data 1. Full raw unedited images of western blots shown in Figure 6.
Emerging evidence suggests distinct proteins affect translation initiation of DPRs from sense and antisense transcripts in C9ORF72 ALS/FTD. For example, the RNA helicase DDX3X directly binds to sense (GGGGCC), but not antisense (CCCCGG) transcripts, thereby selectively repressing the production of sense DPRs (poly-GA, poly-GP, poly-GR) (Cheng et al., 2019). Here, we provide evidence that the translation initiation factor EIF2D is not involved in DPR (viz., poly-PG, poly-PR) synthesis from antisense (CCCCGG) transcripts. In a previous study (Sonobe et al., 2021), we showed in C. elegans and in vitro cellular systems (HEK293 and NSC34 cells) that EIF2D is required for poly-GA production from sense (GGGGCC) transcripts. These findings are important because they indicate that not only distinct translation initiation codons, but also different regulatory proteins are involved in DPR synthesis from sense and antisense transcripts, suggesting that different modes of DPR translation (e.g., RAN translation, AUG-dependent translation) occur simultaneously in C9ORF72 ALS/FTD. Consistent with this idea, translation initiation is the most heavily regulated step in protein synthesis because it is the rate-limiting step (Richter and Sonenberg, 2005). Hence, we favor a model where distinct regulatory factors are necessary for translation initiation of different DPRs. In striking contrast, the transcriptional control of sense and antisense transcripts appears coordinated. For example, a single protein, the transcription elongation factor Spt4, controls production of both sense and antisense transcripts (Kramer et al., 2016).
In addition to C9ORF72 ALS/FTD, nucleotide repeat expansions are present in various genes, causing more than 30 neurological diseases (Chintalaphani et al., 2021; Depienne and Mandel, 2021). In many of these, products translated from the expanded repeat sequences have been detected in the nervous system of affected individuals. Hence, our findings may also apply to this large group of genetic disorders in the following ways. First, translation of peptides from the same nucleotide repeat expansion may require different modes of translation (RAN- and AUG-dependent translation), as previously proposed (Gao et al., 2017). Second, the surprising redundancy in canonical AUG initiation codon usage for DPR (poly-PG) synthesis may also apply to proteins translated from nucleotide repeat expansions in other genes. Lastly, our results support the idea that distinct translation initiation factors are involved in the synthesis of individual DPRs produced from the same nucleotide repeat expansion. Future studies focused on transcriptional and translational mechanisms of expanded nucleotide repeats may critically contribute to the design of therapies for these diseases.

Generation of the plasmid constructs
All oligonucleotides were obtained from Integrated DNA Technologies. Oligonucleotide I-F/R (Supplementary file 1) contains part of a HindIII site followed by 113 nucleotides that are normally upstream of the GGGGCC repeats and then by three GGGGCC repeats. Oligonucleotide II-F/R contains 10 GGGGCC repeats followed by part of a NotI site. These two oligonucleotides were phosphorylated, annealed, and then ligated into restriction sites of HindIII and NotI of a pAG plasmid. The plasmid was then digested with HindIII and BamHI. The HindIII-BamHI fragment was digested with BanII, and the resultant HindIII-BanII fragment was then ligated with oligonucleotide II-F/R into the pAG plasmid. This approach was repeated three times with similar digestions and ligations of oligonucleotide II. Finally, the HindIII-BanII fragment was ligated with oligonucleotide III-F/R (which contains two CCCCGG repeats followed by a 99 bp flanking sequence and then followed by part of the NotI site) into the pAG plasmid (referred to as 113bp-35RG4C2-99bp plasmid). To delete stop codons after the CCCCGG repeats, the plasmid was treated with BfaI and NotI, and the digested fragment was ligated with oligonucleotide IV-F/R. To add sequence upstream from the C4G2 repeats, a 543 bp portion (408-950 of NCBI reference sequence, NC_000009.12) of the C9ORF72 gene from HEK293 genomic DNA was amplified by PCR using the primer shown in Supplementary file 1. The amplified construct was then ligated with the BtgI/NotI-digested fragment of the 113bp-35RG4C2-99bp plasmid into XbaI and NotI sites of pcDNA6/V5-His A plasmid (referred to as 609bp-35RC4G2 plasmid). To further increase the length of sequence upstream from CCCCGG repeats, a 392 bp portion (951-1342 of NCBI reference sequence, NC_000009.12) of C9ORF72 gene from HEK293 genomic DNA was amplified by PCR using the primer shown in Supplementary file 1. 
The amplified construct was then ligated with the XbaI/NotI fragment of 609bp-35RC4G2 plasmid into HindIII and NotI sites of the pAG plasmid (referred to as AS-C9 plasmid). The ΔC9 plasmid (Sonobe et al., 2021) was generated as previously described.
To mutate sequences, a 560 bp portion upstream from the repeats in the AS-C9 plasmid was amplified by PCR using a primer shown in Supplementary file 1. The amplified portion was then ligated into the HindIII and NotI sites of pcDNA6/V5-His A plasmid. Mutations were made with Q5 Site-Directed Mutagenesis Kit (New England Biolabs) using primer sets (Supplementary file 1). The StuI/BtgI portion of the resultant mutants was then cloned back into the StuI and NotI sites of AS-C9 plasmid with BtgI/NotI portion of AS-C9 plasmid using the primer sets in Supplementary file 1.
To generate the vector to induce expression of poly-PA, the fragment AUG-PA-F/R (Supplementary file 1) was phosphorylated, annealed, and then ligated into restriction sites of HindIII and BtgI of the AS-C9 plasmid.

Cell culture
HEK293 and NSC34 cells were cultured in DMEM supplemented with 10% FBS, 2 mM L-glutamine, 100 U/ml penicillin, and 100 μg/ml streptomycin. The cell lines were checked for mycoplasma contamination by DAPI staining but were not authenticated.

Luciferase assay
The cells were plated in 24-well plates at 5×10⁴ cells per well and then cotransfected using Lipofectamine LTX (Thermo Fisher Scientific) with 100 ng of the plasmid along with 100 ng fLuc plasmid as a transfection control. After 48 hr, the cells were lysed with 1× passive lysis buffer (Promega). Levels of nLuc and fLuc were assessed with the Nano-Glo Dual-Luciferase Reporter assay system (Promega) and a Wallac 1420 VICTOR 3V luminometer (Perkin Elmer) according to the manufacturer's protocol.
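As an illustration of how a dual luciferase readout of this kind is commonly summarized, the sketch below divides each nLuc reading by the fLuc reading from the same well (the transfection control) and reports the mean ± s.e.m. across replicates. This is a hedged example rather than the exact analysis pipeline used here: the function names are hypothetical, and scaling to the mean of a control group is an assumption that the text does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_signal(nluc: float, fluc: float) -> float:
    """Ratio of reporter (nLuc) to transfection-control (fLuc) luminescence."""
    if fluc <= 0:
        raise ValueError("fLuc reading must be positive")
    return nluc / fluc


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    return mean(values), stdev(values) / sqrt(len(values))


def relative_to_control(sample: list[float], control: list[float]) -> list[float]:
    """Scale each ratio to the mean of the control group (an assumed convention)."""
    control_mean = mean(control)
    return [ratio / control_mean for ratio in sample]


if __name__ == "__main__":
    # Hypothetical readings from four replicate wells, for illustration only.
    nluc = [2000.0, 2200.0, 1800.0, 2100.0]
    fluc = [1000.0, 1000.0, 900.0, 1050.0]
    ratios = [normalized_signal(n, f) for n, f in zip(nluc, fluc)]
    print(mean_sem(ratios))
```

Normalizing within each well before averaging keeps well-to-well differences in transfection efficiency from inflating the variance of the reporter signal.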

Cell viability assay
The cell viability assay was performed using Cell Counting Kit-8 (Dojindo) according to the manufacturer's protocol. In brief, NSC34 cells were plated in 96-well plates at 2.5×10³ cells per well and then transfected using Lipofectamine LTX with 100 ng of the indicated plasmid. After 48 hr, 10 μl of the CCK-8 solution was added to the well and incubated for 2 hr in a CO₂ incubator. The reaction was stopped by adding 0.1 M HCl and the absorbance at 450 nm was measured.

Immunocytochemistry
The cells were plated in four-well Lab-Tek II Chamber Slide (Nunc) coated with 50 μg/ml poly-D-lysine (Sigma) at 5×10⁴ cells per well and transfected using Lipofectamine LTX with 500 ng of the indicated plasmid. After 48 hr, the cells were fixed with 4% paraformaldehyde for 15 min at room temperature. Then, the cells were permeabilized with phosphate buffered saline (PBS) with 0.2% Tween-20 for 20 min at room temperature. The samples were incubated with blocking buffer (2% BSA in PBS) for 1 hr at room temperature and then incubated overnight at 4°C with antibodies against poly-PR (1:250, ABN1354, EMD Millipore) or poly-GP (1:100, TALS 828.179, Target ALS). After rinsing with PBS, cells were incubated with Alexa 488-conjugated chicken anti-mouse IgG (1:2000, Thermo Fisher Scientific) or Alexa 488-conjugated goat anti-rabbit IgG (1:2000, Thermo Fisher Scientific) for 1 hr at room temperature, and then counterstained with DAPI. Images were captured using a confocal laser microscope system (Leica TCS SP5, Leica Microsystems) and processed using ImageJ2 software (version 2.9.0/1.53t).

Generation of EIF2D knockout cells by CRISPR/Cas9 gene editing
A single guide RNA (sgRNA) (GCAGTGACTGTGTACGTGAG) that targets exon 2 of eIF2D was cloned into the lentiCRISPR v2 plasmid (Addgene). HEK293 cells were plated into six-well plates at 4×10⁵ cells per well and then transfected using Lipofectamine LTX with 2.5 μg of lentiCRISPR v2 plasmid containing the sgRNA sequence. Transfected cells were selected using 3 μg/ml puromycin for 3 days. EIF2D knockout cell clones were obtained by limiting dilution. The resulting EIF2D knockout cells carry allele-specific mutations, as follows. Compared to the wild-type (WT) sequence GGATGCAGTGACTGTGTACGTGAGTGGTGG, one allele (GGATGCAGTGACTGTGTACGTTGAGTGGTGG) has a single-nucleotide insertion (an additional T after TACG), while the other allele (GGATGCAGTGACTGTGTA--TGAGTGGTGG) contains a two-nucleotide deletion. Both alleles lead to a premature stop codon, likely resulting in two different truncated eIF2D proteins with the following respective sequence:

Figure 7. […] (twice) from isogenic control and one C9ORF72 iPSC line. DPR levels were measured using a Meso Scale Discovery (MSD) immunoassay in a blinded manner. Data are presented as mean ± SD. p-Values were calculated using two-way ANOVA with Dunnett's multiple comparison test using Prism (9.1) software. (C-D) EIF2D and actin mRNA levels were assessed by real-time quantitative PCR in C9ORF72 human motor neurons (two patient lines) upon siRNA transfection (scramble, EIF2D siRNA-1, or EIF2D siRNA-2). The eIF2D mRNA levels were normalized to actin. The experiments were repeated three times. *p<0.05, ***p<0.001; ns, not significant. Two-tailed unpaired t tests were used for two groups, and a one-way ANOVA followed by Dunnett's post hoc analysis was used for more than two groups. (E) Poly-GA, poly-GR, and poly-GP levels in motor neurons differentiated independently (n=3 times) from isogenic or healthy control lines and a total of two C9ORF72 patient iPSC lines (lines 27#11 and 40#3). DPR levels were measured using an MSD immunoassay in a blinded manner. For the poly(GA) assay, total protein-normalized poly(GA) concentrations were converted to percentages and presented as mean ± SE. For the poly(GR) and poly(GP) assays, total protein-normalized electrochemiluminescence (ECL) values were converted to percentages and presented as mean ± SE. p-Values were calculated using one-way ANOVA with Dunnett's T3 multiple comparisons test.
The online version of this article includes source data and figure supplement(s) for figure 7.
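
The reasoning that both EIF2D knockout alleles disrupt the reading frame can be checked with a short calculation. The sketch below is an illustration only: it compares the length of each edited allele with the WT sequence quoted in the CRISPR/Cas9 section and treats any net indel whose size is not a multiple of three as a frameshift. The function names are hypothetical, and the position of the resulting premature stop codon is not computed because the reading frame of the quoted fragment is not given here.

```python
def strip_gaps(seq):
    """Remove spaces and alignment gap characters from a sequence string."""
    return seq.replace(" ", "").replace("-", "")


def indel_size(wild_type, allele):
    """Net length change relative to wild type (positive means insertion)."""
    return len(strip_gaps(allele)) - len(strip_gaps(wild_type))


def is_frameshift(wild_type, allele):
    """A net indel whose size is not a multiple of 3 shifts the reading frame."""
    return indel_size(wild_type, allele) % 3 != 0


# Sequences as quoted in the text (spacing and gap characters are ignored).
SGRNA = "GCAGTGACTGTGTACGTGAG"
WT = "GGATGCAGTGACTGTGTACGTGAGTGGTGG"
ALLELE_1 = "GGATGCAGTGACTGTGTACGTTGAGTGGTGG"  # single-nucleotide insertion
ALLELE_2 = "GGATGCAGTGACTGTGTA-TGAGTGGTGG"    # two-nucleotide deletion

for name, allele in (("allele 1", ALLELE_1), ("allele 2", ALLELE_2)):
    print(name, indel_size(WT, allele), is_frameshift(WT, allele))
```

A +1 insertion and a -2 deletion are both frameshifts, which is consistent with the statement that each allele leads to a premature stop codon.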

Knockdown of eIF2D in HEK293 cells
shRNA plasmids against human eIF2D were prepared using previously published methods (Sonobe et al., 2021). In brief, oligonucleotides with an siRNA sequence were cloned into the BamHI and HindIII sites of the pSilencer 2.1-U6 neo Vector (Thermo Fisher Scientific) according to the manufacturer's protocol. The kit also included a control shRNA vector. For luciferase assays (described above), the cells were plated in 24-well plates at 5×10⁴ cells per well and cotransfected with 50 ng of the AS-C9 plasmids and 50 ng of the fLuc plasmids along with 500 ng of either control shRNA or anti-eIF2D shRNA using Lipofectamine LTX (Thermo Fisher Scientific).

siRNA knockdown
After 3 weeks in neuron culture medium, motor neurons were transfected with an siRNA specific to eIF2D mRNA or a scrambled control. For the transfection, Lipofectamine RNAiMAX (Thermo Fisher Scientific) was first diluted in Opti-MEM medium, and the eIF2D and scrambled control siRNAs were each diluted separately in Opti-MEM medium at room temperature. Diluted siRNA and diluted Lipofectamine RNAiMAX were then mixed at a 1:1 ratio and incubated for 20 min. The siRNA-lipid complex solution was then brought up to the appropriate volume with motor neuron (MN) culture medium. The culture medium in the plate was aspirated and replaced with the siRNA-lipid complex solution at a final amount of 60 pmol siRNA in 1.5 ml medium per 1,000,000 cells. After 24 hr, the medium was replaced with normal motor neuron medium. This process was repeated two more times, at 26 and 31 days in culture. After 36 days in culture, we measured siRNA efficiency and levels of DPRs in harvested motor neurons.
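
The stated dose of 60 pmol siRNA in 1.5 ml medium per 1,000,000 cells can be turned into a molar concentration and scaled to other well sizes. The sketch below is a worked example only. It assumes that both the siRNA amount and the medium volume scale linearly with cell number, which the text does not state explicitly, and the function names are hypothetical.

```python
def sirna_nanomolar(pmol, volume_ml):
    """Final siRNA concentration in nM (1 pmol/ml is equal to 1 nM)."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return pmol / volume_ml


def scale_dose(n_cells, pmol_per_million=60.0, ml_per_million=1.5):
    """Scale siRNA amount (pmol) and medium volume (ml) linearly with cell number.

    Linear scaling is an assumption made for this illustration.
    """
    factor = n_cells / 1_000_000
    return pmol_per_million * factor, ml_per_million * factor


final_nm = sirna_nanomolar(60.0, 1.5)    # concentration at the stated dose
pmol, volume_ml = scale_dose(250_000)    # hypothetical well with 250,000 cells
```

Under these assumptions the stated dose corresponds to 40 nM siRNA, and the concentration is unchanged when amount and volume are scaled together.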

RNA extraction and quantitative real-time PCR
Total RNA from iPSC-derived motor neurons was extracted with the RNeasy Mini Kit (QIAGEN) and then reverse-transcribed to cDNA with the TaqMan Reverse Transcription Kit (Applied Biosystems). Quantitative PCR was carried out with SYBR Green Master Mix (Applied Biosystems) using the primers listed in SI Appendix, Table. Ct values for each gene were normalized to actin and GAPDH. Relative mRNA expression was calculated with the double delta Ct method.
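
The double delta Ct calculation can be sketched as follows. This is a minimal illustration, assuming that the two reference genes are combined by averaging their Ct values and that amplification efficiency is 100% (a factor of 2 per cycle). Neither assumption is stated in the text, and the function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Ct of the target gene minus the mean Ct of the reference genes."""
    return target_ct - mean(reference_cts)


def relative_expression(sample_target_ct, sample_ref_cts,
                        control_target_ct, control_ref_cts):
    """Fold change by the double delta Ct method, assuming 100% efficiency."""
    ddct = (delta_ct(sample_target_ct, sample_ref_cts)
            - delta_ct(control_target_ct, control_ref_cts))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: target gene and two reference genes (actin, GAPDH).
fold = relative_expression(
    sample_target_ct=26.0, sample_ref_cts=[18.0, 20.0],    # siRNA-treated
    control_target_ct=24.0, control_ref_cts=[18.0, 20.0],  # scrambled control
)
```

In this example the target Ct rises by two cycles relative to the references, so the computed relative expression is 0.25 of the control.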

Measurement of soluble poly-GR and poly-GP in iPSC-derived neurons
Soluble poly-GR and poly-GP levels in iPSC-derived neurons were detected using the Meso Scale Discovery (MSD) immunoassay platform as previously reported (Krishnan et al., 2022). In brief, cells were lysed using Tris-based lysis buffer, and lysates were adjusted to equal concentrations and loaded in duplicate wells. Background-subtracted electrochemiluminescence signals were presented as percentages. The MSD assays were performed in a blinded manner.
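
The conversion of duplicate-well readings to a percentage can be sketched as follows. This is an illustration under stated assumptions: duplicate wells are averaged, a single background value is subtracted, and the result is expressed relative to a control condition. The choice of control as the 100% reference, the function name, and all numbers are hypothetical.

```python
from statistics import mean


def percent_of_control(sample_wells, control_wells, background):
    """Background-subtracted mean signal of a sample as a percentage of control."""
    sample = mean(sample_wells) - background
    control = mean(control_wells) - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * sample / control


# Hypothetical electrochemiluminescence readings from duplicate wells.
percent = percent_of_control(
    sample_wells=[1100.0, 1300.0],
    control_wells=[2100.0, 2300.0],
    background=200.0,
)
```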

Soluble and insoluble fractionation for measurement of poly-GA
Motor neurons were lysed in RIPA buffer (Boston BioProducts, BP-115D) with protease and phosphatase inhibitors. The lysates were rotated for 30 min at 4°C, followed by centrifugation at 13,500 rpm for 20 min. The supernatant was removed and used as the soluble fraction. Protein concentrations of the soluble fraction were determined by the BCA assay (Thermo Fisher Scientific, Cat # 23227). To remove carryover, the pellets were washed with RIPA buffer and then resuspended in the same buffer with 2% SDS, followed by sonication on ice. The lysates were rotated for 30 min at 4°C and then spun at 14,800 rpm for 20 min at 4°C. The supernatant was removed and used as the insoluble fraction. Protein concentrations of the insoluble fraction were determined by the Pierce 660 nm Protein Assay (Thermo Fisher Scientific, 22660).

Measurement of poly-GA in iPSC-derived neurons
Poly-GA in soluble motor neuron lysates was measured using an MSD sandwich immunoassay. A human/murine chimeric form of the anti-GA antibody chGA3 was used for capture, and a human anti-GA antibody, GA4, with a SULFO-tagged anti-human secondary antibody was used for detection. Poly-GA concentrations were interpolated from a standard curve generated with 60X-GA expressed in HEK293 cells and presented as percentages. For background correction, values from no-repeat neuron samples were subtracted from the corresponding test samples.
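
Interpolation from a standard curve followed by background correction can be sketched as follows. The curve-fitting model used for the assay is not stated in the text. This sketch swaps in simple piecewise-linear interpolation between standards purely for illustration, and all names and values are hypothetical.

```python
from bisect import bisect_left


def interpolate_concentration(signal, standards):
    """Piecewise-linear interpolation of concentration from a standard curve.

    standards: (concentration, signal) pairs with distinct, increasing signals.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the range of the standard curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


def background_corrected(test_signal, no_repeat_signal, standards):
    """Subtract the no-repeat sample value from the corresponding test sample."""
    return (interpolate_concentration(test_signal, standards)
            - interpolate_concentration(no_repeat_signal, standards))


# Hypothetical standard curve: (concentration, signal) pairs.
standards = [(0.0, 100.0), (10.0, 1100.0), (100.0, 10100.0)]
concentration = interpolate_concentration(600.0, standards)
corrected = background_corrected(600.0, 200.0, standards)
```

Signals outside the range of the standards raise an error rather than being extrapolated, since values beyond the curve cannot be interpolated reliably.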

Statistical analysis
Statistical analysis was performed by one-way ANOVA with Tukey's multiple comparison test and two-way ANOVA with the Šídák multiple comparison test using GraphPad Prism version 9.3.1. A p-value of <0.05 was considered significant. The data are presented as mean ± standard error of the mean.
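
The summary statistic used for presentation, mean ± standard error of the mean (SEM), can be computed as shown below. This is a minimal standard-library sketch with hypothetical values. It does not reproduce the ANOVA or the post hoc tests, which were run in GraphPad Prism.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical replicate measurements, for illustration only.
m, sem = mean_sem([2.0, 4.0, 6.0])
print(f"{m:.2f} ± {sem:.2f}")
```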

Funder
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Additional files
Supplementary files
• Supplementary file 1. List of primers used for this study.
• Transparent reporting form

Data availability
All data generated or analyzed during this study are included in the manuscript and supporting files.