Differential toxicity and localization of arginine-rich C9ORF72 dipeptide repeat proteins depend on de-clustering of positive charges

SUMMARY
Arginine-rich dipeptide repeat proteins (R-DPRs), poly(PR) and poly(GR), translated from the hexanucleotide repeat expansion in the amyotrophic lateral sclerosis (ALS)-causative C9ORF72 gene, contribute significantly to pathogenesis of ALS. Although the two R-DPRs share many similarities, there are critical differences in their subcellular localization, phase separation, and toxicity mechanisms. We analyzed localization, protein-protein interactions, and phase separation of R-DPR variants and found that sufficient segregation of arginine charges is necessary for nucleolar distribution. Proline not only efficiently separated the charges, but also allowed for weak, but highly multivalent binding. In contrast, because of its high flexibility, glycine cannot fully separate the charges, and poly(GR) behaves similarly to contiguous arginines, being trapped in the cytoplasm. We conclude that the amino acid that spaces the arginine charges determines the strength and multivalency of the binding, leading to differences in localization and toxicity mechanisms.


INTRODUCTION
Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease that causes progressive degeneration of upper and lower motor neurons. 1 The most frequently mutated familial ALS-causative gene is C9ORF72. 2,3 All C9ORF72-related ALS (C9-ALS) patients have aberrant GGGGCC hexanucleotide repeat expansions (HRE) in intron 1 of the gene. Although the precise pathophysiology of C9-ALS remains unclear, (1) loss of function of C9ORF72 protein, 4 (2) DNA/RNA-mediated toxicity of the HRE, 5,6 and (3) toxic repeat-associated non-ATG translation (RAN-T) products 7,8 have been suggested as possible causes of neurotoxicity. Among the RAN-T products, the arginine-rich DPRs (R-DPRs), poly(proline-arginine: PR) and poly(glycine-arginine: GR), are reportedly highly toxic in cell and animal models. [9][10][11] Poly(PR) and poly(GR) have similar biochemical properties in many respects. They are positively charged because of alternating arginines and have repetitive structures. 12 These repetitively charged sequences are capable of multivalent binding (arginine works as a sticker amino acid and glycine or proline functions as a spacer amino acid) to macromolecules such as proteins and nucleic acids; thus, they have a strong propensity for liquid-liquid phase separation (LLPS). 13 Indeed, both poly(PR) and poly(GR) reportedly phase-separate with various intracellular molecules 14,15 and associate with membrane-less organelles (MLOs), disturbing their phase separation homeostasis and functions. 12,13,16 However, there are significant differences between poly(PR) and poly(GR). In various ALS models, poly(PR) is more toxic than poly(GR). 11,12,17 In addition, poly(PR) localizes mainly to the nucleolus, whereas poly(GR) is present in the cytoplasm and nucleolus and is often observed only in the cytoplasm, [18][19][20][21] resulting in different interactomes. 
[22][23][24] The difference in localization is confirmed for both short 20-repeat DPRs 19 and DPRs with more than 1000 repeats, which is a physiologically relevant size. 25 However, the mechanism by which poly(PR) and poly(GR) localize differently, even when the repeat length is the same, remains unknown. Furthermore, it has been reported that poly(GR), but not poly(PR), exerts unique toxicity via impaired nuclear transport of TAR DNA-binding protein-43kDa (TDP-43) 26 and disturbance of mitochondrial functions, 27,28 but how the spacer amino acid influences protein localization, phase separation, protein-protein interactions, and toxicity remains unknown.
with arginine, cytoplasmic localization was enhanced by uneven distribution of lysine, but unlike arginine, GFP-(P 16 K 16 ) 3 remained partially localized to the nucleolus (Figure S2D). We tested whether this rule could be applied to natural proteins. The US11 protein of herpes simplex virus (HSV) type 1 has 24 (X-P-R) repeats and localizes to the nucleolus, 35 but uneven distribution of arginines resulted in the loss of localization to the nucleolus and the US11 mutant localized to the cytoplasm (Figure S2E). Human ribosomal protein L29 (RPL29), which has repetitive basic amino acids, localizes to the nucleolus and cytoplasm. 36 Still, when the distribution of basic amino acids was changed without changing the net charge, RPL29 lost nucleolar incorporation and localized to the cytoplasm (Figure S2F). Furthermore, when lysines of wild-type RPL29 were replaced with arginines, its targeting to the nucleolus was enhanced, and when arginines of wild-type RPL29 were replaced with lysines, its targeting to the nucleolus was weakened, indicating that arginine has a greater impact on nucleolar localization than lysine (Figure S2F). These data indicate that periodic appearance of arginines, rather than contiguous arginines, is important for nucleolar distribution.
50 and NPM1. TOP-IDP is used as the disorder propensity scale. Ten cells each were evaluated using the ImageJ EzColocalization Plugin.
(C) Pie chart depicting proteins containing 5 or more repeats of (X-R/K) in the human proteome.
(D) Overrepresentation of proteins containing 5 or longer repeats (middle), or proteins containing 10 or longer repeats (right) of (X-R/K) in MLO proteomes.

Poly(GR) has characteristics similar to those of consecutive R, and localizes to the cytosol
In C9ORF72-ALS, both poly(PR) and poly(GR) are toxic in vitro and in vivo. 9,10,16,37 Both poly(PR) and poly(GR) contain 50% arginine, and the positive charge of arginine allows them to interact with acidic macromolecules, including nucleic acids and proteins, resulting in neurotoxicity. However, poly(PR) and poly(GR) reportedly activate different neurotoxic pathways, even though they contain the same ratio of arginine. 27,38 One possible explanation for this is the difference in their subcellular localization.
The principles of protein targeting to the nucleolus are well-studied and this targeting is mediated by a charge-dependent mechanism. 39 Six consecutive arginines are sufficient for the peptide to distribute to the nucleolus, and an isoelectric point above 12.6 is sufficient for targeting to the nucleolus. 32 The web-based nucleolar localizing signal (NoLS) detector (NoD), which predicts the presence of NoLS in proteins, also reveals that the positive charge rather than a specific motif is responsible for nucleolar localization. 40 According to the NoD, both (PR) 50 and (P 16 R 16 ) 3 are recognized as NoLS (Figure S3A). However, GFP-(PR) 50 localized to the nucleolus, whereas GFP-(P 16 R 16 ) 3 localized to the cytosol (Figure S2B). In addition, GFP-(GR) 50 localized to both the nucleolus and cytosol despite the same isoelectric point as GFP-(PR) 50 , 19 indicating that a factor other than net charge determines nucleolar localization of the protein.
To clarify the mechanism, we first investigated the effect of the number of arginine residues on nucleolar localization. Arginine, a positively charged amino acid with a dipole moment, is responsible for the protein-protein interaction of poly(PR). 15 When we overexpressed R 10 and R 20 in HeLa cells, they localized to the nucleolus, as reported previously. 32,39 However, although consecutive R is recognized as an NoLS by the NoD webserver, constructs of R 30 or longer localized to the cytosol, but not to the nucleolus (Figure 2A). In contrast, when we overexpressed poly(PR), it localized to the nucleolus, and the longer it was, the more exclusively the localization became ( Figure 2B). Next, we tested the poly(GR) with different repeat lengths. When the repeat length of poly(GR) was increased, consistent localization to the nucleolus was observed until a repeat number of 30 ( Figure 2C). However, as the length of poly(GR) was further increased, localization to the nucleolus peaked at a certain point and then shifted to cytoplasmic localization, which is very similar to the behavior of contiguous R (Figure 2A). The PCC of the GFP signal with a nucleolar marker NPM1 signal indicated that the PCC decreased as the lengths of consecutive R and poly(GR) increased, but for poly(PR), the PCC with the NPM1 signal increased for longer poly(PR) repeat lengths ( Figure 2D). We further tested longer constructs of poly(PR) and poly(GR). Because the HREs longer than 200 are considered as very likely pathogenic, 41 Figure 2E). Supporting these data, poly(PR) of physiologically relevant size, (PR) 1100 , still exclusively localizes to the nucleolus. 25 To biochemically validate these results, fractionation analyses were performed. PolyR produced smears whose molecular weights were larger than expected in the cytoplasmic fraction. Poly(PR) and poly(GR) showed bands in the expected sizes. 
Poly(PR) enhanced the nuclear signal as its length increased, whereas poly(GR) shifted its localization from the nucleus to the cytoplasm when its length increased ( Figures 2F-2H). The effect of repeat length on nucleolar localization was also tested for poly(YR) and poly(QR), which migrated to the nucleolus similarly to poly(PR) ( Figure 1A). Poly(QR) localized to the nucleolus more efficiently when the repeat length was increased, as with poly(PR) ( Figure S3B). (YR) 10 did not localize to the nucleolus, but to the nucleoplasm, but (YR) 30 and (YR) 50 efficiently localized to the nucleolus ( Figures S3C and S3D). This suggests that in general, poly(XR) more efficiently localizes to the nucleolus when the repeat length is increased and that poly(GR) is exceptional in this respect. As reported previously, consecutive lysines also act as a nucleolar migration signal. 32 Therefore, we generated polyK of different lengths as well, and tested their subcellular localization ( Figures S3E and S3F). We found that co-localization of polyK with NPM1 decreased as the length increased, as was the case with polyR, but a faint signal in the nucleolus remained even at a repeat length of 50 ( Figures S3E and S3F).
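The Pearson correlation coefficient (PCC) used above to score co-localization of the GFP signal with the NPM1 signal can be illustrated with a minimal Python sketch. This is not the ImageJ EzColocalization workflow used in the study; the function name `pearson_cc` and the pixel intensities are hypothetical, and a real analysis would operate on segmented image regions rather than short lists.

```python
from math import sqrt


def pearson_cc(channel_a, channel_b):
    """Pearson correlation coefficient between two equally sized intensity lists."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Covariance and variances, each without the common 1/n factor (it cancels).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("PCC is undefined for a constant channel")
    return cov / sqrt(var_a * var_b)


# Hypothetical flattened pixel intensities (illustrative values only).
gfp = [10, 20, 30, 40, 50]
npm1_like = [12, 22, 29, 41, 52]     # marker that tracks the GFP signal
cytosol_like = [50, 40, 30, 20, 10]  # marker that is anti-correlated with GFP

print("GFP vs NPM1-like marker:", round(pearson_cc(gfp, npm1_like), 3))
print("GFP vs cytosol-like marker:", round(pearson_cc(gfp, cytosol_like), 3))
```

A PCC close to 1 corresponds to a GFP signal that follows the nucleolar marker, whereas low or negative values correspond to a signal that is excluded from the marked compartment.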
Nuclear import of proteins is mediated by both passive diffusion and active transport via importin family proteins. 42 Importin family proteins bind to the nuclear localizing signal (NLS) and the importin complexes move through the nuclear pore. To test whether loss of nucleolar distribution of (GR) 50 could be reversed by addition of NLS, we fused three tandem repeats of NLS of SV40 large T antigen (PKKKRKVD) 43 to the N-terminus of GFP-(GR) 50 and tested its localization, but the NLS did not affect localization of (GR) 50 50 is static or a result of an exaggerated nucleocytoplasmic export system. To test this, we treated cells with a chromosomal maintenance 1 (CRM1, also known as exportin-1) inhibitor, leptomycin B (LMB). 44 The shuttle-tdTomato (s-tdTomato), which carries both the NLS and nuclear exporting signal (NES) and which works as a fluorescent reporter of nucleocytoplasmic transport, 45 was localized in both the cytoplasm and nucleus in untreated cells, whereas in LMB-treated cells, nucleocytoplasmic export was arrested and s-tdTomato accumulated in the nucleus (Figure S4C). However, localization of R 50 and (GR) 50 to the cytoplasm was not affected by LMB, indicating that the distribution of R 50 and (GR) 50 to the cytoplasm was static rather than a result of enhanced CRM1-mediated nucleocytoplasmic export. Arginine has a strong positive charge and promotes intermolecular interactions. Contiguous arginines might align well with acidic sequences nearby, 46 suggesting that polyR is strongly bound to cytosolic molecules, as shown by the smear in the fractionation experiment (Figure 2F). To evaluate the macromolecular environment around R 50 , we fused CRONOS (crowding sensor with mNeonGreen and mScarlet-I), a Förster resonance energy transfer (FRET)-based macromolecular crowding sensor, with R 50 (Figure S4D). 
47 CRONOS consists of mNeonGreen and mScarlet-I connected by a flexible linker, similar to the previously reported mCerulean-mCitrine-based sensor, 48 and once the macromolecular crowding level increases, the distance between these two fluorescent proteins diminishes, resulting in higher FRET efficiency. The increased FRET efficiency of the CRONOS reporter suggested that GFP-R 50 is more likely to have enhanced macromolecular crowding because of interactions with surrounding molecules (Figures S4E and S4F). These results suggest that (GR) 50 and R 50 do not diffuse freely in the cytoplasm, but rather bind strongly to intracytoplasmic molecules, making it difficult for them to translocate to the nucleus.

Poly(GR) binds strongly to surrounding molecules
Glycine is the smallest amino acid, and because of its side chain of a single hydrogen atom, glycine confers flexibility on peptide structure. In contrast, proline has a pyrrolidine side chain that restricts peptide flexibility, contributing to structural rigidity. To dissect biochemical differences between poly(PR) and poly(GR), we performed a molecular dynamics (MD) simulation of (PR) 12 and (GR) 12 using GROMACS, to estimate their folding status. 49 The radius of gyration (Rg) of (PR) 12 is larger than that of (GR) 12 , indicating that (GR) 12 can fold more compactly than (PR) 12 ( Figure 3A). The larger root-mean-square deviation (RMSD) of (GR) 12 was consistent with the expected higher flexibility of (GR) 12 ( Figure 3B). These results were consistent with former MD simulation results of poly(PR) and poly(GR). 50 Previous computational and experimental studies revealed that the distribution of charged residues affects electrostatic forces, and consecutive charged residues have especially strong charge correlations because of the sequence alignment of two nearby chains, which is advantageous for strong binding. 34,46 The highly flexible nature of poly(GR) may be beneficial for sequence alignment of nearby molecules, similar to consecutive arginines. If this is true, poly(GR) binding to surrounding acidic molecules is stronger than that of poly(PR).
To test this hypothesis, we first evaluated fluidity of molecules phase-separated with poly(GR) or poly(PR) by fluorescence recovery after photo bleaching (FRAP) analysis. When mixed with poly-rA RNA containing tetramethylrhodamine (TAMRA)-labeled rA 15 , all R 12 , (GR) 12 , and (PR) 12 underwent LLPS ( Figure S5A). The FRAP analysis showed that fluidity of (PR) 12 droplets was higher than that of (GR) 12 or R 12 ( Figure S5B), indicating that R 12 and (GR) 12 bound more tightly to RNA than did (PR) 12 . It is possible that efficient localization to the nucleolus can be explained by strong affinity for nucleolar molecules. To compare the affinity of GFP-R 20 , GFP-(PR) 20 and GFP-(GR) 20 to the nucleolus in live cells, we performed FRAP analysis in the nucleolus ( Figure S5C). GFP-(PR) 20 loosely interacted with nucleolar molecules and maintained high fluidity, while GFP-R 20 and GFP-(GR) 20 strongly interacted with nucleolar molecules and had low fluidity, which is consistent with the in vitro FRAP analysis ( Figure S5B). This finding was further substantiated by FRAP analysis of nucleolar GFP-(GR) 50 and GFP-(PR) 50 ( Figure S5D). To directly compare binding strength between poly(PR) or poly(GR) and acidic molecules, we measured critical salt concentration, which is the minimal  Figure 3D). The interaction between R-DPRs and KAPB2-H8 was confirmed by nuclear magnetic resonance (NMR). 26 20 bound more tightly to KAPB2-H8 than did (PR) 20 . Importantly, (GR) 20 mutant droplets disappeared at 300 mM NaCl even though they carried 20 (GR) repeats, as in (GR) 20 ( Figures 3G and 3H). Because (PR) 20 showed higher fluidity in LLPS droplets, we speculated that the inside of (PR) 20 droplets might be less crowded. To examine whether proline residues actually render droplets more fluid, we measured macromolecular crowding levels in phase-separated droplets formed by KAPB2-H8 and (PR) 20 , (GR) 20 or (GR) 20 mutant. 
In this experiment, we added recombinant CRONOS C-terminally fused with KAPB2-H8 peptide to LLPS droplets. The CRONOS-KAPB2-H8 reported macromolecular crowding in the droplets, and the (GR) 20 droplets showed the highest macromolecular crowding (Figures 3I and 3J). In contrast, periodic insertion of proline into every 5 (GR) repeats attenuated the crowding, and (PR) 20 droplets showed the lowest degree of macromolecular crowding, indicating that proline residues indeed contributed to the formation of loosely packed droplets. Next, we tested the effect of periodic insertion of proline every five (GR) repeats on localization of (GR) 50 (Figure 3K). Localization of (GR) 50 to the nucleolus was drastically enhanced by insertion of prolines ( Figures 3L, 3M, and S5E). FRAP analysis of (GR) 50 mutant showed slightly better recovery ( Figure S5F), indicating that (GR) 50 can distribute to the nucleolus if intermolecular interactions are attenuated.
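The radius of gyration (Rg) reported above for the MD simulations of (PR) 12 and (GR) 12 measures how far a chain's atoms lie from its center of mass. The following Python sketch shows the standard mass-weighted definition on toy coordinates. It is only an illustration under stated assumptions: the function name `radius_of_gyration` and the coordinates are hypothetical and are not taken from the GROMACS trajectories analyzed in the study.

```python
from math import sqrt


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration for a list of (x, y, z) coordinates."""
    if not coords:
        raise ValueError("at least one coordinate is required")
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses unless specified
    total_mass = sum(masses)
    # Center of mass along each axis.
    com = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[axis] - com[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return sqrt(weighted_sq / total_mass)


# Toy conformations with the same number of beads (arbitrary length units).
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]  # rod-like chain
compact = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]   # folded-back chain

print("extended Rg:", round(radius_of_gyration(extended), 3))
print("compact Rg:", round(radius_of_gyration(compact), 3))
```

In this toy case the rod-like chain has the larger Rg, which mirrors the qualitative reading in the text that a larger Rg reflects a more extended, less compactly folded peptide.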

Alternate insertion of Pro, but not other amino acids, promotes multivalent interactions
As reported previously, poly(PR) and poly(GR) inhibit protein translation in vitro and in vivo. 12,16,53,54 To test whether this inhibitory effect is specific to poly(PR) and poly(GR) or is a universal feature of poly(XR), we performed a puromycin-based SUnSET assay to monitor protein translation in cells 55 ( Figure 4A). Most GFP-(XR) 50 exerted mild suppression, whereas GFP-(PR) 50 showed the most potent inhibitory effect on protein translation ( Figure 4B). To dissect the mechanism underlying the strong toxicity of poly(PR), we investigated the nature of protein-protein interactions of poly(PR). When proline is alternately inserted into consecutive arginine sequences, it weakens the intermolecular interaction, but promotes multivalent interactions, resulting in a stronger propensity for LLPS with acidic proteins. 34 A comparison of the quantitative interactome analysis of R 12 , (GR) 12 , and (PR) 12 shows that the interactome of (GR) 12 is similar to that of R 12 , 34 but (PR) 12 is very different from (GR) 12 , with many signals enhanced ( Figure 4C). To examine whether this signal enhancement of interacting proteins is proline-specific or whether it is also observed for other amino acids, depending on the size and hydrophobicity of their side chains, we performed immunoprecipitation (IP)-immunoblot analysis (IB) by mixing synthetic HA-(XR) 12 peptides with cell lysates ( Figure 4D). In this experiment, we chose candidate interacting proteins based on our previous study, 34 in which we detected proteins common to the interactomes of R 12 , (GR) 12 , and (PR) 12 and confirmed that spacer amino acids influence affinity to these proteins. Non-POU domain-containing octamer-binding protein (NONO) was chosen as a control that did not differ between R 12 and (PR) 12 . We found that signal intensities of binding proteins were enhanced only in (PR) 12 . 
This enhancement was specific because some proteins such as NONO interacted similarly with all (XR) 12 peptides ( Figure 4D). This suggests that proline is particularly prone to forming high-multivalent interactions, which may account for the phenotypic difference between poly(PR) and poly(GR).
To test this further, we mixed poly(PR) or poly(GR) with KAPB2-H8 peptide in different ratios and monitored their phase separation. If poly(PR) is capable of interacting with a larger number of molecules, poly(PR) should phase separate and form droplets with a higher molar ratio of KAPB2-H8 than poly(GR). Because the length of DPR affects its propensity for phase separation, we tested peptides with different repeat numbers, 12 and 20. The phase diagram shows that (PR) 12 phase-separated with a higher molar ratio of KAPB2-H8 than (GR) 12 (Figures 4E and 4F). This difference was more pronounced with peptides of repeat length 20: 25 mM of (PR) 20 can undergo LLPS with 500 mM of KAPB2-H8 (molar ratio = 1:20) (Figures 4G-4I). These results indicate that poly(PR) is capable of binding to a larger number of molecules per peptide than poly(GR) and forms phase-separated droplets.
(C) A volcano plot indicating differences between interactomes of HA-(PR) 12 and HA-(GR) 12 . Interactome data were obtained from a previous study. 34
(D) IP-IB comparing the degree of interactions of HA-(XR) 12 with indicated proteins. The asterisk shows the signal of the IgG heavy chain.
(E) Phase diagram of (GR) 12

Adequate de-clustering of arginine charges is necessary for R-DPR distribution to the nucleolus
The ratio of proline (spacer) to arginine (sticker) in the poly(PR) determines the nature of the poly(PR) droplets, and the segregation of adjacent arginines by proline is critical for the toxicity of poly(PR). 34 To assess the influence of the sticker-spacer ratio on nucleolar incorporation, we tested nucleolar localization of variants of (PR) 50 and (GR) 50 with different ratios of sticker and spacer (Figure 5A). We also tested the effect in (QR) 50 and (YR) 50 because they showed a good PCC with nucleolar NPM1 signals (Figures 1A and 1B), and glutamine and tyrosine are known as typical components of LLPS-prone proteins. 56 Here, spacer X separates the charges of neighboring arginines, and in the case of X:R = 2:1, the distance between adjacent arginines is doubled compared to the 1:1 case. If X:R = 1:2 or 1:3, two or three consecutive arginines will appear, and one spacer will separate the consecutive charges for every two or three arginines. Because contiguous arginines above a certain threshold are trapped in the cytoplasm (Figure 2A), it is presumed that spacer amino acids, which are efficient in separating contiguous arginine charges, help escape from entrapment and facilitate entry into the nucleolus. If the distance between arginines is too great, the clustering of positive charges necessary for nucleolar distribution is lost, and accumulation in the nucleolus is limited. First, we monitored subcellular localization of (PR) 50 variants containing 50 arginines with different P:R ratios. (P 1 R 3 ) 16 was found in both the cytoplasm and the nucleolus, whereas (P 1 R 2 ) 25 and (PR) 50 were exclusively localized in the nucleolus (Figures 5B and 5C). (P 2 R 1 ) 50 was found in the cytosol without nucleolar distribution, indicating that segregation of arginine charges by proline is efficient (Figure 5B). 
In the case of glycine, (G 1 R 3 ) 16 and (G 1 R 2 ) 25 localized to the cytoplasm, whereas (GR) 50 partially localized to the nucleolus in 22% of cells ( Figure 5C). Importantly, (G 2 R 1 ) 50 showed dramatically increased localization to the nucleolus, suggesting that a single glycine was not sufficient to separate the arginine charges and that insertion of two glycines between arginine separated the charge enough to shift localization from the cytosol to the nucleolus. When (QR) 50 and (YR) 50 were tested the same way, insertion of two glutamines between arginines resulted in decent migration to the nucleolus, whereas insertion of two tyrosines between arginines attenuated the degree of migration to the nucleolus, suggesting that the ability to segregate the charges of consecutive arginines is P > Y > Q > G. These observations were also confirmed in the motor neuronal NSC34 cells ( Figure S6). These results indicate that the difference in subcellular localization of poly(PR) and poly(GR) is because of the difference in their ability to separate the arginine charges. The FRAP analysis showed that (P 1 R 3 ) 16 is most tightly bound to nucleolar components, followed by (P 1 R 2 ) 25 and (PR) 50 , indicating that strong binding to nucleolar molecules does not necessarily mean efficient incorporation into the nucleolus ( Figure 5D). Although both GFP-(P 1 R 3 ) 16 and GFP-(P 2 R 1 ) 50 localized to the cytosol, we speculated that the molecular mechanisms underlying their localization are different.
To investigate this, we overexpressed GFP-(P 1 R 3 ) 16 or GFP-(P 2 R 1 ) 50 in combination with s-tdTomato in HeLa cells and treated them with LMB. GFP-(P 1 R 3 ) 16 remained in the cytoplasm, whereas GFP-(P 2 R 1 ) 50 accumulated in the nucleus ( Figures S7A and S7B), indicating that GFP-(P 1 R 3 ) 16 was statically localized in the cytoplasm, whereas GFP-(P 2 R 1 ) 50 freely moved between the nucleus and cytoplasm and was LMB-sensitive. The degree of inhibition of protein translation was also affected by the ratio of P:R and (PR) 50 showed the strongest inhibitory effect ( Figure 5E). To examine whether the greater migration rate of (G 2 R 1 ) 50 and (GR) 50 mutant [one proline inserted into every (GR) 5 ] ( Figure 3K) to the nucleolus depends on mimicking the charge separating property or the high-multivalent interaction of (PR) 50 , we performed IP-IB using (G 2 R 1 ) 12 and (GR) 12 mutant. We confirmed that neither (G 2 R 1 ) 12 nor (GR) 12 mutant developed higher multivalent interactions ( Figure 5F). Therefore, segregation of arginine charges rather than high-multivalent interactions is important for nucleolar incorporation. The inability of glycine to separate the charges of arginine probably depends on the size of side chain and the flexibility. Alanine has the next smallest side chain after glycine ( Figure S8A), and LLPS droplets with (AR) 12 showed similar biochemical characteristics to those of R 12 and (GR) 12 , such as irregular shapes of the phase-separated droplets and the limited recovery rate of FRAP analysis (Figures S8B and S8C). However (AR) 50 is localized almost exclusively to the nucleolus, indicating that the methyl group of the side chain of alanine, which limits rotation of the peptide main chain, has a significant effect on subcellular localization ( Figure 1A). 57
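The variant notation used in this section, such as (P 1 R 3 ) 16 or (G 2 R 1 ) 50 , can be made concrete with a short Python sketch that expands each repeat unit into a sequence and reports the arginine count, the longest run of contiguous arginines, and the longest spacer gap. This is a minimal sketch, assuming that each unit is written spacer first and that only the repeat region matters; the helper names `build_repeat` and `longest_run` are hypothetical and no tags such as GFP are included.

```python
from itertools import groupby


def build_repeat(spacer, n_spacer, n_arg, repeats):
    """Expand (X_a R_b)_n into a one-letter sequence, spacer first (assumption)."""
    return (spacer * n_spacer + "R" * n_arg) * repeats


def longest_run(seq, arginine=True):
    """Longest run of arginines (arginine=True) or of non-arginine spacers (False)."""
    runs = [
        len(list(group))
        for is_arg, group in groupby(seq, key=lambda residue: residue == "R")
        if is_arg == arginine
    ]
    return max(runs, default=0)


# Proline variants named in the text: (spacer, spacers per unit, R per unit, repeats).
variants = {
    "(P1R3)16": ("P", 1, 3, 16),
    "(P1R2)25": ("P", 1, 2, 25),
    "(PR)50": ("P", 1, 1, 50),
    "(P2R1)50": ("P", 2, 1, 50),
}

for name, spec in variants.items():
    seq = build_repeat(*spec)
    print(
        name,
        "length:", len(seq),
        "arginines:", seq.count("R"),
        "longest R run:", longest_run(seq, arginine=True),
        "longest spacer gap:", longest_run(seq, arginine=False),
    )
```

Replacing the spacer letter with "G", "Q", or "Y" gives the corresponding glycine, glutamine, and tyrosine variants, so the same two numbers (arginine run length and spacer gap) can be compared across the constructs discussed above.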

De-clustering of arginine charges contributes to differences between the interactomes and toxicity of R-DPRs
Differences in the mechanism of toxicity between poly(PR) and poly(GR) are expected to derive from their qualitatively and quantitatively different interactomes, which depend on the subcellular localization and valence of interaction. To investigate the manner of their protein-protein interactions, we performed interactome analysis using the TurboID-based proximity labeling method. 58 TurboID fused to R 50 , (GR) 50 , (G 2 R 1 ) 50 , (GR) 50 mutant, or (PR) 50 , was expressed in HeLa cells. After induction of biotinylation, biotinylated proteins in the HeLa cells were visualized using AlexaFluor488-conjugated streptavidin (Figure 6A). Signals indicated that biotinylation of these proteins in close proximity was achieved. We also confirmed the biotinylation of the proteins by SDS-PAGE, followed by detection with horseradish peroxidase (HRP)-labeled streptavidin (Figure 6B). We purified the biotinylated proteins with streptavidin beads and analyzed them by quantitative liquid chromatography/mass spectrometry (LC/MS). 59 We found that the interactome of (GR) 50 was quite similar to that of R 50 , but different from that of (PR) 50 (Figures 6C-6E). Importantly, the (PR) 50 interactome was qualitatively similar to that of (G 2 R 1 ) 50 and (GR) 50 mutants, but quantitatively different from them, indicating that the increased binding valence of proline determined the characteristics of the (PR) 50 interactome (Figures 6D-6E, S9A, and S9B). We hypothesized that the strong inhibition of protein translation by (PR) 50 might derive from the increased valence of protein-protein interactions. To test this hypothesis, we again performed proximity labeling analysis of (PR) 50 and (YR) 50 , which localizes in the nucleolus in the same manner as (PR) 50 , but which exerts a less inhibitory effect on protein translation (Figure 4B). 
We found that both TurboID-(PR) 50 and TurboID-(YR) 50 exclusively biotinylated proteins in the nucleolus ( Figures 6A and 6F), but that the signal intensities of many biotinylated nucleolar proteins were higher in the interactome of (PR) 50 than in that of (YR) 50 ( Figure 6G).
In our previous study, we demonstrated that alternate insertions of proline into consecutive arginines confer a propensity for binding to acidic proteins. Therefore, we compared the isoelectric points (pI) of the 100 most and the 100 least enriched nucleolar proteins. The most enriched proteins tend to have slightly lower pIs when compared with the least enriched proteins (Figure 6H). When we sought the longest acidic stretch in the proteins, we found that the 100 most enriched proteins contain significantly longer acidic stretches than the least enriched proteins (p = 0.00029) (Figure 6I). Amino acid occurrence in those proteins shows that acidic residues, but not aromatic residues, were overrepresented in the poly(PR) interactome (Figure 6J). These results suggest that alternate insertions of proline have a more substantial impact on the interaction with acidic stretches than alternate insertions of tyrosine. STRING analysis of proteins with signal intensities that more than doubled in the (PR) 50 interactome revealed that many of these proteins are involved in ribosomal RNA (rRNA) processing [GO:0006364, false discovery rate (FDR) = 5.563e−76] (Figures S9C and S10). We measured rRNA expression levels by quantitative PCR and confirmed that (PR) 20 peptide treatment inhibited rRNA processing with increased pre-ribosomal RNA (Figure 6K). This suggests that the strong inhibitory effect of poly(PR) on protein translation was due in part to disturbance of rRNA processing.
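The search for the longest acidic stretch described above can be sketched in a few lines of Python. The sketch assumes that an acidic stretch means an uninterrupted run of aspartate (D) and glutamate (E) residues, which the text does not define explicitly; the function name `longest_acidic_stretch` and the example sequences are hypothetical and are not proteins from the interactome.

```python
import re


def longest_acidic_stretch(sequence):
    """Length of the longest uninterrupted run of D/E residues (assumed definition)."""
    runs = (len(match.group()) for match in re.finditer(r"[DE]+", sequence.upper()))
    return max(runs, default=0)


# Hypothetical one-letter sequences used only to illustrate the calculation.
examples = {
    "acidic_rich": "MKDEEDEEDDLKRAEEDG",
    "acidic_poor": "MKRRAGSLKDAPGE",
    "no_acidic": "MKRRAGSLK",
}

for name, seq in examples.items():
    print(name, "longest D/E stretch:", longest_acidic_stretch(seq))
```

Applied to two protein lists, such as the most and least enriched sets, the resulting stretch lengths could then be compared with a statistical test of the analyst's choice.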

DISCUSSION
Poly(PR) and poly(GR), produced from mutant C9ORF72, are toxic in vitro and in vivo and are thought to contribute to pathogenesis of ALS. 60 However, it is unclear what accounts for differences in their localization and toxicity mechanisms. In this study, we found that poly(GR) binds strongly to a small number of molecules, whereas poly(PR) binds weakly to a large number of molecules. We expect that the rigidity that is a unique feature of proline, rather than the size of the side chain, contributes to achieving highly multivalent interactions because even though tyrosine has a larger side chain than proline, poly(YR) interacted with proteins in a less-multivalent manner (Figures 4D and 6G). Furthermore, proline efficiently and adequately separates the charges of arginines, which promotes the transition from the cytosol to the nucleolus. In contrast, glycine is flexible, and poly(GR) behaves like polyR and localizes to the cytoplasm. These biochemical features produce differences in localization and toxicity between poly(PR) and poly(GR).
50 . Biotinylated proteins were visualized with AlexaFluor488-conjugated streptavidin. Scale bar: 10 μm.
(G) A scatterplot indicating enrichment of identified proteomes. Analyses were performed in duplicate, and proteins with signal intensities that were at least doubled in the interactome of (PR) 50 , compared with that of (YR) 50 are colored in red.
(H) Isoelectric points of the 100 most enriched proteins and the 100 least enriched proteins in the interactome of (PR) 50 when compared with that of (YR) 50 . Proteins were sorted in order of pI.
(I) Histogram of the longest D/E stretches in the 100 most enriched and 100 least enriched proteins in the interactome of (PR) 50 , compared with that of (YR) 50 and the human proteome.
(J) Amino acid occurrences in the 100 most enriched and 100 least enriched proteins in the interactome of (PR) 50 , compared with that of (YR) 50
Glycine is unique in structure. It has only a hydrogen atom as its side chain; thus, it is conformationally very flexible, allowing the main chain to adopt many conformations. Therefore, poly(GR) can easily achieve a good alignment with consecutive acidic sequences, as does polyR, 46 and can form strong intermolecular interactions at a low molar ratio. Proline is also a unique amino acid in which the side chain is connected to the main chain twice, forming a five-membered, nitrogen-containing pyrrolidine ring. Because of this nature, proline cannot occupy many of the main chain conformations that all other amino acids can adopt; thus, proline often forms tight turns where the peptide chain must change its direction. The high steric hindrance and rigidity of proline strongly interfere with the movement of the main chain. Therefore, poly(PR) is less likely to align well with consecutive acidic groups and binds only partially, allowing it to bind weakly to a large number of molecules. Furthermore, poly(PR) prefers more linear structures because of the rigidity of proline (Figure 3A), which increases the probability of encountering a partner molecule to which it binds. Periodic insertions of proline into poly(GR) may interfere with strong interactions with acidic molecules, resulting in nucleolar localization. We speculate that the rigidity of proline rather than steric hindrance by the side chain contributes to multivalent interactions by insertions of proline, because (PR) 12 binds to more molecules than other (XR) 12 peptides that have side chains larger than that of proline (Figure 4D). This finding was further substantiated by performing proximity labeling-based interactome analysis of (PR) 50 and bulky (YR) 50 (Figure 6G). 
Amino acid occurrence in the interactome revealed that poly(PR) interacts with nucleolar proteins via electrostatic interactions rather than cation-pi interactions, because acidic amino acids, but not aromatic amino acids, were overrepresented in the interactome (Figure 6J).
MLOs are formed by LLPS of RNA and ribonucleoprotein, and arginine-rich peptides have a positive charge that facilitates their interaction with RNA and their incorporation into MLOs. 32 Localization of proteins to the nucleolus also requires that basic amino acids form clusters locally. 39 However, as we have shown, contiguous arginine sequences of a certain length (N ≥ 30) are trapped in the cytoplasm and do not migrate to MLOs, including the nucleolus. The molecular mechanism that explains why R 50 fails to localize to the nucleolus and remains in the cytoplasm is still unclear, but we hypothesize that a long run of consecutive arginines has high binding energy and therefore binds tightly to cytoplasmic molecules just after being translated by the ribosome. As a result, it cannot be transported into the nucleus. This hypothesis is also supported by the fact that the degree of macromolecular crowding around R 50 was high (Figure S4E). When we separated the arginine charges with alternate insertions of amino acids, we observed that most of the GFP-(XR) 50 migrated from the cytoplasm to the nucleolus. Since only poly(PR) acquired highly multivalent binding and no enhancement of multivalent binding was observed for poly(YR) or poly(QR) (Figure 4D), appropriate segregation of arginine charges, but not high multivalency in binding, is required for localization to the nucleolus. In addition, there is an appropriate spacer/sticker ratio for localization to the nucleolus, and the ratio depends on how efficiently the inserted amino acid can separate the charges of arginines. For example, for proline, which strongly restricts the freedom of arginine, (PR) 50 variants with a P:R ratio of 1:1 or even 1:2 localize to the nucleolus, but when the P:R ratio is 2:1, the cluster of basic amino acids necessary for nucleolar incorporation is not formed. Such a variant therefore has no affinity for the nucleolus and diffuses freely in the cytosol.
This mechanism is different from that of (P 1 R 3 ) 16 , which is modestly trapped in the cytoplasm because of insufficient charge separation (Figures S7A and S7B). By contrast, single glycine insertion is not sufficient to separate the arginine charges and (GR) 50 localizes to the cytoplasm like R 50 , whereas (G 2 R 1 ) 50 , which is able to separate the charges sufficiently, localizes to the nucleolus. Proximity-dependent labeling also revealed that the degree of de-clustering of arginine charges determines the interactome of poly(XR) and differences in toxicity.
A lysine-to-arginine substitution in human RPL29 revealed that arginine has a more prominent impact on nucleolar cohesion. The comparison of polyK and polyR of different lengths indicated that sufficiently long runs of consecutive arginines have a stronger propensity for static localization in the cytosol. These results suggest that although arginine and lysine are similar amino acids, they differ fundamentally. Arginine has intense binding capability because of its planar guanidinium ion, which enables simultaneous formation of cation-pi, pi-pi, and cation-anion contacts, 29 whereas poly(PK) forms less viscous droplets than poly(PR) in vitro. 15
Although inhibition of protein translation has been studied as an important aspect of R-DPR-mediated toxicity, the detailed molecular mechanism remains to be revealed. R-DPRs reportedly bind directly to ribosomes and impair their function, 61 disrupting eIF1a-dependent pathways 54 and affecting transcription. 34 Because both poly(GR) and poly(PR) inhibit protein translation, translation inhibition could be a common  50 , and (AR) 50 , which localize exclusively in the nucleolus. Comparison of the interactomes of (PR) 50 and (YR) 50 by proximity labeling revealed that (PR) 50 binds to a group of molecules involved in rRNA processing in the nucleolus, and this may partially explain the strong inhibitory effect of poly(PR). However, cytosolic poly(GR) may inhibit protein translation via a different mechanism, and further studies are needed.
In conclusion, we revealed molecular mechanisms determining the subcellular localization and different toxicities of poly(PR) and poly(GR). This finding will contribute to development of specific inhibitors for R-DPR-mediated toxicity. Furthermore, the clarified roles of spacer amino acids will accelerate research in phase-separation and MLO-associated proteins.

Limitations of the study
There are several limitations to this study. The first is that relatively short DPRs were used. We cannot rule out that their biochemical features differ from those of naturally occurring DPRs with hundreds to thousands of repeats. Although even fewer than 50 repeats have been reported to contribute to the development of ALS with DPR pathology, 62-64 HREs longer than 200 repeats are considered very likely pathogenic. 41 We confirmed that the localization of (PR) 200 and (GR) 200 , which can be translated from 200 repeats, is the same as that of the 50-repeat peptides. As previously reported, localization of R-DPRs with more than 1000 repeats is consistent with that of the DPRs with 50 repeats that we examined here. 25 The longer the repeat length, the greater the electrostatic force, so there is still a possibility that even poly(PR) will begin to localize to the cytoplasm above a certain threshold. Although our results clearly show that poly(PR) and poly(GR) localize differently up to 200 repeats, the subcellular localization of R-DPRs with several thousand repeats, as seen in ALS patients, requires further investigation. Another limitation of the study is that we obtained the data from immortalized cell lines.

STAR★METHODS
Detailed methods are provided in the online version of this paper and include the following:
before transfection. Cells were transfected with poly(XR) constructs using Lipofectamine 2000 (Thermo Fisher Scientific). Twenty-four hours after transfection, cells were trypsinized, centrifuged at 500 × g for 5 min, and resuspended in ice-cold PBS. Cells were transferred to 1.5-mL microcentrifuge tubes and centrifuged at 500 × g for 5 min, and the supernatants were aspirated. Ice-cold CEB buffer containing protease inhibitors was added to the pellets on ice. After 10 min of incubation with gentle mixing on ice, cells were centrifuged at 500 × g for 5 min, and the supernatant fractions (cytosol) were collected. The remaining pellets containing nuclei were lysed in sample buffer. The cytosolic and nuclear fractions were subjected to SDS-PAGE followed by immunoblot analyses. Membranes were probed with the following antibodies: GFP antibody (MBL, #598, 1:3000), LMNB1 antibody (Proteintech, 12987-1-AP, 1:3000), and alpha-tubulin antibody (Wako, 017-25031, 1:3000). The signals were visualized with an HRP-conjugated anti-rabbit secondary antibody (Cell Signaling Technology, #7074, 1:5000) or an HRP-conjugated anti-mouse secondary antibody (Cell Signaling Technology, #7076, 1:5000).

LLPS and fluorescence recovery after photobleaching (FRAP) analysis
For observation of LLPS of R 12 , (PR) 12 or (GR) 12 with poly-rA, each peptide (final concentration 100 µM) was mixed with poly-rA (final concentration 0.5 mg/mL) containing 100 nM TAMRA-rA 15 (Fasmac). Droplets were plated on chambered cover glasses and covered with silicone oil AP100 (Sigma). Droplets were observed using an LSM-710 confocal microscope (Carl Zeiss). FRAP analyses were also performed with an LSM-710 confocal microscope using a lens of N.A. = 1.2. For FRAP analysis in cells, HeLa cells were seeded on 4-well chambered cover glasses (Matsunami) at 3 × 10⁴ cells/500 µL/well the day before transfection. Twenty-four hours after transfection, FRAP analyses were performed with the LSM-710 confocal microscope using the same settings. Data were analyzed with ZEN software (Carl Zeiss).

Critical salt concentration
To determine the critical salt concentration, 50 µM of (PR) 20 , (GR) 20 or (GR) 20 mutant peptides were mixed with 50 µM of KAPB2-H8 peptide in a phase-separation buffer (10 mM HEPES, pH 7.4) containing different concentrations of NaCl from 0 to 400 mM. Turbidity was measured by absorbance at 600 nm using a NanoDrop One (Thermo Fisher Scientific).
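
One way to read a critical salt concentration off such a turbidity series can be sketched as follows. This is only an illustration under assumptions that the methods do not state: the function name, the A600 cutoff of 0.1, and the linear interpolation between neighboring NaCl concentrations are hypothetical choices, and the example readings are invented placeholders, not data from this study.

```python
from typing import Optional, Sequence


def critical_salt_concentration(
    nacl_mM: Sequence[float],
    a600: Sequence[float],
    threshold: float = 0.1,  # hypothetical turbidity cutoff, not taken from the paper
) -> Optional[float]:
    """Estimate the NaCl concentration (mM) at which A600 falls below `threshold`.

    The crossing point is linearly interpolated between the two neighboring
    measurements. Returns None if the turbidity never crosses the cutoff
    from above within the measured range.
    """
    if len(nacl_mM) != len(a600):
        raise ValueError("nacl_mM and a600 must have the same length")
    # Sort measurements by salt concentration so neighbors are adjacent.
    points = sorted(zip(nacl_mM, a600))
    for (c0, t0), (c1, t1) in zip(points, points[1:]):
        if t0 >= threshold > t1:
            # t0 > t1 is guaranteed here, so the division is safe.
            return c0 + (t0 - threshold) * (c1 - c0) / (t0 - t1)
    return None


# Invented placeholder readings for a 0-400 mM NaCl series.
example_nacl = [0, 100, 200, 300, 400]
example_a600 = [0.4, 0.3, 0.2, 0.0, 0.0]
print(critical_salt_concentration(example_nacl, example_a600))
```

A fixed cutoff keeps the comparison between peptides simple; fitting a sigmoid to the full curve would be an alternative if the series is dense enough.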

Phase diagram of R-DPRs with different ratios of KAPB2-H8
To evaluate the multivalency of interactions, each R-DPR at the indicated concentration was mixed with different concentrations of KAPB2-H8 peptide in phase-separation buffer [10 mM HEPES (pH 7.4) and 100 mM NaCl]. Turbidity was measured by absorbance at 600 nm using a NanoDrop One (Thermo Fisher Scientific).

Evaluation of macromolecular crowding in LLPS droplets using a CRONOS sensor
A Förster resonance energy transfer (FRET)-based fluorescent biosensor, CRONOS (crowding sensor with mNeonGreen and mScarlet-I), was used to evaluate macromolecular crowding in LLPS droplets consisting of R-DPRs and KAPB2-H8. 47 To obtain recombinant CRONOS-KAPB2-H8, E. coli BL21 strain (New England Biolabs) was transformed with pET28a-CRONOS-KAPB2-H8 and cultured overnight on an LB agar plate containing kanamycin. A single colony was picked and cultured in 2× YT growth medium at 37 °C until the OD 600 reached 1.0. Expression of CRONOS-KAPB2-H8 was induced by adding 1 mM isopropyl β-D-thiogalactopyranoside (IPTG), and cells were cultured overnight at 25 °C. The E. coli expressing recombinant CRONOS-KAPB2-H8 was pelleted by centrifugation at 4000 × g for 10 min and sonicated in lysis buffer [50 mM Tris-HCl (pH 7.4), 500 mM NaCl, 1% Triton X-100, cOmplete protease inhibitor cocktail (Roche), 0.1 mg/mL lysozyme (Wako chemical)]. Recombinant CRONOS-KAPB2-H8 was purified with a His60 Ni Gravity column purification Kit (Clontech) following the manufacturer's protocol. After elution with 500 mM imidazole, recombinant CRONOS-KAPB2-H8 was dialyzed overnight against PBS using a Slide-a-Lyzer dialysis cassette (Thermo Fisher Scientific). The concentration of purified CRONOS-KAPB2-H8 was measured using a NanoDrop One (Thermo Fisher Scientific), and then the protein was frozen at −80 °C until use.

Gene ontology analysis
The interaction network of nucleolar proteins whose signal intensities were at least doubled in the interactome of (PR) 50 compared with that of (YR) 50 was analyzed with STRING (www.string-db.org) 69 for gene ontology analysis.

Quantitative PCR of rRNA
HEK293 cells were cultured in the presence of 10 µM of (PR) 20 peptide for 8 or 24 h. Total RNA was extracted with TRIzol reagent (Invitrogen) following the manufacturer's protocol. cDNA was synthesized from 1 µg of total RNA using a Superscript III First-Strand Synthesis System (Thermo Fisher Scientific). Expression levels of pre-ribosomal RNA, 28S rRNA, 18S rRNA, and 5S rRNA were quantified with a DyNAmo ColorFlash SYBR Green qPCR Kit (Thermo Fisher Scientific). Quantitative PCR was performed with a QuantStudio Real-Time PCR system (Thermo Fisher Scientific). PCR was carried out under the following conditions: initial denaturation at 94 °C for 3 min, followed by 45 cycles of 94 °C for 15 s and 60 °C for 1 min. Relative expression levels were normalized against ACTB.
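
Normalization against a reference gene such as ACTB is often carried out with the comparative Ct (2^-ΔΔCt) calculation. The methods do not state which calculation was used here, so the sketch below is an assumption for illustration only: the function name and argument names are hypothetical, and the formula presumes roughly 100% amplification efficiency for both the target and the reference amplicon.

```python
def relative_expression(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Fold change of a target transcript by the comparative Ct method.

    Assumes (not stated in the methods) that each PCR cycle doubles the
    product for both the target and the reference gene (e.g., ACTB).
    """
    # Normalize the target to the reference gene within each condition.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the untreated control.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target appears one cycle later after treatment,
# while the reference gene is unchanged.
fold_change = relative_expression(22.0, 18.0, 21.0, 18.0)
print(fold_change)
```

A fold change below 1 indicates a lower normalized level of the target in the treated sample than in the control.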

Computational identification of (X-R/K) repeats in natural proteins
To extract periodically positively charged motifs and subcellular localization signatures from the human proteome, we performed data mining of protein sequences obtained from the reviewed human proteome (UniProt accession UP000005640). We analyzed protein sequences and subcellular localization data by importing human proteome data from UniProt into Spyder (Python v3.8) in Anaconda (https://docs.anaconda.com). First, we searched for proteins harboring X-R/K repeats with repeat lengths of 5 or longer. Out of 20,386 human proteins, we found 797 proteins containing X-R/K repeats with repeat lengths of at least 5. Next, we counted the number of each amino acid in the X-R/K repeats and compared their frequencies with their general occurrences in the entire human proteome. Hypergeometric p values were calculated with the web server https://systems.crump.ucla.edu/hypergeometric/index.php. We also accessed the UniProt API (https://www.uniprot.org/help/programmatic_access) to obtain xml files of each protein with X-R/K repeats and collected subcellular localization information. Among repeat-containing proteins, 112 proteins with X-R/K repeats of 5 or longer and 4 proteins containing repeats of 10 or longer were excluded because xml or localization data were lacking. We also collected localization information for 16,810 proteins (no data: 3576 proteins) in the whole proteome.
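
The repeat search and the enrichment test described above could be sketched roughly as follows. This is not the authors' script. It assumes that an X-R/K repeat means any residue X followed by R or K, tandemly repeated at least five times, and all function names are hypothetical. The hypergeometric tail is computed with the standard library in place of the web server named in the text.

```python
import re
from collections import Counter
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_xrk_repeats(sequence: str, min_units: int = 5):
    """Return (start, matched_text) for each run of >= min_units X-[R/K] units.

    Assumption: X is any of the 20 standard residues, including R or K.
    """
    pattern = re.compile(r"(?:[%s][RK]){%d,}" % (AMINO_ACIDS, min_units))
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


def spacer_composition(repeat: str) -> Counter:
    """Count the X (spacer) residues, i.e. every first residue of each unit."""
    return Counter(repeat[0::2])


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement.

    population: total residues considered (e.g., in the proteome)
    successes:  residues of the amino acid of interest in that population
    draws:      spacer positions sampled from the repeats
    k:          spacer positions occupied by the amino acid of interest
    """
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


# Toy sequence, not a real protein.
for start, repeat in find_xrk_repeats("MAAPRPRPRPRPRGG"):
    print(start, repeat, dict(spacer_composition(repeat)))
```

Exact binomial coefficients are convenient for small counts; for proteome-scale numbers a numerical library would be more practical.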

Data mining
Data mining was performed to extract acidic stretches from the nucleolar proteome. We analyzed protein sequence data by importing them from UniProt into Spyder (Python 3.7.3) in Anaconda (https://docs.anaconda.com/anaconda/).
First, the UniProt application programming interface (API) was used to select proteins localized to the nucleolus from the (PR) 50 and (YR) 50 interactomes detected by LC/MS. Next, sequence information was extracted from UniProt for the 100 most and 100 least enriched proteins in (PR) 50 compared with (YR) 50 , and the length of the longest consecutive acidic amino acid (D/E) sequence in each protein was examined. Lengths of contiguous D/E sequences were similarly examined in the total human proteome (20,327 proteins). Frequencies of occurrence of each amino acid in these 200 nucleolar proteins were also examined and compared with frequencies in the entire human proteome (20,327 proteins).
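
The per-protein measurements described above (longest contiguous D/E stretch and amino acid frequencies) can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' code: the function names are hypothetical and the sequences are toy strings, and it leaves out how the sequences were retrieved from UniProt.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def longest_acidic_stretch(sequence: str) -> int:
    """Length of the longest run made only of D and/or E (0 if there is none)."""
    return max((len(m.group()) for m in re.finditer(r"[DE]+", sequence)), default=0)


def stretch_histogram(sequences: Iterable[str]) -> Counter:
    """Map each longest-stretch length to the number of proteins showing it."""
    return Counter(longest_acidic_stretch(seq) for seq in sequences)


def residue_frequencies(sequences: Iterable[str]) -> Dict[str, float]:
    """Fraction of all residues contributed by each amino acid."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


# Toy sequences, not real proteins.
toy_proteins = ["MKDEEDAAEDK", "MKAAK", "AEEA"]
print(stretch_histogram(toy_proteins))
print(residue_frequencies(toy_proteins))
```

Applying the same functions to the enriched set, the depleted set, and the whole proteome gives directly comparable histograms and frequency tables.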

MD simulation
(PR) 12 and (GR) 12 were modeled using AVOGADRO software and studied at pH 7, with both N-terminal proline/glycine and C-terminal arginine residues being protonated. MD simulation was performed using GROMACS 2022.1 with the OPLS-AA/L all-atom force field. 67,70,71 A cubic box (20 × 20 × 20 Å³) with periodic boundary conditions applied in all directions was used for all simulations. The entire system was solvated with an explicit SPC (Single Point Charge) water model, and neutralization was achieved by addition of chloride ions. First, we performed energy minimization with the steepest descent method until the maximum force was <1,000 kJ/mol. We pre-equilibrated the systems for 100 ps at a constant temperature of 300 K using the V-rescale coupling method in a canonical (NVT) ensemble and 100 ps at a constant pressure of 1 bar using the