Structural and functional similarities and differences in nucleolar Pumilio RNA-binding proteins between Arabidopsis and the charophyte Chara corallina


 
 Pumilio RNA-binding proteins are evolutionarily conserved throughout eukaryotes and are involved in RNA decay, transport, and translation repression in the cytoplasm. Although a majority of Pumilio proteins function in the cytoplasm, two nucleolar forms have been reported to have a function in rRNA processing in Arabidopsis. The species of the genus Chara have been known to be most closely related to land plants, as they share several characteristics with modern Embryophyta.
 
 
 In this study, we identified two putative nucleolar Pumilio protein genes, namely, ChPUM2 and ChPUM3, from the transcriptome of Chara corallina. Of the two ChPUM proteins, ChPUM2 was most similar in amino acid sequence (27% identity and 45% homology) and predicted protein structure to Arabidopsis APUM23, while ChPUM3 was similar to APUM24 (35% identity and 54% homology). The transient expression of 35S:ChPUM2-RFP and 35S:ChPUM3-RFP showed nucleolar localization of fusion proteins in tobacco leaf cells, similar to the expression of 35S:APUM23-GFP and 35S:APUM24-GFP. Moreover, 35S:ChPUM2 complemented the morphological defects of the apum23 phenotypes but not those of apum24, while 35S:ChPUM3 could not complement the apum23 and apum24 mutants. Similarly, the 35S:ChPUM2/apum23 plants rescued the pre-rRNA processing defect of apum23, but 35S:ChPUM3/apum24+/− plants did not rescue that of apum24. Consistent with these complementation results, a known target RNA-binding sequence at the end of the 18S rRNA (5′-GGAAUUGACGG) for APUM23 was conserved in Arabidopsis and C. corallina, whereas a target region of ITS2 pre-rRNA for APUM24 was 156 nt longer in C. corallina than in A. thaliana. Moreover, ChPUM2 and APUM23 were predicted to have nearly identical structures, but ChPUM3 and APUM24 have different structures in the 5th C-terminal Puf RNA-binding domain, which had a longer random coil in ChPUM3 than in APUM24.
 
 
 ChPUM2 of C. corallina was functional in Arabidopsis, similar to APUM23, but ChPUM3 did not substitute for APUM24 in Arabidopsis. Protein homology modeling showed high coverage between APUM23 and ChPUM2, but displayed structural differences between APUM24 and ChPUM3. Together with the protein structure of ChPUM3 itself, a short ITS2 of Arabidopsis pre-rRNA may interrupt the binding of ChPUM3 to 3′-extended 5.8S pre-rRNA.



Background
Pumilio proteins are a family of RNA-binding proteins that are evolutionarily conserved in eukaryotes [1]. Typical Pumilio proteins have tandem repeats of 8 Puf domains that recognize 8 RNA bases, and each Puf domain contains 35-39 amino acids that form three α-helical structures [2,3]. The basis of RNA recognition by these proteins is the crescent-shaped structure [4,5]. The conserved aromatic and basic amino acids on the concave side of the crescent structure interact with RNA, whereas the amino acids on the convex side interact with partner proteins. Although Pumilio proteins have a variety of biological roles, their major molecular functions are mRNA decay and localization, translational repression [6,7], and rRNA processing [8]. Most of the Pumilio proteins are localized in the cytoplasm and are involved in the posttranscriptional regulation of mRNA. However, a small subset of these proteins is localized in the nucleolus and participates in rRNA processing. For instance, nucleolar Nop9 of yeast [9] and TbPUF7 of trypanosomes [8] are involved in 18S rRNA biosynthesis and ribosome maturation through proper pre-rRNA processing. In plants, two nucleolar Pumilio proteins have been implicated in rRNA processing [10][11][12][13][14][15], including Arabidopsis APUM23, a homolog of yeast Nop9, and APUM24, a homolog of human Puf-A and yeast Puf6. APUM23 not only is required for normal growth patterning, such as leaf development and organ polarity [10,16] but also is involved in ABA signaling [17]. APUM24 is essential for plant development, as its homozygous mutant displays embryo lethality [15]. APUM24 is implicated in the maturation of 5.8S and 25S rRNAs, while APUM23 participates in the processing of 18S and 5.8S rRNAs.
Pre-rRNA is a long single-stranded RNA transcribed from the rDNA repeat in the nucleolus. This transcript is subsequently cleaved into three mature rRNAs (5.8S, 18S, and 25S) by endoribonucleolytic activities [18,19]. Misprocessed rRNA byproducts that are produced during rRNA processing are degraded by 5′-to-3′ and 3′-to-5′ exoribonucleolytic activities. These two pre-rRNA processing activities require additional accessory proteins, such as RNA exosome components, Pumilio proteins, and many RNA-binding proteins. It has been reported that Arabidopsis and rice show similar pre-rRNA processing pathways, probably due to the similar flanking sequences around the endocleavage sites of A2 and A3 in ITS1 [20], suggesting that RNA binding specificity is essential for the selection of cleavage sites. Recently, APUM23 was found to bind 11 nt in the 18S rRNA at positions 1141-1151 [12], and APUM24 interacts with rRNA segments encompassing the 5.8S and ITS2 regions [15]. Therefore, it is likely that APUM23 and APUM24 play crucial roles in the recruitment of target RNA sequences and interacting proteins for the maturation of 18S and 5.8S rRNAs in Arabidopsis.
Approximately 450 million years ago, land plants evolved from Charophyta living in freshwater and adapted to the terrestrial environment [21][22][23][24]. Charophyta shares numerous molecular and physiological characteristics with living land plants that are not found in Chlorophyta [25,26]. Charophyta shows high sequence similarities to land plants in plastidal atpB and rbcL, mitochondrial nad5, and nuclear-encoded small subunit rRNA genes [27].
In this study, we found that all the green plants (Viridiplantae) examined have two putative nucleolar Pumilio proteins homologous to Arabidopsis APUM23 and APUM24. Consistent with this, two nucleolar Pumilio genes, namely, ChPUM2 and ChPUM3, were identified in Chara corallina by transcriptome analysis. We postulated that the two Pumilio proteins encoded by these genes might be evolutionarily and functionally conserved, as their Arabidopsis homologs play crucial roles in the pre-rRNA processing required for proper protein synthesis. Transiently expressed ChPUM2-RFP and ChPUM3-RFP were localized in the nucleoli of Nicotiana benthamiana leaf cells, suggesting a nucleolar function for ChPUM2 and ChPUM3. The apum23 mutant transformed with 35S:ChPUM2 recovered its defective rRNA processing and morphological phenotypes to normal levels. However, the rRNA processing defects and embryo lethality of the apum24 mutant were not rescued by 35S:ChPUM3 or 35S:ChPUM2. Consistent with the failure of complementation of apum24 with 35S:ChPUM3, APUM24 has different domain structures at the C-terminus from ChPUM3. Moreover, the target ITS2 region of Arabidopsis pre-rRNA is 156 nt shorter than that of C. corallina and might not be sufficient for the binding of ChPUM3.

Phylogeny of nucleolar Pumilio proteins
Pumilio proteins are ubiquitous in eukaryotic organisms, albeit in different numbers [1,5]. Among the organisms whose whole genome sequences are available, higher plants have a higher number of Pumilio proteins than photosynthetic single-cell organisms and nonplant organisms; for example, 25 Pumilio proteins are found in Arabidopsis thaliana, 20 in Oryza sativa, 14 in Physcomitrella patens, 5 in Chlamydomonas reinhardtii, 11 in Caenorhabditis elegans, 7 in Saccharomyces cerevisiae, and 2 in humans [10]. Based on a similarity search using Arabidopsis nucleolar Pumilio proteins (APUM23 and APUM24) as queries and the existence of a nucleolar localization signal(s) (NoLS) as a requirement [28], the green plants whose genomes have been sequenced (Phytozome v12.1; https://phytozome.jgi.doe.gov) were shown to have two putative nucleolar Pumilio proteins. Using PacBio Iso-Seq analysis, we also identified two putative nucleolar Pumilio proteins out of four Pumilio proteins in C. corallina. Consistent with our transcriptome analysis, four Pumilio proteins were predicted in the Chara genome data, including 2 nucleolar forms [26]. When compared with Arabidopsis APUMs, comprising 25 Pumilio proteins, ChPUM2 and ChPUM3 displayed high homology with APUM23 and APUM24, respectively, while ChPUM1 and ChPUM4 belonged to other distinct clades (Additional file 1: Figure S1).
To gain insight into the evolutionary relationship of putative nucleolar Pumilio proteins, we constructed a phylogenetic tree with the homologous proteins of 14 species of green plants together with 5 outgroup species that contain a NoLS(s) [28] (Fig. 1a and b). All green plants analyzed in this study had two proteins belonging to the APUM23 and APUM24 clades. The ChPUM2 of C. corallina was closer to APUM23 than APUM24, whereas ChPUM3 was categorized in the APUM24 clade. Phylogenetic analyses using ChPUM2 and ChPUM3 indicated that C. corallina is closer to land plants than other green algae examined in this study, suggesting that the evolution of nucleolar Pumilio proteins is consistent with previously determined phylogenetic positioning [21][22][23][24][25].
We then compared the number and position of Puf domains in ChPUM2, ChPUM3, APUM23, and APUM24 using the SMART web program (http://smart.embl-heidelberg.de) [30]. APUM23 and APUM24 have six and five Puf domains, respectively, and ChPUM2 and ChPUM3 have one and five Puf domains, respectively. As each Puf domain is known to recognize a single RNA base [3], this observation raised the possibility that ChPUM2 may have a distinct RNA binding property from APUM23 and that ChPUM3 may bind similar, if not identical, RNA motifs as APUM24 (Fig. 1c).

Structure of ChPUM2 and ChPUM3
Classic structural analysis of Pumilio proteins shows a tandem repeat of 8 Puf domains in the C-terminal region [31,32]. However, a recent analysis of human Puf-A and yeast Puf6 identified 11 Puf domains, including 3 additional domains, in these Pumilio proteins involved in pre-rRNA processing [14]. A similar analysis previously performed for APUM23 showed 10 Puf domains instead of the six previously known domains [10,12]. Consistent with a close phylogenetic relationship between APUM23 and ChPUM2 (Fig. 1a and Additional file 1: Figure S1), ChPUM2 also contained 10 Puf domains that showed an identical distribution as in APUM23 (Fig. 2a and Additional file 2: Figure S2a). Each domain of ChPUM2 showed an average 26% identity and 41% homology with the corresponding domain of APUM23. Notably, a high degree of homology was found in the 1st, 2nd, and 5th residues of the 2nd α-helix of each Puf domain (red boxes in Fig. 2a and Additional file 2: Figure S2a). These three residues are known to play a pivotal role in the recognition of RNA bases [14].
Using the same approach, ChPUM3 was shown to possess 11 Puf domains (N-R1 to N-R3 at the N-terminus and C-R1 to C-R8 at the C-terminus) (Fig. 2b and Additional file 2: Figure S2b), as reported in human Puf-A and APUM24 [14]. Comparison of amino acids in the 3 N-terminal Puf domains displayed an average identity of 47% and homology of 67% between APUM24 and ChPUM3, and that of the 8 C-terminal domains showed 39% identity and 55% homology. Out of the 11 Puf domains, two domains (C-R5 and C-R7) had lower identities than the other domains (Fig. 2b).
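The per-domain identity values quoted above come from comparing aligned residues. As a minimal sketch of how such a figure can be computed, the following Python function counts identical residues over the columns in which neither sequence has a gap. The function name, the gap-handling convention, and the two short sequences are illustrative assumptions; they are not the procedure or data of this study, and different tools use different denominators.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same pairwise alignment, so they
    must have equal length. Other conventions (e.g. dividing by the full
    alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (not a real Puf domain).
toy_a = "MKTLLA-VG"
toy_b = "MRTLIAGVG"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

In this toy case eight columns are gap-free and six of them match. The "homology" percentages in the text additionally count conservative substitutions, which would require a substitution matrix and is not covered by this sketch.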
Comparison of amino acid sequences among Puf domains suggested that ChPUM2 and ChPUM3 may be functional homologs of Arabidopsis APUM23 and APUM24, respectively (Fig. 2a and b), which is consistent with the observation obtained from phylogenetic analysis. However, since ChPUM2 and APUM23 contain different numbers of Puf domains, unlike ChPUM3 and APUM24, in the classic domain analysis (Fig. 1c), they may bind distinct RNA substrates. Therefore, to determine the structural relationship between nucleolar ChPUM and APUM proteins, we predicted the tertiary structure of these proteins using the SWISS-MODEL web server (https://swissmodel.expasy.org/) [33]. A previous high-resolution structural study demonstrated that the C-shaped structure of APUM23 has a long chain between the 2nd and 3rd α-helix of the R3 domain that participates in the recognition of RNA bases [12]. Homology modeling revealed a high similarity between ChPUM2 and the APUM23 reference protein, as well as a C-shaped structure similar to that of APUM23, in the 3-dimensional structure (Fig. 3a). Compared with APUM23, ChPUM2 has a long random coil in the R3 domain (red colored lines in the bottom panels of Fig. 3a), but it maintains uninterrupted 2nd and 3rd α-helical structures in this domain, similar to APUM23. Therefore, analyses of consensus amino acid sequences and homology modeling suggest that ChPUM2 may recognize similar, if not identical, RNA bases.
In contrast to the C-shaped configuration of ChPUM2 and APUM23, an L-shaped structure was predicted for ChPUM3 and APUM24, similar to the human Puf-A reference protein [14] (Fig. 3b). The most marked structural differences between ChPUM3 and APUM24 were found in the C-R5, N-R2, and N-R3 domains. ChPUM3 had a longer random coil in the C-R5 domain than APUM24 (Fig. 2b and the dotted circles in Fig. 3b). Additionally, ChPUM3 contained negatively charged (E210) and uncharged (Q249) amino acids in the N-R2 and N-R3 domains (Fig. 2b), instead of the basic amino acids that are known to be involved in RNA binding and found at both sites of APUM24 and the human Puf-A reference [14]. Thus, it appeared that ChPUM3 has different RNA binding specificity from APUM24, considering the chain length of C-R5 and the lack of basic amino acids in 2 N-terminal Puf domains.
Fig. 1 (caption, beginning not recovered) ... The phylogenetic trees were constructed using the maximum likelihood LG + G model using MEGA7 software [29] with 1000 bootstrapping replicates. Two independent nucleolar Pumilio proteins of red algae (Chondrus crispus and Galdieria sulphuraria), Drosophila melanogaster, Homo sapiens, and Saccharomyces cerevisiae were used as outgroups. c Primary protein structures of APUM23 and APUM24 from Arabidopsis thaliana and ChPUM2 and ChPUM3 from Chara corallina. Black hexagons indicate Puf RNA-binding domains

Subcellular localization of ChPUM2 and ChPUM3
Next, we examined the subcellular localization of ChPUM2-RFP and ChPUM3-RFP. Previously, the GFP fusions of APUM23 and APUM24 were shown to preferentially localize in the nucleoli of Arabidopsis root and tobacco leaf cells [10,15]. We performed Agrobacterium-mediated coinfiltration into N. benthamiana leaf cells using 35S:APUM23-GFP with 35S:ChPUM2-RFP, and 35S:APUM24-GFP with 35S:ChPUM3-RFP. All GFP- and RFP-tagged Pumilio proteins were found in the nucleoli and were weakly detected in the nucleoplasm (Fig. 4). The colocalization results suggest that ChPUM2 and ChPUM3 may play similar roles in pre-rRNA recognition and processing to APUM23 and APUM24.

Discussion
Based on our transcriptome data, public databases, and previous results [10,15,35], we found that the green plants examined in this study have two nucleolar Pumilio proteins. Protein phylogeny showed a closer relationship of nucleolar Pumilio proteins of land plants with multicellular C. corallina than with single-celled green algae (Chlorophyta). This result is consistent with a previous report showing that the genus Chara is more closely related to land plants than other green plants [27]. Although the possibility that certain putative nucleolar Pumilio proteins do not have a nucleolar function has not been ruled out, land plant species appear to have evolved two Pumilio proteins for the removal of aberrant pre-rRNAs.
We demonstrated that ChPUM2 is a functional ortholog of APUM23 in Arabidopsis cells, as evident from the restoration of defective pre-rRNA processing and morphology of apum23 in 35S:ChPUM2/apum23 plants. Arabidopsis nucleolar Pumilio proteins are known to play a role in recognizing the target sequences on pre-rRNA and recruiting catalytic proteins such as exoribonuclease [10,12,13,15]. Comparison of pre-rRNA identified a target sequence (5′-GGAAUUGACGG-3′) of APUM23 [12] in the 18S rRNA of C. corallina at positions 1148-1158 (Additional file 6: Figure S6). Therefore, ChPUM2 likely binds to this sequence in C. corallina. This assumption is supported by the primary structure of ChPUM2 and APUM23, each Puf domain of which is highly conserved between ChPUM2 and APUM23 except the fourth domain (Fig. 2a). Therefore, the common target rRNA sequence and similar amino acid composition of Puf domains might enable the restoration of apum23 phenotypes, including morphological and rRNA processing defects, to normal in 35S:ChPUM2/apum23. In apum23 complementation analysis using 35S:ChPUM2, the poly(A)-tailed 5.8S pre-rRNAs that accumulated in the apum23 mutant were not completely removed (Fig. 5e). This might be due to a weak interaction of ChPUM2 with other unknown proteins that are partners of intrinsic APUM23 and are specifically required for 5.8S rRNA processing. Indeed, the predicted structure of ChPUM2 has more unfolded chains than APUM23 (Fig. 3a), which may interfere with the interaction of ChPUM2 with other protein components. To verify this possibility in planta, it is worthwhile to identify and compare the components interacting with ChPUM2 in C. corallina and APUM23 in Arabidopsis.
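Locating the 11-nt APUM23 target sequence in an rRNA amounts to an exact substring search that reports 1-based coordinates, as used for the positions quoted above. The following Python sketch illustrates the idea; the function name and the short test sequence are hypothetical and are not the C. corallina 18S rRNA, and the sketch does not reproduce the alignment-based comparison shown in Additional file 6.

```python
def find_motif(rna, motif):
    """Return 1-based (start, end) positions of every exact motif occurrence.

    DNA-style input is accepted: T is converted to U before searching.
    Overlapping occurrences are reported as well.
    """
    rna = rna.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    hits = []
    start = rna.find(motif)
    while start != -1:
        # Convert the 0-based index to 1-based inclusive coordinates.
        hits.append((start + 1, start + len(motif)))
        start = rna.find(motif, start + 1)
    return hits


# Target sequence of APUM23 as given in the text (5'-GGAAUUGACGG-3').
APUM23_MOTIF = "GGAAUUGACGG"

# Hypothetical toy sequence written with T instead of U.
toy_rrna = "ACGU" * 2 + "GGAATTGACGG" + "CCAU"
print(find_motif(toy_rrna, APUM23_MOTIF))
```

An 11-nt motif starting at position 1148 ends at position 1158, which matches the interval reported for C. corallina. An exact search finds only perfectly conserved sites; diverged sites would require an alignment.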
Although ChPUM3 appeared structurally similar to APUM24 (Fig. 3b), ChPUM3 did not functionally replace APUM24 in Arabidopsis. We assume that this result is due to fine structural differences between ChPUM3 and APUM24. Typical Pumilio proteins bind to a specific RNA base with the second α-helix of the Puf domain, but APUM24 and its homologs are not capable of binding to a specific RNA base through this α-helix [14,15]. ChPUM3 does not complement the apum24 mutant perhaps due to (1) the very long random coil in the C-R5 domain, (2) the negatively charged and uncharged amino acids in two N-terminal domains, and (3) the long ITS2 sequence in C. corallina. First, a very long random chain at the C-R5 domain of ChPUM3 would interrupt the interaction of other C-terminal domains with RNA bases in the 5.8S-ITS2 junction of Arabidopsis pre-rRNA. Indeed, it was reported that human Puf-A and its homolog APUM24 have a long random coil in C-R5 that prevents the C-terminal Puf domains from binding to RNA [14]. ChPUM3 has a random coil that is 80 aa longer than that of APUM24 (Fig. 2b); thus, ChPUM3 may not recognize Arabidopsis pre-rRNA.
Second, in ChPUM3, the N-R2 and N-R3 of patch 1B include negatively charged (E210) and uncharged (Q249) amino acids, unlike the positive amino acids (K) at both positions of APUM24, which may result in differential binding characteristics from APUM24 toward the 5.8S-ITS2 region. The N-R2 and N-R3 domains of the human Puf-A protein are essential for RNA binding [14]. Third, ChPUM3 might be optimized for the recognition of long ITS2 sequences. ITS2 of C. corallina pre-rRNA is 156 nt longer than that of Arabidopsis (Additional file 7: Figure S7). In addition to the long side chain of the C-R5 domain in ChPUM3, the relatively short ITS2 sequence of Arabidopsis pre-rRNA may prevent ChPUM3 from binding to its substrate. Indeed, ITS2 evolved rapidly and has been used to evaluate genetic divergence [36,37].
(Figure caption, beginning not recovered) ... Figure S4a. b Confirmation of the expression of ChPUM2 and ChPUM3 transgenes in the apum24-1+/− mutant using RT-PCR. Original gel images are provided in Additional file 4: Figure S4b. c Siliques of Col-0 control, apum24-1+/−, and transgenic apum24-1+/− expressing 35S:ChPUM2 or 35S:ChPUM3. The right panels for each plant line are enlarged images of the boxed regions. Arrows and arrowheads indicate undeveloped ovules and aborted seeds, respectively. Note that none of the transgenics complemented the abnormal seeds to normal levels. d qRT-PCR for analyzing relative unprocessed rRNA levels in Col-0 control, apum24-1+/−, and 35S:ChPUMN/apum24-1+/− using the same primers that were used in Fig. 5. Two technical and three biological replicates were performed for PCR measurements. Values represent means ± SDs (n = 3) (**; p < 0.01)

Conclusions
In this study, we identified two nucleolar Pumilio proteins, namely, ChPUM2 and ChPUM3, from C. corallina that are phylogenetically and structurally close to the Arabidopsis nucleolar Pumilio proteins APUM23 and APUM24, respectively. Complementation analyses using 35S:ChPUM2 and 35S:ChPUM3 showed that ChPUM2 rescued the defective phenotypes of the apum23 mutant, but ChPUM3 did not restore the phenotypes of the apum24 mutant. Consistent with these complementation results, ChPUM2 showed Puf domain features similar to those of APUM23 in the primary amino acid sequence and the predicted 3-D protein structure. ChPUM3 has a long random coil in the C-R5 domain and contains distinct amino acids from those in APUM24 in the N-terminal domain. In addition to the structural difference between ChPUM3 and APUM24, a short ITS2 sequence of Arabidopsis pre-rRNA might prevent ChPUM3 from properly processing Arabidopsis 5.8S pre-rRNA. Taken together, the results show that ChPUM2 was functional in Arabidopsis, similar to APUM23, but ChPUM3 could not substitute for APUM24 in Arabidopsis. Further studies on the nucleolar functions of ChPUM2 and ChPUM3 in Charophyta will help us understand the evolution of rRNA processing in green plants.

... [29,39]. Amino acid sequence alignments were performed using ClustalW (http://www.clustal.org/) and edited using BioEdit software (https://bioedit.software.informer.com/).

Colocalization assay of ChPUM and APUM fusion proteins
The C-terminal RFP fusion proteins of ChPUM2 and ChPUM3 and C-terminal GFP fusions of APUM23 and APUM24 were transiently expressed in N. benthamiana leaves using agroinfiltration [41]. Briefly, cultures of Agrobacterium carrying fusion constructs were harvested at the stationary phase and resuspended in MMA buffer (10 mM MES, 10 mM MgCl2, and 150 μM acetosyringone) to OD600 = 0.8. For coexpression of ChPUM-RFP and APUM-GFP, equal volumes of two Agrobacterium cultures that had either the 35S:ChPUM-RFP or 35S:APUM-GFP construct were mixed before infiltration. Infiltration was performed on the abaxial side of tobacco leaves using a needleless syringe. Plants were kept in the dark at 22°C under high humidity for 30-34 h, and the infiltrated leaves were observed under a fluorescence microscope.
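Adjusting a harvested culture to a target OD600 is a simple proportional calculation. The Python sketch below estimates the buffer volume in which pelleted cells would be resuspended, under the assumption that OD600 scales linearly with cell density over the measured range. The function name and the example numbers are hypothetical; the text specifies only the target value of 0.8.

```python
def resuspension_volume_ml(measured_od, culture_volume_ml, target_od=0.8):
    """Buffer volume (mL) giving target_od after pelleting and resuspension.

    Assumption: OD600 is proportional to cell density, so
    measured_od * culture_volume = target_od * final_volume.
    """
    if measured_od <= 0 or culture_volume_ml <= 0 or target_od <= 0:
        raise ValueError("all inputs must be positive")
    return measured_od * culture_volume_ml / target_od


# Hypothetical example: 10 mL of culture measured at OD600 = 2.0.
volume = resuspension_volume_ml(measured_od=2.0, culture_volume_ml=10.0)
print(f"Resuspend the pellet in {volume:.1f} mL of MMA buffer")
```

Note that when two suspensions adjusted in this way are mixed in equal volumes for coexpression, each strain ends up at half of its adjusted density in the mixture.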

Quantitative reverse transcriptase-PCR (qRT-PCR) for analyzing unprocessed rRNA
Total RNA was isolated using the RNeasy Plant Mini Kit (Qiagen, cat. # 74904) from 100 mg of seedlings and treated with 2 units of RNase-free TURBO™ DNase (Ambion, cat. # AM2238) in a 50 μL reaction at 37°C for 50 min. First-strand cDNA was synthesized from 5 μg of total RNA using the oligo(dT)18 primer in a 20 μL reaction and diluted 3-fold. Then, 1 μL of cDNA was mixed with 0.6 μL of 10 mM primers and 10 μL of 2× SYBR® Green Supermix (Bio-Rad, cat. # 172-5261) in a 20 μL reaction and subjected to PCR according to the manufacturer's instructions. For the detection of unprocessed poly(A) rRNAs, three different combinations of primers (5′ETS/18S, 18S/ITS1, and 5.8S/ITS2) were used. Tubulin (Tub4, At5g44340) cDNA was used as an internal control. For qPCR measurements, two technical and three biological replicates were used. Data were calculated using the 2^(−ΔΔCT) method [42].
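The relative quantification named above can be illustrated with a short calculation. The Python sketch below applies the 2^(−ΔΔCT) formula to mean CT values of a target amplicon and a reference gene in a sample and in a control. The function name and all CT values are hypothetical and are not data from this study; averaging replicates before subtraction is one common convention and is an assumption here.

```python
from statistics import mean


def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative level of a target by the 2^(-ddCT) calculation.

    Each argument is a list of replicate CT values. Replicates are averaged
    first (an assumed convention), then:
        dCT  = CT(target) - CT(reference)
        ddCT = dCT(sample) - dCT(control)
        fold = 2 ** (-ddCT)
    """
    dct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    dct_control = mean(ct_target_control) - mean(ct_ref_control)
    ddct = dct_sample - dct_control
    return 2 ** (-ddct)


# Hypothetical CT values: a mutant sample versus a wild-type control,
# with a reference gene (such as a tubulin) measured in both.
fold = fold_change_ddct(
    ct_target_sample=[22.0, 22.0],
    ct_ref_sample=[18.0, 18.0],
    ct_target_control=[25.0, 25.0],
    ct_ref_control=[18.0, 18.0],
)
print(f"Relative level in sample versus control: {fold:.1f}-fold")
```

With these illustrative numbers, the target crosses the threshold three cycles earlier in the sample after normalization, which corresponds to an 8-fold higher relative level. The calculation assumes roughly 100% amplification efficiency for both primer pairs.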