High resolution RNA‐seq profiling of genes encoding ribosomal proteins across different organs and developmental stages in Arabidopsis thaliana

Abstract In Arabidopsis thaliana, each ribosomal protein (RP) is encoded by a small gene family consisting of two or more highly homologous paralogues, which results in ribosome heterogeneity. It is largely unknown that how genes from multiple member containing RP families are regulated at transcriptional level to accommodate the needs of different plant organs and developmental stages. In this study, we investigated the transcript accumulation profiles of RP genes and found that the expression levels of RP genes are varied dramatically in different organs and developmental stages. Although most RP genes are found to be ubiquitously transcribed, some are obviously transcribed with spatiotemporal specificity. The hierarchical clustering trees of transcript accumulation intensity of RP genes revealed that different organs and developmental stages have different population of RP gene transcripts. By interrogating of the expression fluctuation trend of RP genes, we found that in spite of the fact that most groups of paralogous RP genes are transcribed in concerted manners, some RPs gene have contrasting expression patterns. When transcripts of paralogous RP genes from the same family are considered together, the expression level of most RP genes are well‐matched but some are obviously higher or lower, therefore we speculate that some superfluous RPs may act outside the ribosome and a portion of ribosomes may lack one or even more RP(s). Altogether, our analysis results suggested that functional divergence may exist among heterogeneous ribosomes that resulted from different combination of RP paralogues, and substoichiometry of several RP gene families may lead to another layer of heterogeneous ribosomes which also have divergent functions in plants.


| INTRODUC TI ON
Ribosome is a ribonucleoprotein complex comprising a large and a small subunit, and is essential for catalyzing the peptidyl transferase reaction during polypeptide synthesis in all living cells. In plants, the large ribosomal subunit is composed of 48 RPL (Ribosomal Protein of Large subunit) proteins in conjunction with three rRNAs (25S, 5.8S, and 5S), whereas the small subunit is composed of 33 RPS (Ribosomal Protein of Small subunit) proteins in conjunction with the 18S rRNA (Chang et al., 2005;Savada & Bonham-Smith, 2014).
Biogenesis of cytoplasmic ribosome is a highly orchestrated process involving the coordinated production and transport of four rRNAs and 81 RPs (Saez-Vasquez & Delseny, 2019).
In Arabidopsis thaliana, each RP is encoded by a small gene family containing two or more highly homologous family members (Barakat et al., 2001). The presence of multiple gene paralogues for RPs in plants, which leads to the production of heterogeneous ribosomes (Giavalisco et al., 2005), might be a consequence of high frequency of ancestral polyploidy events and could reflect a need to maintain adequate ribosome dose or to maintain some degree of ribosome heterogeneity (Blanc & Wolfe, 2004;Horiguchi et al., 2011Horiguchi et al., , 2012Martinez-Seidel et al., 2020;Thomas et al., 2006;Xue & Barna, 2012). Recently, many studies suggested that a number of individual animal RPs have wideranging extraribosomal functions in processes such as transcription, translation, mRNA processing, DNA repair, apoptosis, and tumorigenesis (Aseev & Boni, 2011;Lindstrom, 2009;Lu et al., 2015;Naora, 1999;Warner & Mcintosh, 2009;Yang et al., 2019). Although RPs were normally targeted to the nucleolus for cytoplasmic ribosomal subunit assembly with rRNAs (Degenhardt & Bonham-Smith, 2008;Kruger et al., 2007;Lam et al., 2007;Savada & Bonham-Smith, 2014), some RPs could be secreted out of the cell (Dai et al., 2010), all of which may suggest extraribosomal functions of RPs.
The complexity of the plant ribosome biogenesis together with extraribsomal functions of RP genes raise an important question: how the duplicated RP genes are regulated at the transcriptional level to coordinate the needs of cells for specific RPs or specific RP paralogues?
By analyzing the expressed sequence tag (EST) data and the complete genomic sequence of Arabidopsis, previous studies have identified 249 genes (including some pseudogenes) corresponding to different cytoplasmic RP types, and found only 52 RP genes lack a matching EST accession and 19 of these contain incomplete open reading frames.
These results confirm that most RP genes are expressed (Barakat et al., 2001). A group of researchers identified 996 putative RP genes spanning 79 distinct RPs in Brassica napus using EST data from 16 tissue collections (Whittle & Krochko, 2009). Comparative analysis of the transcript levels according to EST data for Brassica napus RPs revealed that a large fraction of RP genes is differentially expressed and that the number of paralogues expressed for each RP varied extensively with tissue types. Using Genevestigator (Hruz et al., 2008) to analyze Arabidopsis 22k microarray data, another group of researchers studied transcript levels of 192 of the 254 Arabidopsis RP genes and revealed that transcript levels from different RP genes show up to a 300-fold difference across the RP population (Savada & Bonham-Smith, 2014).
Despite these studies on the steady state levels of RP transcripts, so far, it is still not clear about how coordinated expression of RPs from multigene families is accomplished. The genome-wide expression profiles across 79 different tissues and developmental stages using highthroughput transcriptome sequencing (Klepikova et al., 2016) provide the opportunity to comprehensively understand the expression of the duplicated RP genes in Arabidopsis thaliana. In this study, we analyzed these data and found that duplicated RP genes are transcribed dynamically in different cell types with some degree of function diversity and co-expression patterns. Furthermore, our analysis results suggested that transcript insufficiency of several RP families brings the possibility of the second layer of heterogeneous ribosomes, which may also have divergent functions.

| RP gene list
The RP sequences of Arabidopsis thaliana were compiled based on the Arabidopsis Genome Initiative (AGI) identification numbers provided by previous researches (Barakat et al., 2001;Browning & Bailey-Serres, 2015;Whittle & Krochko, 2009). We focused on a total number of 240 RP genes which have definite AGI numbers or have detected transcripts (N RP genes = 256, N Pseudogenes =16).

| Heatmap and hierarchical clustering
The original RPKM was log 2 transformed. A heatmap with the hierarchical cluster tree was made using the R package pheatmap with log 2 (RPKM) value of RP genes with the parameters "clustering_method =complete","clustering_distance_rows =euclidean".

| Boxplot
The boxplot figure showing the overall expression level of RP genes across different organs and developmental stages was generated by the R packages ggpubr. ggpubr: 'ggplot2' Based Publication Ready Plots. R package version 0.2.5. https://CRAN.R-proje ct.org/packa ge=ggpubr).

| Calculating the transcript levels of different RP gene families
The transcript levels (RPKM value) of different RP gene families from different tissues were combined (sum of individual family members) respectively. The transcripts stoichiometry of 81 RP gene families was calculated with RPKM value of each RP gene family divided by the median RPKM value of total RP gene families of each examined tissue.

| Code availability
RNA-seq data bioinformatic analyses in this study were performed by an integrated pipeline for next-generation sequencing analysis, pRNASeqTools v0.6 [https://github.com/grubb ybio/pRNAS eqToo ls/] in its mapping only mode to get the reads count. This pipeline can be used freely under the MIT license.

| RE SULTS
We downloaded the original RNA-seq data of 79 different Arabidopsis thaliana tissues and developmental stages from NCBI Sequence Read Archive (SRA) (Klepikova et al., 2016). In order to maximize the representation of different organs and stages, and to provide insights into the dynamics of gene expression in the most important processes in the plant life cycle, the sequenced samples were selected from different developmental stages including seed germination, seed development, silique development, transition to flowering, flower development, ovule development. Furthermore, the samples were also chosen with special focus on tissues and stages not sampled in microarray-based transcriptome map (Schmid et al., 2005), for example, detailed shoot apical/inflorescence meristem series and leaf development series (Klepikova et al., 2016). Each sample had two biological replicates. Description of sequenced samples is listed in Data S1. The Arabidopsis RP genes (Table S1) used in this research is based on previous publications (Barakat et al., 2001;Browning & Bailey-Serres, 2015;Whittle & Krochko, 2009). Some of the RP gene AGI numbers have been updated/adjusted based on data from The Arabidopsis Information Resource (TAIR).

| The expression levels of RP genes across different organs and developmental stages vary substantially in Arabidopsis thaliana
In order to understand the overall expression levels of RP genes during temporal and spatial developmental processes, we calculated

| Most RP genes are ubiquitously expressed
In order to investigate the transcription spatiotemporal specificity of RP genes, we made a heatmap with hierarchical cluster trees of transcript accumulation intensity of RP genes across different organs and developmental stages ( Figure 2). We found that the majority of RP genes are ubiquitously expressed with exceptions that some RP genes are expressed with obviously spatiotemporal specificity. RP genes with extremely low transcripts accumulation levels or with obviously spatiotemporal specificities are shown in Figure S1. Take The hierarchical clustering tree of transcripts accumulation intensity of 240 different RP genes revealed that distinct members from different RP families cluster together, implying that these F I G U R E 1 A boxplot figure showing the overall transcripts accumulation level of 240 RP genes and total genes across 79 different tissues and developmental stages in Arabidopsis thaliana. The average median value of RPKM (Reads Per Kilobase per Million mapped reads) of 240 RP genes and total genes were calculated from two biological replicates for each sample. The median value of log 2 (RPKM) together with upper quartile and lower quartile were plotted against different samples. Blue, RP genes; yellow, total genes non-paralogous RPs have similar expression intensity crossing different organs and developmental stages. Meanwhile, the hierarchical clustering tree of the examined tissues reflected an organ-specific and developmental stage-specific structure, as the different tissues series are found to be organized into distinct clades. For example, different parts of root, different parts of leaf, different stages of F I G U R E 2 A heatmap with hierarchical cluster trees of transcripts accumulation intensity of RP genes across different organs and developmental stages. The value of log 2 (RPKM) of each RP gene was plotted against each examined sample. Yellow, relative high expression; black, relative low expression. The vertical hierarchical cluster tree shows the Euclidean distance of examined samples and the horizontal hierarchical cluster tree shows the Euclidean distance of different RP genes based on the log 2 (RPKM) value F I G U R E 3 Diversified spatiotemporal transcripts accumulation pattern within RPS15 family. A heat map with hierarchical cluster trees of transcripts accumulation intensity of RPS15 family containing six members (RPS15A, RPS15B, RPS15C, RPS15D, RPS15E, and RPS15F) was made. Yellow, relative high expression; black, relative low expression. The vertical hierarchical cluster tree shows the Euclidean distance of examined samples whereas the horizontal hierarchical cluster tree shows the Euclidean distance of different RP genes based on the log 2 (RPKM) value meistems are clustered together, respectively. Thus, different tissues and developmental stages have distinct population of RP gene transcripts. Ribosomes are highly heterogenous in Arabidopsis, each organ or developmental stage may need different association of non-paralogous RPs.

| Stoichiometric analysis of steady-state level of mRNAs between different RP families
Any given ribosome contains only a single polypeptide of most RPs except acidic ribosomal proteins P1 and P2 which form a heterodimer and two heterodimers are present per 60S subunit (Armache et al., 2010;Inglis et al., 2019). However, each one of the 81 Arabidopsis RPs is encoded by two or more paralogues. In order to maintain the appropriate stoichiometry of ribosome components, the amount of each RP family must be regulated at comparable level. Therefore, the transcript levels of RP gene families may be coordinately controlled. In this study, we combined transcript levels of each dry seeds (SD.d), and the value is greater than 1.5 in most examined samples (t-test, p < .05), (Data S2, and Figure S2). The value of the gene family RPLP2 is greater than 1 in all samples and is greater than 1.5 in near half of the examined tissues (t-test, p < .05), (Data S2, and Figure S2). The value of the plant specific gene family RPLP3 is >0.5 meanwhile <1 in all of the examined tissues (t-test, p < .05), (Data S2).
Interestingly, we found numbers of RP gene families, of which the values are greatly deviated from 1 (<0.5 or >2), are much larger in dry seeds (SD.d), senescent internodes (IN.sn), anthers of the mature flowers (F.AN.ad), and opened anthers (F.AN), where destructive metabolisms are the major biochemical reactions, than other examined tissues (Figure 4), suggesting that the stoichiometries of RP genes transcript accumulation levels are highly unequal in these tissues. The translation efficiencies of RP gene transcripts maybe different from each other (Fernie & Stitt, 2012). On what degree the stoichiometry inequality of RP gene transcripts affect RP stoichiometry at protein level needs to be investigated. RP families with higher transcript accumulation levels may have more RP proteins being translated, thus the amount of them maybe more than enough for the assembly of ribosome (the proposed hypothesis is shown in Figure 4g), suggesting these free RPs may act outside ribosomes. In contrast, RP families with extremely lower transcripts accumulation levels may have less RP proteins being translated, therefore they maybe not sufficient to be incorporated into ribosomes, causing a portion of ribosomes lacking these RPs (the proposed hypothesis is shown in Figure 4g). Apart from the heterogeneity of ribosomes contributed by different combination of non-paralogous RPs, substoichiometry of RP genes causes another layer of ribosome heterogeneity. Ribosomes lacking one or more RP(s) may not recognize some kinds of mRNAs or may have lower translating efficiency toward specific subpools of mRNAs as reported in mammalian cells (Kondrashov et al., 2011;Shi et al., 2017). Together, heterogeneous ribosomes contributed by some substoichiometric RPs and free RPs resulted from several superfluous RPs that are not incorporated into ribosomes, may play roles in regulating specific developmental stages or participating the establishment of specific organs.  (Thompson et al., 2016). Our analysis results suggested that a portion of paralogous RP genes might have undergone functional specialization whereas the majority of paralogous RP genes seemed to remain the same function.

| Synchronous and discrepant expression patterns coexist for RP genes
We also compared the expression fluctuation trend between different RP gene families. Interestingly, we found that the expression fluctuation trends of most RP genes from different families are similar ( Figure 7, Figures S3-S5), indicating they are regulated in highly coordinated manners. Thus, synchronous expression of RP genes may play an important role in ribosome biogenesis. It was reported that almost half of all Arabidopsis RP genes carry several clustered GCCCR motifs in their proximal promoters, which could be recognized by a transcription factor called TCP20 (Li et al., 2005). TCP20 mediated transcriptional regulation of RP gene expression is likely to contribute to the synchronous expression of RP genes. sn; (f) SD.d, of which the number of RP gene families largely deviating from 1 (<−0.5 or >2) is more than 5. (g), proposed hypothesis of free RPs which may act outside ribosomes and heterogeneous ribosomes contributed by some substoichiometric RPs. Relative level was calculated with RPKM value of each RP gene divided by the median RPKM value of total RP genes in each examined tissue. Error bars indicate SD from two independent experiments; asterisk indicates significant difference (t-test, p < .05) coli, RP genes are clustered together and are arranged into 20 operons with approximately half of the genes mapping to a single locus (Mager, 1988). The arrangement of prokaryotic RP genes in operons The expression of prokaryotic RP genes could be rapidly changed in response to stimuli such as nutrient availability (Nomura, 1999;Schmid et al., 2005). In yeast Saccharomyces cerevisiae, three-fourths (59/79) of the RPs are encoded by functionally duplicated genes of which are not transcribed at the same level Jimenez et al., 2003;Warner et al., 1985). In mammals, most functional RPs are encoded by a single gene although there are about 2,000 pseudogenes that maybe related to RP genes (Balasubramanian et al., 2009). It should be noted that few mammalian RP genes are encoded by more than one functional paralogues. For example, in human there are three paralogous genes encoding RPS4, namely RPS4X, RPS4Y1, and RPS4Y2, which are located on the X chromosome and the Y chromosome respectively (Ellis et al., 2010;Fisher et al., 1990). RPS4X and RPS4Y1 are found to be ubiquitously expressed; in contrast, the expression of RPS4Y2 is restricted to the testis and prostate, suggesting functional specialization of these paralogues.

| The complexity of RP gene expression patterns in
The existence of multigenes for each RP family in plants presents a picture of RP expression that may be more complex than those that were previously described for other species. The expression patterns of some RP genes have been investigated, primarily at the level of transcript abundance (Dresselhaus et al., 1999;Hulm et al., 2005;. However, the overall profiles of the coordinate response of RP genes to cell differentiation, growth, development remains to be comprehensively investi-

| Different organs and developmental stages demand different amount of ribosomes
Our analysis results revealed that transcripts accumulation levels of RP genes vary substantially across different organs and developmental stages. Anthers of mature flower (F.AN.ad) and opened anthers (F.AN), which are considered as highly differentiated organs, were found to have the lowest transcripts accumulation level of RP genes among the 79 examined samples with the median number of log 2 (RPKM) being only around 3.5. Whereas, anthers of the young flower (F.AN.y), of which the median number of log 2 (RPKM) is 8, have much higher transcripts accumulation levels of RP genes than those two development stages. RP transcripts accumulation level from dry seeds (SD.d) at dormant state, was the 3rd lowest among the examined samples. Thus, it is reasonable that the expression level of RPs from established and quiescent tissues are relatively low, a reduced level of ribosome is still sufficient to support the living of them. After soaking, seeds begin to germinate, the expression level of RPs from seeds at first day after soaking (SD.g1) was the first highest among the 79 examined samples cross different organs and different developmental stages. During the early stage of seed germination, initial protein synthesis is dependent on residual ribosomes within the cells of Note: RP family, with all members are transcribed in concerted manner, is listed in the "Similar pattern" group; whereas RP family, with one or more member(s) has/have different transcription manner(s), is listed in the "Different pattern" group.

| Potential ribosome heterogeneity resulted from substoichiometry of RPs
The appropriate stoichiometry of ribosomal proteins in the nucleolus is very important for efficient ribosome biogenesis. It is generally considered that equal molar ratio of each RP family exists in assembled ribsomes. Our analysis results reveal that transcript F I G U R E 7 Consensus expression patterns of non-paralogous RP genes in different developmental stages of meristem. The RPKM value of paralogous and non-paralogous RP genes from different developmental stages of meristem was treated in accordance with Z-score using R packages TCseq with the parameters"algo = "cm", k = 6"s  Arabidopsis (Li et al., 2017). According to our analysis result, the expression level of RPL24 family was a little bit higher than the median level of total RP families in most examined tissues (Data S2), which could be the evidence supporting the existence and action of free RPL24 outside the ribosome. RPL10, which was demonstrated to act in antivirus defense (Zorzatto et al., 2015), was another example of extraribosomal function of RPs. Our analysis results showed that expression level of RPL10 family are obviously higher than expected from a stoichiometric view of ribosome organization ( Figure S2 and Data S2).

| Functional redundant and functional specialization co-exist within paralogous RPs
Duplication of RP genes is originated from gene or genome duplication events (Blanc & Wolfe, 2004). Gene balance pressures together with dosage effects could explain the high rates of retention of duplicated RP genes in the genome (Birchler et al., 2001(Birchler et al., , 2007Conant & Wolfe, 2008;Veitia et al., 2008). The non-allelic noncomplementation phenomenon, which may indicate dosage effects between paralogous genes, was found in members within RP families, such as RPL5 (Fujikura et al., 2009) showed that paralogous RPL36a genes are transcribed in a concerted manner. Knockdown of RPL23aA resulted in severely developmental phenotypes whereas knockout of RPL23aB has no obvious phenotype (Degenhardt & Bonhamsmith, 2008a, 2008b. Interestingly, RPL23aA and RPL23aB were found to be transcribed in a concerted manner with much higher expression level of RPL23aA than that of RPL23aB, and over-expression of RPL23aB in rpl23aa could rescue the phenotype of rpl23aa (Xiong et al., 2020). We thought paralogous RP genes with concerted expression patterns may be the evidence supporting the dosage hypothesis which posits that the presence of duplicated RP genes might be necessary to maintain adequate RP dosages with functional redundant. Meanwhile, we also found a lot of paralogous RP genes are transcribed in highly contrasting manners, indicating function divergences exist within these paralogous RP genes. Plant paralogous RPL10 genes were found to have different roles during development and responds differently to ultraviolet-B stress Ferreyra, Pezza, et al., 2010). Consistent with genetic and functional studies of paralogous RPL10 genes, our analysis results revealed that paralogous RPL10 genes have contrary expression patterns (Table 1). Divergence in expression patterns is believed to be an important evidence proving functional specialization of duplicated genes (Adams et al., 2003;Blanc & Wolfe, 2004;Casneuf et al., 2006). Together, our analysis results suggested that functional redundant and functional specialization may co-exist within paralogous RPs.

ACCE SS ION NUMBER S
The original RNA-seq data of A. thaliana different organs and developmental stages were downloaded from NCBI Sequence Read Archive (project ID PRJNA314076 for samples except meristem and project ID PRJNA268115 for the meristem samples).

ACK N OWLED G M ENTS
We thank Dr. Lei Gao, Dr. Chenjiang You, and Dr. Yang Liu from Shenzhen University for their help with data analysis. This work was supported by the Guangdong Innovation Team Project

CO N FLI C T O F I NTE R E S T
The authors have declared no conflict of interest.