A Single Gene Codes for Two Forms of Rat Nucleolar Protein B23 mRNA*

Protein B23 (38 kDa, pI = 5.1) is an abundant RNA-associated nucleolar phosphoprotein and putative ribosome assembly factor. A full length cDNA clone (lambda JH1) encoding a major expressed form of rat protein B23, now designated B23.1, was reported recently (Chang, J. H., Dumbar, T. S., and Olson, M. O. J. (1988) J. Biol. Chem. 263, 12824-12837). In this paper the isolation from a rat brain library and sequence of a cDNA clone (lambda JH2) coding for a second form (B23.2) of protein B23 is reported. Isoforms B23.1 and B23.2 are polypeptides of 292 and 257 amino acids, respectively. The 5'-untranslated regions of the two cDNAs and the amino-terminal 255 amino acids of the proteins are identical in the two isoforms. However, the 3'-untranslated regions of the mRNAs are completely different, and the dipeptide Gly-Gly in B23.1 (residues 256 and 257) is replaced by Ala-His in B23.2, indicating that the former is not a precursor of the latter. The finding of AGGT sequences in the 3' regions of lambda JH1 suggests the presence of intron-exon boundaries at the point where the two cDNAs begin to differ. To investigate the origin of the two isoforms, two rat genomic libraries were screened with oligonucleotide probes based on sequences from the unique regions of the two cDNAs. One of the genomic clones isolated (lambda JH125) contained a 6.5-kilobase fragment encoding the 3' end of both cDNAs. lambda JH125 contains four exons designated W, X, Y, and Z in the order indicated. Exons W and X encode 36 amino acids at the carboxyl terminus of B23.2, whereas exons W, Y, and Z encode the carboxyl-terminal 71 amino acid residues of B23.1. Exons X and Z each contain distinct 3'-untranslated sequences in which polyadenylation signals are found. These data suggest that two different mRNAs are formed by alternative splicing of separate 3' segments onto a common 5' region.

(Received for publication, March 13, 1989)
Jin-Hong Chang and Mark O. J. Olson‡
From the Department of Biochemistry, The University of Mississippi Medical Center, Jackson
* This work was supported by National Institutes of Health Grants 5 R01 GM28349 and S07 RR05386 and BIONET Grant 1 U41 RR-01685. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J04943 and J04944.
‡ To whom correspondence should be addressed: Dept. of Biochemistry, The University of Mississippi Medical Ctr., 2500 N. State St., Jackson, MS 39216-4505. Tel.: 601-984-1521.
Proteins B23 and nucleolin (C23) are major RNA-associated nucleolar phosphoproteins and putative ribosome assembly factors. Immunoelectron and immunofluorescence microscopy have shown that protein B23 is predominantly located in the granular region of the nucleolus, whereas nucleolin is largely present in the fibrillar region (Spector et al., 1984; Escande et al., 1985). The granular location of protein B23 is also consistent with biochemical studies which showed it associated with preribosomal ribonucleoprotein particles (Prestayko et al., 1974; Yung et al., 1986). Since the nucleolar granular component contains the more mature preribosomal particles, protein B23 may be involved in the later stages of ribosome assembly.
The concentration of protein B23 in the nucleolus varies depending on the physiological conditions of the cell. On the one hand, this is achieved by regulation of the rate of synthesis; e.g. phytohemagglutinin stimulation of B and T cells increases the rate of protein B23 synthesis whereas growth arrest has the opposite effect (... Mond, 1987a, 1987b). On the other hand, a translocation mechanism operates to remove the protein from the nucleolus to the nucleoplasm during inhibition of preribosomal RNA synthesis (Yung et al., 1985). Thus, the nucleolar level of protein B23 appears to be tightly controlled by the cell.
Protein B23 is expressed in at least two isoforms. Chan et al. (1986a) observed two electrophoretic variants (α and β) of rat protein B23 which differ by 2 kDa in molecular mass, although these have virtually identical isoelectric points. A third form, designated γ, having the same molecular weight but a different isoelectric point than isoform β has been found in HeLa cells. The presence of at least two variants was confirmed when two different human B23 partial cDNAs were isolated and sequenced (Chan et al., 1986b).
Recently, a full length cDNA for rat protein B23 was isolated and sequenced (Chang et al., 1988). During the course of screening a rat brain cDNA library a second, apparently full length, cDNA clone was observed. The initial purpose of the present study was to determine how the second cDNA differed from the previously isolated cDNA. It was found that the two cDNAs contain identical 5' regions and distinct 3' regions. This led to a second question: do the two isoforms arise from separate genes or from different expressed segments within a single gene? It was found that, at the genomic level, the identical 5' regions and distinct 3' segments of both cDNAs are contained in adjacent segments of DNA separated by intervening sequences. This suggests that two forms of protein B23 mRNA arise from differential splicing of separate 3' segments onto a common 5' region.

EXPERIMENTAL PROCEDURES
Isolation and Characterization of cDNA Clones-A rat brain cDNA library in λgt11 (a generous gift from Dr. S. Wilson, National Cancer Institute, National Institutes of Health) was initially screened by hybridization with a synthetic oligonucleotide (probe 22; Fig. 1A) labeled with 32P at the 5' end (Chang et al., 1988). The positive clones were confirmed by a second screening. The recombinant phages were isolated, and phage DNAs were purified by CsCl gradient centrifugation (Maniatis et al., 1982). After EcoRI digestion, the inserts of the recombinant phages were analyzed by electrophoresis in 1% agarose gels and transferred to Nytran filters (Maniatis et al., 1982). The filters were hybridized (Singh and Johns, 1984) with oligonucleotide probes (Chang et al., 1988) which were 5'-end-labeled with 32P according to Maniatis et al. (1982).
Isolation of Genomic Clones-Two rat genomic libraries in λ Charon 4A, containing random fragments of rat liver DNA generated by EcoRI or HaeIII partial digestion, were kindly provided by Dr. J. S. Sevall (Specialty Labs, Santa Monica, CA). Recombinant phages containing sequences encoding rat protein B23 were identified by hybridization with oligonucleotide probe 27 (Fig. 1A) labeled as above. Probe 27 is complementary to the 3'-untranslated region of cDNA clone λJH2. The positive clones were isolated and rescreened by hybridization with probe 33 (Fig. 1A). After the positive clones were isolated and partially sequenced, other synthetic oligonucleotides were constructed for isolating genomic clones containing sequences upstream from those in the existing clones. These are further described in Fig. 1.
DNA Sequencing and Analyses-Phage DNA was isolated and the inserts were subcloned into bacteriophage M13mp18 in both orientations using Escherichia coli strain JM109 as host (Hanahan, 1983; Yanisch-Perron et al., 1985). DNA sequencing was done by Sanger's dideoxy method using Sequenase as the polymerizing enzyme. Sequences of subcloned cDNAs and genomic DNAs were obtained using previously prepared synthetic oligonucleotides as primers (Chang et al., 1988). All subsequent sequences were extensions of previously determined sequences made by priming with oligonucleotides derived from a terminal sequence (14-17 nucleotides) of the previous sequence. Synthetic oligonucleotides were prepared on the Applied Biosystems 380A DNA synthesizer. DNA sequences were analyzed by the IBI Pustell computer programs and by programs available on line from BIONET.

RESULTS
cDNAs Encoding Two Forms of B23-The isolation and sequencing of an apparently full length cDNA (λJH1) coding for rat protein B23 has been previously reported (Chang et al., 1988). During the course of screening the rat brain cDNA library in λgt11 by hybridization with probe 22 (Fig. 1A) a second, apparently full length, cDNA clone (λJH2) was isolated. However, the insert of λJH2 was slightly shorter than that of λJH1 (1.2 versus 1.3 kb¹; see Fig. 2). When the inserts from both clones were analyzed by hybridization against oligonucleotides derived from the 5' and 3' ends of λJH1 (probes 22 and 33, respectively) it was found that only λJH1 hybridized to both probes (Fig. 2). λJH2, in contrast, hybridized only to the 5' end probe (probe 22), suggesting that the 3' region was missing from this cDNA clone. The insert of λJH2 was isolated and subcloned into M13mp18 and sequenced according to the strategy shown in Fig. 3A. This insert contained 1169 base pairs coding for one open reading frame of 257 amino acid residues beginning after a 75-bp 5'-untranslated region (Fig. 3B). In addition, a 315-bp transcribed, nontranslated 3'-flanking sequence was found.
Comparison of the two cDNA clones λJH1 and λJH2 indicated that both contained identical 5'-untranslated regions and 765 bp from the first ATG encoding 255 amino acids (Fig. 3B). However, beginning at residue 256 the proteins coded for by λJH1 and λJH2 contained 37 and 2 amino acids, respectively, to form carboxyl-terminal sequences (Fig. 3B). Because of these differences the two proteins were designated B23.1 and B23.2 to correspond to the coding sequences in λJH1 and λJH2, respectively. Not only was there a difference in the length of the two isoforms, but the residues Gly-Gly in position 256-257 of B23.1 were replaced with Ala-His in B23.2. At the 3' ends, although both inserts contained polyadenylation signals (AATAAA) about 16-19 bp from the poly(A) sites (Nevins, 1983), the remainder of the sequences was completely different. Thus, the two cDNA clones have identical 5' regions but differ in their 3' sequences (see diagram in Fig. 4).
¹ The abbreviations used are: kb, kilobase(s); bp, base pair(s).
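The length bookkeeping in this comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the residue counts quoted in the text (255 shared residues, tails of 37 and 2 residues), and the function and constant names are hypothetical, not part of the original work.

```python
# Residue counts quoted in the text for the two cDNA-derived isoforms.
COMMON_RESIDUES = 255                      # shared amino-terminal residues
UNIQUE_TAIL = {"B23.1": 37, "B23.2": 2}    # residues from position 256 onward


def isoform_length(name: str) -> int:
    """Total residues = shared amino-terminal segment + isoform-specific tail."""
    return COMMON_RESIDUES + UNIQUE_TAIL[name]


def coding_bp(residues: int) -> int:
    """Base pairs needed to encode a number of residues (3 bp per codon)."""
    return 3 * residues


if __name__ == "__main__":
    for name in ("B23.1", "B23.2"):
        print(name, isoform_length(name))
    print("shared coding region (bp):", coding_bp(COMMON_RESIDUES))
```

Under these figures the totals agree with the 292- and 257-residue polypeptides and with the 765 bp of shared coding sequence reported for the two clones.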
Isolation of Genomic Clones-To determine the origin of the two cDNAs for B23 a rat genomic library was screened with synthetic oligonucleotides. In initial attempts using a 5' probe (probe 22; Fig. 1), four genomic clones were isolated and partially sequenced. However, none of these contained sequences completely matching the sequence of the cDNA clones λJH1 or λJH2. Therefore, probe 27 (Fig. 1), corresponding to the 3'-untranslated region of λJH2, was synthesized and used for screening the genomic library. Using this probe, one clone (λJH100) was isolated and rescreened with probe 33. λJH100 contained a 6.5-kb EcoRI fragment which hybridized to both probes 27 and 33. This fragment was purified and subjected to sequencing. The partial sequence (Fig. 5) of the 6.5-kb fragment of λJH100 revealed the presence of three coding sequences separated by two intervening sequences (Fig. 6). These were designated exon X and exons Y and Z, and corresponded to the 3' ends of the cDNAs in λJH2 and λJH1, respectively. The 222 bp of exon X was not only identical to the 3' end of λJH2 but it also coded for the Ala-His sequence at the carboxyl terminus of B23.2. Similarly, the 72 and 392 nucleotides in exons Y and Z, respectively, were identical to the corresponding sequences at the 3' end of λJH1 and they encoded the 37 carboxyl-terminal residues in B23.1. Both exons X and Z contained polyadenylation signals (AATAAA) about 12 bp upstream from the corresponding poly(A) sites in the cDNAs. Thus, it became clear that three different exons separated by two introns could code for the 3' ends of the two different mRNAs for B23 isoforms B23.2 and B23.1 (Fig. 6).
FIG. 3. ... and sequencing strategy. The cloned cDNA λJH2 was excised from the vector with EcoRI restriction endonuclease and subcloned into M13mp18 in both orientations. A, sequencing reactions were primed with synthetic oligonucleotides as indicated by the overlapping arrows. Thickened line designates coding sequences of the insert. The proposed initiator (ATG) and terminator (TGA) codons are shown above the map. The scale (100 bp) is indicated by the bar. B, nucleotide sequence of cDNA clone λJH2 and deduced amino acid sequence of B23.2, and comparison to cDNA clone λJH1 and deduced amino acid sequence of B23.1. The sequence of cDNA clone λJH2 was determined using the strategy indicated in A. The deduced amino acid sequence is shown below the nucleotide sequence with numbering beginning at the proposed initiator methionine residue. The cDNA and deduced amino acid sequences of λJH1 are included for sequence comparison (see "Results"). (*, identical residues; -, deletion inserted to optimally align the 3'-untranslated sequences of the two mRNAs).
Although λJH100 contained the 3' sequences which were different in the mRNAs encoding the two isoforms of B23, it did not extend into the region which was common to the two forms. Therefore, another oligonucleotide, based on sequences in the 5' end of the upstream intron of λJH100, was synthesized (Fig. 1A, probe 28). The rat genomic library was screened with this probe and with probe 26 (Fig. 1A). Several clones which hybridized with both probes were obtained and one of them (λJH125) was selected for sequencing. λJH125 contained the sequences found in the 6.5-kb fragment of λJH100 and extended into the common sequences of cDNA clones λJH1 and λJH2 (Figs. 5 and 6). The insert from this clone contained one additional exon, designated W, which was not found in λJH100. Exon W codes for residues 221-255 of both B23.1 and B23.2; i.e. the carboxyl-terminal end of the segment which is common to both isoforms. This further supports the idea that a single gene with alternatively spliced exons encodes both forms of protein B23.
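The exon usage described above can be written down as a small data model. This is a minimal sketch, assuming only what the text states: the genomic order W, X, Y, Z, and the 3'-end exon paths W-X (B23.2) and W-Y-Z (B23.1). All identifiers are hypothetical names chosen for illustration.

```python
# Genomic order of the four 3'-end exons reported for the B23 gene.
GENOMIC_ORDER = ["W", "X", "Y", "Z"]

# Exon usage at the 3' end of each mRNA, as described in the text.
EXON_PATHS = {"B23.1": ["W", "Y", "Z"], "B23.2": ["W", "X"]}


def is_colinear(path, order=GENOMIC_ORDER):
    """True if the exons in `path` keep their genomic order with no repeats."""
    indices = [order.index(exon) for exon in path]
    return indices == sorted(indices) and len(set(indices)) == len(indices)


def shared_and_unique(paths=EXON_PATHS):
    """Split exons into those used by every isoform and isoform-specific ones."""
    sets = {name: set(path) for name, path in paths.items()}
    shared = set.intersection(*sets.values())
    unique = {name: exons - shared for name, exons in sets.items()}
    return shared, unique


if __name__ == "__main__":
    shared, unique = shared_and_unique()
    print("shared:", sorted(shared))
    for name in sorted(unique):
        print(name, "specific exons:", sorted(unique[name]))
```

In this model exon W is the only shared exon, which mirrors the observation that it encodes the carboxyl-terminal end of the segment common to both isoforms.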
Because of the suggested splicing pattern the sequences of the inserts of the genomic clones were searched for typical intron-exon junction sequences. The introns at the borders of exons W and Y (Fig. 5) contained the intron-exon junction consensus sequences, exon GT-intron-AG exon (Keller and Moon, 1984; Nelson and Green, 1988; Sharp et al., 1987). However, exons X and Z contained only the AG sequence adjacent to the 5' end of the exon. This was expected because these introns are adjacent to exons which are at the 3' terminus of the mRNA and which also contain the polyadenylation signals. Also, a consensus branch site sequence (YNYRAY) is found about 20 bp upstream from the 3' splice sites in all of the introns. Thus, the 3' end of this gene contains the required signals for alternate splicing of various exons.
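A search of this kind can be sketched with regular expressions. The example below is an illustration only, not the procedure used in this work (the analyses here relied on the IBI Pustell and BIONET programs); it encodes the three signals named in the text, the GT...AG intron borders, the YNYRAY branch site, and the AATAAA polyadenylation signal. The function names are hypothetical.

```python
import re
from typing import List

# IUPAC codes in the branch-site consensus YNYRAY:
# Y = pyrimidine (C/T), R = purine (A/G), N = any base.
# The lookahead lets overlapping matches be reported.
BRANCH_SITE = re.compile(r"(?=[CT][ACGT][CT][AG]A[CT])")
POLYA_SIGNAL = "AATAAA"


def has_gt_ag_borders(intron: str) -> bool:
    """True if the intron begins with GT and ends with AG."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def branch_site_offsets(intron: str) -> List[int]:
    """Distance (bp) from the start of each YNYRAY match to the intron's 3' end."""
    s = intron.upper()
    return [len(s) - m.start() for m in BRANCH_SITE.finditer(s)]


def polya_signal_positions(exon: str) -> List[int]:
    """0-based start positions of every AATAAA in the sequence."""
    s = exon.upper()
    return [m.start() for m in re.finditer(f"(?={POLYA_SIGNAL})", s)]
```

Applied to real intron and exon sequences, `branch_site_offsets` would give the distance of each candidate branch site from the 3' splice site, which could then be compared with the approximately 20 bp spacing noted above.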

DISCUSSION
The finding of two distinct cDNAs for mRNAs encoding two different forms of protein B23 reported here raises the question of their genomic origins; i.e. do the two isoforms arise from separate genes encoding unique isoforms or from a single gene encoding multiple exons which are alternatively spliced at the pre-mRNA level? The two isoforms of mRNA have identical 5' segments but differ in their respective 3'-translated and -untranslated regions. Because the sequences coding for the two different 3' regions of the mRNAs are found in three exons within a 6.5-kb genomic fragment, the first mechanism is essentially eliminated and it is likely that a single gene encodes both isoforms. This is further supported by the presence of an exon coding for the carboxyl-terminal end of the common region immediately upstream from the prior divergent exon. Thus, alternative splicing of multiple 3'-terminal exons onto a common 5' segment of the mRNA best explains the origin of the two isoforms of protein B23.
Alternative splicing at the mRNA level to generate various isoforms of proteins is a widespread mechanism, especially in muscle and nervous tissues and in immunological systems (for a review, see Andreadis et al., 1987). Examples of genes in which protein isoforms are generated by utilizing common initiation sites coupled with multiple 3'-terminal exons are the genes for calcitonin and calcitonin gene-related peptide (Amara et al., 1984; Jones et al., 1985), immunoglobulin heavy chains (Early et al., 1980; Rogers et al., 1980; Perlmutter and Gilbert, 1984), and tropomyosin (Karlik and Fyrberg, 1986; Helfman et al., 1986). In at least one of these, the secreted versus membrane-bound IgM heavy chains, the 3'-terminal end is governed by the choice of polyadenylation sites (Galli et al., 1988) or by competition between the μs poly(A) site usage and the Cμ4-to-M1 splice (... Perry, 1986, 1989). The mRNA for the membrane-bound form has two additional exons spliced onto a common 5' segment compared to the secreted form which is truncated and utilizes the upstream polyadenylation site. This is somewhat different from the case of the B23 isoforms in which completely different 3'-terminal exons, each containing polyadenylation signals, are spliced onto the 5' module. The gene which more closely resembles the B23 splicing model is the gene coding for calcitonin and calcitonin gene-related peptide, in which separate 3'-terminal exons, each with distinct polyadenylation signals, are also spliced onto a 5' region to produce two different mRNAs. In this case, not only is the pattern of splicing the same, but also the order of the three differentially spliced 3' exons is identical to that found in the B23 gene. Thus, there is at least one precedent for this particular design of the 3' end of a gene.

Among the advantages of alternative splicing are the ability to generate families of proteins with variable and constant domains from a single gene and to provide for efficient and reversible switching of proteins as the physiological requirements of the cell change (Andreadis et al., 1987). With the current state of knowledge of the structure and function of protein B23 one can only speculate on the reasons for the utilization of alternative splicing to generate two forms of the protein. However, in this case, the result of this mechanism is the deletion of the carboxyl-terminal 35 residues and the substitution of two upstream residues to transform B23.1 to B23.2. Secondary structure predictions (Chou and Fasman, 1978) suggest that this carboxyl-terminal end of the molecule (especially residues 260-290) is predominantly α-helical. In this segment the nonpolar residues would tend to be on one side of the helix and be expected to interact with the remainder of the molecule. Therefore, its removal may have a significant effect on the entire structure. On the other hand, the presence of a substantial number of aromatic residues suggests that this segment may be involved in nucleic acid binding. It is known that protein B23 is associated with RNA in the nucleolus and it was recently learned that the protein preferentially binds single-stranded nucleic acids in vitro.² Deletion of the carboxyl-terminal end could alter the ability of the protein to bind nucleic acids and could serve as a mechanism for modulating protein-nucleic acid interactions.
A number of RNA-binding proteins (e.g. hnRNA binding proteins, the poly(A)-binding protein and nucleolin) have segments of sequence in common (Dreyfuss et al., 1988; Merrill et al., 1988), suggesting that these proteins constitute a gene family. In these proteins the putative RNA binding site is partially contained in 80-90-residue repeats which, in turn, contain a highly conserved "ribonucleoprotein consensus sequence." Comparison of the protein B23.1 sequence with several of these proteins reveals only a vague similarity to these RNA binding domains. A short segment (residues 260-269, KVEAKFINY; Fig. 3B) has a weak resemblance to one of the ribonucleoprotein consensus sequences of the yeast poly(A)-binding protein (Dreyfuss et al., 1988), at least in the spacing of the downstream aromatic residues. However, cross-linking studies or generation of deletion mutations will be required to determine whether this is an RNA binding region of the molecule.
² T. S. Dumbar, G. A. Gentry, and ...
Because of the rapid evolution of sequence information on B23 it is now possible to compare sequences from various species (Fig. 7) with the hope of finding clues on functionally important segments of the protein. The cDNA and amino acid sequences have been determined completely for protein B23 from rat (Chang et al., 1988), mouse (Schmidt-Zachmann et al., 1988), human (Chan et al., 1989), Xenopus laevis (Schmidt-Zachmann et al., 1987), and partially for chicken (Borer et al., 1989). In addition, protein B23 is related to X. laevis nucleoplasmin (Dingwall et al., 1987). A number of interesting observations may be made comparing the sequences shown in Fig. 7. First, among the mammalian sequences (excluding B23.2) the level of conservation is extremely high. In fact, there are only 2 very conservative differences between mouse and rat and 14 differences (about one-half of them conservative substitutions) between human and rat B23. Second, the conservation is greatest at the two ends where the rat, mouse, and human sequences are all identical up to residue 131 and from residue 214 to the carboxyl terminus. This is where the aromatic residues are located, which leads to a third observation: the aromatic residues are all in identical positions in all species examined with the exception of a missing phenylalanine at residue 230 in Xenopus. Because of the suggested involvement of aromatic residues in nucleic acid binding (see above), the two ends of the molecule seem the most probable candidates for this activity. Finally, although the degree of conservation is not as high in the centers of the molecules as in the ends, the positions of the two highly acidic regions (residues 120-132 and 159-187) are conserved in all species for which there is sequence information. This also includes the phosphorylatable serine at residue 125 (Chan et al., 1986a) which is present as a casein kinase II site in all species. As sequences of protein B23 from more divergent species become available it will be interesting to see which parts of the molecule remain conserved.
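The kinds of comparison made above (counting differences between aligned sequences, finding how far the sequences stay identical from the amino terminus, and locating aromatic residues) can be sketched as follows. This is an illustrative sketch only: it assumes the sequences have already been aligned to equal length, and the function names are hypothetical. No real B23 sequence data are included.

```python
from typing import List, Sequence


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def identical_prefix_length(seqs: Sequence[str]) -> int:
    """Length of the leading run of positions identical in every sequence."""
    n = 0
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        n += 1
    return n


def positions_of(residues: str, seq: str) -> List[int]:
    """1-based positions of any of the given one-letter residues (e.g. 'FWY')."""
    return [i for i, aa in enumerate(seq, start=1) if aa in residues]
```

With aligned rat, mouse, and human sequences as input, `identical_prefix_length` would correspond to the "identical up to residue 131" observation, and `positions_of("FWY", seq)` would list the aromatic positions compared across species.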
The findings reported here raise a number of questions regarding the functions of the isoforms, their levels of expression and control of expression. Protein B23 is believed to be involved in the later stages of ribosome assembly (Spector et al., 1984). More recently, the protein has also been suggested to be a carrier of ribosomal proteins from the cytoplasm to the nucleolus and/or a transporter of nearly completed ribosomes from the nucleolus to the cytoplasm (Borer et al., 1989). Under varying physiological conditions, differential expression of the two forms could modulate these activities. Therefore, it will be necessary to determine the actual protein and mRNA levels of the isoforms in various tissues and cell types. It will also be important to isolate the two isoforms or express them in bacterial systems so that nucleic acid and ribosomal protein binding properties may be compared. Finally, the mechanisms controlling the expression of the isoforms will need to be investigated. For example, are specific sequences generating unique secondary structures at the 3' ends of the mRNA involved (Solnick, 1985; Eperon et al., 1988)? Is posttranscriptional modification of internal residues, e.g. methylation, a possible control mechanism (Canaani et al., 1979; Dimock and Stoltzfus, 1977; Schibler et al., 1977; Wei and Moss, 1977)? Are there trans-acting factors such as proteins or small RNAs responsible for splice site selection (Dreyfuss, 1986; Frendewey et al., 1987; Konarska and Sharp, 1987; Nelson and Green, 1988; Sharp et al., 1987; Leff et al., 1987)? Answers to these functional and regulatory questions should provide us with a clearer understanding of the role of nonribosomal proteins in ribosome biogenesis.