Independent Genes Coding for Three Acidic Proteins of the Large Ribosomal Subunit from Saccharomyces cerevisiae*

The yeast ribosome contains three acidic proteins, L44, L44', and L45, closely related from a structural point of view, that seem to play a functional role similar to that of proteins L7 and L12 in the bacterial ribosome.
By screening a cDNA bank in λgt11 with specific polyclonal and monoclonal antibodies, recombinant phages expressing each one of the acidic proteins have been cloned. A unique copy of each gene is detected using the phage cDNA inserts as probes in nitrocellulose blots of yeast DNA digested with different restriction enzymes. The inserts were subcloned in the plasmid pUC19, and their physical maps and nucleotide sequences were determined. By using the cDNA inserts as probes in genomic DNA banks, DNA fragments carrying the acidic protein genes have been cloned, characterized, and sequenced. The results conclusively show that the three yeast acidic proteins are coded by independent genes and are not the result of a post-translational modification of the product of a unique gene, as in bacteria. Like most ribosomal protein genes, the gene for protein L44' has an intron and two upstream stimulatory boxes (UASrpg) fitting closely to the consensus sequence. The genes coding for proteins L44 and L45 lack introns and also seem exceptional in other characteristics of their sequences.
Proteins L44 and L45 have amino acid sequences with about 80% similarity. Protein L44' is only 63% similar to the other two polypeptides. The three proteins have highly conserved carboxyl termini comprising the last 30 amino acids, and the first 10 amino acids of L44 and L45 are identical.
The results cast doubt on the possibility of a similar role for the different acidic ribosomal proteins.
The presence of a set of very acidic proteins is a general feature of ribosomes from all organisms (Bielka, 1982). These proteins seem to be involved in the interaction of supernatant factors with the ribosome (Kisha et al., 1971; Hamel et al., 1972), although under particular conditions, some factor-dependent ribosomal activities can be performed in their absence (Ballesta and Vázquez, 1972; Hamel et al., 1972; Glick, 1977; Koteliansky et al., 1977). In Escherichia coli, the two acidic proteins correspond to a unique polypeptide with the amino terminus either free (protein L12) or acetylated (protein L7). They are present as dimers in a complex with ribosomal protein L10, forming a characteristic protuberance of the large ribosomal subunit (see Moller and Maassen (1986) and Traut et al. (1986) for recent reviews).
* This work was supported by institutional grants to the Centro de Biologia Molecular from Fondo de Investigaciones Sanitarias and by personal grants (to one of us) from Consejo Superior de Investigaciones Cientificas. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J03760 and J03761.
In higher organisms, two phosphorylated acidic proteins are usually found associated with ribosomes (Bielka, 1982). In Artemia salina, acidic proteins eL12 and eL12' have been found encoded by independent genes (Maassen et al., 1985). In mammals, the proteins have been called P1 and P2, and their sequences are known in the case of rat and human livers either by protein (Lin et al., 1982) or cDNA (Rich and Steitz, 1987) sequencing studies. The genes for two acidic ribosomal proteins have also been reported in Drosophila melanogaster by independent groups, which have called them A1 (Qian et al., 1987) and rp21C (Bigboldus, 1987). One acidic protein, called A1, has been described in Schizosaccharomyces pombe (Beltrame and Bianchi, 1987).
In Saccharomyces cerevisiae, the presence of three acidic proteins, L44, L44', and L45, has been reported (Sánchez-Madrid et al., 1979; Juan-Vidales et al., 1984). The three polypeptides can be monophosphorylated, and this modification drastically affects their affinity for the ribosome and their capacity to reactivate protein-deficient particles (Juan-Vidales et al., 1984). The three proteins have strong structural similarities, such as analogous molecular weights, close amino acid compositions, and common immunological determinants (Sánchez-Madrid et al., 1979; Juan-Vidales et al., 1981b). On the other hand, tryptic analysis shows differences among the three proteins that may be due to either regions of different amino acid sequence or post-translational modification of a common precursor. In the first case, yeast would resemble higher eukaryotic organisms and, in the second case, bacterial cells. Supporting the second alternative is the fact that, as in bacterial proteins L7 and L12, the amino-terminal group of L45 is free, whereas it is blocked in L44. The amino acid sequence of a yeast acidic ribosomal protein, YPA1, has been reported (Itoh, 1980), but it is not known which of the three proteins reported here it corresponds to. This information therefore does not help to clarify the structural relationship among the three acidic polypeptides.
The availability of both polyclonal and monoclonal antibodies specific against the acidic ribosomal proteins of yeast encouraged us to attempt to clone their genes to study the structural relationship of the three polypeptides. The results showing the existence of independent genes for each acidic protein and the differences among them are presented in this report.
The rat liver P1 sequence was provided by Y. L. Chan and I. G. Wool (personal communication).

MATERIALS AND METHODS AND RESULTS
Physical Characterization of cDNA Inserts Coding for Acidic Proteins-By screening a cDNA library prepared with poly(A)+ mRNA from yeast in λgt11 with specific polyclonal and monoclonal antibodies (see Miniprint), three recombinant phages, λgtB242, λgtB171, and λgtB302, which express proteins L45, L44, and L44', respectively, were obtained. The estimated sizes of the corresponding inserts, as determined by agarose electrophoresis, were 470, 320, and 420 base pairs, respectively. The inserts were subcloned in plasmid pUC19. By digestion with different restriction enzymes, the physical maps shown in Fig. 1 for each of the inserts were constructed. The full sequences of the three cDNAs were obtained (not shown) by the dideoxy chain termination method using the subcloning strategy indicated in Fig. 1. Polyadenine stretches are present in the sequences of inserts 242 and 302, but not in that of insert 171.
Gene Copy Number for Acidic Proteins-The cDNA inserts were used as probes in Southern blots of total yeast DNA digested with HindIII and HincII (Fig. 2). In both digests, only one band hybridizes strongly with each one of the inserts, although a partial hybridization with the bands corresponding to the other two genes is also detected, especially for insert 302. The size of the DNA fragments hybridizing with the probes, especially for inserts 242 and 302, strongly indicates that the genes are not closely linked in the genome.
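The logic of this Southern analysis can be illustrated in silico: for a single-copy gene, a complete digest yields one fragment carrying the probed region. The following is only a minimal sketch, assuming a linear molecule and an invented toy sequence; the function names `cut_positions` and `fragment_lengths` are hypothetical and are not part of the original work. The recognition sites are the standard ones for HindIII (A^AGCTT) and HincII (GTY^RAC).

```python
import re

# Recognition sites as regular expressions, with the cut offset inside the site.
# HindIII cuts A^AGCTT; HincII cuts GTY^RAC (Y = C/T, R = A/G).
ENZYMES = {
    "HindIII": (re.compile("AAGCTT"), 1),
    "HincII": (re.compile("GT[CT][AG]AC"), 3),
}

def cut_positions(seq, enzyme):
    """Return cut coordinates (0-based, between bases) on the top strand."""
    pattern, offset = ENZYMES[enzyme]
    # A lookahead is used so that overlapping sites are not skipped.
    lookahead = re.compile("(?=" + pattern.pattern + ")")
    return [m.start() + offset for m in lookahead.finditer(seq.upper())]

def fragment_lengths(seq, enzyme):
    """Fragment lengths after complete digestion of a linear molecule."""
    cuts = cut_positions(seq, enzyme)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy sequence, invented for illustration only.
toy = "CCCC" + "AAGCTT" + "GGGGGG" + "GTTAAC" + "TTTTT" + "AAGCTT" + "AA"
print(fragment_lengths(toy, "HindIII"))
print(fragment_lengths(toy, "HincII"))
```

Comparing the fragment lists obtained with two enzymes, as in Fig. 2, is what allows a single hybridizing band in each digest to be read as evidence for a single gene copy.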
Cloning and Sequencing of Genes from Genomic DNA Libraries-The labeled cDNA inserts were also used as probes for screening genomic yeast DNA libraries in order to clone the genes encoding the three acidic ribosomal proteins. Only clones hybridizing with insert 171 (L44) were obtained from an EcoRI DNA bank in λgt11. A genomic DNA insert of 2.3 kb was subcloned into plasmid pUC18.

Portions of this paper (including "Materials and Methods," part of "Results," and Figs. 1, 7, and 9-11) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

FIG. 3. Physical maps of genomic DNA inserts subcloned in plasmid pUC18.
The arrows indicate the extension and orientation of the fragment sequenced using a synthetic oligonucleotide complementary to the preceding 15 nucleotides. The double-headed arrow marks the part of the fragment corresponding to the cDNA insert. The black circle and the X indicate the positions of the translation initiation and termination codons, respectively. MCS, multiple cloning sites (see Yanisch-Perron et al., 1985). Letters mark the positions of different restriction sites: A, AluI; B, BanII; D, DraI; E, EcoRI; H, HindIII; Hc, HincII; K, KpnI; P, PstI; Pv, PvuII; Sl, SalI; and Sc, SacI.
In the screening of a Sau3A DNA library in plasmid YEp13, positive clones were obtained only in the case of insert 242. A 5.9-kb DNA fragment was detected in the positive clones, from which a 3.7-kb EcoRI piece containing the part hybridizing with insert 242 was subcloned into plasmid pUC18.
To ensure the cloning of the L44' gene, a minibank was constructed by inserting yeast DNA fragments hybridizing with insert 302 into the HindIII restriction site of plasmid pUC18. The fragments were detected by Southern blots in the 1.7-2.2-kb region of HindIII-digested DNA-agarose gels (Fig. 2). As expected, several positive clones containing inserts of 1.9 kb were isolated by screening the constructed library with insert 302.
The restriction enzyme maps of the genomic DNA inserts subcloned into pUC18 were determined (Fig. 3).
Part of the inserts, including the cDNA hybridizing fragment and its 3'- and 5'-flanking regions, was sequenced by the dideoxy chain termination method using complementary synthetic oligonucleotides as primers. A total of 891, 953, and 1226 nucleotides were sequenced from the genes corresponding to proteins L44, L45, and L44', respectively. The nucleotide sequences found in the genomic DNA of the three genes coincide with the sequenced cDNA inserts except at nucleotides 23, 39, and 431 of the L45 gene and nucleotide 487 of the L44' gene. These positions are cytidines in the cDNA, probably because of errors made by the RNA reverse transcriptase; the difference does not alter the amino acid sequence, since all of them are third-base positions in the reading frame. The most important difference from the cDNA sequence is the presence of an intron containing 300 nucleotides starting at position 114 of the L44' gene. The intron has all the characteristic sequences found in the intervening regions of yeast ribosomal protein genes (Planta et al., 1986).
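The argument that third-base differences leave the protein unchanged can be checked codon by codon. The sketch below is illustrative only: it uses a small excerpt of the standard genetic code, the example codons are invented and are not taken from the L45 or L44' sequences, and the helper names `is_silent` and `codon_position` are hypothetical.

```python
# Small excerpt of the standard genetic code (only the codons used below).
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly",
    "GCT": "Ala", "GCC": "Ala",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu",
}

def is_silent(codon_a, codon_b):
    """True if the two codons encode the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]

def codon_position(nt_index, orf_start):
    """Position (1, 2, or 3) of a nucleotide within its codon.

    Both arguments are 1-based coordinates in the same numbering, and
    orf_start is the first base of the initiation codon.
    """
    return (nt_index - orf_start) % 3 + 1

print(is_silent("GGT", "GGC"))   # third-base T/C exchange within glycine codons
print(is_silent("GAT", "GAA"))   # third-base exchange that changes Asp to Glu
print(codon_position(12, 1))     # nucleotide 12 of an ORF starting at 1
```

As the second call shows, a third-base change is not automatically silent; it is silent only when both codons belong to the same synonymous family, which is the case for T/C exchanges at the third position in the standard code.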
The sequenced 3'-flanking region covers about 160 nucleotides including the polyadenylation site, whereas the 5'-flanking region ranges from 415 to 460 nucleotides and contains regulatory sequences, as will be discussed later. A summary of the most interesting features in the 3'- and 5'-flanking regions of the three genes is shown in Table I.
The abbreviations used are: kb, kilobase; IPTG, isopropyl-β-D-thiogalactoside; mAb, monoclonal antibody; SDS, sodium dodecyl sulfate; BSA, bovine serum albumin; PIPES, piperazine-N,N'-bis(2-ethanesulfonic acid); CETAB, N-cetyl-N,N,N-trimethylammonium bromide; HPLC, high pressure liquid chromatography; ELISA, enzyme-linked immunosorbent assay.

Genes for Acidic Ribosomal Proteins in Yeast
Nuclease S1 Mapping of Transcription Initiation Sites-Initiation of DNA transcription has been determined by S1 nuclease mapping. The restriction fragments BanII-BanII in pRVE45, HincII-HincII in pMRE44, and SalI-PstI in pMRH46 (Fig. 3), labeled at the 5' ends, were used for hybridization with poly(A)+ mRNA. After nuclease treatment, two main bands were found in the three cases (Fig. 7), which have been marked with vertical arrows in the respective sequences (Figs. 4-6). 5'-Leader regions of approximately 113, 80, and 13 nucleotides were found in mRNA corresponding to proteins L44', L45, and L44, respectively.
Amino Acid Sequence of Three Acidic Proteins-As deduced from the nucleotide sequences of their genes, proteins L44, L44', and L45 contain 106, 106, and 110 amino acid residues and have molecular weights of 12,747, 12,805, and 12,993, respectively. The amino acid sequences of the three polypeptides have been compared using a sequence comparison program from the Wisconsin University Biotechnology Center software package (Fig. 8). Proteins L44 and L45 are about 80% similar to each other, but only 63% similar to L44'. The codon usage in the three genes shows the typical yeast bias corresponding to a highly expressed protein (Sharp et al., 1986).
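A pairwise comparison of this kind can be approximated by counting identical residues over the aligned columns. The sketch below computes percent identity on two pre-aligned toy sequences; it is a simpler stand-in for, and not a reimplementation of, the similarity score of the Wisconsin package, and the sequences and the function name `percent_identity` are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

# Toy pre-aligned sequences, invented for illustration only.
toy_a = "MKYLAAYLLL"
toy_b = "MKYLAAFLL-"
print(percent_identity(toy_a, toy_b))
```

A similarity score that also credits conservative substitutions would generally be higher than the identity computed here, so the two measures should not be compared directly.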

DISCUSSION
The three acidic proteins of the S. cerevisiae ribosome, proteins L44, L44', and L45, are expressed separately from recombinant λ phages containing cDNA inserts. These results exclude the possibility, suggested by the strong physicochemical similarity of the three polypeptides (Juan-Vidales et al., 1984), that they were the result of post-translational modification of a common precursor, as in bacteria. This situation is consistent with data from higher eukaryotic cells, where genes coding for each ribosomal acidic polypeptide are detected (Maassen et al., 1985).
The three proteins are expressed from recombinant λ phages as fusion polypeptides joined to β-galactosidase. It is interesting to note that, in spite of the fact that these fused proteins are under the control of the lacZ gene promoter (Young and Davis, 1983), expression of the polypeptides takes place in the absence of the inducer isopropyl-β-D-thiogalactoside. The functionality of the promoter is, however, clear in the λgt11 control; therefore, an explanation for these results is not obvious, especially considering that the insert is quite near the termination codon of the β-galactosidase gene.
It is also worthwhile to mention the fast degradation of the foreign part of the fusion protein in the infected cells, which results in a polypeptide that is shorter than the enzyme and is recognized by the anti-β-galactosidase serum, but not by the anti-acidic protein antibodies. This rapid degradation probably explains the difficulties encountered in detecting the fusion protein in lysates from cells grown in liquid media.
A single copy of each of the genes encoding proteins L44, L44', and L45 is detected by Southern blot analysis of DNA. This fact distinguishes the acidic proteins from most other ribosomal proteins, which are generally encoded by more than one gene copy in the yeast genome (Fried et al., 1981), and at the same time indicates that gene dosage is not the mechanism controlling the higher expression of the acidic proteins as compared with other ribosomal components. Southern blot results also suggest that, as for most other ribosomal protein genes, those coding for the acidic proteins are scattered in the genome.
The nucleotide sequences of the three genes and of their flanking regions show several interesting characteristics. Probably the most appealing result is the presence of an intervening region only in the gene encoding protein L44'. The intron has splice junctions that correspond to the consensus sequences found in eukaryotic spliced genes (Mount, 1982) and that are common to other yeast ribosomal protein genes (Planta et al., 1986; Mager, 1988). The TACTAAC box required for splicing in yeast (Pikielny et al.; Langford et al., 1984) is present in this intron, too.
From this point of view, protein L44' falls in the group of standard ribosomal proteins, whereas proteins L44 and L45 increase the number of exceptions.
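The sequence signals that define such an intron can be searched for directly. The sketch below is a simplified illustration, not the procedure used in this work: it assumes the common yeast 5' splice-site sequence GTATGT, the TACTAAC box mentioned above, and a terminal AG, and it runs on an invented toy sequence. The function name `find_candidate_introns` and the parameter `max_branch_to_ag` are hypothetical.

```python
import re

FIVE_PRIME = "GTATGT"   # common yeast 5' splice-site sequence (assumed here)
BRANCH_BOX = "TACTAAC"  # branch-point box named in the text

def find_candidate_introns(seq, max_branch_to_ag=60):
    """Return (start, end) pairs, 0-based half-open, of candidate introns.

    A candidate starts at FIVE_PRIME, contains BRANCH_BOX, and ends at the
    first AG found after the box, within max_branch_to_ag nucleotides.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(FIVE_PRIME, seq):
        start = m.start()
        box = seq.find(BRANCH_BOX, start + len(FIVE_PRIME))
        if box == -1:
            continue
        after_box = box + len(BRANCH_BOX)
        ag = seq.find("AG", after_box, after_box + max_branch_to_ag)
        if ag == -1:
            continue
        hits.append((start, ag + 2))
    return hits

# Toy sequence, invented for illustration only.
toy = "ATGGCC" + "GTATGT" + "TTTTCTTTTC" + "TACTAAC" + "TTTCTTTTAG" + "GCTGCT"
print(find_candidate_introns(toy))
```

A sequence lacking any of the three signals returns an empty list, which is the pattern expected for intron-free genes such as those of L44 and L45; a positive hit is only a candidate and would still need to be confirmed against the cDNA.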
The regulation of mRNA splicing seems to be a mechanism for controlling the expression of some ribosomal proteins (Warner et al., 1985). Proteins without introns in their genes would be free from this control and, in principle, able to accumulate in higher amounts. The acidic proteins are present in multiple copies in the yeast ribosome (Kruiswijk et al., 1978; Juan-Vidales et al., 1984); and, in addition, an important pool of acidic ribosomal proteins is found in the cytoplasm. It is interesting to note that the cytoplasmic pool seems to be lower for protein L44' than for proteins L44 and L45, which do not have introns in their genes. However, other ribosomal proteins coded by intron-free genes (proteins L3, L16, S24, and S33) do not seem to accumulate in the cell, indicating that additional mechanisms must control the expression of ribosomal proteins. Moreover, splicing control does not work in all intron-containing genes (Warner et al., 1986).
Ribosomal protein genes are also characterized by the usual presence of two conserved sequences (UASrpg boxes), which are separated by no more than 40 nucleotides, lie more than 200 nucleotides upstream from the translation initiation site, and act as transcription activators (Leer et al., 1985a; Rotenberg and Woolford, 1986). Using a sequence-fitting computer program, two boxes that fit the UASrpg consensus sequence (Woudt et al., 1987) can be found in the three genes. The quality of the fit is different in the three cases (high for L44', reasonable for L44, and poor for L45), meaning that, at least in the last case, the sequences probably do not function as activators. In the L44' gene, the sequences are present in the coding strand; whereas in the L44 and L45 genes, the boxes are in different strands in opposite orientation and, in the case of L45, separated by more than 100 nucleotides.
It seems therefore that, also from the point of view of the UASrpg boxes, the L44' gene is closer to the typical ribosomal protein genes than the other two. This is a very interesting fact, since different biochemical results suggest that protein L44' functions like a typical ribosomal component, contrary to proteins L44 and L45. Thus, in addition to the different cytoplasmic pools of the three proteins discussed above, we have found that protein L44' requires a higher salt concentration to be removed from the ribosomal particle.
It is interesting to note that the ribosomal protein genes without introns either lack UASrpg boxes (like proteins L3 and S33) or have apparently less efficient ones (like proteins S24, L45, and L44). Protein L16 is the exception, having standard UASrpg regions in the two unspliced gene copies present in the genome. The T-rich region detected in ribosomal protein genes closer to the transcription initiation site than to the UASrpg (Rotenberg and Woolford, 1986) is present in the three genes, too. The transcription initiation sites have been located by nuclease S1 mapping at about 113, 80, and 13 nucleotides upstream of the translation initiation site in the L44', L45, and L44 genes, respectively. The L44' and L45 initiation sites are in the sequence Pyr-Ade-Ade-Pur, common to other yeast genes; whereas L44 initiates in the sequence AACCAA present in some, but not all, ribosomal protein genes (Planta et al., 1986). The 5'-leader region is noticeably long in the case of the protein L44' gene, since the reported extension of this flanking region does not exceed 40 nucleotides in other ribosomal protein genes (Planta et al., 1986). In the L44' and L45 genes, the 3'-polyadenylation sites have also been identified from the sequence of their cDNA clones and are preceded at adequate distance by the presumed polyadenylation signal AATAAA (Fitzgerald and Shenk, 1981). In spite of the presence of an AATAAA signal, a polyadenine tract was not found in the cDNA from the L44 gene, which might reflect a possible mistake in the preparation of the cDNA bank.
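The idea of grading the fit of a box to a consensus, on either strand, can be sketched as a sliding-window count of matching positions. This is a minimal illustration and not the sequence-fitting program used here; the consensus string below is a placeholder invented for the example, not the published consensus of Woudt et al. (1987), and the names `best_fit` and `PLACEHOLDER_CONSENSUS` are hypothetical.

```python
# IUPAC nucleotide codes used to express degenerate consensus positions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def best_fit(seq, consensus):
    """Best-scoring window for an IUPAC consensus on either strand.

    Returns (matches, start, strand): the number of matching positions,
    the 0-based start on the given (top) strand, and '+' or '-'.
    """
    seq = seq.upper()
    k = len(consensus)
    best = (-1, -1, "+")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - k + 1):
            window = s[i:i + k]
            matches = sum(base in IUPAC[c] for base, c in zip(window, consensus))
            # Convert minus-strand coordinates back to the top strand.
            start = i if strand == "+" else len(s) - i - k
            if matches > best[0]:
                best = (matches, start, strand)
    return best

PLACEHOLDER_CONSENSUS = "ACAYCCRT"  # hypothetical, for illustration only
toy = "GGGGACATCCGTGGGG"            # invented toy sequence
print(best_fit(toy, PLACEHOLDER_CONSENSUS))
```

Reporting the strand together with the score mirrors the distinction drawn above between boxes lying in the coding strand and boxes found in opposite orientation.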
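The short motifs discussed in this paragraph lend themselves to a simple pattern search. The sketch below, offered only as an illustration on an invented toy sequence, locates every occurrence of Pyr-Ade-Ade-Pur and measures the distance from the nearest upstream AATAAA to a given polyadenylation site; the function names `motif_starts` and `signal_to_site_distance` are hypothetical.

```python
import re

# Pyr-Ade-Ade-Pur written as a regular expression; the lookahead allows
# overlapping occurrences to be reported.
INIT_MOTIF = re.compile(r"(?=[CT]AA[AG])")
POLYA_SIGNAL = "AATAAA"

def motif_starts(seq, pattern=INIT_MOTIF):
    """0-based start positions of every (possibly overlapping) motif match."""
    return [m.start() for m in pattern.finditer(seq.upper())]

def signal_to_site_distance(seq, polya_site):
    """Distance from the last AATAAA upstream of polya_site to that site.

    polya_site is a 0-based index; returns None when no signal lies upstream.
    """
    pos = seq.upper().rfind(POLYA_SIGNAL, 0, polya_site)
    return None if pos == -1 else polya_site - pos

# Toy sequence, invented for illustration only.
toy = "GG" + "TAAG" + "CCC" + "AATAAA" + "C" * 16 + "G"
print(motif_starts(toy))
print(signal_to_site_distance(toy, len(toy) - 1))
```

Because a four-nucleotide degenerate motif occurs frequently by chance, such hits are meaningful only when they coincide with experimentally mapped sites, as with the S1 nuclease data above.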
Some other features in the nucleotide sequence of the three genes are summarized in Table I. A number of hairpin loops can be formed in the 3'- and 5'-flanking regions of the protein L44 gene; and, in one case, the same stretch (nucleotides -381 to -390) can be part of two different stem-loops. These structures are rare, however, in the genes of proteins L44' and L45, although they are found in the intervening region of L44'. It is interesting to note the presence of two sets of direct complementary sequences of 12 and 9 nucleotides each, which can stabilize a circular loop in the 5'-leader region of the L44' gene. The biological meaning of all these structures is difficult to assess at this moment, but their involvement in transcriptional or translational control should not be disregarded.
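Potential hairpins of this kind correspond to inverted repeats separated by a short loop. The sketch below lists perfect inverted repeats only and ignores base-pairing energetics, so it is a rough illustration rather than a structure prediction; the toy sequence, the function name `find_stem_loops`, and the default stem and loop sizes are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_stem_loops(seq, stem=6, min_loop=3, max_loop=10):
    """List (left_start, loop_length) for perfect inverted repeats.

    A hit means seq[left_start:left_start + stem] can pair with a downstream
    segment of the same length, separated by loop_length unpaired bases.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, loop))
    return hits

# Toy sequence, invented for illustration only.
toy = "TT" + "GGCACT" + "AAAA" + "AGTGCC" + "TT"
print(find_stem_loops(toy))
```

A stretch that appears in more than one hit, either as a left or a right arm, would correspond to the situation described above in which the same nucleotides can take part in two different stem-loops.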
The amino acid sequences of the three proteins, deduced from the gene nucleotide sequences, have been compared. The three polypeptides share an almost identical carboxyl terminus, including the last 30 amino acids. The similarity at the amino terminus is also high in proteins L44 and L45, which have the same first 10 amino acid residues. Protein L44' has a totally different amino-terminal end. Protein L45 has practically the same amino acid sequence as ribosomal protein A1, reported by Itoh (1980) from amino acid sequencing studies.
Presently, the amino acid sequences of the acidic ribosomal proteins from five different organisms, including A. salina (Maassen et al., 1985), D. melanogaster (Qian et al., 1987; Bigboldus, 1987), rat liver (Lin et al., 1982), human liver (Rich and Steitz, 1987), and S. cerevisiae (this report), are known. In the case of S. pombe, only one acidic protein has been reported (Beltrame and Bianchi, 1987). When the amino acid sequences of all of them are compared, they can be divided easily into two groups whose components have, on average, around 70% similarity in their sequences. The similarity with the components of the other group is about 55%. One group is formed by mammal protein P2, A. salina eL12, D. melanogaster A1, S. pombe A1, and S. cerevisiae L44 and L45; the other group is formed by mammal P1, A. salina eL12', D. melanogaster rp21C, and S. cerevisiae L44'. Both protein groups share a highly conserved carboxyl terminus, with the same last 10 amino acids, but differ in the amino terminus, which is conserved among the members of each group. S. cerevisiae protein L44' is an exception, since its amino-terminal end does not correspond to that of the rest of the proteins in its group.
S. cerevisiae is unique in having two different proteins, L44 and L45, in the same group. Preliminary data indicate that the yeast Hansenula anomala and the filamentous fungus Geotrichum lactis have, like S. cerevisiae, three ribosomal acidic proteins when checked by isoelectrofocusing, suggesting that this might be a characteristic of lower eukaryotes. Additional evidence is required to confirm these results and to show that, as in S. cerevisiae, two out of the three proteins belong to the same structural group, as well as to determine the functional significance of this fact. The presence of an identical C terminus suggests the existence of a common interacting site in the ribosome for the different polypeptides. If this is so, the eukaryotic acidic proteins would interact with the particle in the opposite way to their bacterial counterparts, which have been shown to bind through their amino terminus (Liljas, 1982). However, this fact might be explained by the existence of a transposition during evolution affecting the ends of the molecule, which has been postulated by different laboratories (Lin et al., 1982; Otaka et al., 1985). Alternatively, the eukaryotic acidic proteins may interact with the ribosome through their amino terminus in different ribosomal sites. This possibility would be compatible with the reported location of the acidic proteins at different regions in the ribosome (Traut et al., 1986).
The structural similarity between the bacterial and eukaryotic acidic proteins, stressed by their functional interchangeability (Sánchez-Madrid et al., 1981b), has favored the idea that, as in bacteria, the different types of acidic proteins are carrying out an identical function in the eukaryotic ribosome. The data available now on the structure of the acidic proteins in higher organisms cast some doubt on this concept and raise the possibility of a different role for each type of acidic protein.
Additional information is required to clarify the real function of the acidic ribosomal proteins in eukaryotic cells, and the availability of their genes in a genetically accessible organism such as S. cerevisiae will help in that direction.