Isoleucyl-tRNA Synthetase from the Ciliated Protozoan Tetrahymena thermophila: DNA SEQUENCE, GENE REGULATION, AND LEUCINE ZIPPER MOTIFS*

We have determined the nucleotide sequence of a protozoan aminoacyl-tRNA synthetase gene. The isoleucyl-tRNA synthetase (ileRS) gene [ilsA; formerly cupC, Martindale, D. W., Martindale, H. M., and Bruns, P. J. (1986) Nucleic Acids Res. 14, 1341-1364] from the ciliate Tetrahymena thermophila was sequenced and found to have eight introns, four transcription start sites, and a putative polypeptide of 1081 amino acids. A polypeptide 20 amino acids longer could be made if a transcribed in-frame ATG, located close to the start sites and in a suboptimal sequence context, were used. This gene was identified through hybridization and amino acid sequence similarity to the previously cloned and sequenced cytoplasmic ileRS gene from Saccharomyces cerevisiae [Englisch, ..., and Cramer, ...; ... Genet. ...], with which it shares 47% of its amino acids. We also compared it to ileRS genes from E. coli and an archaebacterium.

* This work was supported by an Operating Grant (to D. M.) from the Natural Sciences and Engineering Research Council of Canada. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M30942.
Supported by …

… is then transferred to one of the hydroxyl groups of the 3'-terminal adenosine of an appropriate tRNA molecule. The fidelity of this reaction, and thus of translation, is maintained by accurate substrate recognition and discrimination and, in some cases, by an editing (deacylase) mechanism (20, 23). Despite their common function, the aaRSs vary in polypeptide size, amino acid sequence, and subunit composition (reviewed in Refs. 62 and 63). Comparisons of aaRS amino acid sequences, and of the three-dimensional structures of representative aaRSs determined by x-ray crystallography, have revealed at least two classes of aaRSs, each with its own characteristic short sequence motifs (12, 17, 64). These two classes, one of which includes the aaRSs for most hydrophobic amino acids, appear to have evolved from different ancestral enzymes (17).
Genes representing all 20 aaRSs have been sequenced (18, 33), mainly from the eubacterium Escherichia coli and the yeast Saccharomyces cerevisiae (see Refs. 33 and 63). Sequenced aaRS genes from other organisms are few and include examples from an insect (8), mammals (22, 35, 69), and, recently, an archaebacterium (36). To improve our understanding of aaRSs, including their conserved features and their idiosyncrasies, it will be instructive to examine these enzymes and their genes in a number of species.
In this report we describe the identification and full sequence of an isoleucyl-tRNA synthetase (ileRS) gene (ilsA) from the ciliated protozoan Tetrahymena thermophila. The ciliated protozoa diverged relatively early from the mainstream of eukaryotic evolution (67). They are very complex unicellular organisms that harbor two functionally distinct nuclei (reviewed in Ref. 38). This is the first reported sequence of an aaRS gene from a protozoan and one of the few reported sequences of an enzyme-encoding gene from a ciliate. We analyze its deduced polypeptide sequence and compare it with the available ileRS sequences from E. coli (73), S. cerevisiae (49), and the archaebacterium Methanobacterium thermoautotrophicum (36). In addition, we examine its expression under different physiological conditions.

RESULTS
Isolation-The cDNA clone pC8, representing the T. thermophila ilsA gene (formerly cupC; 47), was originally isolated during a study of genes active during the sexual stage (conjugation) of the T. thermophila life cycle (45). The pC8 cDNA corresponds to an RNA transcript that peaks in abundance during early conjugation but is also present, at a lower level relative to total RNA, during normal cell growth. Almost no transcript was detected in total RNA from starved cells prior to conjugation.

Portions of this paper (including "Materials and Methods," Figs. 1, 2, 4, 6, and 7, and Table 1) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.
The insert in pC8 harbors one EcoRI site and hybridizes to two genomic EcoRI fragments that represent a single chromosomal location for this gene (Fig. 1 in miniprint). The ilsA gene is located on micronuclear chromosome 2 (47), and Southern analyses of micronuclear and macronuclear DNA digests indicated that no rearrangement or major sequence elimination occurs in or near this gene during differentiation of the somatic macronucleus (47). The ilsA gene (pC8) hybridized under stringent conditions to genomic DNA from S. cerevisiae, indicating that the sequence is evolutionarily conserved (Fig. 1).
Cloning and Sequencing-The structural organization of the ilsA gene, restriction maps, and clones used for sequencing are illustrated in Fig. 2 (see miniprint). Both genomic DNA and cDNA representing the ilsA gene were sequenced. Both strands of the coding region (98 and 92% of each, respectively) were sequenced. Regions not confirmed by the opposite strand were confirmed using the sequence of an independent clone. The complete nucleotide sequence and deduced amino acid sequence of the ilsA gene are shown in Fig. 3. The ilsA gene spans approximately 4500 nucleotides, from the transcription start sites to the polyadenylation site, and is composed of nine exons and eight introns. The ilsA introns are A+T rich and have intron borders typical of T. thermophila nuclear pre-mRNA introns (11). They have an average G+C content of 15%, much lower than the average G+C content of the gene's coding regions (38%).
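The G+C (and, later in the Discussion, A+T and A) percentages are simple base-composition ratios. A minimal Python sketch of the calculation follows; the toy sequences are hypothetical stand-ins, since the actual ilsA intron and exon sequences (Fig. 3; GenBank M30942) are not reproduced in this text, and ignoring ambiguous bases such as N is our assumption.

```python
def base_fraction(seq: str, bases: str) -> float:
    """Percentage of unambiguous nucleotides (A, C, G, T) that belong to `bases`.

    Ambiguous symbols (e.g. N) are ignored; this is an assumption of the sketch.
    """
    unambiguous = [b for b in seq.upper() if b in "ACGT"]
    if not unambiguous:
        raise ValueError("sequence has no A, C, G or T")
    hits = sum(1 for b in unambiguous if b in bases.upper())
    return 100.0 * hits / len(unambiguous)


def gc_content(seq: str) -> float:
    return base_fraction(seq, "GC")


def at_content(seq: str) -> float:
    return base_fraction(seq, "AT")


# Hypothetical toy sequences (not ilsA data).
toy_intron = "GTAAGTATTTAATTTTAAAATAG"
toy_exon = "ATGGCTAAGGAAATCAAGCGT"
print(f"intron G+C: {gc_content(toy_intron):.1f}%")
print(f"exon G+C:   {gc_content(toy_exon):.1f}%")
```

The same helper gives the A content of a leader region with `base_fraction(seq, "A")`.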
Transcription Start Sites-Transcription start sites were deduced by primer extension analysis of polyadenylated mRNA (Fig. 4 in miniprint) using a primer complementary to nucleotides +8 to +28 of the ilsA gene's sense strand (see Fig. 3). Three major start sites (positions -60, -66, and -70; underlined in Fig. 3) and a minor start site (position -72; also underlined in Fig. 3) were found (Fig. 4). The same start sites were used when the gene was transcribed during normal logarithmic growth³ and during conjugation (Fig. 4). The longest cDNA clone extended to position -31 (Fig. 3), and no 3' intron acceptor dinucleotide (AG) is present between position -31 and the -72 start site.
Open Reading Frame Analysis-The ilsA gene has a long open reading frame of 3243 nucleotides (Fig. 3). The open reading frame ends at a TGA; as in several other ciliates, TAA and TAG code for glutamine instead of termination (26, 32).
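Because TAA and TAG are read as glutamine, an ORF search for a ciliate gene must use a modified genetic code in which only TGA terminates (this corresponds to NCBI translation table 6). The sketch below builds the standard code, derives the ciliate variant, and finds the longest ATG-initiated ORF on one strand. It is an illustration on a toy sequence, not the software used for Fig. 3; reverse-strand and unterminated ORFs are deliberately not handled.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered by first, second, third base in TCAG order.
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Ciliate nuclear code: TAA and TAG encode glutamine; only TGA terminates.
CILIATE = dict(STANDARD, TAA="Q", TAG="Q")


def translate(cds: str, table: dict[str, str] = CILIATE) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(table.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def longest_orf(seq: str, table: dict[str, str] = CILIATE) -> tuple[int, int, str]:
    """Longest ATG-initiated, stop-terminated ORF on the given strand.

    Returns (start, end, protein): 0-based start, exclusive end that includes
    the stop codon, and the protein without the stop symbol.
    """
    seq = seq.upper()
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and table.get(codon) == "*":
                protein = translate(seq[start:i], table)
                if len(protein) > len(best[2]):
                    best = (start, i + 3, protein)
                start = None
    return best


toy = "ATGTAACAGTAGTGA"  # hypothetical sequence
print(longest_orf(toy))            # (0, 15, 'MQQQ')
print(longest_orf(toy, STANDARD))  # (0, 6, 'M')
```

Under the standard code the toy reading frame stops at the first TAA, whereas the ciliate code reads through to TGA. As a consistency check on the figures in the text, 3243 nucleotides correspond to exactly 1081 codons (3 × 1081 = 3243), which suggests the quoted ORF length counts the coding codons but not the TGA stop.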
The transcription start site analysis reveals that the first possible translation initiation site (an in-frame ATG at -60) is located very close to the transcription start sites. Indeed, the most abundant transcription product of the ilsA gene begins at the A residue of this putative start codon. This ATG lies only 6 and 10 bases downstream of the two other abundant transcription start sites and 12 bases downstream of the minor start site, and it probably does not support efficient translation (see "Discussion").
The open reading frame originating from the ATG at +1 encodes a deduced polypeptide of 1081 amino acids (Fig. 3) with a putative molecular mass of 124 kDa and a pI of 6.3. The frequency of preferred codons (Fpr) within the ilsA gene was computed (48). The Fpr of this gene (0.85) is slightly lower than that of the highly expressed ciliate histone and actin genes (Fpr = 0.90; 48). If translation occurs from the ATG at -60, the putative polypeptide is 20 amino acids longer. The region separating the first and second start codons does not have a codon bias typical of T. thermophila genes (Fpr of 0.47; 48). The 1100 nucleotides of DNA upstream of the ilsA start sites (Fig. 3) are devoid of open reading frames with appropriate codon usage and have a high A+T content (75%), typical of T. thermophila intergenic regions (31).

³ C. Csank and D. W. Martindale, unpublished results.
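Reference 48 defines the Fpr statistic, but the exact codon classes it uses are not given in this text. As a hedged sketch, we assume Fpr is the fraction of codons, counted only among amino acids that have a designated preferred codon set, that are preferred codons. The `PREFERRED` and `CONSIDERED` sets below are hypothetical placeholders, not the T. thermophila sets from Ref. 48.

```python
def fpr(cds: str, preferred: set[str], considered: set[str]) -> float:
    """Fraction of `considered` codons in `cds` that are in `preferred`.

    Assumed definition (see lead-in): codons outside `considered`, such as
    Met or Trp codons with no synonymous alternatives, are ignored.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    informative = [c for c in codons if c in considered]
    if not informative:
        raise ValueError("no considered codons in sequence")
    return sum(c in preferred for c in informative) / len(informative)


# Hypothetical two-amino-acid example: Lys (AAA/AAG) and Glu (GAA/GAG).
CONSIDERED = {"AAA", "AAG", "GAA", "GAG"}
PREFERRED = {"AAG", "GAA"}  # placeholder choice, not taken from Ref. 48
print(fpr("AAGAAGAAAGAAATG", PREFERRED, CONSIDERED))  # 0.75
```

On this scale a gene using only preferred codons scores 1.0, so ilsA (0.85) sits close to the histone and actin genes (0.90), whereas the 60-nucleotide region between the two ATGs (0.47) does not.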
Gene Expression-When T. thermophila cells that have been growing exponentially in a rich medium are washed and resuspended in starvation buffer (10 mM Tris, pH 7.4), the rates of RNA and protein synthesis rapidly decline and by 2 h are reduced to 6% or less of their original rates (46). Polysomes also dissociate (24), and most cells complete their cell division cycles and become arrested in macronuclear G1 phase (51). As RNA and protein synthesis decline, ilsA mRNA levels also decrease (45).
When starved T. thermophila cells of different mating types are mixed, they undergo synchronous mating (conjugation). Soon after cells pair, micronuclei begin to undergo meiosis. A burst of RNA synthesis occurs almost immediately upon mixing, followed by a rise in the rate of protein synthesis that peaks during meiotic prophase (46). Polysome formation is also induced (46). After the period corresponding to meiotic prophase, protein synthesis rates are again reduced. Several conjugation-specific genes are transcribed maximally during meiotic prophase, as is the conjugation-induced ilsA gene (45). Primer extension analysis confirms previous findings (45) that ilsA transcripts make up a larger fraction of the polyadenylated RNA during early conjugation (3-5 h) than during normal growth³.
We wanted to examine whether ilsA gene expression is coordinately increased with protein synthesis in a situation other than meiotic prophase, and chose nutritional shift-up. When starved T. thermophila cells are refed, most mRNA is loaded onto polysomes within 2 h (15, 24), and protein synthesis is rapidly induced (7). About 40% of the protein synthesized in the first hour after refeeding is ribosomal protein (15, 24). This is paralleled by an increase in ribosomal protein mRNA abundance, which is much higher in newly refed cells than in exponentially growing cells (1, 25, 58). Fig. 5 shows a Northern blot analysis of total RNA extracted from T. thermophila cells that were growing exponentially, from cells starved for 24 h, or from starved cells that were refed. Approximately equal amounts of RNA (10 μg) were size-separated on an agarose gel, transferred to a nylon membrane, and hybridized with the cDNA probe pC8 (Fig. 5). An mRNA transcript size of 3.5 kb was determined for the ilsA gene (Fig. 5). This transcript was present during normal growth and was greatly reduced in starved cells (Fig. 5; Refs. 45, 68). Upon refeeding, the ilsA transcript rapidly reappeared, and 0.5-1 h after refeeding it made up a larger fraction of total RNA than during logarithmic growth. When the refed cells started dividing, the relative abundance of the transcript declined to levels somewhat lower than in growing cells prior to starvation, possibly because of the slower growth rate reached by refed cells. These results suggest that transcription of the ilsA gene parallels changes in protein synthesis rates. The transcription pattern of the ilsA gene also resembles that of ribosomal protein genes (1, 25, 58).
Identification of the ilsA Gene as an Isoleucyl-tRNA Synthetase Gene-We have previously reported the cloning and sequencing of the cytoplasmic ileRS gene from S. cerevisiae (49). This gene was obtained from an S. cerevisiae genomic DNA library using the T. thermophila ilsA probe pC8. Except for minor strain differences, it is identical to the ILS1 gene isolated independently by transforming a yeast strain with DNA that complemented a temperature-sensitive mutation in the gene for cytoplasmic ileRS (16, 50).

The similarity of the DNA sequences of the T. thermophila ilsA gene and the S. cerevisiae ILS1 gene is shown in matrix form in Fig. 6 (miniprint). Over the region where the S. cerevisiae sequence overlaps with the pC8 probe, the identity is 63%. The high similarity between the coding regions of the two genes may be expected because S. cerevisiae and T. thermophila have similar codon biases (48).
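Fig. 6 presents the DNA comparison as a dot matrix. The window length and match threshold used for that figure are not stated in this text, so the defaults below are assumptions. The sketch marks every pair of positions whose ungapped windows share at least `min_matches` identical bases, which is the basic idea behind such similarity matrices.

```python
def dot_matrix(a: str, b: str, window: int = 11, min_matches: int = 8) -> list[tuple[int, int]]:
    """Return (i, j) for every window pair a[i:i+window], b[j:j+window]
    that share at least `min_matches` identical positions (no gaps)."""
    a, b = a.upper(), b.upper()
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


def render(a: str, b: str, hits: list[tuple[int, int]]) -> str:
    """Crude text plot: rows follow `a`, columns follow `b`."""
    grid = [["." for _ in b] for _ in a]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


toy = "ACGTACGTTGCA"  # hypothetical sequence
print(render(toy, toy, dot_matrix(toy, toy, window=4, min_matches=4)))
```

Similar regions appear as diagonal runs of marks; a weaker diagonal toward one end of the plot would correspond to the lower similarity reported for the carboxyl-terminal third.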
The ilsA polypeptide is aligned with its S. cerevisiae, E. coli, and M. thermoautotrophicum counterparts in Fig. 7 (miniprint). The deduced amino acid sequences of the T. thermophila ilsA gene and the S. cerevisiae ILS1 gene share 47% of their amino acids, and their similarity, which also counts functionally conserved residues, is 61%. The amino acid sequences are most similar (56% identical amino acids) over their amino-terminal two-thirds and least similar (28% identical amino acids) in the remaining third closest to the carboxyl ends. The same trend is seen when the DNA sequences are compared (Fig. 6). We made pairwise comparisons of all the ileRSs and determined amino acid identities and similarities (Table 1, see miniprint). The two eukaryotic ileRSs are more similar to the archaebacterial ileRS than to the homologous enzyme from E. coli (see also Ref. 36). Differences in size between these homologous enzymes (see Fig. 7B) appear to result mainly from differences at their carboxyl termini. Computer analysis of the T. thermophila ileRS identified, within the carboxyl region, two leucine zipper motifs (42) consisting of 4 or 5 leucines spaced at 7-residue intervals with adjacent basic regions (Fig. 8; see "Discussion"). The aligned S. cerevisiae amino acid sequence shows some conservation of these motifs (Fig. 8). α-Helical wheel diagrams (66) of the T. thermophila leucine zippers are shown in Fig. 9.
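The leucine zipper criterion quoted above (4 or 5 leucines at 7-residue intervals) can be scanned for directly. The sketch below is a simplified stand-in for the computer analysis, which is not described in this text: it checks only the leucine spacing and leaves the adjacent basic regions and helix propensity to be inspected separately. The test peptide is synthetic, not the ileRS sequence.

```python
def leucine_repeats(protein: str, min_leucines: int = 4, spacing: int = 7) -> list[list[int]]:
    """Return 0-based positions of each maximal run of leucines spaced exactly
    `spacing` residues apart that contains at least `min_leucines` leucines."""
    protein = protein.upper()
    repeats = []
    for start, aa in enumerate(protein):
        if aa != "L":
            continue
        # Skip leucines that continue a run already started one heptad earlier.
        if start >= spacing and protein[start - spacing] == "L":
            continue
        run = []
        pos = start
        while pos < len(protein) and protein[pos] == "L":
            run.append(pos)
            pos += spacing
        if len(run) >= min_leucines:
            repeats.append(run)
    return repeats


synthetic = "KR" + "LAEKAEK" * 4 + "L"  # hypothetical test peptide
print(leucine_repeats(synthetic))  # [[2, 9, 16, 23, 30]]
```

Requiring exact 7-residue spacing is strict; real zippers are usually screened with some tolerance and then checked by eye, as the alignment in Fig. 8 suggests.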
We also compared the T. thermophila ileRS polypeptide with aaRS sequences specific for valine and leucine, the other branched-chain aliphatic amino acids, and for methionine. Similarities between the aaRSs for these amino acids, as well as highly conserved regions, have been noted previously (29, 49). We aligned the T. thermophila sequence with the other aaRS sequences (4, 13, 14, 19, 29, 37, 49, 70, 73) using the GCG programs BestFit and GAP and calculated the percentage identity by dividing the number of shared amino acids by the length of the T. thermophila polypeptide. As observed for the S. cerevisiae enzyme (49), the T. thermophila polypeptide is as similar to the valRSs from S. cerevisiae (21%), E. coli (18%), and Bacillus subtilis (22%) as it is to the ileRS from E. coli (22%). It shares 16% of its amino acids with the leuRSs and 14% with the metRSs from S. cerevisiae and E. coli. Conserved regions shared between these enzymes, including the T. thermophila ileRS, have been described previously (49).
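The identity measure described above divides shared residues by the length of the T. thermophila polypeptide, not by the alignment length. A minimal sketch, assuming the two sequences have already been aligned (for example by GAP or BestFit) and that gaps are written as "-"; the toy alignment is hypothetical.

```python
def percent_identity(query_aln: str, subject_aln: str, gap: str = "-") -> float:
    """Identical aligned residues divided by the ungapped length of the query.

    The query plays the role of the T. thermophila ileRS in the text.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    shared = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != gap)
    query_length = sum(1 for q in query_aln if q != gap)
    return 100.0 * shared / query_length


print(percent_identity("MKL-VA", "MRLIVA"))  # 80.0
```

Fixing the denominator to the query length keeps the values comparable across subjects of different sizes, but they can differ from identities computed only over the aligned region.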

DISCUSSION
The putative polypeptide of the T. thermophila ileRS is composed of 1081 amino acids, as derived from the open reading frame originating at the ATG at position +1 (Fig. 3). A polypeptide 20 amino acids longer (with an amino-terminal extension) could be made if the transcribed in-frame ATG at position -60 were recognized as a functional translation start site. The -60 ATG is probably not used as a translation start, or is used only inefficiently, for the following reasons: 1) it lies at or very close to the transcription start sites (-60 to -70; Fig. 3) (40, 65); 2) its sequence context, TTATGA, is suboptimal for initiation in other eukaryotes (see Ref. 39); and 3) the region from the -60 ATG to the +1 ATG has a high A+T content (88%) and a high A content (57%), which are common in T. thermophila leader sequences but not in coding regions (6).
Unlike the -60 ATG, the +1 ATG has all the characteristics of a typical T. thermophila start site (6).
If the ilsA gene does have two alternative translation start sites, it could encode both the mitochondrial and cytoplasmic forms of the enzyme. This type of arrangement is seen in the differentially transcribed and translated nuclear genes for S. cerevisiae hisRS (53) and valRS (57) and in the differentially translated nuclear S. cerevisiae MOD5 gene (52). The longer forms of these enzymes have amino-terminal extensions needed for mitochondrial localization. Indeed, translation from the ilsA -60 ATG gives an amino-terminal extension (MILFKKLLIQKKVNYLSRLL) with 11 hydrophobic residues, 5 basic residues, and 1 hydroxylated residue, similar to the composition of other mitochondrial targeting sequences (28).
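The residue counts quoted for the putative extension can be reproduced directly. The grouping below is an assumption chosen to match the text: hydrophobic = A, V, L, I, M, F, W; basic = K, R, H; hydroxylated = S, T (tyrosine, which also carries a hydroxyl group, is not counted, consistent with the single hydroxylated residue reported). Its length, 20 residues, also matches the 60 nucleotides from -60 to -1.

```python
from collections import Counter

EXTENSION = "MILFKKLLIQKKVNYLSRLL"  # deduced from the -60 ATG (text above)

GROUPS = {
    "hydrophobic": set("AVLIMFW"),
    "basic": set("KRH"),
    "acidic": set("DE"),
    "hydroxylated": set("ST"),  # assumption: Ser/Thr only, Tyr excluded
}


def classify(peptide: str) -> dict[str, int]:
    """Count residues of `peptide` falling into each group in GROUPS."""
    counts = Counter(peptide.upper())
    return {name: sum(counts[aa] for aa in members) for name, members in GROUPS.items()}


print(len(EXTENSION), classify(EXTENSION))
# 20 {'hydrophobic': 11, 'basic': 5, 'acidic': 0, 'hydroxylated': 1}
```

The absence of acidic residues alongside several basic ones is also characteristic of mitochondrial presequences.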
Amino acid identities between homologous aaRSs from S. cerevisiae and bacteria range from 20 to 50% (63). The T. thermophila ileRS is most similar in size and amino acid sequence to the S. cerevisiae ileRS, and both are more similar to the archaebacterial ileRS of M. thermoautotrophicum than to the eubacterial E. coli ileRS (Table 1). The S. cerevisiae ileRS is more similar to the archaebacterial ileRS than is the T. thermophila ileRS (Table 1), which suggests that the T. thermophila ileRS has diverged further than the S. cerevisiae enzyme from the ancestral eukaryotic ileRS. The ileRSs are most similar in the amino-terminal two-thirds of their polypeptides (Fig. 7). This region of the class I aaRSs has been proposed, and observed, to form a three-dimensional structure called a Rossmann nucleotide-binding fold, made up of alternating β-strands and α-helices (5, 33, 64). Two conserved short amino acid motifs (HIGH and KMSKS) that interact with ATP lie within this structure (21, 33, 34, 49, 59, 64).

The T. thermophila ileRS has two leucine zipper motifs (42, 71) in the carboxyl third of the protein (Figs. 7 and 8) that are partly conserved in the other ileRSs (Fig. 8). These are typical leucine zippers in that they are predicted to form amphipathic helices that could associate as dimers (42, 56, 66). The aaRSs, including ileRSs, often occur in multienzyme complexes (2, 10, 27). Since leucine zippers can form heterodimers (42, 56), they may be involved in the formation of such complexes (35, 72). Leucine zippers have also been shown to form coiled coils (56); using a recently described algorithm (43), we calculated the probability that the ileRS leucine zippers form coiled coils. Both T. thermophila leucine zippers, as well as their yeast equivalents, were found to have amino acid sequence distributions typical of coiled coils. Intriguingly, the amino-terminal helices of several other aaRSs have been found to form either a coiled coil (E. coli serRS; 12) or an amphipathic helix (human aspRS, 35; yeast valRS, 9). Leucine zippers found in DNA-binding proteins have a basic region on their amino-terminal side that contacts DNA (71). In the T. thermophila ileRS, leucine zipper-1 has a short basic stretch following the last leucine of the leucine repeat, whereas leucine zipper-2 is preceded by a long basic stretch (Fig. 8) that is also present in the S. cerevisiae and M. thermoautotrophicum ileRSs. The carboxyl terminus of the E. coli ileRS has a zinc finger-like motif (see Ref. 55) that aligns with the basic region of leucine zipper-2 of the T. thermophila ileRS. Because zinc finger motifs are also found in DNA-binding proteins, this suggests that structurally different nucleic acid-binding regions may have been acquired independently by the eubacterial enzyme and by the ancestral eukaryotic enzyme.
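The amphipathic argument rests on helical wheel geometry (Fig. 9). On an ideal α-helix each residue is rotated about 100° from the previous one (3.6 residues per turn), so residues 7 apart are offset by 700°, i.e. 340°: they drift only 20° per heptad and stay on one face of the helix. A minimal sketch of that projection, using a synthetic peptide rather than the ileRS sequence:

```python
DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix: ~3.6 residues per turn


def helical_wheel(peptide: str, start_angle: float = 0.0) -> list[tuple[int, str, float]]:
    """Return (index, residue, angle in degrees) for an ideal helical wheel."""
    return [
        (i, aa, (start_angle + i * DEGREES_PER_RESIDUE) % 360.0)
        for i, aa in enumerate(peptide.upper())
    ]


def angular_spread(angles: list[float]) -> float:
    """Width in degrees of the smallest arc that contains every angle."""
    ordered = sorted(a % 360.0 for a in angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(ordered[0] + 360.0 - ordered[-1])  # wrap-around gap
    return 360.0 - max(gaps)


synthetic = "LAEKAEK" * 4 + "L"  # hypothetical peptide, leucines at i, i+7, ...
wheel = helical_wheel(synthetic)
leucine_angles = [angle for _, aa, angle in wheel if aa == "L"]
print(leucine_angles)                  # [0.0, 340.0, 320.0, 300.0, 280.0]
print(angular_spread(leucine_angles))  # 80.0
```

Five heptad-spaced leucines therefore occupy an 80° sector, a single hydrophobic stripe that could pack against a partner helix in a dimer or coiled coil.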
Unlike the other sequenced ileRS genes, the T. thermophila gene has eight introns. As aaRS sequences from eukaryotes emerge, it will be useful to compare the positions of their introns with those of the T. thermophila ileRS gene to see whether they interrupt sequences at conserved positions and whether they lie between structural or functional domains of the aaRSs, as the positions of some ilsA introns suggest.
Numerous studies have indicated that cellular levels and rates of synthesis of aaRSs are positively correlated with protein synthesis rates (reviewed in Refs. 54 and 57). This suggests that the production of aaRSs and other components of the translational machinery may be co-regulated at the transcriptional level. Indeed, in T. thermophila the transcription pattern of the ilsA gene parallels changes in protein synthesis. When cells are starved, protein synthesis is greatly reduced (46). When cells are refed, protein synthesis resumes and so does ilsA transcription. The ilsA transcript peaks in abundance within 1 h after a nutritional shift-up and then declines toward the levels observed in growing cells (Fig. 5). The ilsA transcription pattern during refeeding resembles that of the ribosomal protein genes (1, 25, 58), suggesting concerted activation of these genes during refeeding. When cells mate and undergo conjugation, a transient increase in the protein synthesis rate occurs in early meiotic prophase (46). In our studies³ (45), we find a similar transient increase in ilsA transcription, as deduced from mRNA abundance. Nuclear run-on experiments have shown that the increase in abundance of the ilsA transcript during conjugation results from new transcription rather than from stabilization of the mRNA (68). It is probable that conjugation in T. thermophila induces transcription of several genes encoding components of the translational apparatus, supplying the cell with what it needs for protein synthesis. The same ilsA transcription start sites were used in conjugating and growing cells, indicating that the changes in ilsA transcript levels are not due to transcription from different start sites. It is also of interest that, unlike many other mRNAs, the ilsA transcript stays on polysomes and is not degraded during a heat shock (41), presumably because the ileRS is needed for the synthesis of heat shock proteins. It is likely that the ileRS gene will be controlled in a manner similar to other genes involved in translation and will share, for example, promoter elements and transcription factors with them.

In our experiments, mating was somewhat delayed, with over 75% mating attained by 4 h after mixing. The cultures were also monitored by DAPI staining, and full micronuclear crescents (meiotic prophase stage IV) were observed between 4 h and 6 h after mixing.

In refeeding experiments, nutrients were resupplied to cells that had been starved in 10 mM Tris, pH 7.4. A 3× concentrate of the complex growth medium (described above) was added to achieve a final medium concentration equivalent to that prior to starvation.
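The refeeding concentrations are mutually consistent if one volume of the 3× concentrate is added to two volumes of cells starved in 10 mM Tris. That mixing ratio, and the assumption that the concentrate itself contains no Tris, are inferred from the stated concentrations rather than given explicitly:

$$
c_{\text{medium}} = \frac{1 \cdot 3\times + 2 \cdot 0}{1 + 2} = 1\times,
\qquad
c_{\text{Tris}} = \frac{1 \cdot 0 + 2 \cdot 10\ \text{mM}}{1 + 2} = \tfrac{20}{3}\ \text{mM} \approx 6.7\ \text{mM}
$$

This agrees with the 6.6 mM Tris supplement added to the growth medium in these experiments.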
In these refeeding experiments, growth medium was supplemented with 6.6 mM Tris, pH 7.4, to match the medium of the refed cells. Cell