Structural relationships and the classification of aminoacyl-tRNA synthetases.

where AA represents one of the 20 naturally occurring amino acids and tRNAAA represents a transfer RNA specific for that amino acid. In all, there are 20 such enzymes (one for each amino acid) that share the role of assigning amino acids to triplet codons in genetic translation. The synthetases are among the oldest proteins, and in contrast to more recently evolved and functionally related proteins, such as hemoglobin and myoglobin, their functional similarity is not reflected in a uniform structural framework. (This relationship also contrasts with that of enzymes that share structural homology but are mechanistically distinct, e.g. mandelate racemase and muconate lactonizing enzyme (1).) The diversity of synthetase structures has been a problem of long standing, whose solution may lead to a more basic understanding of protein structure/function and evolutionary relationships. The diversity of synthetases is illustrated in Fig. 1, where the relative sizes and subunit compositions of the Class I and Class II synthetases from Escherichia coli are summarized. The enzymes range in quaternary structure from monomers (e.g. IleRS) to tetramers (AlaRS), in primary structure from 334 amino acids (TrpRS) to 1112 amino acids (PheRS), and in native molecular mass from 51,000 (CysRS) to 384,000 (AlaRS). The Class I enzymes are chiefly monomeric (only the dimeric Tyr and Trp enzymes cannot function as monomers (4, 5); the Met enzyme is converted into a monomer by limited proteolysis (6)), while the Class II enzymes are entirely oligomeric. Based on a limited number of sequences, structural information (7, 8), and structural modeling, a group of related synthetases that initially included the isoleucine, methionine, tyrosine, and glutamine enzymes was recognized (9, 10). This group was later expanded to include the Arg, Glu, Leu, Trp, and Val-tRNA synthetases (11-15).
Each member of the group contains a “signature sequence” (9, 10), an 11-mer peptide that ends in a characteristic and mostly conserved tetrapeptide, HIGH (13). Additional sequence comparisons identified a second similarity, the pentapeptide KMSKS, in the same enzymes that contain the HIGH region (16). However, not all synthetase sequences contain the signature sequence and KMSKS peptides. Even with the development of more sophisticated sequence comparison algorithms (17) and secondary structure predictions (18), similarities that are common to all 20 enzymes were not found. Nevertheless, with sequences and the characterization of the quaternary structures of all 21 synthetases from E. coli and the information from five crystal structures (7, 8, 37-41), at least two different structural frameworks were established. Based on limited sequence similarities (2) that correlate with these structural frameworks (38, 41), each enzyme can be placed uniquely into one of two classes. The remarkable diversity in size and quaternary structure (Fig. 1), together with an apparently primordial origin of these two classes, explains the early difficulties in identifying sequence alignments within this group of enzymes.
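The two Class I sequence markers described above (the HIGH tetrapeptide that ends the signature sequence, and the KMSKS pentapeptide) can be located in a protein sequence with a simple scan. The following is a minimal sketch, not a published method: it assumes exact matches to the consensus strings, whereas the text describes both motifs as only mostly conserved, and the function name and the toy sequence are hypothetical.

```python
import re

# Consensus forms of the two Class I motifs named in the text.
# Assumption: exact-string matching; real synthetases carry variants of both.
CLASS_I_MOTIFS = {
    "HIGH": re.compile("HIGH"),
    "KMSKS": re.compile("KMSKS"),
}


def find_class_i_motifs(sequence):
    """Return {motif name: [1-based start positions]} for a protein sequence."""
    seq = sequence.upper()
    return {
        name: [match.start() + 1 for match in pattern.finditer(seq)]
        for name, pattern in CLASS_I_MOTIFS.items()
    }


# Hypothetical toy sequence for illustration only (not a real synthetase).
toy_sequence = "MTAPNGYLHIGHARSAVLNAGGGKMSKSLGN"
hits = find_class_i_motifs(toy_sequence)
print(hits)
```

A sequence that returns empty lists for both motifs is not thereby shown to be a Class II enzyme; as the text notes, the class assignment also rests on structural information.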

Structural Basis for Limited Sequence Similarities That Define Two Classes
The first three structures to be solved (MetRS, TyrRS, and GlnRS) provided a structural basis for some of the limited sequence similarities. All three structures contain a nucleotide-binding (or Rossmann) fold (42). This motif of alternating α-helices and β-strands is found in enzymes that bind to an adenine nucleotide (e.g. the NADH cofactor of the dehydrogenases or the ATP substrate of the kinases). Furthermore, regions that correspond to the characteristic signature sequence and KMSKS motifs in these three structures have high three-dimensional similarity, and together they form part of the ATP binding site (43).
With the determination of the seryl-tRNA synthetase crystal structure and a complete set of sequences for the E. coli synthetases, the diversity of the synthetases was extended to their three-dimensional frameworks. The structure of SerRS (37) is unlike that of the Met, Tyr, or Gln synthetases. The dominant structural motif of SerRS is an eight-stranded antiparallel β-sheet that somewhat resembles the NAD-binding motif of the enterotoxins (44), which contain a seven-stranded sheet, but bears no relationship to the Rossmann folds of the earlier structures. Moreover, in those synthetases that lack the sequence motifs mentioned above, three unique regions of degenerate sequence similarity (motifs 1-3; Ref. 2) were identified. The lengths and the sequences of the central parts of these motifs are as follows: motif 1 (18 amino acids), @G($XX($XXP($($; motif 2 (23-31 amino acids), @($($X($XXXFRXE()@$XeF; motif 3 (29-34 amino acids), ($G($G($G($ER($($($($().
As with the signature sequence 11-mer, a mostly conserved tetrapeptide can be used to define two of the three motifs; for motif 2, this tetrapeptide is FRNE, and for motif 3, it is GLER. These motifs (and the signature sequence similarities mentioned above) are the basis for dividing the 20 synthetases into two groups of 10 each (Fig. 1). Further support for this grouping has come from a partial structure of the S. cerevisiae aspartyl-tRNA synthetase:tRNAAsp complex (41). This Class II enzyme contains a segment that is similar to the antiparallel β domain of SerRS. Based on the two structures, the active site of the Class II enzymes has been tentatively identified as being formed in part from amino acids that make up motifs 2 and/or 3. Motif 1 in the Class II alignment forms a portion of a subunit interface in both the SerRS and AspRS structures. This motif may be present because the Class II synthetases are predominantly dimers (Fig. 1). An exception is the tetrameric E. coli alanine enzyme, which can function as a monomer (45-47) and is naturally monomeric in higher organisms (for example, Bombyx mori; Ref. 48).
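The division into two groups of 10 can be written out as a small lookup. This is an illustrative sketch under stated assumptions: the Class I set collects the specificities that this review names as members of the signature-sequence group (Ile, Met, Tyr, Gln, Arg, Glu, Leu, Trp, Val, and Cys), and the Class II set is derived here simply as the complement among the 20 standard amino acids rather than being copied from Fig. 1. The names `CLASS_I`, `CLASS_II`, and `synthetase_class` are hypothetical.

```python
# Class I specificities named in the text (three-letter amino acid codes).
CLASS_I = {"Arg", "Cys", "Gln", "Glu", "Ile", "Leu", "Met", "Trp", "Tyr", "Val"}

# The 20 standard amino acids.
ALL_AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Assumption: each enzyme falls uniquely into one class, so Class II is
# taken as the complement of Class I rather than listed from the figure.
CLASS_II = ALL_AMINO_ACIDS - CLASS_I


def synthetase_class(amino_acid):
    """Return "I" or "II" for a three-letter amino acid code."""
    if amino_acid in CLASS_I:
        return "I"
    if amino_acid in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid code: {amino_acid!r}")


print(sorted(CLASS_II))
print(synthetase_class("Ser"), synthetase_class("Met"))
```

The complement contains the three Class II enzymes discussed by name in this section (Ser, Asp, and Ala), which is consistent with the text.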
Subgroups within Class I Enzymes. Within the Class I synthetases, additional diversity is apparent. Each of the three crystal structures has a unique variant of the nucleotide-binding fold framework. In Fig. 2, the different nucleotide-binding fold topologies of the three known structures are aligned according to the position of the signature sequence and KMSKS motifs. In the enzymes for which no structures are yet known, we assumed that the three-dimensional dispositions of these two motifs are fixed relative to one another and to other secondary structure elements of the fold. Based on this assumption, the seven remaining Class I structures can be tentatively divided into three groups (Fig. 2).
In all three families, the first half of the nucleotide-binding fold contains the signature sequence, and in all but the Arg enzyme, this sequence occurs within the first 70 amino acids. The signature sequence occurs between an inner helix and β-strand within the N-terminal half of the nucleotide-binding fold. The catalytic residues that form the active site include those in a loop after strand E, which contains the KMSKS region. For TyrRS, whose signature sequence is found at residues 40-50, the N-terminal sequence forms the outermost antiparallel strand of the second half of the fold. Because the signature sequence of the Arg enzyme is also displaced (by 122 amino acids) from the N terminus, it is tentatively classified with Tyr rather than with Gln and Glu. Furthermore, the Trp enzyme is not classified with Tyr because its signature sequence is found only 10 residues from the N terminus, not far enough to allow for the extra strand.
Additional differences between the families are found in the connectivity within the nucleotide-binding fold. The largest family corresponds to the methionyl-tRNA synthetase structure and contains synthetases that activate the hydrophobic amino acids Ile, Leu, Met, and Val, as well as Cys (which can also be classified as hydrophobic; Ref. 49). This family has both the largest and the smallest Class I enzymes, with the differences in size concentrated in the insertions that are designated connective polypeptide domains CP1 and CP2 (10, 22). The connectivity differences between families are mainly in the second half of the fold, where the Met family contains a second connective polypeptide (CP2) between the first and second strands (D and E), the Gln family has CP2 between the second and third strands (E and F), and the Tyr family has no identifiable CP2. The alignment and subgroup classification are assisted by a short stretch of similarity that defines the first strand of the second half of the fold (strand D; Ref. 27), which allows the end of CP1 and the start of CP2 to be located.
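The tentative three-family subdivision of the Class I enzymes can be summarized as a small table keyed on the position of the CP2 insertion. This is a sketch of the classification as described in the text, not a definitive assignment: the Met family members and the CP2 positions are taken from the paragraph above, Arg is placed with Tyr as stated, and Glu and Trp are placed with Gln following the text's statements that Arg is classified with Tyr "rather than with Gln and Glu" and that Trp falls into the Gln family. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClassIFamily:
    name: str
    members: Tuple[str, ...]
    # Strands flanking the CP2 insertion; None if no CP2 is identifiable.
    cp2_between: Optional[Tuple[str, str]]


# Tentative families as described in the text (three-letter codes).
FAMILIES = (
    ClassIFamily("Met", ("Cys", "Ile", "Leu", "Met", "Val"), ("D", "E")),
    ClassIFamily("Gln", ("Gln", "Glu", "Trp"), ("E", "F")),
    ClassIFamily("Tyr", ("Arg", "Tyr"), None),
)


def family_of(amino_acid):
    """Return the ClassIFamily that contains the given Class I specificity."""
    for family in FAMILIES:
        if amino_acid in family.members:
            return family
    raise ValueError(f"{amino_acid!r} is not listed as a Class I specificity")


for fam in FAMILIES:
    print(fam.name, fam.members, fam.cp2_between)
```

The three families together account for all 10 Class I enzymes, with the Met family the largest.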
These specific differences between Class I enzymes should not obscure their close relationship. A recent comparison of the Gln and Met enzyme structures (50) shows that the two enzymes, although in different families, share a rare "left-handed crossover" between strands E and F (where the corresponding strands of TyrRS are not directly connected). This topology places the signature sequence and KMSKS motifs on the same side of the β-sheet. In the GlnRS:tRNAGln cocrystal, strand F fits into the "armpit" region of the L-shaped tRNA molecule, between the minihelix domain and the D-anticodon domain (Fig. 3). The left-handed crossover, together with the unusually short strand "D," provides part of the docking interaction for the acceptor stem of tRNA.

Motifs for Interaction of Class I Enzymes with tRNA
In the Class I Gln- and Met-tRNA synthetases, two separate domains are used for interactions with the two domains of the L-shaped tRNA structure (51-53; Fig. 3). The acceptor-TΨC minihelix domain of tRNAGln interacts with the insertions into the nucleotide-binding fold framework of GlnRS by three α-helices, with an antiparallel β-loop that separates the 3' and 5' strands of the first base pair of the helix (39). Contacts with the 3:70 base pair are made by Asp235 at the N terminus of the helix that follows strand D (Fig. 2), the same point in the nucleotide-binding fold as the CP2 insertion in the methionine family.
A separate C-terminal domain interacts with the distal domain of the tRNA, which comprises the dihydrouridine (D) stem-loop and the anticodon stem-loop (Fig. 3). The anticodon-binding domains of the Class I synthetases have at least two divergent structures; in methionyl-tRNA synthetase, this domain is predominantly α-helical, while in GlnRS, it is a β-barrel. By sequence analysis and structural prediction, the α-helical motif of the Met enzyme is also predicted for the other Class I synthetases of the Met family, the Cys, Ile, Leu, and Val enzymes (22). In addition to GlnRS, the Glu synthetase is predicted to have a β-barrel anticodon recognition domain, because it is likely to have been derived from a common ancestor. (In Bacillus subtilis, there is a single enzyme that aminoacylates both tRNAGlu and tRNAGln (Ref. 55), and in E. coli, the C-terminal domains of the Gln and Glu enzymes share sequence similarity (Ref. 15).) The Trp enzyme falls into the Gln family based on the predicted nucleotide-binding fold topology, but current structural information (56) is not sufficient to identify a β-barrel in the C-terminal domain.

Assembly of Class I and Class II Enzymes
The assembly of Class I and Class II structures from individual domains is depicted in Fig. 4. The modular arrangement of functional domains common to both classes of synthetase was first suggested by deletion analysis of the Class I B. stearothermophilus tyrosyl-tRNA synthetase (57) and of the Class II E. coli alanyl-tRNA synthetase (45-47), where domains could individually be isolated and investigated. From genetic and biochemical studies, the Class II alanyl-tRNA synthetase appears to have a domain organization that resembles the Class I enzymes (11). It has an N-terminal amino acid activation domain, with insertions of sequences that contact the acceptor stem of bound tRNA (58, 59). This domain is followed by another motif that contacts tRNA (60), possibly via the D stem-loop and anticodon stem (61), followed by an oligomerization domain (44-46).
In contrast to the arrangement in the Class I enzymes and alanyl-tRNA synthetase, the AspRS cocrystal with tRNA (41) demonstrates that contacts outside of the acceptor stem are mediated by an N-terminal domain, followed by a small domain (containing the motif 1 sequence) that is involved with oligomerization. This in turn is followed by an antiparallel β domain that contains an insertion (or insertions) to contact the acceptor stem of the tRNA. As noted above, the topology of the Class II SerRS resembles AspRS, although the oligomerization domain is formally an insertion occurring after the first β-strand of the antiparallel β domain. The Ser enzyme has an unusual antiparallel α-helical coiled coil domain of 100 amino acids that protrudes from the N terminus of the catalytic domain. Because of its unusual shape, this domain is thought to be involved in tRNA binding. In the order of their functional domains from N to C terminus, the Class II enzymes differ both from the Class I enzymes and from one another. Because of these differences, the Class II synthetases seem in general to be more variable in structure.
The reasons for the different quaternary structures within the two classes of synthetases remain obscure. For some synthetases, oligomerization is apparently required for function. In the B. stearothermophilus Tyr enzyme, this functional requirement seems to be due to the binding of the tRNA across the dimer interface (62). However, in at least some instances, the quaternary structure can be manipulated. For example, when the genes encoding the α and β subunits of the Class II glycyl-tRNA synthetase are artificially fused, an active enzyme results (63), and, in a different study, the quaternary structure of E. coli MetRS was artificially changed from a2 to app2 (64).
Additional diversity within the synthetases is provided by a variable requirement for divalent zinc. At least four enzymes, the Met, Ile, Trp, and Ala synthetases, have been shown to require this metal (65-68), and several other synthetases have sequence motifs that suggest interactions with divalent zinc (68, 69). Alanyl-tRNA synthetase contains a CX2CX6HX2H "Cys-His box" sequence (68) that is found in the gag proteins of retroviruses and is believed to be important for RNA packaging (70-73).

Additional Functions, Relationships to Other Proteins, and Evolution
Other functions of synthetases include roles in RNA splicing, for the mitochondrial TyrRS in Neurospora crassa (74) and yeast mitochondrial LeuRS (75), and in transcriptional (76) and translational (77-79) control. In one case, the novel function has been attributed to an additional sequence located at the N terminus of the nucleotide-binding fold (80).
Weak similarities of synthetases with other proteins have been observed in at least three cases. First, the sequence of the CP2 region of E. coli LeuRS bears a significant similarity to the leucine-specific binding protein from the same organism (81), suggesting a role for this insertion in amino acid recognition. Second, the 180-kDa GCN2 protein, which regulates amino acid biosynthesis in S. cerevisiae, has an extensive (60 kDa) segment that is similar to histidyl-tRNA synthetase (82). Finally, the putative catalytic domain of aspartyl-tRNA synthetase has significant sequence similarity to the ammonia-dependent asparagine synthetase of E. coli (83). One region of similarity includes motif 3. Because both enzymes proceed via an adenylate intermediate, this similarity supports the assignment of motif 3 to the active site.
The establishment of two broad classes of synthetases may have occurred early. There is no example of a "class switch" of a synthetase in evolution. The eubacterial (B.