Iron Superoxide Dismutase NUCLEOTIDE SEQUENCE OF THE GENE FROM ESCHERICHIA COLI K12 AND CORRELATIONS WITH CRYSTAL STRUCTURES*

The nucleotide sequence of the iron superoxide dismutase gene from Escherichia coli K12 has been determined. Analysis of the DNA sequence and mapping of the mRNA start reveal a unique promoter and a putative ρ-independent terminator, and suggest that the Fe dismutase gene constitutes a monocistronic operon. The gene encodes a polypeptide product consisting of 192 amino acid residues with a calculated Mr of 21,111. The published N-terminal amino acid sequence of E. coli B Fe dismutase (Steinman, H. M., and Hill, R. L. (1973) Proc. Natl. Acad. Sci. U. S. A. 70, 3725-3729), along with the sequences of seven other peptides reported here, was located in the primary structure deduced from the K12 E. coli gene sequence. A new molecular model for iron dismutase from E. coli, based on the DNA sequence and x-ray data for the E. coli B enzyme at 3.1 Å resolution, allows detailed comparison of the structure of the iron enzyme with manganese superoxide dismutase from Thermus thermophilus HB8.

polypeptide sequence of the Fe superoxide dismutase from Photobacterium leiognathi (13). In addition, x-ray structures of the Fe superoxide dismutases from Escherichia coli B (14) and Pseudomonas ovalis (15) and the Mn superoxide dismutase from Thermus thermophilus HB8 (16) have been reported. Taken together, these data suggest that Fe superoxide dismutases and Mn superoxide dismutases evolved from a common ancestor unrelated to that of copper/zinc superoxide dismutases. The crystal structures have demonstrated that Fe superoxide dismutases and Mn superoxide dismutases share a common polypeptide fold which is completely unlike that of Cu/Zn superoxide dismutase, and correlation of side chain shapes in the Mn superoxide dismutase electron density function with known sequences has identified the four protein ligands to the Mn(III) cofactor. The strong similarity in the polypeptide fold of Mn and Fe superoxide dismutases (17) and identities in their amino acid sequence alignments (18, 19) implied that ligands to the Fe(III) and Mn(III) cofactors would be chemically identical residues located in equivalent positions of the three-dimensional structure.
The iron superoxide dismutase gene from E. coli K12 has been cloned previously (20, 21). We have determined its nucleotide sequence and compared it with the reported gene sequence for E. coli K12 Mn superoxide dismutase (6). The protein sequence deduced from the cloned DNA is shown to be consistent with the sequence of tryptic fragments of the enzyme from E. coli B. The sequence has been used to complete the three-dimensional model of the enzyme and to interpret details of the 3.1-Å electron density map, particularly in the metal-ligand cluster and its environment. The results confirm the structural equivalence of the cofactor ligands in Fe and Mn dismutases and demonstrate that the ligand clusters are surrounded by remarkably similar chemical milieus. Comparisons of the protein sequence with other superoxide dismutases not only verify the homologies within the Fe and Mn superoxide dismutase family of enzymes but also locate variable regions in the structures.

MATERIALS AND METHODS
Bacterial Strains and Plasmids-Bacterial strain E. coli K12 71/18 (22) was used as host in the construction of the plasmid bank, and E. coli K12 QC774 (23) was used in complementation tests for superoxide dismutase activity. Plasmids pHS1-6 and pHS1-8, carrying the sodB gene, were used for DNA sequencing (Fig. 1). Their bacterial DNA inserts were obtained by partial deletion of the insert carried by pHS1-4 (20). pEMBL19 (24), a derivative of pUC19 (25) carrying the f1 origin of replication, was used as a vector for DNA sequencing.
Reagents and Enzymes-DNA restriction enzymes, T4 DNA ligase, DNA polymerase I, polynucleotide (26). Single strand DNA was prepared as described (22) using the M13 derivative, M13K07 (Pharmacia, Uppsala, Sweden) as superinfecting phage instead of f1; the DNA sequence was determined by the Sanger dideoxy chain termination method (27). Ambiguities in a GC-rich region were eliminated by using inosine instead of guanine in sequence reactions (28).
Determination of the Starting Point of the Transcript-The mRNA start was localized by primer extension (29). RNAs were prepared as described (29) from strain AB2463/pHS1-4 (20). DNA probes were labeled using the Maxam and Gilbert method (30), and reverse transcriptase DNA synthesis was performed according to Débarbouillé and Raibaud (29). Products of the reactions were analyzed on a 7% polyacrylamide gel.
Sequence Comparisons-Comparisons of primary structure were performed at the Centre Interuniversitaire de Traitement de l'Information (CITI 2, Paris) using the DIAGON program of Staden (31) which incorporates the MDM78 scoring system (32,33).
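The idea behind a diagonal (dot-matrix) comparison of this kind can be illustrated with a minimal Python sketch. It is not the DIAGON program: for simplicity it scores each window by counting identical residues rather than using the MDM78 matrix, and the function name, window size, threshold, and toy sequences are all hypothetical choices made for illustration.

```python
def dot_matrix(seq_a, seq_b, window=5, threshold=4):
    """Return (i, j) window start positions (0-based) at which at least
    `threshold` of the `window` aligned residues are identical.

    Identity scoring is an assumption of this sketch; a DIAGON-style
    comparison would sum substitution-matrix scores over the window.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            pairs = zip(seq_a[i:i + window], seq_b[j:j + window])
            score = sum(a == b for a, b in pairs)
            if score >= threshold:
                hits.append((i, j))
    return hits


# Toy sequences (not real dismutase sequences): seq_b is seq_a with a
# two-residue extension at its start, so the hits fall on one diagonal.
toy_a = "ACDEFGHIK"
toy_b = "XXACDEFGHIK"
diagonal_hits = dot_matrix(toy_a, toy_b, window=5, threshold=5)
```

Runs of hits along a single diagonal indicate a conserved segment; a shift from one diagonal to another marks the locus of an insertion or deletion.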
Purification and Sequence Determination of Peptides from Superoxide Dismutase of E. coli B-Using previously published methods, the Fe superoxide dismutase was purified from E. coli B (34), then reduced and S-carboxymethylated with [1-14C]iodoacetic acid, and digested with tosylphenylalanyl chloromethyl ketone-treated trypsin (35). Tryptic peptides were initially fractionated on Sephadex G-50SF in 0.1 N ammonia, 5% (v/v) 1-propanol, and then individual peaks were rechromatographed on Lichrosorb RP8, using a gradient of 0-60% 1-propanol in 0.1% (v/v) trifluoroacetic acid. Purified peptides were sequenced by a micro version of the manual Edman degradation (36) at the University of Michigan Protein Sequencing Facility.
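The tryptic peptides expected from a known sequence can be predicted with a short in silico digest. The Python sketch below assumes the commonly used rule that trypsin cleaves on the carboxyl side of lysine or arginine except when the next residue is proline; the function name and the toy sequence are hypothetical and are not taken from the dismutase sequence.

```python
def tryptic_peptides(protein):
    """Split a one-letter protein sequence after each K or R,
    except when the following residue is P (assumed cleavage rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # C-terminal peptide that does not end in K or R
        peptides.append(protein[start:])
    return peptides


# Toy sequence for illustration only; the R-P bond is left uncleaved.
toy_digest = tryptic_peptides("AAKGGRPLLKMM")
```

Predicted peptides of this kind can then be matched against experimentally sequenced fragments to locate them in a primary structure deduced from DNA.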
X-ray Analysis of Fe Superoxide Dismutase from E. coli B-The original Fourier map, calculated at 3.1 Å using multiple isomorphous replacement phases, was modified by averaging of the crystallographically independent subunits (14). Calculated phases derived by Fourier inversion of the solvent-leveled averaged map were combined (37) with starting phases, and the resulting map was again averaged and used for rebuilding of the Fe superoxide dismutase model on an Evans and Sutherland graphics display with the aid of FRODO (38). The building was guided by the locations of Cα atoms which had been positioned in the earlier map.
Fitting the sequence derived from E. coli K12 DNA to the density relied on recognition of characteristic residue shapes, particularly tyrosine or phenylalanine, tryptophan, histidine, proline, and leucine. For example, the series of aromatics in the sequence 100-118 provided unambiguous markers in this region. We were able to position the entire sequence by following the published trace (14) of the polypeptide chain. At several locations the side chain density was not extensive enough to accommodate the residues expected from the sequence. With the moderate resolution of the current map, we cannot discern whether these discrepancies represent real differences between the Fe superoxide dismutase of E. coli strains B and K12. However, the x-ray results do support the presence of the seven tryptophan residues deduced from the DNA sequence.

Nucleotide Sequence Determination of the Iron Superoxide Dismutase Gene (sodB) of E. coli K12-We have previously cloned the Fe superoxide dismutase gene in plasmid pHS1-4 (20). In the current study, the location of the coding region within the plasmid and its direction of transcription were determined prior to sequencing. The 5' end of the structural gene was deduced to lie about 600 bp¹ upstream of the EcoRI site (Fig. 1), given that the polypeptide chain is about 200 amino acid residues in length. This location was derived from restriction mapping of the sodB-kan fusion previously obtained by insertion of Mu transposons into the plasmid (23) and by analyzing plasmid subclones for complementation of a sodA sodB double mutant (for growth on minimal medium) and expression of Fe superoxide dismutase activity in a wild type strain, as previously described (20, 23, 39).
The sequence of 970 bp, which includes the structural gene and flanking regions, was determined (Fig. 3). The sequencing strategy is summarized in Fig. 2. A single open reading frame was found beginning at the ATG at nucleotide 177. Tandem termination codons appear at nucleotides 756 and 759. A putative Shine-Dalgarno sequence (40) was identified 7 bp upstream of the ATG codon, and a putative transcriptional terminator containing an inverted repeat preceding a stretch of T residues (41) was identified at nucleotides 887-908. Notably, only low overproduction of Fe superoxide dismutase was observed in strains harboring pHS1-6 (Fig. 1), in which this putative transcription termination signal is deleted, beginning at the ClaI site (nucleotides 818-823 in Fig. 3).
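The reading-frame coordinates quoted above can be cross-checked with simple arithmetic. The following minimal Python sketch assumes 1-based numbering in which each quoted coordinate is the first nucleotide of its codon; the function and variable names are illustrative and do not come from the original work.

```python
ATG_START = 177     # first nucleotide of the initiator ATG (from the text)
FIRST_STOP = 756    # first nucleotide of the first termination codon
SECOND_STOP = 759   # first nucleotide of the second termination codon


def orf_summary(atg_start, first_stop):
    """Count sense codons between the ATG and the first stop codon.

    Assumes both coordinates are 1-based and mark the first
    nucleotide of their codon.
    """
    span = first_stop - atg_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    codons = span // 3
    return {
        "sense_codons": codons,                     # includes initiator Met
        "last_coding_nt": first_stop - 1,
        "residues_without_init_met": codons - 1,
    }


summary = orf_summary(ATG_START, FIRST_STOP)
stops_are_tandem = (SECOND_STOP - FIRST_STOP) == 3
```

Under these assumptions the frame contains 193 sense codons, which corresponds to 192 residues once the initiator methionine is removed, in line with the 192-residue product stated in the abstract.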
The 5' end of the mRNA was identified as A-122 by primer extension mapping (Fig. 4). The corresponding promoter sequences at -10 (TAcccT) and -35 (TTGtCt) agree well with known consensus promoter sequences (42, 43) and suggest a rather strong promoter (homology score, 53.8%) as predicted by the rules established by Mulligan et al. (44). This is in good agreement with the high in vivo level of the protein and with the high level of neomycin phosphotransferase in the sodB-kan fusion.² The coding region of 576 bp predicts an amino acid sequence of 192 residues and a subunit molecular weight of 21,111 after methionine cleavage. The amino acid composition derived from this sequence is in good agreement, except for tryptophan, with that determined by hydrolysis of Fe superoxide dismutase from E. coli B (45). Furthermore, the amino-terminal 29 residues agree exactly with the amino-terminal
¹ The abbreviation used is: bp, base pair(s).
² A. Carlioz and D. Touati, unpublished observation.
Fig. 2. Sequencing strategy. The DNA sequence was determined using the Sanger dideoxy chain termination method on subfragments cloned in the plasmid pEMBL19 (see "Materials and Methods"). The Pst fragment is from pHS1-8 (see Fig. 1). The arrows indicate the restriction sites used as well as the direction and extent of the sequences. 0, indicates fragments obtained by Bal31 nuclease digests; other fragments were obtained by digestion to the relevant restriction sites. The nucleotide sequence has been completely read on both strands, each part of the sequence has been deduced from several independent experiments, and each restriction site used for sequencing was also sequenced from a distinct site. The abbreviations for restriction enzymes are: N, NdeI and as in Fig. 1.

51-57, 79-91, 81-91, 92-107, 108-116, and the first 6 residues of a peptide encompassing amino acids 30-43 (beyond residue 35, the sequence of this last peptide could not be established with certainty).
Homologies in Sequence and Structure-Pairwise comparisons of the primary structures of four Mn dismutases (10) and of the Mn sequences with the recently published sequence of the Fe enzyme from P. leiognathi (13) have shown substantial homologies, despite some variability in chain length and in residue composition. Using the alignment program of Staden (31), we have extended these comparisons to include the new sequence for Fe dismutase from E. coli. While it is evident from Fig. 5 that large sections of the sequences for Mn and Fe dismutases from E. coli are identical, the matching criteria used in Fig. 6 also allow definition of regions that retain sequence similarities and indicate the loci of insertions or deletions. The x-ray structures of Mn superoxide dismutase from T. thermophilus (16) and Fe superoxide dismutase from E. coli B (14) were aligned in order to compare structural variations with differences detected solely from sequences. With the aid of the structures, insertions or deletions can be located with respect to characteristic secondary structural features of the molecule. Fig. 7 ... Thus, the available three-dimensional structures support the notion that a region with conformational variability occurs around residue 50.
Comparisons of all known Fe and Mn superoxide dismutases at the level of primary sequence (Fig. 6) suggest the generality of genetic variations in the vicinity of residues 45-65, showing divergence in both length and composition in this region. Variability in chain length near positions 90 and 150 (Fe superoxide dismutase numbering) is also evident. Examination of the x-ray structures places the latter insertion/deletion in the crossover connection between the second and third strands of an otherwise conserved β-sheet; the crossover must be longer in E. coli Mn superoxide dismutase than in the Mn enzyme from T. thermophilus (see Fig. 7B, legend). The segment near residue 90 connects the two domains and has been thought to function as a hinge in the folding and unfolding of the subunit (5, 16) even though its length and composition vary from species to species. It is worth noting that Mn and Fe superoxide dismutases from E. coli do not cross-react with polyclonal antibodies (39); while differences in composition and surface charge, which are distributed all along the chain, may account in part for antigenic differences, it is likely that the variable regions also play roles in immunogenicity and antibody recognition.
The Metal-binding Site-The full three-dimensional model of the iron-binding site of E. coli dismutase, constructed as described under "Materials and Methods," is represented in Fig. 8A. The new sequence information establishes the identity of the residues which serve as metal ligands; these are His-26, His-73, Asp-156, and His-160 (13, 18). As can be seen by comparing panels A and B of Fig. 8, the three-dimensional similarities of the Fe and Mn proteins extend beyond the ligands to the next shell of residues that constitutes the metal-ligand environment. Almost every residue that penetrates the metal-ligand environment is conserved in the known sequences of Fe and Mn dismutases. The only exceptions found so far are at Fe superoxide dismutase positions 76 (tyrosine or phenylalanine), 69, and 141. In Fe superoxide dismutase residue 69 is glutamine and 141 is alanine, whereas the corresponding residues in Mn superoxide dismutase are glycine and glutamine. These differences represent the exchange of a glutamine from the first domain with one from the second. In both Fe and Mn enzymes, this glutamine functions structurally as a bridge between a tyrosine residue at 34 and a tryptophan at 122 (Fig. 8).
To some extent the stereochemical equivalence of the environments of the Fe and Mn ligand clusters is surprising, since a number of superoxide dismutases display in vitro metal binding selectivity. Early reconstitution studies with several bacterial Fe and Mn superoxide dismutases (47, 48), including those from E. coli (49), indicated that the apoenzymes were able to rebind several metals but were active only with the native metal. Although no direct evidence concerning the site of ligation of the inactive metal was provided, incorporation of an "incorrect" metal inhibited binding of the native metal. These results, therefore, seemed to anticipate structural differences in the metal-binding centers of Mn superoxide dismutase and Fe superoxide dismutase. Although the interdomain relocation of an active center glutamine residue is intriguing, it is not clear how either this exchange or the Tyr-Phe substitution near the second ligand could account for the observed selectivities. More recent reports have identified superoxide dismutases, from other organisms, which are reactivated from the apoenzyme by either Mn or Fe.
In view of the near identity of the metal-binding site in the known structures, it is noteworthy that purified Fe-protein from E. coli contains virtually no Mn while the purified Mn-protein from E. coli contains virtually no iron (35, 54, 55).
Moreover, the levels of the two proteins are responsive to the amounts of the respective metal ion provided in the culture medium (8, 56, 57). Knowledge of the molecular basis of this in vivo metal-binding selectivity may be important in understanding the biological behavior of these proteins.

and several other residues are truncated at Cβ for clarity. Residues 159B and 163B (thin bonds) from the adjoining subunit penetrate into the metal-ligand environment. The extensive system of imidazole and aromatic rings which partially encloses the metal-ligand cluster is maintained by hydrogen-bonding interactions and by a herringbone network of aromatic packing interactions (59). An additional interaction which stabilizes this system of rings is a bridge in which the side chain amide moiety of Gln69 links Tyr34 with Trp122. The electron density is consistent with ligation of a water molecule (not shown) to Fe at the same site where solvent is observed in the Mn superoxide dismutase structure. B, the active center of Mn superoxide dismutase from T. thermophilus. The residue assignments are in agreement with those of the known sequences (7-10). Except for residues 86 and 151, the amino acids are identical with those found in the Fe superoxide dismutase structure, represented in Fig. 8A, and the two active sites can almost be superimposed. The bridge formed by residue 151 is functionally like that made by residue 69 in the Fe superoxide dismutase structure, although the Cα atoms are not structurally equivalent. Coordinates used to create this figure are based on the crystallographic model which has been described previously (5, 17, 19) but which now has been partially refined at 1.8-Å resolution using data collected at the Multiwire Detector Facility, University of California, San Diego. Difference density maps calculated at this resolution indicated a missing residue following position 124. The present numbering reflects the insertion of this residue.