Isolation, Characterization, and Expression in Escherichia coli of the DNA Polymerase Gene from Thermus aquaticus

The thermostable properties of the DNA polymerase activity from Thermus aquaticus (Taq) have contributed greatly to the yield, specificity, automation, and utility of the polymerase chain reaction method for amplifying DNA. We report the cloning and expression of Taq DNA polymerase in Escherichia coli. From a λgt11:Taq library we identified a Taq DNA fragment encoding an epitope of Taq DNA polymerase via antibody probing. The fusion protein from the λgt11:Taq candidate selected an antibody from an anti-Taq polymerase polyclonal antiserum which reacted with Taq polymerase on Western blots. We used the λgt11 clone to identify Taq polymerase clones from a λCh35:Taq library. The complete Taq DNA polymerase gene has 2499 base pairs. The predicted 832-amino acid sequence of Taq DNA polymerase shows significant similarity to E. coli DNA polymerase I. We subcloned and expressed appropriate portions of the insert from a λCh35 library candidate to yield thermostable, active, truncated, or full-length forms of the protein in E. coli under control of the lac promoter.

To whom correspondence and reprint requests should be addressed: 1400 53rd St., Emeryville, CA 94608.
Taq Pol I only at the beginning of the PCR reaction rather than before each round of amplification.
A 62-63-kDa Taq Pol I has been purified from T. aquaticus, but growing the organism is more difficult than growing E. coli and polymerase yields are low (4, 5). We have developed an alternative purification protocol yielding a 94-kDa enzyme with 10-20 times higher specific activity than that previously reported. While the activity yield is quite high (40-60%), the initial expression level of Taq DNA polymerase in the native host is quite low (0.01-0.02% of total protein). Therefore, we sought to clone the Taq Pol I gene and express the gene in E. coli. In addition, the availability of the enzyme and the DNA sequence of the Taq DNA polymerase gene will facilitate the study of structure/function relationships and permit detailed comparisons with mesophilic DNA polymerases.
with only the candidate containing that insert. Subsequent DNA sequencing of two 115-bp EcoRI inserts, one each from the 12-mer and 10-mer libraries, confirmed that they were identical sequences. DNA sequence analysis of Taq and flanking lacZ DNA for the candidate from the 12-mer library indicated the presence of one EcoRI linker at its 5' lacZ junction. DNA sequence analysis of the Taq and flanking lacZ DNA for the 115-bp candidate from the 10-mer library indicated the presence of three EcoRI linkers at the 5' lacZ junction, which resulted in the same frame with respect to β-galactosidase as that of the 12-mer linker candidate. Thus, we picked DNA fragments encoding the same epitope from two libraries.
Lysogens were made of all the candidates in strain Y1089 and were induced with isopropyl-1-thio-β-D-galactopyranoside (IPTG). Total proteins from crude lysates of induced cultures were run on SDS-PAGE gels, and Western blots were prepared by using the anti-Taq Pol I antibody for detection. All of the clones made IPTG-inducible, lacZ-fusion proteins which reacted with the anti-Taq Pol I antibody (data not shown).
One clone each from the 115-, 125-, 160-, and 175-bp insert size classes was chosen for epitope selection. This method uses crude extracts of candidate clones to select antibodies from a polyclonal antiserum. These affinity-selected antibodies were used to probe Western blots of Taq Pol I. The results are shown in Fig. 1. In two experiments candidate λgt11 1, the 115-bp insert candidate, was the only one of the four tested which successfully bound antibody that reacted with purified Taq Pol I and reacted uniquely with Taq Pol I in crude extracts. The other three candidates, which had been identified and purified with the anti-Taq Pol I antibody, failed to "fish" from that same polyclonal antibody an antibody that would react with Taq Pol I on a Western blot. A close inspection of the Western blot indicates a faint cross-reaction with 28-30-kDa proteins in total soluble Thermus crude extracts. The DNA sequences of these three candidates do not correspond to any part of the Taq Pol I DNA sequence (Fig. 2).
λCh35 Libraries-The 115-bp EcoRI fragment from clone λgt11 1 was subcloned into Genescribe Z vector pTZ19R to use as a probe in screening the λCh35:Taq library. Construction of the partial Sau3A digest library of Taq DNA in λCh35 and screening of the library are detailed under "Materials and Methods," in the Miniprint. The in vitro packaged library was plated initially on E. coli strain K802. That strain was chosen to avoid the possibility of degradation of Taq insert DNA by the mcrA or mcrB restriction systems (6). The amplified library was subsequently plated on E. coli strain MC1000. Nine candidates were isolated and purified from the λCh35:Taq library. From restriction analysis of mini DNA preparations, none of the candidates proved to be identical, though they all shared some common restriction fragments. Upon Southern blotting, the pTZ19R1 probe hybridized to a common 4.2-kb BamHI fragment and a common 6.5-kb PstI fragment in all the candidates, consistent with the hybridization seen in Southern blots of Taq genomic DNA (Fig. 3). For HindIII, the probe hybridized to fragments of different sizes, ranging in size from 5.6 to 10 kb. In addition, all nine candidates shared a common 4.5-kb HindIII fragment.

[Figure legend fragment: anti-Taq Pol I antibodies affinity purified with extracts of induced λgt11 clones 1, 3, 9, and 2-11, respectively.]
One candidate, designated 44-2, had a probe-hybridizing HindIII fragment of approximately 8 kb which corresponded to the HindIII fragment that hybridized with probe 1 in the Taq genomic Southern (Fig. 3). We chose this candidate for further study and subcloned each of its four detectable HindIII fragments (A = 8 kb, B = 4.5 kb, C = 0.8 kb, and D = 0.5 kb) into vector BSM13+ in both orientations, transforming into host DG98. The two subclones of fragment A in both orientations, pFC82.35 and pFC82.2, were IPTG-induced and extracts were assayed for Taq Pol I activity (Table I). Subclone pFC82.35 had IPTG-inducible thermostable activity at a very low level, which was detectable because of the high sensitivity of the assay (1 molecule/10 cell equivalents). In contrast, pFC82.2 had a significantly lower basal level of Taq Pol I activity which was attenuated in extracts of IPTG-grown cultures.
A restriction map of the A fragment was generated and is shown in Fig. 4. Southern analysis showed that the λgt11 1 probe hybridized at one end of the A fragment. Indeed, the DNA sequence of the AluI genomic fragment isolated in λgt11 1 corresponds to nucleotides 619-720 in the Taq Pol I gene (Fig. 2). Further, the EcoRI-adapted AluI site at the junction between E. coli lacZ and Taq in λgt11 1 corresponds to the lac promoter-proximal Taq HindIII site in pFC82.35.
Deletions in the A Fragment to Localize the Taq Pol Gene-Two different deletions were made in the A fragment in pFC82.35 to aid in localizing the gene. In pFC84, approximately 2.4 kb of the right end of the A fragment was deleted from the SphI site (Fig. 4) rightward to the SphI site in the vector polylinker. In pFC85, approximately 5.2 kb of the right end of the A fragment was deleted from the Asp718 site rightward (Fig. 4) to the Asp718 site in the vector polylinker, leaving 2.8 kb of Taq insert sequence. The activity of Taq Pol I was assayed in extracts of uninduced and IPTG-induced pFC84 and pFC85 in DG101. As can be seen in Table I, deleting 3' sequences in the A fragment had a dramatic effect on the IPTG-inducible expression of Taq Pol I. In addition, while we were unable to detect Taq Pol I in Western blots of IPTG-induced pFC82.35/DG98, induced immunoreactive bands were clearly seen upon Western blotting of IPTG-induced pFC84/DG101 and pFC85/DG101 (Fig. 5). In the Western blots, induced pFC84/DG101 and pFC85/DG101 lanes revealed doublet immunoreactive bands of approximately 65 and 63 kDa. These immunoreactive species were considerably smaller than full-length 94-kDa Taq Pol I. We determined that the doublet bands were not artifacts of the gel analysis because they were seen repeatedly in several experiments.
LacZα Fusions-To define further the locus of the Taq Pol I gene and to confirm the reading frame at different sites for use as guideposts during DNA sequence analysis, we constructed several fusions of the left end of the Taq HindIII A fragment to lacZα in the BSM13+ vector. These fusions are described under "Materials and Methods" and are summarized in Table II. Using these fusions we determined the reading frame of Taq Pol I at the NheI site at nucleotide 2043, the BamHI site at nucleotide 1780, and at four locations at or leftward of the XhoI site at nucleotide 1408.

[Fig. 2. DNA sequence and predicted amino acid sequence of Taq Pol I, with restriction sites (e.g. XhoI) marked.]

Table footnotes: A background of 0.004% input counts has been subtracted. Extract protein corresponding to 3 × 10' cells was assayed. Purified Taq DNA polymerase was added to a replicate cell pellet at time of lysis. The assay contained 4 × 10' molecules of Taq Pol I. Purified Taq Pol I, corresponding to 4 × 10^7 molecules, was admixed with the BSM13+ extract at time of assay. A background of 0.002% input counts has been subtracted. BSM13+ specific activity represents two times background.
Assembling the Full-length Taq Pol I Gene-As described above, the SphI and Asp718 deletants, pFC84 and pFC85, produced thermostable polymerase activity upon induction. However, the induced bands detected by anti-Taq Pol I antibody in Western blots were smaller than full-length Taq Pol I, i.e. approximately 65 kDa as opposed to 94 kDa for the full-length enzyme. Thus, we inferred that the A fragment lacked the 5' portion of the gene, which would encode the N terminus.
As mentioned earlier, all candidates from the λCh35 library which had been identified with the pTZ19R 1 probe shared a common, approximately 4.5-kb HindIII fragment which did not hybridize to the probe. This fragment, the B fragment, was subcloned into BSM13+, yielding plasmid pFC83. The restriction map of the B fragment was determined (Fig. 4). By comparing those mapping results and the A fragment map with the results of Taq genomic Southern blots probed with probe 1 (Fig. 3) we deduced that HindIII fragment B was likely to contain the 5' portion of the Taq Pol I gene.

(7) and may comprise a portion of the ribosome binding site for initiation of translation at the first ATG.

DISCUSSION
Several groups have reported the cloning and expression in E. coli of genes from thermophiles: malate dehydrogenase (mdh) from Thermus flavus (8), β-isopropylmalate dehydrogenase (leuB) from Thermus thermophilus (9), and the TaqI restriction-modification system from Thermus aquaticus (10). Iijima et al. (8) selected the mdh gene from a T. flavus partial HindIII library in pBR322 by screening crude extracts of pools of independent library transformants at 60 °C for malate dehydrogenase activity. Nagahari et al. (9) selected directly for expression of the leuB gene in E. coli. Although the activity of the enzyme at 37 °C was quite low compared to its activity at 75-80 °C, they were able to recover clones which complemented a leuB mutation in the E. coli host. Slatko et al. (10) also selected directly for expression of TaqI methylase in Taq:pBR322 libraries. However, TaqI endonuclease appeared not to be active at 37 °C in E. coli, since clones with only the restriction gene were viable in the absence of modification.
Several groups have also reported cloning and expression of DNA polymerases in E. coli. Kelley et al. (11) cloned the structural gene for DNA polymerase I (Pol I) from E. coli in λ bacteriophage. They observed polymerase activity in the transducing phage at a level of approximately 4% of total cell protein. However, they were unable to maintain a plasmid harboring the polA+ gene, probably because overproduction of Pol I in E. coli is lethal to the cell. More recently, T4 DNA polymerase has been cloned and expressed in E. coli (12). In this case, it was necessary to clone the gene under control of inducible promoters such that constitutive expression of the gene would be minimal. Attempts to clone the gene under control of its own promoter in E. coli were unsuccessful, probably because of the detrimental effect the polymerase had on the cell. We did not know if Taq Pol I would be toxic to E. coli cells at 37 °C. While the in vitro specific activity of Taq Pol I at 37 °C is only a few percent of the specific activity at 75 °C, we could not predict if the DNA binding activity of the enzyme might interfere with normal cell function. To avoid potential problems related to direct expression of the gene in E. coli we chose to clone an epitope of the Taq Pol I gene by using λgt11 libraries and antibody selection. The epitope-expressing clone was subsequently used to select the entire Taq Pol I gene from a library in λCh35.
We were unable to detect a thermostable polymerase activity in cells infected (11) with any of the λCh35 clones, including 44-2. The polymerase assay is extremely sensitive and can detect 1 molecule of polymerase per 10 cell equivalents. Upon subcloning of the 8-kb probe-hybridizing HindIII A fragment from 44-2 into BSM13+ and IPTG induction of the subclone pFC82.35, a low level of thermostable polymerase activity was detected (Table I). Based on the activity of purified Taq Pol I when admixed with E. coli cells, this activity represents two to three molecules of Taq Pol I per cell equivalent. The gene was localized to one end of the 8.0-kb HindIII A fragment by using deletion analysis. Upon IPTG induction, pFC84, the SphI deletion, and pFC85, the Asp718 deletion, yielded a 100-fold increase in Taq Pol I activity (Table I) compared to that of the full-length A fragment subclone, pFC82.35. This increase in activity allowed for ready detection of the induced protein(s) on Western blot (Fig. 5). The A fragment induced proteins were truncated with an apparent molecular mass of 63-65 kDa.
Fusing the 5' HindIII site in the A fragment with the HindIII site in BSM13+ causes the Taq Pol I gene to be out of frame with respect to β-galactosidase. The reading phase at the HindIII site in BSM13+ with respect to β-galactosidase is A AGC TT, a frame of "0" (13). The reading frame of Taq Pol I at the HindIII site is AAG CTT ("plus 1"). The fusion gives rise to a minus 1 frame shift. In the β-galactosidase reading frame, there is a TGA stop codon at nucleotide 1478 of Taq Pol I. Downstream of this TGA there are several possibilities for restarts which could result in truncated forms of Taq Pol I: ATGs at nucleotides 1509 and 1752 and GTGs at nucleotides 1547, 1569, 1722, and 1731. In fact, we see a doublet in induced lanes of both pFC84 and pFC85 on Western blots (Fig. 5) indicating at least two reinitiation sites. All but one of the likely sites, the ATG at nucleotide 1509, would probably require a ribosome binding site for reinitiation. There are reasonable ribosome binding sites for the GTG at nucleotide 1722 and for the ATG at nucleotide 1752. Translation initiating at these sites would yield proteins of 59 and 58 kDa, respectively. However, the apparent molecular masses of the doublet bands seen on Western blots of pFC84 and pFC85 are approximately 65 and 63 kDa, based on comparison of the mobilities of the doublet bands with the molecular weight size standards. Whether the result of reinitiation or proteolytic processing, the thermostable, enzymatically active, truncated forms of Taq Pol I directed by plasmids pFC84 and pFC85 (Table I) suggest that significant portions of the Taq Pol I sequence are not essential for DNA polymerase activity.
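The kind of scan described above (locate the first stop codon in the shifted reading frame, then list ATG and GTG codons downstream of it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequence is a short hypothetical string, not the Taq Pol I sequence; the function names are introduced here; and positions are 0-based string indices rather than the nucleotide numbering of Fig. 2.

```python
# Illustrative sketch only: TOY_SEQ is a hypothetical sequence, not Taq Pol I.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # candidate (re)initiation codons considered in the text


def first_stop(seq, frame):
    """Return the 0-based index of the first stop codon read in `frame`, or None."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def restart_candidates(seq, after):
    """List (index, codon) for every ATG or GTG starting at or after `after`."""
    return [
        (i, seq[i:i + 3])
        for i in range(after, len(seq) - 2)
        if seq[i:i + 3] in START_CODONS
    ]


# Frame 0 plays the role of the native reading frame (no stop codon);
# frame 1 plays the role of the shifted, fused frame (hits TGA early).
TOY_SEQ = "AAGCTTCTGACCATGGCGGTGCTC"

if __name__ == "__main__":
    stop = first_stop(TOY_SEQ, frame=1)
    print("first stop in shifted frame at index", stop)
    for index, codon in restart_candidates(TOY_SEQ, after=stop + 3):
        # index % 3 == 0 means the candidate lies in the native frame
        print(index, codon, "native frame" if index % 3 == 0 else "other frame")
```

Whether any such candidate is actually used would still depend on context such as a nearby ribosome binding site, which this sketch does not model.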
The purpose of the set of fusions of 5' portions of the Taq Pol I A fragment with lacZα in BSM13+ was to confirm or determine the reading phase of the Taq Pol I gene internally as an aid to nucleotide sequencing. Since we knew the reading phase of lacZ in the BSM13+ polylinker, we could infer the reading phase of Taq Pol I in α-complementing in-frame fusions. DG98 harboring fusions which were in-frame were readily detectable as blue colonies on X-Gal indicator plates. We generated a series of fusions (Table II) at nine sites between nucleotides 962 and 1782 of the Taq Pol I gene.
We compared the DNA sequence of Taq Pol I with that of E. coli DNA polymerase I. At the DNA level, the two genes lack any significant regions of homology (Table III). In regions where the amino acid sequences are homologous, the DNA sequences diverge, especially in third positions of codons. The longest stretch of DNA sequence identity is 19 bases (Table III). The predicted amino acid sequence of Taq Pol I is shown in Fig. 2. From this a codon bias table was generated (Table IV). There is a heavy bias toward G and C in the third position (91.8% C and G) as would be expected for GC-rich organisms (14) and as others have observed for other Thermus genes: 95% C and G for the gk24 gene encoding L-lactate dehydrogenase of Thermus caldophilus (15), 94.8% for mdh from T. flavus (14), and 89% for leuB from T. thermophilus (16).
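A third-position G+C percentage of the kind quoted above can be computed directly from a coding sequence. The Python sketch below is an illustration, not the procedure used to build Table IV: the function name and the toy sequence are assumptions introduced here, and every codon in the input (including any stop codon) is counted.

```python
def gc3_percent(cds):
    """Percentage of codons whose third base is G or C.

    `cds` must be an in-frame coding sequence whose length is a
    non-zero multiple of 3; all codons present are counted.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at index 2
    gc = sum(1 for base in third_positions if base in "GC")
    return 100.0 * gc / len(third_positions)


# Hypothetical five-codon example: ATG GCC CTG GAG TAA
TOY_CDS = "ATGGCCCTGGAGTAA"

if __name__ == "__main__":
    print(f"GC3 = {gc3_percent(TOY_CDS):.1f}%")
```

Applied to a full-length gene, the same function would give the figure to compare against the percentages cited for other Thermus genes.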
Significant amino acid sequence similarity exists between Taq Pol I, E. coli Pol I, and bacteriophage T7 DNA polymerase. One possible sequence alignment yields 38% identity between the Taq Pol I and E. coli Pol I amino acid sequences (Fig. 7). There are two major regions of Taq Pol I and one region of T7 DNA polymerase that show extensive sequence similarity compared to E. coli Pol I. The first region of Taq Pol I extends from the N terminus to approximately residue 300. The second region extends from approximately residue 410 to the C terminus of Taq Pol I. The N-terminal region of Taq Pol I corresponds to the N-terminal domain of E. coli Pol I shown to contain the 5'-3' exonuclease activity (17). The C-terminal regions of Taq Pol I and T7 DNA polymerase correspond to the E. coli Pol I domain shown to contain DNA polymerase activity (18). The x-ray structure of the Klenow fragment (19) shows that this domain contains a deep cleft believed to be responsible for DNA binding.
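For readers who want to see how a percent-identity figure is obtained from a pairwise alignment, a minimal Python sketch follows. It rests on assumptions: the text does not state which denominator was used for the 38% value, so this version counts only columns in which neither sequence has a gap, and the two aligned strings are short hypothetical examples rather than the Fig. 7 alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Other conventions (for example dividing by the length of the shorter
    sequence or by the full alignment length) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (not taken from Fig. 7)
TOY_A = "MLPLFEPKG"
TOY_B = "MLPIFE-KG"

if __name__ == "__main__":
    print(f"{percent_identity(TOY_A, TOY_B):.1f}% identity")
```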
Apparently as a result of many mutations, deletions, insertions, etc. during evolution, Taq Pol I residues at positions 300-410 show little sequence similarity compared to E. coli Pol I. Taq Pol I is 96 residues shorter than E. coli Pol I; most of the deleted residues occur in the region encompassing residues 300-410. Ollis et al. (19) and Derbyshire et al. (20) have shown that E. coli Pol I residues Asp-355, Glu-357, Leu-361, Asp-424, Phe-473, and Asp-501 are involved in binding of divalent cation and deoxynucleoside monophosphate. A fragment of E. coli Pol I that contains only residues 515-928 is devoid of 3'-5' exonuclease activity, but still retains polymerase activity (18). Presumably, the E. coli Pol I region comprised of residues 324-515 forms at least part, if not all, of the 3'-5' exonuclease activity. Taq Pol I and E. coli Pol I display little sequence similarity in the presumptive 3'-5' exonuclease region. Of the E. coli Pol I residues shown to be involved in cation and deoxynucleoside monophosphate binding, the sequence alignment of Fig. 7 shows only Asp-424 as having an exact homolog in the Taq Pol I sequence. Although other high scoring sequence alignments are possible in the Taq Pol I 300-410 region, it is possible that the Taq Pol I gene has undergone key mutations, deletions, or insertions

Sequence homology between E. coli Pol I and T7 DNA polymerase has been previously noted. Those T7 DNA polymerase sequences shown by Ollis et al. (21) to be conserved between that enzyme and E. coli Pol I are also present in the Taq Pol I amino acid sequence (Fig. 7). Analyses of the effects of various mutations in the E. coli Pol I gene upon enzymatic activity have also been used to define amino acid residues important for polymerase activity. For example, a Gly to Arg mutation at position 850 (polA5) results in a polymerase that is less processive on the DNA substrate (32).
An Arg to His mutation at position 690 (polA6) results in a polymerase that is defective in DNA binding (33).

As would be expected for an enzyme from a thermophilic organism, Taq Pol I is considerably more thermostable than Pol I from E. coli (data to be presented in a later publication). Although a better assessment of an enzyme's thermostability would result from a complete cataloging of all stabilizing amino acid interactions, in the absence of high resolution x-ray crystal structures, many researchers have attempted to explain enzyme thermostability by an analysis of amino acid content (35-37). Several features of thermostable enzymes have been noted in such studies. Among those features are increased ratios of Arg to Lys residues, Glu to Asp residues, Ala to Gly residues, Thr to Ser residues, and a reduced Cys content. Comparing Taq Pol I to E. coli Pol I, the Ala to Gly and Thr to Ser ratios are smaller for Taq Pol I than for E. coli Pol I. Of the thermostabilizing type amino acid alterations that hold true, it is particularly notable that the Arg to Lys ratio for Taq Pol I is nearly twice that for E. coli Pol I. It is possible that the propensity of thermophilic proteins to contain Arg rather than Lys residues is simply a reflection of the high GC content of thermophilic organisms. The structural gene for Taq Pol I contains 67.9% GC compared to a 52.0% GC content for E. coli Pol I. The six Arg codons are rich in G and C (13 out of 18 bases are G or C) compared to the two Lys codons (1 out of 6 bases is a G). This explanation for amino acid preferences in proteins from thermophilic organisms cannot be the basis for Glu versus Asp, Thr versus Ser, or Ala versus Gly preference, because there are equal ratios of GC versus AT in the codons for those pairs of amino acids.
A more likely explanation for the preference for Arg over Lys in thermostable proteins would seem to be based on the unique physical-chemical properties of the two amino acids (e.g. pKa values, hydrogen bonding patterns, hydrophobicity/hydrophilicity).
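The codon arithmetic behind the GC-content argument above (13 of 18 bases for Arg, 1 of 6 for Lys, and equal G+C fractions within the Glu/Asp, Thr/Ser, and Ala/Gly pairs) can be checked with a short Python sketch. The codon lists are those of the standard genetic code written as DNA; the dictionary and function names are assumptions introduced here for illustration.

```python
from fractions import Fraction

# Standard genetic code codons (DNA alphabet) for the amino acids discussed.
CODONS = {
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Asp": ["GAT", "GAC"],
    "Thr": ["ACT", "ACC", "ACA", "ACG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def gc_count(amino_acid):
    """Return (number of G or C bases, total bases) over all codons of an amino acid."""
    bases = "".join(CODONS[amino_acid])
    return sum(1 for b in bases if b in "GC"), len(bases)


def gc_fraction(amino_acid):
    """G+C fraction over all codons of an amino acid, as an exact fraction."""
    gc, total = gc_count(amino_acid)
    return Fraction(gc, total)


if __name__ == "__main__":
    for pair in [("Arg", "Lys"), ("Glu", "Asp"), ("Thr", "Ser"), ("Ala", "Gly")]:
        summary = ", ".join(
            f"{aa}: {gc_count(aa)[0]}/{gc_count(aa)[1]}" for aa in pair
        )
        print(summary)
```

Only the Arg/Lys pair differs in codon G+C fraction; the other three pairs are equal, which is the basis for the statement that GC content alone cannot account for those preferences.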
The truncated and full-length Taq Pol I enzymes produced upon IPTG induction show different reactivities to the anti-Taq Pol I antibody. For Western blots (Fig. 5), the immunoreactive band in the lane of induced pLSG1 is more readily detectable than induced pFC84 or pFC85, the SphI and Asp718 A fragment deletions. In fact, we loaded three times as much of the pFC84 and pFC85 extracts compared to pLSG1, and the resulting pLSG1 immunoreactive band is still more intense. We infer that there are more epitopes for our antibody, prepared from full-length (94-kDa) Taq Pol I SDS-PAGE gel slice, in the N-terminal end of Taq Pol I than in the C-terminal two thirds of the protein. Or, based on activity, there is at least a &fold difference in reactivity with the antibody of the truncated versus the full-length form of the enzyme.

These advantages will aid in further study of the enzyme and will provide a ready source of Taq Pol I for use in PCR and other biochemical procedures in which Taq Pol I might prove useful, such as in DNA sequencing.