A common sequence motif, -E-G-Y-A-T-A-, identified within the primase domains of plasmid-encoded I- and P-type DNA primases and the alpha protein of the Escherichia coli satellite phage P4.

DNA primases encoded by the conjugative plasmids ColIb-P9 (IncI1), RP4, and R751 (IncP), and the alpha protein of the Escherichia coli satellite phage P4 were shown to contain a common amino acid sequence motif, -E-G-Y-A-T-A-. The P4 alpha gene product, required for initiation of phage DNA replication, exhibits primase activity on single-stranded circular DNA templates. This priming activity resembles the enzymatic activity of DNA primases encoded by conjugative plasmids in terms of template utilization and the ability to synthesize primers that can be elongated by DNA polymerase III holoenzyme. The -E-G-Y-A-T-A- motif is part of an extended sequence region most conserved within the primase domains of the four enzymes. Single amino acid substitutions generated in the -E-G-Y-A-T-A- motif of the RP4 TraC2 and the P4 alpha protein affect priming activity, supporting the hypothesis that the conserved sequence motif is part of the active center for primase function. A mutation that eliminates priming activity causes P4 phage to grow poorly and to depend upon the host dnaG primase. Computer analysis identified two additional sequence motifs within the amino acid sequence of the P4 alpha protein: a potential zinc-finger motif and a "type A" nucleotide binding site, both strikingly similar to sequence motifs described in various DNA primases and helicases.


* This work was supported by Sonderforschungsbereich Grant 344/B2 of the Deutsche Forschungsgemeinschaft (to E. L.) and National Institutes of Health Research Grant AI 08722 (to R. C.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M81782.
§ To whom correspondence should be addressed. Tel.: 30-8307-242; Fax: 30-8307-383.

A variety of conjugative plasmids of several different incompatibility groups encode DNA primases. DNA primases of the self-transmissible plasmids RP4 (IncPα), R751 (IncPβ), and ColIb-P9 (IncI1) are known to be multifunctional proteins and part of the DNA transfer systems. Based on in vivo and in vitro assays the priming activity of these enzymes was exploited (for review see Willetts and Wilkins (1984), Guiney and Lanka (1989), Wilkins and Lanka (1992)). These primases are known to share common biochemical properties such as template utilization as well as composition and size of oligoribonucleotides synthesized in vitro. Suppression of Escherichia coli dnaG mutants in the presence of plasmid-specified primases demonstrates that these enzymes have the potential to mediate lagging strand synthesis on the host chromosome by replacing the DnaG protein of the host-specified priming system. In addition to the priming function, the plasmid-specified polypeptides escort the transferred plasmid strand during conjugative passage through the cell envelope to the recipient bacterium (Wilkins, 1989, 1990). Priming and "DNA transport" functions are located in distinct domains of the protein, and the primase domain can be separated to function independently in vitro. Primase loci specify at least two polypeptides having common C termini, the result of translational initiation at tandem in-frame start codons (Lanka et al., 1984; Miele et al., 1991; Wilkins et al., 1981; Boulnois et al., 1982).
Despite genetic and enzymatic similarities of RP4 and ColIb-P9 primases, specificity of the proteins for the homologous conjugation system has been demonstrated (Merryweather et al., 1986; Yakobson et al., 1990). It is unknown which, if any, of the identified functions of primases governs this plasmid specificity. However, priming activity of these large polypeptides is likely to be dispensable for conjugation in intraspecific E. coli matings (Chatfield et al., 1982; Merryweather et al., 1986), but it has been proposed to increase the efficiency of the DNA transfer process and/or to play a role in host range of the conjugative transfer (Lanka and Barth, 1981; Nash and Krishnapillai, 1988).
RP4 and R751 are close relatives and their primase genes and products are homologous (Miele et al., 1991; Lanka et al., 1985). However, a DNA fragment of ColIb-P9 containing the sog gene (suppression of dnaG) (Wilkins et al., 1981) did not hybridize with RP4 DNA, and immunological cross-reaction of I- and P-type primase polypeptides could not be detected (Lanka and Barth, 1981), indicating a more distant evolutionary relationship of the IncI1 and IncP priming proteins.
In general, prokaryotic primer-generating strategies are biochemically and genetically diverse (Kornberg, 1980, 1982; Nossal, 1983). Single-stranded DNA phages utilize host-encoded proteins for initiation of complementary strand synthesis, whereas phages T4 and T7 are independent from the host because they specify their own primer-generating enzymes. Vegetative plasmid replication normally depends on host enzymes. An exception is the small broad host range plasmid RSF1010 (IncQ) that encodes its own DNA primase (Scherzinger et al., 1991). In none of these cases is there obvious sequence relationship, not even between plasmid-encoded primases acting in conjugation and the E. [...] proteins (Miele et al., 1991). The α protein was shown to possess an RNA-polymerizing activity (Barrett et al., 1972) and is thought to be the only phage-specified protein required for P4 DNA replication in vivo and in vitro (Bowden et al., 1975; Krevolin and Calendar, 1985).

[TABLE I, listing the bacterial strains and phages with their sources, is not reproduced here.]

We have carried out experiments to facilitate evaluation of functional, structural, and sequence similarities among primases. We have determined the nucleotide sequence of the plasmid ColIb-P9 primase gene, sog, and analyzed its translation initiation sites by amino acid sequencing of Sog polypeptides. Analysis of sog deletion mutants served to define the minimal region required for Sog's priming activity. We identified sequence motif(s) present in Sog and in TraC of RP4 and R751, as well as in the P4 α protein. An in vitro primase assay verified that P4 α indeed possesses priming activity as proposed previously (Krevolin and Calendar, 1985).
DNA Techniques-Routine molecular cloning techniques were performed as described (Sambrook et al., 1989). M13 subclones were sequenced using the chain termination method (Sanger et al., 1977). Experimental conditions for the sequencing reaction were as described (Ziegelin et al., 1991). Nucleotide sequences were analyzed using version 5 of the UWGCG software package (University of Wisconsin Genetics Computer Group, Madison, WI; Devereux et al. (1984)). Site-specific mutagenesis was performed as described by Sayers et al. (1988) using synthetic oligonucleotides. The traC2 gene of RP4 and the α gene of phage P4 were isolated from the plasmids pFG101-1A1 (Miele et al., 1991) and pMS4A1, respectively, and inserted into the multicloning site of phage M13 mp18. Following mutagenesis the base exchanges introduced were confirmed by nucleotide sequencing. Subsequently, appropriate restriction fragments of M13 mutant RF DNA were reinserted into the corresponding expression vector. The mutant RP4 traC2 genes were isolated as EcoRI/HindIII fragments and inserted into the corresponding restriction sites of pJF119EH. In the case of the P4 α gene a BssHII/MscI fragment of the M13 mutant RF DNA was isolated and inserted into the AscI/MscI restriction sites of pMS4A1.
Primase Assay-DNA primase activity was assayed in vitro in the presence of rifampicin. Viral fd DNA was used as template in an extract of E. coli dnaB dnaC mutant strain BC1304 (Lanka and Barth, 1981; Lanka et al., 1979).

Nucleotide Sequence of the 5'-Region of the ColIb-P9 sog Gene-The conjugative plasmid ColIb-P9 encodes two Sog polypeptides with apparent molecular masses of 210 and 160 kDa, only the larger of which exhibits primase activity. A spontaneous deletion within the sog structural gene, resulting in the recombinant plasmid pLG214 (Wilkins et al., 1981), revealed the domain for priming activity to be located within a 1.4-kb fragment extending from the EcoRI restriction site (ColIb-P9 coordinate 57.73 kb, Rees et al., 1987) in counter-clockwise direction with respect to the ColIb-P9 map. Based on this result, a 1.6-kb EcoRI/PstI fragment (coordinates 57.73-56.10 kb) was inserted into M13 derivatives mp18 and mp19 in order to determine the nucleotide sequence of the ColIb-P9 primase domain. Subclones of the 1.6-kb fragment, obtained by molecular cloning of suitable restriction fragments, were subjected to dideoxy chain-termination sequencing. The nucleotide sequence on both strands has been determined.
TABLE II. Plasmids used in this study. Nucleotide positions (nt) given are in accordance with Fig. 1 for plasmids containing ColIb-P9 DNA, to Halling et al. (1990) for plasmids containing P4 DNA, and to Miele et al. (1991) for pFG101-1A1. The kilobase pair coordinates were taken from the ColIb-P9 standard map (Rees et al., 1987).

FIG. 1. Nucleotide sequence of the ColIb-P9 sog region and deduced amino acid sequence of the N-terminal portion of the sog gene products. The sequence shown corresponds to kilobase pair coordinates 57.73 and 56.10 of ColIb-P9 (Rees et al., 1987). Relevant restriction sites and Shine-Dalgarno sequences (SD) are marked. The amino acid sequence of the Sog proteins is given below the nucleotide sequence. N-terminal amino acids of Sog210 and Sog160 confirmed by protein sequencing are underlined. The arrows at position 139-164 indicate the location of a potential transcriptional terminator. The promoter proximal extreme point of a 1.5-kb fragment, spontaneously deleted from the sog+ recombinant plasmid pLG215 to give pLG214 (Wilkins et al., 1981), is indicated (nt 1452).

The GC content of the 5'-portion of the sog coding region (Fig. 1) is 56%. This is significantly lower than the 66% GC calculated for the IncP primase regions (Miele et al., 1991). Examination of the nucleotide sequence revealed a typical GC-rich segment of dyad symmetry (nt 139-165, Fig. 1) followed by a short stretch of thymidine residues, indicative of a rho-independent transcription terminator (Brendel and Trifonov, 1984; d'Aubenton et al., 1990). This potential transcriptional terminator is located just upstream of the sog210 structural gene. Interestingly, no consensus recognition sequence for the σ70-containing E. coli RNA polymerase (Hawley and McClure, 1983) could be identified between the putative terminator and the sog gene employing the algorithm described by Harley and Reynolds (1987). This result suggests that the potential transcriptional terminator might be a regulatory element, i.e. an attenuator controlling the expression of sog and distal genes. A strikingly similar arrangement of an intraoperon rho-independent terminator was described for the rpsU-dnaG-rpoD operon of E. coli (Wold and McMacken, 1982). The transcriptional terminator is located in a small intergenic region separating the rpsU and the dnaG gene that encodes the E. coli DNA primase. An apparent nut site (N gene utilization site) analogue in the rpsU coding sequence has been proposed to be a cis-acting antitermination element for dnaG expression (Almond et al., 1989). In the case of the ColIb-P9 sog region we could find no nut-like sequence in the approximately 150 bases preceding the potential terminator (Fig. 1).
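The GC content quoted for the 5'-portion of the sog coding region (56%, against 66% for the IncP primase regions) is a simple base count. The following is a minimal sketch of that calculation; the function name and the example fragment are hypothetical and the fragment is not taken from Fig. 1.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a DNA sequence."""
    # Ignore anything that is not an unambiguous base (spaces, N, garbled characters).
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


# Hypothetical 20-nt fragment used only to exercise the function.
example = "ATGCGGCCTAGCGGATCCGA"
print(f"{gc_content(example):.1f}% GC")
```

Applied to the full sequenced region and to the IncP primase regions, a function of this kind would give the two percentages being compared in the text.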

In order to localize the start positions of the structural genes sog210 and sog160 the N termini of the gene products were determined by microsequencing. The N-terminal amino acid sequences of Sog210 and Sog160 were P-S-[...] (Fig. 1). This corresponds exactly with the sequences predicted from the nucleotide sequence following the ATG codons at nt 196 (sog210) and nt 1429 (sog160), except that the N-terminal methionine of Sog210 is absent. The region just proximal to the corresponding translational initiation sites shows segments of 5 (sog210) and 4 bp (sog160) in length complementary to the 3'-end of the 16 S rRNA of E. coli (Shine-Dalgarno sequence) (Shine and Dalgarno, 1975) at appropriate distances from the ATG initiation codons (Fig. 1).
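The search for segments complementary to the 3'-end of the 16 S rRNA can be sketched as a longest-match scan of the region upstream of a start codon. This is only an illustration, not the procedure used in the paper: the template below is the DNA-sense complement of the 3'-terminal 5'-ACCUCCUUA-3' of E. coli 16 S rRNA, and the function name, the minimum length, and the upstream fragment are assumptions.

```python
# DNA-sense sequence complementary to the 3' end of E. coli 16 S rRNA (5'-ACCUCCUUA-3').
SD_TEMPLATE = "TAAGGAGGT"


def best_sd_match(upstream: str, min_len: int = 3):
    """Return (length, offset, segment) for the longest stretch of `upstream`
    that also occurs in SD_TEMPLATE, or None if nothing of min_len is found."""
    upstream = upstream.upper()
    # Try the longest possible match first, then shorter ones.
    for length in range(len(SD_TEMPLATE), min_len - 1, -1):
        for offset in range(len(upstream) - length + 1):
            segment = upstream[offset:offset + length]
            if segment in SD_TEMPLATE:
                return length, offset, segment
    return None


# Hypothetical upstream fragment (not from Fig. 1) that would precede an ATG codon.
print(best_sd_match("CCATTAGGAGCAATTCC"))
```

The offset of the match relative to the ATG would then be checked separately, since the text notes that the complementary segments lie at appropriate distances from the initiation codons.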
Overexpression and Identification of sog Gene Products-In order to achieve overexpression of sog the EcoRI fragment E3 of ColIb-P9 was isolated from the pBR325 derivative pLG215 (Table II) (Wilkins et al., 1981) and inserted into the Ptac/lacIq expression vector pJF119EH resulting in pBS215 (Fig. 2 and Table II). However, addition of IPTG to SCS1(pBS215) (Fig. 2) did not greatly increase sog expression compared to cells containing pLG215 (not shown). The potential transcriptional terminator (Fig. 1) most likely reduces transcription of sog. Removal of the terminator-like sequence in pBS215-1 (Fig. 2 and Table II) increased priming activity about 80-fold (Table III).
The sog gene products expressed by SCS1(pBS215-1) (Fig. 3, A and B) were indistinguishable from those of pBS215 with regard to their electrophoretic mobilities in an SDS-polyacrylamide gel (not shown). The deletion plasmid pBS216 (Fig. 2 and Table II), a recombinant plasmid containing the 4.3-kb BglII fragment of pBS215, encoded the sog160 protein (Fig. 3, A and B) exhibiting the same electrophoretic mobility as the corresponding gene product of pBS215 and pBS215-1. This result indicates that the 5' end of sog210 must be located upstream of the BglII site at nt position 254 (Figs. 1 and 2), whereas the 3' end of sog must be located upstream or immediately downstream of the BglII restriction site at 4.6 kb (Fig. 2). Note that the expression vector pJF119EH supplies stop codons in all three reading frames in the region following sog. Thus, use of any of these would result in fusions with a C-terminal truncated Sog. Assuming a stop codon for sog located at the BglII site, calculation of the molecular masses of Sog210 and Sog160 from the coding capacity of the DNA fragment results in 162 and 116 kDa, respectively. These data suggest that the actual molecular masses of the Sog polypeptides should be significantly smaller than the apparent molecular masses of 210 and 160 kDa determined from gel electrophoresis (Fig. 3A).
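The relation between coding capacity and molecular mass can be checked with a rough calculation. The sketch below takes the two start codon positions given in the text (nt 196 and nt 1429) and an average residue mass of 110 Da, which is a rule-of-thumb assumption and not a value from the paper; the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass; an assumption


def residues_between(start_nt_a: int, start_nt_b: int) -> int:
    """Number of codons separating two in-frame start codons."""
    span = start_nt_b - start_nt_a
    if span % 3 != 0:
        raise ValueError("start codons are not in the same reading frame")
    return span // 3


def approx_mass_kda(n_residues: int) -> float:
    """Approximate polypeptide mass in kDa from its residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Start codons of sog210 (nt 196) and sog160 (nt 1429) as given in the text.
extra = residues_between(196, 1429)
print(extra, "residues, about", round(approx_mass_kda(extra), 1), "kDa")
```

Under these assumptions the N-terminal portion unique to the larger polypeptide comes out at roughly 45 kDa, in line with the 46-kDa difference between the calculated masses of 162 and 116 kDa.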
FIG. 2. Deletion analysis of the ColIb-P9 sog region. Deletion derivatives of the sog region were generated based on pLG215 (Wilkins et al., 1981), containing the EcoRI (E3) fragment of ColIb-P9 (Rees et al., 1987). The restriction map of the E3 fragment is given in the upper part of the figure; the hatched portion represents the sequenced part (Fig. 1) of the sog region. Horizontal bars below the map indicate the extent of the ColIb-P9 portions of recombinant plasmids generated (Table II). The pJF119 vector components are not shown. Plasmids pBS215 and pBS216 were generated by insertion of the 8.4-kb EcoRI (E3) and the 4.3-kb BglII fragment of pLG215 (Wilkins et al., 1981) into the EcoRI and BamHI restriction site of the Ptac/lacIq vector pJF119EH, respectively. The deletion derivatives pBS215-1, pBS215A2-1 and pBS215A3-1 were constructed by XbaI linearization of pBS215 followed by a partial PstI digestion to remove segments of the promoter distal portion of the E3 fragment. In addition, the potential transcriptional terminator was deleted employing a synthetic oligonucleotide adaptor. Details of the construction are shown in the lower part of the figure. The upper nucleotide sequence presented corresponds to the wild type sequence (nt 139-238, Fig. 1). The 198-bp fragment between EcoRI (nt 1, Fig. 1) and the NlaIII restriction site (underlined), containing the ATG start codon of sog210, was removed. A synthetic oligonucleotide adaptor with EcoRI and NlaIII cohesive ends was designed (lowercase letters), restoring the nucleotide sequence of the wild type translational initiation region but lacking the potential terminator sequence. The coding region of the sog gene is shown by horizontal bars that contain arrowheads indicating the two in-phase translational starts. sog gene products expressed from deletion derivatives were analyzed by SDS-polyacrylamide gel electrophoresis and solid phase immunoassay, indicated by +, -, or (+) for expression of truncated gene products.
FIG. 3. [...] B, identification of Sog polypeptides by solid phase immunoassay. Samples applied to the gel were identical to panel A, except that reference proteins were fluorescein isothiocyanate (FITC)-labeled bovine serum albumin and FITC-lysozyme (0.5 μg each). Extracts of cells harboring recombinant plasmids were diluted with an extract of plasmid-free strain SCS1 as follows: SCS1(pBS215-1) 1:2, SCS1(pBS216) 1:3, and SCS1(pBS215A2-1) 1:30. Proteins were transferred to a nitrocellulose membrane (41), followed by reaction with rabbit anti-Sog serum (dilution 1:400) and fluorescein isothiocyanate-conjugated goat anti-rabbit IgGs (Lanka et al., 1984). A photograph (360 nm UV light) of the membrane is shown. Apparent molecular masses of Sog proteins are indicated; molecular masses of the truncated polypeptides are given in parentheses.

The Primase Domain of Sog Occupies the Non-overlapping Portion of In-phase Translated sog Gene Products-As expected from previous data, no priming activity could be detected in extracts of SCS1(pBS216) cells, not even after strong overexpression of the sog160 polypeptide, indicating that the N-terminal portion of the Sog210 protein, exclusive for the larger gene product, contains at least part of the primase domain. Deletions affecting the 3'-portion of the sog coding region (pBS215A2-1 and pBS215A3-1, Fig. 2 and Table II) resulted in the expression of truncated Sog fusion proteins with apparent molecular masses of 85 and 63 kDa (pBS215A2-1; Fig. 3, A and B) and 35 kDa (pBS215A3-1; Fig. 3, A and B). Determination of the N-terminal amino acid sequence of the 63-kDa sog gene product expressed from pBS215A2-1 revealed the sequence to be identical to the N terminus of Sog210 (Fig. 1), indicating that this protein might be a defined degradation product of the 85-kDa polypeptide, lacking C-terminal sequences. Priming activity could be detected in cells harboring pBS215A2-1 but not in extracts of cells harboring pBS215A3-1 (Table III). These data indicate that either a specific nucleotide sequence within the HincII/PstI (Fig. 2) fragment, or a change in protein conformation induced by this region is required for priming activity. Interestingly, the molecular mass of the truncated Sog210 polypeptide of pBS215A2-1 calculated from the nucleotide sequence was significantly lower (53,856 Da) than the apparent molecular mass of 85 kDa estimated from electrophoresis under denaturing conditions. No such discrepancy is detected in the case of the 35-kDa truncated Sog protein encoded by the recombinant plasmid pBS215A3-1.
In order to calculate the specific activity of the Sog210 and the truncated 85-kDa Sog210 fusion protein, the fraction of these polypeptides of the total soluble protein in extracts of SCS1(pBS215-1) and SCS1(pBS215A2-1) cells was determined by laser densitometry. The specific activity of the truncated 85-kDa Sog210 fusion protein was found to be 2-3 times higher than the activity of the full-length Sog polypeptide. This result indicates that only information up to the PstI site (nt 1663, Fig. 1) is needed for a fully active DNA primase polypeptide. Nucleotide sequencing of the HincII/PstI fragment (Fig. 1) of pLG214 (Table II) was performed in order to identify the 5'-border of the spontaneous deletion. Since the mutant Sog210 protein of pLG214 (87 kDa) is active in an in vitro primase assay (Wilkins et al., 1981), the primase domain is suggested to be located within the N-terminal 419 amino acids of Sog210.
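The specific activity comparison described above amounts to dividing the activity measured in an extract by the share of the polypeptide in the total soluble protein, as estimated by densitometry. The following sketch shows the arithmetic only; the function name and all numbers are hypothetical and none are taken from Table III.

```python
def specific_activity(units_per_mg_extract: float, polypeptide_fraction: float) -> float:
    """Activity per mg of the polypeptide itself, given its share of total soluble protein."""
    if not 0.0 < polypeptide_fraction <= 1.0:
        raise ValueError("polypeptide_fraction must be in (0, 1]")
    return units_per_mg_extract / polypeptide_fraction


# Hypothetical values: extract activity (units/mg) and densitometric fraction.
full_length = specific_activity(units_per_mg_extract=1000.0, polypeptide_fraction=0.10)
truncated = specific_activity(units_per_mg_extract=500.0, polypeptide_fraction=0.02)
print(f"fold difference: {truncated / full_length:.1f}")
```

With these made-up inputs the truncated protein has the lower activity per mg of extract but the higher specific activity, which is the kind of situation the normalization is meant to reveal.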
Sequence Similarities between the IncI and IncP Primases Are Located in Regions of the Proteins Highly Conserved among the IncP Enzymes-In order to search for sequence similarities between the primase domain of the Sog210 polypeptide and the IncP DNA primases (Miele et al., 1991) their amino acid sequences were compared using the dot plot matrix. Three segments exhibiting significant sequence similarities to the N terminus of ColIb-P9 Sog210 could be identified within the TraC2 protein of RP4 (IncPα) and the TraC4 protein of R751 (IncPβ) (Fig. 4). Regions I and II were located in a segment of the IncP primases where the sequence similarity between the IncPα and IncPβ TraC polypeptides was calculated to be 62% (positional identity of amino acids), whereas region III is part of the most conserved C-terminal segment (84%) of the IncP primases (Miele et al., 1991). Alignment of the three amino acid sequences within the regions I, II, and III, and their consensus is shown in Fig. 5. Overall positional identity was calculated to be about 50% in regions I and II and 23% in region III, which is the most extended sequence segment aligned. Two gaps were introduced into region III of the Sog sequence to achieve maximal similarity.
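Positional identity between aligned sequences, as quoted for regions I to III, can be computed column by column. The sketch below is one possible convention, assumed here for illustration: gaps are written as dots and columns containing a gap are excluded from the count. The function name and the two aligned fragments are hypothetical and are not taken from Fig. 5.

```python
def percent_identity(a: str, b: str, gap: str = ".") -> float:
    """Positional identity (%) over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        identical += (x == y)
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, used only to exercise the function.
print(f"{percent_identity('LEGYATAR.K', 'MEGYATAKQK'):.1f}%")
```

Other conventions (for example, dividing by the full alignment length) give different percentages, so the denominator should be stated whenever such values are compared.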
FIG. 5. The deduced amino acid sequences of RP4 TraC2, R751 TraC4 (Miele et al., 1991), ColIb-P9 Sog (Fig. 1) and of P4 α (Flensburg and Calendar, 1987) are aligned within the regions of extended sequence similarities. Identical amino acids are boxed. The consensus sequence is given below the boxes. Amino acid positions conserved in all four sequences (region III) are marked by asterisks. The position numbers, given in parentheses, indicate the first residue in each sequence relative to the N terminus of the corresponding polypeptide. Three gaps, marked by dots, were introduced in order to achieve maximal similarity of the sequences.

IncP, IncI1 Primases, and the α Protein of Phage P4 Share a Common Sequence Motif: E-G-Y-A-T-A-Computer [...] protein of the E. coli satellite phage P4 to exhibit amino acid sequence similarities to the segment of IncP and ColIb primases designated region III (Fig. 5). Sequence similarity extends over a range of approximately 140 amino acids. Identical amino acid positions between IncP primases and the P4 α protein are more frequent among the first hundred residues of region III than between Sog of ColIb-P9 and P4 α. One motif of 6 contiguously arranged amino acids (-E-G-Y-A-T-A-) appears to be particularly conserved in all four sequences. These data support the proposal by Krevolin and Calendar (1985) that the P4 α protein might possess DNA primase activity. In order to prove this hypothesis expression vector cloning was carried out to overexpress and characterize the α gene product.
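Locating the -E-G-Y-A-T-A- motif in a set of protein sequences is a plain substring search. The sketch below is an illustration only: the function name and the toy sequences are hypothetical, and the real TraC, Sog, and P4 α sequences would have to be supplied from the cited sequence data.

```python
import re

MOTIF = re.compile("EGYATA")  # the six-residue motif in one-letter code


def find_motif(sequences: dict) -> dict:
    """Map each sequence name to the 1-based start positions of the motif."""
    return {
        name: [m.start() + 1 for m in MOTIF.finditer(seq.upper())]
        for name, seq in sequences.items()
    }


# Toy sequences (hypothetical), one with and one without the motif.
toy = {"with_motif": "MKLEGYATAPW", "without_motif": "MKEGFATAL"}
print(find_motif(toy))
```

Positions are reported 1-based so that they can be read directly as residue numbers, in the same way that the substitutions in the P4 α motif are named by residue number (E214Q, Y216F, T218S).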
FIG. 6. [...] kb DraI/BamHI fragment of P4, encoding ORF 106 and the α protein (Halling et al., 1990), was inserted into the Ptac/lacIq expression vector pMS119EH. Deletion derivatives of the resulting recombinant plasmid pMS4 were generated using appropriate restriction sites. Horizontal bars below the restriction map indicate the P4 portion of the plasmids; the pMS119EH vector components are not shown. Gene products overexpressed after IPTG induction of SCS1 cells containing the deletion derivatives were analyzed by gel electrophoresis. Expression of the α protein was detected employing a rabbit anti-P4 α serum. Coding regions of orf106 and the α gene derived from nucleotide sequencing data (Flensburg and Calendar, 1987) are indicated in the lower part of the figure. The direction of transcription is given by arrowheads. Expression of gene products is indicated by +, -, or (+) for a truncated polypeptide.

FIG. 7. [...] (Towbin et al., 1979) followed by reaction with anti-P4 α serum (1:400) and by incubation with FITC-conjugated goat anti-rabbit IgGs. A photograph (360 nm UV light) is shown. Molecular mass standards (0.5 μg each): FITC-labeled bovine serum albumin and lysozyme. Expression of the P4 α protein in cells harboring recombinant plasmids was induced by IPTG. Extracts SCS1(pMS4A1) and SCS1(pMS4A2) were diluted 1:10 with an extract of plasmid-free strain SCS1. Apparent molecular masses of the P4 α protein (82 kDa) and the truncated polypeptide (42 kDa) are indicated.

Overexpression and Identification of the P4 α Protein-A 3.39-kb DraI/BamHI fragment of P4 DNA (nt 7655-4263, Halling et al. (1990)) carrying the P4 α gene was inserted into the Ptac/lacIq expression vector pMS119EH resulting in pMS4 (Fig. 6 and Table II). According to the nucleotide sequencing data of Flensburg and Calendar (1987) this fragment contains two open reading frames: ORF 106 and α, encoding gene products of 11,821 and 84,841 Da, respectively. Two additional polypeptides of apparent molecular masses of 82 and 11 kDa, most likely the gene products of α and orf106 (not shown), were detected in extracts of IPTG-induced cells harboring the recombinant plasmid pMS4. To facilitate overproduction, identification and purification of the α protein, pMS4A1, a deletion derivative of pMS4, was constructed containing only the α gene (Fig. 6 and Table II). The 82-kDa polypeptide, overproduced after IPTG induction of SCS1(pMS4A1) cells (Fig. 7), was purified (purification will be described elsewhere) followed by microsequencing of the N terminus. The N-terminal sequence M-K-M-N-V-T-A-T-V- was found to be identical to the sequence of the α gene product predicted from nucleotide sequencing data (Flensburg and Calendar, 1987). In contrast to the data of Flensburg and Calendar (1987) the N terminus of the α protein turned out to be unmodified, beginning with a methionine residue.
A polyclonal anti-P4 α serum was used to identify the α protein in extracts of cells harboring the recombinant plasmid pMS4A1 and in cells infected with P4 vir1 (Fig. 7), a mutant phage exhibiting constitutive expression of the α protein (Lin, 1984).

No difference in size could be detected for the α polypeptide encoded by the recombinant plasmid and the mutant phage, indicating that, in accordance with the nucleotide sequencing data of Flensburg and Calendar (1987), the complete α gene is located on the P4 AvaI/EcoRV fragment of pMS4A1.
The P4 α Protein Possesses DNA Primase Activity-To analyze priming activity of the α protein, extracts of cells harboring the recombinant plasmids pMS4 and pMS4A1 as well as the purified α protein were subjected to an in vitro primase assay that is based on the ability to initiate synthesis of the complementary strand of phage fd DNA (Lanka et al., 1979; Lanka and Barth, 1981). The increase in primase activity detected upon addition of increasing amounts of extracts of induced SCS1(pMS4) cells (Fig. 8) compared with an extract of non-induced cells indicates that primase activity is due to the expression of the protein under the control of the LacI-regulated tac promoter. The primase activity is rifampicin-resistant. Complementary strand synthesis on fd and φX174 DNA templates also takes place in extracts of E. coli strain HMS83 in the presence of P4 α, indicating that DNA polymerase III holoenzyme elongates the primers synthesized by P4 α (not shown). HMS83 is deficient in DNA polymerases I and II (Campbell et al., 1972). The activity measured in extracts of induced SCS1(pMS4A1) is nearly identical to the activity detected in SCS1(pMS4) extracts, suggesting that overexpression of ORF 106 protein at the same time does not influence the in vitro primase activity of the α protein. The observation that very small amounts of the purified α protein were sufficient to detect significant primase activity demonstrates that no additional P4-encoded protein is needed for P4 α primase activity.
The Primase Activity of P4 α Resides in the N-terminal Half of the Protein-A deletion derivative of pMS4 (pMS4A2; Fig. 6 and Table II) was generated to localize the primase domain of the α protein. pMS4A2 contains only the 5'-terminal half of the α structural gene, encoding a truncated protein with an apparent molecular mass of 42 kDa (Fig. 7). Cross-reaction with P4 α antiserum shows essentially one band, indicating the existence of a stable truncated N-terminal P4 α polypeptide. Primase activity was detectable in extracts of IPTG-induced SCS1(pMS4A2) cells (Fig. 8) but significantly reduced compared with SCS1(pMS4A1). To calculate the specific activities of the α protein and the truncated 42-kDa α fusion protein, the amounts of these polypeptides in extracts of SCS1(pMS4A1) and SCS1(pMS4A2) cells were determined by separation of the soluble cell proteins on an SDS-polyacrylamide gel, staining with Coomassie Blue, and subsequent laser densitometry of the stained protein bands. Deletion of the C-terminal half of the α protein was found to cause a 160-fold reduction in enzyme activity. This result demonstrates that the primase domain of the α protein is located towards the N terminus of the polypeptide. The region contains the segment exhibiting sequence similarities to the IncP and ColIb-P9 DNA primases, including the -E-G-Y-A-T-A- motif (Figs. 4 and 5). Point mutations resulting in single amino acid changes were generated to decide whether or not this conserved sequence motif is connected to primase activity.
TABLE IV (footnote). The calculation of the amount of primase polypeptides is based on data obtained by laser densitometry and by determination of protein concentration (according to the method of Lowry et al. (1951)) in extracts of cells harboring the recombinant plasmids. ND, primase activity not detectable.

The Motif -E-G-Y-A-T-A- Is Part of the Primase Domain of RP4 TraC2 and P4 α-Single amino acid replacements within the -E-G-Y-A-T-A- motif of the RP4 TraC2 protein and the P4 α protein were introduced employing the method for site-directed mutagenesis described by Nakamaye and Eckstein (1986) and Sayers et al. (1988). The following amino acid substitutions were selected: glutamic acid to glutamine (E → Q), tyrosine to phenylalanine (Y → F), and threonine to serine (T → S). The choice for the substitutions followed the rules of Bordo and Argos (1991). The oligonucleotides used to construct these mutations were 19-21 residues in length, containing one base substitution compared with the wild type sequence. Base substitutions were selected to minimize introduction of codons that are recognized by rare tRNA species of E. coli. Single-stranded DNA templates were isolated from M13 mp19 derivatives containing the EcoRI/HindIII fragments of pFG101-1A1 for mutagenesis of the RP4 primase (Table II) and pMS4A1 encoding the P4 α protein (Fig. 6 and Table II). The base changes introduced were confirmed by nucleotide sequencing (not shown). In the case of P4 α the complete nucleotide sequences of the 255-bp AscI/MscI fragments containing the point mutations were determined. These fragments were used to replace the corresponding wild type sequence of pMS4A1 to achieve overexpression of the mutant proteins. Expression vector cloning of the mutant traC2 gene was performed by reinsertion of the EcoRI/HindIII fragment into the Ptac/lacIq expression vector pJF119EH. The resulting plasmids were designated pFG101-1A1Q, pFG101-1A1F, pFG101-1A1S, pMS4A1Q, pMS4A1F, and pMS4A1S (Table IV).

Analysis of IPTG-induced SCS1 cells harboring the recombinant mutant plasmids by solid phase immunoassay confirmed that the intensity and electrophoretic mobility of the mutant proteins were indistinguishable from those of the corresponding wild type proteins (not shown). Subsequently, the cell extracts were subjected to the in vitro primase assay, the results of which are summarized in Table IV. Primase activity of both enzymes was found to be completely abolished by the E → Q exchange. The amino acid substitutions Y → F and T → S caused slight differences in the primase activity of the mutant RP4 TraC2 and P4 α proteins. The mutant TraC2 protein expressed by SCS1(pFG101-1A1S) exhibits residual primase activity, whereas no activity could be detected in extracts of SCS1 cells harboring the recombinant plasmid pMS4A1S encoding the mutant P4 α protein. Interestingly, a 2-3-fold increase in enzyme activity compared with the wild type protein was observed employing extracts of SCS1(pFG101-1A1F) cells. In contrast, no significant difference in primase activity could be detected between the P4 α protein and the corresponding Y → F mutant. These data suggest that the -E-G-Y-A-T-A- sequence is part of the active center for primase function of the RP4 TraC2 and the P4 α protein.
Primase Activity of P4 α Affects P4 Multiplication. The mutations from plasmid pMS4 were crossed into P4 vir1 by fragment exchange using the unique restriction sites for AscI and MscI. The transfectant plaques of the Y216F mutant phage in E. coli C-353 (lysogenic for P2) were as large as or larger than plaques of P4 vir1 phage. The phage carrying the T218S mutation made plaques that were smaller than wild type, and the E214Q mutation gave rise to minute plaques at a sharply reduced frequency. The small plaques of the T218S mutant could be picked and grown into a high titer stock by reducing the bacterial host concentration to 70% of normal. The minute plaques of the E214Q mutant could not be grown into a high titer stock even when the bacterial inoculum was reduced to 10% of the normal amount.
To produce P4 E214Q mutant phage we transformed E. coli C-2422 recA(P2)/pMS4Δ2, which complements the primase mutation by providing the primase domain of the α protein from a plasmid. In the presence of 0.1 mM IPTG the E214Q mutant gave large transfectant plaques at a frequency similar to that seen for the T218S mutant. High titer stocks of the E214Q mutant could also be prepared on this complementing strain. The P4 E214Q mutant gave about one-third the normal yield of progeny on wild type E. coli strains lysogenic for P2, so it appeared that some host or prophage protein helps the E214Q mutant to prime DNA synthesis. We believe that the dnaG protein performs this function, because the P4 E214Q mutant will not grow at the non-permissive temperature on an E. coli strain that carries a temperature-sensitive lesion in the dnaG gene (C-2309, Table V). We also found that a dnaB (helicase) temperature-sensitive mutant (LD 312) could not support the E214Q mutant phage (Table V). This result might be expected, since helicase activity stimulates primase activity. A rep mutation, which eliminates a nonessential helicase activity, did not affect the growth of the E214Q mutant phage (C-1681, Table V). We suspected that a dnaC mutation might affect growth of our mutant phage, since the dnaC protein helps to load the dnaB protein onto certain DNA templates, but this expectation was not fulfilled (LD331, Table V).
Legend to Table V: [...] Bowden et al. (1975) was followed. Logarithmically growing cells lysogenic for P2 lg were infected with P4 vir1 carrying no mutation in the α gene or the indicated primase motif mutation. The cells were coinfected with P2 vir1 amK12, and the multiplicity of infection was about 8 for each phage type. Unadsorbed phages were inactivated with antiserum, and cells were diluted and incubated for 2.5 h at the indicated temperature before assay.
In summary, the P4 primase mutation is not lethal, because the host priming system can substitute inefficiently for the primase activity of the P4 a protein.
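The mutant names used in this section (E214Q, Y216F, T218S) follow the usual convention of wild type residue, position, and replacement residue. As a minimal illustrative sketch, not part of the original study, the following Python snippet applies such substitutions to the six-residue motif. It assumes that the motif occupies residues 214-219 of P4 α, which is inferred here from the mutant names only; the function name `apply_substitution` is hypothetical.

```python
import re

MOTIF = "EGYATA"    # the conserved -E-G-Y-A-T-A- motif
MOTIF_START = 214   # assumed position of the E residue in P4 alpha (inferred from "E214Q")


def apply_substitution(motif: str, start: int, mutation: str) -> str:
    """Return the motif after applying a substitution written like 'E214Q'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {mutation}")
    wild_type = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)
    index = position - start  # convert protein numbering to a 0-based motif index
    if not 0 <= index < len(motif):
        raise ValueError(f"position {position} lies outside the motif")
    if motif[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {motif[index]}")
    return motif[:index] + replacement + motif[index + 1:]


if __name__ == "__main__":
    for name in ("E214Q", "Y216F", "T218S"):
        print(name, apply_substitution(MOTIF, MOTIF_START, name))
```

The check against the wild type residue guards against an off-by-one error in the assumed numbering: a name such as G214Q is rejected rather than silently applied.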

DISCUSSION
Sequence similarities between the P- and I-type conjugation systems have been described recently, indicating an evolutionary relationship. Two corresponding components of the relaxosome, RP4 TraJ/R64 NikA and RP4 TraI/R64 NikB, were found to share common amino acid sequences (Furuya et al., 1991; Pansegrau and Lanka, 1991). Even the target sites for these proteins, the transfer origins of RP4 and R64 (IncI1), contain nearly identical structural elements in the form of a large inverted repeat, the end of which is 8 bp away from the nick site. This 8-bp segment, called the nick region (Waters et al., 1991), is highly conserved in RP4, R64, and other DNA transfer systems (Pansegrau and Lanka, 1991). The similarities of IncP and IncI primases described here provide additional indications for a divergent evolutionary relationship of the two DNA transfer systems as well as for mechanistic similarities of the conjugative process. Primase and DNA transport functions were assigned to the larger products of the primase genes, traC1 (RP4) and sog210 (ColIb-P9), specifying polypeptides of 118 and 210 kDa, respectively. In both cases, TraC1 and Sog210, the primase domain has been shown to exist as a truncated polypeptide without losing much of its in vitro activity, although the arrangement of the primase and DNA transport domains in the wild type proteins is inverted: the primase domain is located at the C terminus in TraC1 and at the N terminus in Sog210. This suggests a modular structural arrangement of functional domains within these large polypeptides.
The mechanism by which TraC1 and Sog210/Sog160 are transported to the recipient cells during DNA transfer is not understood ([...] Wilkins, 1989, 1990). The N-terminal amino acid sequences of the proteins lack typical signal sequences, indicating that the proteins are transferred by some process other than the classical protein export pathway. The finding that IncP TraC sequences show similarity to defined regions of colicin A might be important for explaining the DNA transport function of primase proteins (Miele et al., 1991).
Legend to Fig. 9: [...] (Argos, 1988). The portion of the protein exhibiting amino acid sequence similarities to the IncP and the ColIb-P9 DNA primases is shaded. Δ2 indicates the extent of the truncated α protein encoded by the deletion derivative pMS4Δ2. Panel B, comparison of the potential zinc finger motif and the type A nucleotide binding site with similar motifs of various DNA primases and helicases. The amino acid sequences were taken from the following sources: T4 gene 41/61 (GenBank accession no. K03113; M. Nakanishi and B. Alberts, unpublished data), T7 gene 4 (Dunn and Studier, 1983), E. coli uvrA (Husain et al., 1986), P4 α (Flensburg and Calendar, 1987), E. coli helD (Wood and Matson, 1989), E. coli rep (Gilchrist and Denhardt, 1987), E. coli uvrD (Finch and Emmerson, 1984), and E. coli priA (Nurse et al., 1990; Lee et al., 1990). The position numbers indicate the first residue in each sequence relative to the N terminus of the corresponding polypeptides. The consensus sequence of the type A nucleotide binding site was taken from Walker et al. (1982) and Gorbalenya and Koonin (1990), and the Asp-Asp motif probably involved in metal ion binding was from Argos (1988). The position numbers given above the Asp-Asp motif refer to the numbering carried out by Argos (1988) in order to identify similar motifs within DNA and RNA polymerases.
Sequence analogy between conjugative DNA primases and the phage P4 α protein led to the experimental proof that one in vitro function of α is a priming activity. P4 α protein and conjugative primases show significant parallels. (i) The enzymes mediate rifampicin-resistant initiation of complementary strand synthesis on a variety of single-stranded phage DNAs. (ii) Synthesis of oligoribonucleotides in vitro is independent of the presence of other proteins. (iii) DNA polymerase III holoenzyme of E. coli is capable of utilizing these primers for elongation.
A rifampicin-resistant RNA polymerizing activity induced by phage P4 has already been described by Barrett et al. (1972). This activity is a property of the α protein rendering P4 replication independent of the E. coli priming systems (Bowden et al., 1975). It is conceivable that the activity discovered 20 years ago is part of the primase function described here, explaining the autonomy of P4 DNA replication from the host chromosome-encoded priming system, i.e. E. coli RNA polymerase and DnaG protein. Barrett et al. (1972) used polydeoxycytidylate as template and GTP as substrate in the absence of any DNA polymerase. We have obtained conclusive results using natural single-stranded DNA templates in a coupled reaction that allowed priming by P4 α protein and elongation by DNA polymerase(s) (Scherzinger and Litfin, 1974).
The motif -E-G-Y-A-T-A-, conserved in the four priming proteins of plasmids RP4, R751, ColIb-P9, and phage P4, apparently is essential for the enzymes' function because the activity pattern obtained with the mutant proteins of RP4 TraC2 and P4 α fits into a general scheme (Table IV). Two amino acid replacements within this motif abolish or strongly decrease the specific activity, whereas the Y → F change increases the activity or leaves it unaltered. These results suggest that, provided there is no alteration in protein conformation, these residues could form part of a critical domain involved in primase function. The choice of introducing changes only in functional groups of amino acid side chains, and the prediction of secondary structures resulting from the alteration in the primary structure, favor the assumption of only minor conformational changes in the mutant proteins (Bordo and Argos, 1991).
The localization of the primase domain to only the N-terminal half of the α polypeptide, established by enzymatic assays and plaque formation of mutant phages, together with previous data from in vivo and in vitro studies, implied that P4 α possesses several activities needed for phage P4 replication (Bowden et al., 1975). A hint of additional P4 α activities came again from a database search, which revealed that the protein contains additional motifs exhibiting similarities to sequences identified in other well characterized DNA primases and DNA helicases. The arrangement of these motifs within the P4 α amino acid sequence is schematically summarized in Fig. 9A. A potential "zinc finger" motif is localized in the N-terminal region of the polypeptide (Fig. 9B). Interestingly, a comparable sequence was found within the N-terminal region of the T4 gene 61 and the T7 gene 4 protein. Both polypeptides are known to function as DNA primases, the latter also as a 5' → 3' helicase (Matson et al., 1983; Venkatesan et al., 1982). The potential zinc finger motif of the T7 gene 4 protein was proposed by Bernstein and Richardson (1988) to be involved in binding of the recognition sequence on the template DNA. The zinc finger motif of P4 α could be responsible for the same step in template-dependent oligoribonucleotide synthesis. In addition, a type A nucleotide binding site (Walker et al., 1982; Gorbalenya and Koonin, 1990) was identified within the C-terminal half of P4 α, exhibiting extended similarities to nucleotide binding sites of different DNA helicases such as the gene products of phages T4 gene 41 and T7 gene 4, and the E. coli proteins UvrA, helicase IV, Rep, helicase II (UvrD), and n' (PriA) (Fig. 9B). This observation may support the hypothesis that the α gene product also could function as a [...] (Bowden et al., 1975). A fourth sequence motif, probably involved in metal-ion binding at the nucleotide binding site, was identified in the four primase polypeptides, showing some relationship to sequences found in various DNA polymerases (Argos, 1988) but also in some DNA helicases (Bernad et al., 1990).

The spacing between the -E-G-Y-A-T-A- motif and the proposed binding site for Mg2+ ions is identical in all four primases (Fig. 9B), supporting a possible functional importance for this motif.
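A spacing comparison of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences below are invented placeholder strings, not the real primase sequences; the Asp-Asp pair is assumed to lie C-terminal to the motif, which the text does not state; and the function name `motif_to_dd_spacing` is hypothetical.

```python
from typing import Optional

MOTIF = "EGYATA"     # the conserved -E-G-Y-A-T-A- motif
METAL_SITE = "DD"    # Asp-Asp pair proposed to bind metal ions


def motif_to_dd_spacing(sequence: str) -> Optional[int]:
    """Count the residues between the end of the motif and the first
    downstream Asp-Asp pair; return None if either element is absent."""
    motif_start = sequence.find(MOTIF)
    if motif_start == -1:
        return None
    motif_end = motif_start + len(MOTIF)
    dd_start = sequence.find(METAL_SITE, motif_end)  # search downstream only (assumption)
    if dd_start == -1:
        return None
    return dd_start - motif_end


# Invented toy sequences for illustration only (NOT real primase sequences).
TOY_SEQUENCES = {
    "toy_primase_1": "MKEGYATAGSLKDDR",
    "toy_primase_2": "AAQEGYATATTPRDDL",
}

if __name__ == "__main__":
    spacings = {name: motif_to_dd_spacing(seq) for name, seq in TOY_SEQUENCES.items()}
    print(spacings)
    print("identical spacing:", len(set(spacings.values())) == 1)
```

With real sequences, an identical value for every protein would correspond to the conserved spacing described in the text.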
In vitro experiments to verify the proposed multifunctional character of P4 α were successful. In addition to primase activity, the protein possesses helicase and P4 origin recognition activities (to be published elsewhere). These properties make the P4 α protein one of the most interesting prokaryotic replication proteins, as it combines at least three activities in one polypeptide chain involved in the initiation of P4 DNA replication.