Adenovirus Terminal Protein Precursor: Partial Amino Acid Sequence and the Site of Covalent Linkage to Virus DNA*

The adenovirus terminal protein precursor functions as a primer for the initiation of virus DNA replication by covalently binding the first nucleotide in the DNA chain. It remains covalently attached to the 5'-ends of the virus DNA and is cleaved to the terminal protein during virion maturation. The gene encoding the terminal protein precursor maps within a 7-kilobase region of the virus genome, which specifies multiple mRNA and protein species. We have determined the location, within this region, of the coding sequences for the terminal protein precursor by aligning partial amino acid sequence to the amino acid sequence predicted from the DNA sequence (accompanying papers; Gingeras, T. R., Sciaky, D., Gelinas, R. E., Bing-Dong, J., Yen, C. E., Kelly, M. M., Bullock, P. A., Parsons, B. L., O'Neill, K. E., and Roberts, R. J. (1982) J. Biol. Chem. 257, 13475-13491; Alestrom, P., Akusjarvi, G., Pettersson, M., and Pettersson, U. (1982) J. Biol. Chem. 257, 13492-13498). The open translational reading frame between coordinates 23.4 and 28.9 on the genome contains the majority of the coding sequences for the precursor protein. The virion terminal protein derives from the COOH terminus of the precursor protein. Using this information, we have determined the site within the protein of covalent attachment to the DNA. This site also corresponds to the site that covalently binds dCMP, the first nucleotide in nascent DNA synthesized in vitro. The coding region for terminal protein …

§ To whom correspondence should be addressed.
The abbreviations used are: pTP, precursor to the terminal protein; 140K protein, protein of Mr = 140,000 (other forms expressed similarly); ad2, adenovirus type 2; E2B, early transcription unit; SDS, sodium dodecyl sulfate.

pTP is covalently attached to the termini of replicating DNA in vivo (Coombs et al., 1978; Kelly and Lechner, 1978; Stillman and Bellett, 1978; Van Wielink et al., 1979; Challberg and Kelly, 1981) and of nascent DNA synthesized in vitro (Challberg et al., 1980; Stillman, 1981). The unbound form of pTP can covalently bind dCTP to form a pTP-dCMP complex in a reaction that requires specific DNA sequences at the origin of replication (Pincus et al., 1981; Tamanoi and Stillman, 1982; Challberg et al., 1982). Recently, Enomoto et al. (1981) purified functional pTP and an associated 140K protein and demonstrated that these proteins together contained the dCMP binding activity and a DNA polymerase activity.
The mature virion TP was first recognized as enabling virus DNA to circularize via a protease-sensitive, noncovalent interaction (Robinson et al., 1973; Robinson and Bellett, 1974) and was subsequently identified as a 55K protein covalently linked to each 5'-end of the linear virus DNA (Rekosh et al., 1977). The covalent linkage between DNA and protein is a phosphodiester bond between the β-hydroxyl group of a serine residue in the protein and the 5'-hydroxyl of the terminal deoxycytidine residue (Desiderio and Kelly, 1980). That TP is derived from a precursor protein was first suggested by the detection of an 80,000-dalton protein covalently linked to the 5'-end of nascent DNA replicated in cell-free extracts prepared from virus-infected cells (Challberg et al., 1980). The 80K protein is structurally related to TP and is covalently linked to the DNA by the same phosphodiester bond.
The origin of TP had long been an enigma until cell-free translation of hybridization-selected mRNA demonstrated that proteins with estimated Mr = 105,000, 87,000, and 75,000 are encoded by the virus l-strand between coordinates 11 and 31.5 and that the 87K protein is structurally related to TP (Stillman et al., 1981). The 87K protein is the same as the 80K protein described by Challberg et al. (1980); the discrepancy is due to the use of different molecular weight markers. In addition, the 87K protein is structurally related to the 80K-87K protein that is covalently associated with the DNA from ad2 ts1 virions grown at the nonpermissive temperature (Stillman et al., 1981; Challberg and Kelly, 1981); ad2 ts1 is a mutant virus that fails to cleave virus-encoded precursor proteins to their mature counterparts during virion maturation (Bhatti and Weber, 1979). The mapping of pTP to the virus genome also led to the definition of a new early transcription unit, designated E2B. Several mRNAs from E2B have been identified, including those containing leaders at coordinates 39, 68.5, and 75 linked to various RNA main bodies, which extend from coordinate 30, 26, or 23 to coordinate 11.1 (Stillman et al., 1981). Because of the multiple mRNA and protein species and the length of the E2B region of the virus genome (approximately 7000 base pairs), the location of the coding region for pTP was not known. Recently, Gingeras et al. (1982) and Alestrom et al. (1982) have determined …

… protein complex at 80 °C for 10 min in 2% (w/v) SDS, 5% (v/v) β-mercaptoethanol, 0.3 M NaCl in TE. The iodinated protein was separated from free 125I by chromatography through a Sephadex G-50 column, and the protein was then removed from the DNA by treatment with either DNase I or piperidine as described previously (Stillman et al., 1981), followed by electrophoresis on 10% SDS-polyacrylamide gels (Laemmli, 1970).
DNA-protein complex prepared from ad2 ts1 virions grown at 40 °C was also digested with either trypsin or Staphylococcus aureus V8 protease for 3 h at 37 °C after labeling with 125I. The DNA-peptide complex was separated from non-DNA-bound peptides by equilibrium centrifugation in CsCl, 4 M guanidine HCl in TE with 1 mM phenylmethylsulfonyl fluoride at an initial density of 1.408 g/ml. Centrifugation was for 36 h in an SW 50.1 rotor at 4 °C. The DNA-peptide complex was dialyzed against TE containing 1 mM phenylmethylsulfonyl fluoride, and the peptide was removed from the DNA by treatment with piperidine.
Preparation, Selection, and Translation of mRNA. RNA was prepared from the cytoplasm of ad2-infected HeLa cells at 7 h postinfection after addition of anisomycin to 10 μM at 3 h postinfection. The mRNA was prepared, selected by hybridization to DNA from a plasmid containing the ad2 Ban-E fragment, and translated in a rabbit reticulocyte lysate as described by Lewis and Mathews (1980) and Stillman et al. (1981). For large-scale preparation of [35S]methionine-labeled E2B-specific proteins, 2 mg of total cytoplasmic RNA were used for selections, and the selected mRNA was translated in a 200-μl reaction mixture that contained 800 μCi of [35S]methionine (New England Nuclear) and 80 μl of rabbit reticulocyte lysate. The products were separated on preparative 10% SDS-polyacrylamide gels as described by Laemmli (1970).
Formation of pTP-dCMP Complex. Nuclear extracts for DNA replication in vitro were prepared as described by Challberg and Kelly (1979), except that cells were infected at a multiplicity of infection of 100 plaque-forming units/cell. The extract (2 ml) was adjusted to 0.2 M NaCl and loaded onto a DEAE-cellulose column (1 ml) equilibrated with buffer F (25 mM Tris, pH 7.5, 1 mM EDTA, 1 mM dithiothreitol, 20% glycerol) containing 0.2 M NaCl. The column was washed with the same buffer, and the proteins that passed through the column were precipitated by addition of 2 volumes of saturated ammonium sulfate. The precipitates were resuspended in 1 ml of buffer F containing 50 mM NaCl and dialyzed against 200 volumes of this buffer for 4 h at 4 °C (DEAE-cellulose fraction). The pTP-dCMP complex was formed as described by Tamanoi and Stillman (1982).
Proteolytic Digestion, Peptide Mapping, and Amino Acid Sequence Determination. Labeled peptides were visualized by autoradiography of dried gels, excised, oxidized with performic acid, and digested with N-tosylphenylalanine chloromethyl ketone-treated trypsin while still in the gel slice, as described by Smart and Ito (1978). Chromatography of peptides through a Spherisorb ODS (C-18) reverse phase column (Spectra-Physics) on a Spectra-Physics SP8000 high performance liquid chromatography system was as described by Smart et al. (1981). Peptide sequence analysis by the spinning cup method on a Beckman 890C sequencer has been described previously.

RESULTS
Isolation of Radiolabeled Proteins. To obtain partial protein sequence of pTP and its related proteins, we radiolabeled the protein by three independent methods. First, we labeled only pTP with [35S]methionine by translation of selected mRNA in a rabbit reticulocyte lysate. E2B-specific RNA was selected by hybridization and, after cell-free translation, the products were separated by electrophoresis in a preparative gel. An analytical gel of approximately 2% of the product is shown in Fig. 1A.

FIG. 1. … A, [35S]methionine-labeled proteins obtained by translation of E2B-selected mRNA in a rabbit reticulocyte lysate. B, 125I-labeled proteins that were covalently linked to the DNA from ad2 ts1 virions grown at 32 °C. After labeling, the proteins were removed from the DNA by incubation with 0.5 M piperidine for 2 h at 37 °C. C, same as B, except that the protein was covalently linked to the DNA from ad2 ts1 virions grown at 40 °C. D, [α-32P]dCMP-labeled pTP obtained by incubating the 0.2 M DEAE-cellulose nuclear extract with [α-32P]dCTP in the presence of adenovirus DNA-protein complex for 90 min at 30 °C. All proteins were subjected to electrophoresis on a 10% SDS-polyacrylamide gel as described by Laemmli (1970). The molecular weight markers were [35S]methionine-labeled ad2 virion proteins, and the position of the 87K pTP is indicated.

Adenovirus Terminal Protein
Precursor 13501
… selected by this plasmid DNA have been discussed previously (Stillman et al., 1981). The second source of radiolabeled protein was iodination of proteins covalently attached to the virion DNA of ad2 ts1 grown at either 32 °C (permissive temperature) or 40 °C (nonpermissive temperature). The iodinations, by the chloramine-T method, were performed with and without extensive denaturation of the protein. Proteins were removed from the DNA by treatment with either deoxyribonuclease I or piperidine and separated by gel electrophoresis (Stillman et al., 1981; Fig. 1, B and C). At the nonpermissive temperature, only the precursor 87K protein was detected (Fig. 1C). At the permissive temperature, however, three proteins of Mr = 87,000, 62,000, and 55,000 were detected (Fig. 1B).
The third method for radiolabeling pTP was formation of a 32P-labeled pTP-dCMP complex in a cell-free extract prepared from adenovirus-infected HeLa cells. DNA-protein complex, the template for replication in vitro, and [α-32P]dCTP were incubated with the DEAE-cellulose fraction (see "Experimental Procedures"), and the reaction mixture was then subjected to electrophoresis in a preparative gel. An analytical gel of approximately 4% of the reaction mixture (Fig. 1D) reveals a single labeled band at the position of pTP; this band contains pTP covalently linked to a dCMP residue (Challberg et al., 1982).

Tryptic Peptide Maps of Radiolabeled Proteins. The [35S]methionine-labeled 87K protein translated from E2B-selected mRNA has been shown to be structurally related to both the 35S-labeled 87K pTP and the 55K TP that are covalently attached to the DNA of ad2 ts1 virus grown at 40 °C and of wild-type ad2 virus, respectively (Stillman et al., 1981). For reference, Fig. 2 shows a tryptic peptide map of [35S]methionine-labeled 87K pTP obtained by translating mRNA preparatively selected by the ad2 Ban-E fragment. Because very few counts were obtained by labeling pTP-related proteins with [35S]methionine in vivo, we labeled these proteins with 125I (Fig. 1, B and C) to show that the 62K protein covalently bound to the DNA of ad2 ts1 virus grown at 32 °C is related to pTP and TP. Tryptic peptide maps of the 87K pTP from ad2 ts1 virus grown at 40 °C and of the 87K pTP, 62K, and 55K TP proteins from ad2 ts1 virus grown at 32 °C are shown in Fig. 3. The labeled proteins were removed from the DNA by treatment with either DNase I or piperidine before preparative gel electrophoresis and digestion with trypsin. Fig. 3 demonstrates that the 62K protein is structurally related to both the 87K pTP and the 55K TP. Only peaks A and E are present in the 87K protein but absent from the 62K protein. Peak H is present in the 55K protein but is not observed in either of the larger proteins. This observation, together with the fact that the relative number of counts in each peak differs when the 55K protein is labeled, suggests that cleavage of the precursor protein to the 55K TP alters the accessibility of regions of the protein to iodination. Two peaks of radioactivity (fractions 135-156; Fig. 3) are observed when the DNA is completely removed from the protein by treatment with piperidine, but not when it is only partially removed by DNase I digestion, suggesting that these peptides may be linked to the DNA.
One problem we encountered in analyzing peptides labeled with 125I by the chloramine-T procedure was that single peptides eluted from the reverse phase column in two positions, probably because either one or two 125I atoms were attached to the ring of the tyrosine residues in the protein (e.g. see Fig. 6).
Determination of the Position of the Radiolabeled Amino Acids within the Tryptic Peptides. The DNA sequence predicts three long open reading frames on the l-strand DNA within the region from coordinate 30 to coordinate 11. The leftmost reading frame, from coordinate 15.2 to 11, had already been shown to encode the virion polypeptide IVa2 (Chow et al., 1977; Lewis et al., 1979; Persson et al., 1979). To determine which of the two remaining open translational reading frames encodes pTP, we carried out partial amino acid sequence analysis.
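The reading-frame argument above can be illustrated with a minimal Python sketch of stop-to-stop open reading frame detection. It assumes the standard stop codons, a hypothetical minimum length, and toy sequences; `find_orfs` and `reverse_complement` are illustrative helpers, not the software used to analyze the ad2 sequence. Because the pTP-related frames lie on the l-strand, a sequence supplied as the r-strand would first be reverse-complemented.

```python
# Minimal sketch: stop-to-stop open reading frames on one strand.
# Assumptions (not from the paper): standard-code stop codons, a hypothetical
# minimum length, and toy input in place of the ad2 E2B sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand read 5' to 3' (e.g. l-strand from r-strand)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each open region between in-frame stops.

    start is 0-based and inclusive; end is the index of the terminating stop
    codon, so the stop itself is excluded. A region running off the 5' end of
    the input starts at the frame offset; a region lacking a 3' stop is ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i))
                start = i + 3  # the next open region begins after this stop
    return orfs
```

For example, `find_orfs(reverse_complement(r_strand))` would list candidate l-strand frames for a hypothetical `r_strand` string. Indices refer to the scanned strand; converting them to map units would also require the genome length, which this sketch does not model.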
The [35S]methionine-labeled peptides derived from the 87K translation product of E2B-selected mRNA (A-O, Fig. 2) were subjected to automatic Edman degradation. Fig. 4 shows the position of each methionine residue within these peptides and the percentage of total counts remaining in the spinning cup.
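Assigning a labeled residue from spinning-cup data amounts to finding the Edman cycle at which released radioactivity peaks. The sketch below assumes the per-cycle release is expressed as a percentage of counts applied, as plotted in Fig. 4; the 5% threshold and the peak rule are hypothetical choices, not the criteria used in this study.

```python
# Hypothetical rule for calling labeled residues from an Edman release profile.
def release_peaks(percent_per_cycle: list[float], threshold: float = 5.0) -> list[int]:
    """Return 1-based cycle numbers at which released radioactivity peaks.

    A cycle is called when its value is at least `threshold` percent of the
    counts applied, exceeds the previous cycle, and is not exceeded by the
    next cycle. Carry-over ("lag") into the following cycle is therefore not
    called a second time as long as it is smaller than the peak.
    """
    peaks = []
    n = len(percent_per_cycle)
    for i, value in enumerate(percent_per_cycle):
        prev = percent_per_cycle[i - 1] if i > 0 else 0.0
        nxt = percent_per_cycle[i + 1] if i + 1 < n else 0.0
        if value >= threshold and value > prev and value >= nxt:
            peaks.append(i + 1)
    return peaks
```

With an invented profile such as `[1, 1, 1, 1, 20, 3, 15, 2]`, the function reports cycles 5 and 7, the pattern expected for a peptide carrying two labeled residues.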
Identification of a methionine at position 12 (peptide L) was sufficient to distinguish between the open reading frame from coordinate 28.9 to 23.4 and that from coordinate 22.9 to 14.2, because a predicted tryptic peptide with a methionine at position 12 occurs only once, at coordinate 24.2. The amino acid sequence predicted from the open reading frame that contains this peptide is shown in Fig. 7, which also shows the positions of other methionines relative to tryptic cleavage sites (arginine and lysine). Table I summarizes the positions of methionines that were observed (from Fig. 4) and those expected from the two large open translational reading frames. We have identified peptides corresponding to those predicted in the region between coordinates 28.9 and 23.4, and we see no significant correlation with those predicted in the region between coordinates 22.9 and 14.2. We did not observe the predicted peptide with a methionine at position 17, or the additional predicted peptides with methionines at positions 3 and 11. Because these predicted peptides are large, they may not have eluted from the gel slice after digestion with trypsin. In addition, the predicted peptide with a methionine at position 17 (Fig. 7, line 2) contains two glutamines, which distort sequence analysis. Indeed, sequencing of all peptides that contain a glutamine before the methionine resulted in a high percentage of counts remaining in the spinning cup.
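The uniqueness argument for peptide L can be phrased as a search over an in silico tryptic digest. This sketch assumes complete cleavage after Lys or Arg except before Pro, a common simplification of trypsin specificity, and uses a toy sequence in place of the Fig. 7 sequence, which is not reproduced here. The function names are illustrative.

```python
# Sketch: predicted tryptic peptides carrying a given residue at a given position.
def tryptic_peptides(protein: str) -> list[tuple[int, str]]:
    """In silico trypsin digest: cut after K or R unless the next residue is P.

    Returns (0-based start offset, peptide) pairs; complete cleavage assumed.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, protein[start:]))
    return peptides


def peptides_with_residue_at(protein: str, residue: str,
                             position: int) -> list[tuple[int, str]]:
    """Predicted tryptic peptides carrying `residue` at 1-based `position`."""
    return [(start, pep) for start, pep in tryptic_peptides(protein)
            if len(pep) >= position and pep[position - 1] == residue]
```

A single hit, as for a methionine at position 12 in the text, places the sequenced peptide uniquely; running the same query against each candidate reading frame is what discriminates between them. The same function applies to the tyrosine positions obtained from iodinated peptides.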
Two predicted peptides, with methionines at positions 3 and 18 (Fig. 7, line 6), have multiple arginine residues at the trypsin cleavage site. Peptides with methionines at positions 3-5 (Fig. 4, D, E, and F) and 18-20 (Fig. 4, M, N, and O) may correspond to these predicted peptides as a result of partial trypsin cleavage at these sites. Two additional peptides that we sequenced cannot be explained by the amino acid sequence predicted from the region between coordinates 28.9 and 23.4. These peptides contain methionines at positions 2 (either G, H, I, K, or A4) and 15 (either M or O) and may derive from coding sequences not present in this region (see "Discussion"). Nevertheless, this analysis identifies the open reading frame in the DNA sequence between coordinates 28.9 and 23.4 as the one that encodes at least the majority of pTP.

FIG. 4. … Peptides A-O (Fig. 2) were sequenced, and the percentage of total counts/min applied is shown for each residue number. Each line on the vertical axis represents 5% of the total counts/min applied, and the percentage of counts/min remaining in the spinning cup is shown as a bar on the right.

The low number of counts obtained by 35S-labeling of pTP and its related proteins in vivo made it impractical to perform partial amino acid sequence analysis on 35S-labeled 62K and 55K proteins. Therefore, to determine which part of pTP gives rise to the 62K and 55K proteins, we determined the partial amino acid sequence of selected 125I-labeled peptides. This was complicated by two factors: mono- or diiodination of tyrosines caused peptides to chromatograph as two peaks on the reverse phase column, and only a fraction of the predicted tyrosines were easily labeled by this procedure.
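The partial-cleavage explanation given above for peptides D-F and M-O can be made concrete: when trypsin cuts at different residues of a run of tandem basic residues, one predicted peptide yields a methionine at several adjacent positions. The sketch assumes a hypothetical model in which trypsin cuts exactly once within the nearest upstream run of Lys/Arg (Lys/Arg followed by Pro treated as uncleavable); the sequence and the name `ragged_positions` are illustrative.

```python
# Hypothetical partial-cleavage model for a run of tandem basic residues.
def ragged_positions(protein: str, target_index: int) -> list[int]:
    """1-based positions of protein[target_index] within its tryptic peptide,
    one for each possible single cut inside the nearest upstream K/R run."""
    def cleavable(j: int) -> bool:
        # trypsin cuts after K/R unless the next residue is P
        return protein[j] in "KR" and protein[j + 1] != "P"

    j = target_index - 1
    while j >= 0 and not cleavable(j):
        j -= 1
    if j < 0:
        return [target_index + 1]  # no upstream cut: peptide starts at residue 1
    run = []
    while j >= 0 and cleavable(j):
        run.append(j)
        j -= 1
    return sorted(target_index - k for k in run)
```

For the toy sequence `"GGRRRAM"`, `ragged_positions("GGRRRAM", 6)` returns `[2, 3, 4]`: complete cleavage after the last Arg places Met at position 2, and each missed cut shifts it one position further from the NH2 terminus.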
Peak J1 and the related peptide J2 (Fig. 3) are present in all pTP-related proteins, and both contain a tyrosine at position 12 (Fig. 5). A peptide with a tyrosine at position 12 occurs only once in the predicted amino acid sequence (Fig. 7) and lies in the COOH-terminal half of the molecule. This peptide also contains a methionine at position 15 and is either peptide M or O (Fig. 4). Similarly, peak I and its related peptide eluting in fractions 138-145 are present in all three proteins, but only when the protein is removed from the DNA by cleavage with piperidine. Peptide I contains a tyrosine at position 17 (Fig. 5); only two such peptides are present in the predicted amino acid sequence, and both are near the COOH terminus of the protein. We have identified peptide I as the peptide covalently linked to the DNA (see below). Peptide D, which contains a tyrosine at position 8 (Fig. 5), is also present in all three proteins (Fig. 3) and corresponds to a unique predicted peptide that also lies near the COOH terminus of the protein. Peptide H, which is observed only in the 55K protein, contains a tyrosine at position 2 (Fig. 5). Although two such peptides are predicted in the amino acid sequence, only one lies within the COOH-terminal half of pTP (Fig. 7, line 8). This tyrosine is probably made accessible to iodination after amino acids are removed during maturation of the 87K protein to the 55K protein. These results identify the COOH-terminal end of pTP as the part of the protein from which the 62K and 55K proteins originate.

… Fig. 4, and the identity of each peptide is indicated in parentheses. These data do not include peak A (Fig. 3), which was obtained in variable amounts between experiments. These peptides contain two methionines at positions 5 and 7, 6 and 9, and 11 and 20, respectively.
We have also identified peptides that are present in the 87K and 62K proteins but absent from the 55K protein. Peptides B, C, and B/C (Fig. 3) contain tyrosines at positions 5, 6, and 5 and 6, respectively (Fig. 5). Similarly, peptides G and F contain tyrosines at positions 5 and 6 (Fig. 5) and may be the diiodinated counterparts of peptides B, C, or B/C. These peptides are present only in the 87K and 62K proteins, not in the 55K protein (Fig. 3). Predicted peptides containing tyrosines at positions 5 and 6 lie toward the NH2 terminus of the amino acid sequence of pTP, again indicating that the 55K protein is entirely contained within the COOH-terminal portion of pTP.
Finally, we have identified two peptides that are present only in the precursor 87K protein. Peptides A and E (Fig. 3) contain tyrosines at positions 2 and 1, respectively. Two predicted peptides contain a tyrosine at position 1, and both are near the NH2 terminus of the amino acid sequence. Also in this region is a short tryptic peptide that contains a tyrosine at position 2; it most likely corresponds to peptide A because of its early elution from the reverse phase column. Thus, sequence analysis of iodinated peptides has determined the relative positions of pTP and of the 62K and 55K proteins within the predicted amino acid sequence. These are indicated in Fig. 8.
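The presence and absence pattern reported above can be read as a nesting test: if the 55K TP lies within the 62K protein, which in turn lies within the COOH-terminal part of pTP, each peptide should appear in its innermost protein and in every larger one. The sketch encodes the peak assignments stated in the text (Fig. 3; peak B/C omitted); the strict nesting order is the assumption being tested, and the function names are illustrative.

```python
# Peak assignments as stated in the text (Fig. 3).
OBSERVED = {
    "A": {"87K"}, "E": {"87K"},
    "B": {"87K", "62K"}, "C": {"87K", "62K"},
    "F": {"87K", "62K"}, "G": {"87K", "62K"},
    "D": {"87K", "62K", "55K"}, "I": {"87K", "62K", "55K"},
    "J1": {"87K", "62K", "55K"}, "J2": {"87K", "62K", "55K"},
    "H": {"55K"},
}

# Assumed nesting, innermost (mature TP) to outermost (pTP).
NESTED = ["55K", "62K", "87K"]


def innermost_fragment(present: set[str]) -> str:
    """Smallest protein in which the peptide was observed."""
    for name in NESTED:
        if name in present:
            return name
    raise ValueError("peptide not observed in any protein")


def consistent_with_nesting(present: set[str]) -> bool:
    """True if the peptide also appears in every protein larger than its innermost one."""
    inner = NESTED.index(innermost_fragment(present))
    return all(name in present for name in NESTED[inner:])


if __name__ == "__main__":
    for peak, present in sorted(OBSERVED.items()):
        flag = "" if consistent_with_nesting(present) else "  <- needs another explanation"
        print(f"{peak}: innermost {innermost_fragment(present)}{flag}")
```

Only peak H fails the test; the text attributes it to a tyrosine that becomes accessible to iodination after cleavage, not to its position in the sequence.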
Identification of the Site of Covalent Attachment of Virus DNA to pTP. DNA-protein complex prepared from ad2 ts1 virions was denatured and then iodinated by the chloramine-T method. The DNA-protein complex was then digested with either trypsin or V8 protease, and the DNA was separated from free peptides by equilibrium centrifugation in CsCl gradients containing 4 M guanidine HCl. The peptides that remained bound to the DNA were labeled with 125I, indicating that a tyrosine was near the DNA attachment site. They were released by treatment with piperidine and then subjected to reverse phase chromatography (Fig. 6, A and B). The tryptic peptide eluted as a doublet (Fig. 6A) with approximately the same retention time as the peptide that was released from the DNA only by treatment with piperidine (Fig. 3). The peak eluting in fractions 139-145 (Fig. 6A) contained a peptide with the 125I label at position 17 (Fig. 6E). The other peak in Fig. 6A corresponded to peak I (Fig. 3), which also contained a tyrosine at position 17 (Fig. 5). Similarly, the peptide produced by digestion with V8 protease eluted as a doublet, and the first peak contained a peptide with the 125I label at position 3 (Fig. 6F). These results identified the peptide covalently attached to the virus DNA as a unique peptide in the predicted amino acid sequence (Fig. 7). This peptide also contained a methionine at position 12, which corresponds to peptide L (Figs. 2 and 4).
That the point of attachment of virion DNA was similar to the site of binding of dCMP in the pTP-dCMP complex was suggested by the following experiment. The terminal protein precursor, labeled with [α-32P]dCMP as described above (Fig. 1D), was purified by preparative SDS-polyacrylamide gel electrophoresis and then digested with either trypsin or V8 protease. The resulting peptides were then chromatographed on a reverse phase column (Fig. 6, C and D). The two peptides eluted in approximately the same relative position as the 125I-labeled peptides that were covalently attached to the DNA; the slight difference in elution is probably due to the remaining dCMP residue attached to the peptides, which is more evident for the shorter V8 protease peptide. This suggested that the DNA attachment site and the dCMP binding site were one and the same. Attempts, by automatic sequence analysis, to determine the position within each peptide of the serine residue that is linked to the dCMP residue via a phosphodiester bond (Challberg et al., 1980; Lichy et al., 1981; Challberg et al., 1982) were not successful, again probably because of the dCMP residue attached to the peptide. However, each peptide contains only one serine residue (Fig. 7).

FIG. 5. … Peptides A-J (Fig. 3) were sequenced, and the percentage of total counts/min applied is shown for each residue number. Each line on the vertical axis represents 5% of the total counts/min applied, and the percentage of counts/min remaining in the spinning cup is shown as a bar on the right. Peptides that were sequenced were from 87K-40 °C DNase (A, B, C, E, F, G, J); 62K-32 °C DNase …

FIG. 6. … separately and subjected to automatic sequential Edman degradation. The percentage yield of total counts/min applied for each residue number is shown in E and F, respectively. Each line on the vertical axis represents 5% of the total counts/min applied, and the percentage of counts/min remaining in the spinning cup is shown as a bar on the right. The counts/min applied were 8,972 (E) and 5,684 (F). In F, the yield at residue 3 was 68%.
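The logic used to pinpoint the attachment site (aligning a tryptic peptide labeled at a tyrosine in position 17 with a V8 peptide labeled at a tyrosine in position 3, then counting serines in their overlap) can be sketched as follows. Assumptions: V8 protease is modeled as cutting only after Glu (it can also cut after Asp in some buffers), trypsin as cutting after Lys/Arg except before Pro, and the protein is a hypothetical toy sequence; `attachment_candidates` is an illustrative helper.

```python
# Sketch: intersect a tryptic and a V8 peptide that share one labeled Tyr.
def digest(protein: str, cut_after: str,
           skip_before_pro: bool = False) -> list[tuple[int, int]]:
    """Return (start, end) spans, end exclusive, cutting C-terminal to any
    residue in `cut_after` (optionally not when the next residue is Pro)."""
    spans, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in cut_after and not (skip_before_pro and nxt == "P"):
            spans.append((start, i + 1))
            start = i + 1
    if start < len(protein):
        spans.append((start, len(protein)))
    return spans


def span_containing(spans: list[tuple[int, int]], index: int) -> tuple[int, int]:
    for start, end in spans:
        if start <= index < end:
            return start, end
    raise IndexError(index)


def attachment_candidates(protein: str, tryptic_pos: int = 17,
                          v8_pos: int = 3) -> list[tuple[int, list[int]]]:
    """Tyr indices labeled at `tryptic_pos` in a tryptic peptide and at
    `v8_pos` in a V8 peptide, with serine indices in the shared overlap."""
    tryptic = digest(protein, "KR", skip_before_pro=True)
    v8 = digest(protein, "E")  # assumption: Glu-only V8 specificity
    hits = []
    for y, aa in enumerate(protein):
        if aa != "Y":
            continue
        t_start, t_end = span_containing(tryptic, y)
        v_start, v_end = span_containing(v8, y)
        if y - t_start + 1 == tryptic_pos and y - v_start + 1 == v8_pos:
            lo, hi = max(t_start, v_start), min(t_end, v_end)
            hits.append((y, [i for i in range(lo, hi) if protein[i] == "S"]))
    return hits
```

One hit with exactly one serine in the overlap is the situation described in the text, where the single serine is then the only candidate for the phosphodiester linkage.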

DISCUSSION
The adenovirus terminal protein has been the object of considerable interest because of its novel function in the initiation of virus DNA replication. The precursor form of the protein was shown to be encoded by early region E2B of the genome (Stillman et al., 1981), which has a large coding capacity between coordinates 11 and 30. The precursor protein was shown to be structurally related to the virion terminal protein (Challberg et al., 1980; Stillman et al., 1981) and is present in replication extracts from virus-infected cells, on the termini of ad2 ts1 virion DNA, and on the termini of intracellular replicating virus DNA. A protein of 62,000 daltons is also associated with the virion DNA of the ad2 ts1 mutant, and we have demonstrated that it is also structurally related to both TP and pTP. This protein may therefore represent an intermediate cleavage product of pTP produced during maturation of the virion. Since the gene encoding pTP was mapped on the virus genome, the DNA sequence of the E2B region has been determined for both ad2 (Gingeras et al., 1982; Alestrom et al., 1982) and ad7.² Both DNA sequences show extensive regions of homology in the large open reading frames on the l-strand DNA. Indeed, Green et al. (1979) and Rekosh (1981) have shown a high degree of conservation among the terminal proteins of a number of adenovirus serotypes. We have obtained partial amino acid sequence of peptides from all three TP-related proteins of ad2 and have aligned these sequences to the amino acid sequence predicted from the DNA sequence. These data were sufficient to distinguish among the three large open translational reading frames in the E2B region. The gene encoding pTP maps to the right-hand end of early region E2B, between coordinates 28.9 and 23.4 (Fig. 8). This open reading frame would produce a 74,709-dalton protein if translation were to begin at the first methionine residue in the sequence (however, see below).
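A calculated molecular weight such as the 74,709-dalton figure is the sum of residue masses over the translated open reading frame plus one water for the free termini. The sketch uses standard average residue masses; because the Fig. 7 sequence is not reproduced here, the checks use short peptides, and a mass table other than the one used in 1982 may give a slightly different total.

```python
# Standard average residue masses in daltons (free amino acid minus H2O).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free NH2- and COOH-termini


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
```

Applied to an open reading frame translated from its first methionine, this gives the kind of estimate quoted above. It ignores any NH2-terminal blocking group and any sequence added by splicing upstream of that methionine.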
Thus, it is most likely that pTP is translated from the largest E2B mRNA, which has leaders at coordinates 75, 68, and 39 and a main body between approximate coordinates 30 and 11 (Fig. 8) (Stillman et al., 1981). Both the 62K and 55K proteins derive from the COOH-terminal end of pTP, and the exact locations of the proteolytic cleavage sites are currently being determined. However, we note that a predicted amino acid sequence within pTP (Asp-Met-Thr-Gly-Gly-Val-Phe; Fig. 7) has striking similarity to the proteolytic cleavage site used for maturation of protein pVI to virion protein VI (Asn-Met-Ser-Gly-Gly-Ala-Phe; Akusjarvi and Persson, 1981). This latter cleavage is also affected by the ad2 ts1 mutation (Weber, 1976). Another region of interest is a sequence that contains repeated amino acids in tandem (Arg,-Val-Pro~-Glua-Gly-Glu-Ala-Leu-Met-Glu~-Ile-Glu4; Fig. 7), which has unusual charge and conformational properties. Because this sequence is near the expected NH2 terminus of the 55K TP, it may separate two functional domains in the precursor protein. This is also a site of DNA sequence heterogeneity between different ad2 DNA molecules (Gingeras et al., 1982).

² J. A. Engler, personal communication.
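The comparison between the putative pTP cleavage site and the pVI cleavage site discussed above can be made explicit by counting identical and similar positions. The similarity groups below are a hypothetical, coarse classification chosen for illustration, not a substitution matrix used by the authors.

```python
# Hypothetical coarse similarity groups covering all 20 standard residues.
SIMILARITY_GROUPS = ["DENQ", "ST", "AVLI", "FYW", "KRH", "G", "P", "M", "C"]


def group_of(aa: str) -> str:
    for group in SIMILARITY_GROUPS:
        if aa in group:
            return group
    raise ValueError(f"unknown residue {aa!r}")


def compare(a: str, b: str) -> tuple[int, int]:
    """Return (identical, similar-but-not-identical) counts for equal-length motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(x != y and group_of(x) == group_of(y) for x, y in zip(a, b))
    return identical, similar


# pTP: Asp-Met-Thr-Gly-Gly-Val-Phe; pVI: Asn-Met-Ser-Gly-Gly-Ala-Phe
print(compare("DMTGGVF", "NMSGGAF"))
```

For the two heptapeptides quoted in the text this gives four identities (Met, Gly, Gly, Phe) and three conservative substitutions (Asp/Asn, Thr/Ser, Val/Ala).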
This study does not define the COOH terminus of pTP, owing to the lack of methionine and tyrosine residues in this region; however, we have excluded the long open translational reading frame to the left of coordinate 22.9 as part of pTP. Similarly, the nature of the NH2 terminus of pTP remains unclear and is not clarified by the DNA sequence of the region between coordinates 30 and 11. Attempts to sequence directly the NH2 terminus of [35S]methionine-labeled pTP synthesized by translation of selected mRNA have not been successful, which may indicate that the NH2 terminus is blocked. The first AUG in frame with the pTP coding sequence occurs 16 codons from the upstream terminator, and the amino acids encoded between this terminator codon and the AUG are highly conserved in ad2 and ad7. This terminator codon also corresponds to the point where the homology between the ad2 and ad7 DNA sequences diverges² and to the site of a viable 2-base deletion in ad5 dl309 DNA (Thimmappaya et al., 1979). These factors limit how far the coding region can extend toward the right end of the virus genome (Fig. 8). We consider it likely that a small amount of coding region from the leader at coordinate 39 is joined by RNA splicing to the long open reading frame discussed here. Indeed, a consensus splice acceptor site overlaps with the terminator codon and the XbaI site at coordinate 28.9 (Gingeras et al., 1982; Alestrom et al., 1982). We are currently making cDNAs across this splice junction to determine whether the open reading frame in the mRNA extends beyond the sequence at coordinate 28.9.

FIG. 8. … (Gingeras et al., 1982). The bar above the line indicates the region to which the N-group mutants of ad5 have been mapped. The E2B mRNAs (Stillman et al., 1981) are shown below the genome map. The bottom shows the region encoding pTP and the 62K and 55K terminal proteins; the dashed line indicates the regions where the proteolytic cleavage sites must occur. The site of covalent linkage of virus DNA to the protein and a possible cleavage site are also shown.
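Locating the first in-frame AUG relative to the upstream terminator, as discussed above for the NH2 terminus of pTP, is a codon-by-codon walk. The sketch assumes the input is the mRNA-sense strand (for the l-strand genes, the reverse complement of the r-strand) and that `inside` is a frame-aligned index known to lie within the open reading frame; the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_atg_after_upstream_stop(seq: str, inside: int) -> tuple[int, int, int]:
    """Return (stop_index, atg_index, codons) for the ORF containing `inside`.

    Walks upstream in frame to the nearest stop codon, then downstream to the
    first in-frame ATG; `codons` counts from the stop (codon 0) to that ATG.
    """
    seq = seq.upper()
    i = inside
    while i >= 0 and seq[i:i + 3] not in STOP_CODONS:
        i -= 3
    if i < 0:
        raise ValueError("no in-frame stop codon upstream in this sequence")
    j = i + 3
    while j + 3 <= len(seq) and seq[j:j + 3] != "ATG":
        j += 3
    if j + 3 > len(seq):
        raise ValueError("no in-frame ATG downstream of the stop codon")
    return i, j, (j - i) // 3
```

For the toy sequence `"TAG" + "GCA" * 3 + "ATG" + "GCT" * 2`, starting from index 18 returns `(0, 12, 4)`: the ATG is the fourth codon after the terminator. Codons lying between the two, like the conserved stretch described in the text, are the ones that could only be translated if splicing supplied an upstream start.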
A detailed knowledge of the amino acid sequence of pTP has enabled us to begin defining functional sites within the protein. We first chose to determine the site of covalent attachment of virus DNA, which must lie within the sequence common to pTP, the 62K protein, and TP. Peptides produced by either trypsin or V8 protease cleavage of 125I-labeled pTP that remained covalently bound to the DNA were purified and sequenced to determine the position of the labeled tyrosine. We were able to identify a unique sequence within the predicted amino acid sequence that was covalently bound to the DNA, and each overlapping peptide contained only one serine residue (Fig. 7). Desiderio and Kelly (1980) and Challberg et al. (1980) have shown that the covalent linkage between DNA and both TP and pTP is a phosphodiester bond between a serine residue in the protein and the 5'-hydroxyl of the terminal dCMP residue. We therefore conclude that the single serine residue within the peptides covalently attached to DNA is the site within TP and pTP of covalent linkage to DNA. This conclusion is supported by the similar elution on a reverse phase column of peptides produced by either trypsin or V8 protease hydrolysis of pTP that had covalently bound [α-32P]dCMP in a cell-free extract prepared from adenovirus-infected cells. The predicted amino acid sequence surrounding the site of covalent attachment of virus DNA is perfectly conserved between ad2 and ad7,² suggesting that this region of the protein plays an important enzymatic role in the initiation of DNA replication.
An unexpected result of mapping the coding region for pTP was that it did not overlap with the region of the genome to which the N-complementation group mutants had been mapped by marker rescue experiments (coordinates 18-22.5; Galos et al., 1979). These mutants, including ad5 ts36 and ts149, fail to synthesize virus DNA at the restrictive temperature (Wilkie et al., 1973; Ginsberg et al., 1974; van der Vliet and Sussenbach, 1975) and also fail to transform rodent cells at this temperature. Thus, a virus-encoded protein in addition to the 72K single-strand DNA-binding protein and pTP must function in the replication of virus DNA. It is most likely that this protein is encoded by the large open reading frame in E2B (coordinates 24-14.2) and that it corresponds to the 140K protein that co-purifies with pTP (Enomoto et al., 1981). The purified 140K/pTP proteins were able to complement, in vitro, inactive extracts prepared from ad5 ts149-infected cells.³