Nucleotide sequence of the genes for the simian virus 40 proteins VP2 and VP3.

We have determined the nucleotide sequence of the DNA of simian virus 40. The preceding report (Dhar, R., Reddy, V. B., and Weissman, S. M. (1978) J. Biol. Chem. 253, 612-620) presents the sequence of a portion of the simian virus 40 DNA that overlaps the region encoding the 5' end of the minor structural protein VP2. We report here the sequence of the remainder of the genes for the minor structural proteins VP2 and VP3. The results indicate that the mRNA for the two proteins is read in the same phase and the initiation site for VP3 lies within the structural gene of VP2. The codons of the COOH-terminal amino acids of VP2 and VP3 are read in a second phase as the codons of the NH2-terminal amino acids of VP1.

See the preceding paper (1) for the background of these studies.

METHODS
These have been described in the preceding publication (1).

RESULTS
The restriction endonuclease cleavage pattern of this region of SV40 DNA is presented in the accompanying publication (1). Part of the nucleotide sequence of the fragment Hae I is shown in Fig. 1 and [...] site strands of each region of the DNA. In addition, the predicted patterns of T1 oligonucleotides were compared with those obtained either from transcripts of the fragment prepared in vitro or from digestion of labeled viral mRNA complementary to the fragment. Such patterns were obtained from all Hae II, Hae III, and Alu I cleavage sites within this segment of DNA.

[Figure legend, beginning lost] ... times of electrophoresis. The A, AG, CT, and C above the columns refer to products generated by selective cleavage of the DNA at the bases A, A and G, C and T, or C. Letters to the right of the autoradiogram indicate successive residues in the sequence. a, label at end of fragment nearest Hae III-J; b, label at end of fragment nearest Hae III-I.
Fiers et al. have determined the sequence of approximately 150 nucleotides of HindII,III-K nearest HindII,III-E (4). We have confirmed this sequence by analysis of limited venom diesterase products and limited chemical degradation maps from the Hae III, Alu I, and Eco RII restriction sites within HindII,III-K.
Further, we have analyzed tryptic digests of VP1 and found all peptides corresponding to the sequence of this part of Hind-K.² The sequence of the region of SV40 DNA through the entire gene for VP2 and VP3 and the codons for the NH2-terminal amino acids of VP1 is shown in Fig. 13.
Several preparations of radioactive RNA were obtained from cells that had been infected for 36 to 48 h with SV40 virus. This RNA was extracted from disrupted cells after the nuclei had been removed by centrifugation and was further fractionated on oligo(dT)-cellulose. The retained RNA was annealed to restriction fragments Hae III-I (Fig. 1) or Eco RII-E, eluted, and digested with T1 RNase. The products of T1 RNase digestion of the mRNA were further analyzed by digestion with pancreatic RNase. The oligonucleotides obtained were always those predicted from the sequences of the fragment, even though one end of this fragment lies within about 40 nucleotides of the initiation codon for VP1 and the 16 S late mRNA that directs the synthesis of VP1 has been reported to have its 5' end near the initiation codon site (5).

DISCUSSION
The nucleotide sequence of this region of SV40 DNA (Fig. 13) shows that the late strand RNA transcript contains an AUG triplet beginning within the HindII,III-D fragment, 41 nucleotides from the HindII,III-L-HindII,III-D junction, and that this AUG is followed in phase by 352 triplets extending across the remainder of the Hind-D fragment, all of the HindII,III-E fragment, and into the HindII,III-K fragment, terminated by a UAA codon in the Hind-K fragment, 126 nucleotides from the Hind-E-Hind-K junction. No other reading frame of the late strand transcript of HindII,III-D or HindII,III-E can be translated into a large peptide. The SV40 fragments HindII,III-D and HindII,III-E include the regions where temperature-sensitive mutants of Group D have been mapped by marker rescue experiments and complementation studies with deletion mutants of SV40 (6-8). The 5' end of large, late mRNA has been located by electron microscopy and by nucleic acid sequence studies near the HindII,III-C-HindII,III-D junction (5,9), probably between 30 and 10 nucleotides upstream from the AUG initiating the long run of in-phase sense codons. The large, late RNA directs the synthesis of VP2 in cell-free protein-synthesizing systems (10). With polyoma, a revertant of a temperature-sensitive mutant virus simultaneously changes a peptide in VP2 and the same peptide in VP3.³ The evidence in aggregate indicates almost certainly that this region of DNA includes the codons for VP2 and VP3.

FIG. 10. Products of limited chemical degradation of the DNA fragment Alu-P labeled at the end nearest Hae III-I. Labeling of the autoradiogram is as in Fig. 8. a, short electrophoretic run; b, long electrophoretic run.
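The counting of in-phase triplets described above can be illustrated with a minimal sketch. The function name `count_in_phase_codons` and the toy sequence are hypothetical and are not taken from the SV40 data; the sketch only assumes the three standard termination codons (UAA, UAG, UGA).

```python
# Minimal sketch: count the sense codons that follow an AUG, in phase,
# up to the first termination codon. Names and test sequence are illustrative.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def count_in_phase_codons(rna, start):
    """Return (number of sense codons after the AUG at `start`, stop codon found).

    The stop codon is None if the sequence ends before an in-phase
    termination codon is reached.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    if rna[start:start + 3] != "AUG":
        raise ValueError("no AUG at the given start position")
    count = 0
    # Step through complete triplets only, beginning just after the AUG.
    for i in range(start + 3, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            return count, codon
        count += 1
    return count, None


if __name__ == "__main__":
    # Toy transcript: AUG GCU CCA UAA, preceded and followed by two residues.
    print(count_in_phase_codons("GGAUGGCUCCAUAAGG", 2))
```

Applied to the late strand transcript, the same procedure run from each AUG in each of the three phases would show which phase, if any, remains open long enough to encode a large peptide.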
The sequence is consistent with the suggestion that VP3 is read from DNA that also encodes a portion of VP2 and is translated in the same phase as VP2. The estimated molecular weight of VP2 from sodium dodecyl sulfate-gel electrophoresis is 37,000 to 39,000. The maximum number of amino acid residues expected from the DNA sequence is 352, so that the molecular weight by sodium dodecyl sulfate gel estimation is just slightly larger than that predicted. Both VP2 and VP3 synthesized in vivo are reported to have blocked NH2 termini,⁴ so that it is not known whether they retain the terminal methionine.

³ Gibson, W., Hunter, T., Cogen, B., and Eckhart, W. (1977) J. Virol., in press.
In cells infected with polyoma virus, pulse label experiments and studies with inhibitors of proteolysis have not given any evidence that VP3 is derived by cleavage of VP2.⁵ With the related virus, polyoma, there is some evidence that polyoma VP3 may be translated from an 18 S message somewhat smaller than the 19 S message which directs the synthesis of VP2.⁶ One hundred and eighteen codons in phase downstream from the AUG that is the probable initiator for VP2 synthesis is the next methionine codon to appear within the sequence. It appears quite possible that this codon actually acts as initiator codon for the synthesis of VP3, perhaps from a shorter messenger RNA. Following this methionine, there are 234 codons in phase. The estimated molecular weight of VP3 from sodium dodecyl sulfate-gel electrophoresis is 27,000, so that there is fairly good agreement between the predicted amino acid sequence from the DNA data and the estimated molecular weight of the protein.

FIG. 11. Products of limited chemical degradation of the DNA fragment Hae III-I labeled at the end nearest HindII,III-K. The autoradiograph is labeled as in Fig. 8. a, long electrophoretic run; b, short electrophoretic run.
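The arithmetic behind this comparison can be sketched as follows. The average residue mass of 110 daltons is a common rule-of-thumb value assumed here for illustration; it is not a figure given in the text, and the variable names are hypothetical.

```python
# Rough check of the codon counts and predicted molecular weights.
# AVG_RESIDUE_MASS is an assumed rule-of-thumb value, not taken from the paper.
AVG_RESIDUE_MASS = 110.0  # daltons per amino acid residue (assumption)


def estimated_mass(n_residues, avg=AVG_RESIDUE_MASS):
    """Approximate polypeptide molecular weight from its residue count."""
    return n_residues * avg


vp2_codons = 352   # in-phase triplets counted for VP2 (from the text)
vp3_offset = 118   # codons from the VP2 initiator to the next methionine codon
vp3_codons = vp2_codons - vp3_offset  # codons remaining for VP3

if __name__ == "__main__":
    print("VP3 codons:", vp3_codons)
    print("VP2 estimate:", estimated_mass(vp2_codons))
    print("VP3 estimate:", estimated_mass(vp3_codons))
```

Under this assumption the 352 codons give roughly 38,700 and the 234 codons roughly 25,700, which can be set against the gel estimates of 37,000 to 39,000 and 27,000, respectively.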
The amino acid analysis of gel-purified VP3 has been reported. Comparison of the reported amino acid composition (11) with that calculated from the codons that we believe direct the synthesis of VP3 shows some similarities and differences. Direct analyses of the protein showed no cystine in VP3. The nucleotide sequence also shows no cystine codons in either VP2 or VP3. The most striking discrepancy between predicted and observed amino acid content for VP3 is a reported value of 18% for glycine residues, in contrast with our estimate of approximately 7%. The valine content estimated at approximately 5% is substantially lower than the reported value of 6.9%. A complete comparison is shown in Table III. Presumably, at least some of the residual differences are due to impurities in the VP3 isolated from sodium dodecyl sulfate gels, although the possibility of genetic variation between virus strains or even the presence of other material associated with VP3 cannot be ruled out.

Although no function is known for VP2 during infection or in virus structure, the existence of regions of the genome coding only for VP2 indicates that it plays some specific role. The NH2-terminal segment of VP2, outside of VP3, has few basic amino acid residues and the content of hydrophobic amino acids is nearly as high as that in the NH2-terminal peptides of proteins thought to be associated with cell membranes (12). VP2 may be involved in virus-membrane interaction during virus assembly or cell penetration. The COOH-terminus of VP3 has a cluster of 5 and another of 6 basic amino acids, somewhat reminiscent of the clusters of basic amino acids in histones, and these regions of the proteins might make contact with DNA, perhaps substituting for histone H1.

FIG. 12. Products of limited chemical degradation of the fragment HindII,III-E labeled at the end nearest HindII,III-D. The autoradiogram is labeled as in Fig. 8. a, short electrophoretic run; b, long electrophoretic run.
The overlap of the codons for the 3' end of VP2 and VP3 with the codons that are read in another phase as part of the VP1 gene is the first example in which interpenetrating genes function in eukaryotic cells. It is much less extensive than the overlapping of genes recently described by Sanger et al. for φX174 (13). The total size of the genome of SV40 is very similar to that of φX174, and SV40 consumes a larger portion of its sequence information in nontranslated regions than does the bacteriophage.
It is therefore not unexpected that mechanisms for maximization of utilization of genetic information do occur. Production of overlapping genes translated in the same phase is an alternative to having interpenetrating genes in which the codons are translated in two different phases. The structures around the 5' ends of the VP3 message are of some interest. Beginning within about 80 nucleotides upstream from the AUG initiation codon for VP3 is the sequence UUUUUUA in the VP2 mRNA. The sequence UUUUUU purine has been recognized as a part of a common termination signal for a variety of prokaryotic transcripts (14). It also occurs as one of the potential termination signals for the VA-RNA synthesized in adenovirus-infected human cells (15). The possibility that this or similar sequences could function as transcription termination sites in SV40 has been discussed elsewhere (16). If this sequence did act as a transcription termination site, there might be a mechanism by which transcription arrest shortly upstream from the 5' end of the VP3 coding region would permit the formation of an mRNA whose 5' end lay near the coding region for VP3 and which would code for VP3 but not VP2. Preceding the putative AUG initiation codon for VP3 is a sequence that is relatively rich in uridylic acid. A uridylic acid-rich stretch also occurs preceding the initiator codon for VP2.
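A search for the UUUUUU purine signal upstream of an initiation codon can be sketched as below. The function names, the default 80-nucleotide window, and the toy sequence are illustrative assumptions; only Python's standard `re` module is used.

```python
# Minimal sketch: locate occurrences of UUUUUU followed by a purine (A or G)
# and report those lying within a window upstream of a given AUG.
import re

# A lookahead is used so that overlapping occurrences are all reported.
U6_PURINE = re.compile(r"(?=(U{6}[AG]))")


def find_u6_purine(rna):
    """Return the 0-based start positions of every UUUUUU-purine motif."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [m.start() for m in U6_PURINE.finditer(rna)]


def motifs_upstream(rna, aug_pos, window=80):
    """Return (position, distance to AUG) for motifs starting within
    `window` nucleotides upstream of the AUG at `aug_pos`."""
    lo = max(0, aug_pos - window)
    return [(p, aug_pos - p) for p in find_u6_purine(rna) if lo <= p < aug_pos]


if __name__ == "__main__":
    toy = "CCUUUUUUACGGAUGCC"  # illustrative sequence, not SV40 data
    print(motifs_upstream(toy, toy.index("AUG")))
```

Such a scan only identifies candidate sites; whether any of them acts as a termination or processing signal is a separate experimental question.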
In the transcription of λ phage late mRNA a constitutive promoter leads to production of a low molecular weight RNA, the 6 S RNA. Transcription terminates at the end of this RNA in vivo and in vitro in the absence of Q gene product. In the presence of Q gene product it is elongated to make late mRNA (17). Very near or exactly at the site of transcription termination is an RNase III cleavage site, and the sequence UUUAU found at the end of the 6 S RNA is also the sequence found at the 3' end of RNase III cleavage sites in T7 mRNA (18,19). Therefore, precedents exist in prokaryotes for overlap between transcription termination sites and sites at which RNA cleavage occurs during mRNA processing. Available data do not permit one to choose among the possibilities that one or both of these alternatives or a more complex process might generate the VP2 and VP3 mRNA. The availability of large amounts of SV40 DNA of known structure should provide a favorable system for probing fidelity of RNA transcription or processing with animal cell enzymes. If this or a similar interpretation of the sequence is correct, there is probably a mechanism influencing the relative amounts of VP2 and VP3 made during the later phases of infection.
Very recently the nucleotide sequences have become available for human (20) and rabbit (21) β-globin mRNA and partial sequences for human α-globin mRNA. These sequences show a bias in the utilization of synonym codons. For example, UUG is not used at all in the β-chain mRNA and either rarely or not at all in the α-chain mRNA. CUG is by far the most abundant codon for leucine and GUG is the most abundant codon for valine. There are rather similar amounts of cytidylic acid and uridylic acid in the third position of codons for the β- and α-globin mRNA. In contrast, VP2 and VP3 mRNA appear to use almost all codons except certain codons containing the dinucleotide CG. This dinucleotide is known to be uncommon both in animal cell DNA and in SV40 viral DNA. The codon usage for VP2 is summarized in Table IV. Unlike the hemoglobin mRNA, the most common codons for leucine are the UU purine codons. Except for the region of overlap of VP2 and VP1 genes, there is a 2½-fold excess of uridylic acids, as compared with cytidylic acids, in the third position of codons. This is reminiscent of the excess of uridylic acids noted in φX174 and also in the segment of the SV40 early mRNA whose sequence has been reported by Volckaert et al. (22). The reasons for such selection are unknown.
They could reside in requirements of the structure of DNA itself, in preference for the GU wobble pair in the third position of codons as compared with the GC pair, or in preference for tRNAs which contain adenylic acid or modified uridylic acid in the first position of the anticodon and which would pair preferentially with codons having uridylic acid in the third position. The results indicate that the pattern of codon utilization in hemoglobin mRNA is not a general feature of animal cell mRNA.
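A tabulation of codon usage and third-position base composition of the kind discussed here can be sketched as follows. The function names and the toy sequence are hypothetical; the sketch simply counts every complete in-phase triplet, including any termination codon, and does not reproduce the data of Table IV.

```python
# Minimal sketch: tabulate codon usage for an in-phase coding sequence and
# summarize which base occupies the third position of the codons.
from collections import Counter


def codon_usage(rna, start=0):
    """Count each complete in-phase triplet from `start` to the end."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    end = len(rna) - (len(rna) - start) % 3  # drop any incomplete final triplet
    return Counter(rna[i:i + 3] for i in range(start, end, 3))


def third_position_counts(usage):
    """Sum codon counts by the base found in the third position."""
    counts = Counter()
    for codon, n in usage.items():
        counts[codon[2]] += n
    return counts


if __name__ == "__main__":
    toy = "AUGUUAUUGCUGGUUGCUUAA"  # illustrative sequence, not SV40 data
    usage = codon_usage(toy)
    print(usage)
    print(third_position_counts(usage))
```

Comparing the counts for U and C in the third position between two messages, for example a globin mRNA and a viral late mRNA, gives the kind of contrast in synonym codon utilization described in the text.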