Rubella Virus cDNA SEQUENCE AND EXPRESSION OF E1 ENVELOPE PROTEIN*

A cDNA clone encoding the entire E1 envelope protein (410 amino acid residues) and a portion of the C-terminal end of the E2 envelope protein of rubella virus has been isolated and characterized. DNA sequence analysis has revealed a region 20 nucleotides in length at the 3' end of the cloned cDNA which may be a replicase recognition site or a recognition site for encapsidation. The proteolytic cleavage site between the E1 and E2 proteins was localized based on the known amino-terminal sequence of the isolated E1 protein (Kalkkinen, N., Oker-Blom, C., and Pettersson, R. F. (1984) J. Gen. Virol. 65, 1549-1557) and the deduced amino acid sequence. The mature E1 protein is preceded by a set of 20 highly hydrophobic amino acid residues possessing characteristics of a signal peptide. This "signal peptide" is flanked on both sides by typical protease cleavage sites, one for a trypsin-like enzyme and one for signal peptidase. The presence of a leader sequence in the E1 protein precursor may facilitate its translocation through the host cell membrane.
The E1 protein of rubella virus shows no significant homology with alphavirus E1 envelope proteins. However, a stretch of 39 amino acids in the E1 protein of rubella virus (residues 262-300) was found to share significant homology with the first 39 residues of bovine sperm histone. The positions of 4 half-cystines and 8 arginines overlap.
The E1 protein of rubella virus has been successfully expressed in COS cells after transfecting them with rubella virus cDNA in a simian virus 40-derived expression vector. This protein is antigenically similar to the one expressed by cells infected with rubella virus.
Rubella was first described in Germany in the 18th century, and thus the name German measles was coined. Rubella in children and young adults is characterized by rash and mild fever. During early pregnancy, however, rubella virus infection can cause fetal death or multisystem birth defects including deafness, cataracts, mental retardation, and congenital heart disease (1)(2)(3)(4)(5)(6). An essential breakthrough in the development of a rubella vaccine was achieved in 1962 with the isolation of the rubella virus in cell culture by Parkman et al. (7) and by Weller and Neva (8). A worldwide epidemic of rubella in 1963-1965 prompted the development of an effective vaccine in 1966 by Parkman et al. (9) and other groups (10, 11). As a result of vaccination, there has been a dramatic reduction in the incidence of rubella and congenital rubella in children of early age groups; however, the majority of rubella cases are now found in older adolescents and young adults.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) 502620.

The structure of the virus and the mechanism by which the virus infects the cell have been the subject of many studies (12, 13). Several groups of investigators have tried to elucidate the structure of rubella virus by characterizing the viral proteins (14)(15)(16)(17)(18). The general consensus reached by these groups is that the virus consists of a nucleocapsid protein (Mr = 30,000) and three envelope glycoproteins, E1 (Mr = 58,000), E2a (Mr = 47,000), and E2b (Mr = 42,000) (17, 18). However, tryptic peptide analysis has indicated that E2a and E2b are closely related and may represent two differently glycosylated forms of the same polypeptide (17). It has been shown that the genome of rubella virus consists of a 40 S single-stranded polyadenylated RNA, which is infectious (19)(20)(21)(22). In infected cells, a subgenomic 24 S RNA derived from the 3' end of the 40 S RNA has been identified (22). This 24 S RNA encodes a polypeptide of Mr = 110,000 which is thought to be the precursor for the envelope and capsid proteins (22). As part of our plan to investigate the mechanism of viral infection of host cells and to devise possible virus-specific therapeutic agents, we have undertaken the isolation and characterization of the rubella virus genome. In this report, we summarize our findings on the structure of a cDNA clone encoding the E1 protein of rubella virus.
In vitro translation of poly(A)+ RNA extracted from virus-infected cells in the rabbit reticulocyte system showed a polypeptide of 110 kDa precipitable with anti-rubella virus antibodies. Poly(A)+ RNA prepared from uninfected cells failed to direct the synthesis of this protein (data not shown). These experiments suggest that infection of Vero 76 cells with rubella virus induces the production of a protein of 110 kDa which presumably could be post-translationally processed to yield the 58-, 42-, and 30-kDa proteins.
Cloning of Rubella Virus mRNA - cDNA libraries prepared from virus-infected cultures were grown on duplicate filters and screened for rubella virus mRNA sequences using cDNAs synthesized from total poly(A)+ RNA of either infected or uninfected cells. One set of filters was hybridized to α-32P-labeled cDNA synthesized from infected cell total poly(A)+ RNA, and the second set was hybridized to α-32P-labeled cDNA synthesized from control cell total poly(A)+ RNA. Approximately 110 cDNA clones were obtained which hybridized preferentially to the infected-cell cDNA probe. The cDNA clone with the longest insert, 1889 bp (pRvl15), was isolated. To confirm that the pRvl15 cDNA clone is complementary to rubella virus RNA, a 32P-labeled PstI insert from clone pRvl15 was hybridized to total and poly(A)+ RNAs isolated from uninfected cells, infected cells, and viral particles. Fig. 1 shows the Northern blot analysis of such RNAs. Lanes 1 and 2, from uninfected cells, show no hybridization. Lanes 3 and 4 are from infected cells and show three hybridizable bands of 10.2, 3.5, and 1.7 kb. Lane 5 is RNA isolated from viral particles and shows one hybridizable band of 10.2 kb. The sizes of the RNAs identified are in agreement with the published reports for rubella virus genomic and subgenomic RNAs except for the additional 1.7-kb RNA (22). The additional 1.7-kb RNA could be an as yet unidentified subgenomic RNA species or a defective interfering RNA.
Nucleotide Sequence of pRvl15 DNA - The restriction map and the sequencing strategy of pRvl15 are described in Fig. 2. The nucleotide sequence was also determined from another cDNA clone (pRv20) which is 1500 bp long and has only one PstI insert as compared to the two PstI inserts found in pRvl15. The nucleotide sequence was determined in both directions by the method of Maxam and Gilbert (29). The complete nucleotide sequence of pRvl15 is given in Fig. 3. The cDNA clone is 1822 bp long and has a poly(A) tail of 17 residues at the 3' end and a C-tail of 25 residues on each end. One long open reading frame of 522 amino acid residues was revealed with two in-phase termination codons. The first termination codon is followed by a 253-nucleotide-long 3'-noncoding region. No putative polyadenylation signal sequence was found upstream of the polyadenylation site. Comparison of the deduced amino acid sequence with the sequences in the protein data bank showed that a stretch of 39 amino acids between residues 262 and 300 is similar to bovine sperm histone (Fig. 4).
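The lengths quoted above can be cross-checked with simple bookkeeping, and the idea of an open reading frame can be made concrete with a short scan. What follows is a minimal Python sketch and not the procedure used to analyze the clone. The function names and the toy sequence are hypothetical. The bookkeeping rests on an assumption: that the 1822-bp figure is the coding region plus one termination codon plus the 3'-noncoding region, and that the 1889-bp insert additionally counts the poly(A) tail and both C-tails.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_frame(dna):
    """Return (frame, start, end) of the longest stop-free run of codons.

    Offsets are 0-based nucleotide positions, end exclusive. The stop codon
    that closes a run is not included. Only the three forward frames are
    scanned, and no initiator ATG is required, because a partial cDNA may be
    open at its 5' end.
    """
    dna = dna.upper()
    best = (0, 0, 0)
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] in STOP_CODONS:
                if pos - run_start > best[2] - best[1]:
                    best = (frame, run_start, pos)
                run_start = pos + 3
            pos += 3
        # A run may also reach the end of the sequence without a stop codon.
        if pos - run_start > best[2] - best[1]:
            best = (frame, run_start, pos)
    return best


def insert_length(orf_codons, stop_nt=3, utr3_nt=253, poly_a=17, c_tail=25):
    """Length bookkeeping under the assumption stated in the lead-in."""
    cdna = orf_codons * 3 + stop_nt + utr3_nt
    with_tails = cdna + poly_a + 2 * c_tail
    return cdna, with_tails


# Toy sequence (hypothetical, not rubella virus sequence).
toy = "CTGACTAACCTGACTAACTAGGGCA"
print(longest_open_frame(toy))
print(insert_length(522))
```

Under this assumption, 522 codons give 1566 + 3 + 253 = 1822 bp, and adding 17 + 2 x 25 tail residues gives 1889 bp, which matches both figures quoted in the text.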
Construction of Rubella Virus E1 Protein Expression Plasmid pcDpLRvl15 - The scheme for the construction of the pcDpLRvl15 plasmid is shown in Fig. 5. The chimeric plasmid was constructed by introducing a PstI fragment (1500 bp) of rubella virus cDNA (pRvl15) which codes for the E1 protein into the PstI site of the pL1 vector adjacent to the SV40 early promoter. From this construct, a HindIII/EcoRI fragment which contains the SV40 early promoter and the rubella virus cDNA was excised and ligated to a HindIII/EcoRI fragment of the pcD vector. The HindIII/EcoRI fragment of the pcD vector contains an ampicillin resistance gene and SV40 polyadenylation signal sequences. This chimeric plasmid, designated pcDpLRvl15, contains the SV40 early promoter, rubella virus cDNA, and SV40 polyadenylation signal in that order.
Expression of E1 Glycoprotein in COS Cells - Expression of rubella virus cDNA coding for the E1 protein was measured by the production of both rubella virus mRNA and protein after transfection of pcDpLRvl15 DNA into COS cells (Fig. 6). Cells were labeled with [35S]methionine for 2 h, 72 h after transfection. Total proteins were extracted and immunoprecipitated using rubella virus antibody. A protein of 58 kDa was observed in cells transfected with pcDpLRvl15 DNA (Fig. 6A, lane 6). This protein comigrated with the E1 protein immunoprecipitated from rubella virus-infected Vero 76 cells (Fig. 6A, lane 7). No such protein band was precipitated from COS cells or from COS cells transfected with either the pcD vector or the pcDdhfr plasmid (Fig. 6A, lanes 2-5). No similar protein band was observed in uninfected cells either (Fig. 6A, lane 8).
Total RNA was isolated 72 h after transfection, and rubella virus mRNA was detected and characterized by the S1 nuclease procedure using a 3' end-labeled DNA probe (Fig. 6B). The PstI insert (1500 bp long) of the rubella virus cDNA pRvl15, which encodes the entire E1 protein, was labeled at the 3'-protruding end with [α-32P]dideoxy-ATP using terminal deoxynucleotidyltransferase. The labeled fragment was hybridized with various RNAs and then subjected to S1 nuclease digestion. Hybridization with RNA isolated from COS cells transfected with pcDpLRvl15 DNA protected the 1500-bp fragment (Fig. 6B, lane 4). A fragment of similar size was also protected by RNA from rubella virus-infected cells (Fig. 6B, lane 5). However, this fragment was not protected when RNA either from COS cells transfected with the pcD vector (Fig. 6B, lane 3) or from uninfected cells (Fig. 6B, lane 6) was hybridized. In the absence of RNA, the labeled fragment was completely digested by S1 nuclease (Fig. 6B, lane 2). Thus, from the protein as well as the RNA data, it is evident that the E1 glycoprotein of rubella virus is expressed in COS cells from pcDpLRvl15 DNA under the control of the SV40 promoter. The protein synthesized in these cells is antigenically similar to the one expressed in virus-infected cells. In addition, these data further confirm the identity of the pRvl15 cDNA clone.

Fig. 6. ... Lane 8, uninfected Vero 76 cells. B, S1 nuclease analysis. Lane 1, PstI fragment labeled with [α-32P]dideoxy-ATP without S1 nuclease treatment; lane 2, with S1 nuclease; lane 3, hybridized with RNA isolated from pcD vector-transfected COS cells and digested with S1 nuclease; lane 4, hybridized with RNA isolated from pcDpLRvl15-transfected COS cells and digested with S1 nuclease; lane 5, hybridized with RNA isolated from rubella virus-infected Vero 76 cells and digested with S1 nuclease; lane 6, hybridized with RNA isolated from uninfected cells and digested with S1 nuclease. The molecular mass markers depicted are derived from HindIII-digested λ DNA and HaeIII-digested φX174 replicative form DNA.

DISCUSSION
A cDNA clone coding for the entire E1 glycoprotein and a C-terminal portion of the E2 protein of rubella virus has been isolated and characterized. The cDNA clone hybridized to two size classes of RNA (10.2 and 3.5 kb) isolated from virus-infected cells and to one size class of RNA (10.2 kb) from purified virus particles (Fig. 1), confirming that the clone is rubella virus-specific. Similar sizes of RNA for rubella virus have been identified by other investigators (19)(20)(21)(22). In addition, the isolated cDNA clone hybridized to a 1.7-kb RNA species from infected cells. Whether this RNA represents another subgenomic viral RNA or a defective interfering RNA is not known at this time. Defective interfering RNAs approximately 2 kb in size have been isolated and characterized from both Sindbis virus and Semliki Forest virus (37-39). The 3'-noncoding region of the rubella virus cDNA clone lacks the consensus sequence for the polyadenylation signal. A similar lack of a polyadenylation consensus sequence has been noted for a number of viral and nonviral mRNAs (40)(41)(42)(43)(44). A conserved sequence of 20 nucleotides was found at the 3' end of the cDNA (Fig. 3) which is similar to the one observed in other alphaviruses (Fig. 7) (45). This conserved sequence is thought to be a replicase recognition site or a recognition site for encapsidation (45).
DNA sequence analysis of clone pRvl15 shows a long open reading frame coding for 522 amino acids. Based on the previously established sequence of a 12-amino acid fragment from the amino-terminal portion of the purified E1 protein of the Therien strain of rubella virus (46), the starting site of the deduced E1 protein from the cDNA clone has been assigned to the glutamic acid residue shown in Fig. 3. This residue is preceded by a highly hydrophobic sequence of 20 amino acids possessing characteristics of signal peptides found in many secretory proteins (47). This hydrophobic region (residues -1 to -20) is flanked on both sides by potential protease cleavage sites: Ala(-1)-Glu(+1) is a typical cleavage site for signal peptidase (47), and Cys(-27)-Arg-Arg-Ala-Cys-Arg-Arg(-21), which contains paired basic amino acids, is typical of a trypsin-like enzyme cleavage site (48)(49)(50)(51)(52). The upstream amino acid sequence from -27 to -112 could be part of the E2 envelope protein, since the gene order of structural proteins in rubella virus has been shown to be NH2 terminus, capsid, E2 glycoprotein, E1 glycoprotein, and carboxyl terminus (53).
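The numbering used for the precursor, in which the first residue of mature E1 is +1 and the residue before it is -1, and the search for paired basic residues upstream of the signal peptide can be illustrated with a short Python sketch. This is an illustration under stated assumptions and not an analysis of the real sequence. Only the heptapeptide Cys-Arg-Arg-Ala-Cys-Arg-Arg and the Ala-Glu junction come from the text. The 20-residue hydrophobic stretch is a hypothetical placeholder, and placing the heptapeptide immediately upstream of it is an assumption of this toy layout.

```python
def relative_position(index, mature_start):
    """Convert a 0-based index to signed numbering that has no residue 0.

    The residue at mature_start is +1 and the residue just before it is -1.
    """
    offset = index - mature_start
    return offset + 1 if offset >= 0 else offset


def dibasic_sites(precursor, mature_start):
    """Signed positions of the second residue of each Arg/Lys pair that lies
    upstream of the mature protein (candidate trypsin-like cleavage sites)."""
    basic = {"R", "K"}
    sites = []
    for i in range(1, mature_start):
        if precursor[i - 1] in basic and precursor[i] in basic:
            sites.append(relative_position(i, mature_start))
    return sites


HEPTAPEPTIDE = "CRRACRR"                      # from the text
HYPOTHETICAL_SIGNAL = "LLVLAVLLGALVAAPLLAGA"  # placeholder, 20 residues, ends in Ala
MATURE_START_RESIDUE = "E"                    # Glu, first residue of mature E1

toy_precursor = HEPTAPEPTIDE + HYPOTHETICAL_SIGNAL + MATURE_START_RESIDUE
mature_start = len(HEPTAPEPTIDE) + len(HYPOTHETICAL_SIGNAL)

print(dibasic_sites(toy_precursor, mature_start))
print(relative_position(0, mature_start))
```

With this layout the heptapeptide occupies positions -27 to -21, and the two Arg-Arg pairs end at -25 and -21.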
It is tempting to speculate that the E1 protein is first synthesized as a precursor and then processed by proteolytic cleavage at either Arg(-25) or Arg(-21) in the polyprotein, as shown in Fig. 3. The precursor form of E1 with the hydrophobic "signal" peptide would be further processed by the host signal peptidase to form mature E1 protein upon secretion. Similar hydrophobic sequences have been found between the E1 and E2 proteins of Sindbis and Semliki Forest viruses (40, 41).
A stretch of 39 amino acids in the E1 protein of rubella virus (residues 262-300) shows similarity to the first 39 residues of bovine sperm histone (Fig. 4). The similarity is found in the arrangement of half-cystine and arginine residues, which appear to be conserved in these segments of the two proteins (54). The positions of 4 half-cystines and 8 arginines overlap (Fig. 4). In the bovine sperm histone, the half-cystines are believed to be involved in disulfide bond formation, providing stability for optimal interaction of the positively charged arginine functional groups of histones with deoxyribonucleic acids. Likewise, it is possible that this region of the E1 protein might be involved in the binding of viral RNA or host DNA sequences.
The hydropathic profile of the E1 protein of rubella virus has been examined (data not shown). There is a hydrophobic region toward the carboxyl-terminal end of the E1 protein which could be a transmembrane region. Similar transmembrane regions are found in alphaviruses (40, 41). The overall hydropathicity profile of the rubella virus E1 glycoprotein is similar to that of E1 proteins from alphaviruses even though they do not share significant homology in their protein structures (data not shown).
The E1 protein of rubella virus has been successfully expressed in COS cells using an SV40-derived expression vector system. The E1 protein produced is antigenically similar to the one expressed by cells infected with rubella virus in that it could be immunoprecipitated by anti-rubella antibody.
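A hydropathic profile of the kind mentioned above is commonly computed as a sliding-window average of per-residue hydropathy values. The text does not state which scale or window was used, so the sketch below is an assumption: it uses the Kyte-Doolittle scale with a default window of 19 residues and a threshold of 1.6, a common heuristic for candidate membrane-spanning segments. The function names and the toy peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the paper does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def hydrophobic_windows(profile, threshold=1.6):
    """Start indices of windows whose mean hydropathy reaches the threshold."""
    return [i for i, value in enumerate(profile) if value >= threshold]


# Toy peptide (hypothetical): a hydrophobic run followed by a basic run.
toy_profile = hydropathy_profile("IIIIIRRRRR", window=5)
print([round(v, 2) for v in toy_profile])
print(hydrophobic_windows(toy_profile))
```

Applied to a full-length protein, a run of windows above the threshold near the carboxyl terminus would be consistent with the transmembrane region proposed in the text, although such a profile is suggestive rather than conclusive.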