Primary structure of the low molecular weight nucleic acid-binding proteins of murine leukemia viruses.

Murine leukemia viruses contain a low molecular weight basic protein, designated p10, which binds to single-stranded nucleic acids. The complete amino acid sequence of p10 from the Rauscher strain of the virus has been determined. The partial amino acid sequences of p10s from the Moloney, Friend, AKR, Gross, radiation leukemia, and BALB/2 viral strains have also been determined using microsequencing techniques. Rauscher p10 is composed of 56 amino acid residues; the other p10s are similar in size but differ from Rauscher p10 by a few conservative amino acid substitutions. The structure of Rauscher p10 was compared to that of a functionally homologous protein from Rous avian sarcoma virus. The comparison revealed regions of amino acid sequence homology which indicate a phylogenetic relationship between the murine and avian viral strains. The analyses also revealed a periodic placement of three Cys residues and a Gly-His sequence. A structure involving these residues is found once in the murine protein and twice in the avian protein. A similar structure is seen in the single-stranded nucleic acid-binding protein of bacteriophage T4; in the latter case, however, the order of the amino acid residues is inverted.

virus involves concomitant morphological changes, an increase in infectivity, an increase in reverse transcriptase activity, and proteolytic cleavage of Pr65gag (7). The cleavage products are proteins designated p15, p12, p30, and p10, listed in the order in which they occur from the NH2 to the carboxyl terminus of the precursor (8, 9).
Pr65gag plays several critical roles in the process of viral assembly. It must specifically complex with portions of gPr85env (or its cleavage products) while also forming a specific complex with viral RNA. It may also be capable of forming self-association complexes that help provide a driving force for the budding process. The natural cleavage products of Pr65gag (listed above) may play critical roles in the subsequent stages of the viral reproduction cycle.
A detailed knowledge of the structure and function of Pr65gag and its natural cleavage products will be a necessary step toward understanding the biology and evolution of retroviruses. The primary structure of Pr65gag is most easily approached through structural studies of its natural cleavage products. Partial amino acid sequence data on gag gene products from a variety of type C retroviruses have revealed homologies which suggest phylogenetic relationships among viruses from diverse species of origin (10-12). They have also revealed structural relationships between viral p12s and cellular histones (13). Functional studies have shown that viral p12s bind specifically to a few sites on homologous viral RNA (14, 15), whereas viral p10s bind nonspecifically to single-stranded nucleic acids (DNA or RNA) (16) and form RNA-protein complexes that can be isolated from disrupted virus (17).
In this communication, we report the complete primary structure of R-MuLV p10, as well as the amino acid compositions and NH2-terminal sequences of p10s isolated from the ecotropic F-MuLV, M-MuLV, G-MuLV, AKR-MuLV, and Rad-MuLV and the xenotropic BALB/2-MuLV. These data show that the p10 portion of MuLV Pr65gag is highly conserved in evolution. A comparison of the R-MuLV p10 structure with the recently completed structure (18) of R-ASV p12, a functionally homologous nucleic acid-binding protein of an avian type C virus (17, 19), revealed regions of amino acid sequence homology between avian and mammalian viruses. Preliminary reports of this work have been presented (20, 21).

RESULTS AND DISCUSSION
The complete amino acid sequence of R-MuLV p10 is shown in Fig. 1. The structure was determined by semiautomated Edman degradation of the whole protein through the first 28 residues and by sequence analysis of the peptides produced by trypsin-, chymotrypsin-, or endoproteinase Lys-C-catalyzed hydrolysis of p10 or acetimido-p10. The critical peptides needed to prove the structure are also indicated in Fig. 1. The additional peptides that serve to confirm the structure are given in the miniprint supplement.
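The value of combining several digests lies in their different cleavage sites, which yield overlapping peptides. A minimal sketch of predicted fragments is given below. It assumes the usual textbook specificities (trypsin after Lys or Arg but not before Pro; endoproteinase Lys-C after Lys) and assumes that acetimidation of the lysines leaves trypsin cutting only after Arg. The function names and the test peptide are hypothetical, chymotrypsin is omitted, and the proline rule is applied to both enzymes purely for simplicity.

```python
"""Hypothetical in-silico digest predicting fragments for trypsin and Lys-C.

Assumed specificities (textbook rules, not taken from this paper):
  trypsin             - cuts C-terminal to Lys (K) or Arg (R), not before Pro (P)
  Lys-C               - cuts C-terminal to Lys (K)
  acetimido-protein   - amidinated Lys assumed resistant, so trypsin cuts after R only
The "not before Pro" rule is applied to every digest here as a simplification.
"""


def digest(seq: str, sites: set[str], block_before_pro: bool = True) -> list[str]:
    """Split a one-letter sequence after every residue listed in `sites`."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        is_last = i == len(seq) - 1
        next_is_pro = not is_last and seq[i + 1] == "P"
        if residue in sites and not is_last and not (block_before_pro and next_is_pro):
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])  # C-terminal remainder
    return fragments


def trypsin(seq: str) -> list[str]:
    return digest(seq, {"K", "R"})


def trypsin_on_acetimido_protein(seq: str) -> list[str]:
    # Assumption: acetimidated Lys is not cleaved, leaving only Arg sites.
    return digest(seq, {"R"})


def lys_c(seq: str) -> list[str]:
    return digest(seq, {"K"})


if __name__ == "__main__":
    peptide = "GAKPLRDKMRE"  # hypothetical test peptide
    print(trypsin(peptide))
    print(trypsin_on_acetimido_protein(peptide))
    print(lys_c(peptide))
```

Fragments from one digest that span a cleavage site of another are what allow the peptides to be ordered into a single chain.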
R-MuLV p10 is a linear polypeptide of 56 amino acid residues with a molecular weight of 6347. Early investigations placed the molecular weight of p10 at approximately 10,000 (22); hence the designation p10 (23). More recent work has estimated the molecular weight to be approximately 7,000 (19), which is in good agreement with the structure presented here. Previously reported amino acid compositions of p10 were based on an estimated molecular weight of 10,000 and hence overestimated the number of residues/mol. The structure of R-MuLV p10 presented here is in good agreement with the reported amino acid composition (19, 24) when the latter is recalculated for a molecular weight of 6347. R-MuLV p10 has seven carboxyl groups (four aspartic acid residues, two glutamic acid residues, and one α-carboxyl group) and 16 basic groups (five lysine residues, nine arginine residues, one histidine residue, and one α-amino group). The structure therefore indicates that R-MuLV p10 is a basic protein, in agreement with its reported isoelectric point of greater than 9.0 (16, 25). There are three cysteine residues (positions 26, 29, and 39 in Fig. 1). This raises the possibility of disulfide cross-links; however, attempts to demonstrate unique disulfide bonds have been inconclusive. Such bonds may be formed but may not be present in the native protein. This is analogous to the recent finding with the homologous protein (p12) from R-ASV (26).
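The inference from group counts to a basic protein can be made explicit with a Henderson-Hasselbalch estimate. The sketch below uses only the seven carboxyl and 16 basic groups counted above; the pKa values are generic approximations assumed for illustration, not measured for p10, and Cys and Tyr side chains are ignored because the text does not count them.

```python
"""Estimate net charge and isoelectric point of R-MuLV p10 from the ionizable
groups counted in the text (Henderson-Hasselbalch sketch).

Assumptions: pKa values are approximate textbook values, not measured for p10;
Cys and Tyr side chains are ignored.
"""

# (count, assumed pKa); counts are those given in the text
ACIDIC = {
    "alpha-COOH": (1, 3.1),
    "Asp": (4, 3.9),
    "Glu": (2, 4.1),
}
BASIC = {
    "alpha-NH2": (1, 8.0),
    "His": (1, 6.0),
    "Lys": (5, 10.5),
    "Arg": (9, 12.5),
}


def net_charge(ph: float) -> float:
    """Expected net charge at a given pH (positive minus negative charge)."""
    positive = sum(n / (1 + 10 ** (ph - pka)) for n, pka in BASIC.values())
    negative = sum(n / (1 + 10 ** (pka - ph)) for n, pka in ACIDIC.values())
    return positive - negative


def isoelectric_point(lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection for the pH where net_charge crosses zero (it falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"net charge at pH 7: {net_charge(7.0):+.1f}")
    print(f"estimated pI: {isoelectric_point():.1f}")
```

Counting the Cys and Tyr side chains as additional acidic groups would pull the estimate down somewhat, but the excess of nine Arg residues keeps it well in the basic range.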
The amino acid compositions of MuLV p10s from various virus strains are shown in Table 1, and the partial amino acid sequences are compared in Fig. 2. (Table VII). Similar results were obtained for AKR-MuLV p10 (Table VII). The conservative nature of the MuLV p10s was suggested by their high degree of immunological cross-reactivity (27). The data presented here indicate that the MuLV p10s may differ from each other by only a few conservative amino acid substitutions.
The primary structure of the nucleic acid-binding protein (p12) of R-ASV (Prague strain) has been reported by Misono et al. (18). The protein is a single chain composed of 87 amino acid residues. The structures of R-MuLV p10 and R-ASV p12 were compared and analyzed by the method of Dayhoff (28). When the complete structures of both proteins were compared, an alignment score of 4.25 was obtained. However, some segments of R-MuLV p10 could be aligned with more than one segment of R-ASV p12, giving alignment scores well above the range expected by random chance. This situation arises
Table 1 notes: The values are expressed as whole numbers based on sequence analysis (Fig. 1); the amino acid analysis is given in Table VI in the miniprint. Values were based on two samples hydrolyzed for 24 h and calculated assuming a molecular weight equal to that of R-MuLV p10.
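Dayhoff's alignment score is commonly expressed in standard deviation units. It is the score of the real alignment minus the mean score obtained after randomly permuting the sequences, divided by the standard deviation of those random scores. The sketch below illustrates only that idea. It assumes a simple identity scoring scheme with a linear gap penalty in place of the scoring matrix and gap parameters of Dayhoff's program, and its function names, parameters, and test sequence are hypothetical, so its output is not comparable to the 4.25 reported here.

```python
import random
import statistics


def nw_score(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Assumption: identity scoring stands in for a mutation data matrix.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def alignment_score_sd(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Real score in SD units above the mean score of permuted sequence pairs."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    random_scores = []
    for _ in range(n_shuffles):
        # Permuting keeps each sequence's composition but destroys its order.
        shuffled_a = "".join(rng.sample(a, len(a)))
        shuffled_b = "".join(rng.sample(b, len(b)))
        random_scores.append(nw_score(shuffled_a, shuffled_b))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        return float("inf")  # every permutation scored the same
    return (real - mean) / sd


if __name__ == "__main__":
    seq = "CAYCKEKGHWAKDC"  # hypothetical test segment
    print(f"{alignment_score_sd(seq, seq):.1f} SD units")
```

Because the permutations are random, the result varies slightly with `seed` and `n_shuffles`; more shuffles give a steadier estimate of the random mean and standard deviation.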

Fig. 2 (one row of the partial-sequence comparison, as recoverable; X as printed): ATVVSGQKQDRQGGERRRSQLDRDQXAYXKEKGHWAKDXXKKXXG
Fig. 3 (legend fragment): Gaps introduced to bring homologous regions of the structures into alignment in the matrix are indicated. Identical residues in the alignment are underlined. The amino acid sequence of R-ASV p12 is taken from Misono et al. (18).
Studies of the effects of chemical modifications of R-MuLV p10 on the nucleic acid-binding activity of the protein have indicated that Tyr and Lys residues may be involved in the binding site (21). Aromatic residues have been implicated in the binding sites of other proteins that interact with single-stranded nucleic acids (29-31). R-MuLV p10 has five Lys residues and two aromatic residues, all located within the segment from residues 26 to 42. This segment of R-MuLV p10 can be aligned with either of two segments of R-ASV p12, resulting in a high number of identities (Fig. 3). Within this region, the residues conserved in R-MuLV p10 and in both segments of R-ASV p12 are the Cys residues in alignment columns 27, 30, and 40 and a Gly-His sequence in columns 34 and 35. This structure can be expressed as a set of three Cys residues spaced at n, n + 3, and n + 13 and a Gly-His sequence at n + 7 and n + 8, where n is the position in the amino acid sequence of the first Cys residue in the set. An analogous structure is seen in the amino acid sequence of the ssDNA-binding gene 32 protein of T4 phage. In the gene 32 protein, however, the structure is inverted, with Cys residues at n, n - 3, and n - 13 and a His-Gly sequence, displaced by one residue, at n - 8 and n - 9 (residues 77-90 in the amino acid sequence of the gene 32 protein (32)). The structure may be important to the biological function of these proteins and may have evolved independently in T4 phage and the retroviruses.
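The spacing rule stated above (Cys at n, n + 3, and n + 13; Gly-His at n + 7 and n + 8) is precise enough to search for mechanically. The sketch below scans a one-letter sequence for it and reports n as a 1-based residue number. As a check it uses the row recoverable from Fig. 2, with positions 26, 29, and 39 set to Cys on the assumption that this row carries the Cys residues the text assigns to R-MuLV p10; `fill_positions`, `FIG2_ROW`, and `FILLED` are hypothetical names. Only the forward (retroviral) orientation is scanned, not the inverted T4 arrangement.

```python
# Offsets from the first Cys (position n) and the residue required there,
# as stated in the text: Cys at n, n+3, n+13; Gly-His at n+7, n+8.
MOTIF = {0: "C", 3: "C", 7: "G", 8: "H", 13: "C"}


def find_cys_his_motifs(seq: str) -> list[int]:
    """Return the 1-based position n of every Cys that starts the motif."""
    span = max(MOTIF)  # 13: distance from the first to the last Cys
    hits = []
    for start in range(len(seq) - span):
        if all(seq[start + offset] == residue for offset, residue in MOTIF.items()):
            hits.append(start + 1)  # convert 0-based index to residue number
    return hits


def fill_positions(seq: str, positions: tuple[int, ...], residue: str) -> str:
    """Hypothetical helper: overwrite the given 1-based positions with `residue`."""
    chars = list(seq)
    for pos in positions:
        chars[pos - 1] = residue
    return "".join(chars)


# Partial sequence as printed in the Fig. 2 row (X as printed there).
FIG2_ROW = "ATVVSGQKQDRQGGERRRSQLDRDQXAYXKEKGHWAKDXXKKXXG"
# Assumption: positions 26, 29 and 39 are Cys, as the text states for R-MuLV p10.
FILLED = fill_positions(FIG2_ROW, (26, 29, 39), "C")

if __name__ == "__main__":
    print(find_cys_his_motifs(FILLED))  # [26]
```

The single hit at n = 26 reproduces the Cys positions 26, 29, and 39 and the Gly-His pair at 33 and 34; applied to R-ASV p12, the text implies such a scan would report two hits.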
The functional significance of the amino acid sequence homologies between R-MuLV p10 and R-ASV p12 at the amino- and carboxyl-terminal regions of the two proteins may be related to a common mode of cleavage of their precursor proteins (24). Very little is known about the nucleic acid-binding mechanism of these proteins. The MuLV p10s are among the smallest of the known nucleic acid-binding proteins and, as such, should be good model proteins for studies of protein-nucleic acid interactions. The knowledge of this structure will greatly facilitate such studies.