Primary structure of the major coat protein of the filamentous bacterial viruses, If1 and Ike.

The primary structures of the major coat proteins from the If1 and Ike filamentous coliphages have been determined by automated Edman degradation before and after cyanogen bromide cleavage and by manual sequencing of certain tryptic peptides. Carboxypeptidase A and B digestion was also used to determine the sequence of the COOH termini of these proteins. A comparison of the major coat proteins from these two phages with those from other filamentous phages shows that they all share several common features, namely an asymmetric distribution of positively and negatively charged amino acid residues, which are clustered in the COOH-terminal and NH2-terminal regions, respectively, and a region of 19 residues which is located in the middle of the polypeptide chain. The consequences of this charge distribution for a possible mechanism of virus maturation are discussed.

Isolation of the B Protein from the If1 and Ike Viruses.-The viruses were purified as described previously (3). The protein was separated from the virus DNA by the phenol extraction method of Asbeck et al. (6). For some of the sequence analyses, viral DNA was not separated from the protein, since the solubility of the intact virus was greater than that of the isolated protein and the presence of DNA appeared to be advantageous to the liquid phase automated sequencing analysis. Neither the minor coat proteins nor the DNA affected the sequence analysis of the major coat protein.
Sequencing reagents-Succinic anhydride, cyanogen bromide, BNPS-skatole, 5-dimethylaminonaphthalene-1-sulfonyl chloride, and polyamide plates were obtained from Pierce. All of the proteolytic enzymes used were purchased from Worthington. Automated sequencing reagents were those of Beckman, and high performance liquid chromatography solvents were obtained from Burdick and Jackson and from Pierce. The μ-Bondapak C18 column was purchased from Waters and the [1,4-14C]succinic anhydride (2-10 mCi/mmol) was purchased from New England Nuclear.
Sequence Analysis-Succinylation, cleavage by cyanogen bromide and BNPS-skatole, and digestion with trypsin, chymotrypsin, subtilisin, and carboxypeptidases A and B were performed by the standard methods (12). Automated Edman degradation was performed with a Beckman 890C Sequencer, using a peptide program (Beckman 102974) with dimethylallylamine-trifluoroacetic acid buffer or a 0.1 M Quadrol program with combined S1 + S2 wash program (Beckman 030176). The phenylthiohydantoins from the Edman degradation cycles were identified by high performance liquid chromatography (Waters) using a μ-Bondapak C18 column in the solvent system of Bhown et al. (13) and by amino acid analysis using a Durrum D-500 analyzer following hydrolysis of the phenylthiohydantoin derivatives in HI (14). Short peptides derived from proteolytic digestion were isolated by gel filtration and by high voltage paper electrophoresis. Manual microsequencing of these short peptides was performed according to the methods described by Gray (15).

RESULTS
The Amino Acid Sequence of the If1 B Protein

The If1 B protein consists of 51 residues, giving a molecular weight of 5293 calculated from the sum of the constituent residue weights. The complete amino acid sequence of the protein is shown in Fig. 1. The protein lacks proline, cysteine, and histidine. The proposed sequence was deduced as follows. The intact protein was first submitted to automated Edman degradation with a liquid phase sequencer, which gave the sequence of 41 residues from the NH2 terminus as indicated by sequence 1 in Fig. 1. The identification of the phenylthiohydantoin derivatives from the Edman degradation cycles and their yields are summarized in Table I. Following succinylation of the α- and ε-amino groups in the protein with [1,4-14C]succinic anhydride to make the protein more soluble in aqueous solvents, treatment with CNBr split the protein into two fragments, one of which was soluble in 0.2 M NH4HCO3, pH 8, and the other not. The soluble fraction contained only the peptide spanning residues 1 to 22, as shown by the amino acid composition of CNBr fragment 1 given in Table II. The insoluble fraction contained the fragment spanning residues 23 to 51 together with some intact protein which had resisted cleavage. The amino acid composition of this insoluble fraction is also given in Table II. This mixture was analyzed directly by automated sequencing without further purification, since the α-amino group of the intact protein was blocked with succinic anhydride and therefore was unable to react with phenylisothiocyanate. Automated sequencing of this fraction gave the sequence of residues 23 to 49 (Fig. 1, sequence 2). The identification of the PTH derivatives and their yields are given in Table I, sequence 2. This sequence was consistent with the amino acid composition of the intact protein as well as with that of the insoluble fraction (Tables I and II, respectively).
The sequence of residues 50-51 and confirmation of the results of the second sequenator run were obtained by analysis of the soluble peptides obtained by digesting the protein with chymotrypsin, trypsin, and carboxypeptidases A and B. The amino acid compositions and the NH2-terminal residues of these peptides are given in Table III.
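The molecular weights quoted in this section are described as sums of constituent residue weights. A minimal sketch of that arithmetic follows. The residue masses are standard average values, the function name `molecular_weight` is hypothetical, and whether the authors added one water for the free termini is an assumption here, so the sketch is not expected to reproduce the published figures exactly.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "Gly": 57.0519, "Ala": 71.0788, "Ser": 87.0782, "Pro": 97.1167,
    "Val": 99.1326, "Thr": 101.1051, "Cys": 103.1388, "Leu": 113.1594,
    "Ile": 113.1594, "Asn": 114.1038, "Asp": 115.0886, "Gln": 128.1307,
    "Lys": 128.1741, "Glu": 129.1155, "Met": 131.1926, "His": 137.1411,
    "Phe": 147.1766, "Arg": 156.1875, "Tyr": 163.1760, "Trp": 186.2132,
}
WATER = 18.0153  # assumption: one water is added for the free NH2 and COOH ends


def molecular_weight(composition):
    """Sum residue weights for a {residue: count} composition, plus one water."""
    return sum(RESIDUE_MASS[res] * n for res, n in composition.items()) + WATER


# Hypothetical dipeptide Ala-Val, used only to exercise the function.
print(round(molecular_weight({"Ala": 1, "Val": 1}), 2))
```

Feeding the function the residue counts from a composition table (for example Table II or Table IV) would give a value to compare with the molecular weight stated in the text.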
The complete primary structure of the Ike B protein is shown in Fig. 2. It consists of 53 residues, giving a calculated molecular weight of 5696. The amino acid composition of the intact protein (Table IV) is consistent with the proposed sequence. The only noteworthy features are the absence of cysteine and histidine. The sequence of the protein was established by procedures similar to those described above for the If1 B protein. Automated sequence analysis of the intact Ike B protein enabled the sequence of the first 34 residues from the NH2 terminus to be determined as indicated by sequence 1 in Fig. 2. Cyanogen bromide cleavage of the radiosuccinylated B protein produced a soluble fraction containing the peptide spanning residues 1 to 14 and an insoluble fraction (Table IV). By submitting the insoluble fraction to automated Edman degradation, the sequence was extended to residue 47 as shown by sequence 2 in Fig. 2. Treatment of the radiosuccinylated B protein with BNPS-skatole resulted in the cleavage of the protein at the tryptophan residue located at position 29. The peptide from residue 1 to 29 was isolated from the digest by extraction into 0.1 M pyridine acetate, pH 5, and its amino acid composition is given in Table IV. The remaining insoluble fraction, which contained the segment from residues 30 to 53 together with intact protein as a minor contaminant, was submitted to automated Edman degradation. The sequence spanning residues 30 to 51 was determined as indicated (sequence 3, Fig. 2). The results of the automated sequence analyses are summarized in Table V.
The carboxypeptidase A digestion of the protein gave the COOH-terminal sequence -Ala-Val-COOH, and the carboxypeptidase A plus B digestion of the protein resulted in the assignment of the sequence -Lys-Phe-Ser-Ser-Lys-Ala-Val-COOH based on the kinetics of the amino acids released, as shown in Fig. 3, A and B. The succinylated protein was also digested with chymotrypsin, resulting in soluble peptides which were isolated by gel filtration and purified by high voltage paper electrophoresis. The amino acid compositions and NH2-terminal analyses of these peptides, which are summarized in Table VI, were consistent with the primary structure of the protein as determined by automated sequencing. The location of these chymotryptic peptides in the protein sequence is shown in Fig. 2.
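The fragment boundaries reported for the Ike B protein follow from the specificity of the two chemical cleavages: cyanogen bromide cleaves after methionine and BNPS-skatole after tryptophan. The sketch below illustrates this on a placeholder chain. The helper `cleave_after` and the use of `X` for unspecified residues are hypothetical, and the placement of a single Met at position 14 and a single Trp at position 29 is inferred from the fragments described above, not read from Fig. 2.

```python
def cleave_after(sequence, residue):
    """Return 1-based inclusive (start, end) fragments for cleavage after `residue`."""
    fragments, start = [], 1
    for pos, aa in enumerate(sequence, start=1):
        # A cleavable residue at the COOH terminus yields no extra fragment.
        if aa == residue and pos < len(sequence):
            fragments.append((start, pos))
            start = pos + 1
    fragments.append((start, len(sequence)))
    return fragments


# Placeholder 53-residue chain: 'X' marks residues not specified here.
chain = ["X"] * 53
chain[14 - 1] = "M"  # assumed single methionine (CNBr site)
chain[29 - 1] = "W"  # tryptophan at position 29 (BNPS-skatole site)
ike_like = "".join(chain)

print(cleave_after(ike_like, "M"))  # CNBr fragments
print(cleave_after(ike_like, "W"))  # BNPS-skatole fragments
```

With these assumptions the first call yields fragments 1-14 and 15-53, and the second yields 1-29 and 30-53, matching the peptides described in the text.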

DISCUSSION
The amino acid sequences of the If1 and Ike major coat proteins, together with those of other class 1 filamentous viruses, are presented in Fig. 4. A comparison of these sequences shows that the If1 and Ike coat proteins possess features which are common to all major coat proteins of filamentous bacterial viruses. The proteins are similar in size, having about 50 amino acid residues, and they contain large amounts of alanine but lack histidine and cysteine. Both the If1 and Ike proteins contain 19 uncharged amino acid residues in the middle of the polypeptide chain, spanning residues 21 to 39. In addition to the predominance of hydrophobic residues in the central region, the proteins have a cluster of negatively charged amino acids in the NH2-terminal region and a high concentration of positively charged residues in the carboxyl-terminal section. This distribution of acidic, hydrophobic, and basic amino acids may allow the major coat proteins of If1 and Ike to be embedded in the host bacterial membrane in a fashion similar to that found with other filamentous bacterial viruses (16-19).
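The three-region charge pattern described above can be checked mechanically on any coat protein sequence. The sketch below does so for the fd coat protein. The 50-residue fd sequence is supplied here from the commonly reported sequence as an assumption; it is not reproduced in the text above and should be checked against Fig. 4. The function names are hypothetical, and terminal α-amino and α-carboxyl charges are ignored.

```python
# Assumed fd major coat protein sequence (one-letter code); verify against Fig. 4.
FD_COAT = "AEGDDPAKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"

# Side-chain charges near neutral pH; histidine is treated as uncharged.
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def net_charge(seq, start, end):
    """Net side-chain charge of residues start..end (1-based, inclusive)."""
    return sum(CHARGE.get(aa, 0) for aa in seq[start - 1:end])


def longest_uncharged_run(seq):
    """Return (start, end), 1-based inclusive, of the longest run lacking D, E, K, R."""
    best_start, best_len, run_start = 0, 0, None
    for pos, aa in enumerate(seq, start=1):
        if aa in CHARGE:
            run_start = None
            continue
        if run_start is None:
            run_start = pos
        if pos - run_start + 1 > best_len:
            best_start, best_len = run_start, pos - run_start + 1
    return best_start, best_start + best_len - 1


print(longest_uncharged_run(FD_COAT))   # central uncharged segment
print(net_charge(FD_COAT, 1, 20))       # NH2-terminal region
print(net_charge(FD_COAT, 40, 50))      # COOH-terminal region
```

For the assumed fd sequence, the longest uncharged run is residues 21 to 39, the NH2-terminal region carries a net charge of -4, and the COOH-terminal region a net charge of +4. The same functions could be applied to the If1 and Ike sequences of Figs. 1 and 2.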
The amino acid sequences of coat proteins from six filamentous viruses are now known: fd, f1, ZJ-2, M13, If1, and Ike. They are all either identical or show a strong sequence homology with one another. Thus, it seems likely that the genes for these proteins may have evolved from a common prototype ancestral gene. The sequences of the fd, f1, and M13 coat proteins are all identical and differ in only one residue, at position 36, from the sequence of ZJ-2, but the sequences of the If1 and Ike coat proteins do not match those from the fd group. Comparison of the sequences in the NH2-terminal regions (positions 2-20) indicates that the If1 coat protein is more closely related to fd than to Ike. Prediction of the secondary structure of the fd coat protein by the methods of Chou and Fasman (22) suggested that the region 26-37 forms a β structure. This is consistent with the estimate of the amount of β structure determined by circular dichroism studies of the protein in detergents and in phospholipid vesicles (19-21). Since the sequences of the If1 and Ike proteins resemble that of the fd coat protein and the hydrophobic character of residues 21-39 is highly conserved in all three virus proteins, similar predictions may be made for the structures of the If1 and Ike B coat proteins.
In contrast to some sequence variation observed at the NH2 termini of the fd, ZJ-2, If1, and Ike major coat proteins, the sequence of the coat proteins in the region near the carboxyl terminus (positions 40-50) is markedly conserved. The four basic amino acids are located in the same span of 11 residues and the distribution of these residues appears to have a certain regularity. Based on the Chou-Fasman predictions, the secondary structure of this basic region should be largely α-helical (22). Alignment of the amino acid residues of the segment in an α-helical array predicts that all of the basic residues in this segment will be located on the same side of the helix. Physicochemical studies indicate that the gene 5 protein interacts with viral DNA by intercalation of the tyrosine residues between nucleotide bases, with the basic amino acids of the protein contributing weak but significant charge-stabilizing forces to the complex (26-28). Based on the crystallographic studies of the gene 5 protein-DNA complex, McPherson et al. (29) suggest that the gene 5 protein is aligned in a helical array forming protein discs that are surrounded by DNA. This model allows us to postulate one feature of a possible mechanism in which the B protein may be involved in virus maturation; namely, that the positively charged amino groups on the basic amino acids in the carboxyl-terminal region (see Ref. 2 for the arrangement of the B protein in the membrane) could provide "counter charges" that should destabilize the nonspecific interactions between the positive charges of the gene 5 protein and the negative charges of the DNA. Furthermore, the positive charges of the B protein could also cause condensation or bending of the DNA by "charge neutralization" (30, 31), which would be sufficient to disrupt possible stacking interactions between the tyrosine residues of the gene 5 protein and the nucleotide bases.
In this way, gene 5 protein would be replaced by gene 8 protein as the progeny single-stranded DNA is extruded through the cell membrane of the infected bacteria to form the progeny virion as shown in Fig. 5. Although we would like to suggest this as one part of a general mechanism for the maturation of all filamentous phages, it fails to explain how the maturation process is initiated. Clearly, involvement of other gene products is also necessary (4, 23, 32) and their function must be revealed before we have a complete understanding of the life cycle of these phages.
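The statement that the basic residues of the COOH-terminal segment fall on one side of an α-helix can be illustrated with a simple helical-wheel calculation, in which each successive residue is rotated by 100° (3.6 residues per turn). The sketch below is an illustration under stated assumptions: the lysine positions 40, 43, 44, and 48 are taken from the commonly reported fd sequence rather than from the text above, an ideal helix is assumed, and the function names are hypothetical.

```python
DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix: 360 degrees / 3.6 residues per turn


def wheel_angles(positions):
    """Angle of each residue on a helical wheel, relative to the first position."""
    first = positions[0]
    return [((p - first) * DEGREES_PER_RESIDUE) % 360.0 for p in positions]


def smallest_arc(angles):
    """Width in degrees of the smallest arc of the wheel containing all angles."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(360.0 - ordered[-1] + ordered[0])  # wrap-around gap
    return 360.0 - max(gaps)


# Assumed positions of the four basic residues (fd numbering).
lysines = [40, 43, 44, 48]
angles = wheel_angles(lysines)
print(angles)                # [0.0, 300.0, 40.0, 80.0]
print(smallest_arc(angles))  # 140.0
```

Under these assumptions all four basic side chains lie within a 140° arc, that is, on one face of the helix, which is the geometric point made in the discussion.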

[Figure residue: NH2-terminal portion of a coat protein sequence, partly illegible: NH2-Ala-Asp-Asp-Ala-Thr-Ser-Gln-Ala-Lys-Ala-Ala-Phe-Asp-Ser-Leu-Thr-Ala-Gln-A??-Thr]
Fig. 5 legend (partial): The "stacking interaction" pictured on the right represents the postulated intercalation of tyrosine residues between the bases in the single-stranded DNA (26). This would align the bases into a regular array and allow the negative charges on the phosphate backbone to be neutralized ("charge neutralization") by the amino groups located in the COOH-terminal region of the gene 8 protein, which are situated at the inner face of the cytoplasmic membrane (41). As this happens, the gene 5 protein-fd DNA complex is transformed to a gene 8 protein-fd DNA complex in the membrane and is released from the cell as an intact virion.
The sequences of the If1 and Ike major coat proteins augment the growing collection of known filamentous virus coat (B) protein sequences. Since these proteins can be considered integral membrane proteins, it is worth comparing them with others in this class. Unfortunately, information regarding the primary structure of other transmembrane proteins is scanty. It should be noted that the erythrocyte membrane protein glycophorin (33) has a regional distribution of hydrophobic and hydrophilic residues similar to that found in the B proteins. For instance, the uncharged, and presumably membrane-penetrating, polypeptide segment of glycophorin is followed by a cluster of basic residues which are thought to extend into the cytoplasm. Similarly, the COOH-terminal segment of μ membrane immunoglobulin contains a stretch of 26 uncharged amino acid residues (34) that is comparable in hydrophobicity (35) to the membrane-associated regions in glycophorin (36). Nevertheless, it should be noted that other membrane proteins such as the ATPase subunit and bacterial rhodopsin have their polar and nonpolar residues distributed differently from the B proteins and from glycophorin (37-39).
In addition, the membrane topology of the M13 and presumably other filamentous phage coat proteins does not resemble either the ATPase subunit or bacterial rhodopsin. It will therefore be of interest to see if a general pattern emerges that will permit integral membrane proteins to be distinguished from other proteins on the basis of the charge and polarity distribution of their constituent amino acids. This should be possible as sequence information on other membrane proteins becomes available.