Analysis of Retroviral Protease Cleavage Sites Reveals Two Types of Cleavage Sites and the Structural Requirements of the P1 Amino Acid*

Retroviruses encode a protease which cleaves the viral Gag and Gag/Pol protein precursors into mature products. To understand the target sequence specificity of the viral protease, the amino acid sequences from 46 known processing sites from 10 diverse retroviruses were compared. Sequence preference was evident in positions P4 through P3' when compared to flanking sequences. Approximately 80% of all cleavage site sequences could be grouped into two classes based on the sequence composition flanking the scissile bond. The sequence at the amino-terminal cleavage site of the major capsid protein of Gag is always a member of one of the two classes while the carboxyl-terminal cleavage site is of the other class, suggesting a biological role for the two classes. Known processing site sequences proved useful in a motif searching strategy to identify processing sites in retroviral protein sequences, particularly in Gag. In all known cleavage sites, the P1 amino acid is hydrophobic and unbranched at the β-carbon. The sequence requirements of the P1 position were tested by site-directed mutagenesis of the P1 Phe codon in an HIV-1 Pol cleavage site. Mutations were tested for protease-mediated cleavage of the Pol precursor expressed in Escherichia coli. The

Several lines of evidence indicate retroviral proteases are members of the aspartic proteinase family of proteases. However, the viral proteases differ from cellular aspartic proteinases in that they are smaller and function as a dimer (11-19). Substrate binding occurs within a cleft that is formed by symmetrical association of two PR monomers. Each monomer contains a flap that folds over the cleft and may contribute to substrate recognition and binding through interaction with the substrate (20-22).
The mechanism of processing site recognition in the precursor proteins by the viral protease is poorly understood. The PR is highly specific in site selection; only a few sites are cleaved in either the Gag or Gag/Pol precursor proteins. The cleavages are conservative in that only the scissile peptide bond is hydrolyzed at each cleavage site. Despite the specificity of site selection, recognition sequences for the retroviral PR are diverse in amino acid composition (1). Structural studies of the HIV-1 PR and substrate analogs suggest that 6 to 7 residues of the substrate reside in the binding cleft (20,22,23). Studies with peptide substrates suggest the minimum substrate length for retroviral proteases is small (24,25). The minimum substrate length for the HIV-1 PR corresponds to positions P4 through P3' flanking the scissile bond (26-28) (P1 is the first amino acid upstream of the scissile bond, P1' is the first amino acid downstream).
To understand cleavage site specificity, we analyzed 46 known retroviral PR cleavage site sequences from 10 different retroviruses to characterize amino acid preferences in processing site sequences. Amino acid preference was evident from position P4 through P3' flanking the scissile bond. Most significantly, the identity of the amino acid in the P1' position correlated with two distinctive patterns in the composition of the flanking amino acid sequence. Approximately 80% of the processing sites examined could be classified as one of the two types of sites. The type of processing site was conserved at certain locations in the retroviral genomes, suggesting a biological role for the two classes of sites. In addition, the sequence requirement of the P1 amino acid, which is always hydrophobic and unbranched at the β-carbon, was examined experimentally by mutagenesis of a Phe/Pro-processing site located at the amino terminus of the HIV-1 PR. Analysis of the effects of the mutations on processing of the Pol polyprotein was accomplished by an expression system in which Pol is processed to completion in Escherichia coli.
Plasmid Constructions-The phagemid used for mutagenesis, sequencing, and expression of the HIV-1 pol gene was pART2. pART2 contains the pol gene from HIV-1 isolate HXB2D (31) inserted into the vector pIBI20 (International Biotechnologies) (32,33). Features of pART2 include the pol gene under transcriptional control of the lac promoter, and an f1 phage origin of replication to allow the production of single-stranded DNA for use in site-directed random mutagenesis and DNA sequencing (32,34).

Design and Synthesis of Mutagenic DNA Oligonucleotides and the Production and Screening of Mutations in the HIV-1 pol Gene-To produce random mutations in the codon for the P1 amino acid of the Pol cleavage site located at the amino terminus of the PR domain upstream of reverse transcriptase, a 25-base DNA oligonucleotide was synthesized complementary to single-stranded pART2 except for the three positions of the target codon, where all four nucleotides were inserted with equal frequency. Single-stranded uracil-substituted pART2 for use in mutagenesis was prepared by passaging the vector once through E. coli strain CJ236 (dut- ung-) (35). In vitro mutagenesis reactions and preparation of pART2 libraries containing mutations were performed as previously described (32,33,35). Individual plasmid clones of the library were rescued as single-stranded DNA by infection with M13K07(b) helper phage (34). Mutations in the codon for the P1 amino acid were identified by DNA sequence analysis using the dideoxy-chain termination method (36).
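The consequence of randomizing all three positions of the target codon with equal nucleotide frequency can be illustrated with a short calculation. The sketch below is not part of the original methods; it assumes the standard genetic code and ideal, unbiased incorporation of the four nucleotides, and the function name `randomized_codon_distribution` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in the base order T, C, A, G.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def randomized_codon_distribution():
    """Fraction of the 64 equally likely NNN codons that encode each residue.

    Assumption: every nucleotide is incorporated with probability 1/4 at
    each of the three randomized positions.
    """
    counts = Counter(CODON_TABLE.values())
    return {aa: n / 64 for aa, n in counts.items()}


distribution = randomized_codon_distribution()
for residue in ("L", "F", "M", "*"):
    print(residue, distribution[residue])
```

Under these assumptions, residues with six codons (Leu, Ser, Arg) are expected three times as often as those with two, and about 5% of clones carry a stop codon, which is one reason a library of this kind is screened by sequencing rather than assumed to be uniform.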
Induction of Expression of the pol Gene in E. coli-E. coli strain JM101 containing either pART2 or pART2 with mutations was grown overnight in modified M9 medium (no glucose, 0.4% casamino acids, 100 μg/ml ampicillin, 0.001% thiamine) (37). Expression of the pol gene from the lac promoter was induced by the addition of isopropyl-β-D-thiogalactopyranoside to a final concentration of 5 mM. After 90 min, the bacteria were pelleted by centrifugation, resuspended in 1/10 volume of sodium dodecyl sulfate-polyacrylamide gel electrophoresis loading buffer (0.25 M Tris-Cl, pH 6.8, 4% sodium dodecyl sulfate, 20% glycerol, 10% β-mercaptoethanol, 0.05% bromphenol blue), and lysed by heating to 100 °C for 3 min.
Phenotypic Screening and Western Blot Analysis-In the Western blot analysis of Pol processing, 10 μl of sample was subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis (38) and transblotted at 70 V (300 mA) for 1 h onto nitrocellulose (Schleicher & Schuell BA 85) in buffer containing 25 mM Tris-Cl, pH 8.3, 192 mM glycine, and 20% (v/v) methanol. Blots were washed in phosphate-buffered saline, pH 7.2 (PBS), twice for 5 min, blocked for 1 h in 3% gelatin in PBS, and exposed to 10 μl of anti-RT monoclonal (Du Pont) or anti-PR polyclonal antibody in 10 ml of PBS + 0.1% Tween 20 (PBS-Tw) for 2 h at room temperature. Blots were washed three times for 5 min in PBS-Tw and reacted with a 1:2000 dilution of either goat anti-rabbit or goat anti-mouse antibody conjugated with alkaline phosphatase (Promega Biotechnologies) in 10 ml of PBS-Tw for 1 h at room temperature. The blots were then washed twice for 5 min in PBS-Tw, 5 min in Tris-buffered saline, and 5 min in carbonate buffer, pH 9.8 (0.1 M NaHCO3, 1 mM MgCl2). Color was developed using 5-bromo-4-chloro-3-indolyl phosphate toluidine salt and p-nitro-blue tetrazolium as substrates for alkaline phosphatase according to the instructions of the manufacturer (Bio-Rad).

RESULTS
Two Sequence Families Correlate with the P1' Amino Acid Sequence-To identify amino acid sequence determinants in the substrate that define PR specificity, a database containing 46 different processing site sequences from 10 highly divergent retroviruses was compiled (Table I). The cleavage sites were selected discriminately by choosing only sites proven by amino-terminal sequencing of the protein products. Several types of analyses were applied to the database of sequences. First, substrate positions P10 through P10' were analyzed to determine amino acid variability, the composition of amino acids, and the average hydrophobicity of the amino acids located in each position (Table II). Less sequence variability was evident in positions P4 through P3'. The four positions closest to the scissile bond (P2 through P2') were the most restricted in amino acid composition (Fig. 1A). The exception was the P3 position, which was as variable as positions farther away from the scissile bond. The amino acid composition in positions P10 through P10' was also analyzed for hydropathicity (39) (Fig. 1B). Amino acids in substrate positions P2 through P3' were increased in hydropathicity relative to the flanking sequences. Hydropathicity and sequence variability data, when taken together, suggested substrate sequences important for interaction with retroviral proteases generally extend from positions P4 through P3'. These positions correspond to the minimum substrate length of seven amino acids (P4 through P3') required for cleavage by the HIV-1 PR (26-28).
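The per-position variability and frequency-weighted hydropathy analysis described above can be sketched in a few lines. This is an illustration only: it assumes the hydropathy index of reference 39 is the Kyte-Doolittle scale, the function name `position_profiles` is hypothetical, and the three example windows (P4 through P3') are taken from sequences quoted in this text rather than from the full 46-site database.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed to be the index cited as ref. 39).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def position_profiles(sites):
    """Return (distinct residues, summed hydropathy) for each aligned position.

    `sites` is a list of equal-length strings aligned on the scissile bond.
    The hydropathy sum multiplies each residue's count by its index value,
    so frequent residues contribute proportionally more.
    """
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must be aligned to the same length")
    profiles = []
    for column in zip(*sites):
        counts = Counter(column)
        weighted_sum = sum(n * KYTE_DOOLITTLE[aa] for aa, n in counts.items())
        profiles.append((len(counts), weighted_sum))
    return profiles


# Illustrative P4..P3' windows only; not the full data set of the study.
example_sites = ["SFNFPQI", "ARLMAEA", "PFAAAQQ"]
for label, (distinct, hydropathy) in zip(
    ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'"], position_profiles(example_sites)
):
    print(label, distinct, round(hydropathy, 1))
```

A low count of distinct residues marks a restricted position, and a high summed hydropathy marks a position dominated by hydrophobic residues; with the full database both measures would be computed over positions P10 through P10'.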
The PR cleavage site sequences were also analyzed to ascertain if the presence of a certain amino acid at any one position predisposes the appearance of any particular amino acid at another position. This analysis was accomplished by correlating amino acids that appear frequently at specific positions between P5 and P5' with the frequent appearance of specific amino acids at other positions. The analysis indicated most processing site sequences could be grouped into two classes based on two patterns of sequence preference that were evident from positions P4 through P3'. Differences in flanking amino acid composition in the two classes correlated with the amino acid in the P1' position of the substrate. Based on this analysis, we have designated the two classes of cleavage sites as type 1 and type 2 sites. Type 1 sites have Pro in the P1' position (Table III), while type 2 sites have either Leu, Ala, or Val in the P1' position (Table IV). Differences in amino acid preference in the two types of sites were quite obvious. For example, type 1 sites frequently have Phe or Tyr in the P1 position while type 2 sites favor Leu over Phe or Tyr. In contrast to the differences in sequence preference between the two types of site, there were several sequence preferences that were apparently shared by both classes of sites. For example, amino acids with side chains branched at the β-carbon (Ile, Val) were excluded from the P1 position of all the processing site sequences (Table II).
The patterns of differences and similarities in the positions flanking the P1' amino acid in the two types of PR cleavage sites can be summarized as follows: P4, small amino acids in both types of sites (Pro, Ser, Thr, Gly, Ala); P3, generally uncharged with Gln seen in some type 1 sites and Arg/Lys seen in some type 2 sites; P2, aliphatic for both types of sites but with Asn occasionally seen and Val largely excluded from type 1 sites; P1, Tyr, Phe, Leu predominate in type 1 sites while Leu predominates in type 2 sites; P1', Pro (type 1) and the amino acids Ala, Leu, and Val (type 2) define the two types of sites; P2', aliphatic in both types (Ile is excluded from type 2 sites); P3', aliphatic in both types (Ala is favored in type 2 sites).
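The defining rule for the two classes, together with the observation about β-branched residues at P1, can be expressed as a small classifier. This is a minimal sketch, not the procedure used in the study: the function name `classify_site` and the "/" notation for the scissile bond are conventions adopted here for illustration, and only the P1' rule is encoded, not the flanking-sequence preferences.

```python
# P1' residues that define each class, as stated in the text.
TYPE1_P1_PRIME = {"P"}
TYPE2_P1_PRIME = {"A", "L", "V"}
# Residues branched at the beta-carbon that were absent from P1.
BETA_BRANCHED_ABSENT_FROM_P1 = {"I", "V"}


def classify_site(site):
    """Classify a site written as upstream/downstream, e.g. 'SFNF/PQI'.

    Returns (site type, whether P1 is one of the beta-branched residues
    reported as excluded from known sites).
    """
    upstream, downstream = site.split("/")
    p1, p1_prime = upstream[-1], downstream[0]
    if p1_prime in TYPE1_P1_PRIME:
        site_type = "type 1"
    elif p1_prime in TYPE2_P1_PRIME:
        site_type = "type 2"
    else:
        site_type = "unclassified"
    return site_type, p1 in BETA_BRANCHED_ABSENT_FROM_P1


for site in ("SFNF/PQI", "ARLM/AEA", "RQAG/FLG"):
    print(site, classify_site(site))
```

Sites whose P1' residue is none of Pro, Ala, Leu, or Val fall into the "unclassified" group, which corresponds to the roughly 20% of sites not assigned to either class.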
While approximately 80% of processing site sequences were either type 1 or type 2 sites by the composition of the P1' amino acid, some PR cleavage site sequences contained amino acids other than Pro, Leu, Val, or Ala at the P1' position. Nine such sites are listed in Table I. These sites as a group shared the general features of type 2 sites, although polar amino acids were common in the P3 position (not shown).
The argument for the existence of two classes of sites is strengthened by the observation that the classes of sites were, to a degree, conserved in their genomic location. The processing site located at the amino terminus of the capsid protein in Gag is always a type 1 site while the processing site at the carboxyl terminus of the capsid protein is usually a type 2 site and never a type 1 site (Table I). This conservation of processing site type and location is evident in all 10 retroviral genomes examined.

TABLE I (fragment)
GPGQKARLM/AEALKEALAP 43 PVNC
LAPAPIPFAA/AQQKGPRKPI 43 NC/Pl
MAKCPNRQAG/FLGLGPWGKK 43

[Table residue: per-position amino acid counts (rows Ser, Thr, Pro, Ala, Gly); column alignment lost in extraction.]

FIG. 1. [...] Table I. B, sum of the hydropathy index (39) of the amino acids found in each position is plotted for the 46 retroviral processing sites. In this analysis, the sum of the hydropathic index is calculated for each position by multiplying the number of each amino acid by its hydropathic value before summation to give a weighted value based on the frequency of appearance.
The sequence patterns associated with the type 1 and 2 sites located at the boundaries of the capsid protein are not different from other type 1 and 2 sites located elsewhere in the genome. One possible exception is the type 2 sites located at the carboxyl terminus of the capsid protein which frequently contain a basic amino acid in the P3 position. In type 2 sites located elsewhere in the genome, the P3 position is generally occupied by hydrophobic or uncharged polar amino acids.
Identification of PR Cleavage Sites by Sequence Searching-We were interested in developing computer methods to predict potential processing sites within Gag and Pol, based on the sequences flanking the 46 known processing sites. To preserve the relative importance of each amino acid at each position, the sequences from positions P4 to P3' were aligned relative to the scissile bond and a weight matrix was constructed from the number of occurrences of each amino acid at each position (29,30). The ANALYSEP program performs searches for those sequences which exceed a specified score when compared to such a weight matrix (30). We began by using this program to identify potential processing sites in a database consisting of the Gag, Pol, and PR amino acid sequences of all 10 retroviruses. When the weight matrix was derived from all 46 PR cleavage sites, 7 of the top 10 and 12 of the top 20 scores corresponded to known processing sites (Fig. 2A). Approximately half of the known processing sites in the Gag, Pol, and PR sequences of the 10 viruses were identified before the search efficiency began to decrease. When searches were performed using a weight matrix derived from only the type 1 sites (16 sites) or only type 2 sites (21 sites), 8 of the top 10 scores corresponded to known cleavage site sequences (Fig. 2, panels B and C). The processing sites identified in Gag for the 10 retroviral genomes are shown in Fig. 3A. Only those scores which fell within the efficient portion of the search (the region of the diagonal line in Fig. 2, B and C) are included.
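The weight-matrix search described above can be sketched as follows. This is an illustration under stated assumptions, not the ANALYSEP implementation: the scoring here simply sums the occurrence counts of the window's residues at each position, whereas the actual program's scoring function is not specified in this text, and the names `build_weight_matrix` and `scan_protein` are hypothetical.

```python
from collections import Counter


def build_weight_matrix(sites):
    """Count the occurrences of each amino acid at each aligned position.

    `sites` holds equal-length windows (for example P4 through P3')
    aligned on the scissile bond.
    """
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must be aligned to the same length")
    return [Counter(column) for column in zip(*sites)]


def scan_protein(protein, matrix, threshold):
    """Score every window of the protein against the matrix.

    Assumed scoring: the sum of the matrix counts for the window's residues.
    Returns (score, start index, window) for windows whose score exceeds
    the threshold, highest score first.
    """
    width = len(matrix)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        score = sum(column[aa] for column, aa in zip(matrix, window))
        if score > threshold:
            hits.append((score, start, window))
    return sorted(hits, reverse=True)


# Two illustrative training windows and a made-up test sequence.
matrix = build_weight_matrix(["SFNFPQI", "ARLMAEA"])
print(scan_protein("GGSFNFPQIGG", matrix, threshold=6))
```

Ranking the hits by score and asking how many of the top-ranked windows are known processing sites gives the kind of comparison reported for the top 10 and top 20 scores; building separate matrices from type 1 and type 2 sites only requires passing a different training list.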
None of the nine known cleavage sites that were not classified as type 1 or 2 were identified. However, 25 of the 33 type 1 and 2 sites were identified, as were three sequences not representing known processing sites. The efficiency of cleavage site recognition in Pol was much less than for Gag (data not shown).
The success of the searches at identifying known processing sites with a fair degree of reliability encouraged us to search five additional retroviral genomes, in which processing sites have not been experimentally identified, for potential processing site sequences in Gag. The retroviral genomes examined were diverse, consisting of three lentiviruses, a foamy retrovirus, and an oncovirus. The searches, conducted with databases derived from either type 1 or 2 cleavage site sequences, identified processing site sequences of equal or better quality than many of the known cleavage site sequences. Many of the type 1 cleavage site sequences are located in genomic positions where protease cleavage sites would be expected to occur (Fig. 3B). Searches for type 2 sites identified very few sequences when performed under the same stringent conditions as the type 1 sites. These results suggest known processing site sequences can be used with some success to identify likely processing sites in proteins, particularly type 1 sites.

[Table residue: per-position amino acid counts (rows Cys, Ser, Thr, Pro, Ala, Gly); column alignment lost in extraction.]

Test of P1 Specificity: Mutagenesis of the P1 Amino Acid in a Pol Cleavage Site-Analysis of the amino acids at the P1 position of retroviral processing sites indicated hydrophobic amino acids were preferred, with Leu, Phe, Tyr, and Met being the most common (Table II). Curiously, while Leu appeared over 30% of the time, the similar amino acids Val and Ile did not appear at all (Table II). A consideration of side-chain structure of the excluded amino acids indicated that the amino acids in the P1 position have the common feature of being both uncharged and unbranched at the β-carbon.
To test the significance of this observation, amino acid substitutions were introduced by site-directed mutagenesis into the P1 position of a type 1 processing site (SFNF/PQI) located at the amino terminus of the PR-encoding domain in the HIV-1 pol gene. This site, as well as the other sites in Pol, is cleaved by the PR when the entire pol gene is expressed in bacteria (32, 33, 40). The Phe codon for the P1 amino acid was mutagenized to encode 14 other amino acids. The mutated pol genes were expressed in E. coli under control of the lac promoter and the extent of processing determined by Western blot analysis (Fig. 4A). In this assay, the appearance of mature forms of the viral reverse transcriptase (64 and 51 kDa) is indicative of proper PR activity, while the size of the processed PR (normally 11 kDa) indicates whether processing has occurred at the mutated processing site. Three distinct phenotypes were observed depending on the amino acid substituted into the P1 position. Substitution of Leu for Phe permitted cleavage to occur at the mutated site at near wild-type levels as indicated by the intensity of the PR p11 band (Fig. 4B).

An intermediate phenotype, characterized by reduced amounts of mature p11, was observed when Ser and Gly were substituted at the P1 position. The Ser substitution apparently reduces the amount of mature p11 more than the Gly substitution. The remainder of the mutations tested had a negative phenotype for processing to generate mature PR (Fig. 4B).

FIG. 2. Identification of processing site sequences using pattern searches. Each identified sequence is assigned a quality score relating to the degree of similarity to a weight matrix derived from the processing site sequences in Table I (29). In this plot the number of correctly identified processing sites is plotted versus the total number of sites identified. A line with a slope of one indicates the identification of only known processing sites. A slope greater than one indicates the additional identification of other related sites. The point where the line turns in a vertical direction indicates the position where most of the identified sequences are not known processing sites. Searches were performed using processing site sequences from positions P4 through P3' to define the motif. Searches used all 46 processing site sequences (panel A), the 16 type 1 sites (panel B), or the 21 type 2 sites (panel C) to define the weight matrix.

Associated with the intermediate and negative phenotypes for processing at the mutated site was the presence of several larger products (the most abundant being 25, 29, 17, and 15 kDa) that were detectable with anti-PR antibody by Western blot (Fig. 4B). The larger products were present with all amino acid substitutions tested except Leu but were absent when the pol gene contained a mutation (D25I) that eliminates protease activity (data not shown), suggesting they result from protease activity. Several of the larger protein products can be predicted from the organization of the pol gene. The 25-kDa species was probably formed by cleavage at the carboxyl terminus of PR, resulting in the retention of the amino-terminal polypeptide that is normally cleaved from the PR. Substitutions negative or reduced for processing at the amino terminus of PR also exhibited reduced processing at the other sites in Pol (Fig. 5). The amount of processing at the distal sites appeared to correlate with the amount of processing at the mutated site. While bands representing the mature reverse transcriptase p51 and p64 were present with all mutations tested, many extra bands were also evident. These bands were also present when the pol gene contained a mutation (D25I) that eliminated protease activity (PR-), suggesting they result from E. coli proteinase activity on the Pol precursor.

FIG. 3. [...] The arrows indicate protein sequences identified as potential PR cleavage sites in a search using either type 1 sites (bold arrows) or type 2 sites (thin arrows) to derive the weight matrix. These sites have quality score values that fell within the diagonal range of the plots shown in Fig. 2, B and C. B, location and type of possible processing site sequences in Gag for retrovirus genomes in which processing sites have not been characterized. Arrows denote possible type 1 and 2 processing site sequences of equal or greater quality to those sequences identified as described for panel A.
These results show that substitutions that blocked processing at the mutated site reduced but did not eliminate protease activity.

DISCUSSION
To understand the sequence requirements that define specificity of the retroviral protease, we compared the sequences of cleavage sites from a number of retroviruses. Retroviral proteases recognize a heterogeneous collection of sequences as cleavage sites. Despite the heterogeneity, definite patterns of amino acid preference were found. Several schemes have been previously described to account for PR specificity (41-43). While the general features of those schemes are apparent in the patterns described here, the larger number of sites used here has allowed more detail to be added and has revealed the existence of two families of cleavage sites that are, in part, conserved in location in all retroviral genomes.
Because we selected sites from distantly related retroviruses, sequence specificity evolving within a family of closely related retroviruses would be overlooked in this classification.
Two Types of Cleavage Sites Are Determined by the P1' Amino Acid-Evidence supporting the concept of two distinct classes of cleavage site sequences was based primarily on two observations. First, in sites where the amino acid in P1' is Pro (type 1 sites), we found distinct differences in the flanking amino acid sequence compared to sequences in which the P1' position was occupied by Ala, Leu, or Val (type 2 sites) (Tables III and IV). Second, certain classes of cleavage sites are preferred at certain locations in all retroviral genomes. For example, the cleavage site sequence that generates the amino terminus of the major capsid protein is always a type 1 site. In contrast, the cleavage site that generates the carboxyl terminus of the major capsid protein is nearly always a type 2 site and never a type 1 site (Table I, Fig. 3A).
There are several possible reasons why two types of cleavage sites might exist. Because of the high conservation of the cleavage sites around the major capsid protein, it is possible that these sites are functionally different.
For HIV-1 it has been shown that the Gag precursor undergoes an ordered series of cleavage events (44, 45) that correlates with the different rates of cleavage at the different processing sites (27,46,47). The rate of cleavage may be influenced not only by the amino acid composition of the cleavage site but also by the structural context and surface accessibility of the processing site. However, it is not possible at this time to determine the role secondary structure or accessibility may play in protease cleavage site sequence selection.
Degeneracy within PR Cleavage Sites-The patterns observed in amino acid preference were less apparent when examining the sites individually. Only about 50% of the type 1 or 2 cleavage site sequences listed in Table I had flanking amino acids from positions P4 to P3' that were in complete agreement with the two general patterns for type 1 or 2 cleavage sites. The majority of the remainder of the sites differed by a single amino acid. A few sites even appeared to contain elements of both cleavage site families; for example, the HIV-1 NC/p6 site has Phe-Leu in the P1-P1' positions, and the MuLV p12E/p2E site has Val in P1' and Gln in P3.
With the exception of the cleavage sites flanking the major capsid protein, none of the other cleavage sites in Gag or Pol were conserved with respect to having a type 1 or 2 site at a specific location. It is possible that processing sites at other locations may not have the same structural and sequence constraints as the cleavage sites flanking the major capsid protein.
Structural Requirements of the P1 Amino Acid-Of the 46 processing sites considered, eight different amino acids appeared in the P1 position (Table II). However, Phe, Tyr, Leu, and Met accounted for nearly 90% of the sites, suggesting these amino acids best fit the sequence requirements of the P1 amino acid. Thus, the specificity of the amino acid side chain in the P1 position is best described as large, hydrophobic, and unbranched at the β-carbon. The absence of the hydrophobic amino acids Ile and Val from the P1 position of cleavage sites suggested that the structure of the P1 amino acid side-chain at the β-carbon is an important determinant of cleavage site specificity.
There are several possible reasons why an amino acid with a branched β-carbon may be excluded from the P1 position. First, a branched amino acid may make hydrolysis of the peptide bond difficult, as is seen with some peptide bonds. An example is the peptide bond following the amino acids Val and Ile, which is particularly resistant to cleavage by acid hydrolysis (48). While it is generally agreed that aspartic proteinases hydrolyze peptide bonds by general base catalysis (49), there may be similar constraints by the upstream amino acid side-chains in both mechanisms of hydrolysis. Second, it is possible a β-branched side-chain on the P1 amino acid may not allow proper association of the substrate with the S1 binding pocket of the PR. Third, an unbranched amino acid may be required for some structural feature of the processing site. One possible structural constraint could involve the assumption of the cis conformation by prolines in the P1' position. By analogy with retroviral PR cleavage sites, 50% of the prolines known to be in a cis conformation within proteins are preceded by the amino acids Phe, Tyr, and Leu, and only 3% are preceded by the branched β-carbon amino acids Ile, Val, and Thr (50).
Mutational analysis of the P1 position confirmed the sequence requirements of the P1 amino acid in a Pol cleavage site. Replacement with the β-branched amino acids Val, Ile, Pro, and Thr blocked processing at the mutated site and reduced the amount of processing at the other sites in Pol (Figs. 4B and 5). This is in agreement with an observation made by Kotler et al. (24) in which substitution of Ile in the P1 position of an avian myeloblastosis virus PR peptide substrate eliminated cleavage and the peptide functioned as an inhibitor. It is not known if the reduction of cleavage at the other processing sites in Pol seen with some mutations is the result of inhibition of protease activity by the mutated sequence or by the retention of a peptide at the amino terminus of protease. The ability of Ser and Gly to substitute, at least in part, for Phe may be related to the fact that these amino acids are found in the P1 position at other processing sites, although never with Pro in P1' (Table I). This result is in agreement with those of Partin et al. (51), in which a Ser substitution at the P1 position of the type 1 site at the amino terminus of PR in a native protein substrate allowed efficient processing in vitro by protease provided both in cis and in trans. However, a Ser substitution in the P1 position of the type 1 site located at the amino terminus of CA prevented cleavage at this site in vitro with both peptide and native protein substrates (51).
Of the amino acid substitutions we tested at the P1 position, only Leu permitted cleavage at or near wild-type levels. This suggests Leu may function as an equivalent replacement for Phe without a significant effect on cleavage. The examination of amino acid sequences from different HIV-1 isolates (52) also suggests Leu and Cys may be able to serve as functional equivalents for Phe in the P1 position of the processing site located at the amino terminus of PR. While Phe is found most often in the P1 position of this processing site in the different isolates, two isolates have Leu (HIVOYI and HIVHAN) and one isolate has Cys (HIVSF2) at the P1 position.