Proinsulin: A Proposed Three-Dimensional Structure

R. SNELL AND DEREK G. SMYTH
From the National Institute for Medical Research, Mill Hill, London, NW7 1AA, England

SUMMARY
Empirical analysis of the primary structures of 10 mammalian C-peptides has indicated a conservation of conformation. The positioning of the C-peptide in the proinsulin molecule is essentially defined by the proposed secondary structure, the covalent connection to the A1 and B30 residues of insulin, and the requirement that the C-peptide lies against the exposed surface of the insulin hexamer, allowing a three-dimensional structure for proinsulin to be predicted. Conserved residues in the C-peptide are proximate to residues in insulin that are also conserved, suggesting that interactions between these residues are highly probable. Residues in insulin thought to be important for biological activity are principally those that interact with the C-peptide residues. The roles of the C-peptide region of proinsulin in preventing expression of insulin activity and in nucleating the folding of the prohormone are discussed.
The three-dimensional structure of the hormone insulin has been reported (1, 2) but studies of the prohormone are at an early stage (3, 4). However, primary structures of proinsulin, which comprises the B- and A-chains of insulin together with the C-peptide that connects them (5, 6), are known for 10 mammalian species. While the insulin sequences are relatively conserved, the C-peptides exhibit wide variation, and on the basis of optical data it has been suggested that the C-peptide region of proinsulin may possess no ordered secondary structure (7, 8), though more recent data have indicated the presence of some α helix (9). We have applied to the C-peptides the empirical rules of Chou and Fasman (10, 11) to predict ordered structures and those of Lewis et al. (12) and Crawford et al. (13) to predict β bend regions from inspection of the amino acid sequences. The results indicate that the C-peptides possess common elements of secondary structure. By constructing a framework model of bovine C-peptide, and also of human C-peptide, it was found possible to fit the proposed structure to a model of the three-dimensional structure of bovine insulin. It was observed that the residues that are conserved in the C-peptides occupy positions in space which permit specific interactions between these and other conserved residues in the insulin moiety of the prohormone.

PROINSULIN MODEL
The empirical analysis of the C-peptide sequences predicts that all of the C-peptides possess a helical structure from positions C6 to C11 (in most cases the helical sequence is extended in one or both directions from this core), followed by a stretch of nonhelical structure from positions C13 to C19 (Fig. 1). Although the procedures for predicting β bends are more equivocal than those for predicting helical and β structures (13), it seems probable that the sequence Gly-Pro-Gly-X, which occurs in the central region of five of the C-peptides, forms a β bend; similarly the sequence Gly-Pro-Pro-Gln, which occurs in ox and pig C-peptides, may also form a β bend. The high rate of occurrence of proline at position 2 and glycine at positions 2 and 3 in β bends (13) suggests that all of the C-peptides contain a β bend at approximately positions C15 to C18. A second helical region from residues C21 to C27 is predicted in 9 of the 10 C-peptides; in bovine C-peptide an extra β bend is assigned to this region in place of the deleted helical residues at C22 to C26. This prediction contrasts with the suggestion of Fullerton et al. (3) that bovine C-peptide may contain two short helical regions.
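The kind of empirical scan described above can be illustrated with a much simplified sketch. It only averages helix propensities over a sliding hexapeptide window; it is not the full Chou and Fasman procedure (no nucleation and extension rules, no helix breakers, no β structure or β bend assignment). The propensity table is a commonly tabulated later set of P(α) values and may differ from the values used for the predictions reported here; the sequence is the widely reported human C-peptide, which is not listed in this text; the window length and the 1.03 cutoff are likewise assumptions chosen for illustration.

```python
# Simplified helix-propensity scan in the spirit of Chou and Fasman.
# ASSUMPTIONS: the P(alpha) table, the sequence, WINDOW and THRESHOLD are
# illustrative choices, not values taken from this paper.

P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}

HUMAN_C_PEPTIDE = "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ"  # C1..C31 (assumed)
WINDOW = 6        # hexapeptide window
THRESHOLD = 1.03  # mean P(alpha) cutoff for a helix-favoring window (assumed)


def helix_profile(seq, window=WINDOW):
    """Return (1-based start position, mean P(alpha)) for every window."""
    profile = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        mean = sum(P_ALPHA[aa] for aa in segment) / window
        profile.append((i + 1, mean))
    return profile


def helix_favoring_windows(seq, window=WINDOW, threshold=THRESHOLD):
    """Return only the windows whose mean propensity reaches the threshold."""
    return [(s, m) for s, m in helix_profile(seq, window) if m >= threshold]


if __name__ == "__main__":
    for start, mean in helix_profile(HUMAN_C_PEPTIDE):
        flag = "helix-favoring" if mean >= THRESHOLD else ""
        print(f"C{start}-C{start + WINDOW - 1}: {mean:.2f} {flag}")
```

With these assumed inputs, every window starting between C8 and C19 falls below the cutoff, while helix-favoring windows appear on either side of the glycine-rich center. That is qualitatively in line with the two ordered regions described above, but the sketch should not be read as reproducing the published assignments.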
Before fitting the assigned secondary structure of the C-peptides onto a model of bovine insulin, certain assumptions were made. The conformation of the A- and B-chains of insulin was considered to be the same in the prohormone as in the hormone. This is consistent with physical studies of the molecules using CD and ORD (7) and laser Raman spectroscopy (9). For purposes of comparison it was further assumed that the three-dimensional structure of insulin is the same in the 10 species shown in Fig. 1 (though guinea pig insulin may prove an exception). The C-peptide, flanked by the two pairs of basic residues, has to be placed between the B30 alanine and the A1 glycine (5), which in bovine insulin lie approximately 10 Å apart (1). Proinsulin, like insulin, forms a stable dimer and in the presence of zinc ions forms a stable hexamer (7, 24, 25); and proinsulin has been co-crystallized in a 1:1 complex with insulin (4). Clearly the C-peptide in proinsulin does not mask the dimer and hexamer aggregation sites of insulin. It is a reasonable conclusion, therefore, that the C-peptide lies against the area of insulin that is on the surface of the hexamer.
Working within these constraints, it proved possible to build a model of proinsulin in which the C-peptide fits well onto the known insulin structure.
In the model (Fig. 2), the first helical section of the C-peptide fits readily into a pocket formed by the A-chain from residues A1 to A13. The "random" region and the β bend of the C-peptide lie on the outside of the molecule and follow the course of the remainder of the A-chain. The C-peptide chain continues into a second region of ordered structure, in most cases another helical section, the axis of which runs from the B22 arginine to the A1 glycine nearly parallel to the axis of the B-chain from residues 22 to 29. In the case of the bovine molecule, the same general course is followed by the peptide chain with the β bend compensating for the absent helical residues. The area of the insulin molecule covered by the C-peptide does not include the side chains involved in dimer formation (B24, B26 and parts of the B-chain helix) or hexamer formation (B14 Ala, B17 Leu, B18 Val, and B1 Phe with A13 Leu and A14 Tyr) or those that occur at the center of the insulin hexamer (B9 Ser, B10 His, and B13 Glu). The C-peptide does not interact with any of the aromatic residues of insulin, though A14 Tyr is proximate to the C15 and C16 residues. Line diagrams demonstrating the course taken by the polypeptide chain of bovine and human C-peptide are shown in Figs. 3 and 4 superimposed on the course taken by the A- and B-chains of bovine insulin.

FIG. 2. Framework model of the proposed three-dimensional structure of bovine proinsulin. The model was constructed from components described by C. F. Dare (26).

INTERACTIONS
With the C-peptide in position on the insulin molecule, a number of interactions between specific groups in the C-peptide and other groups in the prohormone become apparent. In the model, the carboxylate of glutamic acid at position C1 is exposed to solvent and lies adjacent to the guanidino group of arginine at BC1; similarly the carboxylate of glutamic acid at C3 is on the outside of the molecule and can interact with the guanidino group of arginine at position BC2. The γ-carboxamide of glutamine at C6 is buried under the first C-peptide helix and can align with the corresponding side chain of glutamine at A5. The carboxylate of glutamic acid at C11 is on the outside of the prohormone in a position to form a salt bridge with arginine at CA2, and the carboxylate of glutamic acid at C27 is adjacent to the guanidino group of arginine at B22. The carboxamide side chain of the glutamine residue at C31 is capable of interacting with the asparagine side chain at A21 in the interior of the molecule. In addition, leucine at position C21 is present in all species; it occurs at the NH2-terminal end of the second ordered section of the C-peptide and presumably forms important interactions with adjacent hydrophobic residues. Of the 11 polar side chains in human C-peptide, 8 occupy positions on the outside of the proinsulin molecule; of the 11 hydrophobic residues, 7 are buried in the interior. The remaining residues are solvated glycines and prolines. The observed ionic and hydrogen bond interactions, along with the buried hydrophobic contacts, would stabilize the binding of the C-peptide to insulin in the prohormone.

FIGS. 3 AND 4 (caption fragment). ... from atomic coordinates using the models of bovine and human proinsulin in Fig. 2. Residues in the C-peptides are numbered as in Fig. 1. The C-peptide is denoted by the heavy lines.
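The residue counts quoted above for human C-peptide (11 polar, 11 hydrophobic, the remainder glycine and proline) can be checked with a short tally. The sketch below assumes the widely reported human C-peptide sequence, which is not listed in this text, and a simple three-way grouping of side chains (for example, alanine counted as hydrophobic and serine as polar); both are illustrative assumptions rather than the authors' stated classification.

```python
from collections import Counter

# ASSUMPTION: widely reported human C-peptide sequence, C1..C31.
HUMAN_C_PEPTIDE = "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ"

# ASSUMPTION: a simple side-chain grouping chosen for this illustration.
POLAR = set("DEKRHNQST")
HYDROPHOBIC = set("AVLIMFWYC")
GLY_PRO = set("GP")


def classify(aa):
    """Assign a one-letter residue code to one of three classes."""
    if aa in POLAR:
        return "polar"
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in GLY_PRO:
        return "glycine/proline"
    raise ValueError(f"unrecognized residue code: {aa!r}")


def composition(seq):
    """Count residues of each class in the sequence."""
    return Counter(classify(aa) for aa in seq)


if __name__ == "__main__":
    for label, count in composition(HUMAN_C_PEPTIDE).items():
        print(f"{label}: {count}")
```

Under these assumptions the tally gives 11 polar and 11 hydrophobic residues, with the 9 remaining positions occupied by glycine or proline, which agrees with the counts in the text.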
It is a striking fact that the amino acid residues that have been assigned to form interacting pairs in the model are essentially conserved in the C-peptides and in the insulins of all 10 species. In insulin there are 8 conserved residues on the surface of the molecule not involved in the dimer or hexamer association: they are at positions A1, A4, A5, A19, A21, B13, B22, and B25 (27). In the model the A1 glycine is covalently attached to the COOH terminus of the C-peptide (through lysyl arginine), A5 glutamine is proposed to interact with C6 glutamine, A21 asparagine can interact with the amide of C31 glutamine, and B22 arginine is postulated to interact with C27 glutamic acid. Of the other residues, the B13 glutamic acid is on the inner surface of the insulin hexamer and B25 phenylalanine probably stabilizes the association between monomers in the dimer (2). The only exposed conserved residues not assigned to participate in interacting pairs are glutamic acid at A4 and A19 tyrosine. The former is already paired with B29 lysine in the x-ray structure of insulin; the latter, while exposed in insulin, is concealed by C-peptide in the prohormone.
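The accounting in this paragraph can be laid out as a small bookkeeping sketch. The residue labels and partners are those given in the text; the dictionary names and the helper function are hypothetical and exist only for this illustration.

```python
# Conserved surface residues of insulin listed in the text (27).
CONSERVED_SURFACE = {"A1", "A4", "A5", "A19", "A21", "B13", "B22", "B25"}

# Residues tied to the C-peptide in the model, with the stated partner.
C_PEPTIDE_PARTNERS = {
    "A1": "covalent link to the C-peptide COOH terminus (via Lys-Arg)",
    "A5": "C6 Gln",
    "A21": "C31 Gln",
    "B22": "C27 Glu",
}

# Residues the text explains by insulin's own aggregation surfaces.
AGGREGATION_ROLES = {
    "B13": "inner surface of the hexamer",
    "B25": "monomer-monomer association in the dimer",
}


def unassigned(conserved, *explained_groups):
    """Return conserved residues not covered by any of the given groups."""
    remaining = set(conserved)
    for group in explained_groups:
        remaining -= set(group)
    return remaining


if __name__ == "__main__":
    left_over = unassigned(CONSERVED_SURFACE, C_PEPTIDE_PARTNERS, AGGREGATION_ROLES)
    print(sorted(left_over))
```

The two residues left over are A4 and A19, the same pair the text then accounts for separately (A4 paired with B29 lysine, A19 concealed by the C-peptide).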
The interaction of glutamic acid at C27 and arginine at B22 deserves further consideration.
Guinea pig C-peptide does not have an acidic residue at C27, the position being occupied by glutamine (17); and guinea pig insulin is unique among mammalian insulins in having aspartic acid in place of arginine at B22 (28). Hence the electrostatic interaction proposed between B22 and C27 cannot occur in this species. Furthermore, the conformation of guinea pig insulin is probably different from other mammalian insulins because of the atypical amino acid sequence. In all insulins, except guinea pig, arginine at B22 appears to form a salt bridge with the carboxylate of asparagine at A21 (2, 29) and the proximity of the two functional groups seems important for the expression of biological activity. Thus des-Ala (B30) des-Asn (A21) bovine insulin is only weakly active in mouse convulsion assays (30). When assayed in the guinea pig, on the other hand, des-Ala des-Asn insulin exhibits substantial biological activity.² It is clear that guinea pig insulin, which lacks the guanidino function at B22, does not have the same requirement for asparagine at A21 as the other mammalian insulins; and guinea pig proinsulin does not have a requirement for glutamic acid at C27.
In the general case, Arg B22 in the hormone appears to interact with the carboxylate of Asn A21 but in the prohormone Arg B22 could equally interact with the carboxylate of Glu C27. The driving force for the transference of the salt bridge could come from displacement of the asparagine at A21 by an interaction between its carboxamide side chain and that of Gln C31; in this context the side chain at Leu C21 may contribute a hydrophobic environment for these interactions. An exception to the conserved interacting residues occurs in horse C-peptide, where an alanine residue replaces the glutamic acid normally present at C27 (19).³ The related mammals, horse, pig, and ox, however, show a greater variability in the COOH-terminal region of their C-peptides and horse proinsulin may have other less obvious compensating interactions in this region. In the case of dog C-peptide, it has been suggested that the product isolated from pancreas represents residues C9 to C31 and that an octapeptide may be missing from the NH2 terminus (16). This seems likely as the proinsulin model could not accommodate such a short C-peptide with retention of the proposed interactions common to the other C-peptides.
These proposals are based on empirical predictions. They stand on the complementarity of the fit of the C-peptide to insulin and on the large number of interactions that can occur between conserved residues. Unambiguous determination of the structure will come from an x-ray crystallographic study of proinsulin; but the position of C-peptide with respect to the three-dimensional structure of insulin is open to experimental verification. Certain residues may be exposed in the hormone, yet buried in the prohormone. In the model the COOH-terminal asparagine residue of proinsulin is less accessible than the same residue in insulin because of the many C-peptide interactions that can occur in this region. Therefore it would be expected that carboxypeptidase A would release asparagine more readily from insulin than from proinsulin. Slobin and Carpenter (31) studied the rates of release of asparagine from insulin by the action of carboxypeptidase. In parallel experiments using zinc-free porcine insulin and proinsulin, we have observed that at pH 8.5, 37°, with an enzyme concentration of 6.3 µg of carboxypeptidase A/ml, the release of asparagine from insulin (120 µg/ml) was essentially complete in 30 min; from proinsulin (216 µg/ml) less than 10% of the COOH-terminal asparagine was released under the same conditions. Assuming first order kinetics at the concentrations used, asparagine is released from proinsulin 10 times more slowly than from insulin.

³ D. E. Massey, C. R. Snell, and D. G. Smyth, unpublished experiments.
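The first order comparison can be made explicit with a short calculation. For first order release the fraction liberated after time t is f = 1 - exp(-kt), so k = -ln(1 - f)/t. The sketch below is only an illustration: the text gives "less than 10%" for proinsulin and "essentially complete" for insulin, so the insulin fractions tried here (0.65, 0.90, 0.99) are hypothetical readings, not measured values.

```python
import math


def first_order_rate_constant(fraction_released, minutes):
    """Rate constant k (per minute) from f = 1 - exp(-k * t)."""
    if not 0.0 <= fraction_released < 1.0:
        raise ValueError("fraction_released must be in [0, 1)")
    return -math.log(1.0 - fraction_released) / minutes


def rate_ratio(f_insulin, f_proinsulin, minutes=30.0):
    """Ratio k(insulin) / k(proinsulin) for the same incubation time."""
    k_ins = first_order_rate_constant(f_insulin, minutes)
    k_pro = first_order_rate_constant(f_proinsulin, minutes)
    return k_ins / k_pro


if __name__ == "__main__":
    f_pro_upper = 0.10  # "less than 10%" released from proinsulin
    # ASSUMPTION: hypothetical readings of "essentially complete".
    for f_ins in (0.65, 0.90, 0.99):
        print(f"f_insulin = {f_ins:.2f}: ratio = {rate_ratio(f_ins, f_pro_upper):.1f}")
```

Because the proinsulin figure is an upper bound, each ratio computed this way is a lower bound. Under these assumptions a 10-fold difference corresponds to roughly 65% release from insulin in 30 min, and any more complete release implies a larger factor, so the quoted 10-fold figure reads as a conservative estimate.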

C-PEPTIDE FUNCTION
If the prediction of a high degree of ordered structure in the C-peptide is correct, then this region of proinsulin is likely to be important as a nucleation center for folding of the protein during biosynthesis. The helical regions would serve as the principal nucleation centers, with the β bends acting as directing regions for further interactions between the secondary structure components of the molecule. The C-peptide follows a course that is close to many residues of the A-chain and to the COOH-terminal nonapeptide of the B-chain. The C-peptide could act to bring these less ordered regions into the spatial arrangement necessary for correct disulfide formation.
Another nucleation center is probably the helix of the B-chain which runs between the two B-chain disulfides and lies on the distal side of the proinsulin molecule. After biosynthesis of the prohormone, complete with its intrachain disulfide bridges, the C-peptide is excised by intracellular enzymes to liberate the active hormone.
It is notable that the C-peptide appears to interact specifically with a number of residues in insulin that are considered important for biological activity, e.g. A1 Gly, A5 Gln, A21 Asn, and B22 Arg (32). Furthermore, the prohormone in in vitro assays, where activation to the hormone is less likely, has been reported to exhibit only 10% of the biological activity of insulin (33). Thus the attachment of the C-peptide to insulin has the effect of masking the activity of the hormone in a highly specific manner.
The recent experimental observations of Yu and Kitabchi (33) on the in vitro biological activity of modified proinsulins are consistent with the above hypothesis, that the C-peptide covers the area of insulin that is important for biological function. It was reported that cleavage of the chymotrypsin-sensitive bond at Leu C24-Ala C25, or removal of the dipeptide at BC1-BC2, does not lead to an increase in the specific activity of proinsulin. In contrast, when the dipeptide CA1-CA2 was removed the activity of bovine or porcine proinsulin increased 3- to 4-fold. With porcine proinsulin, removal of 11 more residues from C22 to C31 led to little further increase in activity. On the basis of the proinsulin model, the "split" proinsulin, with the Leu-Ala cleavage, could exhibit a similar conformation to intact proinsulin, with the two halves of the C-peptide still able to bind to insulin and block access to the active region of the hormone. In the proinsulin lacking BC1 and BC2, the C-peptide again may remain bound without the covalent attachment at the B30 alanine. The partial recovery of biological activity in the proinsulins cleaved at the COOH terminus of the C-peptide, on the other hand, may reflect the more exposed nature of the second ordered region which, in the absence of the covalent linkage at A1 and the basic dipeptide, is readily dissociated from the active region on the hormone. In this case, full activity is not recovered because the more buried, first helical region may remain bound and prevent perfect complementarity with the receptor.

In summary, empirical analysis has indicated a conservation of conformation among divergent sequences of mammalian C-peptides. The retention of conformational elements, in turn, has focussed attention on residues that may have functional importance and are essentially conserved in the species studied.
The importance of C-peptide as a nucleation center for the folding of proinsulin is suggested from the high percentage of ordered structure in this region. The positioning of the C-peptide in the proinsulin molecule is defined by the proposed secondary structure, its covalent connection at the A1 and B30 residues of insulin, and the requirement that it lies against the exposed surface of the insulin hexamer. The conserved residues in the C-peptides are proximate to residues in insulin which are also conserved, suggesting that interactions between these residues are highly probable.
Residues considered important for biological activity in insulin are, in many cases, the same residues that interact with C-peptide, and proinsulin itself has little intrinsic biological activity. It is an attractive hypothesis that the specific interactions that take place between insulin and the C-peptide may also take place between insulin and the complementary "insulin receptor protein" at the site of action of the hormone.