“Register-shift” insulin analogs uncover constraints of proteotoxicity in protein evolution

Globular protein sequences encode not only functional structures (the native state) but also protein foldability, i.e. a conformational search that is both efficient and robustly minimizes misfolding. Studies of mutations associated with toxic misfolding have yielded insights into molecular determinants of protein foldability. Of particular interest are residues that are conserved yet dispensable in the native state. Here, we exploited the mutant proinsulin syndrome (a major cause of permanent neonatal-onset diabetes mellitus) to investigate whether toxic misfolding poses an evolutionary constraint. Our experiments focused on an invariant aromatic motif (PheB24–PheB25–TyrB26) with complementary roles in native self-assembly and receptor binding. A novel class of mutations provided evidence that insulin can bind to the insulin receptor (IR) in two different modes, distinguished by a “register shift” in this motif, as visualized by molecular dynamics (MD) simulations. Register-shift variants are active but defective in cellular foldability and exquisitely susceptible to fibrillation in vitro. Indeed, expression of the corresponding proinsulin variant induced endoplasmic reticulum stress, a general feature of the mutant proinsulin syndrome. Although not present among vertebrate insulin and insulin-like sequences, a prototypical variant ([GlyB24]insulin) was as potent as WT insulin in a rat model of diabetes. Although in MD simulations the shifted register of receptor engagement is compatible with the structure and allosteric reorganization of the IR-signaling complex, our results suggest that this binding mode is associated with toxic misfolding and so is disallowed in evolution. The implicit threat of proteotoxicity limits sequence variation among vertebrate insulins and insulin-like growth factors.

The structure and function of globular proteins are determined by their sequences. Yet the informational content of such sequences must also encode determinants of folding efficiency, providing safeguards against toxic misfolding (1). Such safeguards may be inapparent once the native state is reached (2). In this study, we demonstrate that insulin can in principle exhibit two modes of receptor binding, but only one protects from toxic misfolding. Evidence is provided that the excluded mode would enhance the risk of pancreatic β-cell dysfunction and diabetes mellitus (DM) due to endoplasmic reticulum (ER) stress (3).
Our study was motivated by the hypothesis that some residues may function as critical "safeguards" in protein biosynthesis yet be dispensable in the native state (13). Two classes of safeguards have previously been identified (14). The first, prodomains, function within the primary translation product but are removed during biosynthesis (15). An example is the connecting (C) domain of proinsulin, which markedly favors native disulfide pairing (16) relative to combination of isolated A and B chains (17). Such internal catalysis enables native folding despite the intrinsic amyloidogenic properties of A and B chain segments (18). The second class of safeguards consists of residues in the mature protein that are required for foldability but otherwise dispensable once the native state is reached (13). Although such residues may be overlooked in functional screens in vitro, they can be conserved due to critical (yet unseen) roles in erecting kinetic barriers to toxic aggregation (12). Insulin provides a model for studies of protein evolution (19). Biological selection on insulin is strict owing to the hormone's physiological importance (20). Indeed, large quantities of the hormone must be expressed and stored in β-cells (21). The foldability of proinsulin is nonetheless only precariously maintained; the majority of human cell lines cannot efficiently fold proinsulin (3), highlighting the specialized milieu of the β-cell. Even in β-cells, overexpression of proinsulin (e.g. in response to insulin resistance, as often seen in obesity (22)) can induce ER stress (23), ultimately leading to β-cell dysfunction and death (24). This process is central to the natural history of type 2 DM (24). The current pandemic of "diabesity" reflects societal changes that are rapid on the evolutionary time scale (25).
Given that even wild-type (WT) proinsulin lies near the threshold of ER stress (21), it is not surprising that mutations might impair foldability at physiological levels of expression (i.e. in the absence of insulin resistance (26)). Such mutations define a monogenic cause of DM, designated the "mutant proinsulin syndrome" (also known as mutant INS-gene-induced diabetes of youth (MIDY) (27)). The first such mutation, CysA7→Tyr, identified as an autosomal DM locus in the Akita mouse (28), was subsequently observed in human neonatal-onset DM (29). Diverse clinical mutations have since been identified that remove or introduce a Cys, leading in either case to an odd number of thiols (30). The variant proinsulin must therefore contain an unpaired cysteine, in principle mediating aberrant disulfide interchange and intermolecular disulfide bridges (31). Such a variant proinsulin interferes with the folding, trafficking, and secretion of the WT protein (32), leading in the first year of life to unremitting ER stress, progressive β-cell dysfunction, apoptosis, and permanent DM (33).
Of particular interest are DM-associated mutations in proinsulin not involving cysteine. Whereas patients with Cys-related mutations invariably present with neonatal DM, non-Cys-related mutations are associated with ages of onset ranging from neonatal to the third decade of life; furthermore, the latter pedigrees typically exhibit incomplete penetrance (34). Such non-Cys-related mutations presumably perturb, to varying extents, key structural contacts stabilizing on-pathway folding intermediates (21), such as the B8 dihedral angle (35). The present study was broadly stimulated by a mild MIDY mutation, PheB24→Ser (36). This variant was originally described as an insulinopathy (insulin Los Angeles (37)) owing to its isolation from the proband's serum, implying at least partial preservation of prohormone processing and secretion (36). [SerB24]insulin exhibits decreased but non-negligible receptor-binding affinity (38). This decrement is unlikely to explain patient phenotypes; rather, the age of DM onset correlates with the extent of ER stress induced on expression of the variant proinsulins (32).
PheB24, invariant among vertebrate insulins (Fig. S1) and insulin-like growth factors (IGFs), anchors the B20-B23 β-turn in the free hormone (Fig. 1, A and B) (39) and directly contacts the IR (40). In the classical structure of insulin, this turn enables the C-terminal B chain β-strand (residues B24-B28) to pack against the conserved side chains of IleA2, ValA3, LeuB11, ValB12, and LeuB15. These contacts seal the hydrophobic core of the α-helical domain. On insulin self-assembly, the β-strand forms a dimer-related antiparallel β-sheet (Fig. S2) (39,41). On receptor binding, a conformational change ensues: rotation of the B20-B23 β-turn and adjoining PheB24-PheB25 element leads to detachment of the C-terminal B chain β-strand from the core. The β-strand itself packs in a groove between IR-ectodomain elements L1 and αCT (Fig. S3A), wherein the side chain of PheB24 packs within a nonpolar pocket; the side chains of PheB25 and TyrB26 occupy more peripheral sites (Fig. S3, A and B) (40). Although the B24-B25-B26 "triplet" of aromatic residues is a hallmark of the vertebrate insulin family (20,42,43), this motif is not present among invertebrate homologs (44).
Substitutions at B24-B26 have uncovered distinct and specific side-chain determinants of IR binding. The B24-binding pocket (defined by residues in L1, αCT, and the central B chain α-helix) optimally accepts Phe but can also accommodate cyclohexylalanine (Cha), Met, or a branched aliphatic side chain; Tyr, His, and Trp are disfavored despite their aromaticity (39,45). The B25-binding cleft in the αCT domain, by contrast, requires aromatic side chains; Phe, Tyr, and Trp each confer high affinity (46,47). Despite the conservation of TyrB26 (or Phe among IGFs) (48), the solvent-exposed B26-binding surface accommodates diverse charged, polar, or aromatic side chains (42,48). The striking preservation of the B24-B26 aromatic triplet among vertebrate insulins and IGFs, conserved for more than 500 million years (49), is believed to represent intersecting evolutionary constraints imposed by function, foldability, assembly, and stability.
We address here a long-standing anomaly: the native activity of [GlyB24]insulin (42,45). Given the structure of the hormone-receptor interface (Fig. 1D) (40), substitution of PheB24 by Gly would seem to leave a destabilizing cavity (blue box in Fig. 1E). How might native affinity and activity be regained? A potential framework is provided by the register-shift model: an alternative receptor-binding mode of insulin in which residue B24 reorients as part of a noncanonical five-residue chain reversal (B20-B24), in turn enabling residues B25-B27 to engage the respective B24-, B25-, and B26-binding pockets of the IR (39,50,51). This model posits that the introduction of GlyB24 (or D-amino acid substitutions at B24 (45,52)) destabilizes the classical four-residue β-turn (B20-B23) (53,54). Such a shift would (i) preserve insertion of a Phe within the B24-binding pocket (PheB25), (ii) deploy a well-tolerated Tyr within the B25-binding pocket (TyrB26), and (iii) exploit the promiscuity of the solvated B26-binding surface to accommodate a small polar side chain (ThrB27) (illustrated in Fig. 1F).

Cryptic constraints in protein evolution
This study provides mutational evidence in support of the register-shift model and suggests that this alternative mode of binding has been excluded in the evolutionary history of the vertebrate insulin family by toxic protein misfolding. Our results are discussed in relation to molecular dynamics (MD) simulations of [GlyB24]insulin/IR ectodomain complexes based on the crystal structure of insulin bound to its primary IR-binding site (PDB entry 4OGA) (40). Analysis of an excluded evolutionary path highlights the implicit role of PheB24 as a conserved safeguard of native foldability.

Results
Insulin analogs were prepared by trypsin-catalyzed semisynthesis (55). To eliminate the native LysB29 tryptic site during semisynthesis, the analogs contained ornithine (Orn) at position B29 (39). Because of the chemical similarities between Lys and Orn, [OrnB29]insulin exhibits native-like biological and biophysical properties. In accordance with the classical studies of Tager and co-workers (42,45), substitution PheB24→Gly preserved IR-binding affinity in a competitive displacement assay (Table 1 and Fig. S4A). SerB24, by contrast, impaired binding by ~25-fold. Although similar qualitative trends in biological activity were observed (as assessed in rats rendered diabetic by streptozotocin (STZ) (39)), the activity of the SerB24 analog was less markedly impaired in vivo, presumably due to the compensating effect of delayed clearance from the bloodstream (as documented in human subjects) (56). The activity of [GlyB24,OrnB29]insulin in STZ rats was indistinguishable from that of control [OrnB29]insulin (Fig. S4B).
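Competitive displacement data of this kind are often summarized with a one-site model in which percent tracer bound falls sigmoidally with analog concentration. The Python sketch below is a minimal illustration under that assumption (Hill slope of 1, hypothetical parameter values); it is not the fitting procedure used in this study, and the names `percent_tracer_bound` and `relative_affinity` are illustrative. It shows how a fold-change such as the ~25-fold decrement for SerB24 translates into a relative affinity.

```python
def percent_tracer_bound(analog_conc, ic50, top=100.0, bottom=0.0):
    """Percent tracer bound in a one-site competitive displacement model.

    Assumption: simple logistic form with a Hill slope of 1; `top` and
    `bottom` are the plateaus with no competitor and saturating competitor.
    """
    return bottom + (top - bottom) / (1.0 + analog_conc / ic50)


def relative_affinity(kd_reference, kd_analog):
    """Affinity relative to the reference hormone (1.0 = native-like)."""
    return kd_reference / kd_analog


if __name__ == "__main__":
    # Hypothetical values for illustration only (not data from this study).
    print(percent_tracer_bound(1.0, ic50=1.0))  # half of the tracer displaced at the IC50
    print(relative_affinity(1.0, 25.0))         # a 25-fold weaker binder
```

A relative affinity near 1.0 denotes native-like binding, as reported here for GlyB24, whereas 0.04 corresponds to a 25-fold loss.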

Classical SARs underlie specific predictions of the register-shift model
Variant insulins were designed to test the "register-shift" model of [GlyB24]insulin-IR engagement. These analogs exploited the following structure-activity relationships (SARs) at positions B24-B26.
Position B24: The B24-binding pocket at the hormone-receptor interface (40) has strict size, polarity, and geometric requirements; analogs with γ-branched aliphatic or small aliphatic residues (Leu, Cha, or Phe) at the B24 position nonetheless retain significant IR affinity. By contrast, the para-OH group of Tyr impairs affinity by more than 10-fold (39).
Position B25: No classical pocket exists at the B25-cognate surface of the receptor. Binding is markedly impaired by a tetrahedral γ-carbon in the B25 side chain (as in Leu or Cha), highlighting a geometric requirement for trigonal sp2 hybridization at Cγ (46,47). Shorter side chains lacking a γ-carbon (such as Ala or Ser) likewise lead to low activity.
Position B26: The B26-related IR surface can accommodate diverse (but not all) side chains. Despite broad conservation of TyrB26 among vertebrate insulins (42), small, polar side chains are preferred in vitro (48). Such conservation appears to reflect the contribution of TyrB26 to native self-assembly (48), a critical feature of insulin storage in the secretory granules of pancreatic β-cells (57).
Given these distinct site-specific SARs, the register-shift mode would shift each SAR by one residue (Fig. 1, D-F). For example, whereas a tetrahedral γ-carbon in the B25 side chain (Leu or Cha) would ordinarily impede the binding of insulin, the same modification in the context of a GlyB24 analog might be "rescued," as the variant B25 side chain could then be well accommodated within the B24-binding pocket. Similarly, whereas SerB26 ordinarily confers high activity, in a shifted complex SerB26 would be directed onto the incompatible B25-binding surface (Fig. S5).
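To make the logic of these predictions explicit, the Python sketch below encodes each receptor site as a set of tolerated side chains and assigns residues to sites in either the canonical or the shifted register. The tolerated sets are a deliberately coarse, hypothetical reading of the SARs summarized above (not affinity data), and the rule that Gly at B24 triggers a one-residue shift is the model's premise rather than an established mechanism; D-amino acid substitutions are not modeled. The names `occupancy` and `predicted_compatible` are illustrative.

```python
# Native residues at B24-B27 of human insulin (three-letter codes; "Cha" = cyclohexylalanine).
WT_B24_TO_B27 = {24: "Phe", 25: "Phe", 26: "Tyr", 27: "Thr"}

# Hypothetical, qualitative "tolerated" sets distilled from the SARs above.
TOLERATED = {
    "B24 pocket": {"Phe", "Cha", "Leu", "Met"},   # nonpolar pocket; Tyr, His, Trp disfavored
    "B25 surface": {"Phe", "Tyr", "Trp"},         # requires a trigonal (sp2) C-gamma
    "B26 surface": {"Tyr", "Phe", "Ser", "Thr"},  # solvent-exposed and promiscuous
}
SITES = ["B24 pocket", "B25 surface", "B26 surface"]


def occupancy(substitutions):
    """Return the register and the residue placed at each receptor site.

    Model premise: Gly at B24 shifts the register by one, so B25-B27 engage
    the B24-B26 sites; otherwise B24-B26 engage them canonically.
    """
    seq = {**WT_B24_TO_B27, **substitutions}
    shifted = seq[24] == "Gly"
    first = 25 if shifted else 24
    sites = {site: seq[first + i] for i, site in enumerate(SITES)}
    return ("shifted" if shifted else "canonical"), sites


def predicted_compatible(substitutions):
    """True if every site receives a tolerated side chain (qualitative only)."""
    _, sites = occupancy(substitutions)
    return all(res in TOLERATED[site] for site, res in sites.items())


if __name__ == "__main__":
    analogs = {
        "LeuB25": {25: "Leu"},
        "GlyB24,LeuB25": {24: "Gly", 25: "Leu"},
        "TyrB25": {25: "Tyr"},
        "GlyB24,TyrB25": {24: "Gly", 25: "Tyr"},
        "SerB26": {26: "Ser"},
        "GlyB24,SerB26": {24: "Gly", 26: "Ser"},
    }
    for name, subs in analogs.items():
        register, sites = occupancy(subs)
        print(f"{name:15s} {register:9s} {sites} compatible={predicted_compatible(subs)}")
```

Run as a script, the sketch flags LeuB25 as incompatible in the canonical register but compatible once GlyB24 shifts it into the B24 pocket, while GlyB24 renders TyrB25 and SerB26 incompatible. This is the qualitative pattern the model predicts, not a measurement.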

Comparative studies of paired insulin analogs provide critical tests of the model
To test such predictions, a series of paired PheB24/GlyB24 analogs was prepared. Probes of the respective B24-B26-binding sites were provided by ChaB25, LeuB25, TyrB25, and SerB26 (Fig. S5). The pattern of receptor-binding affinities was in broad agreement with the predictions (Fig. 2 and Table 2, column 3). Particularly striking is the rescue by GlyB24 of otherwise unfavorable B25 substitutions, as exemplified by the tetrahedral γ-carbons of LeuB25 and ChaB25. Conversely, introduction of GlyB24 rendered TyrB25 and SerB26 (ordinarily well tolerated) unfavorable. Whereas this pattern of affinities would seem paradoxical in the context of the native hormone/receptor complex (40), register-shifted SARs provide a unifying and coherent account.
To assess the in vivo relevance of these receptor-binding studies, biological activities were tested in STZ rats (Fig. 3 and Table 2, column 2). Pharmacodynamic (PD) results were broadly in accordance with the in vitro data, including GlyB24 rescue of the minimally active [LeuB25]insulin and [ChaB25]insulin analogs. Indeed, the double mutants exhibited activities similar to those of the corresponding B24 analogs. Conversely, addition of GlyB24 to the super-active TyrB25 or SerB26 analogs impaired function, yielding activities similar to those of the respective TyrB24 or SerB25 analogs. Relative biological potencies were estimated from areas under the PD curves (AUCs), as given in Table 2.
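A minimal Python sketch of an AUC-based potency estimate is shown below. It assumes the trapezoidal rule and defines the AUC as the fall in blood glucose below the pre-injection value; both choices, the function names, and the glucose values are hypothetical illustrations rather than the protocol or data of this study.

```python
def auc_trapezoid(times, values):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need matching series with at least two points")
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))


def glucose_lowering_auc(times, glucose):
    """AUC of the fall in blood glucose below the pre-injection reading (glucose[0])."""
    baseline = glucose[0]
    return auc_trapezoid(times, [baseline - g for g in glucose])


def relative_potency(auc_analog, auc_reference):
    """Ratio of analog to reference AUC (1.0 = equipotent under this assumption)."""
    return auc_analog / auc_reference


if __name__ == "__main__":
    # Hypothetical time courses (minutes, mg/dl); not data from this study.
    t = [0, 30, 60, 90, 120]
    reference = [400, 250, 150, 200, 300]
    analog = [400, 325, 275, 300, 350]
    print(relative_potency(glucose_lowering_auc(t, analog),
                           glucose_lowering_auc(t, reference)))
```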
The complete set of analogs (15 in total) was systematically characterized in cell-based assays employing the human breast cancer cell line MCF-7 (58). Insulin-dependent IR autophosphorylation was assessed via the ratio of phosphorylated IR (p-IR) to total IR in immunoblots (Fig. S6A) (59). Insulin-dependent transcriptional regulation was probed via activation of cyclin D1 and repression of cyclin G2 as measured by qRT-PCR (Fig. S6B) (59). This extensive set of results, consistent with the in vitro and rat-based studies, collectively provided evidence for the register-shift model (Fig. 4 and Fig. S7).

Cha designates cyclohexylalanine, a nonplanar aliphatic analog of Phe.

Table footnote: Kd for WT human insulin (HI), 0.08 ± 0.03. Dissociation constants (Kd) for the lectin-purified IR (isoform A) were determined in a competitive binding assay as described previously (111).
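The text does not specify how qRT-PCR signals were normalized. One widely used option is the comparative 2^(-ΔΔCt) method, sketched below in Python purely as an illustration; it assumes near-100% amplification efficiency and a stable reference gene, and the Ct values and function name are hypothetical. A fold change above 1 would correspond to induction (as for cyclin D1) and below 1 to repression (as for cyclin G2).

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression by the comparative 2^(-ddCt) method.

    Assumes ~100% amplification efficiency for target and reference genes.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                # treated relative to control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values; not data from this study.
    print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # earlier Ct after treatment: induction
    print(fold_change_ddct(26.0, 18.0, 25.0, 18.0))  # later Ct after treatment: repression
```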

Thermodynamic studies highlight the contribution of PheB24 to monomer stability
Thermodynamic unfolding studies, based on circular dichroism (CD)-monitored guanidine titrations (60,61), demonstrated that GlyB24 attenuates stability (ΔΔGu 1.3 ± 0.2 kcal/mol relative to parent [OrnB29]insulin) (Fig. 5A). These thermodynamic decrements are in accordance with the structural role of PheB24 at the edge of the hydrophobic core (39); in this context, truncation of the B24 side chain would leave a destabilizing crevice (39). That ChaB24 is also destabilizing (1.1 ± 0.2 kcal/mol) suggests that the aromaticity of PheB24 may make a specific contribution, presumably via weakly polar interactions (39).
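Stability decrements such as these are commonly extracted from denaturant titrations with a two-state model and linear extrapolation, ΔGu([D]) = ΔGu(H2O) − m[D]. The source does not state its fitting model, so the Python sketch below assumes that standard treatment, a temperature of 25 °C, and hypothetical parameter values; ΔΔGu is then the difference between the parent and variant values of ΔGu(H2O).

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_unfolded(denaturant_molar, dg_water, m_value, temp_k=298.15):
    """Two-state fraction unfolded at a given denaturant concentration.

    Linear extrapolation: dG_u([D]) = dG_u(H2O) - m*[D] (kcal/mol).
    f_U = K / (1 + K) with K = exp(-dG_u / RT), written as 1 / (1 + exp(dG_u / RT)).
    """
    dg = dg_water - m_value * denaturant_molar
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


def midpoint(dg_water, m_value):
    """Denaturant concentration at which dG_u = 0 (half the molecules unfolded)."""
    return dg_water / m_value


def ddg_unfolding(dg_parent, dg_variant):
    """Positive values mean the variant is less stable than the parent."""
    return dg_parent - dg_variant


if __name__ == "__main__":
    # Hypothetical parameters (kcal/mol and kcal/(mol*M)); not data from this study.
    print(midpoint(4.0, 1.0), fraction_unfolded(4.0, dg_water=4.0, m_value=1.0))
    print(round(ddg_unfolding(4.0, 2.7), 2))
```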

Figure 1. Sequence and binding mode of insulin.
A, insulin contains two chains, A and B, connected by two disulfide bridges (A7-B7 and A20-B19); an intrachain bridge spans residues A6-A11. The "aromatic triplet" comprises residues PheB24, PheB25 (red circles), and TyrB26 (blue circle). The present analogs contain substitution LysB29→Orn (green circle), which eliminates a tryptic site during semisynthesis. B, ribbon structure of the crystallographic T-state insulin protomer (PDB code 4INS). The A chain is shown as a green ribbon and the B chain as an orange ribbon. A type I β-turn comprises residues B20-B23 (blue). PheB24 and PheB25 are shown as red sticks, and TyrB26 is shown in cyan. C, cartoon representation of insulin-IR binding. Insulin's C-terminal B chain β-strand intercalates between receptor domains αCT and L1; the binding surfaces of B24, B25, and B26 are indicated by orange, purple, and blue-gray semicircles, respectively. D, structure of insulin bound to IR (PDB code 4OGA). The L1 domain is shown as a blue-gray surface and the αCT domain as a purple ribbon. The insulin A chain is shown as a green ribbon; the B chain is shown as an orange ribbon; PheB25 is shown as red sticks; and residues B23-B24 and B26-B29 are shown as orange sticks. E, simulated structure of [GlyB24]insulin bound to IR; color code as in D. If canonical binding conformations were maintained, the B24-binding pocket in the L1 domain would remain unoccupied, as highlighted by the blue box. F, cartoon representation of the register-shift model. Residues PheB24, PheB25, and TyrB26 occupy specialized binding surfaces (color code as in C) on the αCT and L1 domains during IR complexation (left). In the "register shift" model (black arrow), GlyB24 would enable PheB25 to occupy the B24-binding pocket, TyrB26 to pack against the B25-binding surface, and ThrB27 to bind the B26-binding surface.

Although the side chain of PheB25 is flexible in NMR-derived structures of insulin (as indicated by motional narrowing) (62), its substitution by Leu or Cha was also observed to impair thermodynamic stability (although to a lesser extent than the B24 substitutions): ΔΔGu 0.7 ± 0.2 and 0.9 ± 0.2 kcal/mol, respectively. Such destabilization may in part reflect a solvation free-energy penalty (63). In contrast to the above functional rescue, co-introduction of GlyB24 accentuated the destabilizing effects of LeuB25 and ChaB25. The combination of GlyB24 and LeuB25 impaired stability by 1.9 ± 0.2 kcal/mol, whereas the combination of GlyB24 and ChaB25 impaired stability by 1.5 ± 0.2 kcal/mol (Fig. S8 and Table S1). These additive perturbations are uncorrelated with effects on activity.

PheB24 stabilizes native self-assembly and metal-ion coordination
Protein self-association (in the absence of zinc ions or phenolic ligands) was probed by size-exclusion chromatography (SEC) (48). A range of elution times and peak shapes was observed relative to WT insulin (earliest eluting) and an engineered monomer (KP-insulin; latest eluting) (Fig. 5C); molecular masses were inferred with reference to these standards (Fig. S9 and Table 4). Whereas WT insulin and [OrnB29]insulin were essentially dimeric under these conditions (inferred molecular masses 6.8 and 6.6 kDa, respectively), the SerB24 and GlyB24 analogs exhibited elution profiles intermediate between monomer and dimer (molecular mass 4.9 kDa in each case). Such variant dimers are presumably destabilized by a cavity at the dimer interface, comprising the contiguous volumes occupied by PheB24 and its dimer-related mate.

Figure 2. In vitro competition IR-binding assays.
In vitro affinities of insulin analogs containing B24-B26 substitutions for solubilized IR-A were determined by competitive displacement of [125I-TyrA14]insulin. Colored squares represent % tracer bound at increasing concentrations of analog; corresponding colored curves are fitted (see Table 1). Colors are defined within insets.

contributes to a kinetic barrier that locks the R6 insulin hexamer (Fig. 5, D and E). A qualitative probe for the dynamic stability of the tetrahedral Co2+-coordination site within the R6 hexamer is provided by the magnitude of the 574 nm d-d absorbance band; this spectral feature would be absent in an octahedral site. The Co2+ d-d band was attenuated in the GlyB24 analog relative to the other proteins, including [SerB24,OrnB29]insulin and insulin lispro (Fig. 5, D and E, and Table 4). These data suggest that the GlyB24-analog hexamer contains a distorted coordination site and/or that this site exhibits local dynamic interconversion between tetrahedral and octahedral structures. Either mechanism would represent a transmitted perturbation, because the mutation site (B24) is distant from the HisB10-related site of metal-ion coordination (66).

PheB24 protects insulin from fibrillation, a toxic misfolding process
Susceptibilities of the GlyB24 and SerB24 analogs to the formation of insoluble amyloid fibrils (Fig. 5F) were evaluated (in the absence of metal ions) relative to WT insulin, [OrnB29]insulin, and insulin lispro. Aliquots were agitated at a protein concentration of 60 μM at 37 °C. Lag times were measured using thioflavin T (ThT) as a fluorescent probe of cross-β assembly (19). The B24 analogs exhibited shorter lag times than the control samples (Fig. 5G and Table 5). Comparison of corresponding single and double mutants demonstrated that GlyB24 also accelerated the fibrillation of analogs containing substitutions at positions B25 or B26 (Fig. S11 and Table S2; data obtained at room temperature using "gentle sloshing" in glass vials). We envision that, under native conditions, the B24 substitutions enhance subpopulations of amyloidogenic partial folds as components of a conformational equilibrium (67).
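The onset criterion used for the lag-time measurements (defined in this article as a 2-fold increase in ThT fluorescence over baseline) can be expressed as a short Python function. Taking the baseline as the mean of the first few readings is an assumption of this sketch, as are the function name and the example trace.

```python
from statistics import mean


def lag_time(times, fluorescence, n_baseline=3, fold=2.0):
    """First time at which ThT fluorescence reaches `fold` x baseline.

    Baseline = mean of the first `n_baseline` readings (an assumption of this
    sketch). Returns None if the threshold is never reached.
    """
    if len(times) != len(fluorescence):
        raise ValueError("times and fluorescence must have equal length")
    threshold = fold * mean(fluorescence[:n_baseline])
    for t, f in zip(times, fluorescence):
        if f >= threshold:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical ThT trace (arbitrary time units and fluorescence units).
    print(lag_time([0, 1, 2, 3, 4, 5, 6], [10, 11, 9, 12, 19, 35, 80]))
```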

B24 mutations in proinsulin impair cellular folding efficiency and induce ER stress
How GlyB24 affects the biosynthesis and secretability of proinsulin (the single-chain precursor of insulin (68)) was evaluated in transiently transfected HEK293T cells (Fig. S12) (69). Nascent proteins were pulse-labeled with [35S]cysteine/methionine for 30 min, followed by a 2-h chase. Oxidative folding and secretion of [GlyB24]proinsulin were evaluated by nonreducing Tris-Tricine-urea-SDS-PAGE (Fig. 6A; c and m denote cell lysates and chase media, respectively) (3).
[GlyB24]proinsulin exhibited impaired folding efficiency (relative to WT), as demonstrated by a weaker band corresponding to native proinsulin and greater relative prominence of bands corresponding to non-native disulfide isomers. In accordance with previous studies (32), SerB24 led to qualitatively similar perturbations, but with less marked attenuation of the native band in the cellular fraction (Fig. S13).
Additional experiments were performed to assess protein trafficking and secretion. The variant proinsulins exhibited preferential secretion of non-native disulfide isomers (upper bands in lanes "m" in Fig. 6A and Fig. S13). Radiolabeled WT or variant proinsulin in cell lysates and post-chase media was quantified; relative secretion efficiencies of [GlyB24]proinsulin and [SerB24]proinsulin were 25 and 40%, respectively (Fig. 6B). Further insight was obtained through fluorescence studies of variant proinsulins containing green fluorescent protein (GFP) within the C domain, as described previously (33). This chimeric construct is shown in schematic form in Fig. S14.
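The relative secretion efficiencies above compare how much labeled proinsulin reaches the chase medium. The source does not give its formula; the Python sketch below assumes secretion efficiency = medium / (cells + medium), expressed as a percentage of the WT value, and uses hypothetical band intensities and illustrative function names.

```python
def secretion_fraction(cell_signal, medium_signal):
    """Fraction of labeled proinsulin recovered in the chase medium."""
    total = cell_signal + medium_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return medium_signal / total


def relative_secretion(variant_cells, variant_medium, wt_cells, wt_medium):
    """Variant secretion fraction as a percentage of the WT fraction."""
    return (100.0 * secretion_fraction(variant_cells, variant_medium)
            / secretion_fraction(wt_cells, wt_medium))


if __name__ == "__main__":
    # Hypothetical densitometry values (arbitrary units); not data from this study.
    print(relative_secretion(90, 10, 50, 50))
```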
In rat INS1 β-cells, the WT proinsulin-GFP chimera exited the ER and was stored in secretory granules, exhibiting a punctate pattern that did not co-localize with an ER marker (Fig. 6C, red). Whereas [CysB24]proinsulin-GFP (bearing a mutation associated with neonatal-onset DM) was, as previously observed, severely misfolded and completely retained in the ER (Fig. S15), the SerB24 and GlyB24 chimeras were partially exported from the ER (Fig. 6C). It is possible that partial ER exit of the B24 variants in these assays was due to co-expression of WT rat proinsulins, which might act in trans to rescue trafficking of the human chimeric variants (70).
Our cell-based studies could be calibrated in relation to MIDY variants. Whereas SerB24 represents a mild mutation (with onset in the third decade of life and with variable penetrance (26,71)), a negative control was provided by CysB24: its secretion efficiency was ~2% of WT (Fig. 6B). Because induction of ER stress is central to the pathogenesis of the mutant proinsulin syndrome (72), we next exploited a resident ER quality-control sensor (binding immunoglobulin protein; BiP), constructed as a luciferase fusion protein, to probe the extent of proinsulin misfolding (Fig. S16). Luciferase activity was stimulated in the order WT < GlyB24, SerB24 < CysB24 (Fig. 6D). In addition to its structural roles highlighted above (i.e. receptor binding, native-state stability, self-assembly, and protection from fibrillation), PheB24 thus contributes to the folding efficiency of proinsulin in the ER and to its secretability.
Together, these cell-based studies suggest that potential inheritance of GlyB24 in a vertebrate would enhance the risk of β-cell dysfunction, rationalizing its evolutionary exclusion despite the native biological function of the mature hormone. Comparison with the MIDY mutations CysB24 (neonatal onset) and SerB24 (onset in early adulthood) suggests that in humans a GlyB24 variant INS gene would lead to DM onset in adolescence.

Discussion
The Anfinsen paradigm posits that the sequence of a protein determines its structure and stability (73). The "Levinthal paradox" (74), highlighting the complementary requirement for folding efficiency, has long stimulated investigation into biophysical mechanisms by which a folding chain avoids the need for an exhaustive conformational search (75). This study was motivated by the notion that the informational content of protein sequences can be overt (apparent in the native structure), implicit (critical to folding efficiency but dispensable in the native state, once attained), or cryptic (safeguards against off-pathway events) (76). Studies of diseases of protein misfolding (77) have provided insight into implicit determinants of foldability and the evolution of cryptic safeguards against toxic misfolding (5). These concepts have gained prominence as cross-β assembly, the core structure of amyloid (67), was recognized as a general thermodynamic ground state for polypeptides as a class of heteropolymers (1).
Insulin provides an attractive model for interdisciplinary investigation of protein foldability due to its small size, structural richness, deep evolutionary history, and therapeutic importance (78). To this end, this study has focused on an invariant aromatic residue (PheB24) that makes critical contributions to structure, assembly, and function. Position B24, a site of clinical mutations associated with DM (30,36), illustrates the remarkable compression of information that is possible within the sequence of a globular protein. The rigorous conservation of PheB24 in vertebrates presumably reflects the intersection of multiple distinct constraints, from biosynthesis to function, that have governed the evolution of insulin-related sequences for >500 million years (Fig. S1).

Anomalous structure-activity relationships
A starting point for this study was provided by a long-standing enigma: the native activity of [GlyB24]insulin. Originally regarded as anomalous by Tager and co-workers (42,45), the analog's high activity stands in contrast to the impaired function of mutant insulins containing diverse L-amino acid substitutions at B24 (39). This seeming paradox was deepened by the enhanced IR-binding affinities of nonstandard analogs containing D-amino acid substitutions at this position (45,52). Such SARs motivated the hypothesis that PheB24 functions as a site of conformational change on receptor binding (39,51). Detachment of the B24-B28 segment from the α-helical core of insulin was proposed both to liberate the "aromatic triplet" (PheB24-PheB25-TyrB26) as a conserved receptor-binding motif and to expose part of the underlying core to enable its further receptor engagement (80,81). Such extended SARs were supported by residue-specific photo-cross-linking studies ("photomapping mutagenesis" (82)).
The B chain detachment model envisaged the following: (a) the classical structure of insulin (wherein the B24-B28 β-strand packs against the hormone's α-helical core (41)) represents an inactive conformation, and (b) GlyB24 or the corresponding D-amino acid substitutions destabilize this auto-inhibited state to enable native IR binding (45,52). Consistent with the low activities of cross-linked (83) or single-chain (80,84) insulin analogs, this hypothesis received direct experimental support from crystallographic studies of insulin bound to fragments of the IR ectodomain (40,85). Indeed, the aromatic triplet was observed to pack within a groove between ectodomain elements L1 and αCT; displacement of this segment from the core in turn enables receptor engagement by conserved nonpolar side chains in the central B chain α-helix and N-terminal A chain α-helix (86). In addition, recent structures of insulin bound to the intact IR ectodomain (obtained by cryo-electron microscopy (cryo-EM) single-particle image reconstruction) have revealed that binding of the detached B24-B28 β-strand is coupled to large-scale reorganization of domains within the dimeric assembly, a plausible first step in transmembrane signal propagation (87).

Register-shift model
In the hormone/ectodomain complex, the side chain of Phe B24 inserts within a classical nonpolar pocket, with boundaries formed by aromatic and aliphatic side chains in L1, αCT, and the central B chain α-helix (40). With the exception of Gly B24 and D-amino acid substitutions (above), the dimensions and nonpolar character of this pocket rationalize SARs at B24 (39). The crystallographic and cryo-EM structures further defined the binding sites of Phe B25 and Tyr B26 (40,88). Respective SARs at these sites differ markedly from those at B24, in accordance with their distinctive structural features.

Table footnote a: Fibrillation lag times pertain to zinc-free WT insulin and analogs (in a monomer-dimer equilibrium); each protein was made 60 µM in phosphate-buffered saline (pH 7.4). Samples were agitated via continuous shaking at 37°C. Time of initial fluorescence, defined as a 2-fold increase over baseline in ThT fluorescence, provided a criterion for onset of fibrillation.

Cryptic constraints in protein evolution
Our experimental strategy exploited these differences to test the register-shift model. The essential idea, first envisaged by Kaarsholm and co-workers (50) and made explicit by Brzozowski and co-workers (51), posits that the C-terminal segment of the insulin B chain may undergo a shift in alignment relative to the B24-, B25-, and B26-binding sites in the receptor. In this model, Gly B24 or D-amino acid substituents would not themselves engage the receptor; rather, their destabilizing effects would enable "slippage" of the B chain such that residues B25-B27 would align with the B24 -B26 receptor sites. The side chain of Phe B25 would thus replace Phe B24 in the B24-binding pocket; Tyr B26 would replace Phe B25 at an adjoining surface of αCT, and Thr B27 would replace Tyr B26 at a peripheral surface near the L1-αCT interface. The latter, although a seemingly nonconservative substitution, would be in accordance with the intrinsic compatibility of the B26-binding surface of the ectodomain with small polar side chains (48).
The register-shift model made specific yet counter-intuitive predictions: that Gly B24 could rescue the function of otherwise unfavorable substitutions at B25 or B26 and, conversely, render unfavorable those substitutions at these positions that are otherwise compatible with high activity. Thus, whereas a tetrahedral γ-carbon (as in the aliphatic side chains of Leu or Cha) is ordinarily excluded at residue B25, such side chains could readily dock within the B24-related pocket via a register shift and hence be rescued, a mechanism-based example of second-site compensation. Conversely, Phe and Tyr are each well accommodated at the B25-binding site (and indeed IGFs contain Tyr at this position (89)). Gly B24 would be predicted to impair the activity of a Tyr B25 analog by forcing its insertion into the B24-binding pocket (in which the para-OH group is unfavorable (39)). Together, our results are in accordance with such reasoning and so provide strong evidence in support of the register-shift hypothesis.
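The predictions above amount to a one-residue offset in how the C-terminal segment of the B chain maps onto three receptor sites. The short Python sketch below makes that bookkeeping explicit. The site labels and the function name `site_occupancy` are illustrative assumptions, and the sketch encodes only residue-to-site assignments, not binding energetics.

```python
# Illustrative bookkeeping for the register-shift model (hypothetical names).
# Receptor sites ordinarily engaged by residues B24, B25, and B26 of WT insulin.
SITES = ("B24 pocket", "B25 cleft", "B26 surface")


def site_occupancy(segment: str, shift: int = 0) -> dict:
    """Assign B-chain residues to receptor sites.

    segment: residues from B24 onward, e.g. "Phe-Phe-Tyr-Thr".
    shift:   0 for the native register; 1 for the register shift, in which
             residue B(n+1) occupies the site ordinarily filled by B(n).
    """
    residues = segment.split("-")
    if len(residues) < len(SITES) + shift:
        raise ValueError("segment too short for the requested register")
    return {site: residues[i + shift] for i, site in enumerate(SITES)}


if __name__ == "__main__":
    print(site_occupancy("Phe-Phe-Tyr-Thr"))           # native register, WT insulin
    print(site_occupancy("Gly-Phe-Tyr-Thr", shift=1))  # Phe B25 moves into the B24 pocket
    print(site_occupancy("Gly-Cha-Tyr-Thr", shift=1))  # Cha B25 docks in the B24 pocket
    print(site_occupancy("Gly-Tyr-Tyr-Thr", shift=1))  # Tyr B25 forced into the B24 pocket
```

Read this way, second-site compensation is simply the question of which side chain ends up in the B24 pocket once the offset is applied.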
Systematic screening of analog activities in the human breast cancer cell line MCF-7 (assessing insulin-dependent IR autophosphorylation and downstream transcriptional regulation) provided additional evidence in support of the register-shift model. This complete set of activities (15 analogs in total) corroborated selected measurements of receptor-binding affinities and in vivo potencies. Together, these results suggest that the native and register-shift modes of insulin binding lead to similar structural reorganization of the IR ectodomain and similar mechanisms of transmembrane signal propagation (90).
To explore potential structural mechanisms underlying the proposed register shift, two models of the interface between [Gly B24 ]insulin and the IR ectodomain were constructed. Modeling was based on the crystal structure of WT insulin in complex with the primary IR binding site (PDB code 4OGA) (40). Relative to the WT complex (Fig. 7A), a naïve model of the [Gly B24 ]insulin complex was first constructed without a shift, i.e. Phe B24 was substituted by Gly in the context of the native structure (Fig. 7B). In this model, such a binding mode would leave a cavity in the space ordinarily occupied by the B24 aromatic ring, with an average volume of 27.3 (±0.5) Å³ across the simulation. Such a large cavity would presumably incur a free-energy penalty (91); an estimate of this penalty (based on graph-based signatures of the change in residue environments, as proposed by Blundell and co-workers (92)) would be 2.12 kcal/mol. Although such a model illustrates how the WT complex could accommodate a large-to-small substitution without transmitted conformational changes, it appears inconsistent with the present functional studies of insulin analogs.
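To put the 2.12 kcal/mol estimate in perspective, a binding free-energy penalty ΔΔG corresponds to a fold change in dissociation constant of exp(ΔΔG/RT). The sketch below assumes that the estimate applies directly to hormone-receptor binding, that it represents a destabilization (positive ΔΔG), and a temperature of 300 K (the MD temperature). Under those assumptions the penalty corresponds to a roughly 35-fold loss of affinity, which helps explain why the unshifted model sits uneasily with the functional data.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_fold_change(ddg_kcal: float, temp_k: float = 300.0) -> float:
    """Fold increase in Kd implied by a binding free-energy penalty ddG.

    Kd(variant) / Kd(WT) = exp(ddG / RT), with ddG > 0 meaning weaker binding.
    """
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    # 2.12 kcal/mol is the estimated cavity penalty quoted in the text.
    print(f"{kd_fold_change(2.12):.1f}-fold weaker binding")
```

The exponential dependence is the point: each additional ~1.4 kcal/mol at this temperature costs roughly another order of magnitude in affinity.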
To allow for structural rearrangements near the site of substitution, the modeling procedure next exploited the unoccupied space in the WT structure near the native B20 -B23 β-turn. This space would readily accommodate a five-residue noncanonical turn (Gly-Glu-Arg-Gly-Gly) without steric clash, such that the side chain of Phe B25 can orient toward and partially fill the B24-binding pocket (Fig. 7C). The root mean square deviation between this model and the WT complex (after MD simulation) is 1.9 Å. In this model Tyr B26 inserts into the B25-binding cleft (between αCT residues Val 715 and Pro 718), thereby mimicking the role of Phe B25 in WT complexes. MD simulations of all three binding modes (WT insulin in the native complex (Fig. 7D), [Gly B24 ]insulin in the naïve model (Fig. 7E), and [Gly B24 ]insulin in the shifted model (Fig. 7F)) indicated that each mode remained stable throughout the simulations. Whereas in the unshifted mode only the main-chain atoms of Gly B24 can partially occupy the otherwise empty B24-binding pocket, in the register-shifted model this potential space is more efficiently filled by the side chain of Phe B25 (pink spheres in Fig. 7).
Estimates of residual cavity volumes in the respective models predicted that the register-shift mode reduces the potential B24-related cavity by 12.9 (±0.4) Å³ (Fig. S17).
We note in passing that in the first mode, wherein the native Phe B24-binding pocket would largely be empty, native affinity might in part be rescued from the full cavity penalty (91) by a presumably reduced free-energy cost of conformational change in the more flexible variant insulin. Although this scheme would be in accordance with general biophysical principles, it would not rationalize the present pattern of second-site effects. In particular, an empty B24-binding pocket could not explain why Gly B24 rendered Tyr B25 unfavorable or rescued the activity of Cha B25 through second-site compensation. In the future, such entropic effects could be addressed via free-energy molecular dynamics simulations. It would also be of interest to obtain a definitive structure of a [Gly B24 ]insulin analog bound to the IR ectodomain, such that the status of the B24-binding pocket (empty or filled by Phe B25) could be directly visualized.
We emphasize that both models assume "molecular parsimony": potential large-scale reorganization of the hormone-receptor interface (including changes in domain-domain contacts within the dimeric ectodomain) was not considered. These more complex possibilities, if realized, would underscore the essential insight that two modes of hormone binding are possible, but one is disallowed in evolution (see below).

Figure 7. Molecular models of insulin and variant [Gly B24 ]insulin in complex with the IR. Structural representations before (A-C) and in a representative model after (D-F) 250 ns of MD simulation of native insulin (A and D), [Gly B24 ]insulin without a register shift (B and E), and [Gly B24 ]insulin with a register shift (C and F) leading to Phe B25 occupying the B24-binding pocket. In all models, the L1 domain is shown as a blue-gray surface, and the αCT domain is shown as a purple ribbon; the insulin A chain is shown as a green ribbon; the insulin B chain is shown as an orange ribbon, and B chain residues Gly B20 -Thr B30 are depicted as orange sticks, with pink spheres depicting the cavity within the B24 pocket formed due to the absence of Phe B24.

Evolutionary implications
Our rodent studies demonstrated that the register-shift mode of receptor engagement not only permits IR binding but also can mediate metabolic regulation in a vertebrate. Indeed, [Gly B24 ,Orn B29 ]insulin was as effective as its parent [Orn B29 ]insulin for the short-term treatment of DM. This observation was not self-evident from receptor-binding studies, as it was a priori possible either (a) that a shifted mode of receptor engagement could have been less active in signal propagation or (b) that the impaired stability of the Gly B24 analog could have led to its premature degradation in vivo. Given the observed maintenance of biological activity in vivo, however, we were motivated to consider why Gly B24 has apparently been disallowed in the evolution of vertebrate insulins and IGFs. This question relates to possible evolutionary constraints beyond the prominence of Phe B24 in the structure and function of insulin's native state. A precedent for cryptic functions of insulin residues was previously provided by Phe B1, shown to be critical to the folding efficiency of proinsulin in mammalian cells but dispensable for structure and function of the mature hormone once its folding is achieved (93).
Implicit or cryptic functions of Phe B24 were probed through studies of the kinetics of insulin fibrillation in vitro (39,94) and through cell-based studies of the nascent foldability of variant proinsulins (69). Gly B24 analogs exhibited shorter lag times prior to onset of fibrillation, suggesting a heightened population of conformational substrates (which may constitute a small fraction of protein molecules at any one time) amenable to non-native aggregation en route to formation of an amyloidogenic nucleus (95). The side chain of Phe B24 may thus serve to reduce the stability or accessibility of such conformational excursions relative to the predominant (native) conformation. Because fibrillation lag times have generally been found not to correlate with thermodynamic stabilities (ΔGu relative to the unfolded state), this role of Phe B24 presumably relates to amyloidogenic partial folds (39). Analogous concepts were introduced by Dobson and co-workers (96) in the analysis of human lysozyme variants associated with pathological amyloid deposits, a prototypical disease of toxic extracellular protein misfolding.
WT insulin is itself exquisitely susceptible to fibrillation. Such misfolding is ordinarily avoided in vivo through storage of the hormone as zinc-coordinated hexamers (97). The hexamers in turn crystallize with calcium to form protective microcrystals within the β-cell's secretory granules (57). An elaborate machinery of zinc import has co-evolved within the glucose-regulated β-cell granules (57). Such native self-assembly sequesters the flexible insulin monomer and presumably dampens conformational fluctuations that could otherwise nucleate cross-β polymer assembly (39). The present studies have shown that substitution of Phe B24 by Gly impairs native self-assembly in the absence of metal ions and reduces the kinetic stability of R6 hexamers (containing Co²⁺ as an isomorphic replacement for Zn²⁺) once formed. The intrinsic structural role of Phe B24 in protecting the insulin monomer from non-native aggregation is thus accompanied by a second protective function in promoting native self-assembly.
An insulin species variant (in the South American rodent Octodon degus) forms senile amyloid plaques in pancreatic islets; these are proposed to impair β-cell viability, leading to late-onset DM (98,99). The variant B chain sequence contains Pro B27 -His B28, which impairs dimerization (98) (in a manner similar to insulin lispro; KP (100)), and the substitution His B10 → Asn, which blocks binding of zinc ions. This species variant is thus less protected from fibrillation than other vertebrate insulins. The continued survival of this species (and thus existence of the variant insulin as an extant vertebrate sequence) presumably reflects the post-reproductive onset of DM.

Phe B24 as a cryptic determinant of foldability
Proinsulin, the single-chain biosynthetic precursor of insulin (16,101), ordinarily enables efficient disulfide pairing in the ER within β-cells. Diverse clinical mutations have been identified in the insulin gene that impair this process (30), leading to chronic ER stress, β-cell dysfunction, and eventual β-cell death (78). This syndrome (MIDY) is a major cause of permanent neonatal-onset diabetes. Whereas the majority of such mutations add or remove a cysteine (thus giving rise to an odd number of reactive thiol groups in the nascent polypeptide chain), mutations unrelated to Cys have also been identified. Such mutations presumably identify sites critical to the mechanism of disulfide pairing (21).
MIDY mutations span a broad range of folding defects, from severe to subtle (78). The mutation Phe B24 → Ser, originally identified by Tager and co-workers (insulin Los Angeles (37)), moderately impairs the foldability of proinsulin and is associated with variable onset of diabetes in the 3rd decade of life (32). In our cell-based studies, Gly B24 was at least as perturbing as Ser B24 with respect to fidelity of disulfide pairing, efficiency of folding, induction of ER stress, and extent of secretion into the medium. Before the discovery of insulin for the treatment of insulin-deficient DM, onset during the reproductive years (even with variable genetic penetrance) would likely have been sufficient to reduce reproductive fitness and so purge such a sequence variant from extant insulin genes.
Previous fluorescence-imaging studies of β-cell lines expressing proinsulin variants have defined a range of potential defects in protein biosynthesis and/or subcellular trafficking. At one extreme, MIDY mutations resulting in an odd number of cysteines (including the classical Akita allele Cys A7 → Tyr) typically induce severe defects in nascent folding, leading to ER sequestration and ER-associated degradation (70). By contrast, a "zip code" variant compatible with native folding (His B10 → Asp; proinsulin Providence) readily exits the ER, but altered trafficking in the Golgi apparatus leads to storage largely within a constitutive pool of secretory granules lacking prohormone convertases and the machinery of glucose-dependent exocytosis (102). Although [Asp B10 ]insulin exhibits enhanced affinity for the IR and augmented thermodynamic stability, the patients were found to exhibit elevated levels of circulating [Asp B10 ]proinsulin due to its aberrant constitutive secretion (102).
Clinical mutations in proinsulin may exhibit a range of intermediate cellular phenotypes depending on the specific amino acid substitution. Fluorescence images have thus revealed partial or near-complete sequestration of the variant proinsulin within the ER with varying degrees of impaired Golgi trafficking (103). Within this spectrum of mutations, the most severe (exemplified by Gly B23 → Val, perturbing a β-turn adjoining cystine B19-A20) are associated with neonatal-onset DM (70); less destabilizing substitutions (such as Leu B6 → Met adjoining cystine B7-A7) are associated with disease onset in early adulthood (akin to maturity-onset diabetes of the young) (104). The pattern of fluorescence conferred by Gly B24 (as probed by the C domain-inserted GFP in a proinsulin chimera) would predict onset in the 2nd or 3rd decade of life should this mutation be observed in future patients.
Phe B24 thus plays a role in the ER of β-cells (to enhance the foldability of nascent proinsulin) that is distinct from its subsequent roles in the post-Golgi network (to enable productive trafficking) and in glucose-regulated secretory granules (to promote native self-assembly). These aspects of biosynthesis are unrelated to the function of Phe B24 in the mature hormone (as a key contact at a conserved hormone-receptor interface). The multiple roles of this invariant residue thus illustrate the remarkable compression of coding information that is possible within a protein sequence. Gly B24 honors in the breach the importance of cell-biological mechanisms and pathological processes not discernible in the three-dimensional structures of proteins, however beautiful and compelling as molecular architectures.
Gly B24 is unique among potential amino acid mutations at B24 in that this variant insulin would retain native activity should its biosynthesis be feasible. Our results nonetheless predict significant defects in the folding, trafficking, and secretion of the variant proinsulin. It would be of future interest to test this prediction in an engineered mouse. To this end, a CRISPR/Cas9-directed Gly B24 variant could define the extent of associated β-cell dysfunction and so inform which evolutionary constraints govern its exclusion among vertebrate insulin sequences. Because the mouse contains two insulin genes (four per diploid genome), such a mouse model may also require a deletion of one WT gene (per haploid genome) to mimic the gene dosage in a human heterozygous patient with the mutant proinsulin syndrome.

Concluding remarks
The complex conformational "life cycle" of the insulin molecule, from its nascent folding within pancreatic β-cells to receptor engagement at target tissues, imposes evolutionary constraints that may be overt, implicit, or cryptic. Safeguards against proteotoxic misfolding, as exemplified by the Phe B24-anchored B20 -B23 β-turn, can make independent contributions to the biological function of the native state (40). Multidisciplinary dissection of such a safeguard, made possible here by a collaborative team, may in the future enable a deeper understanding of the informational context of insulin sequences. Such studies may define a subset of MIDY mutations amenable to compensation by chemical chaperones as an opportunity for precision medicine beyond insulin replacement therapy (105).
The Phe B24 -anchored protective hinge (40) is a conserved functional element of vertebrate insulins and insulin-like growth factors. A broader view of the space of insulin-like sequences would encompass invertebrate as well as vertebrate homologs. The genome of Caenorhabditis elegans, for example, encodes 38 putative insulin-like proteins with divergent features relative to vertebrate insulins (106). Remarkably, despite containing multiple core and surface substitutions, each individually disallowed in vertebrate insulin, the structure of one such nematode protein closely resembles human insulin (107). We envision that multisite compensation in this broader sequence space has markedly enlarged the fitness landscape (34). Second-site compensation among the present set of Gly B24 human insulin analogs provides proof of principle that this landscape can include novel modes of hormone-receptor recognition leading to transmembrane signaling. Although excluded in vertebrate evolution, these alternative mechanisms of signaling may in principle be exploited in the design of therapeutic analogs (108).

Preparation of insulin analogs
Analogs were prepared by trypsin-catalyzed semi-synthesis using an insulin fragment, des-octapeptide (B23-B30)-insulin, and modified octapeptides as described (55). The fragment was made by tryptic cleavage of human insulin and purified by reverse-phase HPLC (rp-HPLC). Octapeptides were prepared by solid-phase synthesis (109). The resulting 51-residue insulin analogs were purified by preparative C4 rp-HPLC (Higgins Analytical Inc., Proto 300 C4 with 10-µm particle size and dimensions 250 × 20 mm), and their purity was assessed by analytical C4 rp-HPLC (Higgins Analytical Inc., Proto 300 C4 with 5-µm particle size and dimensions 250 × 4.6 mm). Predicted molecular masses were in each case verified using an Applied Biosystems 4700 proteomics analyzer MALDI-TOF.
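Because each analog differs from human insulin at only one or two positions, its predicted average mass can be obtained by adding residue-mass differences to the average mass of WT human insulin (5807.57 Da). The sketch below uses standard average residue masses; the function name is hypothetical, and the approach ignores isotope-pattern details that matter when picking MALDI-TOF peaks.

```python
# Average residue masses (Da) for residues that differ among the analogs.
RESIDUE_MASS = {
    "Gly": 57.0513, "Phe": 147.1739, "Tyr": 163.1733, "Leu": 113.1576,
    "Lys": 128.1723, "Orn": 114.1457, "Cha": 153.2215,
}
HUMAN_INSULIN_AVG = 5807.57  # Da, average mass of WT human insulin


def analog_mass(substitutions):
    """Predicted average mass of an insulin analog.

    substitutions: iterable of (wt_residue, new_residue) pairs, e.g.
    [("Phe", "Gly")] for a Phe -> Gly substitution at B24.
    """
    mass = HUMAN_INSULIN_AVG
    for wt, new in substitutions:
        mass += RESIDUE_MASS[new] - RESIDUE_MASS[wt]
    return mass


if __name__ == "__main__":
    print(f"[GlyB24]insulin:       {analog_mass([('Phe', 'Gly')]):.2f} Da")
    print(f"[GlyB24,OrnB29]insulin: {analog_mass([('Phe', 'Gly'), ('Lys', 'Orn')]):.2f} Da")
```

A mismatch of roughly 18 Da against such a prediction would, for example, hint at an unligated (hydrolyzed) product rather than the intended analog.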

Circular dichroism spectroscopy
Far-UV spectra were obtained from 200 to 250 nm on an AVIV spectropolarimeter equipped with an automated syringe-driven titration unit (39). The helix-sensitive wavelength of 222 nm was used as a probe of protein denaturation by guanidine-HCl. Thermodynamic parameters were obtained by application of a two-state model (61). In brief, data were fit by nonlinear least squares to a two-state model as shown in Equation 1, where x is the concentration of guanidine hydrochloride, and A and B represent respective estimates of the baseline ellipticities of the protein in its unfolded and native states as extrapolated to zero guanidine concentration (61). Simultaneous fitting of pre- and post-transition baselines avoided artifacts of linear plots of ΔG versus concentration of denaturant (110).
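Equation 1 itself is not reproduced in this section. As a hedged sketch, a common form of the two-state linear-extrapolation model is θ(x) = θN(1 − fU) + θU·fU, with fU = K/(1 + K) and K = exp[−(ΔGu − m·x)/RT], where θN and θU play the roles of the native and unfolded baselines (B and A in the text). The function names, the temperature of 298.15 K, and the example parameters below are assumptions, not values from this study; in practice such a function would be passed to a nonlinear least-squares routine such as `scipy.optimize.curve_fit`.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def two_state_ellipticity(x, dg_u, m, theta_native, theta_unfolded, temp_k=298.15):
    """Predicted ellipticity at 222 nm for a two-state N <-> U transition.

    x:     guanidine-HCl concentration (M)
    dg_u:  unfolding free energy extrapolated to zero denaturant (kcal/mol)
    m:     slope of the linear extrapolation, dG(x) = dg_u - m * x (kcal/mol/M)
    """
    k_unfold = math.exp(-(dg_u - m * x) / (R_KCAL * temp_k))  # [U]/[N]
    f_unfolded = k_unfold / (1.0 + k_unfold)
    return theta_native * (1.0 - f_unfolded) + theta_unfolded * f_unfolded


def midpoint_concentration(dg_u, m):
    """Denaturant concentration (M) at which half of the protein is unfolded."""
    return dg_u / m


if __name__ == "__main__":
    # Hypothetical parameters, not values reported for any analog.
    dg, m_val = 4.0, 1.5
    for gdn in (0.0, 2.0, midpoint_concentration(dg, m_val), 5.0):
        print(f"{gdn:.2f} M -> {two_state_ellipticity(gdn, dg, m_val, -10.0, -2.0):.2f}")
```

Fitting all of θ(x) at once, rather than converting each point to ΔG and plotting it linearly, is what lets the baselines be estimated simultaneously with ΔGu and m.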

Receptor binding assays
Affinities for IR-A were measured by a competitive-displacement scintillation-proximity assay (48). The assay employed detergent-solubilized holoreceptor with C-terminal streptavidin-binding protein tag. The receptor was purified by sequential wheat-germ agglutinin and streptactin-affinity chromatography from detergent lysates of polyclonal stably-transfected 293PEAK cell lines expressing each receptor. To obtain analog dissociation constants, competitive binding data were analyzed by nonlinear regression by the method of Wang (111).
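Assuming that ref. 111 is Wang's exact closed-form treatment of two ligands competing for a single site, the quantity being fitted is the concentration of receptor-bound tracer as a function of competitor concentration. The sketch below computes that forward model from the physically meaningful root of the binding cubic. The function names, the single independent-site simplification (the real holoreceptor is more complex), and the example concentrations are assumptions; estimating an analog's Kd would wrap `tracer_bound` in a nonlinear least-squares routine.

```python
import math


def free_receptor(p0, a0, b0, ka, kb):
    """Free receptor for tracer A and competitor B sharing one site.

    p0, a0, b0: total receptor, tracer, and competitor (same units)
    ka, kb:     dissociation constants of tracer and competitor
    Solves P^3 + a P^2 + b P + c = 0 via its trigonometric root.
    """
    a = ka + kb + a0 + b0 - p0
    b = kb * (a0 - p0) + ka * (b0 - p0) + ka * kb
    c = -ka * kb * p0
    s = math.sqrt(a * a - 3.0 * b)
    arg = (-2.0 * a**3 + 9.0 * a * b - 27.0 * c) / (2.0 * s**3)
    theta = math.acos(max(-1.0, min(1.0, arg)))  # clamp rounding error
    return (2.0 * s * math.cos(theta / 3.0) - a) / 3.0


def tracer_bound(p0, a0, b0, ka, kb):
    """Receptor-bound tracer, taken as proportional to the SPA signal."""
    p = free_receptor(p0, a0, b0, ka, kb)
    return a0 * p / (ka + p)


if __name__ == "__main__":
    # Hypothetical concentrations (nM): receptor, tracer, tracer Kd, competitor Kd.
    p0, a0, ka, kb = 0.2, 0.05, 0.2, 0.5
    for b0 in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(b0, tracer_bound(p0, a0, b0, ka, kb))
```

Unlike the Cheng-Prusoff approximation, this exact treatment does not assume that free ligand equals total ligand, which matters when receptor and tracer concentrations are comparable to the Kd values.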

Rodent assays
Male Lewis rats (mean body mass ~300 g) were rendered diabetic by streptozotocin (STZ). To test potencies, the analogs were made 10 µg per 100 µl in a formulation buffer (16 mg/ml glycerin, 1.6 mg/ml meta-cresol, 0.65 mg/ml phenol, and 3.8 mg/ml sodium phosphate (pH 7.4)) and injected intravenously into tail veins (39). WT insulin or analogs were each re-purified by rp-HPLC, lyophilized, dissolved in Lilly diluent at the same maximum protein concentration, and re-quantitated by analytical C4 rp-HPLC to ensure uniformity. Dilutions were made using the above buffer.
Rats were injected at time t0. Blood was obtained from the clipped tip of the tail at t0 and every 10 min for the 1st h, every 20 min for the 2nd h, every 30 min for the 3rd h, and every hour thereafter to a final time of 5 h. Efficacy of WT insulin or analog to reduce the blood glucose concentration was calculated using the following: (a) the rate of change in blood glucose concentration over 240 min following initial injection, and (b) the integrated area between the x axis and the curve representing fractional blood-glucose level relative to the initial blood-glucose concentration (AUC). Statistical significance was assessed using Student's t test.
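The sampling schedule and the two efficacy measures above can be made concrete with a short sketch. It assumes that the AUC is integrated over the full 5-h time course by the trapezoidal rule and that the "rate of change over 240 min" is the simple endpoint difference divided by elapsed time; both are readings of the text rather than stated procedures, and the function names and glucose values are hypothetical.

```python
def sampling_times_min():
    """Sampling times (min): q10 min in h 1, q20 in h 2, q30 in h 3, hourly to 5 h."""
    times = list(range(0, 61, 10))
    times += list(range(80, 121, 20))
    times += list(range(150, 181, 30))
    times += list(range(240, 301, 60))
    return times


def fractional_auc(times, glucose):
    """Trapezoidal area between the x axis and glucose(t)/glucose(t0), in min."""
    frac = [g / glucose[0] for g in glucose]
    return sum(
        (t1 - t0) * (f0 + f1) / 2.0
        for t0, t1, f0, f1 in zip(times, times[1:], frac, frac[1:])
    )


def mean_rate(times, glucose, window_min=240):
    """Average change in glucose per min from t0 to t0 + window (endpoint-based)."""
    end = times.index(window_min)
    return (glucose[end] - glucose[0]) / (times[end] - times[0])


if __name__ == "__main__":
    t = sampling_times_min()
    # Hypothetical readings (mg/dl), one per sampling time.
    glucose = [420, 400, 360, 310, 260, 220, 190, 150, 130, 120, 115, 120, 160, 250]
    print(len(t), fractional_auc(t, glucose), mean_rate(t, glucose))
```

Because the trapezoidal rule weights each reading by the intervals on either side, the dense early sampling mainly sharpens the onset of action, while the hourly late points carry most of the weight in the tail of the AUC.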
Animals used in this study were housed at the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC)-accredited facilities of Case Western Reserve University (CWRU) School of Medicine. All procedures were approved by the Institutional Animal Care and Use Committee (IACUC) Office at CWRU, which provided Standard Operating Procedures and reference materials for animal use (in accordance with the NIH Guide for the Care and Use of Laboratory Animals). The animal health program for all laboratory animals was directed by the CWRU Animal Resource Center. Animal care and use were further monitored for Training and Compliance issues by Veterinary Services.

Signaling assays in mammalian cell culture
Signaling activities of the insulin analogs were tested in MCF-7 human breast adenocarcinoma cells (which express IR-A, IR-B, and high levels of IGF-1 receptor (58,90)). MCF-7 cells (American Type Culture Collection, Manassas, VA) were cultured in Eagle's minimum essential medium supplemented with 10% FBS, 1% penicillin/streptomycin, and sodium pyruvate (1 mM). At 70-75% confluence, cells were serum-starved for 24 h in the same medium lacking FBS. After serum starvation, serum-free medium containing selected concentrations of insulin analogs was added to each well (control wells received medium with no added insulin).
Media were removed after 15 min, followed by cell lysis in RIPA buffer containing protease and phosphatase inhibitors (Cell Signaling Technology, Danvers, MA). Protein concentrations in cell lysates were determined with a BCA assay kit (Thermo Fisher Scientific) for use in immunoblotting. Cells cultured for q-rtPCR assays (see below) were, after 24 h of serum starvation, treated for 8 h with either protein-free medium or medium supplemented with an insulin analog.

Assessment of fibril formation
Physical stability was assessed by propensity to form fibrils. Insulin or analogs were made 60 µM in phosphate-buffered saline (PBS; pH 7.4) with 0.02% sodium azide and 16 µM ThT. Samples were plated in a Costar plate (250 µl/well) and incubated at 37°C with continuous shaking at 1096 cycles/min in a Biotek (Winooski, VT) Synergy H1 plate reader. ThT fluorescence at 480 nm was assessed (following excitation at 450 nm) at 15-min intervals. The time of initial ThT fluorescence defined the lag time.
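The lag-time criterion (onset defined, in the lag-time table footnote, as a 2-fold increase over baseline in ThT fluorescence) can be applied mechanically to plate-reader traces sampled every 15 min. In the sketch below, taking the baseline as the mean of the first reading(s) is an assumption, as are the function name and the example trace.

```python
def lag_time(times_min, fluorescence, fold=2.0, n_baseline=1):
    """First time at which ThT fluorescence reaches `fold` x baseline.

    Baseline is the mean of the first `n_baseline` readings (an assumption).
    Returns None if the threshold is never reached during the assay.
    """
    baseline = sum(fluorescence[:n_baseline]) / n_baseline
    for t, f in zip(times_min, fluorescence):
        if f >= fold * baseline:
            return t
    return None


if __name__ == "__main__":
    times = [0, 15, 30, 45, 60, 75]               # 15-min read interval
    trace = [100, 102, 101, 150, 210, 900]        # hypothetical ThT counts
    print(lag_time(times, trace))
```

With discrete reads, the reported lag time is quantized to the 15-min interval, so differences between analogs smaller than one interval are not resolvable by this criterion.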

Fibrillation lag times of active [Gly B24 ]insulin analogs containing Cha or Leu at the B25 position were assessed in relation to control analogs at room temperature in a milder assay designed to simulate a patient-carried insulin pump. Analogs were made 60 µM in PBS containing 0.01% sodium azide as an antimicrobial agent. The insulin solutions were gently rocked at 25°C in glass vials containing a liquid-air interface. Aliquots, taken at regular intervals, were frozen to enable later analysis of ThT fluorescence (19). For a given sample tube, the assay was terminated on the 2nd day following the appearance of cloudiness in the solution.

Assessment of hexamer stability
Visible absorption spectroscopy was employed to probe formation and disassembly of phenol-stabilized R6 Co²⁺-substituted insulin hexamers (112). WT insulin or analogs were made 0.6 mM in a buffer containing 50 mM Tris-HCl (pH 7.4), 50 mM phenol, 0.2 mM CoCl₂, and 1 mM NaSCN. Samples were incubated overnight at room temperature. Spectra (450-700 nm) were obtained to monitor tetrahedral Co²⁺ coordination with its signature absorption band at 574 nm (64). To determine the rate of Co²⁺ release from the hexamers, metal-ion sequestration was initiated at 25°C by addition of an aliquot of EDTA (stored at 50 mM at pH 7.4) to a final concentration of 2 mM. Attenuation of the 574-nm absorption band was monitored on a time scale of seconds to hours. Kinetic data were consistent with monoexponential decay and were fit using KaleidaGraph software, allowing calculation of the kinetic half-life (t½).
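For a monoexponential decay A(t) = A∞ + (A0 − A∞)e^(−kt), the half-life is t½ = ln 2/k. The sketch below recovers k by a log-linear least-squares fit of ln(A − A∞) against time; this is a simple substitute for the nonlinear fit described above, not the KaleidaGraph procedure, and it assumes that the plateau absorbance A∞ is known. Names and data are hypothetical.

```python
import math


def half_life(times, absorbance, a_inf):
    """Half-life of a monoexponential decay toward the plateau a_inf.

    Fits ln(A - a_inf) = ln(A0 - a_inf) - k t by ordinary least squares and
    returns ln(2) / k. Points at or below the plateau are skipped.
    """
    xs, ys = [], []
    for t, a in zip(times, absorbance):
        if a > a_inf:  # logarithm undefined once the signal reaches the plateau
            xs.append(t)
            ys.append(math.log(a - a_inf))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    if slope >= 0:
        raise ValueError("data do not decay")
    return math.log(2) / -slope


if __name__ == "__main__":
    ts = list(range(0, 301, 10))  # seconds, hypothetical
    a574 = [0.05 + 0.45 * math.exp(-0.01 * t) for t in ts]
    print(half_life(ts, a574, a_inf=0.05))
```

The log transform weights late, low-amplitude points as heavily as early ones, so with noisy data near the plateau a direct nonlinear fit (as used in the study) is usually the more robust choice.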

Assessment of insulin self-assembly
Oligomeric states of the insulin analogs were monitored by HPLC SEC (48). Analogs were made 0.6 mM in PBS. Samples (10 µl) were applied through a Waters 717 autosampler onto a Zenix-C SEC-150 column (Sepax Technologies) with a nominal fractionation range of 0.5-150 kDa. Proteins were fractionated at a flow rate of 1 ml/min using a Waters Binary HPLC system. Elution was monitored at 215 and 280 nm using a Waters 2487 dual-wavelength absorbance detector. The mobile phase consisted of 10 mM Tris-HCl (pH 7.4) and 140 mM NaCl (48). Data acquisition and processing utilized Waters Empower HPLC software. The column was calibrated for apparent molecular-mass determination by fractionating standard proteins individually on the column (see Fig. S9 legend).
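SEC calibration is commonly treated as a linear relation between log10(molecular mass) and elution time over the column's fractionation range. The sketch below assumes that relation; the standards listed in the Fig. S9 legend are not reproduced here, so the calibration points and function names are hypothetical.

```python
import math


def fit_calibration(elution_times, masses_kda):
    """Least-squares line log10(mass) = slope * t + intercept."""
    ys = [math.log10(m) for m in masses_kda]
    n = len(elution_times)
    mx, my = sum(elution_times) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in elution_times)
    slope = sum((x - mx) * (y - my) for x, y in zip(elution_times, ys)) / sxx
    return slope, my - slope * mx


def apparent_mass(t, slope, intercept):
    """Apparent molecular mass (kDa) of a peak eluting at time t."""
    return 10 ** (slope * t + intercept)


if __name__ == "__main__":
    # Hypothetical standards: elution time (min) and mass (kDa).
    times = [5.0, 6.0, 7.0, 8.0]
    masses = [100.0, 63.1, 39.8, 25.1]
    slope, icpt = fit_calibration(times, masses)
    print(apparent_mass(7.5, slope, icpt))
```

An apparent mass near 35 kDa would point to a hexamer and one near 6 kDa to a monomer (insulin monomer about 5.8 kDa), bearing in mind that SEC reports hydrodynamic size, so nonglobular or rapidly exchanging species can elute anomalously.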

Cell-based proinsulin folding and secretion
HEK293T cells were plated into 6- or 12-well plates 1 day before transfection. A total of 1-2 µg of plasmid DNA was transfected using Lipofectamine (Invitrogen). At 48 h posttransfection, cells were pulse-labeled with [35S]Cys/Met and chased for 120 min. A proteinase inhibitor mixture was added to cell lysates and chase media. Samples were pre-cleared with Zysorbin and immunoprecipitated with anti-insulin antibodies (113) and protein-A-agarose overnight at 4°C. Anti-insulin immunoprecipitates were boiled for 5 min in gel sample buffer (1% SDS, 12% glycerol, and 0.0025% Serva Blue in 50 mM Tris (pH 6.8)) and analyzed using Tris-Tricine-urea-SDS-PAGE under nonreducing conditions (69).

Confocal microscopy of transiently transfected β-cells
INS1 cells were co-transfected with plasmids encoding (a) GFP-tagged WT or mutant proinsulin and (b) RFP fused to a KDEL sequence. At 48 h post-transfection, the cells were fixed with 3.7% formalin in PBS for 20 min, mounted with Prolong Gold with DAPI (Invitrogen), and imaged by epifluorescence in an Olympus FV500 confocal microscope. For immunofluorescence, transfected cells were fixed with 3.7% formalin in PBS for 20 min, permeabilized with TBS containing 0.4% Triton X-100, blocked with TBS containing 3% bovine serum albumin (BSA) and 0.2% Triton X-100, and then stained with primary rabbit anti-calnexin at 4°C overnight. Thereafter, samples were rinsed and incubated with secondary antibodies conjugated to Alexa Fluor 488 or 568 (Invitrogen). Slides were mounted with Prolong Gold with DAPI (Invitrogen) and imaged by epifluorescence in an Olympus FV500 confocal microscope (114).

Molecular modeling
Models of variant interfaces between [Gly B24 ]insulin and the IR ectodomain were constructed, as described previously (40), using the MODELLER (version 9.X) program (115). Both the monomeric NMR structure of insulin (PDB code 2KJJ) and the crystal structure of insulin in complex with the micro-receptor and the IR-A isoform αCT (PDB code 4OGA) were used as templates, modeling only the L1 and αCT IR domains in complex with [Gly B24 ]insulin. Models were generated both with an alignment against the templates that leaves a void in the B24 pocket (Gly B24 aligned at the B24 position) and with the B chain register shift leading to Phe B25 occupying the B24 pocket. Glycosylation was accounted for via a single N-linked N-acetyl-D-glucosamine at each of the IR asparagine residues 16, 25, and 111. Of the 50 models created, the model with the lowest MODELLER objective function was used in subsequent MD simulations.
MD simulations were performed, as described previously (40), using the GROMACS (version 5.1.2) (116) suite of programs and the CHARMM36 (117,118) force field. The simulation consisted of an initial steepest-descent minimization, a short 50-ps positionally restrained MD holding the protein fixed, and finally unrestrained MD for 250 ns. Each complex was placed within a cubic box extending 10 Å beyond all atoms, with the remaining volume solvated using the TIP3P water model with sodium and chloride ions to an ionic strength of 0.1 M. The temperature and pressure of the systems were maintained using the velocity-rescaling thermostat (119) at 300 K and the Berendsen barostat (120) at 1.0 bar, updated every 0.1 and 0.5 ps, respectively. A cutoff of 12 Å was used for nonbonded interactions, and the particle-mesh Ewald method (102) was used to account for long-range electrostatics, applying a grid width of 1.2 Å and sixth-order spline interpolation. All bond lengths were constrained with the P-LINCS algorithm, which allowed a time step of 2 fs.
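A few quantities implied by this setup can be checked by hand: GROMACS works in nanometres, so the 10 Å padding is 1.0 nm on each side; the salt needed for 0.1 M (for NaCl, ionic strength equals molar concentration) scales with box volume; and 250 ns at a 2-fs time step requires 1.25 × 10^8 steps. The sketch below computes these under stated simplifications. The 8-nm solute extent is hypothetical, and the ion count is only a rough cross-check, not a substitute for ion placement by the GROMACS tools.

```python
AVOGADRO = 6.02214076e23  # 1/mol


def box_edge_nm(solute_extent_nm, padding_nm=1.0):
    """Cubic box edge with the given padding on every side (10 Å = 1.0 nm)."""
    return solute_extent_nm + 2.0 * padding_nm


def salt_pairs(edge_nm, conc_molar=0.1):
    """Rough NaCl pair count for a target concentration in the full box volume.

    Ignores the volume excluded by the protein (so it overestimates the pairs
    needed) and omits counterions required to neutralize the net charge.
    """
    volume_litre = (edge_nm * 1e-8) ** 3  # 1 nm = 1e-8 dm, and 1 dm^3 = 1 L
    return round(conc_molar * volume_litre * AVOGADRO)


def n_steps(length_ns=250.0, dt_fs=2.0):
    """Number of integration steps for the production run."""
    return round(length_ns * 1e6 / dt_fs)


if __name__ == "__main__":
    edge = box_edge_nm(8.0)  # hypothetical solute extent of 8 nm
    print(edge, salt_pairs(edge), n_steps())
```

The step count also shows why bond constraints matter here: without P-LINCS, hydrogen-bond vibrations would force a time step near 1 fs and double the cost of each 250-ns trajectory.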

Modeling analysis
The DUET (121) and mCSM (92) webservers were used to estimate in silico the effect of the Gly B24 mutation on the interaction between insulin and the IR primary binding site. Cavity analysis was subsequently performed using the trj_cavity tool (122) across all trajectories, defining cavities as regions with a minimum volume of 50 Å³ that were completely buried (i.e. enclosed in all six directions surrounding each voxel).
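The buriedness criterion can be illustrated on a toy voxel grid. An empty voxel counts as buried only if an occupied voxel is found along each of the six axis directions (±x, ±y, ±z); buried voxels are then grouped into face-connected clusters, and clusters below the volume cutoff are discarded. The sketch below is a simplified re-implementation of that idea, not the trj_cavity code, and the grid, spacing, and function names are assumptions.

```python
from collections import deque

# The six axis directions (+/-x, +/-y, +/-z) used for burial and connectivity.
DIRECTIONS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def _in_grid(v, shape):
    return all(0 <= c < n for c, n in zip(v, shape))


def _is_buried(v, occupied, shape):
    """True if a ray from empty voxel v hits an occupied voxel in all six directions."""
    for d in DIRECTIONS:
        cur = tuple(c + dc for c, dc in zip(v, d))
        while _in_grid(cur, shape) and cur not in occupied:
            cur = tuple(c + dc for c, dc in zip(cur, d))
        if not _in_grid(cur, shape):
            return False  # ray escaped the grid: voxel is open along d
    return True


def buried_cavity_volumes(occupied, shape, spacing_a=1.0, min_volume_a3=50.0):
    """Sorted volumes (Å^3) of face-connected clusters of buried empty voxels."""
    nx, ny, nz = shape
    buried = {
        (i, j, k)
        for i in range(nx) for j in range(ny) for k in range(nz)
        if (i, j, k) not in occupied and _is_buried((i, j, k), occupied, shape)
    }
    volumes, seen = [], set()
    for start in buried:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first flood fill over buried neighbors
            v = queue.popleft()
            size += 1
            for d in DIRECTIONS:
                nb = tuple(c + dc for c, dc in zip(v, d))
                if nb in buried and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        volume = size * spacing_a ** 3
        if volume >= min_volume_a3:
            volumes.append(volume)
    return sorted(volumes)


if __name__ == "__main__":
    # Hypothetical hollow cube: walls at indices 1 and 5 enclose a 3x3x3 void.
    shell = {(i, j, k) for i in range(1, 6) for j in range(1, 6) for k in range(1, 6)
             if 1 in (i, j, k) or 5 in (i, j, k)}
    print(buried_cavity_volumes(shell, (7, 7, 7), spacing_a=1.0, min_volume_a3=10.0))
```

The volume cutoff acts at the cluster level, so the reported cavity volume depends both on grid spacing and on how strictly burial is defined; a six-direction ray test is stricter than criteria that count only a majority of blocked directions.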