Structural basis of the effect of activating mutations on the EGF receptor

Mutations within the kinase domain of the epidermal growth factor receptor (EGFR) are common oncogenic driver events in non-small cell lung cancer. Although the activation of EGFR in normal cells is primarily driven by growth-factor-binding-induced dimerization, mutations on different exons of the kinase domain of the receptor have been found to affect the equilibrium between its active and inactive conformations giving rise to growth-factor-independent kinase activation. Using molecular dynamics simulations combined with enhanced sampling techniques, we compare here the conformational landscape of the monomers and homodimers of the wild-type and mutated forms of EGFR ΔELREA and L858R, as well as of two exon 20 insertions, D770-N771insNPG, and A763-Y764insFQEA. The differences in the conformational energy landscapes are consistent with multiple mechanisms of action including the regulation of the hinge motion, the stabilization of the dimeric interface, and local unfolding transitions. Overall, a combination of different effects is caused by the mutations and leads to the observed aberrant signaling.


Introduction
The epidermal growth factor receptor (EGFR) is a cell surface receptor, which regulates cell proliferation and differentiation by phosphorylating downstream signaling proteins (Sigismund et al., 2018;Wee and Wang, 2017). Its physiological activity is tightly regulated and genetic mutations that lead to its uncontrolled activation have been associated with cancer development (Sharma et al., 2007;Sigismund et al., 2018). Indeed, acquired somatic DNA alterations on the gene which encodes EGFR are found in 14-32% of non-small cell lung cancer (NSCLC) cases (Collisson et al., 2014;Zhang et al., 2016) and are correlated with adverse prognosis (Hirsch et al., 2003;Sharma et al., 2007). EGFR in-frame deletions in exon 19 are the most prevalent of EGFR kinase domain mutations accounting for 45% of EGFR mutations in NSCLC, followed by the L858R missense mutation in exon 21, which accounts for approximately 40-45% of EGFR kinase domain mutations (Sharma et al., 2007). Exon 20 insertions (Ex20Ins) collectively account for the third most common category of EGFR mutations found in NSCLC and are detected in 4-10% of the cases Yasuda et al., 2012).
Consistent with their purported role in the etiology of NSCLC, recent studies have shown that b3-aC DELREA, L858R, A763-Y764insFQEA, and D770-N771insNPG mutated EGFR proteins are oncogenic in both cell cultures and transgenic mouse models (Ji et al., 2006;Politi et al., 2006;Sordella et al., 2004;Xu et al., 2007;Yasuda et al., 2013). These mutations increase the kinase and clinically relevant DELREA and L858R mutations (Figure 1), we performed here long-timescale atomistic molecular dynamics (MD) simulations and enhanced sampling free energy calculations with a recently developed force field. The chosen force field has been shown to be able to reproduce reliably the dynamics of both well-folded and partially unfolded domains (Robustelli et al., 2018). As most Ex20Ins mutations have been shown to render EGFR resistant to most tyrosine kinase inhibitors (TKIs) (Yasuda et al., 2012;Ward et al., 2021), with the exception of the relatively rare A763-Y764insFQEA insertion, we tried to rationalize the behavior of a prototypical Ex20Ins, D770-N771insNPG, and of A763-Y764insFQEA, for which structural and biochemical data is available. The unbiased MD simulations of the isolated monomers and the asymmetric homodimers collectively lasted 36 ms for the WT and the mutant forms (see Materials and methods, Table 1    simulations hinted at slow motions that could not be sampled even by long MD simulations, we also applied enhanced sampling techniques, and in particular, parallel tempering metadynamics (PTme-taD) simulations. PTmetaD simulations allow the exploration of biologically meaningful conformational changes of kinases that take place on time scales longer than those accessible through standard MD simulations and reconstruct the associated free energy landscapes (Sutto and Gervasio, 2013;Marino et al., 2015;Kuzmanic et al., 2017).

Results
WT apo populates inactive states with a semi-closed A-loop and a partially disordered aC-helix In the simulations that we initiated from the active conformation of the apo, unphosphorylated EGFR monomer, the wild-type (WT) EGFR repeatedly departed from its initial aC-in conformation and transitioned to an aC-out-like conformation within~100 ns (Figure 1-figure supplement 2) that resembles the conformation seen in the DFG-out inactive state (PDB ID: 2RF9). However, although the K745-E762 salt bridge was broken, the repositioning of the aC-helix was not sufficient for the A-loop to adopt the characteristic two-turn helix that is found in the Src-like inactive conformation. In fact, the A-loop remained extended throughout the simulations, which suggests the presence of an energy barrier that needs to be crossed for the A-loop to break the b9-strand and form the characteristic two-turn helix (Sutto and Gervasio, 2013). This behavior is in line with the known activation mechanism of EGFR according to which dimerization is important for keeping the aC-helix in the aC-in conformation (Shan et al., 2012) and suggests that the presence of ATP and phosphorylation may also be critical for full stabilization of the active conformation.
Indeed, our unbiased simulations of the asymmetric WT-EGFR dimer further support the notion that the active conformation is not adopted spontaneously in solution but is promoted by the interactions between the kinase domains in the asymmetric configuration. In particular, we see that the aC-helix is kept in the aC-in conformation predominately on the receiver monomer ( Figure 1-figure supplement 4). On the contrary, the aC-helix of the activator monomer is flexible enough to adopt aC-out conformations, similar to the ones the aC-helix adopts in the monomeric form of EGFR. Unlike the behavior of the aC-helix that we see in the asymmetric dimer, in the simulations of the symmetric dimer, in which both monomers are found in the Src-like inactive conformation, the aC-helix of both monomers remains in the aC-out conformations throughout the simulation (Figure 1-figure supplement 4). The free energy landscape reconstructed from multiple replica metadynamics simulations shows three main minima. The shape and position of them are reassuringly equivalent to those we previously obtained with a different force field (Sutto and Gervasio, 2013). As expected, in the absence of a ligand or phosphorylation, these minima for the WT-EGFR correspond to inactive-like conformations ( Figure 2). Specifically, the ensemble of conformations that corresponds to the deepest free energy minimum -(CV1, CV2) = (0.6, 0.16) -contains conformations in which the A-loop has adopted a semi-closed conformation that precludes the ATP-binding, while the aC-helix is mostly ordered and has adopted the aC-out conformation (Figure 2, basin a1). The conformations in this basin are consistent with crystal structures of the Src-like inactive state (e.g. PDB ID: 2GS7) not only in the backbone arrangement but also in the details of key interactions. In particular, in this ensemble of conformations, the salt bridge of E762 with K745 is broken, and a new one is formed between K745 and D855.
In one of the two secondary free energy minima -(CV1, CV2) = (1.4, 0.1) -the characteristic two-turn helix on the A-loop of the Src-like inactive conformation is formed and the aC-helix has rotated away from the C-lobe and adopted the aC-out conformation (Figure 2, basin a2). Notably, in the most populated state of this basin, the N-terminal region of the aC-helix is unfolded, in accordance with the predicted partially disordered nature of this region (Shan et al., 2012). In the other secondary free energy minimum -(CV1, CV2) = (0.6, 0.2) -the N-terminal end of the aC-helix is folded, and the helix has rotated out of the core of the protein, while the A-loop is still relatively unstructured ( Figure 2, basin a3). Although the conformations in this basin resemble a lot the ones seen in the deepest energy minimum, they differ in the fact that the aC-helix points further away from the C-lobe, which results in a slight increase in the interlobal distance seen in the conformations of this basin. Interestingly, reprojection of the free energy with respect to the interlobal distance shows that the WT samples multiple conformations of different N/C separation (Figure 7-figure supplement 4C). Together, these two secondary minima may represent metastable states that are important during the transition from the active to the Src-like inactive conformation and reflect the necessary rearrangements for the deactivation to take place, that is unfolding of the N-terminal end of the aC-helix, increase in the interlobal distance and formation of the two turn helix on the A-loop. Despite the fact that the minimum energy path connecting the Src-like inactive to active conformation does not pass through the basin where the N-terminal end of the aC-helix is highly disordered (Figure 7-figure supplement 3), the 2 kcal/mol energy difference between the deepest minimum and this metastable state suggest that, under physiological conditions, this state should still be accessible.

L858R stabilizes active-like conformations
It has been suggested that the mutation of L858 to arginine enhances the kinase activity by shifting the equilibrium toward the active conformation through disruption of the hydrophobic packing of L858 with L747, I759, and L861 ( Figure 1-figure supplement 3) that is supposed to stabilize the Src-like inactive conformation (Yasuda et al., 2013). However, our unbiased simulation of the monomeric L858R starting from the Src-like inactive conformation showed that transient, favorable interactions of R858 with D855 and D837 of the DFG and HRD motifs, respectively, and F723 of the P-loop (Figure 3-figure supplement 1) stabilize the Src-like inactive conformation. On the contrary, in our unbiased simulations where L858R started from the active conformation, the side chain of R858 interacted mainly with E762 only (Figure 3-figure supplement 1).
The metadynamics simulations of the L858R mutant confirm the results that we obtained with an earlier force field (Sutto and Gervasio, 2013). The new free energy surface and the underlying structural ensemble is equivalent to the one obtained with the older force field, which bodes well for the robustness and accuracy of the results. The free energy landscape ( Figure 3) shows that the deepest basin corresponds to an ensemble of conformations that are very close to the active conformation ( Figure 3, basin d1). The positively charged R858 was surrounded by a cluster of negatively charged residues (E758, E762, D837, and D855) and in this ensemble, the side chain of R858 fluctuated between two states in which it interacts with either D837 and D855, or E758 and E762. The interactions of R858 with D837 kept the A-loop in an extended, active-like conformation, in which the b9strand on the A-loop is present. At the same time, as a result of the favorable interactions of R858 with E758 and E762, the aC-helix exhibited high secondary structure stability as opposed to the disorder seen in the aC-helix of the wild-type. Although the salt bridge of E762 with K745 is never present in this ensemble as is in the fully active conformation, the aC-helix is maintained close to the aCin conformation, probably due to the interaction of R858 with E762. Overall, the presence of R858 favors the active conformation at the expense of the disordered state that is highly populated in the WT. Stabilization of the aC-helix through a mutation like L858R has been shown to promote EGFR dimerization and downstream signaling, which explains the activating nature of this mutation (Shan et al., 2012;Zhang et al., 2006). In a secondary minumum higher in energy ( Figure 3, basin d2), the side chain of R858 is found completely exposed to the solvent, but although the aC-helix is found in the aC-out conformation, the short helix on the A-loop is not formed. Instead, the A-loop assumes a semi-closed conformation.  Figure 3. Free energy surface of the L858R mutant as a function of CV1 and CV2. Representative structures for two main conformations found in the deepest minimum are depicted in cartoon representation. The interaction of R858 with E758 and E762 is responsible for the secondary structure stability of the aC-helix, while the interaction of R858 with D837 and D855 prevents the formation of the two-turn helix on the A-loop. A yellow cross Figure 3 continued on next page D770-N771insNPG populates active-like and disordered aC-helix conformations Based on the crystal structure of the D770-N771insNPG mutant in the active conformation, it has been postulated that the NPG insertion at the C-terminal end of the aC-helix, locks the helix in the aC-in conformation (Yasuda et al., ). Proline is a residue found commonly in turns and in the case of the D770-N771insNPG, the inserted P772 induces a turn that leads to the formation of a hydrogen bond between the backbone of D770 and the amide of the inserted G773 in the crystal structure. This backbone hydrogen bond orients, in turn, the side chain of D770 such that it forms a salt bridge with R779 ( , suggesting that the aC-in conformation in which the mutant is found in the crystal structure is probably ligand-induced rather than mutation-induced. The unbiased simulation of the D770-N771insNPG insertion that was initiated from the Src-like inactive conformation in the monomeric form suggests that as soon as the inserted triplet adopts an extended conformation where the backbone D770-G773 hydrogen bond is lost ( Interestingly, R779H has been found to increase the phosphorylation of monomeric EGFR (Ruan and Kannan, 2015), underlying the role of R779 in the regulation of the aC-in and aC-out equilibrium. Our unbiased simulations suggest that the orientation that D770 adopts because of the inserted residues effectively prevents R779 from interacting with A767 that is seen in the aC-out conformation. This already shows the crucial role of the P771, G773, D770, R779, A767 network of interactions in the stabilization of different aC-helix conformations.
In the PTmetaD simulations and in the absence of a phosphate group on Y869 or a ligand, the aC-helix adopts the aC-out conformation where the K745-E762 salt bridge is broken, but the A-loop assumes an extended, active-like conformation. Mapping of the conformational space sampled during the unbiased monomeric simulations of the D770-N771insNPG on the FES shows that the deepest minimum corresponds to the CV space that the mutant samples in the simulations that we initiated from the active conformation (Figure 7-figure supplement 2). Projection of the FE on the (CV1, CV2) space results in a broad minimum around where the active conformation lies. Reprojection of the FE on the (CV1, CV3) space, however, separates the two dominant conformations that fall together in the same basin in the (CV1, CV2) projection of the FE (Figure 4).
In one of the two isoenergetic basins in the free energy surface ( Figure 4, basin e1), the side chain of the inserted N771 has flipped 180˚with respect to the crystal structure and interacts with the side chain of R779. The mutation does not seem to be able to quench the intrinsic disorder of aC-helix fully, as can be seen from the unfolded N-terminal part of the helix in the representative conformation of the minimum (Figure 4). In the second basin ( Figure 4, basin e2), the D770-R779 salt bridge is formed, similar to the crystal structure, and so is the backbone hydrogen bond between D770 and G773, and the aC-helix is fully helical. However, unlike the crystal structure, the aC-helix is in the aC-out conformation.     In all the conformations that correspond to the global minimum of the reported free energy surface, the A-loop is predominately found in the extended, active-like conformation regardless of the order or position of the aC-helix. Inspection of the structures in this basin suggests that the A-loop is kept in the active conformation through the interaction of E762 with K863 of the nine sheet.
Interestingly, in the unbiased simulations of the mutant, as well as in a relatively populated cluster within the broad basin seen in the PTmetaD simulations, the insertion seems to amplify the N/Clobe separation. The same N/C-lobe separation was seen also in the activator kinase in the asymmetric dimer (Figure 1-figure supplement 4). The role of this amplification, which is also seen in the A763-Y764insFQEA as it will be discussed later, is not yet clear, but it may be related to the Hsp90 recruitment and/or the binding kinetics of the substrate/inhibitors. A763-Y764insFQEA populates active-like states and semi-closed elongated states When studied in vitro, the EGFR Ex20Ins variant A763-Y764insFQEA was the only EGFR Ex20Ins-harboring cell line inhibited by erlotinib at concentrations less than 0.1 mM, while other EGFR Ex20Ins, such as the D770-N771insNPG, had a reduced affinity and sensitivity to EGFR TKIs (Yasuda et al., 2013;Hirano et al., 2018). From the homology model of the A763-Y764insFQEA mutant that we built, which was based on the active conformation of the wild-type, it is not immediately clear how the mutation exerts its effect. Mutagenesis studies indicate that this insertion extends the aC-helix toward the N-terminal direction while the glutamic acid that is inserted through the mutation assumes the role of E762 in the WT-EGFR (Yasuda et al., 2013). The insertion leads to the replacement of I759 with alanine that has a shorter side chain. Again, the amino acid at position 759 is involved in hydrophobic interactions around L858 (Figure 1-figure supplement 2), which have been speculated to be essential for stabilizing the Src-like inactive state. Therefore, based on the homology model, it is tempting to think that the mutation has an activating effect by disrupting this hydrophobic network, which is found in the Src-like inactive conformation, and making the inactive conformation less energetically favorable.
In the case of the unbiased simulations that we initiated from the monomeric Src-like inactive conformation, the aC-helix remained in the aC-out conformation throughout the simulation (Figure 1figure supplement 2), and the overall fold of the kinase was maintained with a noticeable increase in the interlobal distance. On the other hand, in the simulations that we initiated from the active conformation, the A763-Y764insFQEA was the only mutant in which the aC-helix transitioned back to the aC-in conformation several times after it visited the aC-out conformation (Figure 1-figure supplement 2). This unique behavior implies that the intrinsic tendency of the A750-N756 segment of the N-terminal of the aC-helix to fold that we see in the simulation may restrict the conformational flexibility of the aC-helix and favor aC-in conformations.
The deepest minimum in the free energy landscape of A763-Y764insFQEA -(CV1, CV2) = (À0.7, 0.3) -corresponds to an ensemble where the A-loop of the mutant samples semi-closed conformations, but unlike the WT, the aC-helix is stabilized in an aC-in conformation, where the K745-E762 salt bridge is formed ( Figure 5, basin g1). Interestingly, this ensemble contains several conformations in which the N-lobe has separated from the C-lobe, similar to the behavior seen in the D770-N771insNPG. In the case of the A763-Y764insFQEA, the activity of the mutant has been found to be sensitive to Hsp90 inhibitors, suggesting that this mutant is dependent on Hsp90 for stability and downstream-signaling (Jorge et al., 2018). As the function of this mutant has been shown to be chaperone-dependent (Jorge et al., 2018), the biological relevance of the observed elongated states with additional space between the two lobes might be to facilitate the recruitment of the Hsp90 chaperone.
In the second main minimum -(CV1, CV2) = (1.4, 0.1) -A763-Y764insFQEA samples inactive conformations where the two-turn helix on the A-loop is formed and stabilized by interactions of K860 with the inserted glutamic acid, and the shifted E762 and E758, as well as of interaction of L858 with the inserted F764 ( Figure 5, basin g2).
Notably, the fully active state is populated, albeit there is a small thermodynamic penalty of 1-2 kcal mol À1 with respect to the inactive state. In the representative structure of this basin -(CV1, CV2) = (À0.45, 0.55) -the K745-E762(inserted Glu) salt bridge is formed, and the C is in the aC-in conformation ( Figure 5, basin g3). At the same time, the A-loop assumes an extended, active-like conformation.
Higher in energy we found an ensemble of conformations which has structural features from both basins 2 and 3. In this basin ( Figure 5, basin g4), the A-loop of A763-Y764insFQEA adopts the extended conformation seen in the active state, similar to basin 3, while the K745-E762 salt bridge is broken and replaced by the K745-D855. In this ensemble, the N-terminal end of the aC-helix is highly disordered. Although high in energy, this metastable state seems to be important for the transition to the active state, since the minimum energy path for this transition passes through it (Figure 7-figure supplement 3).
In the apo, monomeric form, the insertion results in an extension of the 3 C loop, which was expected to increase the mobility of the aC-helix. Instead, in both the unbiased and PTmetaD simulations that we performed, this extension seems to provide structural stability to the aC-helix. Even though the aC-helix samples disordered states in all the basins, in the dominant conformations of these basins it is highly helical, suggesting that the mutation may also activate the receptor by suppressing the disorder of the aC-helix that is important for the formation of the asymmetric homodimer and downstream signaling. Moreover, looking at the position of the aC-helix in the two monomers of the asymmetric dimer, we see that, unlike the WT, the aC-helix is kept predominately in the aC-in conformation in both monomers (Figure 1-figure supplement 4). Interestingly, although the aC-helix of both monomers samples occasionally aC-out conformations, it transitions back to the aC-in conformation, even in the case of the donor kinase, where there is nothing to push the aC-helix back. We were not able to associate this behavior of the donor kinase with the mutation on the acceptor kinase or find an allosteric communication between the dimerization interface and the aC-helix. Nevertheless, since simulations of the monomeric state of A763-Y764insFQEA-EGFR shows that the aC-helix transitions naturally from the aC-out to the aC-in (Figure 1-figure supplement 1), we reason that the more frequent sampling of the aC-in conformation in the case of the donor kinase is most likely mutation-driven rather than dimerization-driven.
DELREA populates aC-out conformations EGFR mutants harboring DELREA show increased activity, comparable to the activity seen in L858R mutants and much higher than the one seen in the WT (Foster et al., 2016). Foster et al. showed that the deletion of five amino acids at the 3 C loop is optimal for kinase activation as this deletion is expected to prevent aC-helix from adopting an aC-out conformation (Foster et al., 2016). Moreover, it has been recently shown that DELREA EGFR is still oncogenic even in the absence of asymmetric homodimerization (Cho et al., 2013), unlike the wild-type and L858R EGFR, which acquire their oncogenic properties following asymmetric dimerization. However, we note that heterodimerization of the DELREA with other members of the ERBB family might play a role.
Modeling of the DELREA deletion in the active conformation suggests that the deletion does not significantly perturb the overall kinase domain. Inspection of the region around the mutation indicates that the deletion can be accommodated in the active, aC-in conformation by repositioning of residues 753 PKAN 756 , which form the initial N-terminal turn of the aC-helix in the WT-EGFR after the three-strand. On the other hand, modeling of the Src-like inactive conformation suggests that unfolding of this turn at the N-terminal end of the aC-helix is necessary for it to adopt the characteristic aC-out conformation. Long unbiased molecular dynamics simulations by Shan et al. suggest that the DELREA mutation stabilizes the aC-in active conformation relative to the Src-like inactive conformation, but the mutation does not prevent aC-helix from sampling the aC-out conformation (Shan et al., 2012). In our unbiased simulations, we confirm that the aC-out is a stable conformation in terms of the structural integrity of the two-turn helix on the A-loop, the aC-helix and the overall folding of the receptor.
The reconstructed free energy landscape confirms this observation, as the aC-helix is found in the aC-out conformation in one of deepest free energy minima -(CV1, CV2) = (1.4, 0.1) -the characteristic two-turn helix on the A-loop of the Src-like inactive conformation is formed, and the aChelix has shifted to adopt the aC-out conformation ( Figure 6, basin b2). The deleted L747, which on the WT contributes to the stabilization of the two-turn helix through a hydrophobic network between I759, L861, and L858 (Figure 1-figure supplement 3), is replaced by P753 (Figure 6, basin b2) and, therefore, the stability of the Src-like inactive conformation is not compromised despite the deletion.
In the second free energy minimum -(CV1, CV2) = (1.6, 0.45) -DELREA sampled mainly activelike conformations in which the C was again in the aC-out conformation, but the A-loop assumed extended conformations (Figure 6, basin b1) similar to those seen in the unbiased simulations of the active WT-EGFR. Interestingly, the A-loop in this ensemble also visited conformations in which the middle section of the A-loop formed a three-turn helix, reminiscent of an extended helix in the A-loop of MPSK1 kinase ( Figure 6-figure supplement 1, PDB ID: 2BUJ). This conformation is intriguing because the helix puts Y869, a well-known phosphorylation site of EGFR kinase, in an exposed position along with a group of glutamate residues (E865, E866, E868, and E872), which interact with K754 of the unfolded N-terminal end of the aC-helix (Figure 6, basin b1). Interestingly, in a basin between 1 and 2, this conformation that is less populated in basin two becomes dominant and we see Y869 interacting with K754 and E762 with K860 ( Figure 6, basin b4). Moreover, the minimum energy path connecting the Src-like inactive to active conformation passes through this basin (Figure 7-figure supplement 3), which suggests that interactions of the unfolded N-terminal end  of the aC-helix, which is now the 3 C loop, with the A-loop in the DELREA mutant should be important. This helical conformation of the activation loop might be involved in the phosphorylation of Y869. A similar conformation has been described in Shan et al., 2013 as an intermediate state of the active to Src-like inactive transition of the WT.
The third metastable minimum, slightly higher in energy than the other two, is very broad and consists of a divergent ensemble in which DELREA samples a range of conformations most of which are 'substrate-competitive' like ( Figure 1-figure supplement 1), that is neither the two-turn helix nor the nine strand on the A-loop is stable and the residues of the DFG motif are in the DFG-in orientation ( Figure 6, basin b3). In particular, in this ensemble, the A-loop points to the opposite direction with respect to the one seen in the active conformation, as seen in many kinases like the inactive conformation of the AuroraA kinase (PDB ID: 2WTV).
Although the DELREA is one of the most frequent mutations in NSCLC, the effect of DELREA on the dimerization ability of the DELREA-EGFR is poorly understood. Our simulations of the DELREA-EGFR in the asymmetric dimer do not show any significant alteration on the interface of the mutant dimer with respect to the WT, apart from the marginal stabilization of the inter-monomeric salt bridges K944-E962 and K752-E958 ( Figure 6-figure supplement 2). We anticipate, however, that suppression of the intrinsic disorder of the aC-helix of the monomeric mutant that we see in all of the explored conformations during the PTmetaD simulations should promote the formation of active asymmetric dimers.

Discussion
In light of the clinical importance of EGFR kinase domain mutations in lung cancer, understanding the unique properties of mutant kinases is crucial to drug development and lung cancer biology and provides a good opportunity to understand better the biophysical mechanisms regulating receptor kinase activation. The design of conformation-and mutation-specific kinase inhibitors to target a diversity of cancer-causing mutations is of paramount importance to achieve the optimal therapeutic outcome for cancer patients and represents an unmet clinical need in precision oncology. A better understanding of how oncogenic mutations affect, at a molecular level, the structure and dynamics of the receptor with respect to the WT may potentially lead to personalized cancer treatment tailored to the genetic profile of each cancer patient. Here, we have presented the effect of four such mutations, namely the L858R, D770-N771insNPG, DELREA, and A763-Y764insFQEA, all of which have been shown to increase the catalytic activity of EGFR in vitro (Zhang et al., 2006;Foster et al., 2016;Yasuda et al., 2013) and continuously signal the pathway, which stimulates cancer cell growth.
It has been well established that the activation of WT-EGFR is driven by ligand-induced dimerization or multimerization (Zanetti-Domingues et al., 2018;Lemmon and Schlessinger, 2010). In agreement with previous studies (Shan et al., 2013;Shan et al., 2012;Sutto and Gervasio, 2013), we observe that the active conformation of the WT-EGFR, in the absence of phosphorylation and ATP, is energetically unfavorable and infrequently sampled. The absence of a low-energy aC-in active conformation in which the catalytically important KE salt bridge between K745 and E762 is maintained suggests that dimerization is important to stabilize this conformation fully. Indeed, in our unbiased simulations of asymmetric EGFR homodimers, we see that the aC-helix is kept in the aC-in conformation predominately on the receiver monomer both of the WT and of the mutants (Figure 1-figure supplement 4). Dimerization seems to suppress the intrinsic disorder at the receiver interface stabilizing the active conformation of the receiver kinase, as this is reflected on the increased stability of the KE salt bridge in the receiver kinase, but not in the activator. However, the aC-helix of the receiver is still flexible enough to sample occasionally aC-out conformations in most of the asymmetric dimers, albeit in lower frequency than the monomeric kinase or when it acts as a donor, which suggests that the binding of ATP might still be essential for the aC-helix to be locked fully in the aC-in conformation.
It should be noted that apart from the dimerization, which is affected by the order-disorder transitions of the C helix, the binding of the substrate and the phosphorylation of the A-loop and the C-terminal segment will also play a role in the relative population of active and inactive conformations Kovacs et al., 2015). For instance, the phosphorylation of the A-loop, albeit not strictly necessary to propagate the signal downstream (Zhang et al., 2006), affects the flexibility of the loop (Shan et al., 2012), further stabilizing the active state and affecting the catalytic efficiency both directly and indirectly.
We reasoned that a collective motion that might be more directly linked to the steady state catalytic activity might be that of the hinge region, which has been linked to the phosphorylation of the substrate (Pucheta-Martínez et al., 2016), the active to inactive transition (Shan et al., 2012), and also the release of the products. The unbiased simulations of the different mutants in the monomeric and homodimeric form show that, the monomer explores more 'open' hinge conformations with respect to the receiver kinase of the dimer, which is in agreement with the activating role of dimerization. Interestingly, two of the mutants (DELREA and L858R) explore more 'closed' hinge conformations in the dimer (Figure 7-figure supplement 4A,B).The same trend is clearly visible in the reprojection of FE with respect to the interlobal distance, where the position of the global minimum follows the same order as the catalytic efficiency (Figure 7-figure supplement 4C). Although we can only speculate about this, it is likely that this compressed kinase seen in DELREA and L858R might be linked to slow release of the substrate/ligands.
In the global minimum of the free energy landscape of the WT-EGFR, the A-loop has adopted a semi-closed conformation that precludes the ATP-binding, and the aC-helix has adopted the aC-out conformation. The N-terminal end of it is partially unfolded in conformations that correspond to local minima, which is in line with the disordered nature of the aC-helix (Shan et al., 2012).
Light scattering and native gel electrophoresis have suggested that, unlike the WT-EGFR, which is found predominately in monomers in solution, the L858R EGFR preferentially forms dimers and oligomers (Shan et al., 2012). Earlier studies have shown that L858R promotes dimerization by suppressing the intrinsic disorder of the aC-helix (Shan et al., 2012;Sutto and Gervasio, 2013), which is an integral component of the dimerization interface, while it also promotes the active state. Our simulations corroborate this combined mechanism of action. The deepest free energy minimum corresponds to active conformations in which the interaction of R858 with D837 and D855 disfavors the formation of the characteristic two-turn helix on the A-loop of the Src-like inactive conformation, while the interaction of R858 with E758 and E762 represses the disorder in the aC-helix region, thus favoring the formation of asymmetric homodimers.
In-frame Ex20Ins are a subcategory of EGFR mutations, most of which are found in the aC-b4 loop that connects the aC-helix to the b4-strand of the kinase domain. TKIs (afatinib, lapatinib, neratinib, dacomitinib) targeting Ex20Ins EGFR have been shown to have limited activity in patients with EGFR-Ex20Ins-mutant tumors, with exception of the covalent inhibitor poziotinib, which has been found to demonstrate greater activity in vitro (Yasuda et al., 2013;Ruan and Kannan, 2018;Robichaux et al., 2018) and in patient-derived xenograft models of EGFR Ex20Ins mutant NSCLC and in genetically engineered mouse models of NSCLC (Robichaux et al., 2018). According to our simulations, we reason that the amplification of the lobe separation that we observed in both the unbiased and PTmetaD simulations could explain the low sensitivity of D770-N771insNPG-EGFR to TKIs, as this motion is expected to affect the residence time of the TKIs in the binding site. Unlike previous suggestions, according to which the inserted residues would prevent the receptor from adopting an aC-out conformation (Lemmon and Schlessinger, 2010), we sample such conformations, in which though the A-loop is maintained in an extended, active conformation. Interestingly, BDTX-189, a recently developed inhibitor that targets Ex20Ins mutations, exhibited potent tumor growth inhibition and tumor regression in preclinical models (Hamilton et al., 2020). The scaffold of BDTX-189 is similar to that of the lapatinib analogue neratinib, which suggests that BDTX-189 probably binds aC-out conformations in accordance with the low energy of such states on the free energy landscape of D770-N771insNPG. It is possible that the mutation decouples the N/C-lobe motions and subsequently the coupling between the aC-helix and the A-loop -a coupling that is important for the transition from the active to the inactive state (Huse and Kuriyan, 2002) -and makes the transition to the inactive conformation less favorable.
Unlike the majority of EGFR Ex20Ins (including the relatively rare D770-N771insNPG), which appear to be relatively insensitive to early EGFR TKI's, recent data show that the A763-Y764insFQEA insertion has greater sensitivity to agents such as gefitinib, erlotinib, and osimertinib (Yasuda et al., 2013;Floc'h et al., 2018). Our simulations show that the most stable conformation of the apo A763-Y764insFQEA corresponds to an aC-in conformation in which, however, the A-loop is not fully extended as seen in the active conformation. Interestingly, the fully active conformation that is not favorable in the apo WT-EGFR, appears as a relatively low energy secondary minimum in the FES of the A763-Y764insFQEA. What is more, after simulation of a homology model of the mutation in the asymmetric dimer, we anticipate the mutation to also promote the formation of such dimers, which would explain the higher activity of A763-Y764insFQEA-EGFR compared to D770-N771insNPG-EGFR. Specifically, the introduced F760 of A763-Y764insFQEA is accommodated in the hydrophobic cleft formed by V952, M949, and K953, while K757 of the receiver kinase in the WT is replaced by D761 in the A763-Y764insFQEA, which can interact with K953 of the activator kinase ( Figure 5-figure supplement 1A). Moreover, the introduction of the insertion and the consequent extension of the aC-helix by one turn brings K757 and K754 in such a position that allows them to interact more efficiently with the side chains of E967 of the aI-helix, and of D960 and D958 of the aH-aI loop, respectively ( Figure 5-figure supplement 1B). Notably, these interactions cannot not occur in the WT.
Through PTmetaD simulations, we explored the differences between the conformational landscapes of EGFR Ex20Ins and the WT, L858R, and DELREA mutants. Such differences are not evident from the available crystal structures where the residues of the ATP pocket appear to be identical among the different structures. We believe that the conformational ensembles from the PTmetaD simulations of the EGFR Ex20Ins with the more extended conformations that lead to a more open active site can be used to identify novel inhibitors with increased selectivity against the mutant EGFR. We should note that in such drug design efforts, using the appropriate conformation is of great importance. For example, we can already see that targeting the L858R-EGFR with aC-out inhibitors, like lapatinib, would lead to steric clashes that are not present in the pocket of an Ex20Ins EGFR like A763-Y764insFQEA, while inhibitors like gefitinib can be accommodated better in the tighter pocket of L858R (Figure 7-figure supplement 5).
It has been postulated that the b3-aC deletion forces the aC-helix into the aC-in position promoting the kinase activation and decreasing the sensitivity to aC-out inhibitors, like lapatinib (Foster et al., 2016). The free energy landscape of the DELREA mutation, however, suggests that this notion needs to be revisited. The most stable conformations of the unphosphorylated monomeric form correspond to minima in which the C-terminal end of the b3-strand unfolds, giving the aC-helix the necessary flexibility to adopt aC-out conformations. However, both the unbiased simulations and the conformations within all the energy basins show that the DELREA deletion almost completely suppresses the local disorder at the N-terminal end of the aC-helix region, which in turn is expected to favor the formation of asymmetric dimers that could explain the increased activity of DELREA EGFR with respect to the WT. Moreover, simulations of the binding of lapatinib to the WT-EGFR have suggested that the slow kinetics of lapatinib's binding are probably associated with the transition to a conformation in which the aC-helix is partially unfolded (Shan et al., 2012). It is, thus, tempting to speculate that the absence of such intrinsically disordered conformations in the free energy landscape of DELREA could explain the decreased sensitivity of DELREA-EGFR to lapatinib.
Our study reveals intricate differences between EGFR mutations, even between mutations that occur within the same exon. In light of the present results, we can rationalize the activation effect of these mutations in an atomic level resolution. Inhibiting the activity of abnormal EGFR proteins, as well as other proteins in the pathway, can interrupt this signaling pathway that causes cancer cells to grow.

Materials and methods
Protein structure preparation -monomeric EGFR The crystal structures for the active, unphosphorylated kinase domain of the wild-type (PDB ID: 2GS2) and mutant (PDB ID: 2ITV for the L858R mutant, and PDB ID: 4LRM for the D770-N771insNPG mutant) EGFR were retrieved from the Protein Data Bank and the used sequence comprises the interval L703-Q976 in our notation, L67-Q952 in the PDB notation. The amino acid numbering convention adopted here includes the 24-residue long membrane-targeting signaling peptide that is deleted in the mature protein.
The missing residues of the A-loop in 2GS2 were added using the software Modeller, according to the respective Uniprot sequence. Any co-crystallized ligand present in the crystal structures was removed. The structure of the active WT was used then as a template to model the missing residues in the structures of the L858R and D770-N771insNPG mutants. In the absence of crystallographic data regarding the structures of the active A763-Y764insFQEA, and DELREA mutations, we used the structure of the WT to introduce the mutations using the automodel functionality of MODELLER (Sali and Blundell, 1993). In the case of the A763-Y764insFQEA mutant EGFR, the four-residue FQEA insertion occurs in the middle of the aC-helix. The one turn of a helix that these residues form was modeled such that it shifts the register of adjacent residues in the helix toward the N-terminal direction, in accordance with the mutagenesis studies performed by Yasuda et al., 2013 to determine the direction of the shift.
Since none of the studied mutants has been crystallized in an inactive conformation, we used the crystal structure of the unphosphorylated WT-EGFR in the Src-like inactive conformation (PDB ID: 2GS7) to create models of the Src-like inactive states of the mutants. The mutations were introduced using MODELLER (Sali and Blundell, 1993).
MD simulations combined with experimental data have shown that the flip of the DFG motif in many protein kinases, including the kinase domain of EGFR, is dependent to the protonation state of the Asp residue of the DFG motif where protonation promotes the flip and favors the DFG-out conformation (Shan et al., 2013;Shan et al., 2009;Sultan et al., 2017). In all the structures that were used in the simulations that are described below, the Asp was unprotonated and, therefore, the DFG-flip was not sampled in any of the simulations described here.

Protein structure preparation -homodimeric EGFR
The apo, unphosphorylated WT-EGFR monomers were assembled into asymmetric and head-tohead symmetric homodimers based on the crystal packing of the monomers observed in PDB entries 2GS6 and 5CNO respectively (Zhang et al., 2006;Kovacs et al., 2015). In the asymmetric dimeric configuration, the kinase domain of each monomer is found in the active conformation, while in the head-to-head symmetric homodimer the kinase domains are found in the Src-like inactive conformation.
Out the four mutants that we simulated here, available crystal structures of mutant dimers exist only for the L858R EGFR (PDB ID: 2ITV) and D770-N771insNPG-EGFR (PDB ID: 4LRM), the monomers of which form active, asymmetric assemblies in their crystal structures. Given the activating nature of all the studied mutants, and in the absence of crystallographic data for symmetric dimers for any of them, we simulated the mutants only in the asymmetric dimers. The A763-Y764insFQEA and DELREA were modeled to the asymmetric dimer using the WT-EGFR as a template using the software MODELLER (Sali and Blundell, 1993).

Unbiased MD simulations details -monomeric EGFR
Prior to any simulation, the protonation state of each residue was calculated using PROPKA3.0 server (Li et al., 2005) and correspond to pH 7. Then, each system was enclosed in a dodecahedron box with periodic boundary conditions and solvated with TIP4PD water molecules, while Na + and Clions were added to reach neutrality, and the final NaCl concentration of 150 mM. The MD simulations were performed using the GROMACS 2018.3 simulation package (Abraham et al., 2015) patched with the PLUMED 2.4.1 plug-in (Tribello et al., 2014). For all simulations, we used a99SBdisp protein force field with its modified TIP4PD water model (Robustelli et al., 2018). The energy of all systems was minimized using the steepest descent integrator and the solvated systems were equilibrated afterwards in the canonical (NVT) ensemble for 5 ns, using a Berendsen barostat (Berendsen et al., 1984) at 1 bar with initial velocities sampled from the Boltzmann distribution at 300 K. The temperature was kept constant at 300 K by a velocity-rescale thermostat (Bussi et al., 2007) and a time step of 2 fs. The long-range electrostatics were calculated by the particle mesh Ewald algorithm, with Fourier spacing of 0.16 nm, combined with a switching function for the direct space between 0.8 and 1.0 nm for better energy conservation. The systems were equilibrated for additional 10 ns in the isothermal-isobaric (NPT) ensemble prior to the production run applying position constraints to the protein (with a restraint spring constant of 1 kJ mol À1 nm À2 ). An unconstrained 1000 ns production run in the NPT ensemble, coupled with a velocity-rescale thermostat (Bussi et al., 2007) at 300 K and a Parinello-Rhaman barostat (Parrinello and Rahman, 1981) at 1 bar, was carried out for each of the systems starting independently from the active and Src-like inactive conformation ( Table 1).

Unbiased MD simulations details -homodimeric EGFR
For the unbiased simulations of the asymmetric dimers, a similar protocol to the monomeric EGFR was followed for consistency. In particular, the protonation states of the residues at pH 7 were determined by PROPKA3.0 (Li et al., 2005), which left all the residues in their usual charge states. Then, each system was enclosed in a truncated dodecahedron box with periodic boundary conditions and solvated with TIP4PD water molecules, while Na + and Clions were added to reach neutrality, and the final NaCl concentration of 0.15 M (the total number of atoms was~230,000 per system). All molecular dynamics simulations were performed using a99SB-disp (Robustelli et al., 2018) protein force field in the NPT ensemble, keeping the temperature at 300 K with a velocity-rescale thermostat (Bussi et al., 2007), and the pressure at 1 bar with a Parinello-Rahman barostat (Parrinello and Rahman, 1981). The long-range electrostatics were calculated by the particle mesh Ewald algorithm, with Fourier spacing of 0.16 nm, combined with a switching function for the direct space between 0.8 and 1.0 nm for better energy conservation. The systems were equilibrated for additional 10 ns in the NPT ensemble prior to the production run applying position constraints to the protein (with a restraint spring constant of 1 kJ mol À1 nm À2 ). An unconstrained 4000 ns production run in the NPT ensemble was carried out for each of the systems ( Table 2).

Metadynamics simulations details -monomeric EGFR
A preliminary PTmetaD simulation in the well-tempered ensemble was performed for each of system using eleven replicas at increasing temperatures (300, 305, 310, 318, 326, 335, 344, 354, 363, 375, and 382 K), in which only the potential energy was biased so that to increase its fluctuation and reach a target exchange rate between neighboring replicas of 15%. To ensure that all meaningful conformations would be sampled over the course of the PTmetaD simulations, three out of the 11n replicas were started with the WT or mutant EGFR in the Src-like inactive conformation, whereas in the rest the kinase was in the active conformation. Once the desired exchange rate between the replicas was reached, the simulations were interrupted and the deposited energy was saved and used as an initial bias during the production PTmetaD simulations.
After this preliminary run, for the production stage, we run PTmetaD in the well-tempered ensemble for each system using the eleven replicas (300 to 382 K), where we kept biasing the potential energy by depositing gaussians of height, W 0 ¼ 1 kJ mol À1 , and width, s PotEnergy ¼ 100 kJ mol À1 to maintain a good exchange rate despite the big temperature separation. An exchange between adjacent replicas was attempted every 2 ps. The deposited Gaussians from the preliminary well-tempered run were loaded in the PTmetaD simulation to maintain the exchange rate between the replicas to 15%. A Gaussian was deposited in the collective variable space every 1 ps with W ¼ W 0 e À Vðs;tÞ ðf À1ÞT , where W 0 ¼ 4 kJ mol À1 is the initial height, T is the temperature of each replica, f ¼ 15 is the bias factor, and V(s, t) is the bias potential at time t and CV value s.
The following three collective variables were biased simultaneously during the production stage: the difference between atomic distances CV1 ¼ dðN K745 Z ; C E762 d Þ À dðN K745 Z ; C D855 g Þ; the distance in contact map space with respect to the inactive conformation CV2ðRÞ ¼ 1 N P g2G ðD g ðRÞ À D g ðR inactive ÞÞ 2 ; the distance in contact map space with respect to the active conformation CV3ðRÞ ¼ 1 N P g2G ðD g ðRÞ À D g ðR active ÞÞ 2 . D g ðRÞ is a sigmoidal function that measures the degree of formation of the contact g in the structure R and is defined as D g ðRÞ ¼ w g ðrg =r 0 g Þ n ðrg =r 0 g Þ m , where r g is the contact distance in the structure R, r 0 g is the contact distance in either the reference inactive or active conformation depending if g is a contact specific to the former or latter conformation, w g is the weight of the contact and is set to one for regular contacts and three for salt bridges, N is a normalization constant, n ¼ 6, and m ¼ 10. Given the previous success of these three CVs to probe the effect of oncogenic mutations on the free energy landscape of EGFR's kinase domain, here, we used the same set of contacts for the contact map distance of CV2 and CV3 as the ones used by Sutto et al. for the WT (Sutto and Gervasio, 2013). The set of contacts includes only those pairs that are able to discriminate between the two structures, that is, pairs of atoms that describe contacts, which are formed in the active conformation but are broken in the Src-like inactive and vice-versa.
To further assess whether these contact maps and their adaptation from Sutto et al. were indeed able to describe the transition of interest, prior to the PTmetaD simulations, we ran a set of steered-MD (SMD) simulations, where we steered the active to Src-like inactive transition along the contact map space. In these exploratory SMD simulations, the decrease of the root-mean-square deviation (RMSD) with respect to the Src-like inactive conformation when we start from the active conformation and steer toward the Src-like conformation (Figure 7) as well as the satisfactory superposition of the final SMD conformation to the Src-like inactive conformation confirmed the adequacy of these contact maps, which were then used in the PTmetaD.
The PTmetaD simulations were run until the free energy in the bidimensional projections and in the monodimensional projections did not change more than 2.5 kcal mol À1 in the last 100 ns and diffusive behavior was reached for all CVs (Figure 7-figure supplement 1). This convergence criterion led to 1000 ns/replica long simulations for WT, A763-Y764insFQEA, and DLREA, and to 900 ns/replica long simulations for L858R and D770-N771insNPG.  The input files for the PTmetaD simulations as well as the necessary files for the definition of the contacts that were used as CVs will be deposited to the public repository of the PLUMED consortium, PLUMED-NEST (PLUMED Consortium, 2019).
Replica exchange methods are generally based on the idea of sampling one 'cold' replica, from which the unbiased statistics can be extracted, plus a number of 'hot' replicas, whose only purpose is that of accelerating sampling. The 'hottest' replica should explore the space fast enough to overcome barriers for the process under investigation, whereas the intermediate replicas are necessarily introduced to bring the system smoothly from the 'hottest' ensemble to the 'coldest' ensemble. In the parallel tempering scheme that we used, the temperature is exploited to enhance the phasespace exploration within a replica exchange scheme, which ensures the correct sampling of the canonical ensemble for all the replicas. The free energy surface of each mutant was reconstructed by integrating the deposited bias during the simulation of the biologically relevant and lowest-temperature replica (T = 300 K), as required by the metadynamics algorithm.
To obtain a representative structure of each free energy basin, a clustering of the ensemble of conformations falling within each basin has been performed. The gromos algorithm of the g_cluster program from the GROMACS package has been used with the RMSD on the C atoms of the A-loop and the aC-helix residues (Pan et al., 2014) on the ensemble of the 300 K replica as the clustering metric with a cutoff of 0.2 nm. In the most populated cluster of each basin, the central structure, that is, the structure with the smallest distance to all of the other members of the cluster, has been picked as representative of the basin.
To map the position of the equivalent mutant structures on the contact map space of the WT, the distance between the atoms of each contact in the energy minimized structures of each mutant was compared to WT-reference structure and the degree of formation of each contact in the structure R was calculated using the sigmoid function that is described above. Deviation of the position of the atoms that define each contact in each mutant from the position of the equivalent atoms in the WT structure that was used as the reference, results in different positions of the crosses in the contact map space of the WT. On top of that, for the FES plots, the calculated contact-map distances were normalized and, therefore, mutants that explored larger portions of the contact-map space comparatively push the position of crosses further toward zero.
To trace the minimum energy path that connects the active to Src-like inactive conformation based on the calculated FES we used the MEPSAnd tool (Marcos-Alcalde et al., 2020) in a similar way to that described previously to characterize complex paths associated with biomolecular processes, including conformational transitions, and ligand (un)binding (Berteotti et al., 2009;Bernetti et al., 2019).