Analysis of the Contrasting Pathogenicities Induced by the D222G Mutation in 1918 and 2009 Pandemic Influenza A Viruses

In 2009, the D222G mutation in the hemagglutinin (HA) glycoprotein of pandemic H1N1 influenza A virus was found to correlate with fatal and severe human infections. Previous static structural analysis suggested that, unlike the H1N1 viruses prevalent in 1918, the mutation did not compromise binding to human α2,6-linked glycan receptors, allowing it to transmit efficiently. Here we investigate the interconversion mechanism between two predicted binding modes in both 2009 and 1918 HAs, introducing a highly parallel intermediate network search scheme to construct kinetically relevant pathways efficiently. Accumulated mutations at positions 183 and 224 that alter the size of the binding pocket are identified with the fitness of the 2009 pandemic virus carrying the D222G mutation. This result suggests that the pandemic H1N1 viruses could gain binding affinity to the α2,3-linked glycan receptors in the lungs, usually associated with highly pathogenic avian influenza, without compromising viability.


INTRODUCTION
2009 saw the first influenza pandemic of the 21st century, when an H1N1 influenza A virus spread to over 200 countries, resulting in more than 18 000 deaths globally. 1−3 The wide spread of the virus resulted from its efficient human-to-human transmission. 4−6 Transmission requires binding of the virus surface envelope protein, hemagglutinin (HA), to the host surface glycan receptors containing terminal sialic acid (SA). 7−9 The H1N1 human flu virus preferentially binds to receptors with a α2,6 linkage between SA and galactose (Gal). These α2,6-linked sialic acids (α2,6-SA) are predominantly found in the epithelial cell surface of the human upper respiratory tract. 10−14 In contrast, the HAs of avian-adapted virus preferentially bind to α2,3-SA, which are found in the digestive and respiratory tracts of birds, and on human lung alveolar cells. 9,10,12,15 Although the overall case fatality rate of 2009 H1N1 was not much different from seasonal flu, a specific mutation in HA at position 222 (H1 numbering) from aspartic acid to glycine (D222G) was more likely to be associated with severe or fatal cases. 16−20 The presence of this D222G mutation in both the HA from 1918 H1N1 and 2009 H1N1 viruses was reported to enhance HA binding to α2,3-SAs, potentially causing the severe cases observed early in the 2009 pandemic. 21−26 The results also showed that the D222G mutation significantly weakened binding to α2,6-SA for HA from 1918 H1N1, but only modestly affected the 2009 H1N1 virus. 21,22,24,26 To explain this discrepancy, we previously performed computer simulations for the α2,6-SA analog 6′SLN binding with HAs from the 1918 H1N1 (A/South Carolina/1/18, SC18) and 2009 H1N1 (A/Netherlands/602/2009, NL602) viruses with and without the D222G mutation. 21 For wild type SC18 (SC18-WT), two binding modes for α2,6-SA were identified. As shown in Figure 1b, these modes differ in the interactions formed between α2,6-SA and HA, which alter the position of the galactose sugar within the receptor binding pocket. Structures with shorter distances between the ligand and the residue at the 224 position are classified as mode 1, and those with longer distances are identified as mode 2. By definition, the ligand in mode 1 is buried deep inside the pocket, while in mode 2 it is situated higher up. NL602-WT was also observed to adopt two similar binding modes alongside many intermediate structures. For SC18 carrying the D222G mutation (SC18−D222G), almost no binding mode 2 was observed, while for NL602−D222G, although a to the viral membrane is highlighted for one monomer. (b) Schematic illustration of the two binding modes; the 6′SLN glycan in mode 1 and mode 2 is colored blue and green, respectively. The distance between the α-C of 224A and the O3 atom of Gal (denoted as d 224 ) is used to distinguish the two binding modes in the present work. decrease in binding mode 2 was seen, it was not entirely absent. This contrasting stability is a result of the additional interaction between α2,6-SA and the side chains of 130K, 142K, and 224E, which can stabilize α2,6-SA binding to HA. We suggested that these interactions allow NL602−D222G to maintain binding to α2,6-SA, explaining the fitness of this mutant virus strain.
Other efforts have also been made to understand how the D222G mutation changes the binding preference, in both experiments and computer simulations. In 2013, Zhang et al. 22 reported the crystal structures of HA from both H1N1 A/California/04/2009 and SC18 flu virus bound to α2,6-SA and α2,3-SA analogs. They suggested that the D222G mutation results in a missing salt bridge between 222D and 219K, loosening the 220-loop and enabling the key residue 223Q to interact with the avian receptor, switching the binding preference. Using NMR and molecular dynamics (MD) simulation, Elli et al. 14,27 also reported stable structures of α2,6-SA binding to HA from 1918 flu before and after D222G mutation. The results suggested that the interaction between α2,6-SA and the 220-loop in wild type HA no longer exists after the mutation, which allows the galactose sugar to move away from the protein surface. Priyadarzini et al. 28 also performed MD simulations investigating the binding of Sialyldisaccharides to various HAs. Using Neu5Acα(2−6)Gal as the analog of α2,6-SA and HA from the A/swine/Iowa/30 virus, they also observed two binding modes. Due to the difference of the analog and HA model, these two binding modes are not exactly the same as those reported in 2010, 21 most likely due to differences in the HA model and glycan analogue employed.
The previous work described above mainly focused on the differences between static structures; the kinetics of the binding process were not addressed. Having identified alternative binding modes, one can investigate not only their kinetic stability, both in terms of on/off rates and the free energy of binding, but also how readily they interconvert. This information, especially the transition mechanism, the barriers, and the key intermediates, could be helpful in identifying additional therapeutic targets and provide a more complete understanding of the effect of a particular mutation on viral fitness. In the present work, we investigate interconversion pathways between two binding modes to elucidate the contrasting affinity of the two different H1N1 virus mutants for α2,6-SA. A new intermediate network scheme is introduced to accelerate the search for kinetically relevant paths in the potential energy landscape. By comparing the distribution of energy barriers and energy changes of key rearrangements, we find that the D222G mutation exchanges the energetic preference of the two binding modes in both SC18 and NL602 HAs and that the effect is more dramatic in SC18. By analyzing the shape of the binding pocket, we have identified differences between the two binding modes for the 1918 and 2009 strains. Our results indicate that changes at positions 183 and 224 generate a smaller pocket in NL602 than in SC18, permitting greater flexibility in the α2,6-SA transformation between the two modes in NL602.

COMPUTATIONAL DETAILS
2.1. Simulation Setup. The AMBER ff99SB 29−31 molecular mechanics force field in conjunction with the GB OBC Generalized Born implicit solvent method (igb = 2) 32 was used to represent the protein. The potential was modified to remove a discontinuity at the nonbonded interaction cutoff, which was set to 16 Å. The trisaccharide, NeuAcα2,6-Galβ1− 4GlcNAcβ1 (6′SLN), with a methyl cap attached to the O1 atom of the terminal sugar, was used to represent α2,6-SA in our simulations, since it has been seen experimentally to bind to a range of H1N1 flu viruses. 33,34 Parameters for the three sugars that comprise this analog (NeuAc, Gal, and GlcNAc) were obtained from the GLYCAM06g 35 force field, as in our previous study, where the binding modes were first identified. 21 2.2. Structure Generation. The initial structure of the HA in NL602-WT was based upon the crystal structure in the SC18 H1N1. 21 The order parameter used as the criterion to choose initial structures was the separation of the center of mass of the backbone atoms of residue 224 and the center of mass of the O2 and O3 atoms of the α2,6-SA galactose sugar. This order parameter was chosen because it was found to demarcate two distinct binding modes, which had been observed in MD snapshots from earlier work. 21 Once these snapshots had been sorted into one of these binding modes (or intermediate structures), we selected representatives based on the following criteria: (i) structures temporally separated from an observed transition between modes and (ii) structures clearly assigned as mode 1 or mode 2. These choices ensure that the structures are indeed representative, prior to minimization.
For SC18-WT and NL602-WT, the selected structures were first locally minimized. We then manually interpolated between the two end points, removing those differences in binding mode 2 far away from the pocket, which do not affect interconversion behavior. For SC18−D222G, binding mode 2 was entirely absent from the MD trajectory reported previously. 21 Hence, in the current work, a structure for mode 2 was generated using the ligand from SC18-WT, superimposing it into the pocket of SC18−D222G, incorporating residue moves around the pocket corresponding to binding mode 2, and finally performing a local minimization. For NL602−D222G, although some of the structures seen in previous work exhibit features of binding mode 2, 21 the total number is very small. In the present work, binding mode 2 was also generated using the ligand from NL602-WT mode 2, superimposing it into the binding pocket, then minimizing. Figure 2 shows the structure of all the end points after minimization. A detailed description can be found on our website 36 and in the Supporting Information.
2.3. Identifying the Fastest Path. The doubly nudged elastic band (DNEB) method, 37−39 as implemented in the OPTIM program, 40 was used to generate likely candidate structures for transition states along the pathway between a pair of end point structures. These stationary points were then refined accurately using hybrid eigenvector-following. 41,42 The two minima that each transition state connects were identified by calculating approximate steepest-descent paths. All the minima and transition states characterized during the connection of each pair of end points were collected in the initial database generated by the intermediate network scheme, as described in the following section. The databases were then expanded using the discrete path sampling (DPS) framework, 43−45 with several SHORTCUT, SHORTCUT BAR-RIER, and UNTRAP runs, 46,47 as implemented in the PATHSAMPLE program. 48 Dijkstra's shortest-path algorithm 49 with appropriate edge weights 43,45 was used to determine the discrete path that makes the largest contribution to the rate constant between the end points of interest (the fastest path).
2.4. Intermediate Network Scheme. Efficiently finding a connected pathway between two significantly different end points can be very difficult if attempted in a single step due to

Journal of Chemical Theory and Computation
Article limitations in how intelligently we can interpolate between them. We found that the main differences between the two end points are mainly rotations of 6′SLN and the side chains of residues on the edge of the binding pocket. The main body of the complex, including the position of the center of mass of 6′SLN in the pocket, is almost the same. This observation inspired a new intermediate network interpolation technique to optimize refinement of the kinetic transition network and ensure that we have included as many paths as possible in the initial database. This scheme is illustrated in Figure 3 and consists of the following steps: (1) For two end points A and B, identify all the differences of local conformations and label these differences as a i and b i , (i = 1, n). The end point A can then be denoted as the set A = {a 1 , a 2 , ...,a n }. For example, in Figure 3, the molecular fragment in question is represented by the blue square, which should be similar for A and B. Four differences are identified and labeled with small circles, which are white in A and red in B.
(2) New configurations are generated for end point A by transplanting the different local B conformations into the original A minimum. We adopt the notation I ϵ to denote the minimum obtained by relaxing the structure after the local conformation replacement, where ϵ is a binary string, identifying the origin of each local configuration. Those originally from A and B are denoted as "0" and "1", respectively. The leftmost digit represents a 1 or b 1 . For example, in Figure 3, corresponds to the minimum obtained by relaxing the structure with the b 2 and b 3 conformations replacing a 2 and a 3 . Hence I 0···0 = A. We note that I 1···1 can differ in detail from B because the local relaxations after transplanting conformations can produce minor changes in nearby side chains. In principle the total number of I a is 2 n , including A. However, some minima obtained after relaxation may be the same, because certain substitutions may have to follow a particular sequence to be favorable.
(3) Double-ended connection attempts are then run for all the possible I ε and I γ corresponding to a single local conformation change between ε and γ. Here we require the pathway search algorithm to locate multiple transition states, ensuring that any given intermediate pairs will eventually be connected, as implemented in the "missing connection" approach, 50 which is coded in OPTIM. 40 In Figure 3, we have shown several possible connections. In principle, the total number of connections covered by this network could be n × 2 n−1 + 1, where the extra +1 corresponds to the case when I 1···1 ≠ B. The number will be less than this upper bound if some of the minima generated in step 2 are the same. We consider this procedure to be converged when pathways have been found between all of the states I. One advantage of the scheme is that all the minimizations in step 2 and the connection attempts in step 3 can be run in parallel independently. The corresponding pathways can have multiple transition states with additional local minima stabilized by nonlocal interactions. However, they are precisely the paths that are likely to have the lowest barriers and make the largest contribution to the overall rate constant. For the ideal case that I 1···1 = B, and any pair of intervening minima have only one possible connection, corresponding to a single transition state, then in principle, this network will cover all the possible paths that connect the two end points. In practice, we may have to group more than one residue into a single fragment so that one single connection in step 3 still contains several elementary steps and alternative possibilities. Hence, it is still worthwhile to further refine the initial database that includes all the minima and transition states from the intermediate network connection searches.

RESULTS AND DISCUSSION
3.1. Fastest Paths for the Four Strains. The glycan binding pocket is located on the top of each monomer of the HA trimer. As shown in Figure 1b, it consists mainly of three secondary structure features: the 190-helix (residues 187 to 195), 220-loop (residues 218 to 225), and 130-loop (residues 132 to 135). 2,22,51 The aspartic acid residues located at the 222 and 187 positions contribute most to the binding of α2,6-SA. By definition, the fastest path makes the largest contribution to the rate constant between the end points of interest. Analyzing this path helps us to understand how displacements of individual residues contribute to the overall interconversion mechanism between the two binding modes. The fastest paths in each of the four strains all involve loss of the interaction between NeuAc and 187D and formation of a new hydrogenbond between Gal and the 220-loop. For the fastest path in SC18-WT, we can distinguish four sets of movements. NeuAc first loses interactions with 187D by rotating the 7-hydroxyl and forms new intramolecular hydrogen-bonds, after which the G 4 and G 3 movements occur in succession. The 3-hydroxyl of Gal points toward the 2-hydroxyl of Gal in the binding mode 1, which is unique to the SC18-WT strain. These two steps together involve a highest barrier of 7.8 kcal/mol and make the largest contribution to the distance change between the α-C of 224A and the O3 atom of Gal (denoted as d 224 ), which distinguishes the two binding modes. The final change is the R 187 motion, where 187D loses most of its interactions with the ligand, leaving only one hydrogen-bond with GlcNAc. Further details of the fastest path can be found in the Supporting Information.
In SC18−D222G, the position of the ligand is first changed by the rotation of the 8-hydroxyl of NeuAc, which breaks its hydrogen-bond with 187D and forms new hydrogen-bonds within NeuAc itself. This step has a high barrier of 14.1 kcal/mol and is endothermic by 11.7 kcal/mol. Next are the G 4 and G 3 movements, which are similar to those in SC18-WT, except that the 3-hydroxyl of Gal interacts with the backbone of 222G in binding mode 1. The G 3 movement still results in the largest change in d 224 but the corresponding energy changes are much smaller than those in SC18-WT. The final steps again correspond to motion of R 187 .
For NL602-WT, the first steps involve breaking the hydrogen-bond between 224E and the 2-hydroxyl of Gal, followed by conformational changes of the 224E side chain. This step does not exist in SC18 since the 224A side chain in SC18 is too short to interact with the ligand. The 9-hydroxyl of Gal then rotates, forming a hydrogen-bond with 224E, which shifts the whole 220-loop toward the ligand. The G 4 , G 3 , and R 187 movements that follow have two interesting differences from previous pathways: (1) For both SC18-WT and SC18− D222G strains, we observed translational motion of the ligand toward the 220-loop when the G 4 and G 3 movements happen, which is not seen in NL602-WT. (2) After the R 187 motion, the ligand in SC18-WT and SC18−D222G moves away from the base of the pocket, losing interactions with 187D, while in NL602-WT, most of the hydrogen-bonds remain after the R 187 displacement. These differences are discussed further below.
For the fastest path in NL602−D222G, the R 187 displacement comes first, together with hydrogen-bond breaking between 224E and the 2-hydroxyl of Gal. As discussed for NL602-WT, no interactions between 187D and the ligand are lost. Then, following the G 4 and G 3 movements, d 224 increases from 5.2 to 6.4 Å. The final two steps correspond to motion of residues 180 H and 191L toward the base of the pocket.

Kinetics of the Key Elementary
Step. By comparing the fastest path that connects the two binding modes in each case, we find that the shift of the 3-hydroxyl of Gal, either binding to the side chain of 222D of wild type HAs, or moving to where it would have bound to the side chain of 222D in mutant HAs with 222G (the G 3 movement), is the key elementary step differentiating the two binding modes. This step is directly affected by the D222G mutation.
Although the G 3 displacement is common to all four cases, the fastest transition paths exhibit systematic differences. In SC18-WT, the 3-hydroxyl does not bind to the backbone first before binding to the side chain of 222D. In NL602-WT this change, and the binding of the 4-hydroxyl of Gal to the backbone of 222D, occur simultaneously. To compare further we collect all the transition states corresponding to steps where the 3-hydroxyl of Gal either binds to the side chain of 222D in wild type HAs, or moves to the place where it would have bound to the side chain of 222D in mutant HAs with 222G. The barrier and energy change distributions for the four cases are illustrated in the contour plots shown in Figure 4. In SC18-WT, 127 out of 2742 transition states in the database correspond to this step. The peak value of the distribution is around (0.7, −1.7), corresponding to an energy barrier of 0.7 kcal/mol and an energy difference of −1.7 kcal/mol. The pathways that contribute to this peak all involve breaking the hydrogen-bond between the 3-hydroxyl of Gal and the backbone of 222D. The transition states that contribute to the shoulder peak around (0.5, −2.5) are broadly similar, with insignificant differences in nearby side chains, for example, the position of the 7-hydroxyl of NeuAc. The step in which the 3hydroxyl of Gal breaks hydrogen-bonds with non-222D residues mainly contributes to the small peak at (3.9, −1.2), as we observed in the fastest path.
In SC18−D222G, 126 out of 3475 transition states in the database correspond to the key elementary step. The main peak shifts to (0.6, 0.5), which still mainly correlates with interactions of the 3-hydroxyl of Gal and the backbone of 222G in the initial state. The energy barrier remains almost the same as in the wild type HA, while the overall energy difference increases by at least 2.2 kcal/mol, as a result of the D222G mutation. The small peak located at around (2.8, 2.3) corresponds to cases where the 3-hydroxyl of Gal is not bound to the backbone of 222G in the initial state. The structure of the corresponding initial state is very similar to the one reported by Zhang et al., 22 where the 3-hydroxyl binds to 219K and the 4-hydroxyl of Gal binds to the backbone of 222G.
In NL602-WT, 332 out of 3719 transition states in the database correspond to the 3-hydroxyl motion. The main peak in Figure 4c is around (0.1, −3.4) and represents paths where the 3-hydroxyl breaks the hydrogen-bond to the backbone of 222D, while simultaneously binding to the side chain of 222D. Compared to SC18-WT, this step is significantly faster and more exothermic in NL602-WT. The same mechanism also contributes to the broad peak ranging from (2.7, −5.5) to (3.6, −4.9), and in these paths, the 4-hydroxyl of Gal is already bound to the backbone of 222D in the initial state. The 4hydroxyl subsequently binds to the backbone of 222D, as in the fastest path. Another small peak located at around (0.1, −1.0) corresponds to paths where the 3-hydroxyl of Gal does not interact with the backbone of 222D in the initial state.
In NL602−D222G, 128 out of 3524 transition states in the database correspond to 3-hydroxyl motion. Only one peak located at (0.3, 0.1) can be seen in Figure 4d, corresponding to paths with the 3-hydroxyl of Gal bound to the backbone of 222G in the initial state.
For wild type HAs, we see that binding the 3-hydroxyl of Gal to the side chain is generally an energetically favorable process. When the hydroxyl shifts from interacting with the backbone to the side chain, the barrier is always lower than 1 kcal/mol. As a result, mode 2 corresponds to an energy minimum in this transformation, increasing the equilibrium population and making it accessible during MD simulations, as observed in previous work. 21 For mutant HAs, mode 2 is generally energetically unfavorable compared to mode 1. In particular, for SC18− D222G mode 2 is so unfavorable that the reverse process (mode 2 to mode 1) has a barrier of only 0.1 kcal/mol. According to the energy profile, other processes, such as the rotation of the 187D residue, are also more endothermic in SC18−D222G than in NL602−D222G. As a result, the equilibrium occupation probability of mode 2 in SC18− D222G is rather low, which is consistent with the previous observation that mode 2 is rarely observed during MD simulations. 21 For NL602−D222G, the paths are similar, but the energy difference is lower. Hence, binding mode 2 can still be observed in the NL602−D222G mutant HA, albeit with decreased probability. A more detailed analysis of this key elementary step will be addressed in future work 3.3. Shape of the Binding Pocket. One reason why the binding of the 3-hydroxyl of Gal to the side chain of 222D is more exothermic and has a lower barrier in NL602 may be the differences in the shape of the binding pocket. When the ligand binds to 222D/G or unbinds from 187D in SC18, we always observe a significant shift of the Gal and GlcNAc sugars toward the 222 position. However, in NL602, there is no such translation. We examined the shape of the pocket for the four strains using the POVME program 52,53 and illustrate the differences observed in Figure 5. Structures were aligned based on the 190-helix, 130-loop, and 220-loop using VMD. 54,55 Two major differences between SC18-WT and NL602-WT can be observed in Figure 5a: (1) The pocket in SC18-WT is larger than that in NL602-WT on the side of 220-loop and 190-helix, where the majority of the interactions between ligand and pocket are. (2) The pocket in NL602-WT is larger on the side of the 130-loop and also between the 220-loop and 130-loop. Comparing D222G mutants in Figure 5b, the blue region is larger than that in Figure 5a. However, the extra volume is located mainly at the top of the pocket, which may arise due to measurement error, since the pockets are open at the top.
To analyze the pocket shape differences further, in Figure 6 we show a triangular region defined by 187D, 219K, and 222D/G on the edge of the binding pocket.
The shape of this region provides a simple representation of the 220-loop side of the binding pocket, where the majority of the rearrangements between the two binding modes are localized. We find that the 219K residue in SC18 is about 1−2 Å further away from 187D but slightly closer to 222D/G compared to NL602. As a result, 219K is nearer to the center of the pocket in NL602.

Journal of Chemical Theory and Computation
Article Another difference is the position of residue 224. In SC18, 224A interacts with the 4-hydroxyl of Gal in binding mode 1 and 224E in NL602 interacts with the 3-and 4-hydroxyl of Gal. Hence, 224A/E mainly determines how far to the left (from the viewpoint of Figure 6) and how deep α2,6-SA binds within the pocket. In SC18, the position of 224A is close to the center of the triangle. The distance between 224A and the line connecting 187D and 222D/G is 2.8 Å in the wild type HA and 2.5 Å in the 222G mutant HA. In NL602, the position of 224E is close to the line connecting 187D and 222D/G. The distance between 224E and the line connecting 187D and 222D/G is 1.1 Å in the wild type HA and 1.4 Å in the mutant HA, and the distance between 224A and 187D in SC18 is also longer than that in NL602. Since 222D protrudes slightly from the edge of the pocket, the position of 224A in SC18 allows the ligand to bind deeper and closer to the left side. The variations in the distance between the 220-loop and 190-helix confirm the shape differences observed from the POVME measurement: the pockets in SC18-WT and SC18−D222G are larger than for NL602-WT and NL602−D222G on the side of the 220-loop.
We also used POVME to identify changes in pocket shape during the interconversion. For SC18-WT, the first nontrivial shape change of the pocket results from the rotation of the 4hydroxyl of Gal (Figure 7a), which shrinks the volume between the 220-loop and the 130-loop. This reduction in size is caused by both the conformational change of the 223Q side chain and the movement of the 220-loop toward the 130-loop. Another significant shape change results from the rotation of 187D (Figure 7b), which reduces the size of the region in front of 187D. The observations in the case of SC18−D222G are similar to those for SC18-WT. The first few movements, including the rotation of the 8-hydroxyl of Gal and the movement of the 4-and 3-hydroxyl of Gal, contribute to the shrinking of the pocket between the 220-and 130-loops, as shown in Figure 7c. The side chain rotation of 187D then removes the region in front of 187D (Figure 7d).
The changes in pocket shape for NL602-WT differ greatly from those for SC18-WT. The rotation of the 9-hydroxyl of Gal shrinks the pocket between the 220-loop and 190-helix and enlarges the volume between the 190-helix and 130-loop, as shown in Figure 7e. The rotation of 187D, however, contributes less to the shape change than it does for SC18-WT, which is consistent with the previous observation that the rotation of 187D does not remove many interactions with the ligand. The observations for the NL602−D222G simulations are similar to those for NL602-WT. The rotation of 187D slightly shrinks the region in front of 187D, and enlargement of the pocket next to 224E occurs via movement of the 224E side chain (Figure 7g). The next few steps, including the movement of the 4-and 3-hydroxyl and residues 180 H and 191L, slightly shrink the pocket between the 220-loop and 190-helix, as for NL602-WT.
In summary, the two strains from 1918 South Carolina are more flexible between the 220-loop and 130-loop but more rigid between the 220-loop and 190-helix when compared to the 2009 Netherlands strains. This observation is consistent with the distance changes among key residues ( Figure 6). The maximum distance variation in SC18-WT is only 0.3 Å for 187D and 222D. However, in NL602, the distance from 187D to residues 219K, 222D, and 224A is reduced by roughly 1 Å in binding mode 2. Since the ligand mainly interacts with the 190helix and the 220-loop, the flexibility of this region plays an important role in the two binding modes. When the transformation between the two modes occurs, due to the shape and the size of the pocket in NL602, the 220-loop can adapt its position to reflect the position of the ligand, which allows most interactions with the edge of the pocket to be conserved. However, for SC18, because the pocket is larger, the ligand must move from one side to the other during the transformation to maintain these interactions, and the 220-loop cannot adapt to the ligand position as flexibly as for NL602. As a result, the binding of α2,6-SA to SC18 relies more heavily on the side chain of 222D than in NL602, since the ligand has limited access to the edge of the pocket. The additional interactions between the ligand and the pocket in NL602 that were reported previously can also been explained in the light of this observation. 21 Interestingly, as shown in Figure 8a, the junction structures between the 190-helix and 220-loop, which determine the relative position of the two regimes, are different in the two HAs. In SC18 183P is in front of the 190-helix and does not interact with other residues nearby. The hydrogen-bonds between 215A and 182P, together with 217R and 224A, determine the relative positions of the 190-helix and the 220loop. The residues at the junction between the 190-helix and 220-loop change from 183P and 224A in SC18, to 183S and 224E in NL602. Due to these differences, the shape of the  Figure S1). (b) R 187 movement in SC18-WT (m5−m9 in Figure S1). (c) Rotation of the 8-hydroxyl of Gal and G 4 and G 3 shifts in SC18−D222G (m1− m9 in Figure S3). (d) R 187 movement in SC18−D222G (m9−m11 in Figure S3). (e) Rotation of the 9-hydroxyl of Gal and G 4 shifts in NL602-WT (m4−m6 in Figure S5). (f) R 187 movement in NL602-WT (m7−m9 in Figure S5). (g) R 187 movement in NL602−D222G (m5− m7 in Figure S7). (h) G 4 , G 3 , 180H, and 191L movement in NL602− D222G (m7−m11 in Figure S7).

Journal of Chemical Theory and Computation
Article pocket in NL602 is affected by the interaction between 183S and 213E, as well as the interaction between 183S, 224E and 226R, as shown in Figure 8b.
There have been several experimental results reported since 2010 that the S183P mutation 56−61 and the E224A mutation 62−64 with or without other mutations, are correlated with the pathogenesis and transmission changes observed for H1N1. Since the majority of the shape difference between SC18 and NL602 during the binding mode interconversion occurs between the 190-helix and 220-loop, the accumulated mutations at positions 183 and 224 may explain the differences in pocket shape flexibility during binding mode interconversion observed here.

CONCLUSION
In the present work, we have investigated the transformation mechanism between two binding modes for α2,6-SA in both the wild type and mutant H1N1 hemagglutinin proteins from 1918 and 2009 pandemic flu using the 6′SLN ligand. We introduced an intermediate network scheme to accelerate the search process of pathways connecting the two binding modes. The advantage of this procedure is that searches can be systematically conducted in parallel for all the single changes in identifiable local conformational states. By comparing the paths predicted to dominate the kinetics, we have identified the key elementary step that distinguishes the two binding modes, namely the 3-hydroxyl of Gal either binding to the side chain of 222D in wild type HAs or moving to the place where it would have bound to the side chain of 222D in mutant HAs with 222G. We find that the different residues at the 183 and 224 positions in the two HAs change the connecting pair between the 190-helix and 220-loop, and hence the shape of the pocket. The smaller size of the binding pocket in NL602 allows the ligand to interact with more residues in NL602 than in SC18 and also makes the 220-loop more flexible.
The D222G mutation exchanges the energy preference between the two binding modes in both SC18 and NL602 hemagglutinins. However, the transition between mode 1 and mode 2 raises the energy more in SC18 than in NL602, probably because of the different shape of the binding pocket.
The different shape, the flexibility of the pocket during binding mode interconversion, and the resulting energy profile changes provide insight into the contrasting pathogenicities induced by the D222G mutation in the SC18 and NL602 viruses. These results may help to explain the difference between the impact of the 1918 and 2009 H1N1 strains. A change in preference from α2,6to α2,3-linked sialic acid is associated with a shift in pathogenicity. However, in the 2009 D222G mutant, the 2,6 affinity is not weakened as much as for the corresponding 1918 mutant. Understanding such dis-tinctions at an atomic level of detail should be helpful in predicting the potential emergence of strains that may pose serious threats to human health.

* S Supporting Information
The website for the structures of end points and the example input for PATHSAMPLE. Structural details of each end point. The energy profile of the fastest paths. Detailed descriptions of each path. This material is available free of charge via the Internet at http://pubs.acs.org/.