Of mice and men: Dissecting the interaction between Listeria monocytogenes Internalin A and E-cadherin

We report a study of the interaction between internalin A (inlA) and human or murine E-cadherin (Ecad). inlA is used by Listeria monocytogenes to internalize itself into host cells, but the bacterium is unable to invade murine cells, which has been attributed to the difference in sequence between hEcad and mEcad. Using molecular dynamics simulations, MM/GBSA free energy calculations, hydrogen bond analysis, water characterization and umbrella sampling, we provide a complete atomistic picture of the binding between inlA and Ecad. We dissect key residues in the protein–protein interface and analyze the energetics using MM/GBSA. From this analysis it is clear that the binding of inlA–mEcad is weaker than that of inlA–hEcad, in line with the experimentally observed inability of inlA to bind to mEcad. However, extended MD simulations of 200 ns in length show no destabilization of the inlA–mEcad complex and the estimation of the potential of mean force (PMF) using umbrella sampling corroborates this conclusion. The binding strength computed from the PMFs shows no significant difference between the two protein complexes. Hence, our study suggests that the inability of L. monocytogenes to invade murine cells cannot be explained by processes at the nanosecond to sub-microsecond time scale probed by the simulations performed here.


Introduction
Listeria monocytogenes is a Gram-positive bacterium that causes listeriosis, a food-borne infection with a mortality rate up to 30%. Listeriosis causes meningo-encephalitis, gastroenteritis, and abortion in pregnant women. All of this is due to the ability of the bacterium to cross the immune barriers of the host and to invade nonphagocytic cells. To invade host cells, L. monocytogenes uses two proteins of the internalin family and one of them, internalin A (inlA), is the focus of this study. InlA binds to the E-cadherin (Ecad) receptor on the host-cell surface, causing a cascade of signals that eventually leads to the internalization of the bacterium by the host cell [ -3].
The internalin family comprises 25 proteins. All of these share a common architecture, including a signal peptide at the amino-terminus and several 22 amino acid leucine-rich repeats (LRR). The LRRs are followed downstream by several regions that are less conserved among the family members. InlA is an 800 amino-acid protein with 5 LRRs (see Figure ) in the inter-repeat region that are fundamental for its binding to Ecad, a motif for anchoring itself at the bacterial cell wall, and a sorting peptide at the carboxy-terminus [4,5]. The crystal structure of inlA alone or in complex with the EC domain of Ecad has been solved [6].
Because of the emerging occurrence of L. monocytogenes in the industrialized world, it is important to understand how the bacterium invades the human cell. An important tool to study bacterial infections is to use animal models. The mouse is a popular model because it is eukaryotic but a much simpler species than human [7,8]. However, L. monocytogenes does not invade mouse cells at the same rate as human cells, because the binding between inlA and murine Ecad is too weak for the bacterium to adhere to the cell surface [3,6]. This observation has spurred research on the interface between inlA and Ecad to determine key interactions that are present in the human but not the murine case. One key residue on human Ecad (hEcad) that has been identified is Pro 6, which is replaced by a Glu in murine Ecad (mEcad). In hEcad, the apolar proline binds in a neutral and hydrophobic cavity on inlA at LRR loop 6 (see Figure ) [6]. Therefore, it has been hypothesized that the larger and charged Glu cannot fit in the cavity and also lacks any clear interaction partner, resulting in impaired inlA-mEcad interaction. Other key interactions have been hypothesized and tested with mutant proteins [9,0]. The Y369S and S 92N mutations on inlA have been shown to improve the affinity for hEcad, especially if they are introduced simultaneously, by improving the interfacial interactions. Furthermore, the Q64E mutation on mEcad has been shown to improve the interaction with inlA, but only if the E 6P mutation is also introduced. These two mutations correspond to the conversion of the mEcad sequence to hEcad at the specific sites. All mutations are illustrated in Figure 2.
In this contribution, we will dissect the interaction between inlA and Ecad using computational tools, including investigation of both wild type and various mutant systems. By using a combination of molecular dynamics, free energy calculations, hydrogen-bond analysis, and water site characterization, we will present a detailed description of the interface at an atomistic level. Such techniques have been readily used to study several protein-protein interfaces [ -4]. We not only reveal the energetics of the protein-protein interaction, but we also show that energetic differences alone between the inlA-hEcad and inlA-mEcad complexes are not sufficient to describe the inability of inlA to invade murine cells.


Methods
The complex between L. monocytogenes internalin A (inlA) and either human or murine E-cadherin (hEcad and mEcad, respectively) was simulated. The inlA-hEcad complex is shown in Figure , together with numbering of the β-sheets (LRRs). Both wild-type (WT) and various mutants were simulated, based on several available crystal structures as outlined in Table . If a complex did not exist in the PDB, it was created from an available crystal structure by modifying the side-chain of the amino acid(s) in silico. All protein residues were described with the Amber99SB-ILDN force field [ 5]. The side-chains were set to normal protonation states at pH 7, i.e., all Arg and Lys residues were positively charged, and all Glu and Asp residues were negatively charged. The protonation states of the histidine residues were decided by considering hydrogen-bond networks. Hence, His392 in inlA was protonated on the NE2 atom, and His79 in Ecad was protonated on the ND atom. The complexes were solvated in a rectangular box of pre-equilibrated TIP3P water molecules [ 6], extending at least 9 Å from the solute. In total, ~95,000 atoms were simulated in each system. A few simulations as described in the text used a larger box, extending at least 5 Å from the solute. In those cases, a total of ~35,000 atoms were simulated. All simulations were performed using Gromacs v4.5.5 [ 7]. All bonds involving hydrogen atoms were constrained using the LINCS algorithm [ 8], and the time step of the integration of motions was 2 fs. The non-bonded cut-off was 9 Å, and the non-bonded pair-list was updated every 50 fs. Electrostatic interactions were treated using particle-mesh Ewald summation [ 9], and long-range van der Waals interactions were corrected using a continuum approach [20]. The temperature was kept at 300 K using a velocity re-scaling algorithm with a stochastic term [2 ] and a coupling constant of ps. The pressure was kept constant at atm using a weak-coupling [22], isotropic algorithm with a coupling constant of ps.
Ten independent simulations were generated for each complex by solvating the complex in different boxes of pre-equilibrated solvent and by assigning different initial velocities [23]. Each of the ten independent simulations was first minimized using 500 steps of minimization with harmonic restraints of 200 kJ/mol on protein non-hydrogen atoms, followed by a 00 ps simulation in the NPT ensemble using the same restraints. Thereafter, the systems were equilibrated in the same ensemble but without restraints for 000 ps if the complex existed as a crystal structure, or 4000 ps if the complex was created by modifying a crystal structure. The equilibration was followed by a 1000 ps production run in the NPT ensemble, where snapshots were extracted every 5 ps. Hence, from each simulation, 200 snapshots were extracted for analysis.
The free energy of binding between inlA and Ecad was estimated using MM/GBSA (molecular mechanics with generalized Born surface area) [24], with the MMPBSA.py script in AmberTools 2 [25]. The free energy is expressed as the difference in free energy between the complex and the two binding partners, i.e., ΔG = G(inlA-Ecad) - G(inlA) - G(Ecad), and each of these free energies is calculated as [24] G = < Eint + Eele + Evdw + ΔGpol + ΔGnp - TS >, where the first three terms are the molecular mechanics internal, electrostatic and van der Waals energies, respectively; ΔGpol and ΔGnp are the polar and non-polar solvation free energies, and T and S are the absolute temperature and an entropy estimate. The brackets indicate an average over an ensemble of snapshots from the MD simulations. Here, we make a common approximation and evaluate the free energy of free inlA and Ecad from the complex simulation, because of the improved precision [26]. Thereby, the Eint term cancels. Furthermore, because accurate calculation of the entropy term is extremely costly for such a large protein-protein complex, and because we cannot easily decompose the entropy, it will be ignored herein. For relative free energies of similar systems, this has been shown to be a good approximation [27].
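The single-trajectory estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary layout and the term labels (`ele`, `vdw`, `pol`, `np`) are hypothetical, the per-snapshot energies are assumed to have been computed elsewhere, and the entropy term is omitted as in the text.

```python
import numpy as np

def binding_free_energy(complex_terms, inla_terms, ecad_terms):
    """Single-trajectory MM/GBSA estimate without the entropy term.

    Each argument maps a term label ('ele', 'vdw', 'pol', 'np') to an array
    holding one value per snapshot (kJ/mol). The internal energy is left out
    because it cancels when all three species come from the same snapshots.
    """
    labels = ("ele", "vdw", "pol", "np")

    def total(species):
        # Sum the energy terms snapshot by snapshot.
        return sum(np.asarray(species[label], dtype=float) for label in labels)

    per_snapshot = total(complex_terms) - total(inla_terms) - total(ecad_terms)
    # The ensemble average is the reported binding free energy.
    return float(per_snapshot.mean()), per_snapshot
```

Returning the per-snapshot values alongside the mean makes it easy to inspect convergence or to average per simulation afterwards.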
The energy terms were evaluated using the same force field as in the simulations, but without any non-bonded cut-off. The ΔGpol term was evaluated using the generalized Born method of Onufriev, Bashford and Case, model I [28]. The ΔGnp term was evaluated through a linear relation to the solvent accessible surface area (SASA), i.e., γSASA, with γ = 0.03 kJ/mol [29]. The free energy for each system was evaluated using 200 snapshots from 10 independent simulations, i.e., 2,000 snapshots in total. The reported uncertainties are the standard deviation of the mean over the 10 independent simulations. MM/GBSA was also used to perform alanine-scanning mutagenesis (ASM) [ ]. In ASM, the free energy of mutating one amino acid to an alanine is computed. Here, we used the common single-trajectory approach [25], i.e., the free energy of the mutated residue was estimated using the ensemble of snapshots generated with the original residue. We also tested a variant of ASM, which we will denote scaled ASM (sASM) [27,30]. In this approach the internal dielectric constant of the protein used in calculating electrostatic and polar solvation terms is scaled to correct for the fact that we use a single-trajectory approach and thereby ignore the protein reorganization energy. For apolar amino acids, the scaling factor is two, for polar and uncharged amino acids three, and for charged amino acids four.
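As a minimal sketch of how an uncertainty over independent simulations can be obtained, the snippet below averages the snapshots of each simulation first and then takes the standard error of those per-simulation means. The function name and the array layout (one row per simulation) are assumptions made for illustration.

```python
import numpy as np

def mean_and_standard_error(energies):
    """energies: array of shape (n_simulations, n_snapshots), in kJ/mol.

    Returns the overall mean and the standard deviation of the mean,
    computed over the per-simulation averages.
    """
    per_simulation = np.asarray(energies, dtype=float).mean(axis=1)
    mean = per_simulation.mean()
    # Sample standard deviation (ddof=1) divided by sqrt(number of simulations).
    sem = per_simulation.std(ddof=1) / np.sqrt(per_simulation.size)
    return float(mean), float(sem)
```

Treating each simulation as a single independent observation avoids underestimating the uncertainty from correlated snapshots within one trajectory.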
Hydrogen bond analysis was performed on the same 2,000 snapshots per system that were used for the MM/GBSA analysis. We analyzed hydrogen bonds between residues in inlA and residues in Ecad, as well as between interfacial residues and water molecules. Interfacial residues were determined to be residues in inlA that had an atom at most 4 Å from a residue in Ecad, and vice versa. The crystal structure of inlA-hEcad was used to calculate the distances. The threshold for finding hydrogen bonds was a length of 3.5 Å between the heavy atoms and an angle cut-off of 35°.
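A geometric hydrogen-bond test of this kind can be sketched as follows. The convention used here, a donor-acceptor distance at or below the cut-off and a donor-hydrogen-acceptor angle at or above an angular cut-off, is an assumption; the text does not state which angle is measured, so the angular cut-off is a required argument rather than a default. The function name is hypothetical.

```python
import numpy as np

def is_hydrogen_bond(donor, hydrogen, acceptor, angle_cutoff_deg, max_distance=3.5):
    """Return True if the geometry satisfies both criteria.

    donor, hydrogen, acceptor: Cartesian coordinates in Å.
    max_distance: heavy-atom (donor-acceptor) distance cut-off in Å.
    angle_cutoff_deg: minimum donor-hydrogen-acceptor angle in degrees
        (assumed convention).
    """
    d, h, a = (np.asarray(p, dtype=float) for p in (donor, hydrogen, acceptor))
    if np.linalg.norm(a - d) > max_distance:
        return False
    v1, v2 = d - h, a - h
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding slightly outside [-1, 1].
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(angle >= angle_cutoff_deg)
```

The occupancy of a hydrogen bond is then simply the fraction of snapshots for which this test returns True.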
Conserved water sites in the interface were found using a clustering algorithm [3 ]. Each MD snapshot was superposed onto the crystal structure by fitting the backbone heavy atoms of each residue within 8 Å of the interfacial residues. (Interfacial residues are defined as in the hydrogen bond analysis.) Then, oxygen atoms of water molecules within 3 Å of the interfacial residues were saved for clustering. When all snapshots had been processed, the stored water molecules were clustered. The water molecule with the largest number of water molecules within Å was defined to be the center of a conserved water site, and this water molecule and all water molecules within Å were removed from further analysis. This procedure was repeated until the number of water molecules found at a site was lower than what is expected from a bulk water simulation.
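The greedy procedure described above can be sketched as below. This is an illustration, not the implementation used in the study: the function name is hypothetical, the neighbour radius and the bulk-water threshold (`min_count`) must be supplied by the caller, and the all-pairs distance matrix is only practical for a modest number of stored water positions.

```python
import numpy as np

def cluster_water_sites(oxygens, radius, min_count):
    """Greedy clustering of water-oxygen positions pooled over all snapshots.

    oxygens: array of shape (n, 3) with superposed coordinates in Å.
    radius: neighbour distance in Å defining one site.
    min_count: stop once the most populated site holds fewer waters than this
        (stands in for the bulk-water expectation).
    Returns a list of (site_center, n_waters) tuples.
    """
    remaining = np.asarray(oxygens, dtype=float)
    sites = []
    while len(remaining) > 0:
        diff = remaining[:, None, :] - remaining[None, :, :]
        within = np.linalg.norm(diff, axis=-1) <= radius
        counts = within.sum(axis=1)  # each water counts itself
        best = int(np.argmax(counts))
        if counts[best] < min_count:
            break
        sites.append((remaining[best].copy(), int(counts[best])))
        # Remove the center and all its neighbours before the next pass.
        remaining = remaining[~within[best]]
    return sites
```

Dividing the number of waters at a site by the number of snapshots gives an occupancy, provided at most one water per snapshot falls within the radius.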
The interaction energy between each water molecule in the cluster and the rest of the system was monitored. An entropy estimate for each site was calculated from inhomogeneous solvation theory [32,33] by considering the internal translational and rotational entropy. Hence, water-water correlation was ignored. The translational entropy was calculated by assuming a uniform distribution and the rotational entropy was calculated by considering the rotation of Euler angles using an approach outlined recently [34].
A potential of mean force (PMF) between inlA and hEcad or mEcad was calculated using umbrella sampling [35]. The complex was placed in a 95x 35x 30 Å box such that it was roughly 0 Å from the edge of the box in all directions. Next, either hEcad or mEcad was displaced from inlA at specific center-of-mass distances in the y-direction (illustrated in Figure 3). Displacements of 0, , 2, 3 Å, and then in 2 Å intervals, for displacements up to 48 Å were used. At each displacement, the complex was solvated with TIP3P water molecules. In total ~65,000 atoms were simulated. The complex was subsequently simulated at each value of displacement and the center-of-mass distance in the y-direction was enforced with a harmonic potential with a force constant of 000 kJ/mol (this magnitude gives a good overlap of the distance distributions between individual simulations). The simulations were performed as described above for the unconstrained MD simulations. The systems were equilibrated for 2 ns before a 6 ns production run. The PMFs were then estimated by the weighted histogram analysis method [36] as implemented in Gromacs [37].
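Two ingredients of the umbrella-sampling setup lend themselves to a short sketch: the harmonic restraint on the center-of-mass distance and a check that neighbouring windows overlap. The conventional form U = ½k(d − d0)² is assumed, and both function names, the bin count and the histogram-intersection measure are illustrative choices rather than details taken from the study.

```python
import numpy as np

def umbrella_bias(distance, reference, force_constant):
    """Harmonic bias energy, assuming U = 0.5 * k * (d - d0)**2.

    Units must be consistent, e.g. distances in nm and k in kJ/mol/nm^2.
    """
    return 0.5 * force_constant * (distance - reference) ** 2

def histogram_overlap(samples_a, samples_b, bins=50):
    """Histogram intersection of two sampled distance distributions.

    Returns a value between 0 (disjoint) and 1 (identical histograms).
    """
    a = np.asarray(samples_a, dtype=float)
    b = np.asarray(samples_b, dtype=float)
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    pa, _ = np.histogram(a, bins=edges)
    pb, _ = np.histogram(b, bins=edges)
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    return float(np.minimum(pa, pb).sum())
```

An overlap close to zero between adjacent windows suggests that an extra window or a softer force constant is needed before the histograms are combined.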

Results
We have simulated the complex between internalin A (inlA) and either human or murine E-cadherin (hEcad or mEcad) using molecular dynamics. Both wild-type (WT) systems and a range of mutants have been simulated. In what follows, we will present the results of the various analyses performed on the generated trajectories.
The binding free energies between inlA and Ecad in the various complexes were estimated by MM/GBSA and are given in Table 2. It should be noted that conformational entropy was neglected, as mentioned above, an approximation that has often been used when studying protein-protein complexes [ 2,3,30]. An RMSD analysis (see Table S ) indicated that the simulations were sufficiently stable. The affinity of WT inlA-hEcad is -207 kJ/mol, compared to an affinity of only -52 kJ/mol for inlA-mEcad, consistent with experiments. The uncertainty is rather high, indicating that the total free energy is not fully converged. However, as we will see, this has minor importance when considering individual residues. By decomposition, we can obtain an estimate of how each species contributes to the total free energy. It is clear from Table 2 that in general a majority of the binding free energy comes from Ecad, although the ratio is close to 50%.
We then simulated a number of different mutants to probe key interactions in the interface that have been explored experimentally. The inlA mutations Y369S and S 92N have been shown experimentally to improve the binding between inlA and Ecad. However, the simulations with these mutations or the double mutant predict a reduced affinity by up to 27 kJ/mol for the inlA-hEcad complex. Because of the large uncertainty, the differences are not statistically significant. For the S 92N / Y369S double mutant simulation of the inlA-mEcad complex, the affinity is increased by 7 kJ/mol.
Two residues on mEcad have been probed experimentally, namely Glu 6 and Gln64. Mutating Glu 6 to Pro 6 gives a 3 kJ/mol more negative free energy estimate, in accordance with experiments (but not statistically significant). Likewise, mutating Gln64 to Glu64 gives a 25 kJ/mol more negative binding affinity, and the double mutant E 6P/Q64E also gives a significant 24 kJ/mol more negative binding affinity, irrespective of whether the inlA S 92N / Y369S double mutant is introduced or not. It is interesting to note that the largest change to the binding affinity when introducing the Q64E mutation on mEcad comes from inlA, not from mEcad as one would suppose.
To check the importance of these residues, we introduced reverse mutations on the inlA-hEcad complex (i.e., modifying hEcad towards mEcad). Both the P 6E and E64Q mutants give statistically significant reduced binding affinities, by 34 and 33 kJ/mol respectively. The double mutant gives an even more reduced binding affinity (4 kJ/mol), and if the S 92N / Y369S double mutant is also introduced on inlA, the affinity is reduced even further. Introducing the E64Q mutation gives rise to a large change in the contribution from inlA, but only a moderate change in the contribution from Ecad, whereas for the P 6E mutant the opposite is found. This complements perfectly the opposing trends seen for the E 6P and Q64E mutants in mEcad.

The total binding free energy was decomposed on a residue-wise basis to determine which residues are most important for binding. The free energy contributions from all residues are plotted in Figure 4 (and shown in Table S2). For the residues on inlA, the major contribution comes from a few residues throughout the sequence, and there are a few distinct differences between the inlA-hEcad and inlA-mEcad complexes. Interestingly, the charged residues Arg85, Arg2 , Glu255, Glu323, and Glu326 all show a difference in interaction with hEcad vs. mEcad larger than 5 kJ/mol, when comparing the two complexes. Looking instead at the residues on Ecad, it is clear that many of them display large contributions (see Figure 4 and Table S3). However, when summing up the difference between inlA-hEcad and inlA-mEcad, most of the residue contributions cancel. Only residues Lys 4, Gly 5, Pro/Glu 6, and Glu/Gln64 show a difference larger than 5 kJ/mol.
An alternative to this energy decomposition (ED) is alanine-scanning mutagenesis (ASM), in which the effect on the free energy of mutating a particular residue to alanine is estimated. ASM is more expensive than ED, and it is therefore not feasible to perform ASM on all residues in the complex. To determine which residues on which to perform ASM, we used a number of criteria. First, the residue should have an ED contribution of more than 4 kJ/mol in either the inlA-hEcad or inlA-mEcad complex. Second, the residue should be an interfacial residue, i.e., it should be within 4 Å of a residue on the other protein. Third, the residue is identified as a hydrogen-bonding partner (see below). Fourth, and last, the residue has been discussed in the literature to be important for the binding. If at least one of the criteria is fulfilled, ASM and scaled ASM (sASM) were computed, with the exception of glycine residues as well as N- and C-terminal residues (due to a limitation in MMPBSA.py). The residues on inlA and Ecad identified in this way are displayed in Figure 4 and listed in Tables S2 and S3.
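The selection rule above (any one of four criteria, minus glycines and terminal residues) can be written compactly. The `Residue` record, its field names and the use of the absolute ED value are assumptions made for this sketch; the flags would come from the ED, interface, hydrogen-bond and literature analyses.

```python
from dataclasses import dataclass

@dataclass
class Residue:
    name: str             # e.g. "ARG85" (hypothetical naming scheme)
    ed_human: float       # ED contribution in inlA-hEcad, kJ/mol
    ed_murine: float      # ED contribution in inlA-mEcad, kJ/mol
    interfacial: bool     # within 4 Å of the other protein
    hbond_partner: bool   # found in the hydrogen-bond analysis
    in_literature: bool   # discussed as important in earlier work
    terminal: bool = False

def select_for_asm(residues, ed_threshold=4.0):
    """Return names of residues that satisfy at least one criterion."""
    selected = []
    for r in residues:
        # Glycine and chain termini are excluded regardless of the criteria.
        if r.name.startswith("GLY") or r.terminal:
            continue
        large_ed = max(abs(r.ed_human), abs(r.ed_murine)) > ed_threshold
        if large_ed or r.interfacial or r.hbond_partner or r.in_literature:
            selected.append(r.name)
    return selected
```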
For inlA, 29 residues were detected using the above criteria. Of these, 14 are charged residues, nine are uncharged but polar, and six are apolar. It is common to introduce a threshold to determine the most important residues, usually called hot or warm spots [ 3]. There are different definitions of this; here we use a threshold of 8 kJ/mol to determine hot spots, i.e., all residues that have an absolute ED contribution or an ASM or sASM absolute free energy of greater than 8 kJ/mol are considered to be important. Unfortunately, ED, ASM, and sASM do not always agree. This is not surprising, as the methods use different levels of approximation. For the inlA-hEcad complex, ED distinguishes eight hot spots, ASM 7, and sASM 4, and only on eight residues do the methods completely agree. However, if we use the argument that it is sufficient that two methods agree, we can identify twelve hot spots on inlA for the inlA-hEcad complex and 16 for the inlA-mEcad complex. For the inlA-hEcad complex, the hot spots are Arg85, Phe 50, Glu 70, Arg2 , Asn259, Lys30 , Tyr343, Tyr347, Phe348, Arg365, Phe367 and Trp387. Most of these residues are either charged or polar. For the inlA-mEcad complex, the hot spots are Arg85, Phe 50, Arg 68, Glu 70, Gln 90, Arg2 , Asn259, Lys30 , Glu326, Tyr343, Tyr347, Phe348, Arg365, Phe367, Tyr369 and Trp387. Hence, Arg 68, Gln 90, Glu326, and Tyr369 were identified as hot spots on inlA-mEcad but not on inlA-hEcad. In total, the hot spots contribute -8 and -39 kJ/mol, to the inlA-hEcad and inlA-mEcad affinities, respectively.
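The "at least two methods agree" rule can be expressed as a simple vote over the three estimates. The function name, the dictionary layout and the example values in the usage note are hypothetical; only the 8 kJ/mol threshold and the two-method agreement come from the text.

```python
def consensus_hot_spots(contributions, threshold=8.0, min_agreeing=2):
    """Identify hot spots by agreement between methods.

    contributions: {residue: {"ED": x, "ASM": y, "sASM": z}} in kJ/mol.
    A method votes for a residue when the absolute value exceeds the
    threshold; the residue is a hot spot when enough methods vote for it.
    """
    hot = []
    for residue, values in contributions.items():
        votes = sum(abs(v) > threshold for v in values.values())
        if votes >= min_agreeing:
            hot.append(residue)
    return hot
```

For instance, a residue with made-up values ED = -10, ASM = 9 and sASM = 2 kJ/mol would be reported, whereas one exceeding the threshold in ED only would not.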
In Figure 5, we have plotted the residue-by-residue difference (corresponding to ED in Tables S and S2) for the mutant simulations, compared to the WT simulation. A negative value implies that the residue has a more negative binding free energy in WT than in the mutant. In accordance with the small effect of the mutants Y369S and S 92N on the binding energies, very few residues show a large difference for these two mutants. In addition, introducing the E 6P and Q64E mutations on mEcad gives surprisingly few changes throughout either inlA or mEcad. Only three residues on inlA and only two residues on mEcad show a difference larger than 5 kJ/mol. Introducing the double mutant E 6P / Q64E gives a few more residues with a difference larger than 5 kJ/mol.

Volume No: 6, Issue: 7, e201303022 Computational and Structural Biotechnology Journal | www.csbj.org

Likewise, if we introduce the reverse mutation on the inlA-hEcad complex, we only see a few changes for the P 6E and E64Q mutations. The changes are highly localized around the respective mutations.
The inter-protein hydrogen bonds between inlA and Ecad were monitored throughout the simulations. The hydrogen bonds with an average occupancy of more than 0% are listed in Table 3. Starting with interactions between a backbone donor/acceptor and a side chain acceptor/donor, we identified five hydrogen bonds. The backbone oxygen atom of Ile4 on Ecad forms a very clear hydrogen bond with the side chain of Arg365 on inlA, in both the inlA-hEcad and the inlA-mEcad complexes. The same interaction was observed between Val48 on Ecad and the Arg85 side chain on inlA, although the occupancy is much lower. Furthermore, the backbone nitrogen atom of Phe 7 in Ecad donates a hydrogen atom to the side chain of Glu 70 in inlA, in both complexes. However, the hydrogen bond between the backbone oxygen atom of Gly 5 in Ecad and the side chain of Arg2 in inlA is only formed in the inlA-hEcad simulation.
Looking at side-chain-to-side-chain interactions, we find nine hydrogen bonds in the inlA-hEcad complex, and ten in the inlA-mEcad complex. Certain hydrogen bonds are formed in inlA-hEcad only, namely between Glu255 on inlA and Lys 9 on Ecad, between Asn282 on inlA and Gln23 on Ecad, and between Glu323 on inlA and Lys25 on Ecad. Likewise, hydrogen bonds between Glu 6 on Ecad and various nitrogen atoms of Arg2 on inlA are only formed in the inlA-mEcad complex. These are naturally not possible in inlA-hEcad due to the presence of Pro 6 in that case. The hydrogen bonds between Glu54 on Ecad and Ser2 6 on inlA, as well as between Asn259 on inlA and Trp59 on Ecad, are formed in both complexes, but in inlA-mEcad, the hydrogen bonds are formed with a very low average occupancy. Furthermore, Gln23 on Ecad forms a hydrogen bond with Asn259 and Lys30 on inlA in both complexes. The same is true for Glu/Gln64 on Ecad and Arg85 on inlA. Lastly, Glu326 on inlA makes a hydrogen bond with Lys25 on Ecad in the inlA-mEcad complex, and with Lys30 in both the inlA-hEcad and inlA-mEcad complexes.
We identified conserved water sites by clustering water molecules in the interface between inlA and Ecad. In Table 4 we list the water sites with occupancy of at least 25%, i.e., that occurred in at least 400 of the 2000 snapshots saved for the 10 independent simulations. We identified 26 such sites in the inlA-hEcad complex, and 8 sites in the inlA-mEcad complex. The average interaction energy of the water sites in the inlA-hEcad complex ranges from -33 to -95 kJ/mol, with an average of -63 kJ/mol. For the inlA-mEcad complex, the average interaction energy of the water sites shows a much larger range from + to -90 kJ/mol, with an average of -52 kJ/mol. The total internal entropy of the sites is positive for all sites and is dominated by the rotational entropy (not shown). It ranges from 2 to 27 kJ/mol for the inlA-hEcad complex and from 0 to 22 kJ/mol for the inlA-mEcad complex.
The water sites are displayed in Figure 6, which clearly shows that a majority of these are located in two large clusters and one smaller cluster. One of the clusters is close to residues Asn259, Lys30 , Glu323, Tyr343, and Tyr347 on inlA, and Val3, Pro5, Gln23, Lys25, and Asn27 on Ecad. In the inlA-hEcad complex, this cluster is also close to Ala28 , Asn282, and Asn325 on inlA, and in the inlA-mEcad complex, it is close to Glu326 on inlA and Trp59 on Ecad. The cluster contains nine and eleven water sites in the inlA-hEcad and inlA-mEcad complex, respectively, with an average occupancy of the water sites of 770 and 669. This cluster will be denoted cluster 1.
A second cluster is close to residues Thr 48, Phe 50, Glu 70, Ser 92, Arg2 , Asp2 3, Ser233, and Ile235 on inlA, and residues Pro/Glu 6 and Phe 7 on Ecad. In addition, in the inlA-hEcad complex, this cluster is close to Arg 68, Gln 90, Ser233 and Glu255 on inlA, and Gly 5, Lys 9, and Asn20 on Ecad. In the inlA-mEcad complex, the cluster is close to Leu 9 on inlA and Pro 8 on Ecad. In the inlA-hEcad complex, this cluster contains twelve water sites that have an average occupancy of 838, and in the inlA-mEcad complex, the cluster contains eight water sites with an average occupancy of 686. This cluster will be referred to as cluster 2.
The smallest of the clusters, cluster 3, is close to residues Arg85, Asn 07, and Asn 28 on inlA, and residues Thr63 and Glu/Gln64 on Ecad. In addition, it is close to Asn 04 and Ser 06 on inlA in the inlA-hEcad complex. The cluster contains two water sites in the inlA-hEcad complex with an average occupancy of 958, and only one site in the inlA-mEcad complex, with an occupancy of 833. The water sites of clusters 1 and 3 are fairly consistent when comparing inlA-hEcad and inlA-mEcad. However, the water sites in cluster 2 occupy partly different locations. In addition to the three clusters, there is a water site between Arg365 on inlA and Pro6 on Ecad, which is present in both the inlA-hEcad and inlA-mEcad complex, and one between Arg85 on inlA and Pro46, Pro47 and Val48 on Ecad, that is only present in the inlA-hEcad complex.
The residues close to the water sites form hydrogen bonds to water molecules found in most of the simulations, as shown in Table 5. For the inlA-hEcad complex, the average occupancy ranges from 22 to 200%, with an average of 76%, and for the inlA-mEcad complex the average is slightly lower at 57%. Most of the hydrogen bonds occur in both complexes, with the exception of hydrogen bonds to Arg85 on inlA in the inlA-mEcad complex, and hydrogen bonds to Glu 6 on mEcad. The latter hydrogen bonds are naturally not possible in the inlA-hEcad complex.
It is interesting to note that there is a "dry" region between clusters 1 and 2 (see Figure 6), where water molecules exchange readily with bulk water. This highlights that the interface between the two subunits is not contiguous.

[Figure caption, water sites: The green protein is inlA and the blue protein is Ecad. Sites are shown as red and orange spheres; the red were found for the inlA-hEcad complex and the orange for the inlA-mEcad complex. Residues within 3 Å of the sites are shown as well.]
To monitor the stability of the inlA-hEcad and inlA-mEcad complexes during a longer period of time, 200 ns simulations were performed for each of these. The simulations were performed in a slightly larger box, allowing the proteins to diffuse in case of complex dissolution. The structural evolution of the complexes, measured as the root mean square deviation (RMSD) after fitting each snapshot to the starting structure, is shown in Table 6. To monitor the evolution, we made the fit based on the backbone atoms of inlA rather than the full complex. As such, the analysis will more easily reveal if the complex is separating or not. We will therefore only see a modest evolution of the inlA residues; the RMSD is in this case .4 to .6 Å for backbone atoms, and .7 to .9 Å for all heavy atoms. If we instead look at the Ecad atoms we observe larger deviations, and surprisingly, hEcad shows larger deviations than mEcad, although inlA-hEcad should be a tighter complex. The RMSD for hEcad is 2.7 Å for backbone atoms and 3. Å for all heavy atoms over the entire simulation. The corresponding measures for mEcad are 2.0 and 2.5 Å, respectively. However, looking at the two halves of the simulation individually, it is clear that most of the changes occur after 00 ns. Considering only the interfacial residues, it is clear that not all of the overall change comes from these residues, and that the RMSD in this region is similar between the two complexes.
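Fitting on one subset of atoms and measuring the deviation on another, as done here with the inlA backbone and the Ecad atoms, can be sketched with a standard Kabsch superposition. The function names and index arguments are hypothetical, and mass weighting is not applied in this sketch.

```python
import numpy as np

def kabsch(mobile, reference):
    """Rotation R and translation t minimising |R @ mobile_i + t - reference_i|."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    H = (mobile - mob_center).T @ (reference - ref_center)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_center - R @ mob_center
    return R, t

def rmsd_after_fit(snapshot, reference, fit_idx, measure_idx):
    """Superpose on fit_idx atoms, then report the RMSD of measure_idx atoms (Å)."""
    R, t = kabsch(snapshot[fit_idx], reference[fit_idx])
    fitted = snapshot @ R.T + t
    diff = fitted[measure_idx] - reference[measure_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the fit ignores the measured atoms, a rigid displacement of one protein relative to the other shows up directly in the reported RMSD instead of being absorbed by the superposition.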
We also calculated the MM/GBSA binding free energy of the complex for the last 0 ns of the simulations. The binding affinity for the inlA-hEcad complex is -83. kJ/mol, and -70.9 kJ/mol for the inlA-mEcad complex. The difference compared to the average over the 10 short simulations is significant for both complexes. This analysis shows that the structural evolution observed by the RMSD analysis leads to a looser inlA-hEcad complex and a tighter inlA-mEcad complex.
To measure the binding strength between inlA and Ecad in an alternative way to MM/GBSA, we computed the potential of mean force (PMF) between the proteins using umbrella sampling and the weighted histogram analysis method. The direction in which Ecad was artificially displaced from inlA is illustrated in Figure 3. We also tried to displace Ecad in a perpendicular direction but the PMFs were very noisy (results not shown). The average PMFs for inlA-hEcad and inlA-mEcad are shown in Figure 7. We performed three independent sets of simulations for inlA-hEcad and two independent sets of simulations for inlA-mEcad. The PMFs are sufficiently converged at a center-of-mass distance of 50 Å, which implies that we could estimate a binding free energy from this point by taking the negative of the PMF (we set the PMF to zero at the displacement of 0 Å). Using this approach, the binding free energy of inlA-hEcad and inlA-mEcad is -32±6 and -27± kJ/mol, respectively. The estimates from the individual sets of simulations are given in Table 7. It is clear that the inlA-hEcad estimate is much more uncertain than the inlA-mEcad estimate, although the curve obtained for the inlA-mEcad system (Figure 7) is much noisier. Hence, the difference in binding affinity between the complexes, albeit indicating that inlA binds weaker to mEcad than to hEcad, should be taken with some caution.
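Reading a binding free energy off a PMF in this way amounts to shifting the curve to zero at the smallest displacement and negating its value at the plateau. The sketch below assumes the PMF is available as paired arrays of distances and values; the function name and the example numbers in the usage note are made up.

```python
import numpy as np

def binding_free_energy_from_pmf(distances, pmf, plateau_distance):
    """Negative of the PMF at the plateau, with the PMF zeroed at the start.

    distances: displacement values along the pulling coordinate (Å).
    pmf: PMF values at those displacements (kJ/mol).
    plateau_distance: displacement at which the PMF is taken as converged.
    """
    distances = np.asarray(distances, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    # Shift so the PMF is zero at the smallest displacement (bound state).
    shifted = pmf - pmf[np.argmin(distances)]
    # Use the grid point closest to the requested plateau distance.
    plateau = shifted[np.argmin(np.abs(distances - plateau_distance))]
    return -float(plateau)
```

For a toy curve with values 5, 20, 35 and 37 kJ/mol at displacements 0, 10, 20 and 30 Å, the estimate at 30 Å is -32 kJ/mol. Estimates from independent sets of umbrella simulations can then be averaged.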

Discussion
The hot spots can be divided into two main clusters of residues. One of the clusters contains residues on LRRs 9, , 3, and 4 of inlA and residues on Ecad located on the loop close to the N-terminal, between β-sheets b and c, and on β-sheet d (see Figure 1 for numbering of β-sheets). These are the hot spots Asn259, Lys30 , Glu323, Tyr343, Tyr347, Phe348, Arg365, Phe367, and Trp387 on inlA, and Val3, Pro6, Gln23, Lys25, Asp29, and Lys30

Interaction between Internalin A and E-cadherin
Volume No: 6, Issue: 7, e201303022 Computational and Structural Biotechnology Journal | www.csbj.org
backbone of Gly 5 on Ecad and coordinates conserved water sites.
Gly 5 was shown to be important by ED, but cannot be analysed using ASM. Lys 4 and Lys 9 on Ecad also stabilize the conserved water sites. The important Pro16 residue on Ecad, contributing more than -30 kJ/mol to the binding free energy, forms unspecific, apolar contacts with the residues on inlA LRR 6 that form a cavity-like structure. Pro16 and the surrounding residues are lined with conserved water sites. Pro 8 on Ecad and Phe 50 on inlA make apolar contacts.
Apart from these two clusters of residues and contacts, there are two additional hot spots. Arg85 of inlA forms a relatively well-conserved charge-charge interaction with Glu64 on Ecad, which is not a hot spot. Arg85 also stabilizes two water sites and contributes -39 kJ/mol to the binding affinity. Lastly, Glu56 does not have any clear binding partner and it is unclear why this residue should be important for the binding.
The hot spots on inlA contribute -8 kJ/mol to the binding affinity, and those on Ecad contribute -04 kJ/mol. Divided into the clusters of residues discussed above, the residues on inlA in the first cluster contribute -7 kJ/mol and those on Ecad contribute -35 kJ/mol. The residues in the second cluster contribute -6 and -50 kJ/mol, for the inlA and Ecad residues, respectively. This indicates that most of the binding affinity comes from these two clusters, although a few other separate residues also contribute greatly (such as Arg85). It is also interesting that inlA contributes mostly through the residues in the first cluster, whereas Ecad contributes mostly through the residues in the second cluster.
The hot spots of the interface in the inlA-mEcad system are to a large degree equivalent to those in the inlA-hEcad complex. This is interesting to note, as the interface between inlA and wild-type mEcad has been only partially characterized by experiments, due to the inability to crystallize the complex. Hence, this study complements the existing literature.
The cluster of residues close to the LRR β-sheets 9, , 3, and 4 of inlA also includes Glu326 (which forms a hydrogen bond with Lys25) as well as Tyr369 (which forms a non-specific interaction with Asn27 on Ecad). At least the sASM analysis suggests that Asn27 should be considered a hot spot. This interaction has been discussed extensively in the literature, where it is argued that mutating Tyr369 to serine is favorable. However, the simulations with the Y369S mutant did not result in any improved binding affinity. The hot spot residues on the loop close to the N-terminal, residues on and between β-sheets b and c, and residues on β-sheet d of mEcad are identical to the residues in inlA-hEcad.
The second cluster, located on the LRR β-sheets 4, 5 and 7 of inlA and on the loop between β-sheets a and b of Ecad, differs more.
The largest difference comes from the substitution of Pro16 to Glu16. Glu16 contributes as much as 7 kJ/mol less to the binding affinity than Pro16 in the inlA-hEcad complex, but the contribution is nonetheless favorable. However, instead of protruding into the apolar cavity of LRR 6, it bends outwards and forms stable hydrogen bonds with Arg2 . The cavity instead seems to be filled with conserved water sites. Pro 8 on Ecad and Phe 50 on inlA make apolar contacts, similar to those in the inlA-hEcad complex. This has not been described experimentally, and shows that although the cavity is unfavorable for Glu16 (as hypothesized from experiment and confirmed here), the protein is able to adapt and form new interactions. Two hot spots, Arg 68 on inlA and Glu 3/3 on Ecad, make non-specific contacts and do not interact directly with the opposite protein.
Lastly, Arg85 on inlA forms consistent hydrogen bonds with Gln64 on Ecad, and in the inlA-mEcad complex both residues are hot spots.
The hot spots on inlA and Ecad contribute -37 and -50 kJ/mol, respectively. Looking at residues in the first cluster only, the contributions are -67 and -29 kJ/mol for inlA and Ecad, respectively, whereas in the second cluster the residues on inlA contribute -5 kJ/mol, and the residues on mEcad -20 kJ/mol. mEcad thus provides a much weaker interaction (30 kJ/mol less) in the second cluster than hEcad does. In the inlA-hEcad complex, hEcad is the dominating contributor of this cluster.

Conclusions
We have performed simulations of L. monocytogenes internalin A (inlA) and either human or murine E-cadherin (Ecad). Both the wild type and various mutants have been simulated. Although the different methods to analyze the interfacial residues give somewhat ambiguous results, we believe that a lot of useful information is provided with regard to the energetics of the interaction. The interfaces of the two complexes are very similar, but there are small differences that result in the apparent lower binding affinity for the inlA-mEcad complex. The two proteins bind together using two large clusters of residues, in addition to one smaller cluster. One of the two large clusters is more or less identical in the two complexes, and all the difference in binding affinity stems from the other two clusters. The substitution of Pro16 on hEcad to Glu16 on mEcad shifts the hydrogen-bonding partners and conserved water sites. While Pro16 in hEcad protrudes into an apolar cavity at LRR 6 of inlA that is lined with conserved water sites, Glu16 in mEcad bends outside the cavity to form hydrogen bonds with Arg2 on inlA, thereby pushing the water molecules towards the cavity. It is clear that the latter configuration of water sites is less favorable than the former, as shown by the much lower occupancy. The mutant simulations clearly show that the binding affinity is lowered when Pro16 is mutated to Glu, and that a Glu16 to Pro mutation strengthens the affinity.
The last cluster of important residues is mainly formed by interactions between Arg85 on inlA and Glu/Gln64 on Ecad, and a number of conserved water sites. In the inlA-hEcad complex, Arg85 and Glu64 are able to form a tight salt bridge that is also able to attract more water molecules, whereas in inlA-mEcad, there is a single hydrogen bond between Arg85 and Gln64 and fewer water molecules. That the salt bridge interaction is favorable was clearly shown in the mutant simulations, where an E64Q mutation in inlA-hEcad considerably lowered the binding affinity, whereas the Q64E mutation in inlA-mEcad strengthened the binding affinity. However, due to weaker binding of the cluster containing Glu16 vs. that containing Pro16, the interaction between Arg85 and Gln64 is of higher relative importance in inlA-mEcad than in the inlA-hEcad E64Q mutant.
Up to this point, we have confirmed the experimental observation that inlA-mEcad is a weaker complex than inlA-hEcad. This observation has been used as the main argument to explain why the bacterium is unable to invade murine cells, while it can invade human cells. However, we have performed our simulations from crystal structures of the already formed complex. We therefore performed 200 ns simulations of the inlA-hEcad and inlA-mEcad complexes and showed that the observed differences are not sufficient for the dissolution of the inlA-mEcad complex. On the contrary, we observe larger changes for the inlA-hEcad complex. The umbrella sampling simulations also corroborate this observation. The binding strength estimated from these simulations shows no significant difference between inlA-hEcad and inlA-mEcad. However, it must be noted that we have judiciously chosen one dissociation pathway, and that more than one pathway may exist. Umbrella sampling is fundamentally different from MM/GBSA, so it should not come as a surprise that the two methods indicate different relative free energies. Still, based on the results in this study we cannot attribute the inability of L. monocytogenes to invade murine cells to the interactions between inlA and mEcad at the nanosecond to sub-microsecond timescale (the timescales of our simulations). Either the processes involved occur on a much longer timescale than is readily accessible with conventional simulations, or there is some hitherto unknown mechanism that precludes the binding from taking place altogether. One possible reason could be that the unbound structures of mEcad and hEcad differ substantially, such that mEcad cannot be properly presented for inlA to bind. Our conclusion is interesting as it questions an important hypothesis regarding L. monocytogenes invasion.

Figure 1. Complex between inlA in green and hEcad in blue. The numbering of every second inlA β-sheet (LRR) is shown, as well as the numbering of the β-sheets in hEcad. N- and C-termini of the protein chains are marked with an N and C, respectively. The hEcad loop containing Pro16 and the tip of LRR 6 is encircled in grey.

Figure 2. Illustration of mutations in inlA, a) and b), and differences between hEcad and mEcad, c) and d). a) Illustrates the S192N mutation, which leads to an interaction between Asn192 and Phe17. b) Illustrates the Y369S mutation, which leads to an interaction with Asn27. c) Illustrates the important Pro/Glu16 difference between hEcad and mEcad. d) Illustrates the Glu/Gln64 difference.

Figure 3. Illustration of the direction of displacement in the umbrella calculations. inlA is shown in green at one edge of the simulation box. The position of hEcad as observed in the crystal structure is then shown in blue, and at displacements of 24 Å and 48 Å in purple and pink, respectively. The simulation box is sketched for reference. The hEcad loop containing Pro16 and the tip of LRR 6 is encircled in grey.

Figure 4. Free energy contributions of residues on inlA and Ecad in kJ/mol. a) inlA residues in the inlA-hEcad complex, b) inlA residues in the inlA-mEcad complex, c) Ecad residues in the inlA-hEcad complex, d) Ecad residues in the inlA-mEcad complex. Residues were selected based on a number of criteria as outlined in the text. Free energy contributions are determined by energy decomposition (ED), alanine scanning mutagenesis (ASM), and scaled ASM (sASM).

Figure 5. Per-residue free energy difference between wild-type inlA-Ecad complex and various mutants. a) Difference relative to inlA-hEcad complex for residues on inlA, b) Difference relative to inlA-hEcad complex for residues on Ecad, c) Difference relative to inlA-mEcad complex for residues on inlA, d) Difference relative to inlA-mEcad complex for residues on Ecad.

Figure 6. Conserved water sites in the interface, showing the location of the sites listed in Table VI. The green protein is inlA and the blue protein is Ecad. Sites are shown as red and orange spheres; the red were found for the inlA-hEcad complex and the orange for the inlA-mEcad complex. Residues within 3 Å of the sites are shown as well.
on Ecad. These residues are illustrated in Figure 8. Asn259, Lys30 , and Glu323 on inlA and Lys25 and Trp59 on Ecad form a network of hydrogen bonds and charge-charge interactions. Of these, the residues on Ecad are most important for the binding. Furthermore, Tyr343 and Tyr347 are involved in stabilizing interfacial water sites and contribute a fair amount to the binding affinity. Lys30 forms a hydrogen bond with Glu326 on inlA, which is not a hot spot. This interaction is thus not important for the binding, although Lys30 does contribute. The other main cluster of residues consists of residues on LRRs 4, 5 and 7 of inlA and residues on the loop between β-sheets a and b of Ecad, as illustrated in Figure 8. These are the hot spots Phe 50, Glu 70, and Arg2 on inlA and Glu 3, Lys 4, Pro16, Pro 8, and Lys 9 on Ecad. Arg2 on inlA forms a hydrogen bond with the

Figure 8. Illustration of the two main clusters of hot spots. inlA is shown in green and hEcad is shown in blue; hot spots are colored by atom.