Rapid, Precise, and Reproducible Prediction of Peptide − MHC Binding A ﬃ nities from Molecular Dynamics That Correlate Well with Experiment

: The presentation of potentially pathogenic peptides by major histocompatibility complex (MHC) molecules is one of the most important processes in adaptive immune defense. Prediction of peptide − MHC (pMHC) binding a ﬃ nities is therefore a principal objective of theoretical immunology. Machine learning techniques achieve good results if substantial experimental training data are available. Approaches based on structural information become necessary if su ﬃ ciently similar training data are unavailable for a speci ﬁ c MHC allele, although they have often been deemed to lack accuracy. In this study, we use a free energy method to rank the binding a ﬃ nities of 12 diverse peptides bound by a class I MHC molecule HLA-A * 02:01. The method is based on enhanced sampling of molecular dynamics calculations in combination with a continuum solvent approximation and includes estimates of the con ﬁ gurational entropy based on either a one or a three trajectory protocol. It produces precise and reproducible free energy estimates which correlate well with experimental measurements. If the results are combined with an amino acid hydrophobicity scale, then an extremely good ranking of peptide binding a ﬃ nities emerges. Our approach is rapid, robust, and applicable to a wide range of ligand − receptor interactions without further adjustment.


INTRODUCTION
The recognition of major histocompatibility complex (MHC) bound peptide ligands by T-cell receptors (TCRs) lies at the heart of the human immune response. 1 To allow recognition, cytosolic peptides undergo the MHC class I (MHCI) pathway, while extracellular peptides undergo the MHC class II (MHCII) pathway. MHCI processing requires a protein to be cleaved into peptide fragments by the proteasome. These peptides are usually 8−11 amino acid residues long. The peptides then enter the endoplasmatic reticulum (ER) via the "transporter associated with antigen processing" (TAP) or other transporter proteins such as Sec61. 2 In the ER, the peptides are loaded on MHCI molecules. The pMHCI molecules are transported to the cell surface where they can be recognized by the TCRs of CD8+ T-cells.
Similarly, in the MHCII pathway extracellular proteins are endocytosed and ingested in lysosomes. MHCII molecules are targeted at the class II-associated invariant chain peptides (CLIPs). The nonclassical MHCII molecule, human leukocyte antigens-DM (HLA-DM), helps to release CLIP and hence the binding of its degraded peptides to MHCII. The pMHCII molecules are transported to the cell surface where recognition by TCRs of CD4+ T-cells can take place.
In both MHC pathways, the binding affinity between the peptide and MHC is crucial. The MHC-associated peptides which engage T-cells through TCRs and induce T-cell responses are termed T-cell epitopes. In Figure 1, the binding between a peptide and MHCI is illustrated. If an MHC allele is unable to bind and present a peptide, then no recognition of this pMHC complex by T-cells can take place. Since this process is so central to the immune response, there is considerable interest in the development of prediction methods that reliably indicate T-cell epitopes. 3 A wide range of predictive approaches have been published over the past few years (for a review, see, e.g., Tong et al. 5 ). Early prediction approaches focused on anchor residues and simple sequence motifs. Binding matrices improved this approach by counting the frequency of each residue in each peptide position for experimentally known binders and nonbinders. This can be achieved by higher order machine learning techniques such as artificial neural networks, support vector machines, or hidden Markov models. These approaches have been extrapolated for pan allele predictions yielding coverage of many human MHCI and MHCII alleles. 6 Such sequence-based machine learning predictions of pMHC binding are highly dependent on the availability of pre-existing training data. In common with most sequence based "immunoinformatics" techniques, they can potentially answer the question of whether a peptide binds to an MHC allele but not why. In addition, the high degree of polymorphism in MHC, there are more than 10 000 human alleles (http://www. ebi.ac.uk/ipd/imgt/hla/stats.html), many with insufficient data for training, further undermines the machine learning approach. For these reasons, "training free" structure-based docking approaches for pMHC binding affinity prediction have also been investigated.
Molecular dynamics (MD) calculations have been widely used in understanding and characterizing the structural, dynamical, and functional properties of biomolecular systems. Its application in the T-cell receptor/peptide/MHC systems has been recently reviewed. 7 The use of MD to calculate binding affinities is of central importance here. Several techniques exist to predict binding free energies based on MD simulations, 8 including free energy perturbation (FEP), thermodynamic integration (TI), molecular mechanics Poisson−Boltzmann surface area (MMPBSA), and molecular mechanics generalized Born surface area (MMGBSA) techniques. While FEP and TI are often seen as "exact" and even "rigorous," 9 they have found relatively little application in drug design 10 probably due to their high computational cost and, even then, the frequent lack of convergence in the calculated free energies, particularly for larger ligands. Although in principle there is no restriction on the similarity of two states in the TI/FEP calculations, the methods are mainly applied to similar drug−protein combinations (perturbative changes) due to practical challenges in the calculations. 11 Although a recent study reported encouraging results for the FEP method in the prediction of relative ligand binding potency, 12 its application in lead optimization is still at the very early stage for pharmaceutical drug discovery. MMPB(GB)SA approaches have become popular in the estimation of binding free energies because they offer a good balance between speed and accuracy. 13,14 MMPB(GB)SA combines force field based molecular mechanics (MM) with a continuum model, based on the Poisson−Boltzmann (PB) or generalized Born (GB) equation to handle electrostatic interactions, and a nonpolar solvation energy correlated with solvent accessible surface area (SA). The method can be applied in principle to any aqueous binding process: it is not dependent on assumptions of small (perturbative) differences between two similar systems. The configurational entropy due to ligand−protein binding can be calculated by a normal mode (NMODE) method. The MMPB(GB)SA methodology has been applied in various small molecule−protein systems, including drugs binding to their receptors 14,15 and T-cell epitope binding to MHC. 16,17 The approach we described here is most closely associated with this method.
Theoretically, the accuracy of these free energy predictions is limited mainly by two factors: (1) the force field used to describe the atomistic interactions and (2) the ability to sufficiently sample the different microstates, in both bound and unbound states, for a given system. 14 Other factors also play a role in any specific approach, such as the neglect of explicit water interactions in an implicit continuum solvent model. The development of force fields is an ongoing activity, 16 but it is not our main concern here. We work with an existing and well established force field. Inclusion of water molecules which mediate ligand−protein interactions can improve the accuracy of binding free energy predictions in the MMPBSA approach, which requires some specific effort to identify such water molecules. 14,18 The purpose of the present study is to assess the convergence of the free energy calculations we make in order to be able to report reliable, precise, and reproducible pMHC binding affinities. Our approach is based on a "protocol" originally reported for estimating binding affinities of inhibitors to HIV-1 protease. 14,19 We call this protocol "enhanced sampling of molecular dynamics with the approximation of continuum solvent" (ESMACS). This builds around the socalled MMPB(GB)SA method, including configurational entropy and free energy of association, but with important additional features to address reliability and reproducibility. The ranking of binding affinities is investigated comprehensively, including the several contributions that together lead to a full estimate of the free energy of binding of peptides with MHC proteins. Our article is structured as follows. In Section 2, we discuss the theoretical basis of the calculation methods we use in this work, while in section 3 the methods are described as they are implemented. The approach requires access to substantial computational resources (typically, to achieve the turnaround times we are able to deliver here, which is a key aspect of the work, one needs access to a supercomputer with at least 7200 cores). Our results are presented in section 4, and the conclusions from this work may be found in section 5.

THEORY
The affinity of combining two reactants A and B at constant temperature and pressure is characterized by the change in the Gibbs free energy upon binding, as The binding free energy is composed of the changes in enthalpy (ΔH) and in entropy (ΔS) at temperature T. In this study, we aim to evaluate the validity of the end-point free energy calculation methodologies MMPBSA and MMGBSA. The end-point calculations estimate the free energies of the two reactants and their complex using conformations generated via molecular dynamics simulations.
2.1. Free Energy Calculation Using the MMPB(GB)SA Method and Configurational Entropy. The MMPB(GB)-SA method is an efficient way to estimate molecule−molecule interactions from a classical MD simulation. In this approach, the binding free energy is calculated from where G AB , G A , and G B denote the free energies of the complex AB and the reactants A and B, respectively, as The calculation of the configurational entropy is usually broken up into three components: S vib , S trans , and S rot due to the changes in vibrational, translational, and rotational degrees of freedom upon binding, respectively, as In the Amber package, 21 the entropy terms are calculated by a normal mode method, employing the rigid-rotor harmonic oscillation (RRHO) approximation. The changes in the S trans and S rot terms can be calculated by the free energy of association 22 which improves the ranking of the relative binding strengths for the HIV-1 protease inhibitors. 14 2.2. Enhancement of Conformational Sampling. Achieving sufficient sampling of conformational space is the key to the accuracy and reproducibility of free energy predictions. Our previous studies 14,15,19 show that ensembles of multiple short duration simulations are more capable of producing well converged free energy estimates than a single long one is. In the current study, we perform ensembles of MD simulations for the 12 peptides using our ESMACS protocol and single long simulations for four of them ( Table 1). The analyses and comparison of these simulations will be used to validate our calculation protocol for the pMHC molecular systems in which the peptides are larger in size and more flexible than the ligands we have studied previously. A sufficient number of replicas is needed to ensue full error control and hence reproducibility. Fifty replica ensembles are used in this study, the same size we have used in our previous studies, although the study on HIV-1 protease inhibitors indicated that convergence can be reached for ensembles with a smaller number of replicas. 14 One particularly interesting feature of the ESMACS protocol we report here is the enhancement of predictions when we move from the so-called 1-trajectory to the 3-trajectory variant of the method, relaxing the assumption that the substrate and receptor protein sample similar conformations in both the free and bound states. The 1-trajectory method, in which the energies of protein−inhibitor complexes, receptor proteins and ligands/drugs/peptides are extracted from the MD trajectories of the complexes alone, has been used in virtually all MMPBSA studies in the literature. The 1-trajectory method uses the exact same sampling for energy calculations of the protein, ligands, and complex. As a result, the internal bonded interaction (E int ) is completely and the van der Waals (E vdW ) and electrostatic (E ele ) interactions are largely canceled out in the calculated ΔG binding (eqs 2 and 3). The assumption in the 1-trajectory approach is questionable, as binding can, and typically does, lead to conformational changes in both protein and ligands. The free energy associated with the conformational changes of the protein and peptides, also known as the adaptation free energy, is calculated by subtracting the energy of the free state from the bound state: The 3-trajectory method is able to address the role of this adaptation energy, in which separate simulations are performed for complexes, proteins, and ligands, and the energies are calculated from the corresponding trajectories. Indeed, we are unaware of any extensive studies of the 3-trajectory version of MMPB(GB)SA; this is most likely due to the very high fluctuations reported in the individual molecular dynamics trajectories, making it largely impossible to deduce meaningful free energy differences from the differences in parameters derived from the trajectories. Things change dramatically, however, when one performs ensemble studies, as the error

Journal of Chemical Theory and Computation
Article bars are then under full control, and sums and differences can then be performed reliably, as we shall see (section 4).

METHODS
In the present section, we describe how we built the individual molecular models and set up the in silico environment in order to perform these calculations (see Figure 2). It should be evident that, at the scale we are operating on, with vast numbers of models to build and then to run in ensemble mode cohorts, followed by trajectory data processing to determine the free energies of binding, a considerable degree of automation is required. We cannot pursue such an approach by manual means. We have therefore employed a binding affinity calculator (BAC) tool, 19 previously built in our group, to integrate and automate the process of preparing the models, executing simulations and analyses, and managing and staging input and output data.

Modeling of Peptide/MHC Complexes.
We use the HLA-A*02:01 MHC allele as the basis for our study. This allele is among the most frequently studied MHCs, and a large number of experimental binding affinities as well as X-ray structures are available. We downloaded the 10 HLA-A*02:01 structures with the best resolution from the Protein Data Bank (PDB). 23 We aligned their amino acid sequences with the wildtype HLA-A*02:01 sequence downloaded from the IMGT/ HLA database. 24 The PDB structures truly containing the HLA-A*02:01 wildtype were then structurally superimposed. Finally, we chose PDB accession code 3pwn as the basis for our study because its peptide backbone configuration is close to the mean of all HLA-A*02:01 peptide configurations.
We selected 12 different peptide sequences and HLA-A*02:01 binding affinities from Ishizuka et al. 25 (see Table 1). This data set was chosen because the data was determined by a single technique and published by the same group in one article. This makes it most likely to have been measured using common experimental conditions for all peptides and therefore should be internally self-consistent. Like many such binding affinity data sets, no error bars are reported, while the binding affinities themselves are estimated in terms of IC 50 values. The peptides studied here are larger and more diverse than our previously studied small-molecule drugs; 14,15 they have more rotatable torsional degrees of freedom and more variations in charge (−1 to +3).
We used SCWRL (side-chains with a rotamer library) 27 via the peptX 28 framework to model the altered peptide ligands within the MHC binding groove because we have previously shown that this is the most appropriate approach for predicting side chain conformations for the peptides. 29,30 3.2. Molecular Dynamics Calculation Setup. Simulations of pMHC complexes, MHC, and peptides were performed, based on the protocol described in our previous publication. 14 The simulations of the pMHC complexes are used for the free energy estimations in the 1-trajectory MMPB(GB)SA method, while all three sets of simulations are used in the 3-trajectory study. Preparation of all pMHC, MHC, and peptide systems was performed using the binding affinity calculator (BAC) tool. 19 BAC was originally designed to automate the calculation of binding affinities for drugs with HIV proteases and has been extended more recently for generic molecule−molecule interactions. The pMHC complexes were solvated in orthorhombic water boxes with a minimum buffering distance between the protein and box boundary of 14 Å. Counterions were added to electrostatically neutralize the systems. The AMBER ff99SB-ILDN 31 force field was used to describe all interactions. The MD package NAMD2.9 32 was used throughout the equilibration and production simulations with periodic boundary conditions. The systems were maintained at a temperature of 300 K and a pressure of 1 bar in an NPT ensemble.
An ensemble simulation was performed for each pMHC, MHC, and peptide system, in which 50 replicas were used with identical atomic coordinates but different initial velocities generated from a Maxwell−Boltzmann distribution at 300 K ( Figure 2). For each replica, energy minimizations were first performed with heavy protein atoms restrained at their initial positions. Then a series of equilibration runs, totalling 2 ns, were conducted, while the restraints on heavy atoms were gradually reduced. Finally, 4 ns production simulations were run and coordinates were recorded at every picosecond. The choice of 4 ns is based on our previous study 14 and is justified by the studies reported here, which show good equilibration on those time scales (see section 4). To compare results of the ensemble simulations with those from single long runs, 1 μs simulations were run for four of the peptides, two with the highest experimental binding affinities and two with the lowest ( Table 1). Two of these long simulations were repeated with a different force field and MD engine (see Supporting Information). Figure 2 provides the details of the ensemble simulation workflow and indicates the relative core counts and wall clock time elapsed while performing these computations. Overall, on a high performance supercomputer today, like ARCHER used in this study, we are able to execute all of the computations required within a few hours. To compute more than one binding affinity concurrently, one needs to upscale the requirement by the number of drugs of interest. This takes the requirements from the order of 10 000 cores to 100 000 cores

Journal of Chemical Theory and Computation
Article or more; these too are available on a number of large supercomputers available today. The long time scale simulations, however, took about two months each to complete, including the queuing time on the supercomputers. The increasing availability of large supercomputers to a broader community of users makes the ensemble approach a much more time-efficient method by means of which to conduct MD studies as we are able to complete all simulations in the time it takes to perform a single one.
3.3. ESMACS Calculation of the MMPBSA Binding Free Energy between a Peptide and MHC. The binding affinities of the peptides to MHC were analyzed using the MMPB-(GB)SA 32 method within the AMBER 12 package. 21 The MMPBSA.py.MPI 33 module of the package was used, which employs the SANDER module for E MM i , the Poisson− Boltzmann equation solver for G PB i , the generalized Born formula for G GB i , and solvent accessible surface area calculation for G SA i . No cutoff was applied to the nonbonded energies. The PB calculation used a grid spacing of 0.5 Å, with dielectric constants of 1 and 80 for pMHC and solvent, respectively. The nonpolar solvation free energy was calculated using eq 4 in the section 2. The energetic analyses were conducted on configuration snapshots of pMHC, MHC, and peptide, all generated over the course of ensemble MD simulations. In the 1-trajectory study, the snapshots of MHC and peptide are extracted from the trajectories of pMHC complexes, while in the 3-trajectory method, they are also obtained from the separate MHC and peptide simulations.
The free energy G i , including the configurational entropy TS i conf , was calculated for 96 equally spaced snapshots throughout the last 3.84 ns of the production trajectory. The number of snapshots was chosen so that MMPBSA.py.MPI can parallelize calculations of TS conf by assigning the snapshots evenly across all the required computational nodes, which comprise 24 cores each on ARCHER, the UK's national High Performance Computing Service (Figure 2). We can get the calculations completed for all of the snapshots in the time it takes for a single one when we use the same number of cores as the number of snapshots. For comparison with experimental values, the means over all snapshots and replicas per molecular system were determined. As the main focus of this study is the evaluation of the ensemble method for the pMHC systems, the binding free energies from the long single simulations were calculated without the inclusion of the configurational entropies in order to keep the computational costs as low as possible.
3.4. Convergence Analyses of the Calculated Binding Free Energies and Their Rankings. In order to assess the impact of sampling rate on the convergence of averages obtained by the MMPB(GB)SA and NMODE calculations, we have chosen to use the statistical technique of bootstrapping to quantify the errors in the averages obtained from a given ensemble containing N data points. 14 The method involves constructing a number of "resamples with replacement" of the calculated data sets, also containing N data points, equal in size to the calculated ones. "Resample" means that the constructed samples are drawn randomly from the existing sample of data obtained from MD simulations, while "with replacement" means that some data might appear more than once while others might not appear at all. In the current analyses, the data sets are the energy values of ΔG MMPBSA , ΔG MMGBSA , and −TΔS conf for all calculated snapshots. Autocorrelation analyses demonstrate that the binding free energies, which were calculated from snapshots with a time interval of 40 ps, are independent. This is consistent with previous studies which show the correlation time for the decay of the fluctuations is on the order of 1 ps. 34,35 The resampling is repeated many times (in our case 10 000 times), and the standard deviations (σ boot ) are calculated for the resamples.
The ranking of binding affinities is evaluated using two correlation metrics, namely, the Pearson correlation coefficient (r P ), and the Spearman ranking correlation coefficient (r S ). The former is used to assess the linear correlation, and the latter to measure monotone association between two sets of variables. The two correlation metrics give similar results, and only the Pearson correlation is presented in the main text (see Supporting Information, Table S1  (on the order of 10 20 for N = 1, and of 10 344 for N = 50). To simplify the calculations, 1 million sets are randomly chosen for each N-combination and r P calculated from these.

RESULTS
In this section, we present our results. We organize our findings into several sections. In the first, we look at the convergence of free energies computed from individual MD runs and compare these with the behavior of ensembles. We assess the effect of simulation length and number of replicas on convergence and present the ranking of pMHC binding affinities for both 1-traj and 3-traj approaches, as we include increasing contributions to the free energy based on the various terms described in section 2.  (Table 2), while the contribution from the association free energy 14 is negligible (data not shown). Although inclusion of the entropy component has a relatively small impact on the correlations, it has a high impact on the computation of the absolute free energies: the calculated binding free energies are then closer to the experimental data.
It is evident from the scatter plot of Figure 3 that some outliers are responsible for the moderate correlations. An investigation of the basic molecular properties of the peptides

Journal of Chemical Theory and Computation
Article shows that the outliers are the least hydrophobic peptides (Figure 4). The two peptides with the lowest hydrophobicities, HQWDIDSAI and VLWTVFHGA, both lie below the trend line in ΔG MMPB(GB)SA of 1-and 3-trajectory simulations ( Figure  3 i and j). The peptide with the third lowest hydrophobicity is also below the line in the 1-trajectory study but not in the 3trajectory simulation. Removing these data significantly improves the correlations between calculated and experimental data. Excluding the three least hydrophobic peptides ( Figure  4a), for example, increases the Pearson correlation of the 1trajectory study from 0.63 to 0.92 between the calculated ΔG MMPBSA and the ΔG exp . Introducing other parameters, such as normalization of ΔG MMPBSA by the number of heavy atoms or the peptides' hydrophobicities (Figure 4b and c), for example, improves the correlations from 0.63 to 0.74 and 0.88, respectively. A combined normalization by the number of heavy atoms and the hydrophobicity attains a correlation of 0.91 and a ranking among the best peptides in perfect agreement with the experimental data ( Figure 4d). Similar improvements are also found in the normalized correlations for other energy terms (see Supporting Information, Figure S1). Although the improvement is more evident for the peptides with the lowest hydrophobicities, it is indeed applied to the overall correlation for the whole set of peptides. It is a common occurrence that the mass 36 and the hydrophobicity 37 of a ligand influence its potency for binding (see discussion and Figure S2 in the Supporting Information). The continuum solvent model does not explicitly take account of water molecules of which some play a direct role in mediating ligand−protein interactions. The hydrophobicity scale is indeed a good indicator for predicting systematic errors in the binding affinity calculations with implicit models. The role of hydrophobicity in the free energy predictions and the impact of its inclusion on binding free energy ranking merit further investigation for general small molecule−protein complexes. In the following analyses, however, none of the data is excluded, and no extra parameters introduced, as our primary goal is to find a general computational way to rank the binding affinities without any a priori or a posteriori selection of data deemed to be "appropriate" or not. The two peptides with the least hydrophobicities are shown as solid filled circles in i and j, and the peptide with the third least hydrophobicity is also shown as a shadow filled circle in i. The hydrophobicity is indeed a good predictor for a point to be an outlier, as attested in Figure 4. The correlation coefficients are listed in Table  2.

Journal of Chemical Theory and Computation
Article For the whole 4 ns simulation, the error bars in the ΔG MMPB(GB)SA are very similar for different sampling rates, although in the first half of the simulations the error is significantly larger for a sampling rate of less than 20 snapshots per 4 ns ( Figure 5). For the configurational entropy TΔS conf , the errors are reduced for a sampling rate of more than 20 snapshots per 4 ns over the whole course of the 4 ns simulations. The errors in ΔG MMPB(GB)SA − TΔS conf are dominated by that of the ΔG MMPB(GB)SA part and hence exhibit very similar trends to those in ΔG MMPB(GB)SA . It is interesting to note that the errors in ΔG MMPB(GB)SA − TΔS conf are smaller than those in the corresponding ΔG MMPB(GB)SA . The smaller errors suggest that the individual errors in the two components ΔG MMPB(GB)SA and −TΔS conf are correlated and partially canceled out in the overall binding free energy. As the enthalpy change is the main part of ΔG MMPB(GB)SA (along with solvation free energy) and the change of entropy is mainly configurational, the cancellation indicates the validity of the notional entropy−enthalpy compensation.
Most published MD studies use data from a single simulation. In the current study, we performed 50 replica MD simulations of each of our 12 pMHC complexes. The mean values of calculated binding free energies, ΔG MMPB(GB)SA and ΔG MMPB(GB)SA − TΔS, over all frames of all replicas, achieves a good correlation with experimental data. However, the averaged energies from separate members in one ensemble differ substantially, by a maximum of 64.6 kcal/mol in the 1trajectory of RQVSVKLLI case for ΔG MMPBSA , and fall far outside the standard deviation of the same energy term (up to a maximum of 14.5 kcal/mol in any replicas of the RQVSVKLLI simulations) for any single simulation in the ensemble. The structural and dynamic properties, such as the changes of surface area and the fluctuations of peptide and MHC structures, also demonstrate large variations among replicas (see Figure S3, Supporting Information ). Despite all of these fluctuations, the binding free energies from an ensemble simulation reach well converged mean values when the number of replicas is of the order of or greater than 25 (see Supporting Information, Figure S4).
For the four peptides of which longer simulations were performed, the ranges of the ΔG MMPBSA obtained from the long simulations are all smaller than those from the ensemble studies, despite the single simulation duration of 1 μs being much longer than the total production run of their ensemble counterparts (200 ns). The long simulations for each peptide indeed do not increase significantly the range of the calculated binding energies from a single replica in its ensemble, which is only a 4 ns production run ( Figure 6). It is interesting to note that for the two peptides with the most unfavorable binding free energies (AAAKTPVIV and WIKTISKRM), the single long simulation gives more negative binding energies than the ensemble simulation does; for the other two peptides with the most favorable binding, the opposite holds. However, caution should be taken when interpreting the results, as single simulations, even very long by current MD standards, are unlikely to generate reproducible results, as demonstrated by the different behavior of the two long simulations of AAAKTPVIV (Supporting Information, Figure S5). Ideally, the long simulations should also be performed in an ensemble manner to generate reliable estimates. Such simulations are extremely time-consuming both in terms of computational resources and wall clock time.
Reproducibility is a very serious problem when properties are extracted from a single trajectory calculation: it is unlikely to produce the same results when carried out again, by the same person, or by anyone else. To obtain reproducible and reliable results, our previous study 14 required the use of ensemble simulations comprising no less than 25 replicas. Here, we investigate the number of replicas required for each ensemble simulation by assessing its impact on the convergence of the Pearson correlation coefficient r P . Figure 7 shows the influence of ensemble size N on the reproducibility of the correlation coefficient r P between calculated ΔG MMPBSA and the experimental data. ΔG MMPBSA is used here as it is the dominant component in the (ranking of) binding free energies. It is clear that larger sizes of ensemble make rankings more reproducible and with lower standard deviations, as shown in the distributions and averages of r P . There are only minor changes Figure 5. Variation of the bootstrap statistics, σ boot , with replica simulation length and the sampling rate used for the average from 50 replica simulations. Figure 6. Normalized frequency distribution of the binding affinities. Distributions for MMPBSA energies (ΔG MMPBSA ) are shown for four peptides AAAKTPVIV, WIKTISKRM, LLMMTLPSI, and FLIDLA-FLI from ensemble (black lines) and long (red lines) simulations. For each peptide, the energy distribution of one replica, whose energy is closest to the average of the long simulation, is shown with black dots. The expected normal distribution given the same mean and standard deviation for the replica is shown by the blue dashed line.

Journal of Chemical Theory and Computation
Article in ⟨r P ⟩ and its uncertainty beyond the inclusion of 25 replicas in an ensemble, for both 1-and 3-trajectory studies. Just as in the simulation of HIV-1 protease bound to various inhibitors, 14 our results demonstrate that one should use ensembles containing a minimum of 25 replicas per ensemble to provide reproducible results. As expected, the 3-trajectory is better than the 1-trajectory method when ensemble simulations are used; the averaged Pearson coefficient is improved from 0.62 ± 0.03 to 0.74 ± 0.04 for the 50-replica ensembles (Figure 7). If applied only for the single simulation method, however, the 3-trajectory simulation generates a larger uncertainty in the Pearson coefficient (0.45 ± 0.21) than the 1-trajectory does (0.49 ± 0.16). The larger uncertainty comes from the heavier tail in the r P distribution toward negative values ( Figure 7). As we have seen (Figure 8), single simulation models of each of the pMHC complexes, peptide and MHC are subject to large fluctuations, so that the probability of getting anything meaningful by combining these unrelated replicas from 3-trajectory simulations is very small. It is interesting to note that similar numbers of replicas (a minimum of 25) per ensemble are required to provide a converged and reproducible Pearson correlation coefficient for both the 3-trajectory and 1-trajectory methods.
Components of the free energy calculations provide insight into the mechanism of peptide binding. Correlations are calculated between the computed components and the experimental free energies ( Table 2). The van der Waals interaction, the dominant component of the free energy differences, manifests only moderate correlation with the

Journal of Chemical Theory and Computation
Article experimental binding free energies (ΔG exp ) in the 1-trajectory simulation, with r S = 0.24 and r P = 0.39. The 3-trajectory calculation significantly improves these correlation coefficients (r S = 0.66 and r P = 0.58). Although neither the electrostatic (ΔE ele ) nor the electrostatic solvation energy (ΔG PB or ΔG GB ) correlates with the experimental data, their sum does, and gets improved in the 3-trajectory study ( Table 2).
For the pMHC molecules studied here, inclusion of the configurational entropy contribution in the MMPB(GB)SA estimations does not improve the correlations between the calculated binding free energies and the experimental data. Previous studies, invariably using single simulations, draw different conclusions as to the impact of including the entropy term on the agreement of calculations with experimental data. A recent study in our group, 14 using ensemble simulations on the binding of inhibitors to HIV-1 protease, shows that improvements are achieved in the reliability of drug ranking by incorporating entropy estimates in the MMPB(GB)SA calculations; in contrast, Rastelli et al. found that the entropy contribution worsens the agreement between calculations and experiments in the case of inhibitors binding to dihydrofolate reductase (DHFR) with single MD simulations. 38 The inadequate sampling of the conformation space, especially in the single simulation cases, undoubtedly affects the credibility of such conclusions concerning the configurational entropy. Our previous study on HIV-1 protease 14 and the current pMHC simulations follow the same protocol and sample much deeper conformational spaces than other published MMPB-(GB)SA studies. The two protein systems produce different outcomes for the relative importance of the contribution of configurational entropy to the strength of inhibitor binding. A likely explanation is that, for diverse protein−ligand complexes, binding may be driven predominantly by enthalpy, by entropy, or by both. 39 Adaptation free energy is defined as the difference of a molecule, MHC, or peptide here, in free energies between the bound and free states (eq 6). It provides an indication of the conformational changes of the protein and the peptides upon binding, and of the energetic costs involved. The binding free energy from MMPBSA calculation, ΔG MMPBSA − TΔS conf , is used here to evaluate the adaptation energies (the MMGBSA method gives similar results; see Supporting Information, Figure S6). The G MMPBSA − TS conf distributions follow a welldefined Gaussian distribution in the bound and the free states, indicating the convergence of the sampling (Figure 8). The simulations of separate MHC and peptides sample energetically more favorable conformations than those in the complex, implying that there is indeed an energetic penalty associated with conformational rearrangement upon binding. The adaptation energies ΔG adap are calculated for both MHC and the peptides ( Table 3). The more positive the adaptation energy is, the greater the conformational and energetic penalties apply. For MHC, the adaptation energies vary widely upon the binding of different peptides (Table 3), indicating different degrees of conformational rearrangement taking place when the peptides bind. The adaptation energies for peptides are smaller than those for MHC. It is interesting to note that the changes of configurational entropy vary more widely for the peptides than those for MHC, even though the latter has a much larger number of residues than the former. The peptides harbor specific residues, called primary and secondary anchor residues, which are accommodated in the MHC binding pockets. Anchor residues play significant roles in peptide− MHC interactions, and the conformations of their side chains are more likely to be constrained in the bound form. As a result, the adaptation energy is correlated to the rotatable torsional degrees of freedom of the primary anchor residues (Supporting Information, Figure S3). The substantial adaptation energies (Table 3) suggest that separate trajectories should be employed to simulate the flexible binding of the peptide−MHC association. The better correlation of binding affinities based on the 3-trajectory method ( Figure 3) indicates that the approach addresses at least some of the problems pertaining to the inbuilt assumption that the adaptation energy is identically zero within the 1-traj approach.

CONCLUSIONS
In the present study, we have assessed the capability of our recently developed binding affinity protocol for calculating the binding of peptides to MHC. The current molecular systems differ from the systems we have studied in both the ligands (nonpeptide versus peptide) and the binding sites (loops and small β-strand interactions versus helices and a β-floor). The peptides studied here are larger and more flexible than the small-molecule drugs investigated in our previous study and indeed than most orally active drugs. Our new ESMACS protocol combines ensemble based MD simulations with free energy estimates computed from a combination of MMPB-(GB)SA and configurational entropy methods. The current study indicates that the protocol can be applied to pMHC systems without parameter fitting (except potential para-

Journal of Chemical Theory and Computation
Article metrization for a novel molecule) or "training" as used in machine learning. Sufficient conformational sampling is one of the most important factors in our ability to obtain reliable and reproducible ranking of free energies with reasonable accuracy. Correct ranking of inhibitors is often emphasized and suffices in many areas, including lead discovery in the pharmaceutical drug development and drug selection in the clinical practice. 40 We have applied ensemble simulations, containing 50 replicas per ensemble, in this study to produce the rankings we have presented. We have also compared the results from 1-and 3trajectory methods. The free energy values differ significantly between replicas in an ensemble, up to 35 and 43 kcal/mol in the 1-and 3-trajectory simulations, respectively. The binding induces conformational changes in both MHC and peptides, with which the adaptation energies are associated. The large adaptation energy, up to 39 kcal/mol, indicates that it is necessary to invoke the 3-trajectory method, especially for flexible small-molecule/protein binding such as the peptides to MHC in this study. By comparing the results from ensemble simulations and single long runs, we have shown that performing multiple replicas with different initial velocities in combination with the 3-trajectory method is essential to produce reliable results. The dependency of rankings on the ensemble size shows that a minimum of 25 replicas per ensemble should be used for these pMHC systems. This is consistent with our previous study on the binding of inhibitors to HIV-1 protease, in which a minimum of 25 replicas were recommended to obtain reproducible results. Using a general hydrophobicity scale for amino acids also significantly improves the calculated binding free energy ranking. It should be noted that no errors are associated with the reported experimental binding affinity estimates which comprise IC 50 measurements; thus, experimental uncertainties may be one source of the errors reported here.
We have now provided a systematic, reproducible, rapid, and precise protocol for the ranking of binding affinities for various small molecules to their target proteins. These molecular systems include inhibitors to HIV-1 protease, 14 tyrosine kinase inhibitors to epidermal growth factor receptor, 15 peptides to MHC, and others currently under investigation in our laboratory. To make this approach more broadly accessible, and to be able to apply it to a wide range of molecular systems, as well as more easily, requires full scale automation through workflows and on demand access to suitably secure HPC resources. Substantial high performance computing resources are required to execute the workflow, including multiple model building steps, large scale ensemble molecular dynamics production runs, together with concurrent data analytics performed on the multiterabytes of data generated, in order to compute reliable free energies with controlled error bounds. Provided that all of the ensemble calculations can be deployed on a large supercomputer, we can get all of the results in the time it takes to perform a single run (namely, of the order of 24 h).