The Early Phase of β2m Aggregation: An Integrative Computational Study Framed on the D76N Mutant and the ΔN6 Variant

Human β2-microglobulin (b2m) protein is classically associated with dialysis-related amyloidosis (DRA). Recently, the single point mutant D76N was identified as the causative agent of a hereditary systemic amyloidosis affecting visceral organs. To get insight into the early stage of the β2m aggregation mechanism, we used molecular simulations to perform an in depth comparative analysis of the dimerization phase of the D76N mutant and the ΔN6 variant, a cleaved form lacking the first six N-terminal residues, which is a major component of ex vivo amyloid plaques from DRA patients. We also provide first glimpses into the tetramerization phase of D76N at physiological pH. Results from extensive protein–protein docking simulations predict an essential role of the C- and N-terminal regions (both variants), as well as of the BC-loop (ΔN6 variant), DE-loop (both variants) and EF-loop (D76N mutant) in dimerization. The terminal regions are more relevant under acidic conditions while the BC-, DE- and EF-loops gain importance at physiological pH. Our results recapitulate experimental evidence according to which Tyr10 (A-strand), Phe30 and His31 (BC-loop), Trp60 and Phe62 (DE-loop) and Arg97 (C-terminus) act as dimerization hot-spots, and further predict the occurrence of novel residues with the ability to nucleate dimerization, namely Lys-75 (EF-loop) and Trp-95 (C-terminus). We propose that D76N tetramerization is mainly driven by the self-association of dimers via the N-terminus and DE-loop, and identify Arg3 (N-terminus), Tyr10, Phe56 (D-strand) and Trp60 as potential tetramerization hot-spots.


Introduction
Protein aggregation is a process whereby monomeric proteins self-associate to form higher-order oligomers (i.e., dimers, trimers, tetramers, etc.). Often, the final products of protein aggregation are amyloids, i.e., elongated unbranched fibrils consisting of β-structures of separate monomers positioned perpendicular to the fibril axis and stacked strictly above each other [1].
Since amyloids are often associated with disease (e.g., Parkinson's, Alzheimer's, and several systemic amyloidoses [2]), establishing the mechanisms of protein aggregation leading to amyloids is particularly important. This task, however, is proving more challenging than solving the folding with I 2 . Indeed, by combining in crystallo with in silico data, the authors reported that D76N sparsely populates a highly dynamic conformation that exposes very aggregation prone regions as a result of a loss of β-structure at the N-and C-terminal strands. Conformational dynamics in this species are especially enhanced relative to the wt protein in the EF-loop and residues located at the end of strand A. The authors associated the higher aggregation propensity of D76N to the destabilization of its outer strands, in line with predictions from simulations [34].
The identification of monomeric species with the potential to self-associate is the first step towards establishing the mechanism of aggregation. The latter implies the identification of all microscopic steps leading to mature fibrils, the rate constants governing each step, and determining the manner according to which they depend on protein sequence and environmental conditions (e.g., pH). Ideally, it should be possible to determine the size distribution and structures of the oligomeric assemblies, filaments, protofibrils and fibrils that form along the amyloid cascade [35]. In the case of b2m, there is experimental evidence supporting the view that the dimerization of aggregation prone monomers is the first step of fibrillogenesis under physiological conditions [36][37][38][39] and that the aggregation pathway of wt b2m proceeds exclusively by the formation of even-numbered oligomers (soluble tetramers and hexamers) formed through the addition of dimeric units [40,41].
In the present study, we conducted an extensive comparative analysis of the dimerization process of D76N and ∆N6 through rigid-body docking simulations of monomeric conformations representative of I (∆N6), I 1 (D76N) and I 2 (D76N) populated under acidic and physiological conditions. In the case of D76N, whose folding space embeds two intermediate states, we analyze the interfacial regions of both homo and heterodimers. We also report first glimpses into the interfacial structure of D76N tetramers resulting from the association of dimers of I 2 at physiological pH. To gain mechanistic insights into Radford's 'prionlike hypothesis,' we investigated the formation of heterodimers resulting from the association of monomeric conformations representative of the native state of wt b2m and the native state of ∆N6 under relevant pH conditions.
The cost function used in the docking simulations optimizes the interfaces for shape, hydrophobic and electrostatic complementarity; a specific term for intermolecular hydrogen bonds is also included. The novel cost function is an improved version of that used in previous studies [25,33], which optimizes dimer interface for shape complementarity by only taking into account packing interactions.
The deployed simulation approach delivers a structurally resolved picture of the dimerization and tetramerization interface that recapitulates previous experimental findings, and provides novel predictions regarding the structural regions and residues that trigger aggregation as a function of pH.

Materials and Methods
Previously, we explored the dimerization phase of ∆N6 [25] and D76N [33] as a function of pH by means of a three-stage methodological approach that comprises discrete molecular dynamics (DMD) simulations, constant pH molecular dynamics (CpHMD), and rigid-body Monte Carlo docking simulations based on a cost function that only takes into account packing interactions to optimize shape complementarity. In what follows, we provide an overview of our approach and describe with more detail a novel version of the Monte Carlo ensemble docking (MC-ED) protocol that was employed to perform the comparative analysis reported here. The reader is referred to our previous publications for details and results on DMD and CpHMD molecular dynamics simulations [7,25,33,42].

Overview of the Methodological Approach
We start by saying that the adopted methodological approach does not consider the possibility of protein dimerization occurring concurrently and concomitantly with folding, a situation that could lead to domain-swapped dimers [30]. Instead, it considers the scenario according to which protein-protein association occurs upon the formation of intermediate states en route to the native state, which have the potential to trigger the aggregation pathway. Such folding intermediates must be aggregation prone (by exposing hydrophobic patches) and thermodynamically stable enough (i.e., sufficiently long-lived) to allow for the establishment of intermolecular interactions. Therefore, the first step in our approach was the exploration of the equilibrium folding space of the considered model systems and the identification of aggregation-prone intermediate states. This was achieved via replica-exchange DMD simulations of a full atomistic protein representation combined with a simple, structure-based Go potential. Simple Go potentials are native centric, which means that folding energetics is exclusively driven by native interactions. Since Go potentials do not incorporate non-native interactions, they will not be able to capture misfolding processes leading to compact non-native states or, more generally, regions of the folding free energy landscape where non-native interactions may play a determinant role (e.g., the denatured state). The aggregation-prone intermediates we identified previously [25,33] were native-like in the sense that they exhibit a well-preserved native core but feature unstructured termini.
While DMD simulations allow capturing the timescale of protein folding, they do not incorporate the pH, which is known to be an important environmental parameter in protein aggregation. Indeed, the pH modulates the charge of the ionizable side-chains, which can induce large-scale conformational changes (e.g., the modification of secondary/tertiary structure content [43]) and, more often, minor structural modifications that may have a direct impact on protein-protein association. Furthermore, the modulation of the ionizable side-chains controls the pattern of intermolecular electrostatic interactions established upon protein-protein association. To assess the effect of pH on the dimerization process of b2m we conducted CpHMD with explicit titration starting from conformations representative of the structure-based folding intermediates. This approach generates ensembles of conformations representative of intermediate states at a specific pH (i.e., whose structure and charge has been modulated by the pH), which are subsequently used to study dimerization via protein-protein docking. The outcome of docking simulations was an ensemble (1000-1500 conformations) of (homo or hetero) dimers formed by monomers of intermediates under different pH conditions (e.g., an ensemble of dimers of I 1 -I 1 intermediates at acidic pH, an ensemble of dimers of I 1 -I 2 intermediates at physiologic pH, etc.). The dimers within each ensemble were then subjected to a classical molecular dynamics protocol of structure relaxation to remove clashes and other structural errors. The resulting dimers of I 2 were subsequently docked in order to produce an ensemble of (1000) tetramers. The relaxed ensembles of dimers were eventually structurally analyzed to get information about the triggers of aggregation, i.e., the most likely regions initiating the process, and, at a finer level, the residues that will most likely establish a larger number of intermolecular interactions, acting as aggregation hot-spots.

Monte Carlo Ensemble Docking
The Monte Carlo ensemble docking (MC-ED) was originally introduced in Reference [44] to study the dimerization phase of the small folding domain spc-SH3. It is a low-resolution rigid-body docking procedure that creates protein complexes with an interface optimized for shape complementarity by using a cost function that uniquely considers packing interactions.
While shape complementarity is considered one of the main drivers of protein-protein association, it is known that electrostatic interactions (including hydrogen bonds) and hydrophobicity ( Figure 1A) also play an important role [45][46][47]. Therefore, in this study, we considered a more realistic cost function that optimizes the complex's interface for shape, electrostatic and hydrophobic complementarity, the so-called triad of protein-protein association. The protein representation used in the docking simulations considers all atoms (including hydrogen) explicitly. We followed Urbanc et al. [48,49] and used square-well potentials to model inter-atomic interactions ( Figure 1B-D). In what follows, we describe the three types of considered intermolecular interactions.
Biomolecules 2019, 9, x 5 of 23 We followed Urbanc et al. [48,49] and used square-well potentials to model inter-atomic interactions ( Figure 1B-D). In what follows, we describe the three types of considered intermolecular interactions. Hydrogen bonds (h-bond) occur when a donor (D) atom donates its covalently bonded hydrogen atom to an electronegative acceptor (A) atom, D-H···A (D, A = N, O, S). In a protein-protein association, an h-bond can be established between the backbones of two interacting chains (i.e., with the donor atom located at one of the backbones and the acceptor atom located at the other), or between pairs of (acceptor-donor) atoms pertaining to the side-chains located at the interfaces of the resulting protein complex. In this study, the distance between donor and acceptor atom was the only criteria to determine the occurrence of intermolecular h-bonds. The reasons are threefold: (1) There is no consensus regarding the geometric constraints that should be imposed when modelling hydrogen bonds [50][51][52]; (2) the linearity of the hydrogen bond is not a stringent requirement for its establishment because hydrogen bonds can be found in other geometric arrangements (which are, nevertheless, weaker) [53]; and (3) to keep the method as computationally efficient as possible. In order to model h-bond interactions, we considered a square-well potential energy function ( Figure  1B), whose width ranges between the hard-core distance σ (which is the sum of the van der Waals radii [54] of the two interacting atoms) and a cut-off of 3.2 Å, which is considered the maximum distance for the establishment of a moderately stable (1-5 kcal/mol) hydrogen bond [55][56][57][58]. Since hydrogen bonds in protein-protein interfaces are usually geometrically less optimal (and therefore less stable) than those in the protein interior [53], we took a conservative choice for the corresponding interaction strength E HB = 3 kcal/mol. This approach also avoids overestimating the contribution of non-geometrically optimal h-bonds for dimer stability. As in Urbanc et al. [55], we set the potential energy for the h-bond to unit energy ( E HB = −1).
Following References [48,55], we approximated the electrostatic interaction between two charged atoms by a double square-well potential. Two atoms with charges of the same sign interact through a positive (i.e., repulsive) two-step potential, while the interaction between two atoms with opposite charge is modeled through a negative (i.e., attractive) two-step potential ( Figure 1C). For these pairwise interactions the signs of the atomic charges in the GROMOS 54A7 force field were used [59]. The protonation states of the protein's titrable groups at each pH were assigned accordingly to their pKa values, obtained from the CpHMD simulations. The cut-off distance was set to 2.3σ, and the width of first potential well is σ < r < 1.4σ (short-range interactions 4-5 Å) [60]. The Hydrogen bonds (h-bond) occur when a donor (D) atom donates its covalently bonded hydrogen atom to an electronegative acceptor (A) atom, D-H···A (D, A = N, O, S). In a protein-protein association, an h-bond can be established between the backbones of two interacting chains (i.e., with the donor atom located at one of the backbones and the acceptor atom located at the other), or between pairs of (acceptor-donor) atoms pertaining to the side-chains located at the interfaces of the resulting protein complex. In this study, the distance between donor and acceptor atom was the only criteria to determine the occurrence of intermolecular h-bonds. The reasons are threefold: (1) There is no consensus regarding the geometric constraints that should be imposed when modelling hydrogen bonds [50][51][52]; (2) the linearity of the hydrogen bond is not a stringent requirement for its establishment because hydrogen bonds can be found in other geometric arrangements (which are, nevertheless, weaker) [53]; and (3) to keep the method as computationally efficient as possible. In order to model h-bond interactions, we considered a square-well potential energy function ( Figure 1B), whose width ranges between the hard-core distance σ (which is the sum of the van der Waals radii [54] of the two interacting atoms) and a cut-off of 3.2 Å, which is considered the maximum distance for the establishment of a moderately stable (1-5 kcal/mol) hydrogen bond [55][56][57][58]. Since hydrogen bonds in protein-protein interfaces are usually geometrically less optimal (and therefore less stable) than those in the protein interior [53], we took a conservative choice for the corresponding interaction strength E HB = 3 kcal/mol. This approach also avoids overestimating the contribution of non-geometrically optimal h-bonds for dimer stability. As in Urbanc et al. [55], we set the potential energy for the h-bond to unit energy (E HB = −1).
Following References [48,55], we approximated the electrostatic interaction between two charged atoms by a double square-well potential. Two atoms with charges of the same sign interact through a positive (i.e., repulsive) two-step potential, while the interaction between two atoms with opposite charge is modeled through a negative (i.e., attractive) two-step potential ( Figure 1C). For these pairwise interactions the signs of the atomic charges in the GROMOS 54A7 force field were used [59]. The protonation states of the protein's titrable groups at each pH were assigned accordingly to their pK a values, obtained from the CpHMD simulations. The cut-off distance was set to 2.3σ, and the width of first potential well is σ < r < 1.4σ (short-range interactions 4-5 Å) [60]. The interaction strength corresponding to the first potential well,E EL1 , was obtained by normalizing the median free energy gain upon salt bridge formation at the protein surface (1.2 kcal/mol) [55,61,62] to the interaction strength of an hydrogen bond (i.e., 3.0 kcal/mol in our model). As in [48], we set the interaction strength of the second potential to E EL2 = 0.3 × E EL1 .
To include amino acid interactions that result from the amino acid's hydropathic nature, we considered a third square-well potential that captures interactions between pairs of hydrophobic/hydrophilic atoms within the side-chains ( Figure 1D). The interaction energy was negative (positive) when the distance between two hydrophobic (hydrophilic) atoms was smaller than 160% of the sum of their van der Waals radii [63], which was the cut-off distance below which the interfacial volume was considered as solvent-excluded.
The potential energy between two interacting atoms is defined as E HP = HP 1 + HP 2 , with HP t = −s t × SASA t ÷ n t being the hydropathy value of a specific type of atom t. In this equation, s t is the atomic solvation parameter of atom t (which corresponds to the free energy gain/loss per unit of solvent-exposed area), SASA t is the solvent accessible surface area of atom t, and n t is an estimate for the number of neighboring atoms (usually two). We used the values of the atomic solvation parameters derived by Cummings et al. [64] (Table 1), based on the transfer free energies obtained by Fauchère et al. [65], using the solvent accessible surface areas reported by Lesser et al. [66]. These parameters have a higher discriminating power in the evaluation of protein-protein docking solutions [64]. The hydropathy value of each interacting atom was normalized to the energy of a hydrogen bond, which results in hydrophatic parameters within the interval 0.1 ≤ HP t ≤ 0.4. Table 1. Atomic solvation parameters (cal Å −2 mol −1 ) derived by Cummings et al. [64].
In the MC-ED protocol, the center-of-mass of one of the monomers (or dimers) was kept fixed at the origin of a Cartesian coordinate axis system, and at each MC step the other monomer (or dimer) was allowed to move (i.e., to perform a random rotation or random translation) along the docking axis. The total energy of the complex's interface contained the three contributions discussed above, i.e., Figure 1C), and U HP ij = U HP (r) ( Figure 1D). At the end of each MC run, a dimer (or tetramer) was generated whose interface was optimized to have a minimum number of clashes and a minimal interfacial energy. The MC-ED was consecutively applied to randomly selected pairs of monomers (or dimers) until the mean and standard deviation of the binding energy both converged. At that point, an ensemble of dimers (or tetramers) containing 1000-1500 (or 1000) conformations was generated. Further information and details on the operational implementation of the MC-ED procedure can be found elsewhere [25,44].
The conformations within each dimer ensemble were finally subjected to a classical protocol of structure relaxation to ensure GROMOS 54A7 compatibility. This protocol started with three energy minimizations steps followed by three restrained MD steps. For the minimization, the first two steps (steepest descent and conjugated gradient) were performed without constraints followed by a steepest descent step with all bonds constrained by the p-LINCS algorithm. The MD steps were performed for 100 ps (the first) and 200 ps (the last two) with gradually decreasing position restraints (1000-10 kJ nm −2 mol −1 ). The first were performed at NVT using Berendsen thermostat (310 K with a coupling constant of 0.01) while, in the other two, the pressure was also held constant using Berendsen barostat at 1 atm (with a coupling constant of 2.0 and an isothermal compressibility of 4.5 × 10 -5 bar −1 ). The GROMOS 54A7 force field was used to perform the relaxation protocol with GROMACS 4.0.7. All GROMOS typical parameters were used namely: 2 fs time step, a twin-range cut-off (8 Å and 14 Å), a reaction-field with a dielectric constant of 54.0 to treat electrostatic contributions beyond 14 Å, and the dimers were solvated with SPC water molecules ensuring that one monomer only interacted with the other in one direction (15,000-35,000 water molecules). About 75% of the dimer structures produced with the MC-ED docking protocol were successfully relaxed to the classical force field. The other 25% were discarded because they had significant conformational clashes that lead to very high-energy interactions that the minimization algorithms were not able to correct. The adopted protocol drove a local structure relaxation of protein complexes, which means that the oligomerization interface never changed significantly after the relaxation step. At this point, we feel that a word of caution is necessary. Since the MC-ED is a rigid-body procedure and the relaxation step only allows for local structure relaxation, the adopted methodology is not able to capture large structural changes that may accompany protein association; therefore it cannot be used to make accurate predictions on oligomer structure. However, this limitation should not be determinant in the context of the present work, whose ultimate goal was to predict the residues that trigger protein-protein association establish by establishing a higher number of intermolecular interactions. Indeed, the dimerization (and, by extension, aggregation) hot-spots should depend, in the first place, on the encounters between monomers (and on how monomers face each other. The ensemble nature of the docking method naturally incorporates this dependence by considering a significantly large number of monomer pairs with different relative orientations. The method's limitation to capture large conformational changes may be more relevant to identify the hot-spot residues that trigger tetramerization. However, by also capturing the structural heterogeneity of protein-protein association at the tetramer level, the method mitigates-at least up to a certain extent-the limitation resulting from its rigid-body nature.

Model Systems
b2m is a 99 amino acid long chain that folds natively into a seven-stranded antiparallel β-sandwich formed by two sheets of antiparallel β-strands ( Figure 2). One of the sheets comprises strands A-B-E-D, and the other sheet contains strands C-F-G. The two β-sheets are linked covalently by one intramolecular disulfide bond between Cys25 (located on strand B) and Cys80 (located on strand F), which is required for aggregation in vitro [67]. Another key structural feature of b2m is the existence of a peptidyl-prolyl bond on the BC-loop (between His31 and Pro32), which adopts a thermodynamically unfavorable cis isomerization in the native state.
With the exception of a higher isoelectric point and a new hydrogen bond between Asn76 and Tyr78, the native structure of the D76N mutant ( Figure 2A) matches closely that of the wt form [68].
∆N6 is a truncated variant of b2m, which lacks the first six N-terminal residues ( Figure 2B). Radford and co-workers proposed that ∆N6 is a structural mimic of the intermediate I T because it populates a conformational state that reproduces the conformational features of I T and represents 90% of ∆N6's in vitro equilibrium population [37]. In line with this finding, ∆N6 reveals a major repacking of the hydrophobic core to accommodate the nonnative peptidyl-prolyl trans-isomer at Pro32.
In the present study, we used the high-resolution crystal structure of Goto and co-workers as a model system for the wt form under physiological conditions (PDB ID: 2YXF) and that of Bellotti and co-workers as a model system for the D76N mutant (PDB ID: 4FXL). For ∆N6, we used the NMR structure determined by Radford and co-workers (PDB ID: 2XKU).
Previously, we explored the folding transition of the wt b2m, D76N mutant [33] and ∆N6 variant [25] by means of replica-exchange DMD simulations of a full atomistic structure-based Go potential. We found that D76N populates two intermediate states ( Figure 3A), which we termed I 1 and I 2 , featuring a well-preserved core (strands B-F) and showing one (I 1 ) or two unstructured termini (I 2 ). While I 1 displayed a 20% increase in SASA relative to the native state, in the case of I 2 the SASA enhancement reached 53%. In particular, 53% of b2m's hydrophobic residues became considerably solvent-exposed in the case of I 1 , and 76% in the case of I 2 . While I 2 was exclusively populated by D76N, I 1 is conserved across wt and mutant [33]. The ∆N6 variant populated an intermediate state, termed I, which was topologically similar to I 1 ( Figure 3B). In particular, it also featured a well-preserved core and an unstructured and detached terminus [25]. However, in the intermediate I, the labile terminus was the N-terminus instead of the C-terminus. In the intermediate I, about 62% of the hydrophobic residues became considerably more solvent-exposed than in the native state. All intermediates exhibited the trans peptidyl-prolyl bond on the BC-loop, characteristic of the intermediate I T .     is unstructured [25].
In order to assess the effect of pH on the structure and dynamics of the intermediate states, we conducted CpHMD simulations where protonable residues were allowed to titrate. The trajectories Biomolecules 2019, 9, 366 9 of 22 started from conformations representative of the folding intermediates obtained from structure-based simulations. For both model systems, physiological pH 7.2 was considered. For ∆N6, we also investigated pH 6.2, because this is the pH of the inflamed joints and ∆N6 shows an enhanced in vitro aggregation potential at this pH [29]. For the D76N mutant, we considered a more acidic pH 5.2 because, to the best of our knowledge, the aggregation process of D76N was only assessed under extremely acidic (pH 2.5) [68] and physiological conditions [32,34,68,69].
A detailed analysis of the pH-induced conformational effects at the monomer level was performed in previous studies [25,33], and, here, we focused on its major findings (SI: Tables S1-S3). First, acidic pH induced significant deviations (~16 Å) relative to the native structure of the region comprising strand A and the AB-loop in the intermediate I (∆N6) (SI Table S1). In the case of I 1 and I 2 (D76N), we observed striking deviations of the two terminal regions (up to~20 Å) both at neutral and acidic pH. This is in line with results reported by Le Marchand [34]. The DE-loop and the EF-loop also exhibited significant deviations from their native positions (up to~9 Å for I 2 ) across the investigated pH values (SI Tables S2 and S3). In the case of intermediate I 1 , we highlighted a comparably striking mobility of the C-terminus (up to~20 Å), while the N-terminus, DE-loop and EF-loop showed more conservative motions (between 5.4 and 7.6 Å) (SI Table S3).

Dimer Stability under Different pH Conditions
We started by computing the probability density function (PDF) for the binding energy of dimers composed by monomers (representative of the identified intermediate states) that were extracted from the equilibrated part of CpHMD trajectories at acidic pH ( Figure 4A), slightly acidic pH ( Figure 4B), and physiological pH ( Figure 4C). We also computed the PDF for the binding energy of heterodimers formed by the native state of ∆N6 and the native state of wt b2m, which are the species involved in the 'prion-like hypothesis' ( Figure 4B). For comparative purposes, we evaluated the PDF for the binding energy of dimers formed by monomers representing the native state of the D76N mutant under different pH conditions. Finally, we evaluated the probability distribution function for the number of intermolecular contacts, which showed that the dimers produced with the deployed docking exhibited a similar degree of compactness, with interfaces reaching~5K intermolecular atomic contacts (and 250-300 atomic clashes on average) (data not shown).
Under acidic conditions (pH 5.2), the probability density function (PDF) for the binding energy was conserved across the dimers formed by monomers of D76N intermediate states ( Figure 3A). The mode (which represents the most likely binding energy within the ensemble of dimers) was approximately the same (E M~− 18) for dimers of the intermediate I 2 (i.e., I 2 -I 1 and I 2 -I 2 complexes) and slightly higher (E M~− 16) for homodimers of I 1 .
Interestingly, at pH 7.2, there were noticeable differences in the PDFs. First, the PDF corresponding to I 1 -I 1 dimers shifted towards higher binding energies. This loss in stability may have resulted from the deprotonation of His84, which in turn was coupled to a smaller mobility and detachment from the core of the C-terminal region. The loss of C-terminal mobility implies that this region became less available to participate in intermolecular interactions. The PDF of I 2 -I 1 dimers fairly conserved the mode (E M~− 17), while I 2 -I 2 dimers got slightly more stable (E M~− 19) and clearly more stable than I 1 -I 1 dimers (E M~− 13) at physiological pH. Furthermore, the tails of the PDF of I 2 -I 2 dimers extended towards lower energy values with higher probability than at pH 5.2. Taking the binding energy alone as a proxy of dimer stability, one can predict that at physiological pH homodimers of I 2 will be the most stable. Since dimers must be stable enough to oligomerize further, it is also likely that homo and heterodimers of I 2 are more prone to aggregate than I 1 homodimers and are therefore the key species in D76N aggregation. We note, however, that if dimers are too stable, they are likely to remain soluble. Therefore, the most stable dimers, i.e., those pertaining to the tails of the distributions, are not necessarily the ones that will grow into fibrils.
homodimers of I2 will be the most stable. Since dimers must be stable enough to oligomerize further, it is also likely that homo and heterodimers of I2 are more prone to aggregate than I1 homodimers and are therefore the key species in D76N aggregation. We note, however, that if dimers are too stable, they are likely to remain soluble. Therefore, the most stable dimers, i.e., those pertaining to the tails of the distributions, are not necessarily the ones that will grow into fibrils. It is interesting to compare the behavior of D76N with ΔN6, which populates the intermediate I. At pH 7.2 and 6.2, I-I dimers had binding energies (EM ~ −19) similar to D76N I2-I2 dimers, suggesting similar aggregation potential for these two b2m intermediates.
The PDF for the binding energy of heterodimers formed by the native state of ΔN6 and the native state of wt b2m was strikingly shifted towards higher energies with the mode located at EM ~ −6 at both considered pH values, indicating that these dimers are weakly bound. This observation is in line with experimental evidence based on NMR measurements reported by Radford and co-workers [70], who, nonetheless, argued that such a weak binding can still induce conformational changes in the wt b2m protein, originating an amyloidogenic conformation, as well as on ΔN6 itself, while a stronger binding may block the conformational alterations characteristic of a prion mechanism. It is interesting to compare the behavior of D76N with ∆N6, which populates the intermediate I. At pH 7.2 and 6.2, I-I dimers had binding energies (E M~− 19) similar to D76N I 2 -I 2 dimers, suggesting similar aggregation potential for these two b2m intermediates.
The PDF for the binding energy of heterodimers formed by the native state of ∆N6 and the native state of wt b2m was strikingly shifted towards higher energies with the mode located at E M~− 6 at both considered pH values, indicating that these dimers are weakly bound. This observation is in line with experimental evidence based on NMR measurements reported by Radford and co-workers [70], who, nonetheless, argued that such a weak binding can still induce conformational changes in the wt b2m protein, originating an amyloidogenic conformation, as well as on ∆N6 itself, while a stronger binding may block the conformational alterations characteristic of a prion mechanism.

Structure of D76N Dimers under Different pH Conditions and Dimerization Hot Spots
To get insight into the structural features of the D76N dimer's interface and identify which regions of the monomers are more likely involved in the formation of dimers at acidic and neutral pH, we computed intermolecular probability maps (IPMs) ( Figure 5). The IPMs provide the probability of formation of every intermolecular contact established at the dimer's interface. To properly count contacts within a dimer, it is important to recall that the binding energy contains contributions from three interaction potentials. Therefore, we firstly identified the type of atoms in every possible pair and used the corresponding interaction cut-off to decide if they were within interaction distance (i.e., making a contact). The probability of each intermolecular contact was evaluated by counting the number of times the contact was present in the ensemble of dimers that was used to determine the PDF. A representative dimer conformation (i.e., a conformation with binding energy corresponding to the mode of the PDF) was reported together with the IPM. It is interesting to compare these results with those obtained previously [33], which were based on docking simulations with a cost function that optimized dimers exclusively for shape complementarity. The DE-and EF-loops were already rather important adhesion zones in the dimerization of I2 at acidic and physiological pH. However, the C-terminus and the (unstructured) C-terminal region (i.e., C-terminus and G strand) played a strikingly dominant role in I2 dimerization at acidic pH that was substantially suppressed when electrostatic and hydrophobic interactions were considered in the cost function. Indeed, the electrostatic interactions involving polar and charged residues of regions such as the N-terminal A-strand were likely accountable for the observed differences. As for the dimerization of I1, the AB-loop was already an important driver of monomer association (also at physiological pH), and the participation of the C-terminus in dimerization was significative, in line with the present results. To carry out a finer analysis of the dimer interfaces, we identified the so-called dimerization hot-spots (HS), i.e., residues that had a higher tendency to establish intermolecular interactions ( Figure 6). Such residues may act as nucleation sites in dimerization (and, by extension, in the aggregation mechanism). Predicting HSs is important, because it represents information that can be tested in vitro (e.g., through mutagenesis). Within the adopted working framework, HSs were identified by computing the probability of intermolecular interaction per residue within the subset of the 50 most frequent intermolecular interactions.
The analysis of the IPMs ( Figure 5) indicated that the DE-loop and EF-loop behaved as adhesion zones in the association of all considered intermediate states at pH 7.2 ( Figure 5A-C). Their importance, however, was more evident in the association of I 2 -I 2 dimers at physiological pH. It is possible that the detachment of both the N-and C-terminal regions from the core in the I 2 intermediate state facilitated (and fosters) the movement of the DE-and EF-loops, in line with observations reported in [34]. Since this mobility enhancement was stronger at pH 7.2 (SI Table S2), the loops will more likely establish intermolecular interactions at physiological conditions. The intermolecular interactions involving these loops became less likely at acidic pH, but their fingerprint was still noticeable in the IPMs ( Figure 5D-F), with Phe70 (EF-loop) acting as an HS residue ( Figure 6A,B). The leading HS residue at physiological pH was clearly Trp60 (DE-loop), whose role as an interaction hub sharply decreased as the pH was lowered to 5.2. Indeed, under acidic conditions, the dimerization of I 2 was majorly triggered by Arg3 (N-terminus), followed by two clusters of residues located on the DE-loop and adjoining D-strand (His51, Phe56 and Trp60), and, to a lesser extent, on the EF-loop and adjoining E-strand (Tyr67, Phe70 and Lys75) ( Figure 6B). We also pinpointed the participation of Arg3 (N-terminus), Tyr10 and Arg12 (A-strand) in the association pattern of homo and heterodimers, particularly, at physiologic pH.

Structure of ΔN6 Dimers under Different pH Conditions and Dimerization Hot Spots
Here we extended the analysis to the ΔN6 variant. We performed a comparative analysis of the dimerization phase of the intermediate I with that triggering a prion-like templating mechanism (Figure 7). According to the latter, the conversion of wt b2m into an aggregation prone conformer was induced by bimolecular collision between the wt protein and the ΔN6 mutant. Therefore, we investigated the structure of dimers formed by the native structure of the wt protein and the native structure of ΔN6.
The analysis of the IPMs revealed that at pH 7.2 the homodimers of the intermediate state I populated by ΔN6 ( Figure 7A) associated through the DE-loop and BC-loop, and, to a lesser extent, via the FG loop and the C terminus. When the pH was lowered to 6.2 the N-terminal region (comprising the A-strand and AB-loop) became an important adhesion zone ( Figure 7B), in part due to its higher mobility (SI Table S3). The DE-loop conserved its importance, and the interactions involving the FG-loop gained relevance. The major difference between the reported IPMs and those obtained before [25], i.e., with a cost function that optimizes shape complementarity, was a more evident fingerprint for the BC-loop, presumably due to the role of electrostatic interactions established by the His31. It is interesting to note that Trp60 was one of the most significant aggregation HS at physiological and acidic pH ( Figure 6D), but its role as an interaction hub in ΔN6 was significantly downgraded when compared with the results for D76N. Phe30 (BC-loop) conserved its role as an HS when the pH was lowered, while His84 and Thr86 (FG loop) saw an enhancement Under acidic pH, the C-terminus gained relevance as an adhesion zone in the heterodimers and, more strikingly, in homodimers of I 1 (Figure 5D-F) possibly due to an increased detachment of the C-terminus, which was coupled with increased protonation of His84 (FG-loop) (pK a~5 .2). The AB loop also stood out as an important structural element in I 1 -I 1 dimerization, establishing preferential interactions with the EF loop and AB loop of the other monomer, as well as with the C-terminus. At pH 5.2, I 1 monomers associated mainly through Trp95 and Arg97 (C-terminus), followed by His13 and Lys19 (both pertaining to the AB-loop) ( Figure 6A). The former were also leading HSs in the dimerization of the heterodimers (Figure 6C), where His51 (D-strand) also acted as an HS because of its increased protonation at acidic pH (pKa~6.5).
It is interesting to compare these results with those obtained previously [33], which were based on docking simulations with a cost function that optimized dimers exclusively for shape complementarity. The DE-and EF-loops were already rather important adhesion zones in the dimerization of I 2 at acidic and physiological pH. However, the C-terminus and the (unstructured) C-terminal region (i.e., C-terminus and G strand) played a strikingly dominant role in I 2 dimerization at acidic pH that was substantially suppressed when electrostatic and hydrophobic interactions were considered in the cost function. Indeed, the electrostatic interactions involving polar and charged residues of regions such as the N-terminal A-strand were likely accountable for the observed differences. As for the dimerization of I 1 , the AB-loop was already an important driver of monomer association (also at physiological pH), and the participation of the C-terminus in dimerization was significative, in line with the present results.

Structure of ∆N6 Dimers under Different pH Conditions and Dimerization Hot Spots
Here we extended the analysis to the ∆N6 variant. We performed a comparative analysis of the dimerization phase of the intermediate I with that triggering a prion-like templating mechanism (Figure 7). According to the latter, the conversion of wt b2m into an aggregation prone conformer was induced by bimolecular collision between the wt protein and the ∆N6 mutant. Therefore, we investigated the structure of dimers formed by the native structure of the wt protein and the native structure of ∆N6. of Tyr10 (A-strand) clearly stood out at pH 6.2, presumably because of the increased mobility of the N-terminal region coupled with the increased protonation of His13. The IPMs for the heterodimers formed by the native state of wt b2m and the native state of ΔN6 revealed an important role of the DE-loop (especially at pH 6.2) ( Figure 7D) and of the CD-loop (more pronounced at pH 7.2) ( Figure 7C) in the dimerization process. Under physiological pH, the F-strand and the FG-loop also participated, although to a less extent, in protein-protein association ( Figure  7C). These results are in line with those reported by Radford and co-workers [70], that claimed the involvement of the DE-loop, BC-loop and FG-loop in the interfaces of the heterodimers of ΔN6 and wt b2m. Trp60 stood out again as an aggregation HS (especially at pH 6.2), and Arg45 (CD-loop) at both physiological and pH 6.2 ( Figure 6E). Arg81 (F-strand) was also an important linker at pH 7.2 followed by Arg97 (C terminus), as well as Arg3 (N terminus) at pH 6.2. It is likely that the intermolecular interactions between the positively charged arginine residues may have contributed to destabilize the interfacial region, which is in line with results reported in [70], according to which the high binding energies of these heterodimers arise from an interaction interface that is predominantly electrostatic (instead of hydrophobic) in nature due to a stronger participation of electrostatic interactions (particularly involving residues from the FG-loop).  The analysis of the IPMs revealed that at pH 7.2 the homodimers of the intermediate state I populated by ∆N6 ( Figure 7A) associated through the DE-loop and BC-loop, and, to a lesser extent, via the FG loop and the C terminus. When the pH was lowered to 6.2 the N-terminal region (comprising the A-strand and AB-loop) became an important adhesion zone ( Figure 7B), in part due to its higher mobility (SI Table S3). The DE-loop conserved its importance, and the interactions involving the FG-loop gained relevance. The major difference between the reported IPMs and those obtained before [25], i.e., with a cost function that optimizes shape complementarity, was a more evident fingerprint for the BC-loop, presumably due to the role of electrostatic interactions established by the His31. It is interesting to note that Trp60 was one of the most significant aggregation HS at physiological and acidic pH ( Figure 6D), but its role as an interaction hub in ∆N6 was significantly downgraded when compared with the results for D76N. Phe30 (BC-loop) conserved its role as an HS when the pH was lowered, while His84 and Thr86 (FG loop) saw an enhancement at acidic pH. The HS character of Arg97 (C-terminus) was enhanced at physiological pH, while that of Tyr10 (A-strand) clearly stood out at pH 6.2, presumably because of the increased mobility of the N-terminal region coupled with the increased protonation of His13.
The IPMs for the heterodimers formed by the native state of wt b2m and the native state of ∆N6 revealed an important role of the DE-loop (especially at pH 6.2) ( Figure 7D) and of the CD-loop (more pronounced at pH 7.2) ( Figure 7C) in the dimerization process. Under physiological pH, the F-strand and the FG-loop also participated, although to a less extent, in protein-protein association ( Figure 7C). These results are in line with those reported by Radford and co-workers [70], that claimed the involvement of the DE-loop, BC-loop and FG-loop in the interfaces of the heterodimers of ∆N6 and wt b2m. Trp60 stood out again as an aggregation HS (especially at pH 6.2), and Arg45 (CD-loop) at both physiological and pH 6.2 ( Figure 6E). Arg81 (F-strand) was also an important linker at pH 7.2 followed by Arg97 (C terminus), as well as Arg3 (N terminus) at pH 6.2. It is likely that the intermolecular interactions between the positively charged arginine residues may have contributed to destabilize the interfacial region, which is in line with results reported in [70], according to which the high binding energies of these heterodimers arise from an interaction interface that is predominantly electrostatic (instead of hydrophobic) in nature due to a stronger participation of electrostatic interactions (particularly involving residues from the FG-loop).
A comparative analysis of the HS exhibited by the two model systems at pH 7.2 ( Figure 6F) allowed us to identify the residues whose importance as potential nucleation sites was conserved across variants. Such residues may be considered relevant to understand b2m aggregation. Our study predicts that Tyr10 (A-strand), Phe30 and His31 (BC-loop), Arg45 (CD-loop), Trp60 and Phe62 (DE-loop), Lys75 (EF-loop), and Trp95 and Arg97 (C-terminus) are key players in b2m dimerization. We propose Tyr10, Lys75, Trp95 and Arg97 as novel dimerization HS because the remaining identified residues have already been proposed to have an important role in b2m dimerization (e.g., His31 in the interface of a ∆N6 nanobody-trapped domain-swapped dimer [30], Arg45 in the interface of DCIM50 covalent homodimers obtained by the substitution of glutamate at position 50 by a cysteine [71], Trp60 in interfaces of b2m dimers [30,39], and Phe62 in the interface of the DimC33 covalent homodimer obtained by the substitution of serine at position 33 by a cysteine [39]).

First Glimpses into the Tetrameritation Phase of D76N
Dynamic light scattering measurements by Vachet and co-workers suggested that in the presence of Cu 2+ ions and urea the aggregation pathway of wt b2m proceeds exclusively by the formation of even-numbered oligomers (soluble tetramers and hexamers) formed through the addition of dimeric units [40]. In line with this proposal, results from cryo-electron microscopy studies by White et al. [41] indicated that the basic assembly units of the fibril protofilaments are tetramers obtained by a dimer-of-dimers arrangement.
Here, we reported first steps in the study of the tetramerization phase of D76N at physiological pH. The justification for focusing our analysis on the D76N mutant is two-fold: First, it is the most aggregation-prone form of b2m according to studies in vitro [32], and second, the D76N mutant has a clear biological significance [68]. Because the simulations (i.e., protein-protein docking and structure relaxation with molecular dynamics) were significantly time consuming, we restricted our analysis to the most aggregation-prone intermediate state, namely I 2 .

Conclusions
Solving the aggregation mechanism of protein beta-2-microglobulin (b2m) is a task of paramount importance given its role as causative agent of dialysis related amyloidosis (DRA), a conformational disorder that affects more than 90% of the individuals undergoing long-term hemodialysis [73].
In vivo, the high affinity of b2m for collagen drives its preferential deposition on the osteoarticular system, which becomes significant when serum concentration attains the micromolar range observed during hemodialysis. It has been proposed that the intense local electric fields due to charge arrays on collagen's surface energetically destabilize the native structure, which becomes structurally labile and therefore prone to conformationally convert into an amyloid-competent state [3]. Other factors that may influence aggregation in vivo include the interaction of b2m with copper Assuming that the aggregation pathway of the D76N conserves the parity of that of wt b2m, we studied the formation of D76N tetramers by docking dimers of I 2 at pH 7.2. An ensemble of 1000 tetramers was generated. The PDF for the binding energy indicated that tetramers were significantly less stable (E M~− 10) than the homodimers of I 2 (E M~− 19) ( Figure 8A), suggesting that dimers were the most likely dominant species in the initial phase of D76N aggregation. The analysis of the probability map ( Figure 8B) for the intermolecular contacts suggested that the N-terminus together with the DE-loop were the most important adhesion zones in the tetramer ( Figure 8C). In line with this observation, we found that intermolecular interactions in the tetramer were most likely mediated by 1) Trp60 (DE-loop), 2) Arg3 (N-terminus), 3) Phe56 (D-strand), 4) Tyr 10 (A-strand), and, to a lesser extent, by 5) Lys58 (DE-loop) and 6) Arg97 (C-terminus) ( Figure 8D). Trp60 was shown to participate in the interfaces of DCIM20 and DCIM50 tetramers formed by the DCIM50 and DCIM20 covalent homodimers obtained by the substitution of glutamate and serine at position 50 and 20, respectively, by a cysteine [71]. The participation of Lys58, Arg97 and Phe56 in the tetramer's interface is supported by selective covalent labelling experiments with the wt form in the presence of Cu(II) [72]. Interestingly, while the latter also suggests an important role for the D-and G-strands in the establishment of the tetramer's interface, they precluded the participation of the N-terminus. However, this observation may result from the fact that the residues located on the N-terminus remained Cu(II) binding sites, and, therefore steric hindrance precluded their participation in the tetramer's interface.

Conclusions
Solving the aggregation mechanism of protein beta-2-microglobulin (b2m) is a task of paramount importance given its role as causative agent of dialysis related amyloidosis (DRA), a conformational disorder that affects more than 90% of the individuals undergoing long-term hemodialysis [73].
In vivo, the high affinity of b2m for collagen drives its preferential deposition on the osteoarticular system, which becomes significant when serum concentration attains the micromolar range observed during hemodialysis. It has been proposed that the intense local electric fields due to charge arrays on collagen's surface energetically destabilize the native structure, which becomes structurally labile and therefore prone to conformationally convert into an amyloid-competent state [3]. Other factors that may influence aggregation in vivo include the interaction of b2m with copper ions (Cu 2+ ) [74] (which catalyze the isomerization of the His31-Pro32 bond), medium acidification resulting from inflammation [75], and the presence of molecules like lysophosphatidic acid [76], non-esterified fatty acids [77] and glycosaminoglycans [78].
Unfortunately, the wild-type (wt) form does not aggregate de novo in vitro under physiological conditions, and, over the years, researchers have been exploring engineered or naturally occurring model systems of b2m to gain insight into the fibrillogenesis mechanism of the parent species. The present study focused on the truncated mutant ∆N6, whose biological significance in not clear, and on the single point mutant D76N found in one French family, which aggregates in several visceral organs causing a systemic amyloidosis. While the results reported here help gain insight into the pre-amyloid phase of the parent species in DRA, they do not entail an exclusive role of the truncated species in the actual fibrillogenesis pathway of the full-length wt protein. Aditionally, they do not seek to reduce the latter to the aggregation pathway of the D76N mutant. Indeed, it is likely that aggregation of the full length wt form is strictly dependent on unique environmental conditions occurring in the osteoarticular system of dialysis patients, and, therefore, the latter should be identified and mimicked both in vitro as well as in simulations in order to draw a more accurate picture of wt b2m aggregation in DRA. The goal of the present study is to provide mechanistic insights, hypothesis, and testable theoretical predictions on the early dimerization phase of two model systems (∆N6 and D76N) that aggregate in vitro under physiological unseeded conditions, which may provide clues and insights into the aggregation mechanism of the wt form.
This study differs from our previous contributions [25,33,44] in one important point: The docking procedure we deployed to explore the dimerization and tetramerizatoin phases of b2m used a cost function that extended beyond packing interactions, i.e., created dimers optimized for shape and energetic complementarity by also considering electrostatic (including hydrogen bonds) interactions), and interactions between hydrophobic atoms.
While the original version of the Monte Carlo ensemble docking (MC-ED) algorithm predicted a direct role (i.e., via the establishment of intermolecular contacts) of the unstructured terminal regions (of all intermediate states) as triggers of aggregation, the method's version used herein indicates that the unfolding and detachment of the terminal regions from the core may increase the mobility and solvent exposure of other structural elements that now appear as sticky regions, namely the DE-loop (in the dimerization of both b2m variants), the EF-loop (in the dimerization of the D76N mutant), and the BC-loop (in the dimerization of the ∆N6 variant). In particular, the new cost function highlights a more important role for the DE-loop and the EF-loop in the dimerization of the I 2 intermediate (D76N) at pH 5.2. Overall, the DE-, EF-and BC-loops dominate at physiological pH and the terminal regions at acidic pH. Interestingly, the strikingly dominant role of the C-terminus and adjacent unstructured G strand in the dimerization of I 2 at acidic pH [33] is substantially suppressed when, besides packing interactions, electrostatic interactions (e.g., those involving polar and charged residues of the N-terminus and A-strand) are also considered. Our results indicate the involvement of residues (1) Arg3, (2) Trp60 and Phe22, (3) His51 and Phe70, (4) Phe56, (5) Tyr67, Lys75, and (6) Trp95 in I 2 dimerization at pH 5.2, and they also predict the involvement of (1) Arg3, (2) Tyr10, and (3) Arg12 in the homo and heterodimers of I 1 and I 2 intermediates at physiological pH.
The results reported here predict a relevant role for the A-strand in the dimerization of ∆N6 under slightly acidic conditions (pH 6.2) and indicate a clear fingerprint for the BC-loop in the dimer's interfacial region (most likely due to the electrostatic interactions established by His31), which was not detectable with the original version of the docking method. We also analyzed, for the first time, the dimerization interfaces resulting from intermolecular interactions between the native state of ∆N6 and the native state of wt b2m, which would underlie a "prion-like" mechanism for b2m amyloidogenesis. The dimers we obtained are the most unstable of all dimers studied here, featuring relatively high binding energies in agreement with experimental data reported by Radford and co-workers [70]. Our results support the involvement of the DE-loop, BC-loop and FG-loop in the interfaces of the heterodimers of ∆N6 and wild-type b2m, also in line with experimental data [70].
These theoretical predictions could be tested in the context of mutagenesis experiments in which single (or double) point mutations are done to replace the hot-spot residues and the kinetics of aggregation are measured.
Finally, by studying the dimerization of dimers of the intermediate I 2 populated by the D76N mutant, we obtained first theoretical insights into the tetramerization interface of the mutant. We predict that the N-terminus and DE-loop play an important role in the interface of the tetramer, and we propose that the formation of the latter is mediated by interactions involving (1) Trp60 (DE-loop), (2) Arg3 (N-terminus), (3) Phe56 (D-strand), (4) Tyr 10 (A-strand), and, to a lesser extent, (5) Arg97 (C-terminus), and (6) Lys58 (DE-loop).
Supplementary Materials: The following are available online at http://www.mdpi.com/2218-273X/9/8/366/s1, Table S1: Cα RMSD of the full length ∆N6 intermediate as well as of specific protein regions measured in relation to the native structure; Table S2: Cα RMSD of the full length intermediate I 2 as well as of specific protein regions in relation to the native structure; Table S3