Globular bundles and entangled network of proteins (CorA) by a coarse-grained Monte Carlo simulation

Using a coarse-grained model, self-organized assembly of proteins (e.g. CorA and its inner segment iCorA) is studied by examining quantities such as contact profile, radius of gyration, and structure factor as a function of protein concentration at a range of low (native phase) to high (denature phase) temperatures. Visual inspections show distinct structures, i.e. isolated globular bundles to entangled network on multiple length scales in dilute to crowded protein concentrations. In native phase, the radius of gyration of the protein does not vary much with the protein concentration while that of its inner segment increases systematically. In contrast, the radius of gyration of the protein shows enormous growth with the concentration due to entanglement while that of the inner segment remains almost constant in denatured phase. The multi-scale morphology of the collective assembly is quantified by estimating the effective dimension D of protein from scaling of the structure factor: collective assembly from inner segments remains globular (D aroud 3) at almost all length scales in its native phase while that from protein chains shows sparsely distributed morphology with D around 2 in entire temperature range due to entanglement except in crowded environment at low temperature where D around 2.6. Higher morphological response of chains with only the inner-segments due to selective interactions in its native phase may be more conducive to self-organizing mechanism than that of the remaining segments of the protein chains.


Introduction
Self-assembly of proteins [1][2][3][4][5][6][7][8] provides mechanical strength and stable dynamical response to the underlying environment such as membrane via its hierarchical morphology. It can provide a reliable and responsive pathways in ion channels [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23] for selective transport of such specific elements as potassium ions. There are many examples of protein self-assembly with conflicting (adverse and cooperative) effects [3,4]. For example, the self-assembly of proteins into a viral capsids is critical in preserving the genome from non-conducing external factors such as digesting enzymes of the host cells, undesirable pH, temperature, etc. [3]. On the other hand, the self-assembly of proteins triggered by the conformational changes of the proteins may lead to undesirable aggregation such as amyloid beta-proteins into fibrils. Of a diverse range of proteins with unique and universal response properties, we consider a transmembrane protein CorA [9][10][11][12][13][14][15][16][17][18][19][20][21][22] with a well-defined inner (iCorA) and outer (oCorA) segments [23]. The functional structure of CorA is known to exist as a homo-pentamer [19] to provide coordinated open and close states for the selective transport of Mg 2+ across the ion channels. Transport of ions depends on the permeation pathways and consequently on the conformation of individual proteins and their interacting network. How CorA proteins assemble collectively into a well-defined morphology is not well understood.
Despite numerous experimental investigations, understanding of the underlying pathways remain highly speculative. Computer simulations provide a viable tool to probe the structural response of proteins and its assembly that may help clarifying such speculative hypothesis. Extensive computer simulations have been recently performed to understand the structure of CorA protein embedded in a model membrane by all-atom MD simulations [19]. Such atomic-scale investigations are insightful in probing the small-scale structural response, however, it is limited to short time scale despite large-scale computer simulations; it is not feasible to examine largescale structural responses by such approach without severe constraints. For example, the morphology of a set of five proteins with inner segment confined by a nanodisc seems to deform out if they start from a symmetric configuration (see figure 1). Even with a large-scale all-atom MD simulation, it is rather difficult to see a significant dissociation besides fusion of the symmetry. Therefore, a large-scale coarse-grained analysis [23,24] is needed to augment and clarify the distinctions (see below) if feasible. Perhaps protein-protein interaction may not be enough to direct five CorA proteins into a stable pentamer; other factors such as a nanodisc or membrane matrix may be needed for the stability. Before incorporating additional constitutive components to probe directed assembly, it would be important to understand the self-assembly of the proteins first. Since the protein channel involves cooperative response of many proteins, it would be desirable to investigate the collective structures in a crowded protein environment. Our goal is to understand the stability of the multi-scale morphologies of interacting proteins in its native and denatured phases. The model is described in the next section, followed by results and discussion with a concluding remark at the end.

Model
We consider a coarse-grained model [24] of the protein chain on a cubic lattice for simplicity, efficiency, and practical utility: it incorporates specificity of each residue, ample degrees of freedom (unlike minimalist lattice models) for residues to perform their stochastic movements and their covalent bonds to fluctuate. The internal structures of residue (i.e. finegrained details at atomistic scale) are ignored, however, the specificity of residues is considered via their unique residue-residue interactions (see below) which is critical in capturing the unique characteristics of the protein. In our coarse-grained representation, CorA is a chain of 351 nodes (residues) tethered together on a cubic lattice by flexible peptide bonds. A node represents a residue and occupies a unit cell (of size (2a) 3 , with lattice constant a); the bond length between consecutive residues varies between 2 and (10) in unit of lattice constant (a) [25]. Typically, a protein chain is placed in the simulation box in a random configuration, initially with minimum bond length (2a) between the consecutive nodes; this initial configuration is further randomized by allowing each node to perform its stochastic movement (to its 26 neighboring cells with varying distance) with the strictly implemented excluded volume constraint. The bond-fluctuating mechanism has been used extensively in addressing a range of complex problems in polymers [25]; we have extended its utility to model protein chains [23,24]. Coarse-graining of the simulation box (cubic lattice) and the residues representation by a lattice cell make it one of the efficient computational method for modeling such systems while retaining the potential for fine-graining [26] to further enhance the degrees of freedom.
Each residue interacts with the neighboring residues within a range (rc) of interaction with a generalized Lennard-Jones potential, where rij is the distance between the residues at site i and j; rc=8 and  = 1 in units of lattice constant. We use a knowledge-based residue-residue interaction as input in our phenomenological potential (1) for ij. The knowledge-based residue-residue interactions [27,28] are derived from the distribution of the amino acids in a growing ensemble of frozen protein structures in protein data bank (PDB); the underlying solvent environment is therefore taken into account implicitly. Various assumptions and approximations [27,28] are further made in deriving these contact potentials which makes it somewhat difficult to calibrate the scales of the physical quantities in absolute units. The relative strength of residue-residue interactions is critical for protein to adopt its specific conformations. Thus, the potential strength, ij, is unique for each interaction pair with appropriate positive (repulsive) and negative (attractive) values selected from the knowledgebased contact interactions [27,28]. The knowledge-based residue-residue interactions have been used extensively in modeling protein structures for decades [29][30][31][32]. We use the residue-residue contact matrix by Betancourt-Thirumalai (BT) [27], an improved version of classic Miyazawa-Jernigan (MJ) interaction [28]. It is worth pointing out that there can be alternate methods to incorporate the specificity of residue interactions in a phenomenological potential (1) including the computed interactions involving all-atom MD simulations [33,34]. Structural evolution of protein chains (CorA) are analyzed in detail as they interact, associate, and dissociate in their native (relatively low temperature) to denatured (high temperature) phases. The sample is prepared by inserting protein chains randomly first in the simulation box; protein chains are moved around with excluded volume constraints to randomize their conformation and distribution further. The protein chain CorA consists of 351 residue, inserting many chains, each of length (Lc = 351) in a simulation box of size L 3 (=350 3 ) become difficult beyond a certain number (Nc = 5  100) of chains due to onset of jamming. The computer processing unit (CPU) also increases with the number of chains. Therefore, the number of protein chains is restricted to perform computer simulations with reasonable CPUs (order of days to week) while capturing the cooperative and competing effects of residue-residue, segmental, and overall protein-protein interactions on the local and global association and dissociation of protein chains. Further, relaxation time to reach steady-state (by examining the temporal evolution of various physical quantities) also increases which makes it difficult to assess the thermal response of physical quantities in equilibrium. The protein concentration is defined by the fraction p of the lattice sites occupied by the residues of the protein chains, i.e. p = 8  Lc / L 3 . The number of protein chains Nc = 5  100 each with length Lc = 351 (protein chain of CorA) and Lc = 61 (inner segment of protein) are considered with simulation box of size L = 350, and 150 respectively. Note that the simulation box becomes crowded at a lower protein concentrations with longer chain lengths.
Each residue performs its stochastic movements with the Metropolis algorithm which involves selecting a residue randomly at a site say i of a randomly selected protein chain and attempting to move it to one of its neighboring site say j. If the excluded volume constraints and limits on the bond length for the proposed move are satisfied then we calculate the corresponding change in energy ∆Eij = E j − E i between its old (E i) and new (E j) configurations, and move the node with the Boltzmann probability exp(− ∆Eij /T) where T is the temperature in reduced unit of the Boltzmann constant (kB). Attempts to move each node once defines the unit Monte Carlo step (MCS), i.e. Nc  Lc attempts to move randomly selected nodes in a simulation box with Nc protein chains each with Lc residues defines the unit MCS. Simulations were performed for sufficiently long time steps (typically 10 million time steps) to make sure that system has reached its steady state equilibrium with a number of independent samples (10-100) to estimate the average values of the physical quantities. A number of local and global physical quantities were determined such as the energy of each residue and protein chain, contact map, their mobility, mean square displacement of the center of mass of the protein, radius of gyration, and its structure factor. These physical quantities are in arbitrary units, i.e., the length is in units of the lattice constant which is different from many all-atom simulations where realistic units for size and time scales are used via calibration of σ and εij from experimental data for simplified systems. It is difficult to quantify physical quantities in absolute units due to the phenomenological nature of the interactions (Eq. (1)). The trend in response of the physical quantities to such parameters as the temperature and network density should however be qualitatively comparable with appropriate observations.

Results and discussion
We study the effects of protein-protein interaction on a number of local and global physical quantities as proteins organize, associate and disassociate as a function of temperature and their concentration (i.e. the number of proteins). Our system has reached the steady-state equilibrium during the course of simulations (i.e. in 10 7 time steps) for almost entire temperature regime and concentrations except in extreme limits (i.e. high temperature and higher concentrations) where the relaxation time is too large due to entanglement and jamming of extended structures of proteins. Nevertheless, it is illustrative to understand the organizing morphologies as the protein chains assemble, entangle, and disassociate.

Snapshots, contact map and profiles
Representative snapshots of CorA and that of its inner segments (iCorA) with 100 protein chains each in the simulation box is presented in figure 2 at a low (T=0.020) and a high (T=0.040) temperature. Both, the protein chains and its inner segments appear to disperse almost uniformly. A closer inspection shows that, in native (low temperature) phase, CorA proteins dissociate more on the scale comparable to the size of a protein chain in general while the clusters of inner segments appear to form isolated globular bundles. The residue-residue interaction is more conducive to agglomeration of chain with the inner segments than that of the full protein (CorA). Corresponding snapshots at a higher temperature (T=0.040) show different morphologies where some degree of phase-separation seem to occur with CorA as the proteins continue to dissociate, expand, and entangle. The spreading of extended inner segments at T=0.040 reduces the segregation of protein clusters seen at the low temperature. Attempts are made to quantify our qualitative observations of complex morphologies by analyzing the structure factor (see below).
Contact maps (figures S1-S4) of the assembly of CorA chains and inner segments at a low temperature (T=0.020) seem to suggest a more compact morphology of iCorA in crowded protein environment than that of CorA. At high temperature (T=0.040), frequency of segmental contacts is much lower in their crowded protein matrices than that at the low temperature; segmental contacts in chains with inner segment is still higher than that with CorA. The average number (Nn) of residues around each is a measure of the contact profile which is presented in figure 3 and 4 for a range of temperatures (T=0.0200.040). Obviously, the residue contact density decreases on increasing the temperature with more contacts at low temperatures (native phase) and less in denatured phase. Variation in pattern of the contact profiles with the temperature however, reveals how the protein segments organize during the self-assembly that leads to a global morphology. At the low temperature (T=0.020), most part of the protein chains seem to be surrounded by proteins leading to a compact (globular) morphology. Onset of preferential contacts emerges on raising the temperature (T=0.025). For example, second half of residues in chains with only inner segment (iCorA) of the protein (residues 316 F  351 L) have higher contact density than the first half (residues 291 M  315 N) (figure 4). Preferential contact becomes more localized ( 316 F  325 W) on increasing the temperature further (T=0.030). Even though the contact density of many residues is relatively high in self-assembled morphology in CorA, its random distribution along the backbone makes less defined at higher temperatures ( figure 3). Further, the fraction of segments with relatively low contact density remains appreciably high. Thus, the protein-protein interaction in inner segments is likely to provide a more stable support for a collective morphology of CorA protein.

Radius of gyration
How does the size of protein chains depend on the protein concentration in native and denatured phases? How are residues distributed as the protein chains associate and dissociate? We address it by analyzing the radius of gyration of each protein and structure factor of the selforganized assembly. Variations of the average radius of gyration of the protein CorA with the concentration is presented in figure 5 at representative low and high temperatures. The radius of gyration of the protein does not show much variation with the protein concentration at low temperatures (inset figure 5). At the high temperature (T=0.040) the radius of gyration grows continuously with the concentration reaching a steady-state value in the crowded regime. It should be pointed out that the entanglement is enhanced at higher protein concentrations. We believe that a large increase of the radius of gyration is caused by the competition between entanglement and the thermal expansion of chains.  Figure 6 shows that the radius of gyration of chains with only inner segments increases with the protein concentration p systematically at low temperatures (T=0.015, 0.020) in its native phase and exhibits little variation with p at higher temperatures (T  0.025) in its denatured phase. We therefore believe that the protein-protein interaction among the inner-segments of the protein in its native phase may be more responsive to self-organized collective morphology of proteins.

Structure factor
Overall distribution of residues over multiple length scales can be assessed by analyzing the structure factor S(q) which is defined as, where rj is the position of each residue in all protein chains and |q| = 2/ is the wave vector of wavelength . Using a power-law scaling of the structure factor with the wave vector, i.e., one may study the spread of residues over the length scale  by evaluating the exponent  which describes the mass (residue) distribution. Since we know the overall size of each protein chain via its radius of gyration ( figure 5, 6), we can estimate the dependence of the number of residues on multiple length scales (), i.e. from the size of a residue to the linear scale (L) of the simulation box. On the spatial length scale comparable to radius of gyration (  Rg), we can estimate the scaling exponent  from the power-law, Rg  N  , where N is the number of residues (a measure of the molecular weight of the protein); the effective (fractal) dimension (D) of each protein D = 1/ ,  = . Similarly we can estimate the mass distribution of the protein self-assembly at   Rg. Variation of the structure factor (S(q)) with the wave length () of the self-assembly of the protein (CorA) is presented in figure 7 at temperature T = 0.030 for a wide range of protein concentrations, dilute (Nc=5) to crowded (Nc=100) regime. Note that the radius of gyration has reached a steady-state value for all concentrations of the protein at T = 0.030. We see that D  2.6 for each protein (  Rg), D  2.2 at Rg    50 (in unit of lattice constant). For all residues distributed over the entire simulation box,   L, the effective dimension decreases systematically from D  2 with Nc=100 to D  1 with Nc=5 where each protein chain appears isolated. At high temperature (T=0.040), it is difficult to reach steady state during the course of simulations for the radius of gyration of CorA proteins at higher protein concentrations due to entanglements (figure S5). Variation of the structure factor with the wave vector (figure S5) reveals a rather tenuous morphology ( from size of a residue to about third of the simulation box, L/3 much larger than Rg of protein chains) of the entangled fiberous protein chains at the high temperature with an effective dimension D  1.7.
The structure factor of residues' distribution in a dilute-to-crowded protein environment (Nc=5 100) of chains with inner segments is presesnted in figure 8    The variation of the structure factor with the wavelength in the most crowded environment (Nc=100) is presented in figure 9 for a range of temperatures. The globular morphology (D  3) of individual protein chain seems to persist at lower temperatures (T=0.015, 0.025) at length scale comparable to its size ( Rg). On a larger-scale (Rg    3 Rg), the assembled structure is not as compact (D  2.5), but the overall assembly remains compact (solid with D  3) for its spread over the simulation box at the low temperature (T=0.015). The multi-scale morphology of selforganizing proteins is heterogenious regardless of its globular conformations in its native phase.
The morphology remains relatively dense D  2.5 even at a much higher temperature, i.e. T=0.030. On large length scale, (  L), the self-assembled morphology appears tenuous. The morpholgy adopts to a random coil D  1.8 at high temperature T = 0.040 where onset of a regular long range structure appear to sets in (with an oscillation in S(q)).

Conclusions
Self-organized structures of interacting proteins (CorA) and its inner segments are investigated by a coarse-grained Monte Carlo simulation as a function of protein concentration at a range of low to high temperatures. Visual inspection show clear distinctions in morphology of the assembly in dilute solution and that in the crowded (dense) matrix at both low and high temperatures. CorA proteins seem to dissociate more on the scale comparable to the size of the proteins while the clusters of chains with inner segments phase-separate in its native phase (low temperatures). It appears that the protein-protein segmental interaction is more conducive to agglomeration of inner segments than that of the outer segments of the protein.
Variation in pattern of the contact profiles with the temperature reveals how the protein segments organize during the self-assembly that leads to a global morphology. We find that a relatively lower fraction of residues in outer segments (residues 1 M-290 V) of the protein participate in segmental globularization (with a random distribution) in comparison to that of the inner core segment ( 291 M… 351 L) with well-defined globular region. The radius of gyration of the protein does not vary much with the protein concentration at low temperatures. However, it shows enormous growth as the entanglement increases with the concentration at high temperature. In contrast, the radius of gyration of chains with inner segments increases with the protein concentration p systematically at low temperatures (T=0.015, 0.020) in its native phase. We therefore believe that the protein-protein interaction among the inner-segments of the protein in its native phase is conducive to responsiveness of the self-organizing pathways.
The multi-scale morphology (isolated globular bundles to entangled network) is quantified by evaluating an effective dimension D from the scaling analysis of the structure factor with the wave vector. In general, the effective dimension of the assembled morphology of protein chains is much lower than that of its inner segments. For example, the most compact morphology of the proteins with D  2.6 (  Rg) even at the highest protein concentrations (most crowded) appears at low temperature while it remains almost extended in a random configuration with D  2 at almost entire temperature and concentration range considered here. In contrast, the self-assembled morphology of chains with inner segments remains globular D  3 at almost all length scales in its native phase. It is difficult to identify main cause of a stable pentamer configuration due to complexity of the crowded proteins resulting from inter-and intra-chain residue-residue interactions, competition between the self-organizing mechanism and steric constraints, and entanglement. Based on our analysis presented here, we believe that a stronger protein-protein interactions among the inner segments of the protein is conducing to its collective stable selfassembly. Figure S1: Contact map of residues around each of CorA ( 1 M 2 E … 351 L) residues in a simulation box with the number of protein chains Nc = 5, 10, 20, and 100 at a low temperature (T = 0.020).