Multiple-binding-site mechanism explains concentration-dependent unbinding rates of DNA-binding proteins

Recent work has demonstrated concentration-dependent unbinding rates of proteins from DNA, using fluorescence visualization of the bacterial nucleoid protein Fis [Graham et al. (2011) (Concentration-dependent exchange accelerates turnover of proteins bound to double-stranded DNA. Nucleic Acids Res., 39:2249)]. The physical origin of this concentration-dependence is unexplained. We use a combination of coarse-grained simulation and theory to demonstrate that this behavior can be explained by taking into account the dimeric nature of the protein, which permits partial dissociation and exchange with other proteins in solution. Concentration-dependent unbinding is generated by this simple model, quantitatively explaining experimental data. This effect is likely to play a major role in determining binding lifetimes of proteins in vivo where there are very high concentrations of solvated molecules.

W e use a Brownian Dynamics simulation that incorporates a Monte-Carlo type update step to simulate binding and unbinding events. The former iteratively solves the discretized Langevin equation (with finite time increments δt) to update a number of particles i with a radius of a and positions r i : where µ ij = δ ij /(6πηa) is the Stokes mobility without the effect of hydrodynamic interactions (though in principle they could be included), ξ i is a random velocity satisfying the fluctuation-dissipation theorem ( ξ i = 0 and ξ i ξ j = 2k B T µ ij δ ij ), δ ij is the Kronecker delta, and U ij is the * To whom correspondence should be addressed. Tel: +44 000 0000000; Fax: +44 000 0000000; Email: john-marko@northwestern.edu pairwise interaction between particles i and j: where the first contribution is a Lennard-Jones potential of strength˜ k B T that occurs between beads i and j a distance r ij apart, and the second contribution is a spring potential with spring constantκ = 200.0 that provides connectivity between beads i and j. This value ofκ assures connectivity between i and j such that deviations from a distance of r ij = 2a are small. The matrices ω LJ αβij and ω C αβij provide the information regarding interaction and connectivity respectively between beads i and j of type α and β. In our system, we have two groups of beads α/β: DNA beads D and binder beads B. We focus largely on dimeric sets of binders due to the dimeric geometry of most non-specific DNA binding proteins. (1,2,3) We use the constraint ω C BBij = ω C BBji = 1 (ω C BBij = 0 otherwise) when i is even and j = i+1 for the dimeric binders. In monomeric situations, ω C BBij = 0 for all binder pairs. We also use the conditions that ω LJ BDij = 1 (binders interact with DNA through Lennard-Jones forces), ω LJ BBij = 0 (binders do not interact through Lennard-Jones), and ω C BDij is a timedependent manifestation of the connectivity that arises due to binding interactions. This behavior is based on the Bell model of biological interactions,(4) and the following manifestation in Brownian Dynamics is adapted from previous work on biological systems. (5,6) The matrix ω C BDij , which we will denote as ω R,ij for brevity, represents the accounting of all the binding interactions between proteins and DNA monomers. In a nonbonding scenario, ω R,ij = 0 and the harmonic potential in Equation S2 is not applied between the two species. The c 2013 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. possibility of binding, and consequently unbinding, occurs through a Monte-Carlo type update step that is applied every time interval τ 0 . We note that this time interval is chosen such that computational expediency and accurate statistics were obtained, however otherwise the selection is arbitrary; the jump frequency is therefore defined as ν 0 = 1/τ 0 in the simulation, however in principle the combination of this jump frequency with the energy barrier allows for a redefinition of the absolute energy scale used to describe results. Each time interval τ 0 , the matrix is recalculated based on the update step: where Ξ is a random number between 0 and 1, chosen randomly for each i and j in every update step τ 0 . Energies denoted with tildes are normalized by k B T (i.e.Ẽ U B = E U B /(k B T ); positions and times can also be normalized by the bead radius a and single bead diffusion time 6πηa 3 /(k B T ) = τ D , which when referenced in this manner will also be denoted with a tilde. r RXN (= 2.1a) is a reaction radius which sets the spatial limit within which binding can occur. For this simulation, the time step of the Langevin simulation is δt = 0.002τ D and the Monte Carlo update step is τ 0 = 0.05τ D .
What the ω R,ij matrix conceptually does is it places a spring connection between spatially adjacent beads if they meet the Boltzmann factor-based criterion for overcoming the energy barrier to transition into a bound state (and vice versa). This reproduces the appropriate thermodynamics for a binding pair,(5) and provides a means to change the binding kinetics of a single interaction through changing the unbinding barrier ∆Ẽ U B and the binding energy ∆Ẽ 0 independently. In our system, we fix the value of ∆Ẽ B to reflect the idea that the relevant time scale for binding τ B is on roughly the same order of magnitude as the binder diffusion. With a value of ∆Ẽ B = 3.0, we reproduce this behavior since τ B =τ 0 e ∆Ẽ B ≈ 1.0τ D . The binding barrier and binding energy are therefore changed simultaneously, and we report these values in terms of the binding energy ∆Ẽ 0 .
The overall simulation takes place in a box with periodic boundary conditions 100a×100a×200a with the DNA molecule represented by N beads centered along the long axis of the simulation box (see simulation snapshot in Figure 1). To expedite the simulation and compare it to systems where the DNA is tethered and stretched significantly such that its thermal motions are highly constrained, we do not update the position of the DNA beads in the iteration through the Langevin equation and consider it immobile in our model.

CHAIN-LENGTH DEPENDENCE
One common consideration in the study of DNA/protein interactions is the length and topology of the DNA chain itself, (7,8) especially since there are hypothesized states Figure S1. Chain length-normalized exchange kinetics (ln n B /N versus t) for ∆Ẽ 0 = −5.0, and a number of different lengths of DNA N (N = 10 dotted, N = 25 dashed, N = 50 solid) and concentrations c (c = 50nM black, c = 500nM red, c = 5µM blue). The actual kinetics, upon normalization by N do not change a great deal. Differences only manifest at low N = 10, which we attribute to edge effects. These effects lead to quantitative differences, but the same qualitative behavior is observed. The normalization of n B by chain length N suggests a local process of concentration-enhanced unbinding.
where the binding protein is unbound but still affiliated with a nearby DNA chain.(8) Such effects may be due to long-distance chain-chain correlations, and recent work has elucidated the geometric possibility of a random walk escaping from the vicinity of a chain of various dimensions (straight chain, random coil, and collapsed globule) without rebinding. (7) These investigations suggest that rebinding is not strongly dependent on length of a straight DNA segment like the ones in our simulation,(7) however we nevertheless test to see the importance of such effects by adjusting the length of the DNA chain.
In the analytical results, Equation 10 does not contain any reference to the length of the chain. The numerical results likewise do not include the chain length to be a factor in the matrix k ij , suggesting that the dynamics of these systems is independent of chain length. This is further backed up by the highly local g(j) functions in Figure 2c. Therefore, we expect that the decay in the fraction of binders that remain bound to the chain during an exchange experiment ( n B /N ) should decay with the same time evolution n B /N = f (t) = f (t,N ) in a way that does not depend on N . At large N this holds true, though edge effects appear at smaller values of N . This is illustrated in Figure S1, which plots the fraction n B /N as a function of time t for N = 50,20,10. N = 50 and N = 20 are essentially identical (within simulation error). This significantly suggests that, at least at the large N limit, this scaling holds true ( n B ∼ N ). The time evolution is qualitatively similar, but at N = 10 edge effects apparently start to influence results. We anticipate that these edge effects are due to the oscillatory correlation functions shown in Figure 2, which extend a distance of ca. 3 sites away along the chain. This suggests that beads less than 3 Figure S2. Exchange kinetics (ln n B versus t) for ∆Ẽ 0 = −5.0 and N = 2, with a number of different concentrations c. c-dependent unbinding is observed, on the same order of magnitude as in Figure S1 and Figure  3 however quantitative differences arise due to chain end effects which dominate at N = 2. This limiting case is often used in experimental investigations,(9) and we demonstrate here that c-dependence must be taken into account even in small DNA oligomer investigations.
indices away from a chain end have different equilibrium and therefore dynamic behavior than long chains. This would alter the quantitative (but not qualitative) picture in a complicated fashion.
The kinetics of DNA-protein binding interactions are often observed using DNA oligomers with sequences that specifically attach to the binding proteins of interest. (9) This results in a situation that in the context of our work is N = 2. The derivation of our analytical and numerical theory had long DNA chains in mind, however in principle there is nothing about the shortness of the chain that prevents the same qualitative mechanism from applying in this special limiting case. Indeed, the states considered in Figures 3 and 5 only involve 2 binding sites. To illustrate that this concentrationdependent unbinding effect is indeed still relevant, albeit quantitatively altered, we ran a few simulations at ∆Ẽ 0 = −5.0 for a number of concentrations c, which are plotted in Figure S2. Clearly, concentration plays a prominent role that must be accounted for in these experiments.

RANDOM BINDING EFFECTS
In experiment, it has been observed that large swaths of binders remain bound for a large amount of time, and do not leave even at long times after the apparent relaxation time scale. These are not homogeneously dispersed; it appears that large numbers of binders are clustered at positions along the chain, an effect which may be due to sequence heterogeneity. In a DNA chain, there are a limited number of permutations of sequence spanning the two dimers in a chain. Nevertheless, it is not clear what the connection is between binding sequence and binding strength is in these systems. Therefore, we investigate the behaviors resulting from a non-constant binding strength and consider a system that represents the non-constant binding extreme situation: a fully randomized binding strength. Such random binding energies are known to have significant effects on the one-dimensional diffusion of protein-binding DNA,(10) and we can explore the possibility of strong effects in this dynamic process as well. We introduce this into the system by considering a binding energy ∆Ẽ U B = ∆Ẽ U B,0 +ξ B,sγS that represents a Gaussian distribution of random corrections to the mean binding strength ∆Ẽ U B,0 . This is characterized by a magnitude of the random energetic contributionγ S and a set of random numbers ξ B,s that each correspond to a DNA position s and are distributed around ξ B,s = 0 with Gaussian distribution such that ξ B,s ξ B,s = δ s,s . Figure S3 demonstrates the effect of including a random binding energy at ∆Ẽ 0 = −7.0 and two extreme values of c (50nM and 5000nM). Dynamics at both concentrations are essentially unchanged upon small random deviations being introduced (γ S < 1.0), however the time scale and long-time limit of the decay both increase significantly when there is a large amount of variance in the binding energies (γ S = 1.0,2.0). This suggests that sequence heterogeneity may be a primary reason for the experimental observations that the decays are not to n B = 0 in Figure 6, which is the main difference between our predictions and the experimental data in Graham,et al.(11)

TRANSFER-MATRIX THEORY FOR DIMER BINDING EQUILIBRIUM
In order to understand the equilibrium properties of these DNA-protein binding simulations (and subsequently the behavior of the experimental systems) we use a transfer matrix calculation of the partition function. This method is wellknown in statistical mechanics, and we demonstrate how an abundance of dimer binding statistics can be obtained upon using these tools. As a simplified representation of such a calculation, we can determine the partition function (and number of bound binders n B ) for a system of monomeric binders. This will yield Equation 2, which can be determined in a number of alternative routes.
To calculate the grand partition function, we must calculate: where we use a vector notation and n B represents the occupation (each component i is 1 or 0 for bound or unbound at index i respectively) state of the system, and ∆ E 0 and µ are the vectors where the ith components are the binding energy upon binding at i and chemical potential at i respectively. This partition function can be calculated, rather than a direct sum, by a multiplication of matrices M i,j that represent the conditional probabilities of the state at the index i based on the possibilities of what the state was at index i−1. For this system, such a matrix is: where P = e −∆Ẽ+μ is the contribution to the partition function of having a bound binder at an arbitrary index (we proceed with the homogenous case that ∆ E 0 = 1∆E 0 where all indices have a binding energy ∆E 0 ) and 1 is the contribution to the partition function of having an unbound position. The matrix provides all the contributions to the partition function at position i given all the contributions at position j. The partition function for a chain of length N is then: where φ N = φ 0 = (1P ) are the contributions from the first and last positions. This iterative equation is tedious, but as long as N is large the result can be well-approximated by the largest eigenvalue of M ij , λ 0 , to the N -th power: The eigenvalue of the above matrix M i,j for the monomeric binding (Equation S5) is λ 0 = (1+P ), so for monomer binding we have: The thermodynamic relationship −k B T lnΞ = G−µn P (the value of n P counts the number of singly plus doubly bound dimers as opposed to the occupied binding sites n B ; the two values are equivalent for monomers but not for dimers) allows the determination of n B from Ξ: which is the result presented in the manuscript. The same method can be used to determine the much more complicated scenario of dimeric binding. In this situation, the matrix M i,j is 3×3 and the states are unbound, bound by the first monomer in the dimer, and bound by the second monomer in the dimer. We write this matrix as: where the first row corresponds to the possibility of moving from any state (unbound, first monomer, second monomer) to the unbound state, the second row corresponds to the possibility of moving from any state to the first monomer in a dimer being bound (with the contribution P the same as earlier), and the third row corresponds to the possibility of moving from only the first monomer in the dimer to the second monomer in the dimer (with the contribution P = e −∆Ẽ 0 ). P does not include the chemical potentialμ since the binder has already bound once upon the chain and has already moved from outside the system. An additional contribution of 1 to account for the loss of rotational degrees of freedom upon binding is not included, since the simulation constraints still permit significant rotational freedom even in the bound state. We attribute the slightly non-constant nature ofμ 0 , which varies by about k B T /2 in our fits (−μ 0 = 14.1,13.4,13.2,13.1 for c = 0.05,0.5,2.5, and 5.0µM respectively), to such degree of freedom ambiguities. The largest eigenvalue for this matrix is given by: which yields the result: which is the result in Equation 2. We note that any number of other similar calculations are possible using this method. We can introduce an artificial field µ 2 , for example, that acts only on states of interest. For example, we can calculate the prevalence of singly-bound states using the matrix: where we have included the factor P = eμ 2 on the components that are characteristic of singly-bound states (i.e. those transfer movements that involve the first monomer on a dimer going to anything other than the second monomer on the same dimer). The largest eigenvalue can be found to be: and the number of singly bound states n * B is:

MASTER EQUATION NUMERICAL THEORY
We presented an abbreviation of the numerical approach to calculating exchange kinetics using the Master Equation representation of the simplifies states of the system. To provide a more thorough picture of how we carried out this calculation, we discuss the development of this theory in more detail. The starting point for a Master Equation approach is the Master Equation itself: where the matrix of rate constants k ij has units of 1/time. We defined the full matrix conceptually in Figure 3 and mathematically in Equation 5. This allowed the writing of an evolution equation upon expanding time evolution using small time increments ∆t: where the matrix δ ij +∆tk ij can be thought of as an operator that evolves the state at time t, φ j (t), to a later time φ j (t+∆t). We can therefore numerically solve for the exchange process at time t = ∆t×n: where the initial state is that all the binders are fully bound φ j (t = 0) = δ j0 . In principle, this works for any system so long as k ij is appropriately defined, and the states φ i can be articulated in an unambiguous fashion. This is often practically difficult due to the abundance of independent and interrelated states that can be defined. In our system, we provided a simplified picture that focuses on a minimal representation of φ i such that we can define a relatively straightforward representation of the rate constants k ij , an approach that has found success in similar systems. (6) To determine the dimers that have separated from the DNA chain and moved outside of the simulation box (and hence become untagged), we must convolute the unbound states with the Green's function G(0,E;τ ) to diffuse to the edge E of the box: This equation describes the incremental increase in occupation of state i = 5 due to the binder processes indicated by Equation S18, and subsequently the binder propagates diffusively (via the Green's function G) to the boundaries of the simulation box. It is this process which dictates the decay of the original bound population of binders n B,0 . While in a perfectly cylindrical region around the DNA molecule we could, in principle, calculate the exact Green's function since 2-dimensional diffusion is a well-known process. (12) We use an approximate form, since for this aspect of the calculation it is not imperative that we have anything too complicated. For this system, we use the approximation: where τ DE = N 2 a 2 /D is the diffusive time scale for a dimer with diffusion constant D to reach a distance of 2N a (the distance from the center to one of the faces). This method, upon quantitative comparison with the simulation results, permits for a much more rapid calculation of the exchange kinetics; instead of iterating through the non-deterministic (i.e. multiple runs are needed to evaluate averages) Langevin equation for ca. 100−900 species, we are able to iterate through what is essentially 5×5 matrix multiplication. Since the time scales of the experimental results are on the order of 1000s, this results in 1×10 10 −10 11 iterations and is prohibitively long for a Langevin simulation.