Lac Repressor Mediated DNA Looping: Monte Carlo Simulation of Constrained DNA Molecules Complemented with Current Experimental Results

Tethered particle motion (TPM) experiments can be used to detect time-resolved loop formation in a single DNA molecule by measuring changes in the length of a DNA tether. Interpretation of such experiments is greatly aided by computer simulations of DNA looping which allow one to analyze the structure of the looped DNA and estimate DNA-protein binding constants specific for the loop formation process. We here present a new Monte Carlo scheme for accurate simulation of DNA configurations subject to geometric constraints and apply this method to Lac repressor mediated DNA looping, comparing the simulation results with new experimental data obtained by the TPM technique. Our simulations, taking into account the details of attachment of DNA ends and fluctuations of the looped subsegment of the DNA, reveal the origin of the double-peaked distribution of RMS values observed by TPM experiments by showing that the average RMS value for anti-parallel loop types is smaller than that of parallel loop types. The simulations also reveal that the looping probabilities for the anti-parallel loop types are significantly higher than those of the parallel loop types, even for loops of length 600 and 900 base pairs, and that the correct proportion between the heights of the peaks in the distribution can only be attained when loops with flexible Lac repressor conformation are taken into account. Comparison of the in silico and in vitro results yields estimates for the dissociation constants characterizing the binding affinity between O1 and Oid DNA operators and the dimeric arms of the Lac repressor.


Introduction
In living organisms, the level of protein production must be optimally adjusted in order to ensure the adaptation of the organisms to environmental changes. Therefore, the mechanism of transcription regulation must be tightly controlled by elements such as DNA binding proteins that, often with the help of an inducer molecule, bind to specific DNA sites and promote or repress the transcription activity of RNA polymerase. The action of some of these regulatory proteins involves DNA looping, a process in which the protein binds simultaneously to two sequentially remote sites along a DNA molecule and brings these sites to a close proximity by bending the intermediate DNA segment [1,2].
The functional relevance of DNA looping in transcription regulation has been revealed by the intensively studied, and so far the best understood, regulatory mechanism of the lac operon in E. coli [3]. In this system one dimeric arm of the Lac repressor tetramer binds to an operator site (O1) near the promoter resulting in inhibition of expression of the genes coding the proteins involved in the metabolism and transport of lactose. A loop can be formed by a simultaneous binding of the other dimeric arm of the Lac repressor tetramer to one of two auxiliary operators located 410 bp (O2) downstream and 82 bp (O3) upstream of O1. Such a looped state has a greater transcriptional repression due to a stabilization of the interaction between the Lac repressor and the operator O1 [4], which, in part, can be explained by the statistical mechanics of the system: the fact that the looping protein is bound to one operator increases its concentration in the vicinity of the other operator. The process has been studied extensively in vitro by gene expression measurements [1,2,5].
Various techniques have also been developed to study DNA looping in vivo. Since the formation of a DNA loop in a linear DNA segment results in a shortening of the mean end-to-end distance of DNA ends, it can be detected and quantified by single-molecule manipulation experiments. In one such experiment, called tethered particle motion (TPM) experiment, a single linear DNA molecule with two protein binding sites is attached at one of its two ends to a microscope coverslip and at the other to a large bead that can be tracked by light microscopy, and then looping protein is inserted into the solution. By observing the position of the bead as a function of time one can estimate the length of the DNA tether; a shortening of the tether that persists over a predetermined time window is interpreted as an indicator of the presence of a loop. By measuring the mean time spent in looped and unlooped states one can track the kinetics of loop formation and breakage [6]. This type of setup was first used for measurements of the relative motion between a single DNA and RNA polymerase during transcription [7,8], and later was utilized for the study of lacrepressor mediated loop formation [6].
Because of the symmetry of the two identical dimeric arms of the Lac repressor, an operator can bind to each arm in two possible orientations yielding four distinct loop types: two parallel loop types P1 and P2, and the two antiparallel types A1 and A2. Similar loop types were suggested to be formed by the binding of two dimeric galR proteins to two operators in the E. coli gal operon [9]. In addition, arguments have been presented for the existence of another loop type in which Lac repressor attains an extended conformation in which the hinge, i.e., the four-helix bundle tetramerization domain, permits the two dimeric arms of the Lac repressor to extend out from the nearly rigid ''V'' conformation [10]. The evidence for existence of an extended Lac repressor conformation ranges from electron microscopy [11], modeling of DNA footprinting and looping free energy measurements [12], gene expression by very small Lac repressor-DNA loops [13], modeling of in vitro binding assays [14], and FRET measurements of distances between DNA constructs bound to LacI [15,16]. Especially the latest FRET results of Haeusler et al. [17] show convincingly that the flexibility of Lac repressor during looping process cannot be ignored.
The TPM measurements can be used to shed light on the structure of the loop. If the length of the loop is known from the positions of binding sites for the looping protein, then the change in end-to-end distance measured by TPM can be used to infer geometrical and deformational details of the looped protein-DNA structure. The influence of the flexibility of the binding protein on loop formation was investigated by Vanzi et al. [18], who showed an increased mean duration of looped states with a more flexible Lac repressor hinge region. Recent TPM experiments showed that the motion of the tethered bead gives rise to two-peak distribution [19] of the values of the end-to-end distance that correspond to looped states. Unfortunately, TPM measurements do not provide direct information about the structure of the loop, and in order to confirm or disprove any hypotheses about the type of the observed loop one needs to construct a model that provides, for each loop type, the end-to-end distribution of DNA and hence the observable length of the tether.
A number of coarse-grained DNA models can serve as a framework for the study of the statistical mechanics of DNA looping [20]. Continuum elastic rod model, which treats DNA as an ideal elastic rod, i.e., thin elastic body that is inextensible, intrinsically straight, transversely isotropic and homogeneous [21] is the basis of the classical helical worm-like chain model [22] that has been used in many papers to study DNA supercoiling, topoisomer distributions, and single-molecule stretching experiments. More recently discrete base-pair level models have been developed, which account for the dependence of DNA elasticity on sequence [23,24]. Statistical mechanics of DNA molecules of length in the order of a few hundreds to several thousands base pairs has been studied using Gaussian sampling approach [24,25,26,27], which utilizes the multivariate quadratic form of DNA deformational energy. The Gaussian sampling method can generate efficiently a very large canonical ensemble (up to 10 14 configurations) of uncorrelated configurations. A model of this type has been used in the interpretation of TPM looping experiments reported in [19,27] to suggest that the observed end-to-end distribution is the result of the competition of several loop types (extended Lac repressor loop was not among them), each of which corresponds to particular geometrical constraints applied to the DNA by the protein.
However, a significant drawback of Gaussian sampling is that only a very small fraction of sampled configurations obeys prescribed end conditions. Furthermore, this method cannot be utilized when other contribution to the total energy, such as the intra-molecular electrostatic energy of a DNA molecule in solution, is taken into account [28]. For these reasons Metropolis Monte Carlo simulations [29,30,31] have been developed, based on the classical Markov Chain Monte Carlo method of Metropolis and Hastings [32], in which a model DNA is changed by small deformations and any new configuration is accepted with a probability that depends on the difference between its energy and the energy of the previous configuration.
In this paper we describe a new Metropolis Monte Carlo algorithm which enables us to generate large canonical ensembles of DNA configurations subject to any composition of geometric and topological constraints, including looped configurations of given topology, closed configurations, and configurations with preassigned end-to-end (or, more generally, site-to-site) distance. This algorithm utilizes explicit expression for the Jacobian of the mapping between DNA deformational parameters and end-to-end conditions, derived in [33], and successfully overcomes a significant difficulty: applying reversible perturbations that preserve the given geometric constraints. This scheme differs from other Monte Carlo schemes utilizing local perturbations such as the crankshaft move and global perturbations such as the slithering move, introduced in [29], which cannot be used for a nonhomogenous sequences and are limited for circularized molecules or molecules subject to fully clamped end conditions.
We also present the results of the application of the new Monte Carlo scheme to simulations of TPM studies of Lac repressor mediated DNA looping. In these simulations, as in the experiment, DNA is attached to a rigid substrate at one end and fluctuating bead at the other, and a loop may be formed by bridging operator sites with bound Lac repressor. The scheme enables us to take into account the fluctuations of the looped segment of the DNA, an effect that was neglected in [19,27]. We calculate looping probabilities for a prescribed loop type and use this information to analyze the likelihood that a loop of a particular length has a given topology. By comparing our computational results to our TPM experimental results we estimated also the dissociation constants associated with the binding of the Lac repressor to each of the operators.

Methods
Our theoretical model is based on the familiar naturally discrete model for DNA elasticity [23,34,28] in which the intramolecular electrostatic interactions and their dependence on the ionic strength in the aqueous media are taken into account [28,33]. In our simulations of TPM experiments all the excluded volume effects such as the impenetrability of the DNA molecule, the bead, and the plate, are taken into account. We start with an introduction of the underlying theory in the following subsection.

DNA model
We here employ the theory presented in [23,28,33], in which the energy of a DNA molecule with Nz1 base pairs is determined when there is given, for each n, both the location x n of the barycenter of the n-th base pair and an orthonormal triad (d n 1 ,d n 2 ,d n 3 ) that is embedded in the base pair as shown in Figure 1. The total energy, U, of a DNA configuration is taken to be the sum of elastic energy and electrostatic energy W, The elastic energy of a configuration is taken to be the sum over n of the energy y n of interaction between the n-th and the (nz1)-th base pairs, i.e.,~X N n~1 y n : The local elastic energy y n associated with the n-th base-pair step is given by a function of the relative position and orientation of the n-th and (nz1)-th base pairs, which can be parameterized by six kinematical variables: shift, slide, and rise, (r n 1 ,r n 2 ,r n 3 ), which describe local shearing and extension (i.e., stretching), and tilt, roll, and twist, (h n 1 ,h n 2 ,h n 3 ), which describe local bending and twisting of the molecule [35,28] (See Figure 2). The displacement deformations (r n 1 ,r n 2 ,r n 3 ) have only a small influence on the end-to-end distribution of long DNA compared with the angular deformations, because each angular deformation contributes a change in the end position that is proportional to the distance of the end from the location of that deformation. A single displacement deformation contributes, on average, 0.1 nm to the position of the end of the DNA, while an average bending deformation of 5 degrees, if occurring in the middle of a 300 bp segment, contributes about 4.5 nm to the position of the end. Therefore, in order to reduce the number of kinematical variables and make the calculation computationally feasible we shall assume that the values of the shift, slide, and rise of each base-pair step are constants. This simplification reduces by one half the number of degrees of freedom results in about 10 times shorter simulation times.
As a further simplification, the local elastic energy h n i is assumed to be a quadratic form in the excess tilt Dh n 1 , the excess roll Dh n 2 , and the excess twist Dh n 3 defined as where o h n i are the intrinsic values appropriate to a stress-free (minimum energy) state of the n-th base-pair step. Thus, The elastic moduli, F n ij~F n ji , and the intrinsic parameters are constants that may depend on the nucleotide composition of the nth and (n+1)-th base pairs. Here, for simplicity, we assume that the molecules are transversely isotropic and that their elastic properties are independent of the sequence. This is done in order to determine whether the experimental results can be explained without resorting to sequence-dependent effects. Thus, we assume for n~1,:::,N. In the present case we take a value for the two bending moduli that gives rise to a persistence length of 476 , namely, with k B Boltzmann's constant and T the temperature which is taken to be 300 K. The electrostatic energy of a configuration has the form where Q mn is the electrostatic energy associated with the interaction of the m-th and the n-th base pairs of the DNA molecule. As an approximation it is assumed that the two negative charges associated with each base pair are located at the barycenter, x n , of that base pair [36]. Following Manning's theory of charge condensation [37] the energy Q mn is given by (See also the discussion of Westcott et al. [36]), E 0 is the permittivity of free space, and E w is the dielectric constant of water. In accord with Manning's theory [37,38], q is set equal to 24% of the charge of the two phosphate groups associated with each base pair, i.e., q~2|0:24e { , where e { is the charge of an electron. The DNA molecule is assumed to be in a solution of water and monovalent salt (e.g., NaCl) of concentration c. The Debye screening parameter k is given by the formula in which c is measured in the units of moles per liter and k in the units of A {1 . For the present work we take c to be equal to its physiological value: c~0:1M. Although the present value of the salt concentration, c~0:1M, yields a non-negligible repulsive intra-molecular interaction, under our assumptions, the magnitude of the resulted repulsive force between sequentially remote sites that are almost in an immediate contact, is not strong enough to avoid self penetration. We therefore implemented the theoretical model introduced in [33] in which the DNA is regarded as a tubelike structure composed of the union of rigid cylinders connected by spherical joints. Accordingly, every generated configuration with self-penetration is rejected.

Generation of constrained configurations
When, as in the present work, the displacement parameters are given as preassigned constants, a configuration of a DNA molecule with Nz1 bp is determined (up to a rigid body transformation) by the set of 3N angular variables~fh m 1 ,h m 2 ,h m 3 g, m~1,2,:::,N. A loop, comprised of M bp, with MƒN, is formed when two sequentially remote sites along a single DNA molecule are attached to a single linker protein and, as a result, are brought into a close proximity. A linker protein that is simultaneously attached to two base pairs, say n A and n B , with n A {n B~M {1, confines the two termini of the intermediate DNA segments to six constraints prescribing the relative orientation and displacement between the bound base pairs: The numbers l X i and Q X ij are the components of the displacements and the orthogonal transformation between the two termini of the loop, and are determined solely by the structure of the linker protein.
Here, for simplicity, we associate the boundary conditions with the middle base pair of each binding site, i.e., the termini of the loop are taken to be base pairs n A and n B . Because a single linker protein may form loops of several distinct topologies, we indicate the topology by superscript X . A looped configuration satisfying equations (12) gives rise to independent constraints of the form: The number of constraints, N c , is six, but can be reduced to any value from one to five by relaxing some of the constraints. For example, N c~1 , when the loop is free of external moments, i.e., when the two termini of the loop are free to rotate (as in the case of spherical joints), and hence only the distance x n A n B is prescribed, or, N c~3 , when only the left hand side of equations (12) is applied.
A looped DNA molecule in a TPM experiment can be regarded as a composition of 3 segments separated by the middle base pair in each of the two binding sites: the two terminal segments with base pairs (1,:::,n A {1) and (n B z1,:::,Nz1), are attached to a planar plate (e.g., a microscope cover-slip) and a bead, respectively, and the intermediate segment that may form a loop.
Metropolis Monte Carlo technique requires that a configuration satisfying constraints (13) is randomly perturbed in such a way that the constraints are still satisfied. Suppose that in a given move the chosen segment is the intermediate looped segment. To perform a change in the configuration of the looped segment that does not perturb equations (13), we randomly select four base-pair steps, n A ƒn 1 vn 2 vn 3 vn 4 vn B , and change only the 12 angular variables, fh n1 1 ,h n1 2 ,h n1 3 ,h n2 1 ,:::,h n4 3 g while holding all the remaining variables fixed. For simplicity of exposition we set these variables to be the 12 components of a, a~fa 1 ,a 2 ,a 3 ,a 4 ,:::,a 12 g~fh hold, we first calculate the 6 linearly independent 12-dimensional unit vectors b a that span null(J), the null space of J. With the basis b a , a~1,:::,6, in hand, we set a random move that satisfies the linearized constraints (16): The 6 numbers j a are randomly generated to yield a random vector D r uniformly distributed in the volume bounded by the (projected) hypersphere of radius x, i.e., ED a r E 2P 12 t~1 (Da r t ) 2 ƒx 2 . According to this procedure, whenever a o satisfies equations (13) any non-zero variation D a r gives a new configuration, a o zD a r that, although obeys the equations (16), does not necessarily satisfy (with high enough precision) the constraints (13).
To complete a single move in our Metropolis algorithm, we calculate the unique correction D a c~JT l in the subspace, range(J T ), for which a n~ao zD a r zD a c satisfies the constraints (13). Since the correction, D a c , is restricted to a six-dimensional space, the problem is now reduced to a nonlinear system of the six a H a equations (13) with the six unknown components of l. Implicit function theorem and the smoothness of the hyper-surface characterized by equations (13) assure the uniqueness of D a c for a small enough value of ED a r E. Thus, for the range of values of x appropriate for the Metropolis Monte Carlo algorithm used here there exists a 1-1 mapping from the configuration a o zD a r to a n . In the calculations performed here we took x to be equal or less than 5p=180. This restricts a single change in each angular variable to be of a magnitude smaller than 5 degrees. The unique solution for l is here calculated using a modified Newton-Raphson scheme which gives a solution, a n , of the equations (13) to within machine accuracy in no more than 3 iterations.
The method described above can be easily utilized for any number of constraints. For the sake of assuring a strict detailed balance condition, a new configuration a n is accepted only if the total move is within the hyper-sphere of radius x centered at a o , i.e., we require that This, together with the uniform distribution of D a r implies that the probability p(a o ?a n ) to move from a o to a n equals p(a n ?a o ). A schematic description of the scheme for 3dimensional problem with two constraints is shown in Figure 3. The shaded disc represents the null space of J, and the normal to the disc describes its orthogonal complement range(J T ). We note that the very important calculation of the entries of the Jacobian matrix J is done based on analytical expressions in a way similar to the numerical scheme for finding equilibrium configurations of DNA molecules introduced in [28]. This improves significantly both the numerical stability and efficiency of the scheme.

Statistical Mechanics of DNA configurations
To calculate a canonical ensemble of configurations obeying the Boltzmann distribution we utilize Metropolis Monte Carlo algorithm in which configuration is changed incrementally using a move that is acceptable, i.e., in accord with the constraints (13), under strict detailed balance and ergodicity conditions [39,40] assuring unbiased sampling and possible explorations of the complete configurational space. Then we calculate the total energy of the new configuration and apply the acceptance criteria of the Metropolis scheme. The DNA, the attachment plate, and the bead are all modeled as rigid impenetrable objects and hence any configuration with DNA-DNA, bead-DNA, bead-plate, or DNAplate spatial overlap are rejected. The impenetrability of DNA is treated in accord with the model introduced in [33], in which a DNA molecule is treated as a union of rigid spheres separated by rigid cylinders.
The standard acceptance criteria according to the Metropolis Monte Carlo scheme [32] used here yield a canonical distribution of configurations. Thus, when the displacement parameters are held fixed, the statistical weight of a configuration H with total energy U(H) is proportional to the Boltzmann factor e {U(H)=KBT . The configurational integral of a molecule has the form where the integral is taken over the domain of configurations obeying the impenetrability constraints and any other set of geometric constraints relevant to the investigated case. The Jacobian factors J m (not to be confused with the Jacobian matrix J used in the previous subsection) are necessary for the change of variables from canonical parametrization to the non-canonical parametrization used here [41]. This is important because any set of values for the angular variables, , that is rendered from a random number generator with uniform distribution does not yield a sample of the triads (d m 1 ,d m 2 ,d m 3 ) that is uniformly distributed in the group of proper rotations. It can be shown (see e.g., the appendix of [28]) that for the El Hassan and Calladine parametrization used here we have Equation (20) can also be obtained using the formulation suggested in [41]. The Jacobian J m (h m 1 ,h m 2 ,h m 3 ) was taken into account in our numerical scheme.
It is convenient, for calculations of loop probability, to use the axis-angle representation of proper rotations, in which a rotation is characterized by a unit vector describing the axis of rotation and a rotation angle about it. The use of this representation is discussed later in this section. The axis of rotation is defined by an azimuthal angle b 1 and a polar angle b 2 , and the rotation about it is given by b 3 . In a similar way to the derivation of equation (20), it can be shown that the Jacobian associated with this parametrization is given by, Looping probabilities For the purpose of calculating looping probabilities associated with TPM experiments, we generate four different canonical ensembles: f1g DNA configurations with no loops. f2g Configurations with ''freely-jointed loops'', for which only the distance, l, between the two loop termini (centers of binding sites) is held fixed. f3g Configurations for which the barycenter of one loop terminus is fixed in position relative to the other terminus, but the terminus is free to have any orientation. f4g Configurations containing loops of prescribed topology, for which the two loop termini have fixed relative position and orientation and the loop has fixed The surface can be thought of as a representation of a single constraint for a system with three degrees of freedom. The random move, D a r is along the tangent plane to surface (the shaded discoid) spanned by null(J), and the correction move, D a c is along the normal to that plane. The complete move must be bound by the shaded sphere, so that the probability to move from a o to a n is equal to that of the reverse move. doi: 10 linking number. (The linking number is a topological invariant equals to the number of times one of the strands of a circularized DNA molecule is linked with the other.) Since a loop is not a closed DNA segment, for the purpose of computing its topological and geometrical properties we close the looped segment by including the terminus-to-terminus step as an additional virtual base-pair step. The linking number associated with a loop is given by the relation (Precise definitions are given in [42] and [43].) We calculate the writhe, W r , according to the scheme proposed in appendix B of [44] and the total twist (in turns), T w , using the formula proposed recently by [45]. The loop with rigid V-shaped Lac repressor can be one of four types suggested in [12]: two parallel loop types P1 and P2, and the two antiparallel types A1 and A2. (See Figure 4).
Although for each such type the relative orientation between the two base pairs associated with the loop termini is precisely defined, each group may contain multiple loops with topologies differing by the linking number. When one has in hand 4 canonical ensembles of the conditions described above, one can use ensemble f1g to calculate the probability pf2D1g of having configurations obeying condition f2g given that they obey condition f1g. This is done by recording the number of configurations from ensemble f1g for which the distance l between the loop termini obeys ' 0 vlv' 1 . Similarly we denote p X f3D2g (or p X f4D3g) as the conditional probability of satisfying f3g (or f4g) under f2g (or f3g) for loop type X . For the calculation of p X f3D2g we used the ensemble f2g to record the number of configurations for which one loop terminus is within the volume of a cone that is fixed with respect to the other loop terminus and its vertex coincides with the center of that terminus. To estimate p X f4D3g we use the axis-angle representation to calculate, for each configuration in ensemble f3g, the angle of rotation, b 3 , needed to bring one loop terminus to a relative orientation (with respect to the other terminus) that is equal to that of the specified loop type X . For this calculation one records the number of configurations in ensemble f3g for which the angle of rotation is close enough to zero. Ensembles obeying condition f4g are used for calculations of the distribution of the measurable projected (onto the plate) end-to-end distance, r xy , between the bead and the DNA end that is attached to the plate.
We here calculate the J-factor, J X , associated with a given loop type X as follows: Where N A is Avogadro's number, and V C is the volume bounded by the surfaces of spheres of radii, ' 0 and ' 1 , and a cone with vertex angle w 0 (with the centers of the spheres and the vertex of the cone in a coincidence.), i.e., The volume, V, of the group of proper rotations is calculated using equation (21): For all the calculations reported here we used ' 0~6 7 A, ' 1~7 7 A, w 0~5 o , and v 0~1 0 o . We found these values sufficiently small to be close enough to the limiting values in which ' 1 {' 0 ?0, w 0 ?0, v 0 ?0, but in the other hand high enough so that the flexibility of the Lac repressor can be taken into account [27]. All ensembles consisted of 10 8 configurations.
We consider the possibility that when a Lac repressor is bound to both operators it may be either in its V-shape conformation or in an open conformation in which its two dimeric arms are open [46] [12]. To mimic the open conformation we regard the two dimeric arms of the Lac repressor as if they are connected through a spherical joint as shown in Figure 5. Accordingly, each arm is free to rotate (together with the confined lac-repressor head group) about the joint.

Dependence of end-to-end distribution on Lac repressor concentration
For our calculations bearing on the dependence of the distribution of r xy on the concentration, [LacI], of the tetrameric Lac repressor we follow the theory proposed by [19,27]. In the TPM experiments discussed here, the DNA sequence includes two operator sequences (binding sites), O1 [47] and Oid. (Although in vivo looping occurs between operators O1 and O2 or O1 and O3, looping experiments generally utilize the ideal operator Oid,  V˚˚ which binds much more tightly to the Lac repressor. The reason is that a loop with operator O2 or O3 has too brief a lifetime to be observed in a TPM experiment. The use of Oid alters the proportion of looped and unlooped configurations, and hence any interpretation to in vivo experiments must be done with care.) The symmetric operator Oid [48,49] was used to achieve higher binding affinity [50], resulting in increased durations of looping events. The consequent large difference between K 1 and K id permits a simplified calculation of experimental loop probabilities. See e.g., equation 4 in [19]. This simplification has been used very recently in [51].
The binding affinities for these operators are characterized by the dissociation constants K 1 and K id . Since the binding affinity of the ideal operator Oid is significantly higher than that of the operator O1 we have K id vvK 1 . The interaction of such DNA with the Lac repressor can be classified into the following states: The probabilities of all states, based on the analysis suggested in [19] are written here, with a modification accounting for the inclusion of the open loop: Where the factor Q is given by and the number 0ƒfƒ1 expresses the overall balance between the V-shape loops and the extended loops. In terms of the binding protein, f is the equilibrium ratio between its V-shaped conformation to its possible open conformation when bound to the two operators. This modification does not change the partition sum from the expression suggested in [19]. Therefore, Z is given by: WhereĴ J is the average J-factor: Dependence of looping probability on phasing To investigate the effect of changing the phasing between the loop termini, we calculate a canonical ensemble f3 Ã g of configurations obeying three translational constraints but only two angular constraints. This means that the center of one loop terminus is fixed with respect to the other terminus, but that loop terminus is free to rotate about its unit vector d nB 3 , which is fixed in space in accord with the prescribed loop type. This ensemble permits us to analyze the dependence of J-factor on the excess link, DL k , required to bring one loop terminus to a complete agreement with the orientation associated with the specified topology. An example of two configurations in this ensemble is schematically depicted in Figure 6.
Although a specified loop can be formed only with integral values of L k , a canonical ensemble of the type f3 Ã g gives the probability distribution of a loop of a given type for any real value of DL k .
For the calculation of the dependence of the J-factor on DL k , we modify the relation in (23) as follows: The angular volume, V Ã , is given by The probability p X f3 Ã D3g is calculated as the fraction of configurations in ensemble f3g for which the angle, u, between d n B 3 and Q X i3 d n A i is such that 0ƒuƒu 0 . Note that equation (12) implies that the triads d n A i and Q X ij d n A i satisfy the loop end conditions associated with the topological group X . For the probability p X ,DL k f4D3 Ã g we count all configurations in ensemble f3 Ã g with excess link value within the interval (DL k {E 0 =2,DL k zE 0 =2). (See Figure 6).

TPM Experiment
DNA fragments used in the TPM experiments, included the two Lac repressor operators, Oid (AATTGTGAGCGCTCACAATT) and O1 (AATTGTGAGCGGATAACAATT) sequences, spaced 600 or 900 bp apart (center-to-center distance). The protocol for the TPM experiments was similar to those published previously [52,53]. Opposite ends of DNA tethers were labeled with biotin and digoxigenin to link a streptavidin-coated microsphere (bead) of radius 160 nm (Spherotech, Inc., Lake Forest, USA) to an antidigoxigenin-coated coverslip (plate). The motion of beads in 10 mM Tris-HCl pH 7.4, 200 mM KCl, 5% DMSO, 0.1 mM EDTA, 0.2 mM DTT and 0.1 mg/ml a-casein, with varying lac repressor concentrations (1 pM to 200 nM), was observed using differential interference contrast (DIC) microscopy. The experimental setup is shown schematically in Figure 7. The position of beads was tracked in real-time and recorded at 50 Hz with an exposure time of t~1 ms. To remove instrumental drift affecting all beads in each field of view, positions were determined with respect to immobile beads in the same field of view. Asymmetric movement of the tethered bead is the simplest indicator of a bead attached to multiple tethers. To exclude these cases, any bead for which the scatter of observed positions displayed an ellipticity ratio greater than 1.07 was discarded from further analysis [54,55].
The point of attachment was calculated for each time window as the barycenter of the xy scatter that includes all the projected (onto the plate) positions of the bead center measured within the associated time window. For the results reported here, 8 second time windows were used. At each recorded time point t, the excursion i.e., the 2D projected distance between the bead and the point of attachment, r xy (t) was determined using The root mean square of the projected distance, r RMS xy , used as a measure of the excursion, was also averaged over 8 second window, Plots of the excursion with respect to time in Figures 8A and 8B reveal how the length of a single DNA tether changes during the course of the experiment due to the formation and breakdown of 600 bp and 900 bp loops, respectively. In each case the value of r RMS xy is close to one of two levels, the lower corresponding to

Calibration curves
A comparison between simulated and the experimentally determined probability distributions for projected bead centerto-tether end distance r xy for the case of unlooped and proteinfree 1632 bp DNA is shown in Figure 9. The largest differences between the theoretical and experimental distributions can be observed at the tail of the distribution (above 330 nm) where the theoretical prediction overestimates the measured distribution. The RMS value of the distribution (i.e., the quantity reported in Figure 10) is very sensitive to the tail of the distribution, which increases demands on the accuracy of simulation. Another potential issue is that the measured r xy distribution utilizes the formula (32) which underestimates the true value of r xy because it estimates the point of attachment by averaging over 8 s window.
Simulated calibration curves, i.e., curves showing the dependence of RMS value of the projected end-to-end distance between the center of the bead and the tethered end, r RMS xy versus the tether length, are shown in Figure 10 together with experimental curves based on the numerical fit of collected data given in [26]. The results show that our predicted RMS value is greater by about 1*20 nm than the experimental results. Nelson et al. [26] found that lowering the persistence length to a smaller value (43 nm) leads to better agreement between the simulation and the data. However, the account of electrostatic repulsion and excluded volume in our simulations results in higher computed values of RMS than those reported in Nelson et al. [26] for the same persistence length (not shown). The discrepancy between our simulation data and the experiment is most likely due to discrepancy between the real experimental electrostatic screening and the value of ionic strength we assumed in the simulations. Other sources of discrepancy could be effects not accounted for in the simulation, such as the intrinsic curvature of the DNA tethers. The discrepancy decreases with the bead size and this decrease is more significant for short tethers. This suggests that the observed reduction of the in-plane motion of the bead could also be related to some form of attractive DNA-bead interaction, such as that caused by the hydrodynamic effects discussed in [56]. All of the discrepancy sources mentioned above are systemic and unlikely to affect relative positions of RMS value nor the ratios of J-factors associated with different loop types studied in the next section.

Dependence of radial distribution on Lac repressor concentration
Our simulations were focused on three cases for which data were available to us: A. 900 bp DNA molecule with the centers of the binding sites, O1 and Oid, located at n A~4 38, and n B~7 65 yielding 326 bp Lac repressor-induced loops, B. 1632 bp DNA molecule with the centers of the binding sites located at n A~4 44, and n B~1 044 yielding 600 bp Lac repressor-induced loops, and C. 1632 bp DNA molecule with the centers of the binding sites located at n A~4 44, and n B~1 344 yielding 900 bp Lac repressor-induced loops.
The case A mimics the molecule investigated experimentally in [19], while the cases B and C correspond to molecules studied in our lab. To match the experimental setup, for the simulation of the 900 bp molecule we assumed a bead radius of 245 nm, while for the 1632 bp molecules the bead radius was taken to be 160 nm. Figure 11 shows computed distributions of the projected end-toend distance, r xy , for DNA tethers with V-shaped loops of type A1, A2, P1, or P2 and with extended Lac repressor loops. In addition, the figure shows calculated distributions of r xy for tethers in which loop is not formed but one or both of the two binding  Table 1. We found that, in each case, antiparallel loop type distributions have the smallest RMS values, followed by the parallel loops and the open loop. Not surprisingly, the largest RMS values were found for unlooped configurations, with RMS value increasing with the decreasing number of bound Lac repressor molecules in all three cases. The antiparallel loops have the smallest RMS distance because the angle between DNA exiting and entering the loop is about 120 degrees, while in parallel loops this angle is about 30 degress (see Figure 4). The variability of RMS distances for looped configurations is about 20 nm for case 900 bp DNA with 326 bp loop, less in the other cases, which is roughly 12% of the RMS distance of free DNA, large enough to be observable by TPM experiment.
Although the results in Figure 11 contain the complete information about the distribution r xy for looped DNA, these probability distributions cannot be compared directly to experimental TPM results as they are usually reported. In TPM experiments, the projected position of the bead is recorded in a rate of about 30-50 frames per second and reported as the distribution of the root mean square value r RMS,s xy (t) averaged over frames taken within a time window of s seconds, as in equation (33). In the TPM experiments reported in [19], s~4 s and hence, each RMS value is calculated from about 120 consecutive values of r xy (t).  Figure 12. In addition to the windowing, it was reported in [19,27] that their TPM experiments were underestimating the true RMS value due to the blurring of the image of the bead caused by a long exposure time (t~30:8 ms) in a single frame taken in the TPM experiments. For case A, which is to be compared with experimental results of [19] we took the blurring into account in computing the distribution r RMS,120 xy . For our experiments the exposure time is t~1 ms, which eliminates blurring effect. (See the discussion in the supplementary information provided in [27]). With this transformation the separation of different loop types is much more apparent. Note that in each case the distribution corresponding to open Lac repressor conformation is wider than distributions corresponding to V-shaped loop types, due to the less stringent constraints on the bound loop resulting from the flexibility of the protein.
Simultaneously with calculation of the distributions we computed the J-factors for each loop type using the scheme discussed in the subsection on the statistical mechanics. The values of J-factors for different loop types are reported in Table 2.
Once all simulations were concluded, we used the calculated values for the J-factors, Equations 26, and assumed values of the binding constants K 1 , K id , and the Lac repressor opening ratio f to predict the joint distribution of r RMS xy as a function of the concentration [LacI] of the Lac repressor. We then optimized the K 1 , K id , and f to obtain the closest fit between the relative occupancies of various looped states in our theoretical prediction and the corresponding values obtained experimentally. The optimal values of the parameters K 1 , K id , and f, for all three cases are given in Table 2.
Resulting optimized distribution for case A is shown in Figure 13, which is to be compared with Figure 3 of [19]. The fitted distribution was able to recover qualitative and quantitative features of the experimental distribution, namely, the decomposition of the distribution into one double-peaked component corresponding to a mixture of looped molecules and one singlepeaked component corresponding to a mixture of unlooped states. In the double-peaked component, the peak with smaller RMS peak value corresponds to a mixture of loops of types A1 and A2, while the peak with larger RMS corresponds to a mixture of loops of types P1, P2 and the open Lac repressor loop. With increasing Lac repressor concentration the percentage of looped molecules increases until about 10 pM of Lac repressor and then decreases until it vanishes above 100 nM of Lac repressor. This same behavior is seen in the experiments. The distance of the peaks in the double-peaked component is about 10 nm, which is smaller than 25 nm seen in the experimental data. The distance between the outermost peaks in the theoretical distribution is about 58 nm, which is identical to the experimental distribution. The RMS values of peaks in the theoretical distribution are about 20 nm higher than in the experimental, in accord with the discrepancy observed for the calibration curve. In summary, our theoretical approach correctly predicts the number and relative heights of peaks in the RMS distribution, but not their absolute positions.
In a similar fashion we fitted the theoretical predictions for cases B and C to our TPM results for the two 1632 bp molecules. We have estimated the values of parameters K 1 , K id (assuming they are identical for the two cases since the experiments were performed in identical conditions), and the parameter f for each case. The resulting computed joint distributions for various concentrations of Lac repressor are shown in Figure 14   The key component contributing to the occupancy of the middle peak of the distribution in the cases A and C is the loop with extended Lac repressor. Equation (26) implies that the ratio R of occupancies of the middle peak (consisting of loop types P1, P2, and Open) and lower peak (consisting of loop types A1 and A2) is independent of the binding constants K 1 , K id and the concentra-tion [LacI] of the Lac repressor, and is given by Note that if the Open loops (with extended Lac repressor) were absent from the ensemble (i.e., if f~1 in the above equation) then, in view of the J-factors reported in Table 2, in the case A,   R~0:27, which is smaller than the average value 0.46 observed in Han et al. [19]. In the case C, the computed value R~0:19 for f~1 is also significantly smaller than the value 1.09 estimated from experimental distribution in Figure 14, case C. Therefore, in the absence of extended Lac repressor loops the occupancy of the middle peak in the distributions would have to be substantially lower than what the data show.

Dependence of radial distribution on phasing
The J-factor of a DNA loop is very sensitive to phasing, i.e., the amount of excess link trapped in the loop upon closure. This sensitivity can be explored by changing the loop length by 1 bp at a time -such a change has little effect on the loop length but a large effect on excess link due to the intrinsic helicity of DNA which has a period of about 10.5 bp [57,58,59]. TPM experiments of this type were performed by Han et al. [19] with DNA   (26)) as described in the text. For the case A we used experimental results shown in Figure 3 of [19]. doi:10.1371/journal.pone.0092475.t002 Figure 13. The predicted joint distribution of RMS projected distances as a function of the concentration of Lac repressor for the 900 bp DNA. The predicted joint distribution of r RMS xy was calculated from the probability density functions given in Figure 12A, and our calculated looping J-factors. Here the bead radius is 245 nm, the DNA length is 900 base pairs, the centers of the binding sites are located at n 1~4 38, and n 2~7 65 yielding 326 bp Lac repressor-induced loops. doi:10.1371/journal.pone.0092475.g013 Figure 14. The predicted joint distribution of RMS projected distances as a function of the concentration of Lac repressor for the two cases of the 1632 bp DNA. Computer simulation and Experimental TPM results showing a DNA molecule of 1632 bp with binding sites centered at (Case B) n 1~4 44, and n 2~1 044 yielding 600 bp Lac repressor-induced loops; and (Case C) n 1~4 44, and n 2~1 344 yielding 900 bp Lac repressor-induced loops. The bead radius for both cases is 160 nm. The figure shows the joint distribution of RMS projected distance between the center of the bead attached to the end for which n~1632 to the DNA end that is attached to the plate. Each single RMS value in the experimental results (blue) was calculated using a time window of 8 s. The simulation results (red) were calculated from the probability density functions given in Figures 12B, and 12C and our calculated looping J-factors (See Table 2 constructs of length 900-903 bp containing O1 and Oid sites separated by any chosen distance in the range 300-310 bp. Each change in the loop length by 1 bp results in a change in the excess link of the loop of about 1/10 of a full turn. The change in the looping probabilities is clearly visible in the experimental joint distribution (Figure 7 of [19].) We here model the effect of changing the excess link of a loop by using the approach described in Methods section. In particular, we constrain the position and tangent orientation of one of the bound DNA operators but allow it to rotate around the tangent. The angle of rotation divided by 2p describes the change of the linking number (or, correspondingly, the excess link) of the loop. Although a specified loop can form only with integral values of L k a canonical ensemble of the type f3 Ã g gives, for a given loop type, the probability distribution of DL k from which the most likely linking number for that loop type can be deduced. Our simulation results of the distributions for each loop type of the 900 bp molecule are shown in Figure 15. Using equation (30), the probability density function of L k (Figure 15), and the probability density of the projected end-to-end distance ( Figure 12A) we computed the RMS distributions corresponding to excess link values shown in Figure 16. In this figure one can see the variability in the proportion of peak heights in the portion of the distribution corresponding to looped configurations. The antiparallel loops A1 and A2, comprising the left-most peak, are of the highest percentage when the linking number differs by about half a turn from an integer value, which is in agreement with the experimental results reported in [19].

Conclusions
We have introduced a novel numerical scheme for accurate statistical mechanical simulation of constrained DNA molecules, and used that scheme to compute the RMS distributions reported in TPM studies of DNA looping induced by the Lac repressor. Our modeling scheme enhances the interpretation of TPM experiments by associating geometrical and topological information to the peaks observed in the experimental TPM distribution of RMS bead-to-attachment site distance. Our results strongly suggest that the looped states observed in the experimental TPM distribution are, in fact, composed of five distinct looping states. The peak with lower RMS value corresponds to the antiparallel types A1 and A2 while the peak with higher RMS value corresponds to the parallel types P1 and P2 and a looped state in which Lac repressor attains an open, flexible conformation. The main reason that necessitates the inclusion of the open state is that without it, the parallel types P1 and P2 do not occur with high enough probability to explain the observed area of the peak.
Besides the structure of the looped states, we have used the model to estimate the Lac repressor-DNA operator dissociation constants. The ranges of our estimates, 0:86*1:00 nM for K 1 and 1:47*5:47 pM for K id compare well with the ranges obtained by Han et al [19] by a similar technique (0:49+0:45 nM for K 1 and 0:2+2:3 pM for K id ). Our estimated range for K id is within the range of values measured using binding assays (2:4*8:3 pM [60]) while our estimates for K 1 are about 50 times higher than bulk binding assay results (10*22 pM [60]) which would correspond to 50-fold weaker binding. It is important to note that no DNA looping occurs in such binding assay experiments and therefore the conditions for DNA-protein binding are close to ideal. In the TPM experiments, on the other hand, the DNA deformation that leads to the formation of loops may also result in local deformation of the DNA binding site for Lac repressor which would lower the DNA-protein binding affinity (increase the dissociation constant). Such an effect is difficult to quantify as we do not possess at the moment an accurate theory of the dependence of DNA-protein binding affinity on local DNA deformation.
We have generated an arsenal of large canonical ensembles of configurations confined to an appropriate set of geometric constraints, and use this arsenal to calculate the looping J-factors characterizing looping probabilities as the product of conditional probabilities in a manner similar to that utilized in [24]. In contrast to the approach in which all the conditional probabilities are taken from a small fraction of a single canonical ensemble calculated using Gaussian sampling method [24,27,19], in our present scheme each of the conditional probabilities is deduced from an adequate ensemble that is generated separately.
In our numerical studies we calculate canonical distributions of constrained DNA molecules. In contrast to other simulations of looped DNA such as those reported in [19,27] in which the fluctuations of the looped subsegment are neglected by either treating it as rigid or as a single extended step, in our approach fluctuations of the loop itself are taken into account. We believe that because of the loop-DNA steric effects, and the fact that all other intramolecular excluded volume effects and electrostatic interactions are taken into account, our results overestimate the mean (or RMS) end-to-end distances when compared to either experimental results, or other simulations in which some or all intramolecular interactions are not taken into account. This may suggest, that not only the bead volume exclusion effects that were analyzed in [61] and were taken into account in this work, are important, but also hydrodynamic effects [56] and possible steric attraction effects between the large bead and the DNA molecule may play a significant role and should be treated carefully in future models. Nonetheless, our results show qualitative agreement with both our experimental data and the results reported in [19] and support the existence of extended conformation of the Lac repressor. , is shown as a function of the excess link, DL k for the 900 bp DNA. The distribution was calculated from the probability density functions given in Figures 12A and 15, and the looping J-factors. For each value of DL k all configurations with values of L k zDL k differing by an integral number were taken into account. doi:10.1371/journal.pone.0092475.g016 Our simulations suggests that the looping probabilities for the anti-parallel loop types are 4-and 6-fold higher than those of the parallel loop types, even for loops of length 600 and 900 base pairs. This is somewhat surprising, as one expects the orientations of the two loop termini to be uncorrelated for loops significantly longer than the persistence length of the DNA. However, we believe, this is a direct result of the TPM experimental design. Because the total length of the DNA is kept unchanged in our experiments, a longer loop makes the upstream operator closer to the comparatively large bead. Consequently, the orientation of that operator is more correlated with the, very restricted, position of the bead. This makes the anti-parallel loops entropically favored.