Is the Kinetoplast DNA a Percolating Network of Linked Rings at its Critical Point?

In this work we present a computational study of the Kinetoplast genome, modelled as a large number of semiflexible unknotted loops, which are allowed to link with each other. As the DNA density increases, the systems shows a percolation transition between a gas of unlinked rings and a network of linked loops which spans the whole system. Close to the percolation transition, we find that the mean valency of the network, i.e. the average number of loops which are linked to any one loop, is around 3 as found experimentally for the Kinetoplast DNA. Even more importantly, by simulating the digestion of the network by a restriction enzyme, we show that the distribution of oligomers, i.e. structures formed by a few loops which remain linked after digestion, quantitatively matches experimental data obtained from gel electrophoresis, provided that the density is, once again, close to the percolation transition. With respect to previous work, our analysis builds on a reduced number of assumptions, yet can still fully explain the experimental data. Our findings suggest that the Kinetoplast DNA can be viewed as a network of linked loops positioned very close to the percolation transition, and we discuss the possible biological implications of this remarkable fact.


INTRODUCTION
A kinetoplast (1) is a network of linked DNA loops commonly found in a group of unicellular eukaryotic organisms of the class Kinetoplastida. Some of these organisms are responsible for important diseases such as sleeping sickness and leishmaniasis (2,3,4). The kinetoplast DNA (kDNA) is known for its unique structure. Thousands of short (1−2.5 kbp) DNA loops are interlinked, forming a spanning network that fills the mitochondria. The short loops, or mini-circles, are also linked with few large circles, or maxi-circles, consisting of around 30−50 kbp (5). The loops are found to be in a relaxed state, i.e. they are not supercoiled, contrarily to * To whom correspondence should be addressed. Email: d.michieletto@warwick.ac.uk DNA loops in other similar organisms. C. fasciculata minicircles are confined by the mitochondrial membrane in a disk which measures 1 µm in diameter and is 0.4 µm thick. The concentration has been found to be 50 mg/ml (5), similar to that found in bacteria (20 mg/ml) but far smaller than the one inside the head of a T4 bacteriophage (800 mg/ml or more) (6), meaning that the loops are overlapping but there is considerable space between DNA strands (5). Previous findings strongly suggest that the loops in the network are linked once with their neighbours, and that the valence of each rings, i.e. the number of neighbours, is around 3. In other organisms of the same class, i.e. L. tarentolae, the valence number is smaller, probably due to their different DNA concentration. During replication, catenation between the loops introduces a nontrivial topological problem, which is solved as follows. First Topoisomerase II disentangles one loop at a time from the network, the loop then undergoes duplication in a complex nearby, and later on it links again to the periphery of the network, together with the progeny minicircles (7,8). At this stage, each circle has a valence which is higher than 3: again, most likely due to the increase in density following DNA synthesis. Finally, when the cell divides, two copies of the network are produced and the valence number is brought back to 3. The topology of the kinetoplast DNA network is unique in its own kind and has been studied in the past with experiments and highly simplified models (9, 10), but a full understanding of its role, origin and replication continues to represent a challenge for the scientific community (11,12,13).
Here we propose a simple model where phantom rings are confined to move inside a box of linear size L, which we vary in order to simulate varying values of confinement experienced by the kinetoplast network inside the mitochondrial membrane. By computing the Gauss linking number between pairs of rings, we analyse the topological constraints experienced by the rings within the system. The model kinetoplast DNA can be naturally represented as a network, by mapping rings to nodes and links between two rings to edges. Our findings suggest the existence of a critical c XXXX The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. density ρ p at which the system percolates, i.e. when the size of the giant connected component scales with the number of nodes in the network. A percolating network can be viewed as a state in which all the rings are mutually inter-locked to form an extended collection of inseparable rings, which correspond to the state of the kinetoplast DNA. We also further study the topology of the network by simulating its digestion, which is realised experimentally, for instance, by adding nucleases or restriction enzymes that cut the DNA, to the solution containing the mini-circles. Remarkably, the simulated digestion provides results which are quantitatively comparable with experiments (14), and give new insight on the origin of the network.

MODEL
We model the kinetoplast genome as N = 50 DNA rings, each of which is a worm-like polymers made of M = 128 beads of size σ and with persistence length l p = 20 σ. A Lennard-Jones potential is used to model steric interaction between beads belonging to the same chain, so that we ensure that the rings do not get knotted and that they assume self-avoiding configurations. In physical units, σ 2.5 nm is the hydrated diameter of double-stranded DNA, l p 50 nm, while the contour length of each of the loops is L c = 128 σ = 320 nm 1 kbp. The network is enclosed in a cube of size L×L×L, with L between 150 and 80 (in units of σ).
We sample different network configurations by letting the rings thermalise with no steric interaction between different rings (i.e. rings are invisible to each other during equilibration), for at least the time taken for a ring to travel its own gyration radius, i.e. τ R = R 2 g /D CM . After this interval, we turn a soft repulsion on, which acts on every pair of beads distant r to each other and which we model as: πr r c with A(x,y;t) a ramp function which brings A(x,y;t) from x to y in t timesteps and r c = 2 1/6 σ. After this operation, which ensures that no contours are overlapping, we compute the pairwise Gauss linking number of any two rings, defined as: where γ i and γ j are the contours of the two rings, and r i and r j the respective spatial coordinates (15,16). This is a topological invariant which describes the pairwise state of rings, as far as we forbid two rings to pass through each other. Once the linking number has been measured we turn the soft potential off and repeat the procedure in order to obtain an ensemble of (independent) configurations for a given density ρ. This ensemble should be interpreted as an ensemble over different systems rather than an ensemble over time, since in reality, the mini-circles are not allowed to cross through each other without the presence of an external agent. For each configuration in the ensemble, we generate a corresponding network representation by assigning an (undirected) edge between each two rings which have Lk(i,j) = 0. This maps the system of linked rings to an undirected network, whose properties are directly related to the properties of the system of linked rings. Note that, because this procedure is based on the pairwise linking number, it would classify Borromean and Brunnian links as unlinked; we expect such nontrivial links to be rare within the kinetoplast network, where a good approximation is that each minicircle is linked identically and once to its neighbours (14).
Our main control parameter is the size of the confining container, L, which we modify to vary the density ρ, which determines the physical properties of the resulting network. The overlapping (number) density ρ * , at which rings start to feel each other, can be estimated as where R g has been measured from relaxed rings in sparse solution 1 . To convert to a biologically realistic value, we may assume that the volume occupied by each bead is that of a cylinder of size and height equal to σ, which leads to a volume fraction φ ∼ 0.60% occupied by the DNA, or equivalently a concentration of ∼ 8.1 mg/ml (calculated with a DNA density of 1.35 g/cm 3 (17)).

RESULTS
In Fig. 2(a) we show the size of the giant (i.e. the largest) connected component, |GCC|, and the first Betti number of the graph G, b 1 (G) divided by the size of the system N . The latter quantity is defined as b 1 (G) ≡ N CC −|V|+|E|, where N CC is the number of connected components and |V| and |E| the size of the sets of vertices and edges, respectively. This quantity equals the number of closed subgraphs in the network, which is also the total number of cyclic d-mers (18). For mostly unconnected graphs, b 1 (G) 0, while for nearly fully connected graphs: |E| N (N −1)/2 and hence b 1 (G) N 2 /2 for large N . An increase in b 1 (G) signifies an increase in network connectivity. Fig. 2(a) therefore suggests that a percolating component emerges in our simulations at values ρ p ρ * , meaning that a connected component spanning the entire network starts to emerge just before the overlapping regime. At this value ρ p ∼ 0.0064 σ −3 , the corresponding valence (see Fig. 2(b)) is around 3, which is compatible with experimental findings (14).
In 2(b) the valence, or mean vertex degree k is shown together with the linking probability p Lk as a function of the density ρ. The figure shows that the average degree k scales linearly with the density ρ, according to previous works (10). By assuming that the network configurations are sampled from an ensemble of random graphs, the linking probability p Lk can be calculated as This hypothesis is confirmed by Fig. 2(d), where we plot the degree distribution p(k) from simulations (data points) and the theoretical prediction obtained by assuming the distribution is the one of a random graph, . One can notice that the theoretical prediction fits the data points well; in other words, the system can be thought of as a random graph with linking probability p Lk which is directly proportional to the valence of the rings and inversely proportional to the number of rings in the box. We also show the distribution of the linking number p(Lk) in Fig. 2(c). This is found to be peaked at zero for any density ρ. This is due to the fact that the network in our simulations is never fully connected, i.e. there are always more pairs of rings which are unlinked than pairs which are linked. The "shoulders" of the distribution at Lk = ±1 increase with ρ. Values of |Lk| > 1 are very unlikely (< 10 −3 ). This is presumably because the rings are relatively stiff (L c /l p ∼ 6.4, so the ring contour length L c is only ∼ 3.2 Kuhn lengths). This is in agreement with experimental findings, that in the kinetoplast DNA every mini-circle is linked only once with their neighbours. Notice that also in that case, the mini-circles length lays between 3 and 8 Khun lengths (5). For simplicity, we always assign a single edge between a pair of nodes even if they have an higher linking number. Because they are so rare, they represents a small fraction of all the links, which can be neglected. The mean linking number Lk is zero within errors, as must be the case as configurations with Lk = −1 are as likely as ones with Lk = +1. In Fig. 1(c)-(l) we show all the possible sub-graphs with three and four nodes, up to symmetries. We call these patterns "motifs". Every connected graph formed by three and four nodes is isomorphic to those in Fig. 1(c)-(d) and Fig. 1(e)-(l), respectively. In order to count the number of motifs of each type, we consider every connected sub-graph with given number of nodes and check whether it is isomorphic to one of the motifs shown in Fig. 1. The results are shown in Fig. 3. In agreement with the findings of Chen et al. (14), we find that linear trimers (motif 1(d)) are roughly 10 times more frequent than cyclic ones (motif (1c)), for any density ρ. Similarly, linear tetramers, are three times more common than star tetramers (motif 1(f)) and semi-cyclic tetramers (motif 1(g)). Fully cyclic tetramers are very uncommon, again in agreement with previous experimental work (14).
To further quantitatively compare the properties and structure of the random network of linked confined loops found in simulations to those of the kinetoplast DNA, and inspired by the common biological procedure known as "digestion", we simulate the presence in solution of restriction enzymes, i.e. enzymes which are able to cut DNA strands. In this way we can simulate the random breakage of the network due to a concentration of restriction enzymes which can cut the network, and which was used in Ref. (14) to further study the network topology experimentally. To model digestion in silico, we associate a probability p to each bead composing the rings to be removed. Such probability is related to the concentration of restriction enzymes in the solution and time left to act on DNA mini-circles. The equivalent probability p r of a ring to become linearised, e.g. by removing one or more of its beads, is where M is the number of beads composing the rings. Naturally, for probability p = p c = 1/M , every ring has been cut, on average, once, and therefore there are no longer closed rings in the system. This procedure maps to the network representation as we can assign the same probability p r to each of the nodes, and with probability p r we remove a node from the network. In practice, we consider an ensemble of (independent) configurations from the Molecular Dynamics simulation and for each one we simulate 10 digestions by removing nodes at random with probability p r . The average number of removed nodes is n r = p r N . At the end of the (partial) digestions we measure the fraction of monomers (single uncatenated rings), dimers (two catenated rings) and trimers (three catenated rings) obtained from the digested network. These quantities can be obtained by running a high resolution gel electrophoresis test on the samples, as in (14). The relative fraction of monomers, dimers, trimers etc,  correlates directly with the intensity of the bands, as these oligomers move with different velocity in the gel. In Fig. 4 we report our findings for different values of density ρ, as a function of the linearised fraction of mini-circles, l = n r (to ease comparison with the data in Ref. (14)). From the results in Fig. 4, one can notice that, for all parameter values studied here, even after half of the nodes have been removed, one can still observe some large, or percolating, clusters. In other words, the network shows high resistance against random breakage. When viewed as a property of the biological kinetoplast genome, this appears to be functionally relevant: the DNA network needs to remain intact either when some of the mini-circles are removed, e.g. by Topoisomerase II, either accidentally during the cell cycle, or during replication when decatenation is required.
The distribution of the fraction of monomers Φ M , dimers Φ D and trimers Φ T show peaks as a function of l, whose locations depend on the density. In generalρ x = {ρ|Φ x (ρ) = max(Φ x )} increases with density, meaning that the denser the system, the more it has to be digested before the probability of observing monomers, dimers or trimers, rises and becomes sizeable. For fixed density, the peaks show that trimers are best produced at lower l than dimers, and dimers at lower l than monomers; this is expected as increasing l should increase the probability of finding smaller and smaller catenanes. We find that the best range of l within which gel electrophoresis of oligomers can give information on the network structure depends on the density ρ. For the highest density studied here, the fraction of linearised mini-circles has to be close to 80%, while for ρ ∼ 0.0064 σ −3 the value of l can lay between 30% and 80%, after which the survival fraction of dimers and trimers start to decrease. This range is very similar to the one observed in (14). Even more strikingly, in Fig. 4 we superimpose the data from (14), and observe again a very good quantitative agreement with the curve for ρ = 0.0064 σ −3we recall that ρ = 0.0064 σ −3 also leads to k 3 as inferred from the experiments. Together with it, the observation in Fig. 4 strongly suggest that the kinetoplast DNA network structure is that of a network of connected links close to the percolation transition.

DISCUSSION AND CONCLUSIONS
In summary, we studied the statistical physics of a percolating cluster of linked rings, by confining rings (invisible to one another) in a box and varying the density. The onset of the percolation occurs at concentrations ρ p ∼ 0.0064 σ −3 slightly lower than the overlapping value ρ * ∼ 0.0076 σ −3 at which two neighbouring rings "feel" each other's presence. At this density, the mean valence of the nodes is around 3, which is compatible with the findings in the kinetoplast DNA. Importantly, at the same density value, we compared the results from an in silico digestion of the network by a restriction enzyme, finding very good quantitative agreement with the experimental data found in (14). These results strongly suggest that the kinetoplast is well represented by this model at density ρ ∼ ρ p , i.e. by a network of linked rings close to the critical point.
Our findings also suggest that the density of DNA loops in the kinetoplast networks should not be too far from the overlap density. Taking a typical case with N = 5000 loops of say 1 kbp each, we find that the overlap density is ∼ 8.1 mg/ml; the density of the same network within a mitochondion of volume ∼ 1 µm 3 volume is about 5.43 mg/ml, which fits very well with our simulations. The DNA structure in C. fasciculata, which is well studied, has larger loops (2.5 kbp), and larger density (∼ 50 mg/ml), but this is achieved by further compaction by histone-like proteins, hence does not reflect purely geometric confinement. Furthermore, even if the density is larger than the overlap density, a network could still exist close to the percolation transition if the activity of Topoisomerase II, which allows catenation and is tacitly assumed by our model as loops are invisible to each other, is limited, for instance by the enzymatic concentration.
Being close to the percolation transition may well provide an evolutionary advantage for the kinetoplast DNA network, as this structure may be favoured over a more heavily connected network, as it facilitates the decatenation during replication, but at the same time ensures that mini-cirlces are not released by mistake. Another property of the kinetoplastlike network is that it is very resistent to digestion by a restriction enzyme, in the sense that digestion has to proceed significantly before large clusters disappear (Fig. 4). This feature again appears to be functionally relevant, as it provides a way to preserve genetic material against random breakage and replication mistakes.