Insights into the non-homologous end joining pathway and double strand break end mobility provided by mechanistic in silico modelling

After radiation exposure, one of the critical processes for cellular survival is the repair of DNA double strand breaks. The pathways involved in this response are complex in nature and involve many individual steps that act across different time scales, all of which combine to produce an overall behaviour. It is therefore experimentally challenging to unambiguously determine the mechanisms involved and how they interact whilst maintaining strict control of all confounding variables. In silico methods can provide further insight into results produced by focused experimental investigations through testing of the hypotheses generated. Such computational testing can asses competing hypotheses by investigating their effects across all time scales concurrently, highlighting areas where further experimental work can have the most significance. We describe the construction of a mechanistic model by combination of several hypothesised mechanisms reported in the literature and supported by experiment. Compatibility of these mechanisms was tested by fitting simulation to results reported in the literature. To avoid over-fitting, we used an approach of sequentially testing individual mechanisms within this pathway. We demonstrate that using this approach the model is capable of reproducing published protein kinetics and overall repair trends. This provides evidence supporting the feasibility of the proposed mechanisms and revealed how they interact to produce an overall behaviour. Furthermore, we show that the assumed motion of individual double strand break ends plays a crucial role in determining overall system behaviour.


Introduction
In the context of radiotherapy, one of the most important determinants of cell fate is the creation and subsequent attempts at repair of double strand breaks (DSBs). This radiobiological process not only has a role in determining cell fate through triggering various cell death pathways [1], but in recent literature has also been implicated in initiating an immune response through the cGAS-STING pathway [2]. Furthermore, genetic defects in the DSB repair pathways of patients can confer pronounced radiosensitivity [3,4], highlighting the importance of these processes in determining response to radiotherapy. DNA DSBs have multiple pathways of repair, the most dominant being Non-Homologous End Joining (NHEJ) [5] and Homologous Recombination (HR) [6]. The NHEJ pathway is available throughout the cell cycle, and is the only pathway available outside of the replication phases [7,8].
However, the complexity of biological systems, even when reduced to a single pathway, presents a significant challenge to in vitro/in vivo experiments when attempting to clearly and unambiguously define mechanisms responsible for DNA repair. This has resulted in a number of publications by different groups that come to contradictory conclusions about mechanisms within the NHEJ pathway, each of which are supported by the evidence observed experimentally. For example, Reid Computational models derived from experimental results are an effective tool for the disambiguation of potential underlying mechanisms (for a recent review of the field see McMahon and Prise [11]). Models which explicitly describe the recruitment of individual repair proteins, such as those by Cucinotta et al. [12] and Taleei and Nikjoo [13], add the opportunity for additional constraints. The modelled mechanisms related to these individual proteins can be directly compared to observed results from experiments which isolate the behaviour of these proteins. Furthermore, by encoding into a larger system these individually hypothesised mechanisms, they can be tested to be more or less probable based on the capability of the whole system to reproduce overall behaviours across various endpoints.
Alongside unrepaired DSBs, misrepaired DSBs are also of importance for determining cell fate [14]. Models that have implemented mechanisms attempting to describe misrepair have shown good correlation to biological endpoints such as chromosome aberrations [15] and cell fate [16,17]. Such models often propose that the proximity, or clustering, of breaks plays an important role in determining misrepair events [15,18]. Monte Carlo track structure simulations provide detailed descriptions of radiation interactions within matter [19]. Specific geometric distributions of DSBs can be obtained by applying this technique to a model of the cell nucleus and clustering the interaction points together [20][21][22]. Using such models as input therefore makes it possible to investigate how changes in the proximity [23] and complexity of breaks [24] affect the repair machinery.
The motion that DSBs undergo play a significant role in this misrepair. Several approaches exist to incorporating this into models of DNA repair, ranging from parametrising it as an interaction range [18] to explicitly modelling the motion [25,26]. Both approaches show good agreement with experimental results. Explicitly modelling motion disentangles it from protein kinetics, enabling separate exploration of the underlying mechanisms, but increases the necessity for accurate modelling. It has been experimentally demonstrated that DSBs move by subdiffusion [27][28][29]. Additionally, although not explicitly concluded, there is experimental evidence that this applies to the individual chromatin ends of a DSB [30]. A detailed description of sub-diffusive motion and anomalous diffusion in general are beyond the scope of this manuscript (see instead Metzler et al. [31,32]). However, there are two important implications relevant to this work. Firstly, sub-diffusive motion can arise from distinct physical mechanisms operating on the micro-/nanoscopic scale [33][34][35] which will by necessity impact the operation of the repair mechanisms. Secondly, this motion encourages interaction of nearby objects whilst strongly discouraging long range interactions [36,27].
The PARTRAC code developed by Friedland et al. [25] is a mechanistic model which explicitly describes the recruitment of individual proteins and uses radiation track structure simulations to predict DSB formation. In the interest of simplicity, the PARTRAC code has modelled DSB end motion as semi-free diffusion, rather than explicitly modelling sub-diffusion, which they report to be capable of reproducing sub-diffusive behaviour [26]. Implementing a more detailed model of sub-diffusive motion would provide scope for testing which of the different microscopic mechanisms of sub-diffusion governs DSB end motion. Currently there are contradictory conclusions in the literature; Girst et al. [27] reports that a continuous time random walk (CTRW) description fits their experimental data best while Lucas et al. [28] reports that their experimental data shows hallmarks of fractional Langevin motion.
Combining discrete, stochastic, mechanistic models using DNA damage derived from detailed track structure simulations with genuine sub-diffusive motion will provide a comprehensive model for studying a broad range of radiation doses and qualities. In this manner the relationships between various biological and physical mechanisms, and how they determine the overall behaviour of the repair pathways, can be made clearer through the combination of experimental and computational biology. Thorough investigation of the emergent behaviour of such a system gives insight into the operation of the cellular repair machinery and subsequently into the experimentally observed variations in radiosensitivity. This approach sets these types of models apart from phenomenological models. Clinically, observed variations in radiosensitivity is important for determining the Relative Biological Effectiveness (RBE) of the different radiation modalities used; RBE simply being defined as the ratio of iso-effective doses between these radiations. RBE itself is necessary as a conversion factor for translating the wealth of clinical experience in conventional X-ray radiotherapy to newer treatment modalities such as proton beam therapy.
In this work we have identified several publications in the field that, based on their experimental results, propose biological mechanisms for different stages in the canonical NHEJ (c-NHEJ) pathway. We then implement each of these specific biological mechanisms as a specific simulation mechanism within a Monte Carlo based model. The individual simulation mechanisms are then combined to build a model of the c-NHEJ pathway as a whole. The DNA Mechanistic Repair Simulator (DaMaRiS) was developed for this work, which is a mechanistic, Monte Carlo based framework that was developed within the Geant4-DNA tool-kit [20,37,38] and allows for event-by-event tracking of individual DSB ends. Initial conditions for the model were set from interfacing with a nano-dosimetric DNA damage model [22,21], directly linking radiation track structure to biological effect [39]. Interfacing between the two codes was achieved through the new standard DNA damage (SDD) data format [40]. The model produces as its output predictions of biologically measurable end points, including repair protein recruitment kinetics and DSB repair kinetics, which are compared to published experimental data to assess the feasibility of the combined mechanisms.

Materials and methods
Only the salient points of the c-NHEJ pathway construction and implementation of DSB motion are expanded upon here. For more detailed descriptions of how the model has been constructed inside Geant4-DNA see Supporting Material.

Implementation of the non-homologous end joining process
Time evolution of individual DSB ends is governed by a series of time constant based state changes (see Appendix S.3 in Supporting Material). The states are linked according to the scheme outlined in Fig. 1. States A-F represent the progression through the c-NHEJ pathway; boxes A-C and I represent single DSB ends and boxes D-F represent two DSB ends, either in a synaptic complex or fully ligated. Dashed boxes are processes which can affect states of DSB objects. The excellent work done by Friedland et al. [25] proposes a general scheme for c-NHEJ. This work differs in its description of the pathway by inclusion of further, experimentally supported mechanisms published in the literature.
State A represents an exposed DSB end which may have associated single strand breaks and base lesions (see Appendix S.1). Recruitment of the Ku70/80 heterodimer is widely reported in literature to be the first step in the c-NHEJ pathway [5,8,43] and occurs in both heterochromatin and euchromatin [44] (state B). Once recruited, a dynamic equilibrium between recruitment and dissociation of Ku70/80 is established at sites of DSBs in cells. The model assumes that in isolation this equilibrium is sufficiently biased towards recruitment such that dissociation can be neglected. When Ku70/80 is phosphorylated this equilibrium is shifted towards favouring dissociation (mechanism proposed by Lee et al. [41]). DSB ends may alternatively become temporarily inhibited towards recruitment of Ku70/80 (state I). The inhibition is a catch all state which represents competition from other proteins with affinities for exposed DSB ends. Following attachment to a DSB end, Ku70/80 recruits the DNA-Protein Kinase catalytic subunit (DNA-PKcs) to form the DNA-PK complex (state C). Similarly to Ku70/ 80, the model assumes that DNA-PKcs does not dissociate from DNA ends until phosphorylated (mechanism proposed by Uematsu et al. [42]). DNA-PK is required for the formation of synaptic complexes between proximal DSB ends [10] (state D, see Appendices S.2 and S.4). Once in a synaptic complex DNA-PKcs autophosphorylates, inducing conformational changes necessary for further processing and allowing it to phosphorylate Ku70/80 [41,42]. This initial "long range" synaptic complex is prone to dissociation (mechanism proposed by Graham et al. [10]) and due to the phosphorylated state of both Ku and DNA-PKcs the entire complex dissociates to state A [41,42] (process Y, see Appendix S.4). If the complex survives, progression to a stable "short range" synaptic complex follows (state E) (mechanism proposed by Graham et al. [10]). The action of the various proteins required for steps past stabilisation of the synaptic complex have not been explicitly modelled, instead being gathered together in generalised steps. Processing of the break site occurs in state E, repairing all base lesions and extra DNA backbone lesions present (processes Z, see Appendix S.3) before allowing final ligation to state F.

Motion
To the best of our knowledge sub-diffusive motion has not yet been specifically included in any mechanistic in silico models of DNA repair. In this work we have chosen to implement a continuous time random walk (CTRW) model of sub-diffusion. This represents the DSB ends undergoing transient states of confinement, such as would be expected in a system with rapid formation and dissociation of synaptic complexes [34] as suggested by Graham et al. [10].
To implement such a continuous time random walk model, DSB objects are assigned a waiting time drawn from: where R is a uniform random number between 0 and 1, A is the minimum waiting time and α is the anomalous diffusion exponent. In this work the anomalous diffusion exponent was set to 0.5 and the minimum waiting time was set to 1 ms in order to generate the waiting times. Whilst a DSB object has a waiting time associated it is trapped and cannot diffuse. Waiting times are tracked using the internal clock of the simulation and once reduced to 0 seconds the DSB object is released. Diffusion then occurs through normal Brownian motion means of stochastic displacement by a distance drawn from a normal distribution. The distribution is described by the diffusion coefficient and the time the object is allowed to diffuse. In our model the diffusion time is set at 1 ps, which promotes a single movement event within the simulation before setting a new waiting time. In this manner we implement a mechanism which is expected to lead to sub-diffusive behaviour. The diffusion coefficient during this movement event is taken to be the free fitting parameter to manipulate the overall scale of motion and fit to literature data.

Recruitment Kinetics of DNA-PKcs and Ku70/80
Literature containing results of protein recruitment and recovery experiments were collected and data extracted from the graphs. Table 1 displays the cell lines and radiation types used as well as the χ 2 for each experimental data set as compared to the model behaviour. χ 2 is reported per experimental data point to facilitate comparison between experiments with different amount of data points collected. The Fig. 1. Scheme of the c-NHEJ pathway implemented in this work with associated time constants. Boxes A-F and I represent the different states DSB objects can occupy in the simulation, dashed boxes Z and Y and all arrows represent the different mechanisms which can change the state of a DSB object in the simulation. States A-C and I represent a single DSB end; states D-F represent a synaptic complex of two DSB ends. Progress from states C to D is governed by a proximity based reaction which produces a single synaptic complex from two DSB ends. Progress between all other states is governed by a time constant based stochastic process. Process Y splits this synaptic complex back into two individual DSB ends. Also noted in the figure are the areas of influence of the three experimentally supported mechanisms combined by this work. Lee et al. [41] suggested that phosphorylation of Ku70/80 is required to promote dissociation, Uematsu et al. [42] suggested that phosphorylation of DNA-PKcs is required to promote dissociation, and Graham et al. [10] showing formation of transient synaptic complexes between DNA-PK complexes.

Table 1
Summary of the chi-squared (χ 2 ) per data point (n) for the different experimental datasets compared to the simulated behaviour. Numbers in parenthesis correspond to those in Fig. 2. Data from the different datasets had different measuring time-points and consequently exactly corresponding simulation time points were not available. Experimental data was therefore compared to the nearest simulation point within a 1% time deviation where available. Column 6 shows the average time deviation of the points used and their standard deviation.

Authors
Cell Radiation Protein majority of data was generated by use of laser systems which create damage through non-ionising interactions. The breaks are considered to be complex in nature and thus require the DNA-PKcs mediated c-NHEJ pathway described in this work [45]. Uematsu et al. [42] reported that for their 365 nm pulsed nitrogen laser they created 2500-3700 DSBs per 1.7 μm 2 spot. Uematsu et al. also report experimental data from uranium ions accelerated to 3.8 MeV/u. The high Linear Energy Transfer (LET) of these ions, >12,800 keV/μm, is expected to result in highly clustered and complex damage. Fig. 2B shows that there is not a large difference in the recruitment of DNA-PKcs between the laser and Uranium experiments. This can be particularly seen when comparing to the results of Li et al. [46], where the Uranium data is often within experimental error bars. Therefore we postulate that the recruitment kinetics in this work are valid for predicting the response of both ionising and non-ionising damage, but perhaps is limited to scenarios where the damage would be considered mainly complex and/or clustered. Fig. 2A and B shows good agreement between the simulated behaviour and literature reported experimental data from cell lines derived from Chinese hamster ovary (CHO) cell lines, namely Xrs6 and V-3. Initial rapid recruitment of Ku70/80 and DNA-PKcs reaches 50% of maximum in 2-3 s before slowing, reaching 80% at ∼10 s. Data from non-CHO cell lines show slightly more rapid recruitment of Ku70/80 which is not reproduced as well. Andrin et al. [47] use human bone osteosarcoma epithelial cells (U2OS) in their work and Hartlerode et al. [48] show similar Ku70/80 recruitment in mouse embryonic fibroblast stem cells (MEF). Fitting the DNA-PKcs recruitment to experimental data required only manipulation of the state B to state C time constant Investigating the fluorescent recovery after photobleaching (FRAP) of a protein provides another constraint which the model behaviour must fit simultaneously to recruitment. This again acts to reduce overfitting but also provides data which can more clearly be related to protein dissociation. The fluorescent recovery of a system can be diffusion limited, reaction limited, or a complex mix of the two. The recovery of diffusion controlled systems tends towards that of Eq. (2) and is limited by the rate at which non-bleached proteins diffuse into the bleached area to replace bleached proteins [50]. On the other hand, the recovery of reaction controlled systems tends towards that of Eq. (3) and is limited by the rate at which bleached proteins detach from binding sites in the breached region, allowing non-bleached proteins to replace them [51].
where f t ( ) is the relative fluorescence, t is the time, τ D is the "characteristic" diffusion time, I 0 and I 1 are Bessel functions of the 0th and 1st order respectively, and k off is the detachment rate of the protein in question.
To determine how we should implement the modelling of FRAP in DaMaRiS the literature reported data was first theoretically analysed to see which of these scenarios best describes the experimentally observed behaviour. Table 2 shows the goodness of fit achievable by Eqs. (2) Table 1 for cell lines and radiations used. Recruitment of (A) Ku70/80 and (B) DNA-PKcs to exposed DSB ends; values are normalised to the maximum value achieved during the 30 seconds plotted. Individual simulation repeats were normalised to their initial number of DSBs. Final values are the average of these normalised trends. (C) Recovery of fluorescent DNA-PKcs at DSB ends after a simulated photo-bleaching event; intensities are set to 0 at the photobleaching event and normalised to the maximum value achieved during the 600 s plotted. To remove the influence of a variable number of initial DSBs, individual simulation results were normalised to themselves before all repeats were averaged to produce final values. Error-bars represent standard errors in the mean, apart from those associated with Weterings et al. where the source of the error-bars is not reported. may contribute to this disparity. We postulate that the data can be thought of as better fitted by a reaction limited, rather than diffusion limited system, and therefore proceed to simulate the FRAP of DNA-PKcs as if it is reaction limited. In short, this was achieved by a customisation of the simulation pathway to include photobleached counterparts of the components that involve molecules of DNA-PKcs. At 30 seconds a bleaching event is simulated converting all DSB ends with attached DNA-PKcs into photobleached versions. Recovery is measured as the rate at which DSB objects acquire new unbleached DNA-PKcs, DSB object with bleached DNA-PKcs having to first undergo dissociation to exposed DSBs (Process Y in Fig. 1). The resultant behaviour is again compared to literature reported laser generated DSB data, shown in Fig. 2C. Once parameters are adjusted to fit the experiments, discussed below, we confirm that the implemented mechanism produced the expected behaviour. Simulated time points were compared to the values predicted by Eq. (3) for that time point. From Table 2 it can be seen that this produces behaviour well described by the theoretical reaction limited system (χ 2 /n = 0.008).
The theoretical fit to simulated data agrees with Weterings et al. [53] up until 140 seconds (χ 2 = 0.56). After this the fit worsens (χ 2 = 10.47) as the data plateaus earlier in spite of the same cell line being used. The lack of data from Hammel et al. [54] for times greater than 300 seconds prohibit identification of a plateau. Due to the rapid recruitment of Ku70/80 and DNA-PKcs derived from the data in Fig. 2A and B, the dissociation time constant is the remaining parameter which has a dominating influence on the initial behaviour of the recovery curve for the first 300 s in our model. The final time constant selected for dissociation of transient synaptic complexes is 140 s.
In the simulation, DNA-PKcs is assumed to be retained until the DSBs are converted into a fully fixed state. Due to the complex nature of all the breaks, this state is only reached after backbone breaks and base lesions are cleaned. The plateau of the recovery curve from 300-600 seconds in Fig. 2C therefore gives a lower bound for the time constants associated with these actions. Final time constants selected are Backbone Clean-up = 300 s and Base Lesion Clean-up = 900 s, taken from Friedland et al. [25].
The fluorescent recovery plateaus at ∼60% of pre-bleaching event intensity as observed by Davis et al. [52]. In the simulation DNA-PKcs can only dissociate as part of the entire synaptic complex dissociating. Dissociation of these synaptic complexes can only happen whilst they are in their unstable, transient, states. Therefore, manipulation of the rate at which synaptic complexes stabilise is sufficient to vary the amount of bleached DNA-PKcs trapped at DSBs and determine the overall recovery. The final time constant selected for stabilisation of transient synaptic complexes is 250 s.

Repair kinetics
To allow direct comparison of simulation to literature reported data, DNA damage patterns were generated using the model published by Henthorn et al. [21,39] with proton beam parameters chosen to match their experimental set-up. Fig. 3 shows repair kinetics data as measured by (A) γ-H2AX, (B) 53BP1, and (C) comet assays plotted against absolute repair kinetics as predicted by DaMaRiS with the remaining final ligation time constant set to 1200 s, taken from Friedland et al. [25].  Table 2 Summary of the χ 2 per data point for the different experimental datasets compared to theoretical fits assuming diffusion limited (Eq. (2)) or reaction limited (Eq. (3)) recovery. Numbers in parenthesis correspond to those in Fig. 2. All works are fitted best by assuming reaction limited recovery apart from that of Hammel et al. [54].  fall-off in residuals observed, that is the DSBs that are yet to be fully ligated, in the first 100 min in Fig. 3 with minimal impact on the slope of this curve. The experimental data is such that no single value of final ligation time constant can reproduce initial trends in all data sets simultaneously. The value required is not clearly cell line or LET dependent as can be seen from data sets 13 and 14, where an increase would improve the fit to 13 but reduce the fit to 14. Neither is it clearly dose dependent as can be seen from data sets 18 and 19, where an increase would improve the fit to 19 but reduce the fit to 18. The mechanisms of the repair machinery have thus far been determined by fitting to experimental data which has resulted in limiting the period of action to the initial two hours. Consequently, to fit our model to the residual breaks at 24 h we must address the diffusion of DSB ends.

Diffusion
The yield of residual DSBs at 24 h for 1 Gy of proton irradiation was compared across a range of LET to residual 53BP1 foci experimentally observed by Chaudhary et al. [56] (Fig. 4A). It was found that using a CTRW jump diffusion coefficient of × 2.4 10 11 nm 2 /s could well reproduce the experimental data with both gradients and intercept for fitted lines within error of each other.
To quantify the sub-diffusive behaviour in the model, the mean squared displacement of DSB ends were tracked over the initial 300 s of the simulation for 20 repeats, Fig. 4B. The data was fitted with a power law: with D α in units of nm 2 /s α being the general diffusion coefficient, and α being the anomalous diffusion exponent. = α 1 indicates normal Brownian diffusion following Fick's law, > α 1 indicates super-diffusive properties, and < α 1 indicates sub-diffusion. Mobility of individual DSB ends are difficult to investigate, however, there are numerous groups that have investigated the mobility of DSB loci or chromosomal loci. Cabal et al. [59] measured the motion of fluorescently labelled GAL genes in Saccharomyces cerevisiae, measuring = ± α 0.42 0.01 for gene foci not confined to the nuclear envelope. Weber et al. [60] studied fluorescently labelled chromosomal loci in E. coli and Vibrio cholerae, measuring α to range from ± 0.35 0.02 to ± 0.44 0.06 for various gene foci. Bronstein et al. [61] used GFP-tagged TRF2 in U2OS cells to measure α for telomeres, observing growth from ± 0.32 0.12 to ± 0.51 0.20 during initial anomalous motion. Girst et al. [27]

Discussion
The complex nature of biological systems involves many individual actions that combine to produce an overall behaviour. To model these systems in silico, either a top down or bottom up approach can be used. Top down approaches, such as phenomenological models, have the advantage of limiting the number of variables used to recreate the observed behaviour of a system. Such an approach is useful for parametrising dependencies of the system on known variables, but does not provide clear insight into the mechanisms responsible. Bottom up approaches have the advantage of being useful in exploring the unknown effects of specific mechanisms on overall behaviour. As such they are useful tools for determining the compatibility of separately proposed, experimentally supported mechanisms. In this work we have combined several such mechanisms into one system and tested the subsequent overall behaviour against experimentally observed trends. We show that, with reasonable parameters for individual mechanistic steps, experimental results can be reproduced for relevant time points when combining mechanisms proposed by Graham et al. [10], Lee et al. [41], and Uematsu et al. [42]. However, we show that a CTRW interpretation of the sub-diffusive motion of DSB ends, as suggested by Bronstein et al. [61] and Girst et al. [27], is likely insufficient to explain overall repair kinetics.
Mechanistic, bottom up models must assign parameters to each mechanism which increases the danger of over-fitting. To guard against over-fitting, we have chosen to fit the model detailed in this work progressively, starting from fitting the earliest mechanisms in isolation and building from there. Table 3 summarises these results and other parameters used in the model. The fitting was done to literature reported experimental results from a variety of cell lines and with a number of different radiation types. NHEJ is a highly conserved pathway in mammals and the experimental results follow very similar self-normalised trends. Furthermore, the work done by Graham et al. come from experiments done in cell free extracts [10], and as such, are not directly comparable to any cell line. As a pragmatic compromise, the behaviour of the mechanisms investigated in this work were therefore fitted to a range of cell lines. It should be noted that the parameters so produced will not describe any one cell line exactly but instead the response of a "generalised" cell. If these parameters are within reasonable confines then it would suggest that these mechanisms could be representative of the cellular machinery. In this way we believe that we obtain a general mechanistic description of the NHEJ pathway. Extension to different cell lines can then be done by altering transition rates between states of the model, or by blocking certain progression points. This extension to the model is possible as more experimental data become available.
The initial pre-synaptic recruitment of Ku70/80 and DNA-PKcs is well reproduced by the combination of mechanisms in our model. The observed behaviour of DNA-PKcs recruitment can be well explained simply as a sequential recruitment of Ku70/80 followed by DNA-PKcs. This makes sense structurally as it has been shown that DNA-PKcs binds to the C-terminal motif of Ku80 [54], and Ku70/80 is commonly reported to be the first NHEJ protein recruited to break sites [5,8,43]. The inhibition of Ku70/80 in our model represents competition by alternative pathways for exposed DSB ends. This means that the rate of Ku70/80 phosphorylation, and subsequent dissociation of the synaptic complex, governs the rate at which DSBs have the opportunity to progress down alternative pathways, as suggested by Lee et al. [41].
Due to the self-normalised nature of the experimental data reported, determining the mechanisms governing Ku70/80 recruitment by fitting parameters in our model is challenging and ambiguous. There are conceivably different combinations of the 3 time constants used which would lead to the same overall kinetics. For example, the difference in Ku70/80 recruitment reported by Andrin et al. [47] and Hartlerode et al. [48] could be explained by either a decrease in the time constant for release from inhibition substantial increase in the time constant for inhibition (see Supporting Material Fig. S.5). Decreasing the time constant would logically result in an increase in the overall recruitment rate of Ku70/80. A substantial increase in the inhibition time constant would result in the inhibited state rarely being entered during the time frame investigated and thus the self-normalised Ku70/80 recruitment would represent primarily the rapid kinetics of Ku70/80 recruitment as unperturbed by inhibition. More rigorous fitting therefore requires a better understanding of the initial mechanisms governing competition by more directly comparable experiments, such as those by Mao et al. [62,63].
The These synaptic complexes are prone to dissociation, but can alternatively stabilise retaining the phosphorylated DNA-PK on the DSB (G). By implementing this combined system (L+U+G) into DaMaRiS, and using the scheme proposed by Friedland et al. [25] to fill in the rest of the pathway, we show that it is mathematically capable of reproducing the observed DNA-PKcs foci kinetics during FRAP experiments as well as the reported long term recovery of ∼ 60% pre-bleach intensity [42,52]. Sequential fitting of time constants governing DNA-PKcs recovery was possible due to individual parameters dominating largely separate areas of the FRAP curve, again avoiding over-fitting. Long term recovery was controlled by the rate at which unstable phosphorylated DNA-PK synaptic complexes were stabilised. This could be the result of XRCC4 recruitment by Ku70/80, which together with XLF has been shown to form bridging filaments resistant to dissociation [9,64]. Experiments investigating co-localisation of fluorescently labelled XRCC4 and DNA-PKcs foci would enable implementation and verification of a more detailed mechanistic description of this process in our model. Having fitted the explicit mechanisms of Ku70/80 and DNA-PKcs recruitment kinetics we show that in order to subsequently fit the general mechanism of repair kinetics for the initial 30 minutes, all fitted time constant are set such that their influence is restricted to the first two hours. In our model therefore, the yield of residual breaks at 24 h is predominantly determined by the mobility of the individual DSB ends. This leads to the conclusion that the determining factor which leads to a residual break is the capability of the individual ends to escape the local volume and end up isolated. This mechanism alone is enough to reproduce the LET response of residual breaks at 24 hours observed by Chaudhary et al. [56].
However, fitting in this manner causes the model to deviate from literature reported experimental data over intermediate time points. This is made clear by looking at data sets 13, 14, and 18 in Fig. 3. Despite both the initial and final recorded data points approximately being reproduced by DaMaRiS, the intervening behaviour underestimates the number of residuals present. Similar conclusions can be reached for other data sets if the final ligation time constant were to be adjusted to fit the initial recorded experimental point individually. Adjusting to a higher anomalous diffusion coefficient would increase the number of residual breaks present at these time points but also, as shown earlier, increase the residual yield reported at 24 h. This would invalidate the fit to experimental data shown in Fig. 4A. Therefore some mechanism is needed by which breaks have the high mobility necessary to delay formation of synaptic complexes, but remain local enough that given sufficient time meeting another break is probable.
In this work we have chosen to use a CTRW description of subdiffusion as observed experimentally by Bronstein et al. [61] and Girst et al. [27]. We show that this leads to sub-diffusive behaviour similar to that reported by others [27,[59][60][61]. Having fitted motion to residual DSBs from work by Chaudhary et al. [56], it is interesting to note that we also reproduce the ∼ 100 nm displacements of individual DSB ends observed by Soutoglou et al. [30]. Soutoglou et al. reported that when Ku80 was missing this displacement increased to >500 nm, which could be explained by a dependency of the constrained motion on synapsis formation. If this is the case then it lends weight to the CTRW description of sub-diffusion, which could arise due to the mechanism of transient synaptic complex formation proposed by Graham et al. [10].
In contrast to the CTRW model of sub-diffusive motion, Lucas et al. and Weber et al. observe behaviours indicating a fractional Langevin motion (fLm) form of sub-diffusion [28,60]. This form of motion arises when an object is trapped by a visco-elastic boundary; collisions with which results in the frequent reversals of direction observed by these The general diffusion coefficient was found to be × − 2.4 10 μ 3 m 2 /t 0.5 , two orders of magnitude larger than in this work, and the anomalous diffusion exponent of 0.5, similar to that of this work, whilst the confinement radius was found to reproduce experimental data best at 500 nm. We propose then that this mechanism could reproduce the experimental behaviour which DaMaRiS is currently deviating from discussed above. The increase in diffusion coefficient would increase the separation of initial partner DSB ends, whilst the boundary radius would maintain proximity such that the ends are likely to encounter each other again. Determining which of these two motions discussed is involved in DSB repair is important scientifically as improper modelling of this motion could lead to behaviours which need to be compensated for by fitting of the recruitment and activity of repair proteins. This in turn would lead to erroneous conclusions about the kinetics of these proteins and the biological mechanisms they affect.

Conclusion
In this work we use an in silico model to test the compatibility of several experimentally supported mechanisms hypothesised to operate at distinct stages along the pathway. We determine that the mechanisms proposed by Graham et al. [10], Lee et al. [41], and Uematsu et al. [42] can be incorporated into a theory of sequential joining of the major c-NHEJ proteins [45,65] to reproduce experimentally observed fluorescent foci kinetics. Furthermore, we investigate the impact of an assumed CTRW mode of sub-diffusion on the pathway. We show that the motion of individual DSB ends has a determining role in the repair kinetics of our model from 30 min to 24 h. Although the motion implemented produces similar displacements as reported by Soutoglou et al. [30], and reproduces the LET dependence of residual breaks reported by Chaudhary et al. [56] we show that CTRW sub-diffusion alone is insufficient to reproduce repair kinetics observed between 60and 400 min post-irradiation. This wide time frame over which DSB motion can enact substantial influence over the repair kinetic curve highlights an important need for scientific investigation. The modelling framework described here (DaMaRiS) is incorporated into the existing Geant4 Monte Carlo software familiar to the field of radiation research. As such it can be readily combined with simulations defining the irradiating apparatus and extending these physics models into the biological realm. DaMaRiS has been constructed in such a way that it can be easily expanded to include relevant biological processes on the scale of individual cell nuclei. Due to following the conventions and structure of Geant4 it can be readily modified and incorporated into existing code by researchers familiar with the tool-kit. As an example we have, in collaboration with Massachusetts General Hospital and the Harvard Medical School, demonstrated the successful implementation of this model into the TOPAS-nBio software [66][67][68]. These properties make this model a good foundation for continued development towards a multi-scale system capable of clinically relevant simulations.

Conflicts of interest
The authors declare that there are no conflicts of interest.

Author contributions
JWW developed the DNA repair model, generated and analysed the data presented in this work, and drafted the manuscript. NTH generated the DNA damage data used in this work and assisted in drafting the manuscript and development of the code base. SPI assisted in development of the code base. ALC and MS made useful contribution to the manuscript. RIM, KJK and MJM provided scientific supervision of the project. All authors reviewed and approved the manuscript.