3 tera-basepairs as a fundamental limit for robust DNA replication

In order to maintain functional robustness and species integrity, organisms must ensure high fidelity of the genome duplication process. This is particularly true during early development, where cell division is often occurring both rapidly and coherently. By studying the extreme limits of suppressing DNA replication failure due to double fork stall errors, we uncover a fundamental constant that describes a trade-off between genome size and architectural complexity of the developing organism. This constant has the approximate value $N_U \approx 3 \times 10^{12}$ basepairs, and depends only on two highly conserved molecular properties of DNA biology. We show that our theory is successful in interpreting a diverse range of data across the Eukaryota.


Introduction
Organisms are made from cells, and their functional and morphological integrity relies upon the integrity of cellular processes, particularly cell division. In turn, this relies upon the integrity of the molecular process of DNA replication [1]. Thus, there is a direct link across multiple biological scales, connecting organismal robustness to genomic fidelity. Indeed, it is vital for developmental and other growth processes in organisms that the DNA in each new cell is as faithful as possible to the original zygotic genome. Errors in DNA replication will inevitably occur and cells have sophisticated means to identify and repair such errors. However, repairing DNA errors, particularly gross ones, is time-consuming, and such a bottleneck in a given cell could interfere badly with higher-level coordinated cell division processes. This is particularly relevant in embryo development, which for many organisms is highly streamlined, with 'stripped-down' cell division cycles (e.g. cleavage divisions) operating across the embryo in synchrony [2]. The coherent generation of significant numbers of correctly differentiated cells enables the formation of complex architectures that constitute the emerging morphology of the organism. For many organisms development must be rapid to allow the nascent life form to function as an autonomous agent, able to compete for resources and evade predation in a hostile environment.
Thus, a tension exists between the robustness and the rapidity of development; between the requirements of integrity of DNA replication during cell division and of the speedy emergence of autonomously functional biological form. This can be restated more concisely as a tension between information fidelity and organismal functionality. We investigate this by considering an important example of DNA replication error for which repair is possible but costly in time, namely, double fork stalls (DFS) [3,4]. We shall be able to quantify in a surprisingly simple way the tension described above, which, in a developmental context, takes the form of a trade-off between genome size (information complexity) and embryonic cell number (architectural complexity). This trade-off is expressed in terms of a single constant which we denote by $N_U$, and which has dimensions of DNA length. We believe $N_U$ to be highly conserved across the eukaryotes. It has the approximate value 3 Tbp, i.e. $N_U \approx 3 \times 10^{12}$ bp.
The outline of this paper is as follows. We provide a short overview of the biology of DFS and summarise a recent theory that has successfully captured much of the experimental data for DFS in both yeast cells and human cell lines. We use one element of this theory to derive the main result of this paper, and then proceed to test this against data from a diverse range of biological examples drawn from the Eukaryota, including eutely, syncytial development and polyploidy. We end with a summary of our results and a discussion of extensions of our theory. A guide to notation and further calculational details are provided in the appendix.

Background to DFS and a recent quantitative theory
Replication of DNA is initiated at multiple sites, called replication origins (ROs), situated along the DNA chain. In order to prevent any RO from firing twice in the same cell cycle (which would cause sections of DNA to be replicated twice), eukaryotic cells divide the process of replication into two non-overlapping phases [5]. From late mitosis until the end of G1, before DNA synthesis begins, cells 'license' ROs for use by loading them with double hexamers of the MCM2-7 (minichromosome maintenance) proteins. Once cells enter S phase, when RO firing can occur, no further ROs can be licensed. When an RO is activated ('fires') during S phase of the cell cycle, two replication forks proceed with replication in opposite directions along the DNA, each driven by one of the two MCM2-7 hexamers loaded onto the origin. Note that only a subset of licensed ROs fire during any particular S phase, with the remaining 'dormant' origins serving as potential backups if problems occur at the active replication forks [3,4]. If a replication fork encounters a dormant ('unfired') RO, replication continues past the dormant origin and the MCM2-7 loaded onto it is removed (the dormant origin becomes 'unlicensed'). This prevents re-replication of already replicated DNA [5]. The complex of proteins at a given replication fork is called a 'replisome' and consists of an assembly of molecular machines working in a coordinated fashion to replicate the DNA rapidly (ca 50 bp s$^{-1}$ in eukaryotes) and accurately (single nucleotide error rate of ca $10^{-9}$) [1]. Despite this sophistication, replication forks can fail through rare irreversible stalling. This is typically not problematic, as the unreplicated DNA lying ahead of the stalled fork will eventually be replicated by another fork moving in the opposite direction, having been initiated by an RO upstream of the stalling event. Very rarely, though, a severe error can occur: a DFS. In this situation, two converging replication forks irreversibly and independently stall with no dormant RO available in the stretch of unreplicated DNA lying between them. A more detailed description of DFS with schematic illustrations can be found in [3].
A simple theory of DFS statistics has recently been developed and is successful in predicting error rates and RO distributions for genomes spanning Mbp (e.g. yeast) to Gbp (e.g. human) [6,7]. The theory has a single a priori unknown parameter $q$, the genome-wide average probability of a single fork stall per nucleotide replication. Fits of the theory to various experimental data have consistently indicated the approximate value $q \approx 5.8 \times 10^{-8}$ bp$^{-1}$. This parameter can be recast as the length of DNA replicated before a 50% chance of a single fork stall, which we denote by $N_s$, and which has the approximate value $N_s = \ln 2/q \approx 12$ Mbp. Henceforth we shall exclusively use the symbol $N$, with one of a number of subscripts, to denote various length scales of DNA that arise in the theory. A complete list of the symbols used is given in the appendix to aid the reader.
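The conversion from the stall probability $q$ to the length scale $N_s$ can be checked with a few lines of Python. This is a minimal sketch, assuming a constant per-basepair stall probability so that the chance of a fork surviving $N$ bp is $e^{-qN}$. The function name `median_stall_length` is ours and does not come from the paper.

```python
import math

Q_STALL = 5.8e-8  # q, per-bp fork-stall probability quoted in the text (bp^-1)

def median_stall_length(q: float) -> float:
    """DNA length (bp) replicated before a 50% chance of a single fork stall."""
    # Assumed model: survival over N bp is exp(-q*N); exp(-q*N) = 1/2 gives ln 2 / q.
    return math.log(2) / q

N_S = median_stall_length(Q_STALL)
print(f"N_s is about {N_S / 1e6:.0f} Mbp")
```

The result is close to the 12 Mbp quoted above, three orders of magnitude larger than the ~10 kbp inter-origin spacing discussed next.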
In previous applications of the theory, to yeast cells [6] and human cell lines [7,8], a key experimental input was the set of inter-RO separations, which has a mean value typically of order 10 kbp in these examples. The theory was able to explain how this scale of RO separation leads to small tolerable DFS error rates in single cell divisions. The theory was also able to show that the observed RO distributions are optimized to constrain the number of DFS errors in a single cell division for the very different genome sizes under consideration.
Here, we consider a different situation: that of extreme elimination of DFS errors. We have foremost in our minds the case of rapid coordinated cell divisions, for instance in early embryo development, but our theory has wider applicability than this. Note, we are not concerned with the 'timing question' of ensuring complete DNA replication within a single cell in a preset time period, which has had considerable previous study using other theoretical approaches [9-11].

Derivation of the central result
This work was spurred by the experimental finding of very high levels of RO licensing proteins in the cells of the developing Xenopus embryo [12-15]. These studies suggest that the total amount of MCM2-7 in the Xenopus egg is sufficient to provide a double hexamer at least every 400 bp throughout the first 12 embryonic cell cycles until zygotic transcription starts (at the mid-blastula transition). Although the spacing between fired origins has been measured to be ∼10 kbp [12], the density of dormant origins is at least ten times higher than this [16,17]. One can postulate that for an embryonic cell to absolutely minimise its chance of a DFS error, it would, prior to S phase, saturate its DNA with ROs. The finest scale at which this is possible is the 'quantum' of eukaryotic DNA organisation, i.e. the nucleosome (and accompanying inter-nucleosome regions of DNA) [1]. The length of nucleosome linkers across eukaryotes ranges between ca 20-90 bp, and the footprint of licensing molecules is ca 60 bp [18-21]. Therefore an average inter-nucleosome distance of ∼60 bp allows for an essentially whole-genome saturation with ROs. For the purposes of our theory, we therefore consider the DNA as quantised on the periodic scale of nucleosomes and their accompanying inter-nucleosome regions, which we denote by $N_n$, and which has a value of ca 200 bp [1]. We define the parameter $\rho$ to be the probability that a given inter-nucleosome region is occupied by an RO. In the limit of $\rho \to 1$ the DNA is saturated with ROs, the number of which across a genome of size $N_g$ is in this case given by $N_g/N_n$.
In section B of the appendix we present the theory for the general case of $0 < \rho \le 1$. For the main results of this paper we are interested in the extreme case of $\rho \to 1$, for which a short and straightforward derivation of the theory is possible, as we now describe.
A basic ingredient of the recent theory of DFS error rates is the probability of a DFS event in a region of DNA of size $N$. For $1 \ll N \ll N_s$ this has the form (see equations (A8) and (A16) in [6]):
$$P_{\mathrm{DFS}}(N) \approx \frac{q^2 N^2}{2}. \tag{1}$$
Thus, if we assume that every inter-nucleosome region is occupied by an RO, the probability of a DFS event within a 200 bp nucleosomal region $N_n$ is
$$P_{\mathrm{DFS}}(N_n) \approx \frac{q^2 N_n^2}{2} \approx 6.7 \times 10^{-11}, \tag{2}$$
which is exceedingly small, as expected. We now consider a total amount of DNA of length $N_t$ to be replicated, all of which is saturated with ROs as described above. This total amount of DNA may reside inside a single cell or may be distributed among more than one cell, depending upon the application of interest. Given that potential DFS errors within each nucleosomal stretch of DNA are independent events, the probability of no DFS errors occurring within the entire replication process is given by $(1 - P_{\mathrm{DFS}}(N_n))$ raised to the power of $N_t/N_n$. Thus, the probability of one or more DFS errors occurring is
$$P_{\mathrm{error}}(N_t) = 1 - \left[1 - P_{\mathrm{DFS}}(N_n)\right]^{N_t/N_n}. \tag{3}$$
Given the extremely small value of $P_{\mathrm{DFS}}(N_n)$ this expression may be rewritten as
$$P_{\mathrm{error}}(N_t) \approx 1 - \exp\left[-\frac{N_t}{N_n} P_{\mathrm{DFS}}(N_n)\right]. \tag{4}$$
Now, focussing on the argument of the exponential, we have from equation (1):
$$\frac{N_t}{N_n} P_{\mathrm{DFS}}(N_n) \approx \frac{q^2 N_n}{2} N_t = U N_t, \tag{5}$$
where we have introduced the fundamental constant
$$U = \frac{q^2 N_n}{2} \approx 3.4 \times 10^{-13}\ \mathrm{bp}^{-1}. \tag{6}$$
We describe $U$ as 'fundamental' as it comprises two molecular constants which are strongly conserved across eukaryotic life: i) the per nucleotide spontaneous stalling probability of the DNA replication machinery and ii) the average periodicity of nucleosomes.
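The chain of approximations in this derivation can be checked numerically. The sketch below assumes the small-$N$ form $P_{\mathrm{DFS}}(N) \approx (qN)^2/2$, which is the form consistent with the quoted values of $q$, $N_n$ and the 3 Tbp scale. All function names are ours, introduced only for illustration.

```python
import math

Q_STALL = 5.8e-8  # q, per-bp stall probability (bp^-1), from the text
N_N = 200.0       # N_n, nucleosome repeat length (bp), from the text

def p_dfs(n_bp: float, q: float = Q_STALL) -> float:
    """Assumed small-N double-fork-stall probability, (q*N)^2 / 2."""
    return 0.5 * (q * n_bp) ** 2

def p_error_product(n_total: float, n_n: float = N_N, q: float = Q_STALL) -> float:
    """1 - (1 - P_DFS(N_n))^(N_t/N_n), evaluated stably with log1p/expm1."""
    return -math.expm1((n_total / n_n) * math.log1p(-p_dfs(n_n, q)))

def p_error_exponential(n_total: float, n_n: float = N_N, q: float = Q_STALL) -> float:
    """1 - exp(-U * N_t), with U = q^2 * N_n / 2."""
    u = 0.5 * q * q * n_n
    return -math.expm1(-u * n_total)

U = 0.5 * Q_STALL ** 2 * N_N  # bp^-1
N_U = 1.0 / U                 # bp

print(f"P_DFS(N_n) = {p_dfs(N_N):.2e}")
print(f"U = {U:.2e} per bp, 1/U = {N_U:.2e} bp")
for n_t in (1e10, 1e11, 1e12, 3e12):
    print(f"N_t = {n_t:.0e} bp: product form {p_error_product(n_t):.4f}, "
          f"exponential form {p_error_exponential(n_t):.4f}")
```

The product form and the exponential form agree to many decimal places over this range, because $P_{\mathrm{DFS}}(N_n)$ is so small.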
It is more convenient for our purposes to define the inverse of $U$, which has dimensions of DNA length. We define
$$N_U = \frac{1}{U} = \frac{2}{q^2 N_n} \approx 3 \times 10^{12}\ \mathrm{bp}. \tag{7}$$
Given that $N_U$ is simply the inverse of $U$, the adjective 'fundamental' applies equally well to it, and thus we posit that the value of three tera-basepairs (i.e. 3 Tbp) is a fundamental scale in rapid, large-scale DNA replication and the biology that depends upon it. Our results, presented shortly, appear to support this view. Returning to our expression for $P_{\mathrm{error}}(N_t)$ in equation (4), and using equations (5)-(7), we have our central result:
$$P_{\mathrm{error}}(N_t) = 1 - \exp\left(-\frac{N_t}{N_U}\right). \tag{8}$$
If the total amount of DNA under consideration has length much less than $N_U$, i.e. much less than 3 Tbp, then the expression can be simplified to
$$P_{\mathrm{error}}(N_t) \approx \frac{N_t}{N_U}. \tag{9}$$
In anticipation of the biological examples to follow, we can consider two general cases.
Case I: this occurs when the total amount of DNA to be replicated is distributed among more than one cell (each of which we assume to have the same genomic content). We define the genome size $N_g$ of each cell to be the number of basepairs in a haploid set of chromosomes. If we assume these cells to be diploid, and for there to be a final count of $M_c$ cells (starting from a single cell after $M_c - 1$ cell divisions), then the total amount of DNA to be replicated is $N_t = 2(M_c - 1)N_g$. If the cells saturate their DNA with ROs in order to ensure the smallest chance of DFS errors, then assuming that $P_{\mathrm{error}}(N_t)$ is small (and taking for simplicity $M_c \gg 1$) we have from our theory above
$$P_{\mathrm{error}} \approx \frac{2 M_c N_g}{N_U} \ll 1, \tag{10}$$
and consequently the inequality:
$$M_c N_g \ll N_U. \tag{11}$$
This expression encapsulates the trade-off between genome size and the number of cells involved in the coordinated cell division process. The product of the 'architectural complexity' ($M_c$) and the 'informational complexity' ($N_g$) is bounded by $N_U$; they cannot be simultaneously increased such that their product exceeds $N_U$ without introducing costly forms of DNA error repair. Typically, we would imagine such a process occurring during embryonic development, though examples involving rapid, coordinated cell divisions in adult organisms could also be relevant.
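The case I trade-off can be explored with a short script. This is an illustrative sketch only: the 100 Mbp genome, the 1000-cell embryo and the 5% tolerance are hypothetical inputs chosen for the example, and `max_cells` is a helper of our own, not a quantity defined in the paper.

```python
import math

N_U = 2.97e12  # bp; 2 / (q^2 * N_n) with q = 5.8e-8 per bp and N_n = 200 bp

def total_dna_case1(m_cells: int, genome_bp: float) -> float:
    """N_t = 2 (M_c - 1) N_g for diploid cells grown from a single cell."""
    return 2 * (m_cells - 1) * genome_bp

def p_error(n_total: float, n_u: float = N_U) -> float:
    """Probability of one or more DFS errors, 1 - exp(-N_t / N_U)."""
    return -math.expm1(-n_total / n_u)

def max_cells(genome_bp: float, p_tolerated: float, n_u: float = N_U) -> int:
    """Largest M_c whose P_error stays at or below p_tolerated (hypothetical helper)."""
    n_t_max = -n_u * math.log1p(-p_tolerated)
    return int(n_t_max / (2 * genome_bp)) + 1

genome = 100e6  # hypothetical 100 Mbp haploid genome
print(f"1000 cells: P_error = {p_error(total_dna_case1(1000, genome)):.3f}")
print(f"cells allowed at 5% tolerated failure: {max_cells(genome, 0.05)}")
```

Doubling the genome size roughly halves the number of cells that can be produced at the same tolerated failure rate, which is the content of the inequality above.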
Case II: this occurs when the entirety of the DNA to be replicated is within one cell. Defining $M_p$ as the degree of polyploidy, we have $N_t = M_p N_g$. If the cell saturates its DNA with ROs in order to minimise the chance of DFS errors, then following the same line of argument as above we have the inequality:
$$M_p N_g \ll N_U, \tag{12}$$
which encapsulates a trade-off between genome size and degree of polyploidy for such cells.
Before turning to some biological examples, we briefly discuss the more general case of $\rho < 1$. As mentioned above, section B in the appendix provides a derivation of the central result (the analogue of equation (8)) for arbitrary values of $\rho$. This general result is analysed in section C of the appendix, resulting in two useful observations. First, equation (Aix) shows that as $\rho$ decreases from unity, the DFS error rate increases dramatically as $2/\rho$. This will provide strong pressure to keep the RO density close to saturation ($\rho = 1$) when replication of DNA content close to 3 Tbp is required. Second, equation (Axi) provides a lower bound on $\rho$ which can be calculated using knowledge only of the theoretical error rate at saturation (i.e. inserting $N_t$ into equation (8)) and the experimentally observed failure rate of the biological process under consideration. This bound will prove useful when more detailed data become available on RO distributions during early embryonic processes. We will give an example of the use of this bound below, when discussing eutelic organisms.
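For case II, the same bound can be read as a ceiling on ploidy for a given genome size. The sketch below uses the 175 Mbp Drosophila genome size quoted later in the paper; the ploidy levels in the loop are arbitrary powers of two chosen for illustration and are not measured values.

```python
N_U = 2.97e12      # bp; 2 / (q^2 * N_n) with q = 5.8e-8 per bp and N_n = 200 bp
GENOME_BP = 175e6  # Drosophila haploid genome size quoted in the paper (bp)

def ploidy_load(ploidy: float, genome_bp: float = GENOME_BP, n_u: float = N_U) -> float:
    """Ratio M_p * N_g / N_U; values approaching 1 mean DFS errors become likely."""
    return ploidy * genome_bp / n_u

def ploidy_at_bound(genome_bp: float = GENOME_BP, n_u: float = N_U) -> float:
    """Ploidy at which M_p * N_g equals N_U."""
    return n_u / genome_bp

for ploidy in (64, 1024, 16384):  # illustrative ploidy levels, not data
    print(f"ploidy {ploidy:>6}: M_p N_g / N_U = {ploidy_load(ploidy):.3f}")
print(f"ploidy at which M_p N_g = N_U: {ploidy_at_bound():.0f}")
```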

Testing the central result using specific biological examples
In this section, we test our central results against experimental data. Relating to case I, we look at two examples: i) eutelic organisms from across the Eukaryota, and ii) the syncytial phase of Drosophila development. We then turn briefly to case II, using examples of high degree polyploidy from Drosophila, mouse and human cell types.

Eutely
We start by considering what is perhaps the most highly coordinated mode of development, in which the form of the organism emerges from a completely prescribed set of cell divisions, such that the number of cells and their individual differentiated states are precisely defined at each stage of development. This process is called eutely and has been adopted across diverse branches of the eukaryotes [22]. The eutelic organism has a predictable number of cells and after cell division ceases it grows larger through each cell increasing in size. In terms of our theory, we would expect the inequality in equation (11), namely $M_c N_g \ll N_U$, to place a profound constraint, simultaneously, on the genome size and cell number of eutelic organisms. This is under the assumption, of course, that the cell divisions during development are highly coordinated and rapid such that significant time spent repairing gross errors from DFS is not possible.
To test this idea, we examine eutelic organisms for which cell number and genome size are known and then compare their product to the fundamental constant $N_U$. A more precise test is also possible, since use of equation (8) with the ratio of $2M_c N_g$ to $N_U$ substituted in the argument of the exponential gives the probability of one or more DFS errors. If such errors are essentially lethal for eutelic embryos, then this probability provides an estimate (more precisely, a lower bound) for the failure rate of development of such embryos.
Table 1 provides data for three species of eutelic organisms which sit within three distinct branches of the eukaryotes: the nematode Caenorhabditis elegans [1,23], the tardigrade Hypsibius dujardini [24,25] and the rotifer Brachionus calyciflorus [26,27]. We note a remarkable similarity of the cell number counts and genomic complexity of the organisms despite their very distinct taxonomies, morphologies and environments. The data are in good accord with the predictions of our theory. Our estimates of DFS errors, assuming saturation of the DNA with ROs, are also consistent in being slightly smaller than the observed developmental failure rates of the organisms (denoted by $P_{\mathrm{obs}}$). This does not constitute conclusive proof that DFS errors ultimately limit the complexity of eutelic organisms. Experiments are required to demonstrate this; for instance, to show that the DNA of cells in eutelic development is saturated with ROs, or to show that those embryos that fail contain cells that are unable to complete timely divisions due to the occurrence of one or more DFS errors. We can also use equation (Axi) to estimate lower bounds of $\rho$ (denoted by $\rho_{\min}$) using the error rates in columns 5 and 6 (mean value) of the table. These bounds are provided in column 7, and range from 0.67 to 0.89, indicating that all three organisms are utilising near saturation in order to control DFS errors. In the pre-gastrula C. elegans embryo, the number of identified ROs is ∼15 000 (although large parts of the genome were missed in the microarray-based study due to the technical limitations in accessing highly repetitive sequences) [28]. If the abundance of dormant origins in this organism is as high as in Xenopus (ten times that of active ROs), then our calculated $\rho_{\min}$ of 0.67 suggests around 30 000 active ROs genome-wide. This rough estimate is twice that observed in the microarray experiment, possibly suggesting that half the origins licensed are in highly repetitive regions of the genome.
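The estimate of around 30 000 active origins can be reproduced with simple arithmetic. The sketch below assumes the 100 Mbp C. elegans genome size used in the Discussion and reads 'ten times' as ten dormant origins per active origin, so that licensed sites number eleven times the active ones. Dividing by ten instead gives about 33 500, which does not change the conclusion.

```python
GENOME_BP = 100e6        # C. elegans genome size used in the Discussion (bp)
N_N = 200                # N_n, nucleosome repeat length (bp)
RHO_MIN = 0.67           # lower bound on RO occupancy quoted for C. elegans
DORMANT_PER_ACTIVE = 10  # Xenopus-like dormant:active ratio (assumption from the text)
OBSERVED_ROS = 15_000    # origins identified in the microarray study [28]

licensable_sites = GENOME_BP / N_N           # inter-nucleosome sites in the genome
licensed_sites = RHO_MIN * licensable_sites  # sites occupied at rho = rho_min
active_ros = licensed_sites / (1 + DORMANT_PER_ACTIVE)

print(f"licensed sites: {licensed_sites:,.0f}")
print(f"estimated active ROs: {active_ros:,.0f}")
print(f"ratio to observed ROs: {active_ros / OBSERVED_ROS:.1f}")
```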

Syncytial development
Our analysis indicates that it is not possible for an organism with a relatively large genome (>100 Mbp) to grow rapidly beyond a few thousand cells in a purely eutelic manner. To grow beyond thousands of cells, development must slow considerably to allow for identification and subsequent repair or destruction of those cells which will inevitably arise with DFS errors. In order to test our theory for larger organisms it is necessary to focus on early stages of development in which rapid coherent DNA replication occurs. The syncytial phase of insects is an important example [2]. Such is the rapidity of replication in this phase that cell division is itself forsaken, with, instead, repeated rounds of nuclear division within the single large cell of the syncytium. Our theory would predict that the number of nuclear divisions is limited according to equation (11).
We test this using data from the most intensively studied insect, the fruitfly Drosophila melanogaster [29]. This organism has a haploid genome size of approximately 175 Mbp. In its syncytial phase, it undergoes 13 synchronised rounds of nuclear division, the number of nuclei increasing by a factor of 2 in each round, thus creating approximately 8192 nuclei. Nuclear division then ceases, the nuclei are transported to the syncytial membrane and cellularisation occurs to create the embryonic epiblast. The amount of DNA replicated during the syncytial phase is approximately 8192 × 2 × 175 Mbp = 2.9 Tbp, which, remarkably, is just below the limit imposed by the universal constant $N_U$. Interestingly, the haploid mutant (with half as much DNA per nucleus) goes through 14 rounds of nuclear division, resulting in the same amount of DNA being replicated in the syncytium [30]. This could, for example, be explained by the existence of a critical concentration of a key molecule (utilised during replication, and thus being depleted with each round of replication) ensuring that nuclear division in the syncytium does not overstep the $N_U$ bound.
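The DNA accounting for the diploid embryo and the haploid mutant is easy to verify. The sketch follows the text's own bookkeeping (final nuclei count × ploidy × haploid genome size); the function name `syncytial_dna` is ours.

```python
N_U = 2.97e12      # bp; 2 / (q^2 * N_n) with q = 5.8e-8 per bp and N_n = 200 bp
GENOME_BP = 175e6  # Drosophila haploid genome size (bp)

def syncytial_dna(rounds: int, ploidy: int, genome_bp: float = GENOME_BP) -> float:
    """Approximate DNA replicated: final nuclei count x ploidy x haploid genome size."""
    return (2 ** rounds) * ploidy * genome_bp

diploid_13 = syncytial_dna(rounds=13, ploidy=2)
haploid_14 = syncytial_dna(rounds=14, ploidy=1)

print(f"diploid, 13 rounds: {diploid_13 / 1e12:.2f} Tbp")
print(f"haploid, 14 rounds: {haploid_14 / 1e12:.2f} Tbp")
print(f"fraction of N_U: {diploid_13 / N_U:.2f}")
```

Both cases give the same total, because the extra round in the haploid mutant exactly compensates for the halved DNA content per nucleus.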
Given that 2.9 Tbp is so close to $N_U$, a small number of errors will occur with a non-negligible frequency. From equation (8) we see that the probability of having no DFS errors is approximately 38%. A straightforward analysis using Poisson statistics indicates that the probabilities of one and two DFS errors are 37% and 18%, respectively. Thus, fewer than 1 in 10 embryos would have three or more DFS events. The errors can occur in any of the doubling cycles, though will be exponentially more likely to occur in the last few cycles. Presumably, such errors, topologically linking two daughter nuclei, would be left uncorrected with those nuclei excluded from the cellularisation process. One can extend the analysis to catalogue the frequencies with which errors occur in earlier or later cycles, and to then predict the variation in nuclei numbers after 13 cycles, but this lies beyond the scope of the current paper.
One can speculate on the implications of the (diploid) embryo having a hypothetical 14th cycle, thus creating 16 384 nuclei. In this case the amount of DNA to be replicated is almost twice $N_U$, and fewer than 1 in 6 embryos (15%) would successfully complete the syncytial phase free of DFS errors. Poisson statistics indicate that approximately 1 in 3 embryos (30%) would have three or more errors, and about 1 in 8 embryos (13%) would accumulate four or more errors. These significantly higher frequencies of error may simply be too costly for robust subsequent development, hence the limitation to 13 cycles of division.
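The percentages quoted for 13 and 14 cycles follow from treating the number of DFS errors as Poisson distributed with mean $N_t/N_U$. The sketch below is a minimal check under that assumption, with $N_U$ evaluated as $2/(q^2 N_n)$; the helper names are ours.

```python
import math

N_U = 2.97e12      # bp; 2 / (q^2 * N_n) with q = 5.8e-8 per bp and N_n = 200 bp
GENOME_BP = 175e6  # Drosophila haploid genome size (bp)

def mean_errors(rounds: int, ploidy: int = 2) -> float:
    """Expected number of DFS errors, N_t / N_U, after the given number of rounds."""
    return (2 ** rounds) * ploidy * GENOME_BP / N_U

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_at_least(k: int, lam: float) -> float:
    """Probability of k or more events for a Poisson mean lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

for rounds in (13, 14):
    lam = mean_errors(rounds)
    print(f"{rounds} rounds: mean errors = {lam:.2f}")
    for k in range(3):
        print(f"  P({k} errors) = {poisson_pmf(k, lam):.1%}")
    print(f"  P(3 or more) = {prob_at_least(3, lam):.1%}")
    print(f"  P(4 or more) = {prob_at_least(4, lam):.1%}")
```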

Highly polyploid cells
We now turn briefly to case II: the significant replication demands in a single cell in which there is a high degree of polyploidy. There are many examples of this phenomenon in the Eukaryota [31], and we examine here three important organismal examples, Drosophila, mouse and human, for which high quality data are available.
Many of the cell types in Drosophila are polyploid, some highly so [32]. Detailed data are available for three different cell types: fat body cells, midgut cells, and salivary gland cells, and are summarised below in table 2. We note that the product of genome size and ploidy level approaches but does not exceed $N_U$, indicating that these cells are capable of robust and rapid DNA replication so long as near-saturation of DNA with ROs is utilised. It is striking that mature polyploid cells in Drosophila have DNA content limited to a similar degree to the final syncytial phase of the Drosophila embryo (both observations consistent with, and possibly linked by, the theory proposed here). A number of studies have reported that the degree of polyploidy is not necessarily constant across the entire genome, with higher rates of ploidy for gene rich regions [33,34]. As the value of $N_t$ approaches $N_U$ in terminally differentiated endoreplicating cells, one possibility is to tolerate the inevitable DFS errors by allowing deleterious events in regions of the genome which are no longer functionally important. Indeed, under-replicated genomic regions in Drosophila polyploid cells suffer from a significant paucity of licensed origins in comparison to those regions rich in active genes [29,33].
Turning now to mammals, two examples of cell types with very high degrees of polyploidy are trophoblast giant cells (TGCs) (mainly studied in rodents and analogous to cytotrophoblast cells in humans) and megakaryocytes. TGCs are primary cells in placental development [35], while megakaryocytes are the last stage of the differentiation process to produce platelets in the blood [36]. A single megakaryocyte is able to produce several thousand platelets. Both these cell types are large (up to 100 microns in diameter) and use endoreplication to increase their ploidy within a single cell entity. We provide data in table 2, and again we see that these cells have total DNA content that approaches but does not exceed $N_U$.

Discussion
In this paper we have considered extreme safeguards against DFS errors, through a mechanism in which cells saturate their DNA with ROs on a scale of the average nucleosome separation. Using results from our recent theory of DFS statistics we have derived a formula for the probability of DFS error in this case, and find it to be expressed in terms of a fundamental constant $N_U \approx 3 \times 10^{12}$ basepairs, which essentially defines the upper limit of DNA that can be rapidly replicated with minimal chance of DFS error. The constant is fundamental as it arises from a product of two highly conserved biomolecular parameters, cf equations (6) and (7), and is thus expected to be applicable to organisms spanning the Eukaryota.
Our result is particularly relevant to cell division processes which are required to be efficient in time, i.e. in which there is not the leisure of time for costly post-replication repair of DFS errors [8,37-40]. As such, we have tested our theory against data from developmental processes which require efficient coordinated cell division processes. Our theory suggests there is a hard trade-off between informational complexity (i.e. size of the genome) and architectural complexity (i.e. the number of developmental cell divisions), and that the product of these two must be much smaller than $N_U$. Data from both eutelic organisms and from the Drosophila syncytium are in excellent accord with this prediction.
Our theory is also relevant to single cells which have massive DNA content due to high levels of polyploidy. For such cells which are required to replicate their DNA efficiently in time we again expect a trade-off between genome size and degree of polyploidy. Data from three different highly polyploid cell types in Drosophila, TGCs in mice, and megakaryocytes in humans are all in accord with the predictions from our theory.
Naturally, none of this constitutes proof that DFS avoidance is the underpinning biological factor in all of these cases. However, the excellent agreement between a diverse range of biological data and our theoretical prediction of the central importance of $N_U \approx 3$ Tbp does strongly suggest the fundamental role of this constant in shaping biological processes in development and polyploidy. Our theory can be tested experimentally by examining cases of developmental failure (or anomalies in polyploid cells) and ascertaining whether these arise from DFS errors.
The hard limit on rapid DNA replication set by $N_U$ suggests, as described by equation (11), that strategies for development must undergo a sharp transition when the product of the number of embryonic cells, $M_c$, and the size of the genome, $N_g$, approaches this value. If the product $M_c N_g$ is well below $N_U$ then the DFS error rate is negligible and there will be no significant bottlenecks to rapid cell division. However, when the product is similar to or greater than $N_U$, DFS errors are inevitable and the costly repairs thereby required will greatly slow down the developmental process.
One strategy to cope with this limit is simply not to exceed it, and to make every cell and cell division count, i.e. to have a finely choreographed developmental process in which each cell division is pre-programmed. This is eutely, and indeed we find that eutelic organisms across the Eukaryota have very similar genome sizes and cell number counts, respecting the upper bound set by $N_U$, despite the diverse natural histories and morphologies of the organisms concerned.
The alternative strategy is to divide development into a rapid phase (during which there is a negligible chance of DFS errors) followed, once the product $M_c N_g$ exceeds $N_U$, by a slower phase (allowing time for DFS repairs [8,37-40]). Syncytial development in insects appears to be an excellent example: extremely rapid and synchronised nuclear divisions in the syncytium, then slowing to a cellularisation process and subsequent tissue-based gastrulation process. It is remarkable that the Drosophila data show that this transition occurs just as the $N_U$ limit is reached. The existence of small numbers of polar bodies after syncytial development [41] might indeed correspond to the small number of failed nuclear divisions due to DFS, and Poisson statistics can be employed in conjunction with our theory to provide predictions on the number of polar bodies expected to arise.
Higher organisms have a whole series of developmental transitions related to morphological requirements, e.g. gastrulation, neurulation, limb development [2]. The very first transition from a cluster of cells (i.e. the blastula) to a more structured morphology might be expected to be tuned to $N_U$, and preliminary data analysis in mammals confirms this. Indeed, equation (8) of our theory provides an estimate of the probability of DFS occurring. If DFS occur during early embryogenesis, and constitute a fatal error, then this estimate provides a lower bound on embryo failure, and work in progress shows these predictions are consistent with data from zebrafish, chicken, and a range of mammalian species [42]. A counterexample is found in the amphibian model Xenopus, which undergoes rapid cell division until a cell mass of several thousand cells is formed [2]. As DFS errors will inevitably arise in this case, we postulate that large numbers of cells in the early embryo with DFS errors could be discarded without disruption of future development. This is akin to an r-strategy in ecology [43], namely, large numbers of progeny with little parental care and hence high failure rate. In this sense, the choreography of cell division and differentiation in eutelic development is akin to the K-strategy, namely, small numbers of progeny with significant parental care to maximise survival.
Returning briefly to the subject of polyploidy, there are extreme cases which break the bound set by $N_U$. For example, the giant neuronal cells of the sea hare Aplysia californica have genome size ∼930 Mbp [44] and ploidy of 600 000 [45], giving a total DNA content of over 500 Tbp. Our theory is moot in such a case, beyond the obvious conclusion that DNA replication in the creation of such cells will be choked with DFS errors requiring repair; other mechanisms beyond replication such as cell fusion may contribute to such enormous ploidy levels. High DFS rates in this case are presumably tolerable for the organism as these cells do not have a role in the earlier developmental processes, and they are not involved in processes which require rapid cell division, unlike the examples we studied earlier, such as trophoblast giant cells (driving placental development) and megakaryocytes (driving platelet production).
Our theory also places a strict bound on the largest possible genome of an organism, assuming that cell replication must occur with reasonable efficiency at some stages of the organism's life cycle. Assuming a diploid organism, we would predict that one half of $N_U$, namely 1.5 Tbp, is an upper limit on haploid (or half the value of total) genomic content. This compares favourably with very large genome sizes known in single-celled eukaryotes and plants. Specifically, the estimated genome lengths found in Amoeba dubia and Amoeba proteus are ∼0.67 and ∼0.29 Tbp, respectively [46]. Relatedly, the 2C value for the largest known plant genome, in octaploid Paris japonica, corresponds to a genomic content of ∼0.298 Tbp, and another very close candidate is the fern Tmesipteris obliqua (∼0.294 Tbp) [47].
The question of whether an organism could sustain a 3 Tbp DNA load in each cell is brought into sharp focus by the question: how much volume is required to store the protein complexes needed to saturate such a large genome? Back of the envelope calculations yield some interesting answers. The MCM2-7 double hexamer complex has a volume of approximately 3000 cubic nanometers [48]. To saturate 3 Tbp of DNA (i.e. at the level of one complex per 200 bp repeat of the nucleosome) requires $1.5 \times 10^{10}$ complexes whose collective volume is therefore approximately 50 000 cubic microns. Interestingly, this is about the size of a large eukaryotic cell (a cell of diameter ∼40 microns). So, it is physically impossible for an organism to achieve saturation of such a large genome without utilising very large cells (particularly in the embryonic stage where errors are presumably less tolerated). It is natural to then ask whether such severe physical constraints are present for the two applications we studied where saturation was assumed, namely eutelic organisms and the Drosophila syncytium. The answer is no. The MCM complexes required to saturate the modest 100 Mbp genome of C. elegans occupy only about 0.1% of the volume of a typical eukaryotic cell. And the approximately cylindrical Drosophila syncytium has length 0.5 mm and diameter 0.15 mm, yielding a volume of $\sim 10^7$ cubic microns. This is ∼200 times larger than the 50 000 cubic microns required to store the MCM complexes necessary to saturate 3 Tbp of DNA.
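These back-of-the-envelope volumes can be reproduced directly. The sketch below uses only the figures quoted above (3000 nm³ per double hexamer, one complex per 200 bp, and the stated syncytium dimensions); the function names are ours, and treating the cell as a sphere and the syncytium as a perfect cylinder are simplifying assumptions.

```python
import math

MCM_VOLUME_NM3 = 3000.0  # volume of one MCM2-7 double hexamer (nm^3) [48]
N_N = 200                # one complex per nucleosome repeat (bp)
NM3_PER_UM3 = 1e9        # 1 cubic micron = 1e9 cubic nanometres

def mcm_volume_um3(dna_bp: float) -> float:
    """Total volume (cubic microns) of the complexes needed to saturate dna_bp of DNA."""
    return dna_bp / N_N * MCM_VOLUME_NM3 / NM3_PER_UM3

def sphere_diameter_um(volume_um3: float) -> float:
    """Diameter of a sphere with the given volume."""
    return (6.0 * volume_um3 / math.pi) ** (1.0 / 3.0)

def cylinder_volume_um3(length_um: float, diameter_um: float) -> float:
    """Volume of a cylinder with the given length and diameter."""
    return math.pi * (diameter_um / 2.0) ** 2 * length_um

v_3tbp = mcm_volume_um3(3e12)
v_worm = mcm_volume_um3(100e6)
v_syncytium = cylinder_volume_um3(500.0, 150.0)

print(f"3 Tbp: {v_3tbp:,.0f} cubic microns, "
      f"equivalent sphere diameter {sphere_diameter_um(v_3tbp):.0f} microns")
print(f"C. elegans (100 Mbp): {v_worm:.1f} cubic microns")
print(f"syncytium: {v_syncytium:.1e} cubic microns, "
      f"{v_syncytium / v_3tbp:.0f} times the 3 Tbp MCM volume")
```

The unrounded MCM volume for 3 Tbp comes out slightly below the rounded 50 000 cubic microns, which does not affect the comparison with the syncytium volume.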
It may also be interesting to study very large genomes in the Archaea. Here, a modified constant $N_U$ would be required as the details of the molecular machinery for DNA replication and packaging will differ from the Eukaryota. For example, the inter-nucleosome periodicity is ∼140 bp rather than ∼200 bp as in the eukaryotes [49].

M Al Mamun et al
We end with a brief comment on the role of fundamental constants in science. The current theoretical framework of physical phenomena involves a small number of fundamental constants. These constants arise from general principles, and are highly valued as conceptual touchstones of physics [50]. Examples are the speed of light in vacuum, $c$, and Planck's constant, $h$, which arise, respectively, from relativistic invariance and limits of measurement precision due to quantum uncertainty. From the point of view of biological physics it is tantalising to think that correspondingly general principles exist in living systems, manifesting themselves through fundamental constants. Whether $N_U \approx 3$ Tbp plays such a role, in constraining and guiding developmental strategies of organisms, remains to be seen.

Table 1. Data and theory predictions for three eutelic organisms. Note: $P_{\mathrm{error}}$ is calculated from equation (8) and $\rho_{\min}$ from equation (Axi).

Table 2. Data for various cells with high degrees of polyploidy.