The physical size of transcription factors is key to transcriptional regulation in chromatin domains

Genetic information, which is stored in the long strand of genomic DNA as chromatin, must be scanned and read out by various transcription factors. First, gene-specific transcription factors, which are relatively small (∼50 kDa), scan the genome and bind regulatory elements. These factors then recruit general transcription factors, Mediator, RNA polymerases, nucleosome remodellers, and histone modifiers, most of which are large protein complexes of 1–3 MDa. Here, we propose a new model for the functional significance of the size of transcription factors (or complexes) in the gene regulation of chromatin domains. Recent findings suggest that chromatin consists of irregularly folded nucleosome fibres (10 nm fibres) and forms numerous condensed domains (e.g., topologically associating domains). Although the flexibility and dynamics of chromatin allow genes to be repositioned within the condensed domains, the size exclusion effect of the domain may limit the accessibility of DNA sequences to transcription factors. We used Monte Carlo computer simulations to determine the physical size limit of transcription factors that can enter condensed chromatin domains. Small gene-specific transcription factors can penetrate the chromatin domains and search for their target sequences, whereas large transcription complexes cannot enter the domains. Owing to this property, once a large complex binds its target site via gene-specific factors, it can act as a ‘buoy’ to keep the target region on the surface of the condensed domain and maintain transcriptional competency. This size-dependent specialization into target-scanning and surface-tethering functions could provide novel insight into the mechanisms of various DNA transactions, such as DNA replication and repair/recombination.


Introduction
Genetic information, which is three-dimensionally (3D) organized as chromatin in the cell, is scanned and read out by the transcription machinery. Although how protein factors search for and reach their target sequences is not well understood, the organization of genomic DNA, which governs accessibility to the target, is a critical factor.
Despite the irregularity of nucleosome fibres, chromatin nevertheless assembles into higher-order structures that can be detected by imaging as well as by methods that capture chromosome conformation (e.g. [19]). Such studies have revealed that genomic DNA forms numerous packed domains, called 'topologically associating domains' (TADs) [20,21] (or 'topological domains' [22] or 'physical domains' [23]). TADs hundreds of kilobases in size have been identified in fly, mouse, and human cells, suggesting that domain structure could be a universal building block of chromosomes [20,22,23]. X-ray scattering analysis has revealed condensed domain features of interphase chromatin up to ∼275 nm [14,17,18]. These studies suggest that interphase chromatin domains are highly condensed 'chromatin liquid drops' [11,17,18], which could be formed by the macromolecular crowding effect [24]. Similar condensed chromatin features containing megabase-sized genomic DNA have also been observed as DNA replication foci by pulse labelling [25][26][27][28]; these foci are stably maintained through the cell cycle and subsequent cell generations [29][30][31]. This condensed state likely hinders the access of macromolecular complexes that mediate various DNA transactions to the inner core of chromatin domains.
A potential solution to the accessibility issue is to position the target DNA segment at the surface of the condensed chromatin domains. Indeed, transcription seems to occur outside the chromatin domains (figure 2) [32][33][34], raising the possibility that genes to be transcribed are relocated to the domain surface on demand. Such relocation of DNA segments takes advantage of the flexible and dynamic nature of chromatin, which is enabled by the irregular folding of nucleosome fibres [35][36][37]. Single-nucleosome imaging in living mammalian cells has revealed local nucleosome fluctuations of ∼50 nm per 30 ms [35,37]. Relatively large displacements of specific chromatin regions encompassing 20-50 nucleosomes have also been observed in various cells and organisms using the LacO array/LacI-GFP system [38][39][40][41][42][43][44]. These extensive chromatin dynamics could facilitate exposure of genomic DNA sequences at the surface of chromatin domains, thereby increasing the accessibility of the DNA for template-directed biological processes such as transcription.
Thus, a critical question in genome organization is how a specific gene can be relocated to, and maintained on, the domain surface for transcription. An as-yet-unexplored possibility is that positioning of the transcriptional template depends on factors that mediate the transcriptional process itself. Transcription by RNA polymerase II involves two distinct sequences, regulatory elements and the core promoter, which bind different classes of proteins [45][46][47][48]. Regulatory elements are recognized by sequence-specific DNA-binding proteins, collectively called 'gene-specific transcription factors' (e.g. forkhead family proteins, nuclear receptors, and GATA family proteins; see the online supplementary table 1 stacks.iop.org/JPCM/27/064016/mmedia) [49][50][51]. As shown in figure 1, gene-specific transcription factors are relatively small (∼50 kDa).
These factors recruit various complexes to the core promoter, including Mediator, general transcription factors (GTFs), and RNA polymerase II, to carry out mRNA synthesis using the DNA template [45][52][53][54][55][56][57]. Gene-specific transcription factors also recruit various nucleosome remodellers and/or histone-modifying enzymes, which facilitate transcription through mechanisms that are not fully understood [58,59]. Notably, most of the factors recruited to DNA at these later steps exist as protein complexes and are more than an order of magnitude larger than gene-specific transcription factors (figure 1 and the online supplementary table 1). Whether this size difference among factors mediating transcription simply reflects the complexity of their functions, or whether size itself is specifically required, is unknown.
Here, we present a new model in which the physical size of transcription factors contributes to transcriptional regulation by determining their accessibility to condensed chromatin domains. Based on Monte Carlo computer simulation results, we demonstrate that gene-specific transcription factors can dive into chromatin domains and scan the genome for their target sequences. When the target gene happens to be exposed on the domain surface, these factors then act as a tag, or a 'lifesaving light', to recruit large protein complexes. Owing to their large size, such complexes prevent the site from moving back into the inner region of chromatin domains; they act as a 'buoy' to keep the bound regions at the domain surface. This buoyancy mechanism could maintain transcriptional competency and facilitate transcription.

Methods
Expected sizes (diameters) of the transcription factors in the online supplementary table 1 and figure 1 were calculated based on the formula $R_n(N) = 2.24\,N^{0.392}$, where $N$ is the number of amino acids in the protein and $R_n$ is given in angstroms [60]. For protein complexes, the total number of amino acids in the complex was used for the calculation.
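As a rough illustration of this calculation, the Python sketch below converts a residue count into an expected diameter. Two assumptions are ours rather than the source's: $R_n$ is treated as a radius (so the diameter is $2R_n$), and the hypothetical helper `residues_from_mass` assumes an average residue mass of ~110 Da to convert kDa into $N$.

```python
def expected_diameter_nm(n_residues: int) -> float:
    """Expected diameter (nm) from Rn(N) = 2.24 * N**0.392 (angstroms).

    Assumption: Rn is a radius, so the diameter is 2 * Rn.
    """
    rn_angstrom = 2.24 * n_residues ** 0.392
    return 2.0 * rn_angstrom / 10.0  # angstrom -> nm


def residues_from_mass(mass_kda: float, mean_residue_da: float = 110.0) -> int:
    """Hypothetical helper: approximate residue count from mass (~110 Da per residue assumed)."""
    return round(mass_kda * 1000.0 / mean_residue_da)


if __name__ == "__main__":
    examples = [("gene-specific TF (~50 kDa)", 50.0),
                ("TFIIIB-TFIIIC-pol III (~1.5 MDa)", 1500.0)]
    for label, kda in examples:
        n = residues_from_mass(kda)
        print(f"{label}: N ~ {n}, diameter ~ {expected_diameter_nm(n):.1f} nm")
```

Under these assumptions a ∼50 kDa factor comes out at roughly 5 nm and a ∼1.5 MDa complex at roughly 19 nm, which falls within the 5 nm and 15-30 nm sphere sizes used in the simulations.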
All molecules in the simulations were treated as spherical hard bodies. A Metropolis Monte Carlo method without long-range potentials or hydrodynamic interactions was employed to simulate the diffusive motion of all molecules [61]. The diffusion coefficients (D) of the molecules were determined by the Stokes-Einstein relationship, using the EGFP monomer as the reference, whose diameter and D were 3.8 nm and 23.5 µm² s⁻¹, respectively [35]. To match the volume of a nucleosome [62], spheres representing nucleosomes were given a diameter of 10.3 nm and a D of 8.68 µm² s⁻¹. The D values of the 5 nm, 10 nm, 15 nm, 20 nm, 25 nm and 30 nm spheres were 18 µm² s⁻¹, 9 µm² s⁻¹, 6 µm² s⁻¹, 4.5 µm² s⁻¹, 3.6 µm² s⁻¹ and 3 µm² s⁻¹, respectively.
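Because the Stokes-Einstein relation, D = k_BT/(3πηd), makes D inversely proportional to the diameter d at fixed temperature and viscosity, each value above can be obtained by rescaling the EGFP reference. The short Python sketch below illustrates this; the constant and function names are our own.

```python
# EGFP reference values quoted in the text
EGFP_DIAMETER_NM = 3.8
EGFP_D_UM2_PER_S = 23.5


def stokes_einstein_D(diameter_nm: float) -> float:
    """D in um^2/s, scaled from EGFP using D proportional to 1/diameter."""
    return EGFP_D_UM2_PER_S * EGFP_DIAMETER_NM / diameter_nm


if __name__ == "__main__":
    for d in (5.0, 10.0, 10.3, 15.0, 20.0, 25.0, 30.0):
        print(f"{d:>5} nm -> D = {stokes_einstein_D(d):.2f} um^2/s")
```

The rescaled values agree with those listed above to within about 1% (for example, about 8.67 µm² s⁻¹ for the 10.3 nm nucleosome spheres), the small differences presumably reflecting rounding.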
Simulations were conducted in a 215 nm cubic box with periodic boundaries (figure 5). Initially, 1500 copies of 10 nm spheres (nucleosomes; corresponding to 0.5 mM) were randomly placed in the right half ('chromatin domain') of the box with respect to the x-axis, and 50 spheres of a given size (5-30 nm) were placed in the left half (figure 3). Motion of the molecules was iteratively simulated with the following procedure: (1) for each sphere, displacements along the three axes, $\Delta\mathbf{r} = (\Delta x, \Delta y, \Delta z)$, were drawn from a normal distribution with zero mean and variance $2D\Delta t$ using a pseudo-random number generator; (2) a putative position after the step was computed as $\mathbf{r}_{\text{new}} = \mathbf{r} + \Delta\mathbf{r}$, where $\mathbf{r}$ is the current position; (3) if the putative new position of the sphere overlapped with any other sphere (collision), the move was rejected; (4) if the moving sphere was a nucleosome, moves leading to displacements longer than the mobility length (for example, 20 nm) were rejected (the 'dog on a leash' model; see also [35]), and moves out of the chromatin domain (x-coordinate <108 nm or >215 nm) were also rejected to keep the nucleosome concentration within the domain constant; (5) steps 1-4 were repeated for all spheres in a random order newly determined for each step; (6) time was incremented by $\Delta t$. The simulation time step, $\Delta t$, was 1 ns. Results were obtained by averaging 500 samples from 10 independent trials.
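Steps (1)-(6) can be sketched in Python as follows. Because there are no potentials, the Metropolis acceptance rule reduces to rejecting any move that causes a hard-sphere overlap. This is a minimal illustration under our own assumptions, not the authors' code: the names `Sphere`, `pbc_dist` and `mc_sweep` are hypothetical, the leash length is measured from each nucleosome's starting position (the text does not specify the reference point), and the O(N²) collision check is meant for small test systems rather than the full 1500-nucleosome setup.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional

BOX_NM = 215.0                    # edge of the periodic cubic box
DOMAIN_X_MIN = BOX_NM / 2         # right half (x >= 107.5 nm) is the 'chromatin domain'
UM2_PER_S_TO_NM2_PER_NS = 1e-3    # 1 um^2/s = 1e6 nm^2 / 1e9 ns


@dataclass
class Sphere:
    pos: List[float]              # [x, y, z] in nm
    diameter: float               # nm
    D: float                      # diffusion coefficient, um^2/s
    nucleosome: bool = False
    anchor: Optional[List[float]] = None  # assumed leash origin (starting position)

    def __post_init__(self):
        if self.nucleosome and self.anchor is None:
            self.anchor = list(self.pos)


def pbc_dist(a, b):
    """Distance under periodic boundaries (minimum-image convention)."""
    total = 0.0
    for p, q in zip(a, b):
        d = p - q
        d -= BOX_NM * round(d / BOX_NM)
        total += d * d
    return math.sqrt(total)


def mc_sweep(spheres, dt_ns=1.0, leash_nm=20.0, rng=random):
    """One sweep: every sphere attempts one trial move, in a fresh random order (step 5)."""
    order = list(range(len(spheres)))
    rng.shuffle(order)
    for i in order:
        s = spheres[i]
        # (1)-(2): Gaussian displacement with per-axis variance 2*D*dt, wrapped into the box
        sigma = math.sqrt(2.0 * s.D * UM2_PER_S_TO_NM2_PER_NS * dt_ns)
        new = [(p + rng.gauss(0.0, sigma)) % BOX_NM for p in s.pos]
        # (3): hard-body collision -> reject
        if any(j != i and pbc_dist(new, o.pos) < (s.diameter + o.diameter) / 2
               for j, o in enumerate(spheres)):
            continue
        if s.nucleosome:
            # (4): 'dog on a leash' -> reject moves beyond the mobility length
            if pbc_dist(new, s.anchor) > leash_nm:
                continue
            # (4): keep nucleosomes inside the chromatin domain
            if new[0] < DOMAIN_X_MIN:
                continue
        s.pos = new
    return dt_ns  # (6): caller advances time by dt
```

A driver loop would call `mc_sweep` repeatedly, accumulating time from its return value and recording positions for analysis.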

A 'buoy' model of transcriptional regulation
Condensed chromatin domains (figure 2) provide a novel mechanism for transcriptional regulation. As transcription in cell nuclei seems to occur outside of chromatin domains [32][33][34], we assume that transcription is prevented inside the chromatin domains (the region enclosed by the yellow line in figure 2(a)). The transcriptional competency of genes could thus be regulated by changing their 'buoyancy' towards the chromatin domain surfaces [11]. We propose that buoyancy is provided by factors (or complexes) that mediate the transcriptional process itself. While large proteins (green spheres and pink ovals in figures 2(a)-(c)), such as large transcription complexes and RNA polymerase II, cannot enter the condensed chromatin domains, small proteins (yellow spheres) penetrate the domains to search for their target sequence (red nucleosomes). Once the small protein-target complex is formed (figure 2(b)), the complex moves around and, by chance, ends up on the domain surface (figure 2(c)). This step largely depends on local nucleosome movement (fluctuation) in the domains, presumably driven by Brownian motion [35,37]. The small protein can then act as a tag, or a 'lifesaving light', to recruit large transcription complexes. As shown in figure 2(d), binding of large transcription complexes (green spheres) keeps the transcriptional regions (red nucleosomes) on the surface of the chromatin domain like a 'buoy'. RNA polymerase II (pink oval) then transcribes the region (figures 2(e) and (f)). This buoyancy mechanism could maintain transcriptional competency and facilitate transcription.

Monte Carlo simulation of the transcriptional activation model
To determine the physical size limit of transcription factors (or complexes) that can enter condensed chromatin domains, we reconstructed the chromatin environment in silico using the Metropolis Monte Carlo method [61,63]. In the simulation, transcription factors were represented as diffusing spherical particles with diameters of 5-30 nm, and nucleosomes as mobile spherical particles with a 10 nm hydrodynamic diameter (figure 3). The nucleosomes were placed in the right half of the simulation space ('chromatin domain') at a concentration of 0.5 mM (figure 3; online supplemental movie stacks.iop.org/JPCM/27/064016/mmedia). The 0.5 mM condition corresponds to interphase dense chromatin or mitotic chromosomes [64,65]; for review, see [66]. The nucleosomes are mobile, but their movements are restricted to a certain range, resembling a 'dog on a leash' [35]. In the left half of the space, which is free of nucleosomes, spheres of a given size were placed and allowed to move freely (figure 3; online supplemental movie). The diffusion coefficients of the spheres were determined from the Stokes-Einstein relationship (see section 2 for details).
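The correspondence between particle number and molar concentration is simple arithmetic: the domain is half of the 215 nm box, and 1 nm³ equals 10⁻²⁴ L. The sketch below (function names are ours) checks that 1500 nucleosomes in this volume give about 0.5 mM, and how many would be needed for the 0.3 mM condition also examined in this study.

```python
AVOGADRO = 6.02214076e23  # 1/mol
NM3_TO_L = 1e-24          # 1 nm^3 = 1e-24 L


def concentration_mM(count: int, volume_nm3: float) -> float:
    """Molar concentration (mM) of `count` particles in `volume_nm3`."""
    return count / AVOGADRO / (volume_nm3 * NM3_TO_L) * 1e3


def particle_count(conc_mM: float, volume_nm3: float) -> int:
    """Number of particles needed for a target concentration (mM)."""
    return round(conc_mM * 1e-3 * AVOGADRO * volume_nm3 * NM3_TO_L)


# Right half of the 215 nm cubic box ('chromatin domain'), in nm^3
DOMAIN_VOLUME_NM3 = (215.0 / 2) * 215.0 * 215.0

if __name__ == "__main__":
    print(f"1500 nucleosomes -> {concentration_mM(1500, DOMAIN_VOLUME_NM3):.2f} mM")
    print(f"0.3 mM -> {particle_count(0.3, DOMAIN_VOLUME_NM3)} nucleosomes")
```

This gives about 0.50 mM for 1500 nucleosomes and roughly 900 nucleosomes for 0.3 mM.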
We examined how many spheres of each type were found in the chromatin domain. Figure 4 shows the fraction of each sphere type localized in the chromatin domain as a function of time. After ∼1 ms, ∼40% of the 5 nm spheres were moving around in the chromatin domain. The 10 nm spheres showed a similar tendency but with a much lower fraction (∼15%). The mean square displacement (MSD) profiles of these spheres show apparent free diffusion in the millisecond range (see the online supplementary figure S1 stacks.iop.org/JPCM/27/064016/mmedia). However, spheres of 15 nm or larger (15-30 nm) were rarely found in the chromatin domain. The 2D and 3D trajectories in figure 5 demonstrate almost free diffusion of the 5 nm spheres and constrained movement of the 10 nm spheres. Spheres of 15 nm or larger were confined outside the chromatin domains.
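'Apparent free diffusion' means that the MSD grows linearly with lag time as 6Dt in three dimensions. The Python sketch below shows how such a check might be done; it is our illustration, not the analysis pipeline used here. A synthetic free-diffusion trajectory for a 5 nm sphere (D = 18 µm² s⁻¹ = 0.018 nm² ns⁻¹, Δt = 1 ns) stands in for simulation output; with real output, the positions would first need to be unwrapped across the periodic boundaries.

```python
import math
import random


def msd(traj, lag):
    """Time-averaged mean square displacement at `lag` steps for an unwrapped 3D trajectory."""
    n = len(traj) - lag
    total = 0.0
    for i in range(n):
        total += sum((a - b) ** 2 for a, b in zip(traj[i + lag], traj[i]))
    return total / n


def free_trajectory(D_nm2_per_ns, dt_ns, steps, rng):
    """Synthetic Brownian trajectory: Gaussian steps with per-axis variance 2*D*dt."""
    sigma = math.sqrt(2.0 * D_nm2_per_ns * dt_ns)
    pos = [0.0, 0.0, 0.0]
    traj = [list(pos)]
    for _ in range(steps):
        pos = [p + rng.gauss(0.0, sigma) for p in pos]
        traj.append(pos)
    return traj


rng = random.Random(1)
D = 18.0e-3   # nm^2/ns, i.e. 18 um^2/s for a 5 nm sphere
DT = 1.0      # ns
traj = free_trajectory(D, DT, 20000, rng)

if __name__ == "__main__":
    for lag in (10, 100, 1000):
        # For free 3D diffusion the ratio should be close to 1
        print(lag, msd(traj, lag) / (6.0 * D * lag * DT))
```

A ratio that stays near 1 across lags indicates free diffusion; a ratio that falls with increasing lag would indicate confined or anomalous motion.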
These simulation results indicate that the chromatin domains present a potential barrier to spheres with diameters of 15 nm or larger. Importantly, this barrier was effective not only at a highly crowded nucleosome concentration of 0.5 mM but also at 0.3 mM, a rather open chromatin state (figure 6).

Figure 3. … and spherical particles of 10 nm diameter (blue spheres), respectively. The mobile nucleosomes were placed in the right half of the simulation space ('chromatin domain') at a concentration of 0.5 mM. Spheres of various sizes were placed in the left half of the space, which is free of nucleosomes, and moved around freely. Note that the small yellow spheres can move in the chromatin domain but the large green spheres cannot. See section 2 for details.
As previously suggested [35,37,67,68], once proteins enter the dense domains, they can move around with the help of nucleosome fluctuations. However, proteins outside the domains tend to remain there because they have more space available. We assume that this tendency generates the potential barrier of the chromatin domain.
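The 'more space' argument can be made semi-quantitative with a crude excluded-volume estimate. This is our illustration, not the analysis used in this study. It assumes frozen nucleosomes at random (Poisson) positions, in which case the fraction of domain volume that can host the centre of a probe of diameter d is φ = exp(-ρ·(π/6)(d + d_n)³), where ρ is the nucleosome number density and d_n = 10 nm. With two halves of equal volume and a fully accessible empty half, the equilibrium share of probes inside the domain is then φ/(1 + φ). Real nucleosomes are mobile, mutually excluding hard spheres, so this ignores both the leash dynamics and packing correlations.

```python
import math

BOX_NM = 215.0
N_NUCLEOSOMES = 1500
NUC_DIAMETER_NM = 10.0
DOMAIN_VOLUME_NM3 = (BOX_NM / 2) * BOX_NM ** 2
RHO = N_NUCLEOSOMES / DOMAIN_VOLUME_NM3  # nucleosome number density, nm^-3


def accessible_fraction(probe_d_nm, rho=RHO, obstacle_d_nm=NUC_DIAMETER_NM):
    """Fraction of domain volume available to a probe centre, assuming
    static obstacles at random (Poisson) positions."""
    contact = (probe_d_nm + obstacle_d_nm) / 2        # centre-to-centre contact distance
    v_excluded = 4.0 / 3.0 * math.pi * contact ** 3   # volume excluded by one obstacle
    return math.exp(-rho * v_excluded)


def share_inside(probe_d_nm):
    """Equilibrium share of probes in the domain half, if the empty half is fully accessible."""
    phi = accessible_fraction(probe_d_nm)
    return phi / (1.0 + phi)


if __name__ == "__main__":
    for d in (5, 10, 15, 20, 25, 30):
        print(f"{d:>2} nm: accessible {accessible_fraction(d):.3g}, share inside {share_inside(d):.3g}")
```

Under these assumptions the accessible fraction falls steeply with probe size because the excluded volume grows with the cube of (d + d_n). This is qualitatively consistent with the size dependence described above, but it should not be read as a quantitative prediction.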

Functional relationship between transcription factors (or complexes) and their size
The 5 nm spheres in silico correspond to gene-specific transcription factors, which bind to their regulatory elements prior to recruitment of other, larger transcription complexes. […] seems to cause isolation of nucleosome fibres by repulsion forces and to stabilize the region on the domain surface (figure 2(e)). Such nucleosome modification or eviction, which increases negative charge, could function as a buoy to keep the regions on the chromatin domain surface (figures 2(e) and (f)). As mRNAs are negatively charged polymers, they might also act as a buoy. Taken together, the in silico results support our hypothetical buoy model: the physical size of transcription factors (complexes) matters in transcriptional regulation. Small proteins can reach the target region inside the chromatin domains and then act as a tag, or a 'lifesaving light', to recruit large transcription complexes. The large transcription complexes, which cannot penetrate the condensed chromatin domains, function as a buoy to keep the regions to be transcribed on the domain surfaces.

Discussion
We have emphasized the importance of the physical size of transcription factors and complexes. The large size of transcription complexes allows them to process multiple regulatory inputs (whether from proteins or nucleic acids) simultaneously. In addition to these plausible known functions, our 'buoy' model provides novel insight into transcriptional regulation, particularly in the extensively investigated condensed chromatin domains: small transcription factors, such as gene-specific transcription factors, are used to search for regulatory elements on genomic DNA, even within condensed chromatin domains, whereas large transcription complexes, including GTFs, Mediator, RNA polymerase, nucleosome remodellers, and histone modifiers, are used to tether the target region to the surface for efficient transcription. We also propose that large transcription complexes (green spheres) can stabilize the condensed chromatin domains (figure 7). The boundaries between domains are enriched in highly transcribed elements such as housekeeping genes, tRNA genes, and SINE elements [22,69]. As shown in figure 7, in the case of housekeeping genes, binding of large transcription complexes, such as GTFs, RNA polymerase II, and Mediator, to the boundaries prevents the boundaries from becoming embedded in the chromatin domains. To transcribe tRNA genes and SINE elements by RNA polymerase III (pol III) [70], a highly stable, large (∼1.5 MDa) transcription complex comprising TFIIIB, TFIIIC, and pol III forms at the promoters of these genes [71,72]. This large complex should also stabilize the boundary structures. Again, nucleosome eviction (figure 7(a)), nucleosome modification (figure 7(b)), and RNAs that increase negative charge could function to stabilize the boundary regions. Binding of specific proteins, including histone subtypes, to the boundary regions may also function in a similar way (figure 7(c)). Establishing these boundaries helps to maintain the chromatin domains by preventing their fusion.
Since condensed chromatin domains such as TADs seem to be evolutionarily conserved from flies to humans [20,22,23], it is reasonable to consider that the domains confer some selective advantages. One possibility is that condensed chromatin could be more resistant to DNA damage [73]. Another is a benefit for transcriptional regulation [21,74]. The regulatory mechanism using large and small protein complexes is therefore likely to be widely used in these species. Our 'buoy' model is applicable not only to transcriptional regulation but also to other genome functions such as DNA replication and DNA repair/recombination.
For example, the pre-replication complex (pre-RC), which initiates eukaryotic DNA replication, is a huge complex. Eukaryotic DNA replication is initiated by formation of the pre-RC at the origin of replication. The pre-RC is generated by the ordered assembly of many replication factors, including the origin recognition complex (ORC), Cdc6 protein, Cdt1 protein, and Mcm2-7 (e.g. [75]). The 3D structure of the ORC has dimensions of ∼16 × 13 × 10 nm. The dimer structure of Mcm2-7 has dimensions of ∼23 × 15 × 15 nm [76,77]. Besides its multiple regulatory functions, the physical size of the pre-RC might help keep the origin sequence at the chromatin domain surface for efficient DNA replication. Some DNA repair proteins also have large dimensions. One example is the DNA-PK complex, which is required for the non-homologous end-joining pathway that rejoins double-strand breaks (e.g. [75]). The complex is large (∼12 × 15 nm for DNA-PKcs) and binds to DNA break ends [78,79]. To complete the DNA repair process, such large complexes may move the DNA break ends outside the chromatin domain so that the ends are not lost. BRCA1-PALB2-BRCA2, involved in the homologous recombination repair pathway (e.g. [75]), is also a large protein complex that may have a similar function.
The chromatin environment in live cells has been observed by various imaging methods. For instance, fluorescence correlation spectroscopy (FCS) detects the Brownian motion of fluorescent probe molecules in a small detection volume generated by confocal microscopic illumination [80,81]. This approach, which is often combined with computer simulation, has demonstrated anomalous diffusion of fluorescent probes in the crowded environment inside live cell nuclei [82][83][84][85]. Because the FCS detection region (∼0.4 µm in diameter × ∼1-2 µm in height) is much larger than the typical condensed chromatin domains discussed in this paper (∼100-300 nm in diameter), FCS provides a rather macroscopic view of the chromatin environment. Single-molecule imaging can directly reveal the dynamics of specific molecules ([86][87][88]; for review, see [89]), including nucleosomes [35,36], and can also reveal cellular structures at a resolution sufficient to visualize chromatin domains in live cells [90,91]. Live cell imaging with dual labelling of chromatin domains and specific transcription factors/complexes at such super-resolution could test our proposed buoy model.
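To make the size comparison between the FCS detection region and a chromatin domain concrete, the arithmetic below treats the detection region as an ellipsoid with the stated lateral diameter and axial height, and a chromatin domain as a sphere. Both shapes are simplifying assumptions of ours, and the function names are hypothetical.

```python
import math


def ellipsoid_volume_um3(diameter_um: float, height_um: float) -> float:
    """Volume of an ellipsoid with the given lateral diameter and axial height."""
    return 4.0 / 3.0 * math.pi * (diameter_um / 2) ** 2 * (height_um / 2)


def sphere_volume_um3(diameter_um: float) -> float:
    return 4.0 / 3.0 * math.pi * (diameter_um / 2) ** 3


# FCS detection region: ~0.4 um diameter x ~1-2 um height
fcs_min = ellipsoid_volume_um3(0.4, 1.0)
fcs_max = ellipsoid_volume_um3(0.4, 2.0)
# Condensed chromatin domains: ~100-300 nm diameter
domain_min = sphere_volume_um3(0.1)
domain_max = sphere_volume_um3(0.3)

if __name__ == "__main__":
    print(f"FCS volume: {fcs_min:.3f}-{fcs_max:.3f} um^3")
    print(f"domain volumes per FCS volume: {fcs_min / domain_max:.0f}-{fcs_max / domain_min:.0f}")
```

Under these assumptions one FCS detection volume spans roughly 6 to 320 domain volumes, so an FCS measurement necessarily averages over many domains and the spaces between them.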
In conclusion, considering the physical size of protein complexes and the geometric constraints of their environment brings novel insights into the functions of chromatin domains and of various protein complexes.