Chromatin image-driven modelling

The challenge of modelling the spatial conformation of chromatin remains an open problem. While multiple data-driven approaches have been proposed, each has limitations. This work introduces two image-driven modelling methods based on the Molecular Dynamics Flexible Fitting (MDFF) approach: the force method and the correlational method. Both methods have already been used successfully in protein modelling. We propose a novel way to employ them for building chromatin models directly from 3D images. This approach is termed image-driven modelling. Additionally, we introduce the initial structure generator, a tool designed to generate optimal starting structures for the proposed algorithms. The methods are versatile and can be applied to various data types, with minor modiﬁcations to accommodate new generation imaging techniques.

In our previous work, we demonstrated a modelling engine called spring-model [3].It allows the use of multiple types of data in any combination, and the user can decide which data to include in the modelling and how it will affect the shape of the modelled structure.So far, our engine has allowed for the construction of models based on 1D E-mail address: michal.kadlof@pw.edu.pl(M.Kadlof).and 2D data.In this work, we present two alternative ways to use 3D microscopy data in the model-building process.
Both of these approaches are inspired by methods used in protein modelling [34,35].One increasingly popular research method is cryoelectron microscopy, which involves taking numerous 2D images of the same protein at different angles.The images are then automatically combined into a 3D image.Such an image does not directly represent the positions of atoms, but rather the density of the biomolecule.This density is then used as an additional energy factor in modelling.One example of its use is to improve the structure obtained from indirect methods (e.g., homology modelling).Such an initial structure can be subjected to molecular dynamics, in which the image density generates an additional force that acts on each particle, "pushing" it into its interior.Since the crystallographic structure is not always consistent with the structure under physiological conditions, such an improvement can produce favourable results.
In our approach, we propose to apply a similar method to 3D images obtained from microscopy.There are several methods that allow for microscopy at a sufficiently high resolution.Among them, the following methods should be distinguished: 3D-EMISH, IPALM, and STED.Resolution of FISH microscopy is too low to be used for modelling.In general, any imaging method that generates an image that can be interpreted as density -i.e., a three-dimensional mesh with unambiguously defined values at the nodes of the grid -can be used as input data for the methods described below.We consider a practical resolution threshold for images suitable for our modelling techniques, characterized by the ability to distinctly visualize individual nucleosomes.It is important https://doi.org/10.1016/j.ymeth.2024.04.006Received 7 September 2023; Received in revised form 13 March 2024; Accepted 5 April 2024 to note that this threshold is not rigid and should not be regarded as an absolute constraint.

Methodology
In our work, we propose two alternative methods that can be applied to constructing image-guided models.Both of them are used as an additional energy factor to the force field or molecular dynamics used for structure building (Fig. 1).The first one is called MDFF -Molecular Dynamics Flexible Fitting [34].In this approach, the image is interpreted as a potential field "attached" to a specific point in space.The maxima of intensity in the image correspond to the energy minima that the structure converges to.This method has been successfully used for fitting protein structures to images from Cryo-EM microscopy and lowresolution electron density maps in crystallography [36].We propose applying a similar approach, wherein we substitute both the polymer type and microscopy data.The potential is given by: where  is number of beads, ⃗   is three-dimensional vector pointing to position of -th bead, and   is potential acting on particular bead given by: where  is scaling factor and Φ is 3D image.
So in every point in space there is fixed potential acting on each bead separately.Bead is a subject to an additional force: The second of the described methods is the correlation method also derived from the world of protein modelling [35].In this case, at each step of the simulation from the current conformation of the structure, a "synthetic microscopic image' is calculated, which is then compared with the "truth" or experimental image.A three-dimensional Gaussian function is used to transform the structure into an image.Each point is replaced with the result of a three-dimensional Gaussian distribution function.The sum of these distributions creates a synthetic image with higher values in areas of local point density and lower values in the opposite case.The Gaussian function is computed on the same grid as the microscopic image.This process creates two objects of the same type -3D matrices with dimensions corresponding to the dimensions of the microscopic image.These two objects can be compared in the sense that they can be flattened into a vector, and the classical Pearson correlation can be calculated for those vectors, which in this case serves as a measure of similarity between the synthetic and experimentally obtained images.Therefore, the correlation is a function of many points forming the structure, and by calculating the gradient of this function, we can indicate the directions in which the points should be moved to approach the energy minimum.The potential is given by: where  is scaling factor and   is cross-correlation coefficient given by: where, K is set of all coordinates in the image array.Values of Φ  are determined by applying 3D Gaussian function in every bead position as follows: Let ⃗  denote voxel position in simulated image Φ  , then are contributions to voxel ⃗  originating from bead ⃗   .
Then we accumulate all of them to each voxel.
where integration is over the volume  ⃗  of the voxel ⃗ , ⃗  ∈ .Values of Φ  represent the image signal.Since those values remain constant throughout the simulation, during the optimisation of the potential a simplified cross-correlation coefficient may be used.The equation ( 5) may be rewritten as follows:

Limitations of a method
The above methods, like any method of modelling chromatin, are sensitive to numerous factors such as the quality of the obtained data (especially resolution and number of image channels), data preprocessing, assumptions regarding the number of contacts, local chromatin density, as well as internal parameters.Minor errors in modelling preparation can result in results inconsistent with expectations.

Limitations of the MDFF method
Some modelling methods use so-called excluded volume, while others do not.Excluded volume (EV) is a concept used in some modelling methods.It refers to the idea that each point in 3D space occupied by a structure has a reserved volume around it, inaccessible to other points.Figuratively speaking, the spheres cannot penetrate each other or "collide" with each other.There are many methods of modelling excluded volume -in the case of the Spring Model, the recommended way is to use the repulsive part of the Leonard-Jones potential.The MDFF method requires the use of EV potential.Otherwise, some of the points may eventually accumulate in the same local minimum of the structure.This effect arises from the fact that the spheres that fell there once would pull the rest behind them, and in the absence of a counteracting factor, after some time, they would all fall into the same potential well, creating a non-physical artifact.This limitation has a double meaning: firstly, it makes it impossible to model chromatin using the so-called perfect polymer model -in which points do not have volume, but which at a certain scale is a good approximation of the chromatin chain.Secondly, calculating the EV factor is usually the most computationally expensive step in calculating energetic interactions during MD simulation.
To counteract the phenomenon described above, each point must have a certain volume -effectively becoming a sphere.Thanks to this, spheres that have already fallen into the potential well will not allow the next one to fall there, because the next one will not fit anymore.Therefore, it becomes obvious that the key parameter of this method is the determination of the size of a single sphere.The SM model is a mesoscopic coarse-grained model, where in typical applications, one sphere can represent 0.5-20kbp.Determining the amount of chromatin represented by a single sphere and its physical volume becomes crucial and depends on specific modelling parameters -model resolution, fragment size, and local chromatin density.
Unlike rigid spheres, these spheres possess a soft nature, allowing them to partially overlap due to the repulsive force described by the Lennard-Jones potential.When an external force acts upon the spheres, those positioned within the cluster experience heightened compression.This compression arises not only from the sheer number of spheres but also from the slightly diminished distances between their centres.As a result, the local density undergoes a notable increase, creating a region of enhanced particle packing.The impact of this phenomenon might be negligible in most cases.However, this effect can be precisely controlled using parameters describing the EV potential.

Limitations of the correlational method
The difficulties described above completely disappear when using the correlation method.In this approach, the minimal energy does not correspond to maximum point density in density centres, but to maximizing the similarity between the experimental image and the artificially generated one from the model.There is therefore no pressure to force an excessive number of points, and thus no problem of lacking EV -as a consequence, the correlation method can be used for polymer modelling without EV.The disadvantage of this method is a significantly higher computational cost.

Limitations arising from the availability of experimental data
Both methods are only able to find the optimal conformation within the local environment of the structures.There are three types of positional uncertainties: 1. uncertainty in the positions of the geometric centres of chromosomes 2. uncertainty in the N-terminus to C-terminus orientation 3. uncertainty in the rotation.
When modelling local fragments (e.g.chromatin domains), the positional uncertainty disappears because the geometric centre of the modelled fragment is identified with the centre of mass of the microscopy image.Below we present a proposal for addressing all three types of uncertainty based on microscopy image analysis.Although we are aware of their imperfections, we believe that with the advancement of microscopy techniques, these problems will gradually decrease over time.

Initial structure
Both of the described methods can in principle be applied to the microscopic images at any genomic scale.In the case of the whole genome simulations both methods require a properly designed initial structure.Our goal in this work was creating a complete pipeline for the conversion of the microscopic images of the nuclei of particular cells into their 3-dimensional chromatin models that would later be used as an initial structures for the methods discussed previously.
Our take on modelling those genomic features consists of the following parts: critical points estimation, creation of a weighted graph, graph partitioning, and finally chromatin path modelling.
Critical points are identified by randomly drawing them from the image density.One may manually apply some threshold to exclude background noise and draw only from the relevant signal space.Those points are connected together with Delaunay triangulation.Then each edge is assigned a weight proportional to its length and strength of a signal that it passes through.Example of triangulated structure is depicted in supplementary materials S1.In case of multichain modelling (e.g.multiple chromosomes) one may run partitioning algorithm to split graph multiple components.In case of multi chromosomal simulation those sub-graphs may be interpreted as separate chromosomal territories.In case of single chain modelling (e.g.single chromosome or single chromatin domain) partitioning step is skipped.
In the last stage of the process of building initial structure, for each sub-graph the path passing through all the vertices has to be calculated.We use the travelling salesman problem (TSP) heuristic algorithm [37] with the weighted graphs.In the next and final step splines were applied in order to smooth out the chromosomal path obtained from the TSP solving algorithm, and force distances between neighbouring beads to be close to 1 diameter of bead.The idea was depicted on the Fig. 2.

Modelling artificial example
To test our algorithms we created an artificial test system (ATS) based on a known structure, which serves as our ground truth.Using a Spring-model [3] we built a simple 86 bead structure with 3 loops and 5 stabilizing long-distance interactions.The structure is depicted on Fig. 3a.
For clarity our ATS structure was flat, however all stages where done in 3D space.Then based on that structure the hypothetical microscopic image was generated by applying gaussian kernel on each bead and additionally some random noise was added.The result was depicted on a 3b.Our goal was to check if our method is able to reproduce the ground-truth structure without adding any information about constraints used to build it.The only information we used was the ATS microscopic image.The first step to accomplish that was to convert an image (raster graphics) into potential function   .Since we relay on OpenMM [38] modelling tool, all we had to do was to convert the image into 3D numpy array [39], and normalize it int values [0, -1].This conversion in the form of contour plot with visible force vectors is depicted on Fig. 3c.
The first step is running our simulation (the force method) in fully aligned conformation; without any additional constraints, to check if it is stable.All the parameters used in this simulation are in supplementary materials S2.1.The animation from that simulation is also attached as supplementary materials (file: force_method_aligned-2023-  This observation assured us that the image may be suitable source of interactions, and as long as image quality is good enough and structure is well aligned, we are able to reproduce the structure without adding any extra long-distance constraints.
So the next step was to check if we are able to get proper conformation without knowledge about initial position and shape.We tried two types of structures -regular circle and randomly generated structure with our structure generator.The steps are: randomly draw points from image density, triangulate them, add weights to the edges using values from the image, find the shortest path between them, and finally fill missing beads using spline interpolation.
In both cases we managed to only partially reproduce the desired structure (Figs. 4 and 5).
In first case two out of three loops were recovered.In second case all three loops appeared, however, one additional loop was formed, and positions of beads forming loops were shifted; inappropriate topology was introduced.Those results have proven that: 1. it is possible to obtain stable chromatin structures using only images, 2. this method is suitable for modelling only as long as we are capable to pinpoint some key polymer regions into certain image regions.

Modelling 3D-EMISH example
To showcase the method's potential on real-world data, we applied it to four raw 3D-EMISH microscopy images (samples 159, 175, 188, and 200) from the study by Trzaskoma et al., 2020 [12], and we have taken four of the raw pictures as the example (namely, samples 159, 175, 188 and 200).
Further, we have used PartSeg [40] software to remove the background.We have applied the upper threshold, with minimum size of connected component for cut off the background along with gauss filter of radius 2,0.Then, we have used USCF Chimera, and changed the voxel size of the image to 1, 1, 4 and applied gaussian filter with radius equal 2.0 to smooth the noise.The initials structures were built with the structure generator with 500 initial beads, img_tsp mode, and threshold of 0.
Finally, we have modelled using Spring Model engine with two described methods, obtaining the final result (see Supplementary Materials for simulation video).Fig. 6 shows the results of the modelling process for sample 188.

Conclusion
In this work, we propose two image-driven methods for modelling of the chromatin.We have proven that the usage of the image is sufficient for the proper modelling, and have reviewed the limitations and applications of our two methods.The proposed algorithms are compat-   ible with current microscopy data and can be readily adapted to future advancements in imaging techniques.In our study, we have proposed intelligent solutions that enhance the utility of the algorithms, even when confronted with the constraints inherent in contemporary stateof-the-art microscopy techniques.

Fig. 3 .
Fig. 3. a) Ground truth structure used to prepare (ATS).b) Slice of hypothetical microscopic image.c) Potential function with forces vectors.(For interpretation of the colours in the figure(s), the reader is referred to the web version of this article.)06-09_14.53.23.mkv).In the early stage of simulation the structure slightly contracted, and then remained stable for the rest of simulation.Final RMSD was 1.170 (in units of bead radius).The final structure created using the image, and comparison of initial and final structure are presented in supplementary materials (Fig. S2 and S3 respectively).This observation assured us that the image may be suitable source of interactions, and as long as image quality is good enough and structure is well aligned, we are able to reproduce the structure without adding any extra long-distance constraints.So the next step was to check if we are able to get proper conformation without knowledge about initial position and shape.We tried two types of structures -regular circle and randomly generated structure with our structure generator.The steps are: randomly draw points from image density, triangulate them, add weights to the edges using values from the image, find the shortest path between them, and finally fill missing beads using spline interpolation.In both cases we managed to only partially reproduce the desired structure (Figs.4 and 5).In first case two out of three loops were recovered.In second case all three loops appeared, however, one additional loop was formed, and positions of beads forming loops were shifted; inappropriate topology was introduced.Those results have proven that: 1. it is possible to obtain stable chromatin structures using only images, 2. this method is suitable for modelling only as long as we are capable to pinpoint some key polymer regions into certain image regions.

Fig. 4 .
Fig. 4. Simulation starting from regular circle a) Initial structure b) final structure.

Fig. 5 .
Fig. 5. Simulation starting from randomized structure a) Initial structure b) final structure.

Fig. 6 .
Fig. 6. 3D-EMISH example a) Raw image (z=11) a slice from the raw 3D-EMISH microscopy data.b) 3D visualization of the chromatin structure.The initial model, generated using the structure generator algorithm, is overlaid on the processed microscopy data.c) a representative frame captured during the Spring Model simulation.The simulation refines the initial model based on the image data to create a more realistic representation of the chromatin structure.