Multi-scale structural community organisation of the human genome

Background Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structural motifs characteristic of genome organisation. Results We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Conclusions Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1616-x) contains supplementary material, which is available to authorized users.


Supplementary text: Graph wavelet transform and community mining
We propose here a short introduction to signal processing on graphs, in particular the spectral graph wavelet transform, and its application to graph community mining. Graph communities are major graph structures described as groups of nodes highly connected between them and less connected with the rest of the graph. This supplementary text does not contain original material, it is intended to provide the reader with the background information to fully apprehend the proposed analysing approach of Hi-C data, as well as, references to the original bibliography.
Networks have become essential to represent data from a variety of complex systems in social sciences (Wasserman and Faust, 1994;Scott, 2000), biology (Rives and Galitski, 2003;Spirin and Mirny, 2003;Mendes and Dorogovtsev, 2003), computer sciences (Mendes and Dorogovtsev, 2003;Pastor-Satorras and Vespignani, 2004), engineering and many other areas of fundamental and applied sciences (Albert and Barabasi, 2002;Boccaletti et al., 2006;Caldarelli and Vespignani, 2007). These networks can be represented as graphs (West, 1995;Bollobas, 1998;Bondy and Murty, 2008;Diestel, 2010), mathematical objects where the elements of study are represented as nodes (or vertices) and the connections between them constitute the edges of the graph. A graph G = (V, E) is defined by a set of nodes V and a set of edges E ⊂ V 2 that link the nodes to each other. Here, we only consider finite graphs where edges are not directed (undirected graphs) and with no loop (node self-connection). We note n = |V |, the number of nodes and m = |E|, the number of edges. Matrices are powerful tools for representing graphs in a computer and developing mathematical tools to study the associated graph properties (West, 1995;Chung, 1997;Bollobas, 1998;Bondy and Murty, 2008;Diestel, 2010). The adjacency matrix A of a graph G is a n × n matrix where the entries a ij = 1 if the nodes x i and x j are connected (adjacent), and a ij = 0 otherwise. Note that the adjacency matrix of an undirected graph is symmetric. When assigning a weight to each edge the graph becomes weighted and the weighted adjacency matrix W = (w ij ) has non-null values when x i and x j are connected and 0 otherwise: the higher the value of w ij , the stronger the link between them. We note M the adjacency (A) or the weighted adjacency (W ) matrix.
In applications such as the ones cited above, one may be interested in analysing the distribution of data values residing on the vertices of the graph, such data sets have been described as signals on graphs. Naturally, one can wonder what are the best strategies to characterise and to extract efficiently the information from these signals on graphs. Recent developments in the area of graphsignal processing provide us with operators to analyse signals on graphs (Shuman et al., 2013); they generalise classical signal expansions, such as Fourier and wavelet decompositions, to the graph signal setting. Importantly, the graph Fourier modes and the graph wavelets depend on the graph that is considered and, thus, convey information about the graph topology. This property can be used to build graph community mining methods (Fortunato, 2010;Tremblay and Borgnat, 2014). For example, Spectral clustering, in its simplest form, defines a bipartition of a graph based on the sign of the graph Fourier mode of lowest, non zero frequency (Fiedler, 1973;von Luxburg, 2007) and was previously applied to the Hi-C interaction network (Chen et al., 2015). Here, we present the construction of spectral graph wavelets (Hammond et al., 2011) and their usage to build a fast multi-scale community mining algorithm (Tremblay and Borgnat, 2014).

Graph spectral domain
In the area of signal processing on graphs, spectral graph theory has become a tool to define frequency spectra and expansion bases to define graph Fourier transforms. We present the construction of graph Fourier basis as the eigenvectors of the Laplacian matrix of the graph (Shuman et al., 2013) from which the spectral graph wavelets we will be using are constructed (Hammond et al., 2011). Note that alternative constructions of Fourier modes have also been explored (Sandryhaila and Moura, 2012).
We first recall some spectral graph theory elements (Chung, 1997). Let G = (V, E) be an undirected and connected graph with M = A or W , the adjacency or weighted adjacency matrix corresponding to the graph. A signal (or a function) F : V −→ R on the nodes of the graph can be seen as a column vector F ∈ R n , where the i th component F i of the vector F represents the signal value at node x i . The graph Laplacian matrix L is defined as: L = D − M , where D is a diagonal matrix whose element d ii = Σ j m ij , is the degree (or weighted degree i.e. strength) of the node.
Note that the Laplacian matrix can also be found under its normalised form: L = D − 1 2 LD − 1 2 . In the case of a weighted graph, the non-normalised graph Laplacian is also known as the combinatorial graph Laplacian. In all cases, the graph Laplacian is a real symmetric matrix (because M is symmetric), and hence, it has a complete set of orthonormal column eigenvectors, denoted {χ l } l=0,..,n−1 . These graph eigenfunctions have associated positive eigenvalues {λ l } l=0,..,n−1 . Zero appears as an eigenvalue with multiplicity equal to the number of connected components. Since we are only interested in connected graphs, we consider the Laplacian eigenvalues ordered as 0 = λ 0 < λ 1 ≤ λ 2 ... ≤ λ n−1 := λ max . The set of λ i 's is called the spectrum of L (or spectrum of the associated graph G). We note χ = (χ 0 | · · · |χ l | · · · |χ n−1 ) = (χ il ).
Let us recall that the classical Fourier transform is the projection of a function f on the complex exponentials, which are the eigenfunctions of the one dimensional Laplace operator: −∆(e 2πiξt ) = − d 2 dt 2 e 2πiξt = (2πξ) 2 e 2πiξt . When considering the circular graph of n nodes * that is nothing but the regular (discrete) line with periodic boundary conditions, L is the classical discrete Laplace operator i.e. the discrete second derivative operator, and its eigenvectors are the usual discrete Fourier modes. Following this fundamental analogy between the classical case and the graph setting, the eigenvectors ({χ l } l=0,..,n−1 ) of the graph Laplacian of graph G are interpreted as the graph Fourier modes associated to frequencies √ λ l for that graph. Consequently, the Graph Fourier Transform (GFT)F of a signal F on the vertices of a graph G is defined as the projection of F on its graph Fourier modes :F l = n i=1 F i χ il , which can be written in matrix notation as:F = χ F, (S2) * Nodes are labelled from 0 to n − 1 and node i is connected to the two nodes i − 1 (mod n) and i + 1 (mod n).
where χ is the transpose of χ. In the same manner, the classical inverse Fourier transform: is the reconstruction of f as a weighted sum of the Fourier vectors, which is mimicked in the graph setting as: F i = n−1 l=0F l χ il , that can be written in matrix notation as: In this construction, signal processing on graphs can be considered as a generalisation of the "classical" discrete signal processing, which is recovered when considering circular graphs. Figure S2 illustrates some Fourier modes on a hierarchical toy graph (Fig. S1). One can already remark that the first eigenvectors are very informative for community detection as discussed above. For instance, partitioning the toy graph according to the sign of χ 1 , leads to 2 meaningful communities ( Fig. S2A).

Spectral graph wavelets and the graph wavelet transform
There is not a unique way to extend the notion of wavelets to the graph setting. For example, graph wavelets have been defined using the notion of diffusion in the node neighbourhood (Coifman and Maggioni, 2006), lifting (Jansen et al., 2009), and filter banks (Narang and Ortega, 2012). Here, we present the construction of graph wavelets using the graph spectral domain (Hammond et al., 2011). These graph wavelets have been shown to be well adapted to the problem of community mining (Tremblay and Borgnat, 2014).
The classical wavelet transform (WT) is a space-scale analysis which consists in expanding signals in terms of wavelets which are constructed from a single function, the "analysing wavelet" ψ, by means of translations and dilations. The WT of a real-valued function f is defined as (Mallat, 1998): where x 0 and s (> 0) are the space and scale parameters respectively. At a fixed scale s, Eq. (S5) can be interpreted as the convolution of the signal with the wavelet ψ s centered on 0 and derived from the analysing wavelet ψ as: ψ s (t) = 1 √ s ψ( t s ). Importantly, given the spectral perspective of our construction, convolution in the direct space of a signal f by a filter h corresponds to multiplication in the Fourier space: (h * f )(ξ) =f (ξ)ĥ(ξ). Mimicking this property, convolution of the graph signal F by the graph filter H can be defined as the inverse Fourier transform (Eq. (S4)) of the product of the graph Fourier transformĤ andF of F and H, respectively: where • stands for the component wise column multiplication. Mimicking the fact that the Fourier transform of ψ s can be written as follows:ψ s (ξ) = √ sψ(sξ), we introduce a continuous function g : R + → R + equivalent to the band-pass filterψ, and we construct the graph Fourier transform Ĥ (s) of the graph wavelet filter at scale s as follows: This results in the following definition of the graph wavelet transform of a graph function F : Denoting G (s) the filter matrix at scale s in the graph Fourier space defined by: and using Eq. (S2), the definition of the graph wavelet transform of F can be written as: Let ∆ (k) denote the Dirac function on the graph centred on the node x k : ∆ is the identity matrix, we can write the matrix Φ (s) of all the wavelets at scale s as the following simple matrix product: For a given graph, the graph wavelets Φ (s) are thus entirely determined by the band-pass kernel filter g. We use the filter g(x) proposed in Tremblay and Borgnat (2014) that is band-pass in the range [1, 1 λ1 ] and that is well conditioned for the task of community detection over the scale range [s min = 1 λ1 , s max = 1 λ 2 1 ]. The above construction of graph wavelets does not guarantee their proper normalisation, so that the √ s normalisation factor can be dropped. Note, however, that normalisation is not required for the multi-scale community mining protocol described below.
When required this can be remedied by a posteriori normalisation of the graph wavelets.
The graph wavelet at a reference node x k gives an idea on how the graph is perceived from that node. As illustrated in Figure S3 for our hierarchical toy graph, at small scales, the graph wavelet coefficients are higher in the "close" neighbourhood of the reference node and for larger scales the graph wavelet extends over a larger neighbourhood. At small scale, the graph wavelet has an "ego-centered" view of the graph, it takes the value 1 at the reference node and 0 elsewhere (like a Dirac) (Fig. S3A). At larger scales, the graph wavelet coefficients expend on the neighbourhood of the reference node (Fig. S3B-D), reflecting the graph topology. In Figure S3B, the graph wavelet coefficients are positive over the full 16 node group around the reference node, reflecting the second level of organisation of the graph in 8 groups of 16 nodes (Fig. S1B). In the same manner, at a larger scale, the positive values of the graph wavelet coefficients (Fig. S3C,D) recover the 32 (resp. 64) node group of the reference node at the third (resp. fourth) level of the hierarchical organisation of the toy graph (Fig. S1C,D) .

Multi-scale community mining using graph wavelets
The intuition behind the community mining method based on graph wavelet proposed by Tremblay and Borgnat (2014), is to consider that the wavelet centred around a node is an "ego-centered" vision of the graph, and that two nodes in the same community are expected to have a very similar view of the graph (Fig. S3). In other words, and this is the central idea of the algorithm, nodes are classified together if their neighbourhood as defined by graph wavelets are similar. The method consists at each scale s in three steps (Tremblay and Borgnat, 2014): • For each node x k , one defines its feature vector as the coefficients of the wavelet Φ (s,k) that encodes local information on the graph topology seen by the node x k (Fig. S3). Spectral graph wavelets Φ (s,k) are derived from the normalised Laplacian L.
• To define to which extent two nodes x k and x k have a similar environment, a distance matrix D (s) is created where the correlation distance D (s) (k, k ) between nodes x k and x k is one minus the correlation C (s) (k, k ) between the wavelets Φ (s,k) and Φ (s,k ) (Fig. S4): Note that this distance measure is independent of graph wavelet normalisation.
• A hierarchical clustering algorithm is used to classify the nodes (Fig. S4). The hierarchical algorithm outputs a dendrogram that needs to be "cut" to obtain a partition P (s) . To cut the dendrogram, the method defines a criterion based on averaging the maximal gaps of all the root-leaf paths of the dendrogram. For each node x k , one computes the gap function Γ k as the path length between the leaf corresponding to node x k and the begining of the dendrogram. Then after averaging all gaps functions into a global gap function, the best cut corresponds to the maximum of this global gap function (Tremblay and Borgnat, 2014).
Repeating these three steps for a set of scales (s i ) i∈ .

(S13)
Fast multi-scale graph wavelet community mining The major inconvenient of the graph wavelet community mining protocol is the computation cost.
Two approximations have been proposed allowing this protocol to be suitable for the analysis of large graphs ( 10000 nodes) (Tremblay and Borgnat, 2014).
On the one hand, instead of computing the graph wavelets centred on the n nodes of the graph which requires n wavelet transforms of Dirac functions (Eq. (S10)), the matrix of correlations C (s) (k, k ) between graph wavelets at scale s (Eq. (S11)) is estimated using the graph wavelet transforms W (s) [R (j) ] of η ( n) random Gaussian functions on the graph (R (j) ) j∈[1...η] 6 (Tremblay and Borgnat, 2014). More specifically, given a random Gaussian vector R, it can be shown that the correlations between its graph wavelet coefficients (the projections of R on the graph wavelets: W (s) [R] k = Φ (s,k) R) reduce to the correlation between graph wavelets: can be estimated by the sample correlation coefficient between the vectors ( (Tremblay and Borgnat, 2014).
On the other hand, the graph wavelet transform can be computed using the fast algorithm proposed in Hammond et al. (2011). Eq. (S9) shows that the calculation of the graph wavelet transform at a scale s requires the knowledge of the graph Fourier matrix χ, itself calculated by the diagonalisation of the graph Laplacian. The diagonalisation of a matrix of size n typically needs a calculation time cubic in the number of nodes n, which makes it impracticable to use for graphs with more than a few thousand nodes. To overcome this difficulty and to calculate the wavelet transform of a signal F quickly, it is in fact possible, using an approximated algorithm, to avoid the explicit calculation of the graph Fourier matrix (Hammond et al., 2011). This approach consists in approximating each filter g(s.) into a truncated Chebyshev polynomial of degree p: From Eqs. (S8) and (S14), it results the following approximation of the matrix G (s) where Λ = diag(λ 0 , . . . , λ n−1 ) is the diagonal matrix whose diagonal entries are the eigenvalues λ k of the Laplacian matrix L. Observing that χΛ i χ = L i , we can write the following approximations for the construction of graph wavelets and the computation of the graph wavelet transform: and    Dendrogram representation of the hierarchical clustering resulting from the graph wavelet correlation matrix; the red line mark the best cut level used to cut the dendrogram and to define the graph communities at the scale of analysis. Nodes where ordered according to the dendrogram representation. 8 communities are delineated, they correspond to the 8 red squares clearly apparent on the correlation matrix and they perfectly recover the 8 groups of 16 nodes at the second level of construction of the hierarchical toy graph (Fig. S1B).  condensed and spatially organised state during interphase to a highly condensed and morphologically reproducible metaphase chromosome state [243]. This study provides Hi-C data for HeLaS3 cells during mid-G1 and metaphase. In the former phase, the interaction maps display similar plaid patterns of regional enrichment or depletion of long range interactions (as the one shown in Figure. 6.1) while the maps in mitotic cells ←− Figure S7. Structural communities during the cell cycle. Same as in Fig. 3 for HeLaS3 cells during G1 (left) and mitosis (right), along the complete masked chromosome 17. The black arrow point to the location of chromosme 17 centromere. Note that the two communities obtained at large scale correspond to the partition of chromosome 17 into its two chromosmome arms. Figure S8. Simulation of a non-hierarchical structural domain organisation. (Left) Model structural interaction matrix for 2000 nodes organised in fully connected interval-communities with no specific organisation at scales larger than the community size: the matrix is built as a series of 40 pairs of domains of size 20 nodes and 30 nodes with internal domain interaction set to 60, with the two first (resp. second) sub-diagonals set to 80 (resp. 70) to assure connectivity and with an additive Poisson noise over all interaction pairs of mean value λ = 50. (Right) Interval-communities obtained when using the multi-scale community mining algorithm based on graph wavelets on the non-hierarchical interaction network described above and represented like in Fig.3. Figure S9. Same as Fig. 4 for the TADs grouped in different size categories: 0.3 ≤ L < 0.6 Mb (light pink), 0.6 ≤ L < 1 Mb (pink), 1 ≤ L < 2 Mb (magenta), 2 ≤ L < 3 Mb (dark pink).  Figure S12. Same as Figure 6 for the comparison of the TAD sets in H1 ES and IMR90. Blue points were obtained with IMR90 interval-communities as the query set and H1 ES interval-communities as the reference set. Yellow points corresponds to the reversed analysis. Cell line Not sequenced Low interacting High interacting Total  IMR90  1792  1100  97  2989  H1 ES  1734  1286  20  3040  K562  1732  565  584  2881  GM06990  1730  990  212  2932  HeLaS3  1836  986  130  2952   Table S1. Masked data. We removed from the original data (28688 100 kb loci over the 22 autosomes) low and high interacting fragments along with fragments corresponding to not sequenced regions of the genome. For each considered Hi-C dataset and for each chromosome, we computed the meanc and the standard deviation σ of the total intra-chromosomal interaction count per loci ni (sum over the Hi-C matrix line). Setting the thresholds to low = max(0,c − 2σ), and high = min(0.99L,c + 2σ), where L is the chromosome size (in number of 100 kb pixels), we only retained loci where ni ∈ [low, high], removing 10% of the data (6% correspond to unsequenced fragments, ∼ 2 to 4% correspond to low interacting fragments and ∼ 2% correspond to high interacting fragments).