Node Overlap Removal Algorithms: an Extended Comparative Study

In the context of graph layout, many algorithms have been designed to remove node overlapping, and many quality criteria and associated metrics have been proposed to evaluate those algorithms. Unfortunately, a complete comparison of the algorithms based on some metrics that evaluate their quality has never been provided and it is thus difficult for a visualisation designer to select the algorithm that best suits their needs. In this paper


Introduction
Graph-drawing algorithms are good at creating rich, expressive graph layouts but often consider nodes as points with no dimensions. When the size of the nodes is changed afterwards, for instance to display annotations or to reflect evolving weights, nodes may overlap and hide information. Post-processing algorithms, named layout adjustment [21], have been proposed to remove node overlap.
The objective of these algorithms is, given an initial positioning of the nodes and a size for each one, to provide a new embedding so that there are no overlapping nodes any more. A classical zoom-in function maintaining the sizes of the nodes (i.e., uniform scaling) provides such an embedding, but it expands the visualisation, resulting in large areas without any objects. Therefore, a node overlap removal algorithm must take into account the area of the drawing, and try to minimise it. Positioning the nodes evenly on a grid meets this objective but will result in the loss of the user's mental picture 1 of the original embedding. Thus, it is also important to minimise the changes made to the layout.
Since a preliminary work in 1995 [21], many algorithms have been designed to reach these goals, and many quality criteria have been proposed to evaluate them. Unfortunately, a complete comparison of the algorithms based on the different criteria has never been provided and it is thus difficult for a visualisation designer to select the one that best suits their needs.
In this paper, our contribution comes in three forms: (1) We propose a classification of 22 quality metrics, grouping them according to the quality criterion they try to capture. We also discuss their relevance and we select a representative one for each class. (2) We compare state-of-the-art node overlap removal approaches with regard to the previously selected metrics. Experiments involve 854 graphs, including synthetic ones (random, tree, scale-free, small-world) and real-world ones. (3) We present a JavaScript library 2, which contains all the algorithms described in this paper, and a Web platform, AGORA 3 (Automatic Graph Overlap Removal Algorithms), in which one can upload a set of graphs, apply the node overlap removal algorithms and download the results and the values of the quality criteria 4.
The paper is organised as follows: after a brief reminder in Section 2 of the definitions and the notations used in this paper, we present and discuss the quality criteria and the metrics in Section 3. Then we compare the algorithms in Section 4. We discuss threats to validity and future research directions in Section 5. Finally, we describe the Web platform in Section 6 and we conclude in Section 7.

1 An interesting discussion about the concept of mental map preservation is available in [1].
2 https://github.com/agorajs/agorajs.github.io (accessed: 2020-03)
3 https://agorajs.github.io/ (accessed: 2020-03)
4 A preliminary version of this work has been published in the proceedings of the Symposium on Graph Drawing and Network Visualisation 2019 [3]. This new version includes further details on some criteria, a new node overlap removal algorithm (Diamond [20]), a more detailed analysis of the results, a discussion on the threats to validity and the directions for future work, the library and the Web platform.
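The uniform scaling baseline mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `min_uniform_scale` and the `(x, y, w, h)` tuple layout are assumptions made for this example, and positions are scaled about the origin.

```python
from itertools import combinations

def min_uniform_scale(nodes):
    """Smallest factor s >= 1 such that multiplying every centre by s
    removes all overlaps while node sizes stay unchanged.

    nodes: list of (x, y, w, h) tuples (centre coordinates, width, height).
    Returns float('inf') if two overlapping nodes share the same centre,
    since scaling can never separate them.
    """
    s = 1.0
    for (xu, yu, wu, hu), (xv, yv, wv, hv) in combinations(nodes, 2):
        dx, dy = abs(xu - xv), abs(yu - yv)
        need_x = (wu + wv) / 2  # centre distance needed to separate on x
        need_y = (hu + hv) / 2  # centre distance needed to separate on y
        if dx >= need_x or dy >= need_y:
            continue  # this pair does not overlap
        # Separating the pair on a single axis is enough, so keep the cheaper one.
        fx = need_x / dx if dx > 0 else float("inf")
        fy = need_y / dy if dy > 0 else float("inf")
        s = max(s, min(fx, fy))
    return s

def scale_layout(nodes, s):
    """Apply the scaling factor to the centres only."""
    return [(x * s, y * s, w, h) for x, y, w, h in nodes]
```

Because every pairwise distance grows with the factor, pairs that were already separated stay separated, which is why the largest per-pair requirement is sufficient. The same property explains the drawback discussed above: one tightly packed pair dictates the expansion of the whole drawing.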
In this paper, we use the following definitions and notations.
$G = (V, E)$ denotes a graph where $V$ is a set of nodes and $E$ a set of edges. The number of nodes $|V|$ is denoted by $n$ and the number of edges $|E|$ by $m$. We consider each node as a rectangle. Thus, for a node $v \in V$, its width and its height are denoted by the pair $(w_v, h_v)$, which is not changed by the layout adjustment.
The initial embedding is defined as an injection $E_G : V \to \mathbb{R}^2$ such that $E_G(v) = (x_v, y_v)$, where $(x_v, y_v)$ are the coordinates of the center of the node $v$. The overlapping-free embedding is denoted by $E'_G$. To simplify notations, we denote $v = (x_v, y_v)$ the position of a node in $E_G$ and $v' = (x'_v, y'_v)$ its position in $E'_G$. Remark that two nodes $(u, v) \in V^2$ are overlapping when:
$$|x_u - x_v| < \frac{w_u + w_v}{2} \quad \text{and} \quad |y_u - y_v| < \frac{h_u + h_v}{2}$$
The bounding box $bb$ of an embedding $E_G$ is defined as the smallest rectangle containing all the nodes of $G$; $w_{bb}$ (resp. $h_{bb}$) denotes the width (resp. the height) of the bounding box of the initial embedding, $w'_{bb}$ (resp. $h'_{bb}$) denotes the width (resp. the height) of the bounding box of the overlapping-free one. They are determined as follows:
$$w_{bb} = \max_{v \in V}\left(x_v + \frac{w_v}{2}\right) - \min_{v \in V}\left(x_v - \frac{w_v}{2}\right), \qquad h_{bb} = \max_{v \in V}\left(y_v + \frac{h_v}{2}\right) - \min_{v \in V}\left(y_v - \frac{h_v}{2}\right)$$
and likewise for $w'_{bb}$ and $h'_{bb}$ with the coordinates $(x'_v, y'_v)$. The position of the center of the bounding box is denoted by $c_{bb} = (x_{bb}, y_{bb})$ in the initial embedding, and $c'_{bb} = (x'_{bb}, y'_{bb})$ in the overlapping-free embedding.
The convex hull of an embedding $E_G$ is defined as the smallest convex region containing all the nodes of $G$. Note that it is computed by using the 4 corners of the nodes, and not only their centers, so that the rectangles representing the nodes are fully included in it. In the following, $ch$ denotes the convex hull of the original embedding, $ch'$ the convex hull of the overlapping-free one, $c_{ch}$ the center of mass of $ch$, and $c'_{ch}$ the center of mass of $ch'$.
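The definitions of this section translate directly into code. The sketch below is only an illustration: it assumes nodes are stored as `(x, y, w, h)` tuples with `(x, y)` the center, and the function names are chosen for this example rather than taken from the library described later.

```python
from itertools import combinations

def overlaps(u, v):
    """True when two axis-aligned rectangles given by centre and size overlap.

    Rectangles that merely touch are not considered overlapping
    (strict inequalities on both axes).
    """
    (xu, yu, wu, hu), (xv, yv, wv, hv) = u, v
    return abs(xu - xv) < (wu + wv) / 2 and abs(yu - yv) < (hu + hv) / 2

def count_overlaps(nodes):
    """Number of unordered pairs of overlapping nodes."""
    return sum(1 for u, v in combinations(nodes, 2) if overlaps(u, v))

def bounding_box(nodes):
    """Width, height and centre of the smallest rectangle containing all nodes."""
    left = min(x - w / 2 for x, y, w, h in nodes)
    right = max(x + w / 2 for x, y, w, h in nodes)
    bottom = min(y - h / 2 for x, y, w, h in nodes)
    top = max(y + h / 2 for x, y, w, h in nodes)
    return right - left, top - bottom, ((left + right) / 2, (bottom + top) / 2)
```

The pairwise count is quadratic in the number of nodes, which is acceptable for a sketch; a sweep-line approach would be preferable for large graphs.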

Quality criteria
Many criteria have been proposed in the literature to evaluate the quality of the embeddings resulting from adjustment algorithms. Unfortunately, the experiments provided by the authors of the different approaches are not always based on the same metrics. In order to provide a uniform experimental protocol and a complete comparison of the algorithms, we need to review the quality criteria and the metrics used to evaluate them. We also need to select a representative metric for each criterion.
We identified 5 classes of metrics: Orthogonal Ordering preservation (oo), Spread minimisation (sp), Global Shape preservation (gs), Node Movement minimisation (nm) and Edge Length preservation (el). Each of them depicts a quality criterion. Table 1 shows the metrics assigned to the classes. The formulas are given in the discussion below. The abbreviations of the classes are used as prefixes for the metrics.
Each of the following subsections presents the metrics of a specific class. In each of them, we select one representative metric, based on the corresponding quality criterion and the properties that the metrics aim at capturing. Our discussion also sometimes involves the correlation coefficient between two metrics, computed following the protocol described in the comparison section, Section 4.

Orthogonal Ordering preservation
The orthogonal ordering class groups the metrics which try to quantify how much an adjustment algorithm preserves the initial orthogonal ordering, i.e., the following conditions for all nodes $u, v \in V$ (primed coordinates refer to the overlapping-free embedding):
$$x_u < x_v \iff x'_u < x'_v, \qquad y_u < y_v \iff y'_u < y'_v, \qquad x_u = x_v \iff x'_u = x'_v, \qquad y_u = y_v \iff y'_u = y'_v$$
The first metric of this class, introduced in [21] and called here oo_o, is equal to 1 if the overlapping-free graph embedding preserves the initial orthogonal ordering, 0 otherwise. This binary metric is coarse: if only one couple of nodes does not satisfy those conditions, the value of oo_o is the same as when many couples do not satisfy them.
To overcome this issue, Huang et al. [16] proposed a metric based on Kendall's Tau distance (oo_kt). For each couple of nodes, they first compute an inversion number inv(u, v) equal to 0 if the orthogonal ordering is preserved between them, 1 otherwise. The metric is then defined as the normalised sum of the inversion numbers over all couples of nodes. Strobelt et al. [26] introduced the number of inversions, which has the drawback of providing non-normalised values. However, it has the benefit of penalising inversions occurring on each axis independently (x-axis and y-axis), instead of penalising in the same manner an inversion occurring on only one axis and an inversion occurring on the two axes. Thus, in our study, we combine the two metrics by using a normalised version of the latter (oo_nni).
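As an illustration of counting inversions per axis and normalising the result, here is a minimal sketch. The normalisation constant (twice the number of unordered couples, so that the value lies in [0, 1] with 0 meaning that the ordering is fully preserved) is an assumption of this example, not necessarily the constant used for oo_nni in the study.

```python
from itertools import combinations

def _sign(a):
    """Return -1, 0 or 1 according to the sign of a."""
    return (a > 0) - (a < 0)

def normalised_inversions(initial, adjusted):
    """Fraction of (couple, axis) combinations whose relative order changed.

    initial, adjusted: lists of (x, y) centres, in the same node order.
    An equality that becomes a strict order (or the reverse) also counts
    as an inversion, in line with the orthogonal ordering conditions.
    """
    n = len(initial)
    couples = n * (n - 1) // 2
    if couples == 0:
        return 0.0
    inversions = 0
    for i, j in combinations(range(n), 2):
        for axis in (0, 1):  # 0: x-axis, 1: y-axis
            before = _sign(initial[i][axis] - initial[j][axis])
            after = _sign(adjusted[i][axis] - adjusted[j][axis])
            if before != after:
                inversions += 1
    return inversions / (2 * couples)
```

Counting each axis separately means that a couple inverted on both axes weighs twice as much as a couple inverted on one axis only, which is the property highlighted above.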

Spread minimisation
A classical zoom-in function maintaining the sizes of the nodes (i.e., uniform scaling) provides an overlapping-free embedding, but it expands the visualisation, resulting in large areas without any objects. To avoid this issue, quality metrics have been introduced to quantify embedding spreading. Their purpose is to favour algorithms inducing low spreading.
The L1 metric length [17] is the ratio:
$$\frac{\max(w'_{bb}, h'_{bb})}{\max(w_{bb}, h_{bb})}$$
The drawback of this technique is that it considers only one dimension of the embedding, width or height. For instance, considering an example where $w_{bb} = 4$, $h_{bb} = 2$ in the initial embedding and $w'_{bb} = 4$, $h'_{bb} = 4$ in the overlapping-free one, the value of the L1 metric length is 1 (which is the target value), whereas the area of the overlapping-free embedding is twice as large as in the initial embedding. The ratio between the bounding box areas of the two embeddings [21] overcomes this issue:
$$sp\_bb\_a = \frac{w'_{bb} \times h'_{bb}}{w_{bb} \times h_{bb}}$$
While this ratio gives an unbounded value greater than 1, Huang et al. [16] proposed a normalised version producing values in the interval [0, 1[. Unfortunately, this criterion is rather unintuitive and it is hard to figure out what the values represent.
In our comparison, we selected another version of the ratio of areas involving convex hulls [26], as it better captures the concrete area of the drawing:
$$sp\_ch\_a = \frac{area(ch')}{area(ch)}$$
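The convex hull area ratio can be computed with standard tools. The sketch below uses Andrew's monotone chain for the hull and the shoelace formula for the area; these are common textbook choices picked for this example and are not claimed to be what the library uses. Nodes are assumed to be `(x, y, w, h)` tuples.

```python
def corners(nodes):
    """The 4 corners of every node rectangle, as required by the hull definition."""
    pts = []
    for x, y, w, h in nodes:
        for sx in (-0.5, 0.5):
            for sy in (-0.5, 0.5):
                pts.append((x + sx * w, y + sy * h))
    return pts

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for a simple polygon given as a list of vertices."""
    closed = poly[1:] + poly[:1]
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(poly, closed))) / 2

def sp_ch_a(initial, adjusted):
    """Ratio of the convex hull area after adjustment to the area before."""
    return (polygon_area(convex_hull(corners(adjusted)))
            / polygon_area(convex_hull(corners(initial))))
```

A value of 1 means that the drawing occupies the same hull area as before; larger values indicate spreading.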

Global Shape preservation
This class contains metrics that try to capture the ability of the algorithms to preserve the global shape of the initial embedding. The first one was proposed by Li et al. [17]:
$$gs\_bb\_ar = \frac{w'_{bb} \times h_{bb}}{h'_{bb} \times w_{bb}}$$
The underlying idea is to capture the variation of the aspect ratio ($w_{bb}/h_{bb}$) between the initial and the overlapping-free embedding. For instance, let us consider an example where $w_{bb} = 3$, $h_{bb} = 2$ in the initial embedding and $w'_{bb} = 6$, $h'_{bb} = 4$ in the overlapping-free one (see grey and green rectangles below). In this case, the overlapping-free embedding is twice as large as the initial one but the aspect ratio remains the same, 3/2. The gs_bb_ar is 1, which is the target value. Now let us consider another example where $w_{bb} = 3$, $h_{bb} = 2$, $w'_{bb} = 4$, $h'_{bb} = 6$ (blue rectangle below). In this case, the initial aspect ratio is 3/2 whereas the overlapping-free one is 2/3. The gs_bb_ar is now $(2/3)/(3/2) \approx 0.44$, which is not the target value; it reveals a distortion of the initial embedding during the overlap removal process.
The main drawback of this metric is that it can reach values in the interval ]0, +∞[ while the target value is 1. For instance, $w_{bb} = 3$, $h_{bb} = 2$, $w'_{bb} = 6$ and $h'_{bb} = 5$ yield a gs_bb_ar equal to 0.8 (purple rectangle above), while $w_{bb} = 3$, $h_{bb} = 2$, $w'_{bb} = 4$ and $h'_{bb} = 2$ yield a gs_bb_ar equal to 1.33 (yellow rectangle above). In this case, it is hard to decide which of two algorithms is the best if the first one obtains the purple bounding box and the second one obtains the yellow one, because there is no obvious way to compare 0.8 and 1.33 when 1 is the target value. To overcome this issue, we propose to refine it as follows:
$$gs\_bb\_iar = \max\left(gs\_bb\_ar, \frac{1}{gs\_bb\_ar}\right)$$
In this case, the target value is 1 and the metric cannot reach values below it (see the values below the rectangles). This criterion is the one we selected for our study.
An alternative to this approach based on the convex hull has been proposed by Strobelt et al. [26]. The idea is to evaluate the distortion of the convex hull by comparing, between both embeddings, the distances of convex hull points to their center. Let $\ell_\theta$ (resp. $\ell'_\theta$) be the Euclidean distance between the center of mass $c_{ch}$ (resp. $c'_{ch}$) of the convex hull $ch$ (resp. $ch'$) and the intersection of the convex hull with the line going through $c_{ch}$ (resp. $c'_{ch}$) at an angle $\theta$ ($\theta$ varying from 0° to 350° in 10° steps). Then, the difference is defined as the ratio $d_\theta = \ell'_\theta / \ell_\theta$. The metric is the standard deviation of the 36 measures of $d_\theta$:
$$gs\_ch\_sd = \sqrt{\frac{1}{36} \sum_{\theta} \left(d_\theta - \bar{d}\right)^2}$$
where $\bar{d} = \frac{1}{36} \sum_{\theta} d_\theta$ is the mean value.
Based on the experiments presented below in Section 4, we observed that gs_bb_iar and gs_ch_sd have a correlation coefficient of 0.77, showing that they both tend to capture similar aspects of the adjustment process. We selected the former for its simplicity and its ease of interpretation.
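The bounding box examples can be checked numerically. In this sketch the direction of the plain ratio (overlapping-free aspect ratio divided by the initial one) is an assumption chosen because it reproduces the 0.8 and 1.33 values quoted for the purple and yellow rectangles; the refined metric does not depend on that choice, since it keeps the larger of the ratio and its inverse. Function and parameter names are illustrative.

```python
def gs_bb_ar(w, h, w_new, h_new):
    """Ratio between the overlapping-free aspect ratio and the initial one."""
    return (w_new * h) / (h_new * w)

def gs_bb_iar(w, h, w_new, h_new):
    """Refined version: never below 1, and 1 means the aspect ratio is preserved."""
    r = gs_bb_ar(w, h, w_new, h_new)
    return max(r, 1 / r)

# Initial bounding box 3 x 2 compared with the four rectangles of the discussion.
for w_new, h_new in [(6, 4), (4, 6), (6, 5), (4, 2)]:
    print((w_new, h_new),
          round(gs_bb_ar(3, 2, w_new, h_new), 2),
          round(gs_bb_iar(3, 2, w_new, h_new), 2))
```

With the refined metric the purple (6 × 5) and yellow (4 × 2) boxes become directly comparable, since both values are expressed as a distance above the target 1.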

Node Movement minimisation
This class contains the metrics quantifying the changes in node positions after running an adjustment algorithm. The underlying intuition is that an algorithm involving large node movements will provide an overlapping-free configuration different from the original one, and thus may result in a substantial loss of the mental model.
The simplest metric of this class was presented by Huang et al. [16]. It relies on nb, the number of nodes which have moved between the initial and the overlapping-free embedding. The main drawback of this approach is that a node overlap removal algorithm may induce very small changes in most nodes, which does not affect the mental model preservation, while obtaining a very bad score. To tackle this problem and add more granularity to the evaluation of node movements, a series of metrics have been proposed, based on the same underlying quality function:
$$nm\_dm = f(n) \times \sum_{v \in V} dist(v, v')$$
where $f$ is a normalising function of $n = |V|$ and $dist$ is a distance between $v$ and $v'$. Table 2 sums up the ones used in the literature. The function $f$ comes in three different forms. Marriott et al. [19] and Huang et al. [15] do not include any $f$, which is similar to having $f(n) = 1$. The drawback is that the resulting value highly depends on the number of nodes in the graph. That is why Strobelt et al. [26] proposed to use the mean of the distances, which corresponds to $f(n) = 1/n$. Finally, Lyons et al. [18] proposed $f(n) = \frac{1}{k\sqrt{2} \times n}$, where $k$ is the maximum between $w_{bb}$ and $h_{bb}$. In this case, $k\sqrt{2}$ is the diagonal of a square containing the embedding, thus a maximum distance available for a node. Unfortunately, this normalisation generates very small values and is harder to interpret than $f(n) = 1/n$. That is why we preferred the latter for our study.
Table 2: Functions used to tune the distance moved metric (with references to the first papers mentioning the use of these functions in the context of node overlap removal).
Three $dist$ functions have been proposed in the literature. The most intuitive one is the Euclidean distance $\|v - v'\|$ [26,18]. The squared Euclidean distance $\|v - v'\|^2$ [19] avoids the square root computation and discriminates large changes better. It is the one we selected for our study. The Manhattan distance $|x_v - x'_v| + |y_v - y'_v|$ has also been used [15], but it is less intuitive and gives close results (nm_dm_se and nm_dm_h have a correlation coefficient of 0.9).
Let us consider an adjustment algorithm that pushes nodes along the x-axis. The preservation of the global shape is not optimal, but the preservation of the configuration should reach a good score, as a node at the top right of the initial embedding would remain at the top right of the overlapping-free embedding. In order to better capture the relative movement of a node between the two embeddings, a shift function can be applied to align the center of the initial bounding box with the center of the final one, and a scale function to align the size of the initial bounding box with the size of the final one. Considering this, we selected the following node movement metric:
$$nm\_dm\_imse = \frac{1}{n} \sum_{v \in V} \left\| scale(shift(v)) - v' \right\|^2$$
A further metric from the literature (the complete formula is available in the corresponding paper) is also based on the idea that the metric should use modified initial positions to better capture the relative movement of the nodes between the two embeddings. Besides including the shift and the scale functions, it also rotates the initial embedding by an angle θ that minimises the distances between the nodes of the initial embedding and the ones of the overlapping-free embedding. We have not included the rotation in our experiment as we consider that it can induce a loss of the mental model (think about the recognition of a map turned upside down).
An alternative way to quantify how much an overlapping-free configuration may result in a substantial loss of the mental model is to look at the neighbourhoods of the nodes and compare them before and after the adjustment. Based on a k-NN approach, Nachmanson et al. [22] proposed a metric comparing $N_k(v)$ and $N_k(v')$, where $N_k(v)$ (resp. $N_k(v')$) denotes the k nearest neighbours of $v$ (resp. $v'$), in terms of Euclidean distance, in the initial (resp. overlapping-free) embedding. We did not select this metric because, unlike the other metrics of the class, it requires fixing a parameter (k).
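The idea of measuring node movement after aligning the two bounding boxes can be sketched as follows. The exact shift and scale functions are assumptions of this example: the initial positions are translated so that the bounding box centers coincide and stretched about that center by the ratios of the bounding box widths and heights; the mean squared Euclidean distance to the final positions is then returned. Nodes are `(x, y, w, h)` tuples.

```python
def _bbox(nodes):
    """Centre (cx, cy), width and height of the bounding box of the nodes."""
    left = min(x - w / 2 for x, y, w, h in nodes)
    right = max(x + w / 2 for x, y, w, h in nodes)
    bottom = min(y - h / 2 for x, y, w, h in nodes)
    top = max(y + h / 2 for x, y, w, h in nodes)
    return (left + right) / 2, (bottom + top) / 2, right - left, top - bottom

def mean_squared_movement(initial, adjusted):
    """Mean squared distance between shifted-and-scaled initial positions
    and the overlapping-free positions (same node order in both lists)."""
    cx, cy, w, h = _bbox(initial)
    cx2, cy2, w2, h2 = _bbox(adjusted)
    sx, sy = w2 / w, h2 / h  # assumed scale: ratio of bounding box sizes
    total = 0.0
    for (x, y, _, _), (x2, y2, _, _) in zip(initial, adjusted):
        # assumed shift + scale: express the initial position in the final frame
        ex = cx2 + (x - cx) * sx
        ey = cy2 + (y - cy) * sy
        total += (ex - x2) ** 2 + (ey - y2) ** 2
    return total / len(initial)
```

Under these assumptions a pure translation of the whole drawing scores 0, whereas it would be penalised by a metric computed on the raw coordinates.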
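A neighbourhood comparison of this kind can be sketched as below. The aggregation used here, the sum over all nodes of the squared number of lost neighbours, is an assumption of this example and should be checked against [22] before being relied upon; ties between equidistant neighbours are broken by node index, which is also an arbitrary choice.

```python
def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of node i (Euclidean distance)."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: ((points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2, j))
    return set(others[:k])

def knn_change(initial, adjusted, k):
    """Sum over nodes of (k - number of neighbours kept)^2; 0 means unchanged."""
    total = 0
    for i in range(len(initial)):
        kept = len(k_nearest(initial, i, k) & k_nearest(adjusted, i, k))
        total += (k - kept) ** 2
    return total
```

The dependence on k is visible in the signature: the same pair of embeddings can score very differently for k = 1 and k = 10, which is the reason given above for not selecting this metric.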

Edge Length preservation
This class contains the two metrics based on edge lengths. The set of edges can be $E$ or can be another set derived from the graph.
Standard force-based layout algorithms tend to produce uniform edge lengths. Accordingly, the first metric of this class captures whether the edge lengths of a graph remain uniform or not after applying an adjustment algorithm [17]. As many layout algorithms are not designed to produce uniform edge lengths, mental map preservation is not necessarily captured by this kind of metric. Hence we decided to consider alternatives. The first alternative is based on the edges of a graph derived from the initial embedding, via a Delaunay triangulation. Let $E_{dt}$ be the set of edges of a Delaunay triangulation performed on the nodes of the initial embedding. The second metric of this class, el_rsdd, is based on computing the coefficient of variation, also known as the relative standard deviation, of the edge length ratios as follows [8]:
$$el\_rsdd = \frac{1}{\bar{r}} \sqrt{\frac{\sum_{(u,v) \in E_{dt}} \left(r_{uv} - \bar{r}\right)^2}{|E_{dt}|}}, \qquad r_{uv} = \frac{\|u' - v'\|}{\|u - v\|}, \qquad \bar{r} = \frac{1}{|E_{dt}|} \sum_{(u,v) \in E_{dt}} r_{uv}$$
The main drawback of this metric is that it is based on a derived set of edges, instead of the real one. As a consequence, it only partially captures whether an algorithm preserves edge lengths or not. In our study, we use the coefficient of variation of the edge length ratios, el_rsd (same equations as el_rsdd, replacing $E_{dt}$ by $E$).
These graphs are provided by 4 generation models available in the OGDF library [4]: random graphs [6], random trees, small-world graphs [28], and scale-free graphs [2]. We also use 14 real-world graphs selected from the Graphviz test suite 5 [9], previously used by the authors of PRISM [8] and GTREE [22]. All the graphs are available online 6 as GML files including the initial embedding.
2. Overlapping-free embedding computation. Synthetic graphs resulting from the first step are initially positioned by the FM3 layout algorithm [11]. Then, we apply the 9 node overlap removal algorithms, thus providing a set of 7,560 overlapping-free graph embeddings. Graphviz test suite graphs are initially positioned by the SFDP layout algorithm [14] to follow the same baseline embedding as Gansner et al. [8]. We then apply the 9 node overlap removal algorithms, thus providing 126 overlapping-free graph embeddings.
3. Metrics computation. We finally compute the values of the 5 selected metrics on the 7,686 overlapping-free synthetic and real-world graph embeddings. We also measure the computation time of the algorithms.
The values of the metrics discussed in this comparison are measured from the results of the implemented algorithms that are available in our library (see Section 6), and thus might differ from our original paper [3] in terms of running time. All the algorithms are coded in JavaScript. We implemented PFS, PFS', FTA and Diamond ourselves from the algorithms provided by the authors in their seminal papers. As Diamond is based on a linear optimisation, we used the jsLPSolver 7. For VPSC, we directly used the JavaScript program provided by the authors 8. PRISM and GTREE have been adapted from the Microsoft Automatic Graph Layout library 9 and converted into JavaScript with SharpKit 10. Finally, we were inspired by the Java program of RWordle-L 11 provided by the authors of [26].
Table 3: Aggregated values of the selected metrics on the synthetic graphs: first quartile, median and third quartile.
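The coefficient of variation of the edge length ratios can be sketched in a few lines. The example assumes the population standard deviation and the ratio of final to initial length; both are assumptions made here for illustration, as are the function name and the data layout (positions as lists of `(x, y)` centers, edges as pairs of indices).

```python
from math import dist, sqrt  # math.dist requires Python 3.8+

def edge_length_rsd(edges, initial, adjusted):
    """Relative standard deviation of the ratios new length / old length.

    Returns 0 when every edge is stretched by the same factor, which is
    the case for uniform scaling applied to the node centres.
    """
    ratios = [dist(adjusted[u], adjusted[v]) / dist(initial[u], initial[v])
              for u, v in edges]
    mean = sum(ratios) / len(ratios)
    variance = sum((r - mean) ** 2 for r in ratios) / len(ratios)
    return sqrt(variance) / mean
```

Passing the edges of a Delaunay triangulation of the initial positions instead of the graph edges gives the variant based on the derived edge set.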
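The counts quoted in the protocol are consistent with one another, as the short check below shows. The number of synthetic graphs (840) is not stated in the steps above; it is derived here from the total of 854 graphs and the 14 real-world ones.

```latex
\begin{align*}
\text{synthetic graphs} &= 854 - 14 = 840 \\
\text{synthetic embeddings} &= 840 \times 9 = 7560 \\
\text{real-world embeddings} &= 14 \times 9 = 126 \\
\text{total embeddings} &= 7560 + 126 = 7686
\end{align*}
```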

Quality
Figure 1a shows a random graph containing 100 nodes and 400 edges, positioned by the FM3 layout algorithm [11]. This initial embedding contains 274 overlaps. Figures 1b-1j show the overlapping-free embeddings obtained after applying the algorithms mentioned above. The sizes of the figures reflect the spread of the embeddings.
Figure 2 shows another example with a real-world graph from the Graphviz test suite, mode. It contains 213 nodes, 269 edges and 1105 overlaps. Figures 2b-2j show the overlapping-free embeddings. In this case, we did not maintain the relative size for Scaling and PFS, as the drawing was too large. The actual embeddings are twice as large as they appear in the figure. Table 3 shows the aggregated metric values obtained on the synthetic graphs: for each of the five selected metrics and for each algorithm, the first quartile, the median and the third quartile of the values are given. Table 4 shows the metric values on the real-world graphs. In these tables and the following ones, the colour of the cells represents the quality of the algorithm on the criterion: green for high quality, orange for intermediate and red for poor. The ranges are defined by comparing the values lying on a single row, i.e., the values of a single criterion obtained by the different algorithms.
Table 4: Mean values of the selected metrics on the real-world graphs.
Orthogonal Ordering preservation. Unsurprisingly, Scaling, PFS and PFS' obtain the best scores for oo_nni, as it has been proved that they maintain the original orthogonal ordering. However, all the tested algorithms obtained good results for this criterion.
Spread minimisation. As shown in Figure 1b, Scaling highly increases the size of the embedding, which induces a bad score for sp_ch_a. PFS also obtains a bad score for this criterion. VPSC and RWordle-L produce the most compact embeddings, while the other algorithms give intermediate results. However, looking at Figures 1f, 1h, 2f and 2h, we can observe that the embeddings resulting from these two algorithms are so compact that they do not allow one to visualise the edges or the structures of the graph (e.g., communities or clusters). Depending on the task one wants to perform on the overlapping-free embedding, this observation illustrates a possible limitation of the criterion when it is considered independently from the other ones.
Global Shape preservation. Surprisingly, the global shape preservation score (gs_bb_iar) is not exactly 1 for Scaling, because the size of the nodes remains the same between the initial and the overlapping-free embeddings. Nevertheless, it preserves the initial global shape. PFS is the worst algorithm on this criterion. The other algorithms obtained good median scores on synthetic graphs, but the third quartile scores show that FTA and VPSC can produce a certain amount of distorted embeddings. This is confirmed by the tests on real-world graphs, where they obtain worse results, and by Figures 1e, 1f and 2f, where we can observe that they spread the layout along only one of the axes (x-axis for FTA and y-axis for VPSC).
Node Movement minimisation. Scaling obtains the best results for the node movement minimisation criterion, followed by VPSC and RWordle-L. FTA also obtained a good median score on synthetic graphs, but its third quartile value shows that it can generate a certain amount of embeddings with large changes, as also illustrated by the bad score obtained on the real-world graphs. PFS' and PRISM obtained intermediate results. GTREE had bad results on the synthetic graphs, while it obtained fairly good ones on the real-world graphs. Finally, PFS and Diamond obtained bad results on both synthetic and real-world graphs.
Edge Length preservation. Scaling preserves relative edge lengths. Diamond obtains the worst scores on synthetic graphs, but this phenomenon is not confirmed on real-world ones, for which it obtains fairly good scores. All the other algorithms obtained a median score between 0.08 and 0.36 on synthetic graphs. The third quartile shows that FTA generates a certain amount of embeddings with higher edge length variations. This observation is confirmed by the results on the real-world graphs, for which it obtains the worst score.

Computation time
Tables 5 and 6 show the aggregated running time values in milliseconds on the synthetic graphs (first quartile, median and third quartile) and the running time values on the real-world ones, measured on our implementation of the algorithms.
We can observe on the synthetic graphs that Scaling, PFS, PFS' and VPSC require lower running times than the other algorithms. FTA is a little slower, especially when looking at the third quartile, indicating a certain amount of time-consuming embedding computations (more than 1 second for graphs containing more than 500 nodes). This observation is confirmed on the real-world graphs.
RWordle-L, PRISM and GTREE induce intermediate running times on the synthetic graphs: less than 1 second for graphs containing up to 200 nodes, a few seconds for graphs of 500 nodes, and tens of seconds for graphs of 1000 nodes. PRISM and GTREE are significantly slower than RWordle-L on small graphs (100 nodes or fewer), but this seems unimportant as the values remain very low. The real-world graphs confirm these observations, but also highlight that PRISM is sometimes significantly slower than GTREE, even if this only happens on graphs requiring a low computation time.
Diamond is often the most time-consuming algorithm on both synthetic and real-world graphs. However, it is based on a linear optimisation, so the running time depends on the solver used (see the introduction of this section). This might explain the differences between our results and the ones given in the paper [20].

Summary
In conclusion, even if Scaling optimises 4 out of 5 criteria and is very fast to compute on the graphs of our datasets, it does not represent a satisfying solution as it increases the size of the embedding too much. PFS is also not satisfying as it got poor results on 3 criteria. In particular, it also considerably increases the size of the embedding, which is obvious in Figures 1c and 2c.
FTA obtained intermediate results over all the criteria, which is worse than all its remaining competitors. In particular, as mentioned before, it spreads the embedding along only one axis, highly distorting the original configuration (see the Global Shape Preservation criterion gs_bb_iar in Table 3, as well as the distortion illustrated by Figure 1e).
VPSC also spreads the embedding along only one axis. Looking at Table 3, we can observe that RWordle-L is better than VPSC on Global Shape Preservation, while obtaining comparable results on the other criteria. This is because both create the most compact embeddings, inducing a low spread and short node movements. We can also notice in Tables 5 and 6 that RWordle-L can be time-consuming for graphs with more than 500 nodes, which is not the case for VPSC. Thus, if the compactness of the embedding is one's priority, RWordle-L should be chosen on small graphs and VPSC on larger ones.
Table 5: Aggregated running times in milliseconds on the synthetic graphs, as a function of the number of nodes (10 to 1,000): first quartile, median and third quartile.
Table 6: Running times in milliseconds on the real-world graphs.
The high compactness of the embeddings resulting from RWordle-L and VPSC prevents visualising the edges and the graph structures. Therefore, if one's priority is to provide an embedding highlighting the paths and the groups of nodes in the graph, the remaining options (PFS', PRISM, GTREE and Diamond) should be favoured. Among them, Diamond is the slowest one with respect to the solver we used (see the introduction of this section). Diamond also obtains bad scores for Node Movement minimisation and Edge Length preservation on synthetic graphs (see Table 3, nm_dm_imse and el_rsd). GTREE also induces a lot of node movement, but it outperforms Diamond in terms of Edge Length preservation and running time. PFS' and PRISM obtained comparable results, outperforming GTREE and Diamond on Node Movement minimisation, even if PRISM is slightly better (see Tables 3 and 4, nm_dm_imse). Figures 1d and 1g illustrate their similarity, while Figures 2d and 2g illustrate the ability of PRISM to induce less node movement on a real-world graph. PFS' should be favoured over PRISM for large graphs as its computation time is substantially lower (see Tables 5 and 6).

Discussion
In this section, we discuss threats to validity and future research directions.

Threats to validity
Nodes aspect ratio. The aspect ratio of the nodes of the synthetic graphs of the above study is 2:1, whereas the aspect ratios of the nodes of the real-world graphs vary with respect to the initial datasets. The fixed aspect ratio of synthetic graphs could be considered as a limitation of our study: what happens in terms of result quality among the different algorithms when the aspect ratio varies, and in particular when the width of the nodes increases to display long text labels? A clue for answering this question is available in Table 7, which shows the results for the graph of Figure 1 with an aspect ratio of 2:1, and the same graph with an aspect ratio of 5:1. In this example, the metrics mostly highlight the same properties for both aspect ratios. The main differences appear on the global aspect ratio (gs_bb_iar) for PFS, FTA and VPSC. As illustrated by Figure 3, the drawback of spreading the embedding along one dimension is accentuated when the width of the nodes increases for PFS and VPSC. Conversely, this phenomenon is attenuated for FTA. However, as we can see in the figure, this is due to the large movement of a group of nodes along the vertical axis on the right part of the embedding, which is not a sign of output quality.
Number of overlaps. Another factor that could limit the results is the number of overlaps of the initial embedding: are some algorithms better suited to obtaining good quality measures for few or for many overlaps? Table 8 shows the results obtained on the graph of Figure 1 with different sizes for the nodes, but with the same aspect ratio. The initial size, 20 × 10, produces 274 overlaps on the initial embedding, while the other one, 40 × 20, produces 1136 overlaps. Here again, the different measures mostly rank the algorithms in the same order for the two graphs. The main differences are on the aspect ratio generated by FTA and VPSC, but not by PFS this time. Indeed, increasing the number of overlaps accentuates the spreading along one dimension for FTA but not for PFS. VPSC shows the same behaviour as when we changed the aspect ratio.

Directions for future work
Further algorithms Our study focuses on algorithms explicitly designed to remove node overlaps of graph embeddings.However, a future direction could be to consider further algorithms dedicated to related problems.For instance, van Garderen et al. [27] propose a heuristic to remove overlaps of geo-referenced rectangles.Their objective is to minimize the displacement of the nodes while preserving the orthogonal ordering.Another example is provided by Nickel et al. [23] in which the authors propose a method to maintain stability among Demers time-varying cartograms, and thus provide a kind of rectangle overlap removal algorithm.Table 8: Values of the selected metrics on the graph of Figure 1 with 274 and 1136 overlaps.
Further criteria In Section 3, we described 22 metrics, classified them into 5 classes according to the properties they aim to capture and selected one of them for each class.Among the 22 metrics, 18 came from the literature on node overlap removal algorithms, the other ones were minor improvements of the previous ones based on some drawbacks highlighted in the discussions.However, we did not propose any radically different metric nor we presented how metrics coming from other applications could be adapted to meet the problem covered here.This could be an interesting future research direction but it is beyond the scope of the present work.For instance, Fadloun et al. [7] proposed criteria to evaluate the quality of 1D node overlap removal algorithms and it would be interesting to investigate how they could be tailored to the 2D case.Sondag et al. [25] proposed a refined metric to quantify the change of the relative positions of rectangles in the context of stable treemap layout algorithms.Another source of inspiration could come from geographic data visualisation.For instance, Guo and Gahegan [10] proposed several approaches to encode spatial proximity between elements and Haunert and Sering [12] quantify local distortions of road networks.
Embedding quality. This study compares algorithms with regard to their ability to preserve some properties of the initial embedding. As a result, we considered only metrics that quantify how well the algorithms preserve the mental map between an initial embedding and the corresponding overlapping-free one. This approach induces a limitation that would be worth investigating in future work, beyond the scope of this paper. Indeed, there are many approaches to measure the different aspects of the quality of an embedding per se (see for instance [24]), and an interesting research direction would be to evaluate how much a node overlap removal algorithm degrades the quality of an initial embedding. Such a study is not trivial, as there are many layout algorithms that aim at optimising different quality criteria, and it would be important to test embedding degradation with regard to these algorithms. For instance, consider a layout algorithm A1 that produces better results on a criterion c than another algorithm A2, but whose embeddings suffer a worse degradation of c when overlaps are removed. If we want to select the best node overlap removal algorithm to preserve c, we need to test the competitors on several initial embeddings of the same graphs resulting from different layout algorithms, and possibly select different node overlap removal algorithms depending on the initial layout technique employed.
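The protocol outlined in this paragraph can be sketched as follows. This is only an illustration under stated assumptions, not an experiment carried out in the paper: the function names `degradation_table` and `best_remover_per_layout` are hypothetical, layout algorithms and node overlap removal algorithms are passed in as plain callables, and the criterion c is assumed to return a number for which higher means better.

```python
from statistics import mean


def degradation_table(graphs, layouts, removers, criterion):
    """Mean change of `criterion` caused by each remover, per layout algorithm.

    layouts:   name -> function(graph) -> initial embedding
    removers:  name -> function(embedding) -> overlap-free embedding
    criterion: function(graph, embedding) -> float (assumed: higher is better)
    All of these are hypothetical stand-ins supplied by the caller.
    """
    table = {}
    for layout_name, layout in layouts.items():
        # Several initial embeddings of the same graphs, one set per layout.
        initial = [layout(g) for g in graphs]
        for remover_name, remove in removers.items():
            table[(layout_name, remover_name)] = mean(
                criterion(g, remove(e)) - criterion(g, e)
                for g, e in zip(graphs, initial)
            )
    return table


def best_remover_per_layout(table):
    """For each layout algorithm, the remover with the least degradation."""
    best = {}
    for (layout_name, remover_name), delta in table.items():
        if layout_name not in best or delta > best[layout_name][1]:
            best[layout_name] = (remover_name, delta)
    return {layout_name: name for layout_name, (name, _) in best.items()}
```

The second function may return a different node overlap removal algorithm for A1 and for A2, which is exactly the situation described above.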

AGORA
All the node overlap removal algorithms, as well as the criteria described in the paper, are available in a JavaScript library 12. The implementations are those used for the experiments described in the previous section.
A Web platform, AGORA 13 (Automatic Graph Overlap Removal Algorithms), is also available online. The user can select one or several of the real-world graphs used for the experiments, or upload their own graphs in the GML format. The graphs must contain the node coordinates, x and y, and the node width and height, w and h. The user can then select one or several node overlap removal algorithms among the nine offered by the interface (the ones used in the experiments of the previous section). Finally, the user can select the criteria among the 22 described above. By default, the five most relevant criteria are selected. Once these parameters have been chosen, the user can generate the overlapping-free embeddings. An embedding is provided for each graph and each algorithm, and it is shown as a thumbnail image. The user can download it as a JSON or a GML file. In this case, the node and edge properties of the original files are preserved in the output file even though they do not appear in the thumbnail images; only the x and y values are changed. The user can also download a thumbnail as an SVG file. Finally, a table with the values of the selected criteria for each embedding is provided.
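To make the input and output conventions concrete, here is a minimal Python sketch. It is not the API of the JavaScript library or of AGORA. It assumes nodes are given as dictionaries carrying x, y, w and h, that (x, y) is the centre of the node rectangle, and it uses uniform scaling (the classical zoom-in mentioned in the introduction) as a stand-in for an overlap removal algorithm. All function names are hypothetical.

```python
from itertools import combinations
from math import inf

EPS = 1e-9  # touching rectangles (zero-area intersection) are not overlaps


def overlaps(a, b):
    """Return True if the rectangles of nodes a and b share a positive area.

    Assumption: (x, y) is the centre of the w-by-h rectangle.
    """
    dx = abs(a["x"] - b["x"])
    dy = abs(a["y"] - b["y"])
    return ((a["w"] + b["w"]) / 2 - dx > EPS
            and (a["h"] + b["h"]) / 2 - dy > EPS)


def count_overlaps(nodes):
    """Number of unordered node pairs whose rectangles overlap."""
    return sum(overlaps(a, b) for a, b in combinations(nodes, 2))


def scaling_factor(nodes):
    """Smallest uniform scale s >= 1 of the centres that removes all overlaps."""
    s = 1.0
    for a, b in combinations(nodes, 2):
        if not overlaps(a, b):
            continue  # scaling by s >= 1 only moves this pair further apart
        dx = abs(a["x"] - b["x"])
        dy = abs(a["y"] - b["y"])
        # Separating the pair along one axis is enough; take the cheaper one.
        sx = (a["w"] + b["w"]) / (2 * dx) if dx > 0 else inf
        sy = (a["h"] + b["h"]) / (2 * dy) if dy > 0 else inf
        if min(sx, sy) == inf:
            raise ValueError("coincident centres cannot be separated by scaling")
        s = max(s, min(sx, sy))
    return s


def remove_overlaps_by_scaling(nodes):
    """Return copies of the nodes in which only x and y are changed."""
    s = scaling_factor(nodes)
    return [{**n, "x": n["x"] * s, "y": n["y"] * s} for n in nodes]


nodes = [
    {"id": 0, "label": "a", "x": 0.0, "y": 0.0, "w": 2.0, "h": 2.0},
    {"id": 1, "label": "b", "x": 1.0, "y": 0.5, "w": 2.0, "h": 2.0},
    {"id": 2, "label": "c", "x": 10.0, "y": 10.0, "w": 2.0, "h": 2.0},
]
adjusted = remove_overlaps_by_scaling(nodes)
print(count_overlaps(nodes), count_overlaps(adjusted))  # prints: 1 0
```

As in the files produced by the platform, every property other than x and y (here `id`, `label`, `w` and `h`) is carried over unchanged. Uniform scaling is used only because it is short to write; as noted in the introduction, it expands the drawing and leaves large empty areas.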

Conclusion
Finding a suitable node overlap removal algorithm is difficult for a visualisation designer because, even though many algorithms exist, no complete comparison based on the same criteria has been provided. In this paper, we first highlighted the five main classes of existing criteria and proposed a selection of one representative criterion for each class. Using a large number of experiments carried out with synthetic and real-world graphs, we compared 9 state-of-the-art algorithms according to both criteria and running time. By analysing the results, we then showed the advantages, disadvantages and limitations of the algorithms, which can be very useful for the designer. Finally, we proposed a JavaScript library containing all the node overlap removal algorithms and criteria, as well as a Web platform, AGORA, that allows end users to upload their own graphs and obtain the embeddings produced by the selected algorithms.

Figure 1: Overlapping-free embeddings obtained after applying the algorithms on the initial embedding (a) of a random graph containing 274 overlaps.

Figure 2: Overlapping-free embeddings obtained after applying the algorithms on the initial embedding (a) of a real-world graph containing 1105 overlaps.

Table 7: Values of the selected metrics on the graph of Figure 1 with two node aspect ratios: 2:1 and 5:1.

Figure 3: Overlapping-free embeddings obtained after applying PFS, FTA and VPSC on a graph with different node aspect ratios.

Table 1: List of metrics classified by the quality criterion they try to capture; selected metrics appear in bold italics. The abbreviations are based on some initials of the names, e.g. sp_bb_a means that the metric is in the class Spread minimisation and uses the embedding Bounding Box to quantify the Area spreading. The Range column contains the set of values that the metric can take. The Tgt column refers to the target value to meet the corresponding criterion.