Graph Neural Networks for Graph Drawing

Graph Drawing techniques have been developed in the last few years with the purpose of producing aesthetically pleasing node-link layouts. Recently, the employment of differentiable loss functions has paved the way to the massive usage of Gradient Descent and related optimization algorithms. In this paper, we propose a novel framework for the development of Graph Neural Drawers (GND), machines that rely on neural computation for constructing efficient and complex maps. GNDs are Graph Neural Networks (GNNs) whose learning process can be driven by any provided loss function, such as the ones commonly employed in Graph Drawing. Moreover, we prove that this mechanism can be guided by loss functions computed by means of Feedforward Neural Networks, on the basis of supervision hints that express beauty properties, like the minimization of crossing edges. In this context, we show that GNNs can nicely be enriched by positional features to deal also with unlabelled vertexes. We provide a proof-of-concept by constructing a loss function for edge crossing, and provide quantitative and qualitative comparisons among different GNN models working under the proposed framework.


I. INTRODUCTION
VISUALIZING complex relations and interaction patterns among entities is a crucial task, given the increasing interest in structured data representations [1]. The Graph Drawing [2] literature aims at developing algorithmic techniques to construct drawings of graphs - i.e., mathematical structures capable of efficiently representing the aforementioned relational concepts with nodes and edges connecting them - for example via the node-link paradigm [3], [4], [5]. The readability of graph layouts can be evaluated following some aesthetic criteria, such as the number of crossing edges, minimum crossing angles, community preservation, edge length variance, etc. [6]. The final goal is to find suitable coordinates for the node positions, and this often requires explicitly expressing and combining these criteria through complicated mathematical formulations [7]. Moreover, effective approaches such as energy-based models [8], [9] or spring-embedders [10], [11] require hands-on expertise and trial-and-error processes to achieve certain desired visual properties. Additionally, such methods define loss or energy functions that must be optimized for each new graph to be drawn, often requiring the adaptation of algorithm-specific parameters. Lately, two interesting directions have emerged in the Graph Drawing community. The former leverages the power of Gradient Descent to explore the manifold given by pre-defined loss functions or combinations of them. Stochastic Gradient Descent (SGD) can be used to move sub-samples of vertex pairs in the direction of the gradient of spring-embedder losses [12], substituting complicated techniques such as Majorization [13]. This approach has been extended to arbitrary optimization goals, or combinations of them, which can be optimized via Gradient Descent as long as the corresponding criterion can be expressed via smooth functions [6]. The latter direction consists in the exploitation of Deep Learning models. Indeed, the flexibility of neural networks and their approximation capability can come in handy also when dealing with the Graph Drawing scenario. For instance, neural networks are capable of learning layout characteristics from plots produced by other graph drawing techniques [14], [15], as well as the underlying distribution of data [16]. Very recently, the node positions produced by graph drawing frameworks [14] have been used as an input to Graph Neural Networks (GNNs) [17], [18] to produce pleasing layouts that minimize combinations of aesthetic losses [19].

M. Tiezzi and M. Gori are with the Department of Information Engineering and Mathematics, University of Siena, 53100 Siena, Italy. M. Gori and G. Ciravegna are with MAASAI, Inria, I3S, CNRS, Université Côte d'Azur, Nice, France. Matteo Tiezzi is the corresponding author (mtiezzi@diism.unisi.it).

Accepted for publication in IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Special Issue on Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications. (DOI: https://doi.org/10.1109/tnnls.2022.3184967) © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
We propose a framework, Graph Neural Drawers (GND), which embraces both the aforementioned directions. We borrow the representational capability and computational efficiency of neural networks to prove that (1) differentiable loss functions guiding the common Graph Drawing pipeline can be provided directly by a neural network, a Neural Aesthete, even when the required aesthetic criteria cannot be directly optimized. In particular, we propose a proof-of-concept where we focus on the criterion of edge crossing, proving that a neural network can learn to identify whether two arcs cross and provide a differentiable loss function towards non-intersection. This simple aesthetic criterion, in fact, cannot otherwise be pursued through direct optimization, because it is non-differentiable. Instead, the Neural Aesthete provides a useful and flexible gradient direction that can be exploited by (Stochastic) Gradient Descent methods. Moreover, (2) we prove that GNNs, even in the non-attributed graph scenario, if enriched with appropriate node positional features, can be used to process the topology of the input graph with the purpose of mapping the obtained node representations into a 2D layout. We compare various commonly used GNN models [20], [21], [22], proving that the proposed framework is flexible enough to give these models the ability to learn a wide variety of solutions. In particular, GND is capable of drawing graphs (i) from supervised coordinates, i.e., emulating Graph Drawing packages, (ii) by minimizing common aesthetic loss functions and, additionally, (iii) by descending along the gradient direction provided by the Neural Aesthete.
The paper is organized as follows. Section II introduces some basics of the Graph Drawing scenario, as well as references on Gradient Descent approaches. Section III introduces the Neural Aesthete and provides a proof-of-concept on the edge-crossing task. Section IV describes the requirements for using GNNs to draw graphs in the non-attributed scenario, as well as the problem definition and the experimental evaluation. Finally, conclusions are drawn in Section V.

II. RELATED WORK
There exists a large variety of methods in the literature to improve graph readability. A straightforward approach, which has proved effective in improving the human understanding of the graph topology, consists in minimizing the number of crossing edges [23]. However, the computational complexity of this problem is NP-hard, and several authors proposed complex solutions and algorithms to address it [24]. In [25], the authors employ an Expectation-Maximization algorithm based on the decision boundary surface built by a Support Vector Machine. The underlying idea is that two edges do not cross if there is a line separating the node coordinates. Further aesthetic metrics have been explored, such as the minimization of node occlusions [26], neighborhood preservation, the maximization of the angular width of crossing edges [27] and many more [28], [6]. Given the graph drawing categorization depicted in surveys [28] (i.e., force-directed, dimension reduction, multi-level techniques), interesting and aesthetically pleasing layouts are produced by methods regarding a graph as a physical system, with forces acting on nodes with attraction and repulsion dynamics up to a stable equilibrium state [9]. Force-directed techniques inspired many subsequent works, from spring-embedders [29] to energy-based approaches [8]. The main idea is to obtain the final layout of the graph by minimizing the Stress function (see Eq. 1). The forces characterizing this formulation can be thought of as springs connecting pairs of nodes. This very popular formulation, exploited for graph layout in the seminal work by Kamada and Kawai [9], was optimized with the localized 2D Newton-Raphson method. Further studies employed various sophisticated optimization techniques, such as the Stress Majorization approach, which produces graph layouts through the iterative resolution of simpler functions, as proposed by Gansner et al. [30]. In this context, some recent contributions highlighted the advantages of using gradient-based methods to solve graph drawing tasks. The SGD method was successfully applied to efficiently minimize the Stress function in Zheng et al. [12], displacing pairs of vertices following the direction of the gradient, computed in closed form. A recent framework, (GD)², leverages Gradient Descent to optimize several readability criteria at once [6], as long as each criterion can be expressed by a smooth function. Indeed, thanks to the powerful auto-differentiation tools available in modern machine learning frameworks [31], several criteria such as ideal edge lengths, Stress Majorization, node occlusion, angular resolution and many others can be easily optimized. We build our first contribution upon these ideas, proving that neural networks can be used to learn decomposed single criteria (i.e., edge crossing), approximating smooth functions with the purpose of providing a useful descent direction to optimize the graph layout.
Deep Learning has been successfully applied to data belonging to non-Euclidean domains, e.g., graphs, in particular thanks to GNNs [32], [18]. The seminal work by Scarselli et al. [17] proposes a model based on an information diffusion process involving the whole graph, fostered by the iterative application of an aggregation function among neighboring nodes up to an equilibrium point. The simplification of this computationally expensive mechanism was the goal of several works, which leverage alternative recurrent neural models [33] or constrained fixed-point formulations. The latter problem was solved via Reinforcement Learning algorithms, as done in Stochastic Steady-state Embedding (SSE) [34], or cast into the Lagrangian framework and optimized by a gradient descent-ascent approach, as in Lagrangian-Propagation GNNs (LP-GNNs) [35], even with the advantage of multiple layers of feature extraction [36]. The iterative nature of the aforementioned models inspired their classification under the umbrella of RecGNNs in recent surveys [18], [37].
In addition to RecGNNs, several other flavours of GNN models have been proposed, such as ConvGNNs [38] or Attentional GNNs [21], [39], [40]. All such models fit into the powerful concept of message exchange, the foundation on which the very general framework of Message Passing Neural Networks (MPNNs) [41], [42] is built.
Recent works analyze the expressive capabilities of GNNs and their aggregation functions, following the seminal work on graph isomorphism by Xu et al. [43]. The model proposed by the authors, the Graph Isomorphism Network (GIN), leverages an injective aggregation function with the same representational power as the Weisfeiler-Leman (WL) test [44]. Subsequent works (sometimes denoted with the term WL-GNNs) try to capture higher-order graph properties [45], [46], [47], [48]. Bearing in mind that we deal with the non-attributed graph scenario, i.e., graphs lacking node features, we point out the importance of the choice of nodal features. Several recent works investigated this problem [49], [50], [51]. We borrow the highly expressive Laplacian eigenvector-based positional features described by Dwivedi et al. [52].
There have been some early attempts at applying Deep Learning models and GNNs to the Graph Drawing scenario. Wang et al. [15] proposed a graph-based LSTM model able to learn and generalize the coordinate patterns produced by other graph drawing techniques. However, this approach is limited by the fact that the model's drawing ability is highly dependent on the training data, such that processing different graph classes or layout styles requires re-collecting data and re-training. We prove that our approach is more general, given that we are able both to learn drawing styles from graph drawing techniques and to draw by minimizing aesthetic losses. Another very recent work, DeepGD [19], consists in a message-passing GNN which processes starting positions produced by graph drawing frameworks [14] to construct pleasing layouts that minimize combinations of aesthetic losses (Stress loss combined with others). Both DeepDrawing and DeepGD share the common need of transforming the graph topology into a more complicated one: DeepDrawing [15] introduces skip connections (fake edges) among nodes in order to process the graph via a bidirectional LSTM; DeepGD converts the input graph into a complete one, so that each node couple is directly connected, and requires explicitly providing the shortest path between each node couple as an edge feature. The introduction of additional edges into the learning problem increases the computational complexity of the problem, hindering the model's ability to scale to bigger graphs. More precisely, in the DeepGD framework the computational complexity grows quadratically in the number of nodes, O(N²). Conversely, we show that GNNs are capable of producing aesthetically pleasing layouts without inserting additional edges, by simply leveraging powerful positional-structural features. Additionally, we introduce a novel neural-based mechanism, the Neural Aesthete, capable of expressing differentiable aesthetic losses and delivering flexible gradient directions also for non-differentiable goals. We show that this mechanism can be exploited by Gradient Descent-based graph drawing techniques and by the proposed GND framework. Finally, GNN-based Encoder-Decoder architectures can learn a generative model of the underlying distribution of data from a collection of graph layout examples [16].

III. DRAWING WITH NEURAL AESTHETES

A. Graph Drawing Algorithms
Graph drawing algorithms typically optimize functions that somehow express a sort of beauty index, leveraging information visualization techniques, graph theory concepts, topology and geometry to derive a graphical visualization of the graph at hand in a bidimensional or tridimensional space [2], [53], [54]. Amongst others, typical beauty indexes are the degree of edge crossings [23], measurements to avoid small angles between adjacent or crossing edges, and measurements expressing a degree of uniform allocation of the vertexes [28], [6]. All these requirements inherently assume that graph drawing only consists in the allocation of the vertexes in the layout space, since the adjacency matrix of the graph can drive the drawing of the arcs as segments. However, we can also choose to link pairs of vertexes through a spline, by involving some associated variables in the optimization process [55].
Without loss of generality, in this work we restrict our objective to the optimization of vertex coordinates, but the basic ideas can be extended also to the case of appropriate arc drawing.
As usual, we denote a graph by G = (V, E), where V = {v_1, ..., v_N} is a finite set of N nodes and E ⊆ V × V collects the arcs connecting them. The neighborhood of node v_i is denoted by N_i. We denote the coordinates of the vertices with the map p : V → R², so that p_i is the position of node v_i in the bi-dimensional space. We denote with P ∈ R^{N×2} the matrix of the node coordinates.
One of the techniques that empirically proved to be very effective for an aesthetically pleasing node coordinates selection is the minimization of the Stress function [9]:

Stress(P) = Σ_{i<j} w_ij (||p_i − p_j|| − d_ij)²,   (1)

where p_i, p_j are the coordinates of vertices i and j, respectively, d_ij is the graph-theoretic distance (or shortest path) between nodes i and j, and w_ij is a weighting factor leveraged to balance the influence of certain pairs given their theoretical distance. Usually, it is defined as w_ij = d_ij^{−α}, with α ∈ {0, 1, 2}. The optimization of this function is generally carried out leveraging complicated resolution methods (i.e., the 2D Newton-Raphson method, Stress Majorization, etc.) that hinder its efficiency.
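As a concrete illustration, the Stress of Eq. 1 can be computed in a few lines. This is a minimal NumPy sketch (the function name and the default choice α = 2 are ours, not prescribed by the text):

```python
import numpy as np

def stress(P, D, alpha=2.0):
    """Stress of a 2D layout P (N x 2), given the matrix D (N x N) of
    graph-theoretic (shortest-path) distances; weights w_ij = d_ij ** -alpha."""
    N = P.shape[0]
    total = 0.0
    for i in range(N):
        for j in range(i + 1, N):
            d = D[i, j]
            w = d ** -alpha
            diff = np.linalg.norm(P[i] - P[j]) - d
            total += w * diff ** 2
    return total
```

A layout whose Euclidean distances exactly match the graph-theoretic ones (e.g., a path graph drawn on a line with unit spacing) attains zero Stress.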
Recently, Gradient Descent methods were employed to produce graph layouts [12] by minimizing the Stress function and, noticeably, Ahmed et al. [6] proposed a similar approach employing auto-differentiation tools. The advantage of this solution is that, as long as aesthetic criteria are characterized by smooth differentiable functions, it is possible to undergo an iterative optimization process following, at each variable update step, the gradient of the criteria.
Clearly, the definition of aesthetic criteria as smooth functions can be hard to achieve. For instance, while we can easily count the number of arc intersections, devising a smooth function that may drive a continuous optimization of this problem is not trivial [25], [6]. Indeed, finding the intersection of the two lines supporting a pair of arcs is as simple as solving the following equation system:

a_1 x + b_1 y = c_1
a_2 x + b_2 y = c_2.   (2)

By employing the classic Cramer's rule, we can see that the system admits a unique solution only in case of a non-zero determinant of the coefficient matrix, a_1 b_2 − a_2 b_1 ≠ 0. Clearly, the previous formula cannot be employed as a loss function in an optimization problem, since it only yields the binary information of whether the two arcs cross and does not provide useful gradients.
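For reference, the exact (and non-differentiable) crossing test that the Neural Aesthete will later approximate can be written with orientation predicates. This sketch is ours and uses cross-product signs rather than explicitly solving the linear system of Eq. 2; both approaches yield the same binary answer for segments in general position:

```python
def segments_cross(p1, p2, q1, q2):
    """True iff segment p1-p2 properly crosses segment q1-q2.
    Each point is an (x, y) pair. The output is binary, which is exactly
    why this test cannot drive a gradient-based optimization."""
    def orient(a, b, c):
        # Sign of the cross product (b - a) x (c - a).
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    d1 = orient(q1, q2, p1)
    d2 = orient(q1, q2, p2)
    d3 = orient(p1, p2, q1)
    d4 = orient(p1, p2, q2)
    # Endpoints of each segment lie on opposite sides of the other segment.
    return (d1 * d2 < 0) and (d3 * d4 < 0)
```

For instance, the two diagonals of the unit square cross, while its top and bottom sides do not.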
To tackle this issue and provide a scoring function optimizable via gradient descent, we propose the Neural Aesthete.

B. The Neural Aesthete
A major contribution of this paper is the introduction of the notion of Neural Aesthete, which is in fact a neural network that learns beauty from examples, with the perspective of generalizing to unseen data. The function modelled by the Neural Aesthete is smooth and differentiable by definition, and offers a fundamental heuristic for graph drawing. As a proof-of-concept, we focus on edge crossing. In this case, we define the Neural Aesthete as a machine which processes two arcs as inputs and returns the information on whether or not they intersect each other. Each arc is identified by the coordinates of the corresponding pair of vertices, e_u = (p_i, p_j) for e_u ∈ E. Hence, the Neural Aesthete ν(·, ·, ·) : E² × R^m → R operates on the concatenation of two arcs, e_u and e_v, and returns

y_{e_u,e_v} = ν(e_u, e_v, θ),   (3)

where θ ∈ R^m is the vector which represents the weights of the neural network. The Neural Aesthete is learned by optimizing a cross-entropy loss function L(y_{e_u,e_v}, ŷ_{e_u,e_v}) over the arc pairs (e_u, e_v) ∈ E × E, which is defined as:

L(y, ŷ) = −ŷ log(y) − (1 − ŷ) log(1 − y),   (4)

where ŷ_{e_u,e_v} is the target:

ŷ_{e_u,e_v} = 1 if e_u and e_v intersect, 0 otherwise,   (5)

and the intersection of e_u, e_v is automatically computed, e.g., by solving Eq. 2.
Notice that the learning process from a finite set of supervised examples yields weights that allow us to estimate the probability of intersection of any two arcs. Basically, the learned output of the neural network can be regarded as a degree of intersection between any arc couple. Once learned, this characteristic of the Neural Aesthete comes in handy for the computation of the gradient of a loss function for Graph Drawing. In general, we want to move the extreme nodes defining the two arcs towards the direction of non-intersection.
Hence, for the Graph Drawing task, the Neural Aesthete is able to process an unseen edge couple (e_u, e_v) randomly picked from the edge list E, and to predict their degree of intersection y_{e_u,e_v}. We define the loss function L(·, ·) on this edge pair as the cross-entropy with respect to the target no-intersection, ŷ_{e_u//e_v} = 0, i.e.,

L(e_u, e_v) = −log(1 − ν(e_u, e_v, θ)).

This smooth and differentiable loss function fosters the utilization of Gradient Descent methods to optimize the problem variables, i.e., the coordinates of the nodes defining the arcs (e_u, e_v).
This same procedure can be replicated over all the graph edges. Overall, a possible graph drawing scheme is the one which returns

P* = arg min_P Σ_{(e_u,e_v) ∈ E×E} L(e_u, e_v).

This can be carried out by classic optimization methods. For instance, a viable solution is gradient descent, as follows:

P ← P − η ∇_P L(e_u, e_v),

where η specifies the learning rate.
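The resulting drawing scheme can be sketched as follows, assuming a pre-trained Neural Aesthete is available. Here `loss_grad` is a hypothetical callable returning the gradient of the no-intersection loss with respect to the coordinate matrix P (in practice it would be obtained by auto-differentiation through the network); the loop structure, names and defaults are ours:

```python
import numpy as np
from itertools import combinations

def draw_by_descent(P, edges, loss_grad, eta=0.05, steps=50):
    """Gradient-descent drawing loop: for every pair of arcs, move the node
    coordinate matrix P (N x 2) against the gradient of the Neural Aesthete
    loss L = -log(1 - nu(e_u, e_v, theta))."""
    for _ in range(steps):
        for e_u, e_v in combinations(edges, 2):
            P = P - eta * loss_grad(P, e_u, e_v)
    return P
```

The paper actually samples mini-batches of arc couples rather than sweeping all pairs; the full sweep shown here is just the simplest deterministic variant.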
It is worth mentioning that, overall, this approach leverages the computational efficiency and parallelization capabilities of neural networks. Hence, the prediction of the edge-crossing degree can be carried out for many edge couples in parallel. Moreover, this same approach can be conveniently combined with other aesthetic criteria, for instance coming from other Neural Aesthetes or from classical loss functions (e.g., Stress). For example, we could consider a composite objective that adds to the edge-crossing loss the terms A(P) and B(P), where A(·) and B(·) denote other aesthetic criteria characterized by smooth differentiable functions.

C. Example: Neural Aesthete on small-sized Random Graphs
We provide a qualitative proof-of-concept example of the aforementioned Neural Aesthete for edge crossing in Figure 1.
We built an artificial dataset composed of 100K entries to train the Neural Aesthete. Each entry of the dataset is formed by an input-target couple (x, ŷ). The input pattern x corresponds to the positions of the two input arcs of the Neural Aesthete, as defined in Section III, whose node coordinates are randomly picked inside the interval [0, 1]. The corresponding target ŷ is defined as in Eq. 5.
We balanced the dataset composition in order to have a comparable number of samples between the two classes (cross/no-cross). We trained a Neural Aesthete implemented as a Multi-Layer Perceptron (MLP) with two hidden layers of 100 nodes each and ReLU activation functions, minimizing the cross-entropy loss function with respect to the targets and leveraging the Adam optimizer [56]. We tested the generalization capabilities of the learned model on a test dataset composed of 50K entries, achieving a test accuracy of 97%. Hence, the learned model constitutes the Neural Aesthete for the task of edge crossing. Given an unseen input composed of a couple of arcs, the learned model outputs a probability representing a degree of intersection. Following the common pipeline of Graph Drawing methods with Gradient Descent, the Neural Aesthete output represents a differentiable function that provides an admissible descent direction for the problem parameters P.
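The architecture just described (8 input coordinates, two hidden ReLU layers of 100 units, sigmoid output) can be sketched as follows. The forward pass and initialization scheme are ours, and the weights here are randomly initialized for illustration only; the trained parameters reaching 97% accuracy are of course not reproduced:

```python
import numpy as np

def neural_aesthete_forward(x, params):
    """Forward pass of an MLP matching the described Neural Aesthete:
    input x (batch, 8) is the concatenation of two arcs (four 2D points),
    two hidden ReLU layers of 100 units, sigmoid output = degree of
    intersection in (0, 1)."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(0.0, x @ W1 + b1)
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    logit = h2 @ W3 + b3
    return 1.0 / (1.0 + np.exp(-logit))

def init_params(rng, din=8, h=100):
    # He-style initialization (an assumption; the paper does not specify it).
    s = lambda m, n: rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
    return (s(din, h), np.zeros(h), s(h, h), np.zeros(h), s(h, 1), np.zeros(1))
```

Training would then minimize the cross-entropy of Eq. 4 with Adam, as stated in the text.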
To test the capability of the proposed solution, we leverage an artificial dataset of random graphs with a limited number of nodes (N ∈ [20, 40]). We generated Erdős-Rényi graphs with the method presented in [57] for efficiently creating sparse random networks, implemented in NetworkX [58]. We selected the largest connected component of each generated graph.
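The graph generation step can be reproduced along these lines. The edge probability `p` is our assumption (the text only fixes N ∈ [20, 40]), and we assume the sparse-sampling method of [57] corresponds to NetworkX's `fast_gnp_random_graph`:

```python
import networkx as nx

def sample_graph(n=30, p=0.1, seed=0):
    """Erdos-Renyi graph via the sparse sampler, keeping the largest
    connected component as described in the text."""
    G = nx.fast_gnp_random_graph(n, p, seed=seed)
    largest = max(nx.connected_components(G), key=len)
    return G.subgraph(largest).copy()
```

The returned subgraph is connected by construction, which avoids drawing disjoint fragments.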
Figure 1 reports a qualitative example of the proposed method on three graphs from the aforementioned dataset. To generate the graph layout, we carried out an optimization process on mini-batches of 10 arc couples for 2K iterations (gradient steps). The first column depicts the starting random positions of the nodes; the second column reports the graph layout obtained with an in-house implementation of the Stress function (see Eq. 1), optimized via Gradient Descent as done in [6]; the third column contains the results obtained by optimizing the loss provided by the proposed Neural Aesthete for edge crossing; the fourth column reports the layouts obtained by alternating the optimization of the Stress function and of the edge-crossing loss in subsequent update steps. It is remarkable that the solution provided by our approach is capable of avoiding any arc intersection in these simple graphs. Moreover, the fact that the Neural Aesthete output represents a form of degree-of-intersection seems to provide a good gradient direction that easily moves the arcs into a recognizable angle pattern, even when combined with other criteria (fourth column). The proposed proof-of-concept proves that Neural Aesthetes represent a feasible, general and efficient solution for Graph Drawing. In the following, we prove that this same approach can be used to guide the training process of different kinds of Deep Neural Models.

IV. GNNS FOR GRAPH DRAWING
The increasing adoption of GNNs in several research fields and practical tasks [59], [60] opens the road to an even wider spectrum of problems. Clearly, Graph Drawing and GNNs seem inherently linked, even if the formalization of this learning process under the GNN framework is not trivial. As pointed out in Section II, some recent works leveraged GNN-inspired models for Graph Drawing. DeepDrawing [15] employs a graph-based LSTM model to learn node layout styles from Graph Drawing frameworks. DeepGD [19] is a concurrent work in which an MPNN processes starting node positions to develop pleasing layouts that minimize combinations of aesthetic losses (Stress loss combined with others). Starting node positions, however, need to be initialized by standard graph drawing frameworks [14]; in case a random initialization is employed, network performance deteriorates.
One of the drawbacks of both these approaches is the fact that they modify the graph topology, introducing additional connections that were not present in the original graph. This fact entails an increased computational burden for the model. Indeed, a complete graph requires many more message exchanges than a sparse one, since the computational complexity of the GNN propagation is linear in the number of edges [18]. Moreover, the complete graph processed by DeepGD is enriched with edge features consisting of the shortest path between the nodes connected by each edge. This solution yields clear advantages in tasks closely connected with Stress minimization, but could prevent the network from generalizing to other tasks.
We propose an approach to Graph Drawing, GND, that leverages the computational efficiency of GNNs and, thanks to informative nodal features (Laplacian eigenvectors, see Sec. IV-B), is general enough to be applied to several learning tasks.

A. Graph Neural Networks
First and foremost, let us introduce some notation. We denote with l_i all the input information (initial set of features) possibly attached to each node i in a graph G. The same holds for an arc connecting two nodes i and j, whose feature, if available, is denoted with l_(i,j). Each node i has an associated hidden representation (or state) x_i ∈ R^s, which in recent models is initialized with the initial features, x_i = l_i (but this is not necessarily the case in RecGNN models [36]). Many GNN models can be efficiently described under the powerful umbrella of Message Passing Neural Networks (MPNNs) [41], where the node state x_i is iteratively updated at each iteration t through an aggregation of the information exchanged among neighboring nodes N_i, undergoing a message passing process. Formally, x_(i,j) explicitly represents the message exchanged by two nodes, computed by a learnable map MSG^(t)(·). Afterwards, AGG^(t)(·) aggregates the incoming messages from the neighborhood, possibly processing also local node information such as the node hidden state x_i and its features l_i. The messaging and aggregation functions MSG^(t)(·), AGG^(t)(·) are typically implemented via Multi-Layer Perceptrons (MLPs) learned from data. Apart from RecGNNs, other GNN models leverage a different set of learnable parameters for each iteration step. Hence, the propagation process of such models can be described as the outcome of a multi-layer model in which, for example, the node hidden representation at layer t, x_i^(t), is provided as input to the next layer, t + 1.
Therefore, an ℓ-step message passing scheme can be seen as an ℓ-layered model. This convenient framework is capable of describing several GNN models [18]. In this work, we focus our analysis on three commonly used GNN models from the literature (i.e., GCN [20], GAT [21], GIN [43]), whose implementation is given in Table I, characterized by different kinds of aggregation mechanisms (degree-norm, attention-based, injective/sum, respectively). Following the Table notation, in GCN c_{u,v} denotes a normalization constant depending on node degrees; in GAT, α is a learned attention coefficient which introduces anisotropy in the neighbor aggregation, σ denotes a nonlinearity, and W, W_0, W_1 are learnable weight matrices; in GIN, ε is a learnable parameter (which is usually set to zero).
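As an illustration of one of the aggregation mechanisms of Table I, a single degree-normalized GCN layer can be sketched as follows. This is a simplified NumPy version written by us (the actual models are trained end-to-end in a deep learning framework):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN propagation step: each node aggregates neighbor states with
    the normalization c_{u,v} = 1 / sqrt(deg(u) * deg(v)), computed on the
    graph with added self-loops, followed by a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)
    D_is = np.diag(1.0 / np.sqrt(d))      # D^{-1/2}
    return np.maximum(0.0, D_is @ A_hat @ D_is @ X @ W)
```

Stacking ℓ such layers (with distinct weight matrices W) realizes the ℓ-step message passing scheme described above.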

B. Problem Formulation
Through the GND framework we propose to employ the representational and generalization capabilities of GNNs to learn to generate graph layouts. We formulate the problem as a node-focused regression task, in which for each vertex belonging to the input graph we want to infer its coordinates in a bi-dimensional plane, conditioned on the graph topology and on the target layout/loss function (see Section IV-C). Furthermore, in the GND framework, we propose to employ GNNs to learn to draw by themselves, following the guidelines prescribed by Neural Aesthetes (see Section IV-F). In order to properly solve the Graph Drawing task via GNDs, a crucial role is played by the expressive power of the GNN model and by the nodal features which are used. In fact, in line with the aforementioned regression task, each node state must be uniquely identified in order to be afterwards mapped to a different 2D position in the graph layout. This problem is inherently connected with recent studies on the representational capabilities of GNNs (see Section II and [62]). Standard MP-GNNs have been proved to be less powerful than the 1-WL test [45], both due to the lack of expressive power of the used aggregation mechanisms and to the existence of symmetries inside the graph. For instance, locally isomorphic neighborhoods create indiscernible unfoldings of the GNN computational structure. Hence, the GNN embeds isomorphic nodes to the same point in the high-dimensional space of the states, hindering the Graph Drawing task. Some approaches address this problem by proposing novel and more powerful architectures (WL-GNNs) that, however, tend to penalise the computational efficiency of GNNs [45]. Moreover, given the fact that we focus on the task of drawing non-attributed graphs, it is even more important to enrich the nodes with powerful features able both to identify the position of nodes inside the graph (often referred to as Positional Encodings (PEs) [52]) and to describe the neighboring structure.
Recently, it has been shown that the usage of random nodal features theoretically strengthens the representational capability of GNNs [63], [64]. Indeed, setting random initial node embeddings (i.e., different random values when processing the same input graph) enables GNNs to better distinguish local substructures, to learn distributed randomized algorithms and to solve matching problems with nearly optimal approximation ratios. Formally, the node features can be considered as random variables sampled from a probability distribution µ with support D ⊆ R^s, where µ can be instantiated as the uniform distribution.
The main intuition is that the underlying message passing process combines such high-dimensional and discriminative nodal features, fostering the detection of fixed substructures inside the graph [63]. These approaches, which hereinafter we refer to as rGNNs, proved that classification tasks can be tackled in a novel way, with a paradigm shift from the importance of task-relevant information (the feature values) to the relevance of the relationships among node values. However, the peculiar regression task addressed in this work requires both positional and structural knowledge, which is essential to identify and distinguish neighboring nodes.
To address this issue, we keep standard GNN architectures and leverage positional features defined as the Laplacian eigenvectors [65] of the input graph, as recently introduced in GNNs [52]. Laplacian eigenvectors embed the graph into the Euclidean space through a spectral technique; they are unique and distance-preserving (far-away nodes on the graph have large PE distance). Indeed, they can be considered hybrid positional-structural encodings, as they both define a local coordinate system and preserve the global graph structure.
Formally, they are defined via the factorization of the graph Laplacian matrix:

\Delta = I - D^{-1/2} A D^{-1/2} = U^T \Lambda U,

where I is the N × N identity matrix, D is the node degree matrix, A is the adjacency matrix, and Λ and U correspond respectively to the eigenvalues and eigenvectors. As proposed in [52], we use the k smallest non-trivial eigenvectors to generate a k-dimensional feature vector for each node, where k is chosen by grid search. Noticeably, given that the smallest eigenvectors provide smooth encoding coordinates of neighboring nodes, during the message exchange each node receives an implicit feedback on its own positional-structural characteristics from all the nodes with which it is communicating. This process fosters the regression task on the node coordinates, which receives useful information from the respective neighborhoods. We believe that this is a crucial component of the model pipeline.
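As a concrete illustration, the encoding above can be sketched in a few lines of NumPy. This is a simplified sketch, not the implementation of [52]: the guard for isolated nodes and the function name are our own assumptions, and the arbitrary eigenvector sign discussed later in the text is left unhandled.

```python
import numpy as np

def laplacian_pe(A, k):
    """k-dimensional Laplacian positional encodings for an adjacency
    matrix A (N x N): eigenvectors of I - D^{-1/2} A D^{-1/2}
    associated with the k smallest non-trivial eigenvalues."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5  # guard against isolated nodes
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]        # skip the trivial (constant) one
```

Each row of the returned matrix is the k-dimensional PE attached to the corresponding node before message passing.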

C. Experimental setup
We test the capabilities of the proposed framework by comparing the performances of three commonly used GNN models (see Table I). In the following, we describe the learning tasks and the datasets employed for testing the different models. In Sections IV-D, IV-E and IV-F, instead, we will give qualitative and quantitative evaluations for each learning problem, showing the generality of our approach.
Given that the outputs of GND are node coordinates, we can impose on such predictions heterogeneous loss functions that can be optimized via BackPropagation. In the proposed experiments, we test the GND performances on the following loss functions: (i) the distance with respect to ground-truth node coordinates belonging to certain layouts, produced by Graph Drawing packages (Section IV-D); (ii) aesthetic loss functions (e.g., Stress) (Section IV-E); (iii) loss functions provided by Neural Aesthetes (Section IV-F). We assume to work with the graph topology only; hence, the nodes are not characterized by additional features.
We employed two different graph drawing datasets with different peculiarities. We chose to address small-size graphs (≤ 100 nodes) to ensure the readability of the graph layouts, since prior works highlighted that node-link layouts are more suitable for small-size graphs [15], [66]. The former is the ROME dataset, a Graph Drawing benchmarking dataset containing 11534 undirected graphs with heterogeneous structures and connection patterns. We preprocessed the dataset by removing three disconnected graphs. Each graph contains between 10 and 100 nodes. Some samples of the dataset are reported in the first column of Figures 3 and 4, drawn with different layouts (see the following).
We built a second dataset, which we refer to as SPARSE, with the same technique described in Section III-C. We generated 10K Erdős-Rényi graphs following the method presented in [57] for efficient sparse random networks, as implemented in NetworkX [58]. We randomly picked the probability of edge creation in the interval (0.01, 0.05) and the number of nodes from 20 to 100. To improve sparsity and readability, we discarded all the created graphs having both more than 60 nodes and more than 120 edges. Afterwards, we selected the largest connected component (maximum number of nodes) of each generated graph. We report in Figure 2 a visual description of the dataset composition.
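The sampling recipe above can be sketched as follows. This is an illustrative sketch with NetworkX; the function name and the rejection-style handling of the size filter are our own assumptions, not the paper's exact generation script.

```python
import random
import networkx as nx

def sample_sparse_graph(rng=random):
    """Draw one SPARSE-style graph: Erdos-Renyi with edge probability
    in (0.01, 0.05) and 20-100 nodes, discarded when it has both more
    than 60 nodes and more than 120 edges, then reduced to its largest
    connected component. Returns None when the filter rejects the draw."""
    n = rng.randint(20, 100)
    p = rng.uniform(0.01, 0.05)
    G = nx.fast_gnp_random_graph(n, p)  # efficient sparse sampler [57]
    if G.number_of_nodes() > 60 and G.number_of_edges() > 120:
        return None
    largest = max(nx.connected_components(G), key=len)
    return G.subgraph(largest).copy()
```

Repeating the draw until 10K graphs are accepted reproduces the described composition.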
In order to carry out the training process and afterwards evaluate the obtained performances, we split each of the datasets into three sets (i.e., training, validation, test) with a ratio of (75%, 10%, 15%).

D. GNNs learn to draw from ground-truth examples
The first experimental goal is focused on the task of learning to draw graph layouts given ground-truth node positions produced by Graph Drawing frameworks. Among several packages, we chose NetworkX [58] for its completeness and ease of integration with other development tools. This framework provides several utilities to plot graph appearances. We chose two different classical layouts. The first is the KAMADA-KAWAI node layout [9], computed by optimizing the Stress function. In a few words, this force-directed method models the layout dynamics as springs between all pairs of vertices, with an ideal length equal to their graph-theoretic distance. The latter is the SPECTRAL layout, which leverages the unnormalized Laplacian L and its eigenvalues to build cartesian coordinates for the nodes [67]. Formally:

L = D - A = \hat{U}^T \hat{\Lambda} \hat{U},

where Λ̂ and Û correspond to the eigenvalues and eigenvectors, respectively, and the first two non-trivial eigenvectors (k = 2) are used as the actual node coordinates. We remark that Eq. 13 and 14 produce different outputs. This layout tends to highlight clusters of nodes in the graph.
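The SPECTRAL coordinates can be sketched directly from the eigendecomposition of the unnormalized Laplacian. This is a minimal sketch: NetworkX's own `spectral_layout` may additionally rescale the coordinates, which we omit here.

```python
import numpy as np

def spectral_coords(A):
    """2D SPECTRAL layout from adjacency matrix A: the two smallest
    non-trivial eigenvectors of the unnormalized Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)  # eigenvalues ascending; U[:, 0] is trivial
    return U[:, 1:3]          # one (x, y) pair per node
```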
Each training graph is enriched by Positional Encodings defined as k-dimensional Laplacian Eigenvectors (see Section IV-B) and is processed by each of the tested GNN models to predict the node coordinates. Hence, we need a loss function capable of discerning whether the generated layout is similar to the corresponding ground truth. Furthermore, trained models should generalize the notion of graph layout beyond a simple one-to-one mapping. For these reasons, we leverage the Procrustes Statistic [15] as a loss function, since it measures the shape difference among graph layouts independently of affine transformations such as translations, rotations and scalings. Given a graph composed of N nodes, the predicted node coordinates P = (p_1, ..., p_N) and the ground-truth positions P̂ = (p̂_1, ..., p̂_N), the Procrustes Statistic similarity is defined as the squared sum of the distances between P and P̂ after a series of possible affine transformations [15]. Formally:

R^2 = 1 - \frac{\left[\mathrm{Tr}\big((P^T \hat{P} \hat{P}^T P)^{1/2}\big)\right]^2}{\mathrm{Tr}(P^T P)\,\mathrm{Tr}(\hat{P}^T \hat{P})}   (15)

where Tr(·) denotes the trace operator and the obtained metric R^2 assumes values in the interval [0, 1], the lower the better. We will use the Procrustes Statistic-based similarity both as the loss function to guide the model training and to evaluate the generalization capability on the test set.
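Eq. 15 can be sketched as follows. Two assumptions of the sketch: both layouts are mean-centered first (making the statistic translation-invariant), and the trace of the matrix square root is computed through the equivalent nuclear norm of P^T P̂ instead of an explicit matrix square root.

```python
import numpy as np

def procrustes_statistic(P, P_hat):
    """Procrustes Statistic R^2 in [0, 1] between two N x 2 layouts
    (lower is better); invariant to translation, rotation and scaling."""
    P = P - P.mean(axis=0)
    P_hat = P_hat - P_hat.mean(axis=0)
    # Tr((P^T Phat Phat^T P)^{1/2}) = sum of singular values of P^T Phat
    nuclear = np.linalg.svd(P.T @ P_hat, compute_uv=False).sum()
    return 1.0 - nuclear ** 2 / (np.trace(P.T @ P) * np.trace(P_hat.T @ P_hat))
```

A layout compared against a rotated, scaled and translated copy of itself scores (numerically) zero, which is exactly the invariance the loss needs.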
We tested the proposed framework by comparing the test performances obtained by the three different GNN models described in Table I: GCN, GAT, GIN. All models are characterized by the ReLU non-linearity. The GAT model is composed of four attention heads. The ε variable in the GIN aggregation process is set to 0, as suggested in [22]. We leverage the PyTorch implementation of the models provided by the Deep Graph Library (DGL). 11
We searched for the best hyper-parameters by selecting the models with the lowest validation error obtained during training, in the following grid of values: size of the node hidden states x_i in {10, 25, 50}; learning rate η in {10^-4, 10^-3, 10^-2}; number of GNN layers in {2, 3, 5}; PE dimension k in {5, 8} (20 is added to the grid in the case of the SPARSE dataset, given its greater lower bound on the number of nodes); drop-out rate in {0.0, 0.1}. We considered 100 epochs of training with an early stopping strategy given by a patience of 20 epochs on the validation loss. For each epoch, we sampled non-overlapping mini-batches composed of β graphs, until all the training data were considered. We searched for the best mini-batch size β in {32, 64, 128}. We devised several competitors in order to assess the performances of the proposed approach. Given that the Laplacian PEs available at node level are powerful descriptors of the neighboring graph structure, we leverage a Multilayer Perceptron (MLP) as a baseline. This neural predictor learns a mapping to the node coordinates, solely exploiting the available local information. We compare the performances obtained by GNNs with Laplacian PEs against those achieved by the three corresponding variants of rGNNs, which we denote with rGCN, rGAT, rGIN. For a fair comparison, we searched the same hyper-parameter space for the baseline and all the competitors. In Figures 3 and 4, we report a qualitative evaluation obtained by the best performing models for each different GNN architecture on three randomly picked graphs from the test set of each dataset. Figure 3 shows the aforementioned evaluation in the case of the KAMADA-KAWAI layout supervision, both for the ROME dataset (first four columns, where the first one depicts the Ground Truth (GT)) and for the SPARSE dataset. Figure 4 shows the same analysis in the case of the SPECTRAL layouts. The results show the good performances of the GND framework in generating two heterogeneous styles of graph layouts, learning from different ground-truth node coordinates.
11 https://www.dgl.ai/

In order to give a more comprehensive analysis, we report in Table II a quantitative comparison among the global Procrustes Statistic similarity values obtained on the test set by the best models, for both datasets. We report the average score and its standard deviation over three runs with different seeds for the random number generator of the weights.
The strength of the Laplacian PE is validated by the decent performances yielded by the MLP baseline. Conversely, the random features characterizing the rGNNs are not sufficient to solve this node regression task. Some additional structural information is required in order to jointly represent the node position and its surroundings. Indeed, all the models exploiting the proposed solution outperform the competitors. The improved performances with respect to the MLP are due to the fact that node states receive an implicit feedback on their own position during the message passing steps. The proposed GAT model with Laplacian PE achieves the best performances in all the settings. We believe that the attention mechanism plays a crucial role in the task of distinguishing the right propagation patterns, alongside the fact that the multi-head attention mechanism provides a larger number of learnable parameters with respect to the competitors.
In general, the SPECTRAL layout is easier to be learned by the models. This can be due to the fact that the Laplacian PEs represent an optimal feature for this task, given the common spectral approach. Even from a qualitative perspective, the generated layouts are almost identical to the Ground Truth. Vice versa, the KAMADA-KAWAI layout represents a harder task to be learned from ground-truth positions, especially in the case of the SPARSE dataset. As pointed out in [52], Laplacian-based PEs still have some limitations given by natural symmetries, such as the arbitrary sign of the eigenvectors, that several recent works are trying to solve [49].

E. GNNs learn to draw minimizing aesthetic loss functions
In Section IV-D, GNDs explicitly minimize the distances with respect to certain ground-truth node positions, hence learning to draw directly from data according to certain layouts. In this second experimental setting, instead, we want to build GNNs capable of drawing at inference time while respecting certain aesthetic criteria which are implicitly learnt during training. We defined our framework in such a way that powerful PE features are mapped to 2D coordinates. Given a smooth and differentiable loss function defined on such outputs, we can leverage the BP algorithm in order to learn to minimize heterogeneous criteria. We investigate the case in which the GNN models minimize the Stress function (see Eq. 1) on the predicted node coordinates. Only during the training phase, for each graph, we compute the shortest path d_ij between every node couple (i, j). At inference time, the GND framework processes the graph topology (the adjacency matrix) and the node features, directly predicting the node coordinates, without the need of any further information.
We use the same experimental setup, competitors and hyper-parameter selection grids of Section IV-D. However, since a preliminary run of the models achieved poor performances, we varied the hidden state dimension grid to {100, 200, 300}. This means that this task needs a bigger representational capability with respect to the previous one, which is coherent with the complex implicit nature of the learning problem. We set the Stress normalization factor to w_ij = 1/d_ij (hence, α = 1) and compute the averaged Stress function, i.e., the stress divided by the number D of considered node couples. For this experiment, we use the stress value obtained on the validation split as the metric to select the best performing model. For comparison, we report the stress loss values obtained by three state-of-the-art Graph Drawing methods. Neato (available through Graphviz, https://graphviz.com) leverages the stress majorization [30] algorithm to effectively minimize the stress. PivotMDS [14] is a deterministic dimension-reduction-based approach. Finally, ForceAtlas2 [8] generates graph layouts through a force-directed method. We report in Figures 5 and 6 some qualitative examples of the graph layouts produced by the best selected GNN models on test samples (the same graphs selected for Figure 3) of the two datasets, following the aforementioned setting. Noticeably, all three models succeed in producing a layout that adheres to the typical characteristics of graphs obtained via Stress minimization. In particular, for reference on the drawing style, the layouts of these same graphs generated via the Kamada-Kawai algorithm (which also minimizes stress) are depicted in the first and fifth columns of Figure 3, for the ROME and SPARSE datasets respectively. Comparing the graph layouts produced by the various GNN models with the aforementioned ones from Kamada-Kawai, also in this case it is easy to see from a qualitative analysis that the GAT model is the best performing one. The peculiar characteristics of the SPARSE dataset (sparse connection patterns, causing many symmetries and isomorphic nodes) make it hard to minimize the stress loss function in some of the reported examples.
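The averaged Stress objective minimized in this setting can be sketched as follows. This is a NumPy sketch rather than the differentiable PyTorch version used during training; the weighting w_ij = d_ij^{-alpha} follows the text, with α = 1 in the experiments.

```python
import numpy as np

def averaged_stress(P, d, alpha=1.0):
    """Averaged Stress of a layout P (N x 2) w.r.t. the matrix d of
    pairwise graph-theoretic distances: mean over the node couples of
    w_ij * (||p_i - p_j|| - d_ij)^2, with w_ij = d_ij^{-alpha}."""
    i, j = np.triu_indices(P.shape[0], k=1)
    eucl = np.linalg.norm(P[i] - P[j], axis=1)
    return float(np.mean(d[i, j] ** (-alpha) * (eucl - d[i, j]) ** 2))
```

A layout realizing every graph-theoretic distance exactly scores zero stress, which is the ideal the models are trained towards.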
A quantitative comparison is reported in Table III, with the stress values obtained by the best models for each competitor and dataset, both at training time and test time, averaged over three runs initialized with different seeds. Once again, GAT performs the best. The metrics obtained by the GIN model highlight an overfitting of the training data, given the selected grid parameters. GND models obtain better stress values than all the SOTA Graph Drawing packages; among the latter, Neato is the best performing one in terms of stress minimization, as expected. Similar conclusions with respect to the previous experiment can be drawn regarding the results obtained by rGNNs and the MLP. Indeed, these results show how learning to minimize stress requires both positional and structural knowledge, and that the message passing process fosters the discriminative capability of the learned node states, with respect to solely exploiting local information.
Summing up, the experimental campaign showed the generalization capabilities of the proposed framework even in the task of minimizing common aesthetic criteria imposed on the node-wise predictions of the GNNs, such as the stress function, on unseen graphs. The GND framework is capable of predicting node positions on unseen graphs respecting typical stress-minimization layouts, without being provided with any explicit graph-theoretic/shortest-path information at inference time.

F. GNNs learn to draw from Neural Aesthetes
In the previous Section, we showed that GNDs are capable of learning to minimize a differentiable smooth function that implicitly guides the positioning of the node coordinates. In a similar way, the Neural Aesthetes presented in Section III provide a smooth differentiable function that can be leveraged to find a good gradient descent direction for the learning parameters. In this Section, we mix the two proposals in order to build a Graph Neural Drawer that learns to generate graph layouts thanks to the gradients provided by the edge-crossing Neural Aesthete and, eventually, to optimize the combination of several aesthetic losses.
At each learning epoch, GND minimizes the loss function H(P) defined in Eq. 7 over the whole edge list E. The loss function is computed as follows: the GNN model processes the graph and predicts node-wise coordinates. Given such predicted node positions and the input graph adjacency matrix, the Neural Aesthete (which was trained beforehand as explained in Section III-C) processes couples of arcs and outputs their degree of intersection. The overall loss function can then be composed by the contribution given by each of the considered arc couples, as in Eq. 6.

Fig. 6. STRESS MINIMIZATION ON SPARSE. Same setting as Figure 5. Columns: SPARSE test graphs drawn by GCN, GAT, GIN.
We restrict our analysis to the Rome dataset, exploiting a GAT model with 2 hidden layers, a hidden size of the node state of 25, PE dimension k = 10, and learning rate η = 10^-2. We compare the graph layouts generated by this model on three randomly picked test graphs, under three different loss function definitions: (i) the stress loss; (ii) the Neural Aesthete edge-crossing based loss H(P); (iii) a combination of the two losses with a weighing factor λ = 0.5 acting on the Neural Aesthete loss, in particular: LOSS(P) = STRESS(P) + λH(P).
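The combined objective can be sketched as follows. This is illustrative only: `aesthete` stands in for the pre-trained Neural Aesthete, here any callable mapping the concatenated endpoint coordinates of two arcs to a degree-of-intersection in [0, 1]; that interface is our assumption, not the paper's exact one.

```python
import numpy as np

def combined_loss(P, d, arc_couples, aesthete, lam=0.5, alpha=1.0):
    """LOSS(P) = STRESS(P) + lambda * H(P). P is the N x 2 layout, d the
    pairwise graph-theoretic distances, arc_couples a list of edge pairs
    ((a, b), (c, e)), and aesthete the surrogate crossing predictor."""
    i, j = np.triu_indices(P.shape[0], k=1)
    eucl = np.linalg.norm(P[i] - P[j], axis=1)
    stress = np.mean(d[i, j] ** (-alpha) * (eucl - d[i, j]) ** 2)
    # H(P): average predicted degree-of-intersection over the arc couples
    H = np.mean([aesthete(np.concatenate([P[a], P[b], P[c], P[e]]))
                 for (a, b), (c, e) in arc_couples])
    return float(stress + lam * H)
```

In training, the same expression is evaluated on the GNN's predicted coordinates and back-propagated through both terms.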
We report in Figure 7 some qualitative results on three test graphs (one for each row). We compare the layouts obtained by optimizing the stress function (first column, see Section IV-E), the edge-crossing Neural Aesthete (second column) and the combination of the two losses (third column).
The styles of the generated layouts are recognizable with respect to the plain optimization of the Neural Aesthete with Gradient Descent (see Figure 1), meaning that the GND framework is able to fit the loss provided by the Neural Aesthete and to generalize it to unseen graphs. Noticeably, the introduction of the combined loss function (third column in Figure 7) helps in better differentiating the nodes in the graph with respect to the case of solely optimizing stress. The Neural Aesthete-guided layouts (second and third columns) tend to avoid edge intersections, as expected. This opens the road to further studies in this direction, leveraging the generality of the Neural Aesthetes approach and the representation capability of GNNs.

G. Computational Complexity
The proposed framework leverages the same computational structure of the underlying GNN model, which we can generally describe, for each parameter update, as linear with respect to the number of edges, O(T(|V| + |E|)), where T is the number of iterations/layers, |V| the number of nodes and |E| the number of edges. With our approach, there is no increase in the computation related to the graph topology or the edge connection patterns. At inference time, the only additional requirement is the computation of the Laplacian PEs, requiring O(|E|^{3/2}), with |E| being the number of edges, which however can be improved with the Nyström method [68], [52].

H. Scaling to bigger graphs
Common Graph Drawing techniques based on multidimensional scaling [7] or SGD [12] require ad-hoc iterative optimization processes for each graph to be drawn. Additionally, dealing with large-scale graphs (both in terms of the number of nodes and of the number of involved edges) decreases the time efficiency of these approaches. Conversely, once a GND has been learned, the graph layout generation consists solely in the extraction of the Laplacian PEs followed by a forward pass on the chosen GNN backbone. In this Section, we prove the ability of GND to scale to real-world graphs, providing quantitative results in terms of computational times and a qualitative analysis of the obtained graph layouts, with respect to SOTA Graph Drawing techniques. We employed the best performing GAT model trained to minimize the stress loss on the Rome dataset (Section IV-E). We test the model inference performances on bigger-scale graphs from the SuiteSparse Matrix Collection. We report in Figure 8 the computational times required by the different techniques to generate graph layouts of different scales, from the dwt_n graph family. We analyze both the correlation with the graph order (left, varying number of nodes) and with the graph size (right, varying number of edges). We compare the GND execution times against those of the NetworkX-GraphViz implementations of neato and sfdp, the latter being a multilevel force-directed algorithm that efficiently lays out large graphs. We also tested the Fruchterman-Reingold force-directed algorithm implemented in NetworkX (denoted with FR) and the PivotMDS implementation from the NetworKit C++ framework [69]. The tests were performed in a Linux environment equipped with an Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz, 128 GB of RAM and an NVIDIA GeForce RTX 3090 GPU (24 GB). We report the average execution times over three runs (we omit the variances due to their negligible values). These results confirm the advantages of the proposed approach. While all the competitors require expensive optimization processes whose impact increases with the graph scale, the fast inference step carried out by GNDs assures small timings even on big graphs. Computing the Laplacian PEs is scalable and does not hinder the time efficiency of the proposed method. To assess the quality of the generated layouts, we report in Figure 9 a comparison among the layouts yielded by the GND framework, sfdp and PivotMDS on several graphs from the SuiteSparse collection (we report the graph name, its order |V| and size |E|). While we remark that in this experiment we exploited a GND model trained on a smaller-scale dataset (i.e., Rome), the performances show a significant ability of the model to generalize the learned laws (e.g., the stress minimization in this case) to unseen graphs, even when dealing with diverging characteristics. However, we also remark that graphs having very diverse structures from the training distribution may not be correctly plotted. The causes of such performance drops are twofold. First, the intrinsic dependence of neural models on the inductive biases learned during the training process leads to an inability to generalize to unseen graph topologies. On the other hand, the limitations of Laplacian PEs in discriminating certain graph symmetries or structures [52] may be further compounded on larger-scale datasets, which is an active area of research [49].

V. CONCLUSION
Starting from some very interesting and promising results on the adoption of GNNs for graph drawing, which are mostly based on supervised learning, in this paper we proposed a general framework to emphasize the role of unsupervised learning schemes based on loss functions that enforce classic aesthetic measures. When working in such a framework, referred to as Graph Neural Drawers, we open the doors towards the construction of a novel machine learning-based drawing scheme, where the Neural Aesthete drives the learning of a GNN towards the optimization of beauty indexes. While we have adopted the Neural Aesthetes only for learning to minimize arc intersections, the same idea can be used for nearly any beauty index. We show that our framework is effective also for drawing unlabelled graphs. In particular, we rely on the adoption of Laplacian Eigenvector-based positional features [52] for attaching information to the vertexes, which leads to very promising results.

Fig. 1 .
Fig. 1. NEURAL AESTHETE FOR EDGE CROSSING. Left-to-right: graph layouts with starting random node coordinates (START), optimized by minimizing the stress function with Gradient Descent (STRESS), optimized by Gradient Descent applied to the Neural Aesthete edge-crossing loss (NA-CROSS), and optimized by alternating the stress loss and the Neural Aesthete loss in subsequent iterations (COMBINED). We report the graph layouts generated on three random sparse graphs, one for each row.

Fig. 2 .
Fig. 2. Datasets composition statistics. On the left, the histogram of the graph order (number of nodes |V| for each graph) for both the analyzed datasets. On the right, the histogram of the graph sizes (number of edges |E|). The SPARSE dataset is characterized by a sparse connection pattern.

Fig. 3. KAMADA-KAWAI layout. Qualitative example of the predicted node coordinates for both the ROME dataset (first four columns) and the SPARSE dataset (subsequent four columns). Each row depicts the Ground-Truth positions (GT) and the graph layouts produced by the GCN, GAT and GIN models, left-to-right. We report the predictions on three different test graphs (rows).

Fig. 4. SPECTRAL layout. Same qualitative analysis as Figure 3.

Fig. 5 .
Fig. 5. STRESS MINIMIZATION ON ROME. Qualitative examples of the graph layouts produced by the three GNN models on test graphs of the ROME dataset. Each row contains one of the same three graphs depicted in the first column of Figure 3, for comparison with the layout produced by Kamada-Kawai [9].

Fig. 7 .
Fig. 7. LEARNING FROM THE NEURAL AESTHETE. We report the layouts obtained on three randomly picked test graphs from the Rome dataset, one for each row. Left-to-right: graph layouts generated by optimizing the stress loss function, the edge-crossing Neural Aesthete based loss (denoted with NA-Crossing), and the combination of the two losses with a weighing factor λ = 0.5.

Fig. 8 .
Fig. 8. Computational time comparison on dwt_n graphs. Left: correlation between the number of nodes in the graph and the layout generation timings of the analyzed Graph Drawing methods. Right: correlation between the number of edges in the graphs and the corresponding layout generation timings.

Fig. 9 .
Fig. 9. Large-scale graphs from the SuiteSparse Matrix collection. Left to right: layouts produced by a GAT-based GND (trained to minimize stress on the Rome dataset), by the sfdp algorithm for large-scale graphs, and by the PivotMDS method. We report for each row the name of the graph from the dataset collection, its order (|V|) and size (|E|).

TABLE I
COMMON IMPLEMENTATIONS OF GNN AGGREGATION MECHANISMS. SEE THE MAIN TEXT AND THE REFERENCED PAPERS FOR FURTHER DETAILS ON THE FORMULATIONS.
Mean: $\sigma\big(c_v W^{(t)} x_v^{(t-1)} + \sum_{u \in \mathcal{N}_v} c_{u,v} W^{(t)} x_u^{(t-1)}\big)$

TABLE II
PROCRUSTES STATISTIC SIMILARITY (DEFINED IN EQ. 15) ON THE TEST SPLIT OF THE ROME AND SPARSE DATASETS. WE COMPARE THREE GND MODELS WITH TWO GRAPH LAYOUT GENERATION METHODS, KAMADA-KAWAI AND SPECTRAL. WE REPORT THE AVERAGE VALUES AND STANDARD DEVIATIONS.

TABLE III
AVERAGE STRESS LOSS VALUE OBTAINED ON THE TRAINING SET AND TEST SET BY THE BEST SELECTED MODELS, FOR EACH DATASET. WE REPORT AVERAGES OVER THREE RUNS WITH DIFFERENT SEEDS.