Deep Learning Super-Diffusion in Multiplex Networks

Complex network theory has shown success in understanding the emergent and collective behavior of complex systems [1]. Many real-world complex systems have recently been shown to be more accurately modeled as multiplex networks [2-6]---in which each interaction type is mapped to its own network layer; e.g.~multi-layer transportation networks, coupled social networks, metabolic and regulatory networks. A salient physical phenomenon emerging from multiplexity is super-diffusion: diffusion that is accelerated by the multi-layer structure as compared to diffusion on any single layer. Until now, super-diffusion could only be predicted from the spectral gap of the full Laplacian of a multiplex network and its interacting layers. Here we turn to machine learning, which has developed techniques to recognize, classify, and characterize complex data sets. We show that modern machine learning architectures, such as fully connected and convolutional neural networks, can classify and predict the presence of super-diffusion in multiplex networks with 94.12\% accuracy. Such predictions can be made {\it in situ}, without the need to determine the spectral properties of a network.

Complex systems are often well-approximated by complex networks or graphs, i.e. elements of the system are represented as nodes and connections between nodes correspond to non-trivial interaction patterns [1,9]. A multiplex (multi-layer) network is a generalized framework for studying complex networks with more than one interacting layer: each layer represents a different type of interaction between the constituent nodes [2][3][4][5][6]. Substantial effort has more recently been devoted to studying the differences and new phenomena that appear in networks when multiplexity is taken into account [10,11]. Ignoring multiplexity, either by aggregating all layers into one single-layer network or by considering each layer separately, typically entails a lossy compression [6].
Considering multiplexity has led to the discovery of new behavior in networked systems. The seminal paper by Buldyrev et al. [14] showed that multiplex networks (interdependent networks in their case) exhibit a first-order, abrupt phase transition in their percolation phase diagram. This finding is in contrast to the continuous phase transitions exhibited in the percolation phase diagram of monoplex (single-layer) networks [15,16]. Numerous papers have since sought to understand the different behaviors that arise in multilayer graphs. Examples include structural processes such as site and bond percolation [8, 20-22, 24, 26], first-depth percolation or observability, k-core [25] and optimal percolation [27], as well as dynamical processes such as epidemic spreading [31], synchronization [33], diffusion [29,30,37], controllability [34] and, recently, multiple dynamics running on top of multilayer structures [23,41-44]. Here we focus on super-diffusion, a physically observed phenomenon in which the coupling between two distinct types of networks creates 'shortcuts'; e.g. using a bus and then a metro can get you across town much faster than relying on either buses or metros alone.
Conventionally, multiplex networks, and super-diffusion in particular, have been studied using a variety of tools from statistical physics, spectral graph theory and other tailored methods, paired with powerful computer simulations and/or numerical methods [2]. The computer simulations are case dependent; e.g. Monte Carlo sampling is used to study percolation or to simulate epidemic spreading, whereas spectral methods are tied to specific problems such as community detection, percolation thresholds and combinatorial problems. One specific (yes/no) question answered using spectral graph theory is the presence of super-diffusion in a given multiplex network. In current approaches, the spectral gap of the graph Laplacian, i.e. its first non-trivial eigenvalue, is calculated to determine whether a multiplex system exhibits super-diffusion or not [30].
Machine learning has been successfully applied in a number of complex-network studies [52], including hyperbolic embedding of complex networks [54], link prediction [55] and representation learning [56]. However, augmenting the tool-set of complex networks with modern deep learning approaches is still in its nascent stages, with recent work on link prediction [53]. The ability of modern machine learning techniques to classify, identify and interpret massive data sets such as images foreshadows their suitability to provide network scientists with similar successes when studying the extremely large data sets embodied in the state-space of complex networks.
In this study, we employ a deep learning approach that trains a neural network to classify the presence of the super-diffusion phenomenon in multiplex networks. We represent the data set as gray-scale images and employ two different neural network architectures, one from machine learning (the multilayer perceptron) and one from deep learning (the convolutional neural network [67,68]). Our trained neural networks predict super-diffusion in multiplex networks with 94.12% accuracy. Such predictions can be made in situ, without the need for a network's spectral properties. We believe this finding will foster a wide application of deep learning to the study of complex networks.
a. Super-diffusion. Diffusion processes are among the simplest dynamics studied on multiplex structures [30]. Gomez et al. [30] studied diffusion in a duplex (a two-layer multiplex network) and found that, under certain circumstances, diffusion can be faster in the multiplex network than in either of the layers taken separately [30]. This phenomenon, called super-diffusion in multiplex networks, can be predicted by leveraging the relation between the spectrum of a Laplacian matrix and diffusion time-scales. The combinatorial Laplacian (as it is called in graph theory) governs the dynamics of diffusion running on top of the graph, i.e. the Laplacian is the generator of the stochastic process describing a random walk on the graph. There are fundamental connections between the spectrum of the Laplacian and the structural properties of the graph itself, and also between the spectrum and dynamics running on top of the graph. For diffusion processes, the second smallest eigenvalue of the Laplacian (the first non-trivial one) controls the timescale of the diffusion process. For multiplex networks, the counterpart of the Laplacian matrix is the supra-Laplacian [6]: with a proper labeling of nodes it is a block matrix whose diagonal blocks are the Laplacians of the individual layers (capturing intra-layer connections) and whose off-diagonal blocks capture the interlayer connections.
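The supra-Laplacian construction described above can be sketched in a few lines of numpy. This is an illustrative implementation (function names and the small test graphs are our own, not from the paper): the diagonal blocks are the layer Laplacians plus the interlayer degree w, and the off-diagonal blocks couple each node to its replica in the other layer.

```python
import numpy as np

def supra_laplacian(A1, A2, w):
    """Supra-Laplacian of a two-layer multiplex with one-to-one
    interlayer links of weight w. A1, A2 are symmetric 0/1
    adjacency matrices of the same size (a sketch)."""
    N = A1.shape[0]
    L1 = np.diag(A1.sum(axis=1)) - A1   # combinatorial Laplacian, layer 1
    L2 = np.diag(A2.sum(axis=1)) - A2   # combinatorial Laplacian, layer 2
    I = np.eye(N)
    # Diagonal blocks: intra-layer Laplacian + interlayer degree w.
    # Off-diagonal blocks: -w on replica-node pairs.
    return np.block([[L1 + w * I, -w * I],
                     [-w * I,     L2 + w * I]])

def spectral_gap(L):
    """Second smallest eigenvalue of a symmetric Laplacian."""
    return np.sort(np.linalg.eigvalsh(L))[1]
```

The super-diffusion criterion used throughout the paper then compares `spectral_gap(supra_laplacian(A1, A2, w))` against the spectral gaps of the two layer Laplacians.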
Gomez et al. [30] showed that the strength of the interlayer connections (w) is a tunable parameter between slow and fast diffusion regimes. For weak inter-layer connections, multiplexity slows down the diffusion, while strong inter-layer connections define a new regime that asymptotically approaches the behavior of the aggregated layer (the superposition of the two layers). It is in the strong-coupling regime that super-diffusion can be exhibited; in other words, multiplexity helps diffusion happen faster. These two distinct regimes are a direct consequence of structural transitions in the formation of multiplex networks, starting from separate layers, placing inter-layer links between them and increasing the weight of these links [36,38-40,73]. The phenomenon was also recently reported for directed multiplex networks [37].
De Domenico et al. [32] numerically tested for the super-diffusion phenomenon on multiplex networks composed of two layers, the first and second layers being Erdős-Rényi random graphs drawn from G(N, p1) and G(N, p2) respectively, where G(N, p) is the classical random graph ensemble with N nodes in which each pair of nodes is connected with probability p. They found that super-diffusion in these multiplexes happens mostly in the regime of similar connection probabilities (p1 ≈ p2) [32]. For a multiplex whose layers are generated with very different probabilities, the multilayer structure has no benefit in terms of the diffusion timescale, and diffusion in at least one of the layers considered independently will be faster than in the coupled multiplex. Fig. 1 shows the phenomenon of super-diffusion and its phase diagram on random multiplex networks.
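The numerical test above amounts to sampling ER duplexes and applying the spectral criterion. A hedged, self-contained sketch (function names are our own; w = 1 is an illustrative choice, not a value from the paper):

```python
import numpy as np

def er_adjacency(N, p, rng):
    """Sample a G(N, p) adjacency matrix (undirected, no self-loops)."""
    A = (rng.random((N, N)) < p).astype(float)
    A = np.triu(A, 1)
    return A + A.T

def is_superdiffusive(A1, A2, w=1.0):
    """Label a duplex 1 if the spectral gap of the supra-Laplacian
    exceeds the spectral gap of each layer alone, else 0; i.e. the
    criterion lambda2(L) >= max{lambda2(L1), lambda2(L2)}."""
    lap = lambda A: np.diag(A.sum(axis=1)) - A
    N = A1.shape[0]
    I = np.eye(N)
    L = np.block([[lap(A1) + w * I, -w * I],
                  [-w * I,          lap(A2) + w * I]])
    gap = lambda M: np.sort(np.linalg.eigvalsh(M))[1]
    return int(gap(L) >= max(gap(lap(A1)), gap(lap(A2))))
```

Sweeping `is_superdiffusive(er_adjacency(N, p1, rng), er_adjacency(N, p2, rng))` over a grid of (p1, p2) reproduces the kind of phase diagram shown in Fig. 1.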
While the super-diffusion phase diagram was obtained using numerical tests, the following question is legitimate: is it possible to design a machine learning algorithm to detect the presence of super-diffusion given a multiplex network? To answer this question we propose a supervised ML algorithm.
Which neural network structures are best suited to predict super-diffusion in multiplex networks? To this end, the following artificial neural network (ANN) architectures were applied: (1) fully connected neural networks (FCNN) with l2 regularization; (2) FCNN with dropout; and (3) convolutional neural networks (CNN). The realization was implemented using the Keras API [46] backed by TensorFlow [45].
c. Training Set. We consider duplexes, i.e. multiplex networks with two layers (see Appendix 2 for the generalization of the framework to multiplex networks with more than two layers). The first and second layers each have exactly N nodes and are created independently according to the Erdős-Rényi random graph model: G(N, p1) and G(N, p2) respectively. In the G(N, p) model, a graph is constructed (here each graph forms one duplex layer) by connecting nodes randomly: each edge is included in the graph with probability p, independently of every other edge. Equivalently, all graphs with N nodes and M edges have equal probability p^M (1-p)^{N(N-1)/2 - M}. The parameter p in this model can be thought of as a weighting function: as p increases from 0 to 1, the model becomes more and more likely to include graphs with many edges and less and less likely to include graphs with few edges. In particular, the case p = 1/2 corresponds to choosing each of the 2^{N(N-1)/2} graphs on N nodes with equal probability.
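As a quick sanity check on the ensemble just described, the number of edges in G(N, p) is Binomial(N(N-1)/2, p), so its mean is p·N(N-1)/2. A small empirical sketch (parameter values are illustrative, not from the paper):

```python
import numpy as np

# Edge count in G(N, p) is Binomial(C(N,2), p); check the mean empirically.
rng = np.random.default_rng(0)
N, p, trials = 50, 0.3, 200
pairs = N * (N - 1) // 2                # C(N,2) possible edges = 1225
counts = []
for _ in range(trials):
    # upper triangle of a Bernoulli(p) matrix = one G(N, p) sample
    A = np.triu((rng.random((N, N)) < p).astype(int), 1)
    counts.append(A.sum())
mean_edges = np.mean(counts)            # should be close to p * C(N,2) = 367.5
```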
Both layers are undirected and unweighted graphs. Connection patterns are encoded in the adjacency matrices A[1] and A[2] for the first and second layers, where A[k]_ij = 1 if there is a link between node i and node j in layer k (all intra-layer links have weight 1). We then connect the two layers by one-to-one interlayer links, each with non-negative weight w.
We divide the p1-p2 phase diagram into bins with δp = 0.02 for the training phase and δp = 0.01 for the testing phase. For each bin, we create 30 training and 20 test sample multiplex networks (G(N, p1), G(N, p2)). If super-diffusion is exhibited we label the input set {A[k], A[m], . . .} with 1, and otherwise with 0. We generated 75,000 random graph instances for the training set and 200,000 instances for the test set.
d. Fully connected neural networks for super-diffusion prediction. To feed our data to a fully connected neural network we make the following transformation. Since the adjacency matrices are symmetric N × N with zeros on the diagonal, we can represent a multiplex network as a single matrix whose upper triangle equals the upper triangle of the first layer and whose lower triangle equals the lower triangle of the second layer. This matrix can then be reshaped into an N^2 × 1 column vector.
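The triangle-merging transformation above can be sketched directly (the function name is our own): the upper triangle of the merged matrix carries layer 1, the lower triangle carries layer 2, and the zero diagonal is discarded information-free.

```python
import numpy as np

def pack_duplex(A1, A2):
    """Merge two symmetric adjacency matrices into one matrix:
    strict upper triangle from layer 1, strict lower triangle from
    layer 2 (diagonal stays zero), then flatten to an N^2 vector,
    matching the N^2 x 1 input described in the text."""
    M = np.triu(A1, 1) + np.tril(A2, -1)
    return M.reshape(-1)
```

No information is lost: both symmetric matrices can be recovered from the packed vector by mirroring the respective triangles.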
To test the quality recovered from a fully connected neural network (FCNN) we choose a simple architecture consisting of one input layer, one hidden layer and one output layer with two neurons. The activation function of the hidden layer is the ReLU (Rectified Linear Unit) function. To use our FCNN as a classifier, the output layer is activated by a sigmoid function. For the first FCNN model, the loss function was the average cross-entropy between predicted and actual values, with an additional l2 regularization term to prevent over-fitting. The loss function was optimized with an extension of stochastic gradient descent [50].
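The forward pass and loss just described can be written out explicitly. This is a plain-numpy sketch for clarity, not the paper's Keras implementation; layer sizes and the regularization strength `lam` are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fcnn_forward(x, W1, b1, W2, b2):
    """One-hidden-layer forward pass matching the architecture in the
    text: ReLU hidden layer, sigmoid-activated output layer."""
    h = relu(W1 @ x + b1)
    return sigmoid(W2 @ h + b2)

def fcnn_loss(y_hat, y, weights, lam=1e-3):
    """Average cross-entropy plus an l2 regularization term over the
    weight matrices (lam is an illustrative hyperparameter)."""
    eps = 1e-12  # avoid log(0)
    ce = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    return ce + lam * sum(np.sum(W ** 2) for W in weights)
```

In practice the gradients of this loss would be followed by an adaptive variant of stochastic gradient descent, as in the paper.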
The resulting accuracy of the model is 93% on the test data. The FCNN with dropout recovers the same result.
To reconstruct Fig. 1 we plot the distribution of the predicted results. The results are presented in Fig. 2(b) and (c). We notice that the shape of the distributions deviates from the true super-diffusion distribution (a), particularly in the lower-left and upper-right corners of the plot.
e. Convolutional neural networks for super-diffusion prediction. The design of the convolutional neural network (CNN) we utilized for the problem is shown in Fig. 3. The input of the CNN is a two-channel 50 × 50 binary image, followed by two convolutional layers with small 5 × 5 filters and pooling layers. The result is flattened and connected to a fully connected layer with two outputs and a sigmoid activation function.
To feed data to the CNN we represented the two adjacency matrices of a multiplex network as binary images of 50 by 50 pixels and assigned them as two separate channels of the input layer. This procedure is shown on the left side of Fig. 3.
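The spatial dimensions quoted in Fig. 3 follow from two simple rules: a convolution with 'same' padding preserves the spatial size, and a 2 × 2 max pool with 'same' padding halves it, rounding up. A quick arithmetic sketch tracing the stack:

```python
import math

def same_conv(h):
    # 'same'-padded convolution preserves spatial size
    return h

def pool2(h):
    # 2x2 max pool, stride 2, 'same' padding: halve, rounding up
    return math.ceil(h / 2)

h = 50            # input: 50 x 50, 2 channels
h = same_conv(h)  # CONV1 -> 50 x 50 x 32
h = pool2(h)      # POOL1 -> 25 x 25 x 32
h = same_conv(h)  # CONV2 -> 25 x 25 x 64
h = pool2(h)      # POOL2 -> 13 x 13 x 64
flat = h * h * 64 # flattened vector length fed to the dense layer
```

This reproduces the [13 × 13 × 64] shape and the 10816-element flattened vector stated in the Fig. 3 caption.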
Interestingly, the resulting accuracy of 94% is slightly better for the CNN than for the FCNN, and the shape of the distribution of predicted results looks more similar to the real distribution, Fig. 2(d-f). This points out that representing a multiplex network as a multi-channel image for the CNN input is quite a promising way to investigate the properties of multiplexes, and more sophisticated CNN architectures could probably perform even better. See Appendix 1 for more information on the accuracy of the results predicted by the CNN.
f. Conclusions. We have found that convolutional neural network technology, developed for applications such as computer vision, can be used to detect super-diffusion in multiplex networks. We argue that this idea is quite natural and likely owes some of its success to our representation of a multiplex's layer adjacency matrices as channels of an image.
While traditional machine learning techniques have long been part of the complex-networks tool-kit, deep learning is in its nascent days of application. Indeed, a two-layer FCNN was used as perhaps the simplest example of a deep neural network, likewise our simple CNN structure (l2 regularization and drop-out techniques were used to prevent over-fitting). In both cases, training resulted in a high detection probability (over 94% accuracy), thereby confirming the proof of principle even for simplistic neural network architectures.
Based on our findings, we anticipate further use of deep learning in the field of complex networks, for instance in detecting phase transitions and in network dismantling. Interestingly, the application of complex networks as a tool to study the properties of supervised learning [47], unsupervised learning [49] and semi-supervised learning [48] has long been considered. By applying deep learning to study the properties of complex networks, we hope to form a bridge toward a future where these two subjects complement each other, with deep learning regularly applied to complex networks.
As mentioned, the main property of multiplex networks that appears useful, and quite natural, for the CNN is that the adjacency matrix of every layer can be represented as a data set in the same format as an image channel; this is likely why CNN filters can so readily detect the features responsible for super-diffusion in multiplex networks.
g. Appendix 1. Here we show some metrics obtained by cross-validation, dividing the whole set into 10 stratified folds (each preserving the percentage of samples of both classes). At every step we kept 9 folds for training and 1 for testing. Fig. 4 shows the ROC curve (receiver operating characteristic curve) of the different folds and the mean AUC (area under the ROC curve) of 98%. The average accuracy obtained from the different folds is 94%. Table I shows the confusion matrices for the K-fold validation (top) and for the independent test set (bottom). Fig. 5 shows the loss curve for the 10-fold stratification over 40 epochs. As additional metrics, we generated heatmaps for each epoch and measured them against the ground truth with the Mean Squared Error and the Structural Similarity Index, with results shown in Table II.
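The stratified splitting used above keeps the class balance of every fold equal to that of the whole set. A minimal plain-numpy sketch of the idea (the paper presumably uses a library implementation; this function name and round-robin scheme are our own):

```python
import numpy as np

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, preserving the proportion of
    each class in every fold (round-robin within each class)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # indices of class c
        rng.shuffle(idx)
        for i, j in enumerate(idx):
            folds[i % k].append(j)          # deal samples out in turn
    return [np.array(sorted(f)) for f in folds]
```

Each cross-validation step then trains on the union of k-1 folds and evaluates on the held-out fold.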
h. Appendix 2. Here we extend the framework presented in the main text to 3-layer multiplex networks (feeding the CNN with the adjacency matrices of the layers and following the training procedure explained in the main text). Fig. 6 shows the ability of our deep learning framework to predict super-diffusion in multiplex networks with more than two layers, demonstrating that the machinery extends to higher-order multiplex networks. For multiplex networks with three layers, we divide the p1-p2-p3 phase diagram into bins with δp = 0.020. For each bin we create 15 training sample multiplex networks (G(N, p1), G(N, p2), G(N, p3)). If super-diffusion is exhibited we label the input set {A[k], A[m], . . .} with 1, and otherwise with 0. We generated 100,000 random graph instances for the training set and 50,000 instances for the test set (the latter with the same δp = 0.020 and 20 samples per bin).
The only change in the DNN architecture is the input layer, which is composed of three channels (the complete input shape is [50 x 50 x 3]). The classification accuracy obtained is 94%; Fig. 6 shows the resulting heatmap on the test set.
i. Appendix 3. Here we extend the framework presented in the main text to 2-layer multiplex networks with an arbitrary number of nodes N. We found that the original architecture can be extended to correctly classify bigger multiplexes by stacking successive convolution (Conv) and pooling (Pool) layers. For the original N = 50 size we stacked two Conv plus Pool layers; the only change in the DNN architecture for larger N is the number of Conv plus Pool stacks. The number of such stacks is N mod 50 + 1. Each layer is an Erdős-Rényi random graph sampled from the G(N, p) ensemble, where N is the number of nodes (N = 50, 100, 150, 200, 250, 300, 350, 400, 450 and 500) and p is the connection probability. As in the N = 50 case, we divide the training set into bins of δp = 0.002 and the test set into bins of δp = 0.001, and we create 30 samples per bin in the train set and 20 in the test set.

FIG. 1. (Super-diffusion in multiplex networks.) A heat map of super-diffusion over the whole phase diagram of 2-layer multiplex networks, in which each layer has been formed using the Erdős-Rényi random graph model G(N, p), where N is the number of nodes (N = 50) and p is the connection probability. For each (p1, p2) we create 10 samples of the 2-layer random multiplex networks and then check the condition for super-diffusion, λ2(L) ≥ Max{λ2(L1), λ2(L2)}, and color code the probability of super-diffusion for the bin (p1, p2). The plot indicates that super-diffusion happens mainly in the region where the connection probabilities are close to each other. The left (right) zoomed window shows the whole phase diagram of λ2(L) for the case where super-diffusion happens (does not happen). The vertical yellow line in the zoomed windows marks the phase transition point p* = (1/2) λ2(Q), where Q = L1 L† L2 and L† is the Moore-Penrose pseudoinverse of L = (L1 + L2)/2 [38]. The circular network layouts are color-coded by the Fiedler vector of L, the eigenvector corresponding to the spectral gap; before the transition point the networks are decoupled and after the transition point they are coupled [36].

FIG. 3. (Deep neural network architecture detecting super-diffusion.) Two-layer random multiplex instances were randomly generated; each layer was generated independently from the random graph model G(N, p), where N is the number of nodes (= 50) and p is the connection probability. The adjacency matrices of the layers are represented as binary images of 50 × 50 pixels and then fed to the CNN as two separate channels (A[1], A[2]) (the complete input shape is [50 × 50 × 2]). A first convolution (CONV1) is performed using a 5 × 5 filter, same padding, ReLU activation and 32 channels (output shape [50 × 50 × 32]). One max pooling layer (POOL1) is used to halve the input shape, with window size 2 × 2 and same padding (resulting shape [25 × 25 × 32]). The second convolution (CONV2) has the same properties as CONV1 except for the number of channels, which grows to 64 (shape [25 × 25 × 64]). After a second max pooling layer (POOL2) with the same properties as POOL1, the resulting data (of shape [13 × 13 × 64]) is flattened to a single vector (of [10816] elements) and fed to a fully connected layer (of [1024] nodes) and to a final output layer (with [1] output node). The output is a binary classifier which determines whether the instance exhibits super-diffusion or not.

FIG. 5. (Loss curve for K-fold validation.) The binary cross-entropy was used as the loss function. Shown are the mean value (solid line) and the standard deviation (shaded).

FIG. 6. (Generalizing to more than two layers.) Predicted super-diffusion phase diagram for 3-layer random multiplex networks. Each layer is an Erdős-Rényi random graph sampled from the G(N, p) ensemble, where N is the number of nodes (N = 50) and p is the connection probability. For each (p1, p2, p3) we create 20 samples of the 3-layer random multiplex networks and then check the condition for super-diffusion on each of them. The color coding is based on the probability of super-diffusion in each bin. For clarity we kept only data points corresponding to probabilities higher than 0.5. Projections of the main heatmap onto the (x,y), (x,z) and (y,z) planes represent super-diffusion in the corresponding sub-duplexes formed by different combinations of layers. The only change in the DNN architecture is the input layer, which is composed of three channels (the complete input shape is [50 x 50 x 3]). The classification accuracy obtained is 94%.

TABLE II. MSE and SSIM metrics as a function of learning epochs.