Metric projection for dynamic multiplex networks

Evolving multiplex networks are a powerful model for representing the dynamics over time of different phenomena, such as social networks, power grids and biological pathways. However, exploring the structure of a time series of multiplex networks is still an open problem. Here we propose a two-step strategy to tackle this problem, based on the concept of distance (metric) between networks. Given a multiplex graph, a network of networks is first built for each time step; a real-valued time series is then obtained from the resulting sequence of (simple) networks by evaluating the distance of each element from the first element of the series. The effectiveness of this approach in detecting the changes occurring along the original time series is shown first on a synthetic example, and then on the Gulf dataset of political events.


http://dx.doi.org/10.1016/j.heliyon.2016.e00136

Introduction
When the links connecting a set of nodes arise from different sources, a possible representation for the corresponding graph is the construction of networks on the same nodes, one for each source. The resulting structure is known as a multiplex network, and each of the composing graphs is called a layer. Multiplex networks are quite effective in representing many different real-world situations [1,2,3], and their structure helps extract crucial information about the complex systems under investigation that would instead remain hidden when analyzing individual layers separately [4,5,6]; furthermore, their relation with time series analysis techniques has recently gained interest in the literature [7]. A key property to be highlighted is the correlated multiplexity, as stated in [8]: in real-world systems, the relation between layers is not at all random; in fact, in many cases, the layers are mutually correlated. Moreover, the communities induced on different layers tend to overlap across layers, thus generating interesting mesoscale structures.
These observations guided the authors of [9] in defining a network having the layers of the original multiplex graph as nodes, and in using information theory to define a similarity measure between the layers themselves, so as to investigate the mesoscopic modularity of the multiplex network. Here we propose to pursue a similar strategy for defining a network of networks derived from a multiplex graph, although in a different context and with a different aim. In particular, we project a time series of multiplex networks into a series of simple networks to be used in the analysis of the dynamics of the original multiplex series. The projection map defining the similarity measure between layers is induced by the Hamming-Ipsen-Mikhailov (HIM) network distance [10], a glocal metric combining the Hamming and the Ipsen-Mikhailov distances, used in different scientific areas [11,12,13,14,15,16]. The main goal in using this representation is the analysis of the dynamics of the original time series through the investigation of the trend of the projected evolving networks. To this end, we extract the corresponding real-valued time series, obtained by computing the HIM distance between each element in the series and the first one.
For instance, we show on a synthetic example that this strategy is more informative than considering statistics of the time series for each layer of the multiplex networks, or than studying the networks derived by collapsing all layers into one including all links, as in [17,18], when the aim is to detect the time steps where the most relevant changes occur and the system is undergoing a state transition (tipping point) or is approaching one (early warning signals). This is a classical problem in time series analysis, and very diverse solutions have appeared in the literature (see [19] for a recent example). Here we use two different evaluation strategies, the former based on the fluctuations of mean and variance [20] (implemented in the R package changepoint, https://cran.r-project.org/web/packages/changepoint/index.html), and the latter involving the study of the increment entropy indicator [21].
We conclude with the analysis of the well-known Gulf Dataset (part of the Penn State Event Data) concerning the 304,401 political events (of 66 different categories) occurring between 202 countries in the 10 years from 15 April 1979 to 31 March 1991, focusing on the situation in the Gulf region and the Arabian peninsula. A major task in the analysis of the Gulf dataset is assessing how geopolitical events translate into fluctuations of measurable indicators. A similar network-based mining of sociopolitical relations, but with a probabilistic approach, can be found in [22,23,24]. Here we show the effectiveness of the newly introduced methodology in associating relevant political events and periods with characteristic behaviors in the dynamics of the time series of the induced networks of networks, together with a simple overview of the corresponding mesoscale modular structure.

Background
The Hamming-Ipsen-Mikhailov (HIM) metric [10,25] is a distance function quantifying in the real interval [0, 1] the difference between two networks on shared nodes. The HIM metric linearly combines an edit distance, the Hamming (H) distance [26,27,28], and a spectral distance, the Ipsen-Mikhailov (IM) distance [29]. Edit distances are local metrics, functions of insertion and deletion of matching links, while spectral measures are global distances, functions of the network spectrum. Local functions disregard the overall network structure, while spectral measures cannot distinguish isospectral graphs. As its characterizing feature, HIM is a glocal distance that overcomes the drawbacks of local and global metrics when separately considered.
Furthermore, its definition can be naturally extended to directed networks. Hereafter we give a brief description of the H, IM and HIM distances, graphically summarized in Figure 1.
Notations. Let $\mathcal{N}_1$ and $\mathcal{N}_2$ be two simple networks on $N$ nodes, whose adjacency matrices are $A^{(1)}$ and $A^{(2)}$, with $a^{(1)}_{ij}, a^{(2)}_{ij} \in \mathcal{F}$, where $\mathcal{F} = \mathbb{F}_2 = \{0, 1\}$ for unweighted graphs and $\mathcal{F} = [0, 1] \subseteq \mathbb{R}$ for weighted networks. Let then $\mathbb{I}_N$ be the $N \times N$ identity matrix, let $\mathbb{1}_N$ be the $N \times N$ matrix with all entries equal to one and let $\mathbb{0}_N$ be the $N \times N$ null matrix with all entries equal to zero. Denote then by $\mathcal{E}_N$ the empty network with $N$ nodes and no links (with adjacency matrix $\mathbb{0}_N$) and by $\mathcal{F}_N$ the clique (undirected simple full network) with $N$ nodes and all possible $N(N-1)$ links, whose adjacency matrix is $\mathbb{1}_N - \mathbb{I}_N$. Finally, the Laplacian matrix $L$ of an undirected network is defined as the difference $L = D - A$ between the degree matrix $D$ and the adjacency matrix $A$, where $D$ is the diagonal matrix of vertex degrees. $L$ is positive semidefinite and singular, with eigenvalues $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{N-1}$.

The Hamming distance
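The Hamming distance, one of the most common dissimilarity measures in coding and string theory and recently used also for network comparison, evaluates the presence/absence of matching links on the two compared networks. In terms of the adjacency matrices $A^{(1)}$ and $A^{(2)}$ of two networks $\mathcal{N}_1$ and $\mathcal{N}_2$ on $N$ nodes, the expression for the normalized Hamming metric reads as

$$\mathrm{H}(\mathcal{N}_1, \mathcal{N}_2) = \frac{1}{N(N-1)} \sum_{1 \le i \ne j \le N} \left| A^{(1)}_{ij} - A^{(2)}_{ij} \right|,$$

where the normalization factor $N(N-1)$ bounds the range of the function H in the interval [0, 1]. The lower bound 0 is attained only for identical networks $A^{(1)} = A^{(2)}$, the upper limit 1 only for complementary networks $A^{(1)} + A^{(2)} = \mathbb{1}_N - \mathbb{I}_N$, that is, when the two adjacency matrices sum to the all-ones matrix with zero diagonal. When $\mathcal{N}_1$ and $\mathcal{N}_2$ are unweighted networks, $\mathrm{H}(\mathcal{N}_1, \mathcal{N}_2)$ is just the fraction of different matching links over the total number $N(N-1)$ of possible links between the two graphs.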
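As a minimal sketch of the normalized Hamming distance described above, assuming adjacency matrices stored as nested Python lists and the $N(N-1)$ normalization over ordered node pairs, the computation could look as follows. The function name `hamming_distance` is illustrative and does not come from the paper.

```python
def hamming_distance(a1, a2):
    """Normalized Hamming distance between two networks on the same N nodes.

    a1, a2: square adjacency matrices (nested lists) with entries in [0, 1].
    """
    n = len(a1)
    if n < 2 or len(a2) != n:
        raise ValueError("both networks must share the same N >= 2 nodes")
    # Sum the absolute differences over all ordered pairs i != j.
    total = sum(abs(a1[i][j] - a2[i][j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]        # links 0-1 and 1-2
complement = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # only link 0-2
print(hamming_distance(path, path))        # identical networks
print(hamming_distance(path, complement))  # complementary networks
```

The two printed values illustrate the lower and upper bounds of H: identical networks on one side, complementary networks on the other.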

The Ipsen-Mikhailov distance
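The Ipsen-Mikhailov (IM) metric stems from the realization of an $N$-node network as a system of $N$ molecules connected by identical elastic springs, according to the adjacency matrix $A$. The dynamics of this spring-mass system can be described by the set of differential equations

$$\ddot{x}_i + \sum_{j=1}^{N} A_{ij}\,(x_i - x_j) = 0, \qquad i = 1, \dots, N.$$

The vibrational frequencies of the system are given by $\omega_i = \sqrt{\lambda_i}$, where the $\lambda_i$ are the eigenvalues of the Laplacian matrix of the network, while the spectral density for a graph in terms of the sum of Lorentz distributions is defined as

$$\rho(\omega, \gamma) = K \sum_{i=1}^{N-1} \frac{\gamma}{(\omega - \omega_i)^2 + \gamma^2},$$

where $\gamma$ is the common width and $K$ is the normalization constant defined by the condition $\int_0^{\infty} \rho(\omega, \gamma)\,\mathrm{d}\omega = 1$, and thus

$$K = \frac{1}{\gamma \displaystyle\sum_{i=1}^{N-1} \int_0^{\infty} \frac{\mathrm{d}\omega}{(\omega - \omega_i)^2 + \gamma^2}}.$$

The scale parameter $\gamma$ specifies the half-width at half-maximum, which is equal to half the interquartile range. Then the spectral distance $\epsilon_\gamma$ between two graphs $\mathcal{N}_1$ and $\mathcal{N}_2$ on $N$ nodes with densities $\rho_{\mathcal{N}_1}(\omega, \gamma)$ and $\rho_{\mathcal{N}_2}(\omega, \gamma)$ can be defined as

$$\epsilon_\gamma(\mathcal{N}_1, \mathcal{N}_2) = \sqrt{\int_0^{\infty} \left[ \rho_{\mathcal{N}_1}(\omega, \gamma) - \rho_{\mathcal{N}_2}(\omega, \gamma) \right]^2 \mathrm{d}\omega}.$$

Since $\arg\max_{(\mathcal{N}_1, \mathcal{N}_2)} \epsilon_\gamma(\mathcal{N}_1, \mathcal{N}_2) = (\mathcal{E}_N, \mathcal{F}_N)$ for each $\gamma$, where $\mathcal{E}_N$ and $\mathcal{F}_N$ are the empty and the full network on $N$ nodes, denoting by $\bar{\gamma}$ the unique solution of $\epsilon_\gamma(\mathcal{E}_N, \mathcal{F}_N) = 1$, the normalized Ipsen-Mikhailov distance between two undirected networks can be defined as

$$\mathrm{IM}(\mathcal{N}_1, \mathcal{N}_2) = \epsilon_{\bar{\gamma}}(\mathcal{N}_1, \mathcal{N}_2) = \sqrt{\int_0^{\infty} \left[ \rho_{\mathcal{N}_1}(\omega, \bar{\gamma}) - \rho_{\mathcal{N}_2}(\omega, \bar{\gamma}) \right]^2 \mathrm{d}\omega},$$

so that IM is bounded between 0 and 1, with the upper bound attained only for $\{\mathcal{N}_1, \mathcal{N}_2\} = \{\mathcal{E}_N, \mathcal{F}_N\}$.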
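The spectral distance described above can be approximated numerically. The following is a rough sketch under several assumptions: the density is a sum of Lorentzians over the $N-1$ frequencies associated with the eigenvalues $\lambda_1, \dots, \lambda_{N-1}$, the integral is truncated at a finite upper limit and evaluated with the trapezoidal rule, and the width `gamma` is fixed by hand rather than solved for, so the result is the unnormalized spectral distance and not the normalized IM. The function names and the parameters `upper` and `steps` are hypothetical. To avoid an eigenvalue solver, the example uses two networks whose Laplacian spectra are known in closed form: the empty network (all eigenvalues zero) and the clique (nonzero eigenvalues all equal to $N$).

```python
import math


def lorentz_density(freqs, gamma):
    """Return rho(omega): a normalized sum of Lorentzians centred at `freqs`."""
    # Each Lorentzian gamma / ((omega - w)^2 + gamma^2) integrates over
    # [0, inf) to pi/2 + atan(w / gamma), which gives K in closed form.
    k = 1.0 / sum(math.pi / 2 + math.atan(w / gamma) for w in freqs)

    def rho(omega):
        return k * sum(gamma / ((omega - w) ** 2 + gamma ** 2) for w in freqs)

    return rho


def spectral_distance(freqs1, freqs2, gamma, upper=200.0, steps=20000):
    """Trapezoidal estimate of sqrt(integral_0^upper (rho1 - rho2)^2 d omega)."""
    rho1 = lorentz_density(freqs1, gamma)
    rho2 = lorentz_density(freqs2, gamma)
    h = upper / steps
    total = 0.0
    for s in range(steps + 1):
        diff = rho1(s * h) - rho2(s * h)
        # Endpoints get half weight in the trapezoidal rule.
        total += (0.5 if s in (0, steps) else 1.0) * diff * diff
    return math.sqrt(total * h)


n = 5
empty_freqs = [0.0] * (n - 1)          # empty network: all eigenvalues are 0
full_freqs = [math.sqrt(n)] * (n - 1)  # clique: nonzero eigenvalues equal N
print(spectral_distance(empty_freqs, full_freqs, gamma=0.5))
```

Obtaining the normalized IM would additionally require solving for the width at which the distance between the empty network and the clique equals 1, a step this sketch leaves out.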

The Hamming-Ipsen-Mikhailov distance
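Consider now the cartesian product of the two metric spaces $(\mathcal{N}(N), \mathrm{H})$ and $(\mathcal{N}(N), \mathrm{IM})$, where $\mathcal{N}(N)$ denotes the set of networks on $N$ nodes. On this product, the Hamming-Ipsen-Mikhailov distance is defined as the Euclidean product metric of H and $\sqrt{\xi}\,\mathrm{IM}$, normalized by the factor $1/\sqrt{1+\xi}$, for $\xi \in [0, +\infty)$:

$$\mathrm{HIM}_\xi(\mathcal{N}_1, \mathcal{N}_2) = \frac{1}{\sqrt{1+\xi}} \sqrt{\mathrm{H}^2(\mathcal{N}_1, \mathcal{N}_2) + \xi \cdot \mathrm{IM}^2(\mathcal{N}_1, \mathcal{N}_2)},$$

where in what follows we will omit the subscript $\xi$ when it is equal to one. Note that all distances $\mathrm{HIM}_\xi$ are nonzero for non-identical isomorphic/isospectral graphs.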
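To make the role of the parameter $\xi$ concrete, here is a minimal sketch that combines two precomputed values of H and IM, assuming the HIM distance is the Euclidean product metric of H and $\sqrt{\xi}\,\mathrm{IM}$ normalized by $1/\sqrt{1+\xi}$, as in [10]. The function name `him_distance` is illustrative.

```python
import math


def him_distance(h, im, xi=1.0):
    """Combine a Hamming value h and an Ipsen-Mikhailov value im, both in [0, 1]."""
    if xi < 0:
        raise ValueError("xi must be non-negative")
    # Euclidean norm of (h, sqrt(xi) * im), rescaled so the result stays in [0, 1].
    return math.sqrt(h * h + xi * im * im) / math.sqrt(1.0 + xi)


print(him_distance(0.2, 0.6))          # default xi = 1: both components weigh equally
print(him_distance(0.2, 0.6, xi=0.0))  # xi = 0: only the Hamming component remains
```

Under this assumption, larger values of `xi` give more weight to the spectral (global) component, while `xi = 0` reduces HIM to the Hamming (local) distance.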

A minimal example
In Figure 2

Theory
In Figure 5 we show a graphical sketch of the collapsing of a multiplex network with five layers.
Caveat: consider a sequence of binary multiplex networks such that, for each of the possible $N(N-1)/2$ links and for each time step, there exists at least one layer including this link. Then the collapsed projection, at each time step, is the full graph on $N$ nodes and, as such, it has no temporal dynamics, regardless of the evolution of each single layer.
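The caveat can be checked on a toy case. The sketch below assumes that collapsing means taking the union of the links of all layers, and builds a hypothetical three-node, three-layer multiplex series in which every layer changes at every time step while the collapsed network never does. The helper names `collapse` and `single_link` are illustrative.

```python
def collapse(layers):
    """Collapsed projection: a link is present if at least one layer has it."""
    n = len(layers[0])
    return [[int(any(layer[i][j] for layer in layers)) for j in range(n)]
            for i in range(n)]


def single_link(n, i, j):
    """Adjacency matrix of an undirected n-node network with the single link i-j."""
    a = [[0] * n for _ in range(n)]
    a[i][j] = a[j][i] = 1
    return a


links = [(0, 1), (0, 2), (1, 2)]
full = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# At time t, layer k carries the link links[(k + t) % 3]: each layer changes at
# every step, yet the three layers together always cover all three links.
series = [[single_link(3, *links[(k + t) % 3]) for k in range(3)]
          for t in range(4)]
collapsed = [collapse(multiplex) for multiplex in series]
print(all(c == full for c in collapsed))
```

Any distance series computed on `collapsed` would therefore be identically zero, even though each individual layer keeps changing.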
The distance series. To investigate the dynamics of the multiplex series $\mathcal{M}(t)$ for $t = 1, \dots, \tau$, we construct a suite of associated time series by means of three different procedures, all involving the HIM distance between each network in a given sequence and the first element of the sequence itself. The first group D1 of distance series is obtained by evaluating the dynamics of each layer considered separately:

$$\mathrm{D1}_i(t) = \mathrm{HIM}(L_i(t), L_i(1)), \qquad i = 1, \dots, \lambda,$$

where $L_i(t)$ denotes the $i$-th of the $\lambda$ layers at time $t$. In Figure 6 we show the construction of the distance series D1 for the first layer of the multiplex network in Figure 3.
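The pattern shared by the distance series, namely comparing every element of a sequence with its first element, can be sketched generically. In the snippet below the distance is passed in as a function; the normalized Hamming distance is used only as a simple stand-in for HIM, and the three-step toy layer is made up for illustration.

```python
def hamming_distance(a1, a2):
    """Normalized Hamming distance, used here as a stand-in for HIM."""
    n = len(a1)
    total = sum(abs(a1[i][j] - a2[i][j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def distance_series(networks, distance):
    """Distance of each network in the sequence from the first one."""
    first = networks[0]
    return [distance(net, first) for net in networks]


# One layer observed at three time steps: empty, one link, full network.
layer_over_time = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 1, 1], [1, 0, 1], [1, 1, 0]],
]
d1 = distance_series(layer_over_time, hamming_distance)
print(d1)
```

The same `distance_series` helper would apply unchanged to any other sequence of networks on shared nodes, given a suitable distance function.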
The second series, D2, collects the metric dynamics of the collapsed projection, denoted here by $\mathcal{C}(t)$:

$$\mathrm{D2}(t) = \mathrm{HIM}(\mathcal{C}(t), \mathcal{C}(1)).$$

An example of construction of D2 for the five-layer multiplex network of Figure 5 is shown in Figure 7. Finally, the last series D3 collects the metric dynamics of the metric projection, denoted here by $\mathcal{P}(t)$, and the corresponding example for the multiplex networks in Figure 4 is shown in Figure 8:

$$\mathrm{D3}(t) = \mathrm{HIM}(\mathcal{P}(t), \mathcal{P}(1)).$$

Dynamics indicators. The dynamics of the distance series D1, D2 and D3 are quantitatively analyzed by means of a set of indicators, assessing the series' information content and detecting occurring tipping points. [...] with a linear penalty function through dynamic programming. Finally, we will use Changepoints for a Range of PenaltieS (CROPS) [38] to obtain optimal changepoint segmentations of data sequences for all penalty values across a continuous range.
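To give a feeling for what a penalized changepoint search does, the following is a deliberately simplified sketch: a brute-force search for at most one change in mean, with a squared-error cost and a constant penalty for introducing the change. It is not the dynamic-programming method of the changepoint package, nor CROPS; the function names and the toy series are hypothetical.

```python
def squared_error(segment):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def single_changepoint(series, penalty=0.0):
    """Return the split index k that best separates series[:k] and series[k:].

    Returns None when no split lowers the penalized cost below the cost of
    treating the whole series as a single segment.
    """
    best_k, best_cost = None, squared_error(series)
    for k in range(1, len(series)):
        cost = squared_error(series[:k]) + squared_error(series[k:]) + penalty
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


toy_series = [0.0] * 5 + [1.0] * 5  # abrupt change in mean after 5 points
print(single_changepoint(toy_series, penalty=0.5))
print(single_changepoint([0.3] * 8, penalty=0.5))
```

Raising `penalty` makes the search more conservative. Scanning a range of penalty values and recording how the segmentation changes is the kind of question that CROPS addresses efficiently for multiple changepoints.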

A synthetic example
Consider now a sequence of binary multiplex networks with $\tau = 30$ time steps, $\lambda = 5$ layers and $N = 10$ nodes, generated as follows.
Define the perturbation function $\Pi(\cdot, (\cdot, \cdot))$ taking as entries a binary simple [...] Then, each layer at a given time step is defined through the following rule: [...] In Figure 9 we show the evolution along the 30 timepoints of the 5 curves $\mathrm{D1}_i(t)$, their average $\overline{\mathrm{D1}}(t) = \frac{1}{5} \sum_{i=1}^{5} \mathrm{D1}_i(t)$, and D2, D3. To assess the information content of each curve we use the Increment Entropy indicator IncEnt, whose value increases with the series' complexity: the IncEnt values are reported in Table 1.
Among the evolving layers, layers 2 and 4 have the largest IncEnt, while the other three layers show a lower level of complexity. As expected, the average $\overline{\mathrm{D1}}$ and the collapsed network distance D2 have very low IncEnt values, indicating that both averaging the distances and collapsing the layers lose information about the overall

Network statistics
Consider in this section the set of 304,401 edges connecting the 202 nodes independently of their class. In Table 4 Table 5 we list the top-10 links ranked by occurrence, together with the number of occurrences and the corresponding percentage over the total number of edges for the period. As for the nodes, a few key links are consistently present throughout the whole timespan in most of the important events, with different proportions. However, in some of the events there is a wide gap in the number of occurrences between the very top edges and the remaining ones, e.g., Iraq-USA in FGW (and post) and IDC, and Iran-Iraq during the corresponding war and in the pre-FGW, indicating that these are the links mainly driving the whole network evolution.
In Figure 10 we [...] i.e., throughout the whole Iran-Iraq War, while they go decaying quickly afterwards, [...] From both the multidimensional scaling plots in Figure 16 [...]

Community structure of the layer network. We conclude by analyzing the dynamics of the mesostructure of the layer network as extracted by the Louvain community detection algorithm [44,45,46,47]. For any temporal step, the Louvain algorithm clusters the 66 nodes (WEIS categories) of the layer network into two or three communities [48], whose size over time is shown in Figure 17. In Figure 18 we show, for each date, which community each category (on the rows) belongs to; WEIS categories are ranked according to their community distribution, i.e., by decreasing number of presences in Comm. #1 and increasing number in Comm. #2. Thus the top rows contain the categories lying in Comm. #1 during all the 240 months (layers 7,10,11,28,34,40), while the bottom rows are reserved for the categories always belonging to Comm. #2 (3,4,19,25,48,52,58): their description in terms of WEIS categories is shown in Table 6, while the full community distribution is reported in Table 7 and graphically summarized by the triangleplot in Figure 19. Focusing on the categories that consistently lie in a given community throughout all 240 months, some of them are semantically similar: for instance, consult, assistance and action request in community #1, while two distinct groups emerge in community #2, namely admit wrongdoing, cede power, apologize and reward on one side, and warn of policies, sanction threats and halt negotiations on the other. However, the constant presence of the category charge/criticize/blame/disapprove in community #1 is noteworthy. Moreover, there is no strong polarization for Community #3. Many layers sharing the same (or a similar) WEIS second level category (Yield, Comment, Consult, etc.) are quite close in the community distribution ranked list, with a general escalating trend proceeding from help request (or other more neutral actions) to more severe situations, growing together with the community distribution rank.

Conclusion
We introduced here a novel approach for the longitudinal analysis of a time series of multiplex networks, defined by means of a metric transformation conveying the information carried by all layers into a single network for each timestamp, with the original layers as nodes. The transformation is induced by the Hamming-Ipsen-Mikhailov distance between graphs sharing the same nodes, and it preserves the key events encoded into each instance of the multiplex network time series, making it more efficient than the collapsing of all layers into one collecting all edges for