Quantifying layer similarity in multiplex networks: a systematic study

Computing layer similarities is an important way of characterizing multiplex networks because various static properties and dynamic processes depend on the relationships between layers. We provide a taxonomy and experimental evaluation of approaches to compare layers in multiplex networks. Our taxonomy includes, systematizes and extends existing approaches, and is complemented by a set of practical guidelines on how to apply them.


Introduction
Multiplex networks provide a simple yet expressive way to model a wide range of physical and social systems as sets of entities connected by multiple types of relationships, which in this paper we also call layers, following the terminology in [23]. For example, a transport network can be modelled as a set of locations, such as cities or streets, connected by different types of public transport like airplanes, trains, and buses. Several studies have investigated the connection between layer similarity and other properties of the network. For example, we know from previous research that the relationships between layers have an impact on dynamic processes such as behaviour and information diffusion [35].
Being able to measure relationships between layers is also essential to validate models aimed at explaining the formation of empirical multilayer networks [28,29]. While the problem of comparing different networks has been thoroughly investigated in the literature [1,4,9,17,19,31,33,38,42], the problem of quantifying layer similarity where the same nodes can be present in multiple layers, which characterizes multiplex networks, has not been studied in a systematic and comprehensive way so far.
In the literature, we can find a large number of works using layer similarity measures, but most use them as a tool to study other phenomena such as multiplex network generation [21,28,29], link prediction [7] and spreading processes [35]. As a result, different works use the same or very similar approaches presented with different names, the relationships between several of these similarity measures have not been explored, and there are no guidelines on how to quantify layer similarity in multiplex networks, e.g., how to choose the appropriate measure given a specific dataset. In addition, various potentially useful layer comparison measures have not been considered yet.
Therefore, in this paper we provide the following contributions: (i) a systematic study of approaches and measures to compute the similarity between layers in multiplex networks, based both on a literature study and on a theoretical framing of the problem; (ii) a set of measures that have not been used yet to compare layers, complementing those already defined in the literature; (iii) an empirical study of the relationships between different measures, compared on several real datasets, and (iv) a set of guidelines on how to choose and use these measures.
In Section 2 we present the definitions, concepts, and notation used in the paper. In Section 3 we present an organized set of existing and new layer similarity measures. Section 4 provides the results of an empirical study where the main similarity measures are applied to several real datasets from different domains, such as genetic networks, social networks, co-authorship networks, and transport networks. Section 5 discusses guidelines to be used to select the most appropriate measure.

Concepts, terminology and notation
In this section, we define the basic concepts needed to provide a systematic coverage of layer similarity measures. We start with the standard definition of multiplex network, followed by an alternative representation called a property matrix, which allows us to define similarity functions based on different types of network structures and different ways to look at them.
In this paper we use the following definition of multiplex network:
Definition 1 (Multiplex network). Given a set of nodes N and a set of layers L, a multiplex network is defined as a quadruple M = (N, L, V, E) where (V, E) is a graph, V ⊆ N × L, and if (n, l, n', l') ∈ E then l = l'.
An example of a multiplex network is shown in Figure 1, where L = {l_1, l_2}, N = {n_1, ..., n_6}, and (n_1, l_1, n_2, l_1) is an example of an edge in E. Alternative terminologies are used in the literature; here we adopt the one in [23], according to which we would say that node n_1 is present in both layer l_1 and layer l_2. Some extended multiplex models have also been proposed in the literature, allowing multi-dimensional layers [23] and one-to-many relationships between nodes in different layers [27], but we do not consider these extensions here.
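As a minimal illustration of this definition, the following Python sketch stores a multiplex network through its sets V and E. The class name `Multiplex`, its methods and the three-node toy data are hypothetical: they are not taken from the paper or from any library. Edges are assumed to connect nodes on the same layer, as in the example edge (n_1, l_1, n_2, l_1).

```python
from dataclasses import dataclass, field


@dataclass
class Multiplex:
    """Toy container for M = (N, L, V, E); all names are illustrative."""
    nodes: set
    layers: set
    V: set = field(default_factory=set)  # pairs (node, layer)
    E: set = field(default_factory=set)  # tuples (n1, layer, n2, layer)

    def add_edge(self, n1, n2, layer):
        # Validate first, so a rejected edge leaves V and E untouched.
        if layer not in self.layers or not {n1, n2} <= self.nodes:
            raise ValueError("unknown node or layer")
        # Both endpoints become present on the layer of the edge.
        self.V.update({(n1, layer), (n2, layer)})
        self.E.add((n1, layer, n2, layer))

    def is_node_aligned(self):
        # Node-aligned: every node is present on every layer.
        return all((n, l) in self.V for n in self.nodes for l in self.layers)


m = Multiplex(nodes={"n1", "n2", "n3"}, layers={"l1", "l2"})
m.add_edge("n1", "n2", "l1")
m.add_edge("n2", "n3", "l2")
print(m.is_node_aligned())  # False: n3 is absent from l1 and n1 from l2
```

Keeping V explicit, rather than deriving it from a fixed node set, is what lets the sketch distinguish a node that is absent from a layer from a node that is present without edges.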
Please note that the original definition of multiplex network introduced in the field of Social Network Analysis was more restrictive than the one adopted in this paper. In particular, our definition allows some of the nodes not to be present in some layers. For example, (n_5, l_2) ∉ V in Figure 1. In some cases, when the term multiplex is used it is assumed that all nodes are present in all layers, and this assumption will often affect the result of layer comparisons. To avoid confusion, in this case we explicitly talk about a node-aligned multiplex network [23] (Definition 2). Notice that treating our working example as a […]
Multiplex networks have usually been represented as a set of adjacency matrices A_l, one for each layer l, where a_l(n_1, n_2) = 1 if there is an edge between node n_1 and node n_2 in layer l, and a_l(n_1, n_2) = 0 otherwise. The adjacency matrices for our working example are shown in Figure 2.
However, this representation is not the most appropriate to define similarity measures, for two main reasons. First, it is incomplete, because it only allows representing node-aligned multiplex networks. An example of why this is important is the case of online social media, where each layer represents a different service (Twitter, Facebook, etc.) and it makes a difference whether a user has no connections on Twitter or does not even have an account there. In our working example, we would lose the information that nodes n_5 and n_6 are present in different layers.
Second, adjacency matrices present an edge-oriented view over the multiplex network, which might be the reason why most similarity measures in the literature have been limited to edge similarity. If we take a broader look at empirical networks, we can see how other structures can be relevant. As an example, if we look at Figure 1 we can see that the triangle {n_2, n_3, n_4} is present in both layers. Unfortunately, this is not obvious from the adjacency matrices and would require checking several disparate entries, making definitions more complicated than needed. Therefore, in the following, we use a network representation targeted to the specific properties we want to consider when checking the similarity between layers. We call this representation a property matrix.

Definition 3 (Property matrix).
A property matrix P is a matrix where: (i) the columns correspond to a set S of network structures (nodes, edges, triangles, ...), (ii) the rows correspond to a set C of contexts where these structures are observed (layers, groups, snapshots, ...), and (iii) p_{s,c} is the value of an observational function mapping each structure/context pair into a number (degree, distance, ...).
Since in this paper we focus on layer similarity we will only use layers as contexts, that is, C = L. In summary, each cell p_{s,c} of a property matrix contains the value of the function describing the structure s (for example, a node) on layer c, and different observational functions can be used to define different types of similarity. Examples of property matrices for our working example are shown in Figure 3.
Given a structure s, we can further summarize its presence in the network by summing over all the values in p_s, computing their standard deviation or performing any other kind of aggregation (sum, avg, median, min, max, etc.). As an example, from a node-degree property matrix (Figure 3b) we can obtain the total degree of a node in the whole multiplex network (sum) or its so-called degree deviation [6], which is 0 if a node has the same number of connections on all layers and higher when a node is present in different layers with different degrees, and so on. In summary, property matrices provide a more general and informative representation of multiplex networks than adjacency matrices, which are still useful when the objective is just to know about the edges in a node-aligned network. Property matrices also allow us to provide simple and general mathematical definitions of different ways to compare layers, which will instantiate into several existing and new measures when specific property matrices are used.
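The construction of property matrices can be sketched in a few lines of Python. The two toy layers below are hypothetical (they are not the working example of Figure 1), the function names are ours, and the degree deviation is taken here to be the population standard deviation of a node's degrees across layers, which is an assumption about the definition in [6].

```python
from statistics import pstdev

# Hypothetical toy layers: {node: set of neighbours}. A node that appears as
# a key is present on the layer; a node that is not a key is absent from it.
layers = {
    "l1": {"n1": {"n2"}, "n2": {"n1", "n3"}, "n3": {"n2"}},
    "l2": {"n2": {"n3"}, "n3": {"n2"}, "n4": set()},
}
nodes = sorted({n for adj in layers.values() for n in adj})


def node_existence(layers, nodes):
    # One row per layer (context), one column per node (structure).
    return {l: [1 if n in adj else 0 for n in nodes] for l, adj in layers.items()}


def node_degree(layers, nodes):
    # None marks a node absent from the layer, which differs from degree 0.
    return {l: [len(adj[n]) if n in adj else None for n in nodes]
            for l, adj in layers.items()}


def column(P, nodes, n):
    # Property vector p_s of a single structure across all layers.
    j = nodes.index(n)
    return [row[j] for row in P.values()]


P_exist = node_existence(layers, nodes)
P_degree = node_degree(layers, nodes)

deg_n2 = column(P_degree, nodes, "n2")  # degrees of n2 on l1 and l2
total_degree = sum(deg_n2)              # aggregation by sum
degree_deviation = pstdev(deg_n2)       # aggregation by standard deviation
```

In this sketch node n4 is present on l2 with degree 0 while n1 is absent from l2, a distinction that a pair of adjacency matrices over a common node set cannot express.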

Layer similarity functions
Given a property matrix P where each row represents a layer, we can compare two layers in three main ways. The first is to summarize each row using an aggregation function f and compare f(p_{l_1}) to f(p_{l_2}). For example, if the property matrix contains node degrees we can compare the layers' average degrees mean(p_{l_1}) and mean(p_{l_2}). The second way is to compare the distribution of values in p_{l_1} and p_{l_2}. As an example, we can compare degree distributions on different layers and find that both are well fitted by a power-law distribution with the same exponent. The third way is to compare p_{s,l_1} with p_{s,l_2} for all s. As an example, we can compute degree correlation to check whether nodes with a high (resp., low) degree on one layer tend to have a high (resp., low) degree also on the other layer.
Figure 3: Property matrices for our working example in Figure 1. Each property matrix is defined by a type of structures (nodes, dyads, triads, etc.), the contexts (layers) and an observational function (existence, degree, forming a clique, distance, etc.). Panel (a): nodes, existence.

(a) Comparing aggregations of layer property vectors
This first class of comparison methods is based on comparing f(p_{l_1}) to f(p_{l_2}) using various functions f that aggregate each layer into a single value. Typical choices are basic statistical summary functions such as mean, max, sum, skewness and kurtosis, combinations of these simple statistics, such as the coefficient of variation (the ratio between the standard deviation and the mean) or the Jarque-Bera statistic (a combination of skewness and kurtosis), and the Shannon entropy [37] of the distribution. These methods are summarized in Table 2.
Then, given f(p_{l_1}) and f(p_{l_2}) we can compare them; in our experiments we have used their relative difference, i.e.
2 |f(p_{l_1}) − f(p_{l_2})| / (|f(p_{l_1})| + |f(p_{l_2})|).
Notice that depending on the property matrix these measures correspond to various existing network summaries. For example, the mean function may return the average degree (when applied to property matrices about node degrees), the global clustering coefficient, also known as transitivity index (for node clustering coefficients), or the average path length (for property matrices about dyads and geodesic distances), which in the field of chemistry coincides, up to normalization, with the Wiener index [45].
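A short Python sketch of this first class of comparisons follows. It takes the relative difference to be 2|x − y| / (|x| + |y|), a common convention that we assume here; the function name, the degree vectors and the choice of returning 0 when both summaries are 0 are illustrative.

```python
from statistics import mean


def relative_difference(x, y):
    # 2|x - y| / (|x| + |y|); defined as 0 when both summaries are 0.
    denom = abs(x) + abs(y)
    return 0.0 if denom == 0 else 2 * abs(x - y) / denom


# Hypothetical node-degree property vectors for two layers.
deg_l1 = [1, 2, 1]
deg_l2 = [1, 1, 0]

# Aggregate each layer with f = mean, then compare the two summaries.
diff_mean_degree = relative_difference(mean(deg_l1), mean(deg_l2))

# Any other aggregation function can be plugged in the same way.
diff_max_degree = relative_difference(max(deg_l1), max(deg_l2))
```

The value ranges from 0 (identical summaries) to 2 (one summary is 0 or the two have opposite signs), so it is a dissimilarity rather than a similarity.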

(b) Comparing distributions of layer property vectors
While using a single value to compare layers can provide some useful knowledge about the multiplex network, for example by highlighting the presence of layers that are denser or more clustered than others, looking at the whole distribution of values in the property matrix can reveal other types of relationships among layers. From a statistical point of view, there are several ways to pursue this task. The first consists in comparing the moments of the two distributions. For example, it is possible to compare the first four moments, even if from a theoretical point of view this is not completely sufficient. Another possible approach consists in comparing the distributions directly. In this case, we have to apply to each property vector a function fr(p_l) that derives the relative frequency distribution. In the case of discrete distributions, such as the degree distribution, fr_{k,l} is the relative frequency of the k-th value of the property vector p_l in a generic layer l: given a property vector p_l we derive the distinct values p_{k,l}, k = 1, ..., K, and we associate to each value the relative frequency fr_{k,l}.
In the case of continuous distributions, or of very large networks in which discrete distributions also take a wide range of values, the function fr(p_l) derives histograms. We first divide the range of values of the property vector into K equal intervals, or bins, [b_{k−1}, b_k], with b_0 being the minimum value in the property matrix and b_K being the maximum value in the property matrix. Then we associate the relative frequency fr_k to each interval. Note that the bins of all histograms for all layers must be the same, so that afterwards we only have to compare the relative frequency distributions. This procedure is fast and efficient also for very large networks.
Given the frequencies or histograms, in order to compare two layers we can use a distance between the observed distributions, namely the dissimilarity index (ID), the Kullback-Leibler divergence D_KL [26], the Jensen-Shannon divergence D_JS or the Jeffrey divergence D_J, as defined in Table 3 [12]. In the following, we do not consider the Jeffrey divergence, as the Jensen-Shannon divergence is its smoother version. Note that this kind of comparison can be made both for node-aligned and for non-node-aligned multiplexes.
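The histogram-based comparison can be sketched as follows. The formulas are the textbook definitions of the dissimilarity index and of the Kullback-Leibler and Jensen-Shannon divergences (with base-2 logarithms), which we assume match those of Table 3; the function names, the number of bins and the degree vectors are illustrative.

```python
import math


def histogram(values, lo, hi, k):
    # Relative frequencies over k equal-width bins on the shared range [lo, hi].
    counts = [0] * k
    width = (hi - lo) / k
    for v in values:
        idx = min(int((v - lo) / width), k - 1) if width > 0 else 0
        counts[idx] += 1
    return [c / len(values) for c in counts]


def dissimilarity_index(p, q):
    # Half the sum of absolute differences between relative frequencies.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl(p, q):
    # Kullback-Leibler divergence; infinite if q is 0 where p is not.
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log2(a / b)
    return total


def js(p, q):
    # Jensen-Shannon divergence: KL of each distribution from their mixture.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical degree property vectors of two layers.
deg_l1 = [1, 1, 2, 2, 3, 6]
deg_l2 = [1, 2, 2, 3, 3, 3]

# The bin boundaries come from the whole property matrix, not from each layer.
lo, hi = min(deg_l1 + deg_l2), max(deg_l1 + deg_l2)
h1 = histogram(deg_l1, lo, hi, 5)
h2 = histogram(deg_l2, lo, hi, 5)

print(dissimilarity_index(h1, h2), kl(h1, h2), js(h1, h2))
```

In this toy case the Kullback-Leibler divergence is infinite because the last bin is empty on the second layer, while the Jensen-Shannon divergence stays finite and symmetric, which is one practical reason to prefer it for sparse histograms.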

(c) Comparing individual structures
The main feature of multiplex networks is that the same structure can be present or not, and have different characteristics, on each layer. For example, a node can be present in one layer and not in the other, or the same node may have different degrees depending on the layer. Therefore, a characteristic set of measures to compare layers relies on the comparison of the structures of interest, one by one. Two main cases are possible. In property matrices indicating the existence of different structures on the different layers, we only have two values, 0 and 1. While represented as numbers, these are in fact just nominal values indicating whether the structure is present on the layer. For these binary matrices specific methods can be used, checking the overlapping or, more in general, the common existence (or common absence) of structures across layers. For numerical matrices containing generic numbers, e.g., node degrees, other methods are more appropriate, as described in the following two sections.

(i) Binary properties
When a structure can be present or not on different layers, a basic way to compute the similarity between layers is to quantify the overlapping of these structures, that is, how often the same structure appears or not on more than one layer. This is typically the case when the observation function defining the property matrix checks the existence of the structure.
Measures of overlapping have been defined and redefined many times during the last few years in different papers, but most definitions can be generalized using property matrices as:
(p_{l_1} · p_{l_2}) / C    (3.1)
where C is some normalization function. Most (but not all) measures in the literature compare edges across layers, this being the result of the traditional edge-based definitions of multiplex networks such as adjacency matrices. In our definition, the usage of property matrices allows us to apply similar comparisons to various other properties. Consider two binary property vectors p_{l_1} and p_{l_2}. Following [2], let us denote with a = p_{l_1} · p_{l_2} the number of properties that l_1 and l_2 share, with b the number of properties present in l_1 but not in l_2, with c the number of properties present in l_2 but not in l_1, with d the number of properties absent from both layers, and with m = a + b + c + d the length of the property vectors. Then, the binary similarity functions are summarized in Table 4.
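To make the binary case concrete, the sketch below computes four well-known binary similarity functions from two existence vectors. It uses the standard contingency notation, which we assume here: a counts structures present on both layers, b and c those present on only one of the two, d those absent from both, and m = a + b + c + d. The formulas are the textbook ones and may differ in detail from Table 4; the function names and the data are illustrative.

```python
def contingency(p1, p2):
    # Counts of (1,1), (1,0), (0,1) and (0,0) positions in two binary vectors.
    pairs = list(zip(p1, p2))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d


def jaccard(p1, p2):
    a, b, c, _ = contingency(p1, p2)
    # Convention assumed here: two empty layers have similarity 0.
    return a / (a + b + c) if a + b + c else 0.0


def simple_matching(p1, p2):
    a, b, c, d = contingency(p1, p2)
    return (a + d) / (a + b + c + d)


def russel_rao(p1, p2):
    a, b, c, d = contingency(p1, p2)
    return a / (a + b + c + d)


def hamann(p1, p2):
    a, b, c, d = contingency(p1, p2)
    return ((a + d) - (b + c)) / (a + b + c + d)


# Hypothetical edge-existence property vectors over six dyads.
edges_l1 = [1, 1, 1, 0, 0, 0]
edges_l2 = [1, 1, 0, 1, 0, 0]

print(contingency(edges_l1, edges_l2))  # (2, 1, 1, 2)
print(jaccard(edges_l1, edges_l2))      # 0.5
```

The same functions apply unchanged to node-existence or triangle-existence vectors, which is the point of phrasing the measures in terms of property vectors rather than adjacency matrices.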

(ii) Numerical properties
Depending on the reason why we are computing the similarity between layers, we can use different approaches. As each layer is represented as a vector in a property matrix, one way is to compute vectorial distances such as Euclidean distance or cosine similarity. Another popular way to compare numerical layer property vectors is to compute correlations. An example of this is the so-called inter-layer correlation measure, which is just the Pearson coefficient computed on two node degree property vectors [5,30]. It is interesting to notice that in the literature correlations across layers have almost always been computed on node degrees, and in [3] also on clustering coefficients. However, correlations can in fact be computed on any property matrix.
When computing correlations in generalized multiplex networks a choice must be made on how to handle actors not present in all layers. The choice we adopted in our experiments was to discard pairs where at least one of the two values was missing, which is a typical option in statistical software packages.
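The pairwise-deletion policy described above can be sketched as follows. The function name is ours, missing values are encoded as `None` (an assumption about the data representation), and the sketch returns `None` when the correlation is undefined.

```python
import math


def pearson_pairwise(p1, p2):
    # Discard every pair in which at least one of the two values is missing.
    pairs = [(x, y) for x, y in zip(p1, p2) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


# Hypothetical degree vectors; the fourth actor is missing from layer 1 and
# the fifth from layer 2, so only the first three pairs are used.
deg_l1 = [1, 2, 3, None, 4]
deg_l2 = [2, 4, 6, 5, None]

r = pearson_pairwise(deg_l1, deg_l2)
```

Because only the actors present on both layers contribute, the result describes the shared part of the two layers and should be read together with a node-overlapping measure.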

Empirical comparison of measures
The experiments have been performed using the multinet library and twenty-three multilayer networks (Table 6).
When we indicate "node-aligned", all the nodes have been replicated into all layers. Otherwise, a node is added to a layer only if the node has at least one edge on that layer, in which case we talk of generalized multiplex networks. The input format of the multinet library allows the distinction between nodes without connections and missing nodes, as in our working example, but none of the datasets we have used explicitly makes this distinction.
In the experiments, we have computed the similarity between all pairs of layers in each dataset and grouped these results by network type (Table 6). Figures 4, 5 and 6 show the properties of the distribution of values produced by each measure. Figures 7, 8, 9 and 10 show the Pearson correlation between values obtained by different measures, where a value of 1 (yellow in the colour figures) indicates that two measures are equivalent (up to some constant rescaling). In addition to the results presented in these figures, we have also performed a manual qualitative analysis of the results, to verify our interpretation of the patterns emerging in the plots.
In the following sections, we highlight some of the results, grouped into three main areas.

(a) Overlapping-based measures
Overlapping-based measures have been used multiple times in the literature, mainly applied to edges. In Figure 6 we can observe their behaviours on the various datasets used in our experiments.
Measures based on Simple Matching (SMC), Russel-Rao and Hamann degenerate whenever the property vectors become large (that is, m is large) and sparse (that is, d is close to m). In these cases, Russel-Rao tends to 0 while Hamann and SMC tend to 1, as we can see in the plots. However, with node-existence property matrices these degeneration conditions are often not met, so these measures can still capture different levels of similarity. When applied to generalized multiplex networks, node overlapping shows significant differences between different types of networks. For example, in Figure 6b we can see that social networks tend to have a high node overlapping (average close to one for measures 34-36), while co-authorship networks show values closer to 0, indicating a significant difference between people working in different disciplines (Figure 6c). In practice, we can say that many social networks are naturally node-aligned.
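The degeneration under sparsity follows directly from the textbook forms of the three functions, which we assume here to agree with Table 4. With a the number of structures present on both layers, b and c those present on only one of them, d those absent from both, and m = a + b + c + d:

```latex
\[
\mathrm{RR} = \frac{a}{m}, \qquad
\mathrm{SMC} = \frac{a+d}{m}, \qquad
\mathrm{Hamann} = \frac{(a+d)-(b+c)}{m}, \qquad m = a+b+c+d .
\]
\[
\frac{d}{m} \to 1
\;\Longrightarrow\;
\frac{a}{m},\ \frac{b}{m},\ \frac{c}{m} \to 0
\;\Longrightarrow\;
\mathrm{RR} \to 0, \quad \mathrm{SMC} \to 1, \quad \mathrm{Hamann} \to 1 .
\]
```

For instance, with the illustrative values m = 1000, a = 2, b = 3, c = 5 and d = 990, Russel-Rao is 0.002, SMC is 0.992 and Hamann is 0.984, whereas the Jaccard value a / (a + b + c) = 0.2 still reflects that the two layers share only a fifth of the structures present on at least one of them.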
However, in both cases we can see several outliers, highlighting special relationships between layers and thus showing the usefulness of these measures also to identify special cases. For example, in the arXiv co-authorship network (20 in Table 6) the two layers physics.data-an (Physics Data Analysis, Statistics and Probability) and cs.SI (Computer Science Social and Information Networks) are very similar in terms of node overlapping, indicating an interdisciplinary topic which is of interest to both computer scientists and physicists. Another example, this time for social networks, comes from the AUCS network (14 in Table 6). Almost all outliers are related to the two layers facebook and co-author, both having a significantly different number of actors compared with the other layers in the network, which explains, e.g., the low overlapping.
Higher order structures, that is, dyads and triads in our experiments, also show different behaviours in different types of networks. There are several similar layers in collaboration networks, maybe because these networks are often obtained as projections from bipartite networks, but still, the majority of the pairs of layers are not very similar. For social networks, a high overlapping is observed much more frequently, also because of the high presence of triangles, while transportation and genetic networks show the least overlapping.

(b) Correlation-based measures
Correlation measures (15, 16, 31, 32) prove their usefulness by discriminating between, e.g., social networks, where the degrees are correlated (that is, (un)popular people are often (un)popular on more than one layer), and co-authorship networks, where layers indicate different disciplines and researchers are often popular in only one or a few of them. Interestingly, transport networks contain different extremes: airports that are hubs for one airline are often not hubs for others (corresponding to anti-correlations, that is, values towards −1 in the figures), while for the London data the same locations are often hubs for different types of transportation, resulting in positive correlations.
In many cases, Pearson and Rank correlations show similar results.

(c) Effects of node alignment
The impact of using a node-aligned or generalized multiplex is evident in many experimental results, as expected. Obviously, node-based measures computing the overlapping among nodes in different layers (33-38) become useless if we force all layers to contain all nodes (Figure 6, right-hand-side plots). At the same time, using node-aligned networks also affects many other measures. As an example, Figure 4d shows the presence of anti-correlated layers (measures 15 and 16, left-hand side, values close to −1), revealing how airports that are hubs for one airline are often not hubs for others. Including the many nodes that would otherwise not be present in the layers, and that thus have degree 0, makes these anti-correlations less evident (measures 15 and 16, right-hand side, values now closer to 0).
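The weakening of anti-correlations under node alignment can be reproduced on a small synthetic example. The degrees below are invented for illustration and are not taken from the airline dataset; `None` encodes an airport that is not present on a layer, and the helper name `pearson` is ours.

```python
import math


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical degrees of nine airports for two airlines.
airline_a = [5, 4, 3, 9, 5, 1, None, None, None]
airline_b = [None, None, None, 1, 5, 9, 3, 4, 5]

# Generalized multiplex: keep only airports present on both layers.
pairs = [(x, y) for x, y in zip(airline_a, airline_b)
         if x is not None and y is not None]
r_generalized = pearson([x for x, _ in pairs], [y for _, y in pairs])

# Node-aligned multiplex: every airport is on every layer, with degree 0
# wherever it was originally absent.
aligned_a = [0 if x is None else x for x in airline_a]
aligned_b = [0 if y is None else y for y in airline_b]
r_aligned = pearson(aligned_a, aligned_b)

print(r_generalized, r_aligned)  # -1.0 -0.5
```

On the three shared airports the degrees are perfectly anti-correlated, while after alignment the added zeros pull the coefficient halfway towards 0.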
For edge- and triangle-based overlapping measures the results are the same in the node-aligned and in the non-aligned networks. This, however, is only because we have not made a difference between, e.g., a missing triangle and a missing triad, which would be computationally demanding. This also shows how the results we obtain may strongly depend on how we modelled the data and on implementation details such as the policy to handle null values.
Correlations between different measures appear more evidently in node-aligned networks. This effect is more evident for genetic networks and co-authorship networks. In these cases, the zeros added by the alignment reinforce the correlation among the measures.

Guidelines
From our literature study, theoretical framing and experiments it appears that layer comparison measures can be very valuable and often succeed in practice in characterizing the structure of multiplex networks, but they are not always straightforward to use. Therefore, in this section, we list a set of guidelines motivated by the experience we acquired while testing these measures and by the results presented in the previous section.
One important aspect to consider when choosing which function to use is the distribution of values in the property matrix. Among the criteria that can be used to characterize layer property vectors and comparison functions, the following appear to be useful:
• Sparsity - A layer property vector is sparse if the number of 0s is much higher than the number of non-0 values.
• Degeneracy - A layer property vector degenerates if its values are (almost) constant. Sparsity is a special case of degeneracy.
• Linearity - A layer property vector is linear if the values in the vector and their rank are linearly correlated.
• Scale invariance - A similarity function is scale invariant if it does not (significantly) change when one or more layer property vectors are multiplied by a constant.
[Figure caption fragment: measures 17-32. For each network type, the generalized multiplex network is on the left and the node-aligned multiplex network on the right. The outliers have been scattered.]
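Some of the criteria listed above can be checked mechanically. The sketch below is one possible reading: sparsity is measured as the share of zero entries, degeneracy as the spread of values falling within a tolerance, and scale invariance is probed by rescaling one vector. The function names, the tolerance and the data are illustrative assumptions.

```python
import math


def sparsity(p):
    # Share of zero entries in the property vector.
    return sum(1 for v in p if v == 0) / len(p)


def is_degenerate(p, tol=0.0):
    # (Almost) constant vector: all values lie within tol of each other.
    return max(p) - min(p) <= tol


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


# Hypothetical property vectors of two layers; p_scaled multiplies one of
# them by a constant, as in the definition of scale invariance.
p = [1, 2, 0, 4]
q = [2, 1, 0, 3]
p_scaled = [10 * v for v in p]

cosine_unchanged = math.isclose(cosine(p, q), cosine(p_scaled, q))
euclidean_unchanged = math.isclose(euclidean(p, q), euclidean(p_scaled, q))
print(cosine_unchanged, euclidean_unchanged)  # True False
```

Cosine similarity passes the rescaling check and Euclidean distance does not, so the latter is only meaningful when the two layers are measured on comparable scales.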
We now list our guidelines, divided into four main areas.

(a) Number of measures
The number of available measures is very large, considering that the fifty options used in our experiments are only some of the measures we can obtain using different combinations of property matrices and observation functions. While the choice of the measures to be used for a specific empirical network is of course influenced by what the analyst is interested in, e.g., degree-based similarity, betweenness-based similarity, or specific motifs that are motivated by the application context, our experiments show that different measures highlight different types of similarities.

(b) Node alignment
The choice of whether a node-aligned or generalized multiplex model should be used is often clear from the context. For example, we would typically not align nodes when layers represent different social network sites, to represent the fact that users may not have accounts on some sites, while we would typically align nodes in a multirelational network about people interacting in multiple ways, where not having edges on a layer does not imply that the person cannot interact in that specific way. Node alignment may lead to some degeneracy. As expected, node-existence measures are useless, but other cases are also affected, such as measures 11-16 (degree) and 27-32 (clustering coefficient).
Measures based on node existence indicate the effect of node alignment on the other measures. So, before using link-based measures (such as edge Jaccard), it is important to check, e.g., node Jaccard to understand whether comparing higher order structures is meaningful, or whether the results will just be a consequence of the amount of node overlapping across layers.
Rank correlation can suffer from node alignment because of false tie resolution, and also Pearson correlation results may become less evident, as shown by the experiments where positive and/or negative correlations are lost or decreased depending on the type of networks.
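One way to see why rank correlation is sensitive to node alignment is to look at the ranks themselves. Under our reading of the remark above, the zeros introduced by alignment all tie, and a rank-based method must then resolve these ties in some way; the sketch below uses average ranks, a common choice that we assume here. The function name and the data are illustrative.

```python
def average_ranks(values):
    # 1-based ranks; tied values all receive the mean of the positions they span.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


# Hypothetical degrees on one layer: two original nodes plus four nodes that
# were added by node alignment and therefore have degree 0.
aligned_degrees = [4, 2, 0, 0, 0, 0]
ranks = average_ranks(aligned_degrees)

# Share of nodes whose rank is determined by a tie rather than by the data.
tied_share = sum(1 for r in ranks if ranks.count(r) > 1) / len(ranks)
print(ranks)  # [6.0, 5.0, 2.5, 2.5, 2.5, 2.5]
```

Here two thirds of the ranks come from the tie among the added zeros, so a rank correlation between two such layers is driven largely by which nodes were padded rather than by the observed degrees.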

(c) Sparsity
SMC and Hamann are only useful for non-sparse, non-degenerate cases, which in our experiments correspond to node existence on generalized networks. Russel-Rao also suffers if property vectors are sparse. As an example, these measures do not work well for triangle-existence property matrices in general.

(d) Linearity
Having non-linear distributions of values in the property vectors, as is the case for degree property matrices, is not problematic when computing linear correlation. Linear correlation (Pearson) is however often preferable to rank correlation, which can be problematic in the case of generalized networks (because of null values) and also for node-aligned networks, because of the many nodes with the same values.

Conclusion
A summary of our guidelines is that there are many ways to compare layers, but (1) not all methods are always appropriate, and (2) some are often correlated, which means that if we only want a small number of layer similarities we can give priority to one for each group of related measures.
As we mentioned in the introduction, our framework captures several measures that have appeared in the literature: node activity overlapping [30], global overlapping of edges [8] and absolute binary multiplexity [18] are applications of the Russel-Rao function to node and edge existence property vectors, while average edge overlap [16] and [3] are respectively the Jaccard and coverage functions applied to edge existence. A general recommendation is to use the original names: all the measures used in this work and mentioned in this paragraph are applications of existing proximity measures, most of them well known to data analysts. Calling them by their name, such as edge Jaccard, makes it simpler to understand when it is reasonable to apply them if we already know the original measure.
Also, notice that our framework allows the definition of a large number of other functions not tested in this article, also considering directed/undirected networks, weights, and other mesostructures such as motifs. Other network summary functions that are not specific to multiplex networks can also be obtained as combinations of property matrices and observational functions. Examples are order (node existence + sum), size (edge existence + sum), density (edge existence + mean), average path length (dyad distance + mean), etc. We believe that splitting the problem of computing layer similarities into the two problems of (1) deciding what to observe and (2) deciding how to compare these observations using existing generic comparison functions gives the analyst the ability to easily generate custom layer comparisons that are appropriate for the problem at hand.
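The combinations listed above (property vector plus aggregation function) can be written down directly. The sketch below uses a hypothetical layer with four nodes connected as a path n1-n2-n3-n4; the variable names and the ordering of dyads are our own choices.

```python
from statistics import mean

# Hypothetical property vectors for a single layer with nodes n1..n4 forming
# the path n1-n2-n3-n4. Dyads are listed in the order
# (n1,n2), (n1,n3), (n1,n4), (n2,n3), (n2,n4), (n3,n4).
node_existence = [1, 1, 1, 1]
edge_existence = [1, 0, 0, 1, 0, 1]
dyad_distance = [1, 2, 3, 1, 2, 1]


def summarize(property_vector, aggregate):
    # Step (1) chooses what to observe, step (2) how to aggregate it.
    return aggregate(property_vector)


order = summarize(node_existence, sum)                # number of nodes
size = summarize(edge_existence, sum)                 # number of edges
density = summarize(edge_existence, mean)             # edges / possible dyads
average_path_length = summarize(dyad_distance, mean)  # mean geodesic distance

print(order, size, density)  # 4 3 0.5
```

Replacing `sum` or `mean` by any other aggregation, or the existence vectors by another observational function, yields further summaries without changing the rest of the code.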

Figure 1: An example of a multiplex network consisting of two layers, six nodes, and ten edges.

Table 1: Terminology and notation used in the paper

Symbol      Name
N           set of nodes {n_1, n_2, ..., n_|N|}
L           set of layers {l_1, l_2, ..., l_|L|}
P           property matrix
C           set of contexts (e.g., network layers, snapshots, groups)
S           set of structures (e.g., nodes, edges, dyads, triangles)
p_c         property vector for context c ∈ C
p_s         property vector for structure s ∈ S
p_{s,c}     property of s in c (e.g., degree of node s on layer c)
p_{C',S'}   P restricted to contexts in C' ⊆ C and structures in S' ⊆ S

Definition 2 (Node-aligned multiplex network). A node-aligned multiplex network is a multiplex network (N, L, V, E) where ∀n ∈ N, l ∈ L : (n, l) ∈ V.

Figure 7: Correlation between all fifty measures for genetic networks. On the left is the generalized multiplex network and on the right the node-aligned multiplex network.

Table 2: Summary of common aggregation functions for property matrices

Table 3: Main methods to compare distributions across layers

Table 4: Similarity functions for binary property matrices. Column C indicates the normalization function in Eq. 3.1. For the two functions also considering the non-existence of structures on both layers, we only provide the standard definition, not based on the product of property vectors.

Table 5: Similarity functions for numerical property matrices. The function ρ(·) provides the ranks of the values in the property vectors.

Table 6: Twenty-three multilayer networks used during experiments

Table 7: Fifty measures evaluated during experiments