Statistical physics of network structure and information dynamics

In the last two decades, network science has proven to be an invaluable tool for the analysis of empirical systems across a wide spectrum of disciplines, with applications to data structures admitting a representation in terms of complex networks. On the one hand, especially in the last decade, an increasing number of applications based on geometric deep learning have been developed to exploit, at the same time, the rich information content of a complex network and the learning power of deep architectures, highlighting the potential of techniques at the edge between applied math and computer science. On the other hand, studies at the edge of network science and quantum physics are gaining increasing attention, e.g., because of the potential applications to quantum networks for communications, such as the quantum Internet. In this work, we briefly review a novel framework grounded on statistical physics and techniques inspired by quantum statistical mechanics which have been successfully used for the analysis of a variety of complex systems. The advantage of this framework is that it allows one to define a set of information-theoretic tools which find widely used counterparts in machine learning and quantum information science, while providing a grounded physical interpretation in terms of a statistical field theory of information dynamics. We discuss the most salient theoretical features of this framework and selected applications to protein–protein interaction networks, neuronal systems, social and transportation networks, as well as potential novel applications for quantum network science and machine learning.


Information content and processing in complex networks
Physical systems, whether biological networks, quantum networks or large-scale infrastructures, can be characterized in terms of their information capacity and information processing [57], which can be mathematically described as the change over time from an input state to an output state. While classical information theory for the analysis of unstructured data was introduced almost a century ago and has been developed over the decades, including quantum generalizations, an information theory for structured data (figure 1) is still missing. At this point, it is worth distinguishing the two major challenges towards this aim: (i) quantifying the information content or the complexity of a network, a problem related to how information is stored within a system state and its changes, and (ii) understanding how networked units process information, individually and collectively, to achieve a desired function. While a deeper discussion about what information is in general [58] will be presented in section 7, here we briefly discuss the existing attempts in both directions.
On the one hand, the information content of networks can be estimated by considering a set of descriptors (e.g., degrees, paths, and so on) that can be used as structural constraints characterizing specific network ensembles obtained from a max-entropy approach [60][61][62]: the entropy, calculated as the Gibbs entropy (i.e., the logarithm of the number of graphs for a microcanonical ensemble) or the Shannon entropy (for a canonical ensemble), provides a measure of disorder or surprise about the ensemble itself. On the other hand, a widely used approach has been to take those descriptors, estimate their joint probability and calculate the Shannon entropy of the resulting distribution: the choice of different descriptors characterizes different methods [63][64][65][66] (see [67] for a review), which have been used to unravel the existence of constraints operating on the possible Universe of complex networks [59]. One drawback of the first approach is that it does not explicitly account for information dynamics and its interplay with the network structure; therefore, it cannot be used to gain insights about node-node communications, information processing and the system's function. Moreover, it does not explain the emergence of heterogeneity in complex networks, a problem recently solved by introducing a novel classical network ensemble and soft constraints [68]. One drawback of the second approach is that it can only rely on a finite number of descriptors extracted from the network, thus inevitably accounting for limited information. In addition, its estimates depend on the specific choice of those descriptors, which complicates the comparison of results across different studies and can lead to conflicting outcomes.
It is worth mentioning that, despite these drawbacks, the classical network information theory approach has been successfully applied to detect patterns in empirical networks, reconstruct network properties from partial information, and sample networks with given properties [69]. To overcome the problem of limited information, it has been proposed to use matrix functions that explicitly encode the whole connectivity information. An early attempt [70] was based on defining the network counterpart of a density matrix, an operator widely used in quantum statistical physics because it encodes the quantum state of a system and allows one to calculate the probabilities and expected values of quantum measurements. It is worth noting that while such approaches are inspired by the mathematical formulation of quantum mechanics, they are often proposed to analyse classical networks. This type of density matrix, assumed to be proportional to the combinatorial Laplacian of a network, and the corresponding von Neumann entropy have been used in a variety of studies [71][72][73][74][75]. However, while such a density matrix satisfies the required mathematical properties, it lacks a physical interpretation and the corresponding information entropy does not satisfy theoretical requirements such as sub-additivity.
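To make the Laplacian-based construction concrete, the following minimal Python sketch builds a density matrix proportional to the combinatorial Laplacian and evaluates its von Neumann entropy. It assumes NumPy, an undirected toy graph and entropies measured in bits; the function names are hypothetical and are not taken from the cited works.

```python
import numpy as np

def laplacian_density_matrix(A):
    """Density matrix proportional to the combinatorial Laplacian: rho = L / Tr(L)."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return L / np.trace(L)

def von_neumann_entropy(rho):
    """-Tr[rho log2 rho], computed from the eigenvalues with the convention 0 log 0 = 0."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]
    return float(-np.sum(eigvals * np.log2(eigvals)))

# Toy example (assumption): a path graph with three nodes, 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
rho = laplacian_density_matrix(A)
entropy_bits = von_neumann_entropy(rho)
print(entropy_bits)
```

For this path graph the Laplacian eigenvalues are 0, 1 and 3, so the spectrum of the density matrix is (0, 1/4, 3/4) and the entropy is that of a two-outcome distribution with those weights.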
Concerning information processing, it is widely accepted that units in real-world systems exchange information quite efficiently without necessarily needing global intelligence, i.e., information about the full network connectivity. Therefore, there are local mechanisms at work that, together with the peculiar structural characteristics, allow one to navigate the network and allow information to flow efficiently [4,8,76]. A standard way is to use specific classes of dynamics on top of the network and exploit them to capture specific aspects of the system's function and organization. One example is given by random walk dynamics, used to build maps of how information flows through the system: the information-theoretic content of these maps, for instance in terms of the description length of walk trajectories, can be optimized to identify the mesoscale organization of the network, i.e., the partition of the system's units into groups or modules which requires the minimum number of bits to be described. Therefore, this class of approaches exploits the presence of regularities to build a compressed description of the system [77,78]. Another approach, not necessarily limited to random walk dynamics, is to use a perturbative formalism to track the contribution of each node and path to the overall flow of information: in this case, it has been shown that the interplay between structure and dynamics leads to some universal behavior that can be exploited to capture the most fundamental mechanisms of information exchange among interconnected units [79].
To correctly understand communication and correlation in networks, therefore, it is crucial to recognize that not only shortest paths but also many more routes (not necessarily the optimal ones) contribute to information pathways. This fact is well captured, for instance, by network communicability [80][81][82] (see [83] for a thorough review), which has recently been used to determine the informational cost of navigating a network under different levels of external noise and, as an application, to determine the levels of noise at which a protein-protein interaction (PPI) network seems to work in normal conditions in a cell [84].
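As a brief illustration of how non-shortest routes enter such measures, the sketch below evaluates communicability in its standard exponential form, $G = e^{\beta A}$, where walks of every length contribute with factorially decreasing weight. It assumes NumPy and a symmetric adjacency matrix; the parameter `beta` (an inverse-temperature-like weight) and the function name are illustrative choices.

```python
import numpy as np

def communicability(A, beta=1.0):
    """Communicability matrix G = exp(beta * A) for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    lam, V = np.linalg.eigh(A)
    # Spectral form of the matrix exponential: V diag(exp(beta * lam)) V^T.
    return (V * np.exp(beta * lam)) @ V.T

# Two nodes joined by one edge: G[0, 1] = sinh(beta).
edge = np.array([[0, 1],
                 [1, 0]])
G_edge = communicability(edge)

# Path 0 - 1 - 2: nodes 0 and 2 are not adjacent, yet they communicate
# through walks of length 2, 4, ... passing via node 1.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
G_path = communicability(path)
print(G_edge[0, 1], G_path[0, 2])
```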
In the remainder of this work we will focus on a recently proposed approach which explicitly accounts for information dynamics and provides a solution to the sub-additivity problem, by using a quantum-like density matrix defined as a Gibbs state [85]. Although a physical interpretation of this density matrix in terms of the propagator of diffusion-like dynamics was provided from the start, more recently it has been shown that it corresponds to a special case of a more general statistical field theory of complex information dynamics [86]. This theory provides a potentially unifying solution to the two major challenges outlined at the beginning of this section, and is described in more detail in the next section.

Field theory of complex information dynamics
In a field theory, the physical quantity of interest can be measured at each point in space and time: in the case of networks, space is encoded by the system's units, mapped to a finite number of points in Euclidean space (see [32] for details). In the following, the physical quantity of interest is generally regarded as information (e.g., water, electrochemical signals, human flows, and so on), which is an inherently fluctuating object, making approaches based on classical field theory unsuitable to describe its state and its dynamics. Instead, it is more natural to describe the probability of field states in terms of ensembles which incorporate the uncertainty about the microstate of the system. Importantly, we will consider the state of a system as defined by the interplay between its structure and the information dynamics on top of it, at a given time.
We represent nodes by canonical vectors $|i\rangle$ ($i = 1, 2, \dots, N$), and their connections are encoded in an operator $\hat{W}$ playing the role of the adjacency matrix, i.e., the weight of the link from the $i$th to the $j$th node is given by $\langle j|\hat{W}|i\rangle = W_{ji}$. We introduce the information field $|\phi(\tau)\rangle$, where the amount of field on top of the $i$th node at time $\tau$ reads $\langle i|\phi(\tau)\rangle$. The evolution of the information field in its most general form (equation (1)) becomes, after linearization,
$$\partial_\tau |\phi(\tau)\rangle = -\hat{H}\,|\phi(\tau)\rangle, \qquad (2)$$
with $\hat{H}$ playing the role of a control operator. Equation (2) directly leads to $|\phi(\tau)\rangle = \hat{S}(\tau)|\phi(0)\rangle$, with the propagator $\hat{S}(\tau) = e^{-\tau\hat{H}}$. The operator $\hat{H}$ can describe a range of possible dynamics of the information field, such as continuous-time diffusion, random walks, consensus and synchronization on top of single-layer and multilayer networks.
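The action of the propagator on an initial field can be sketched numerically as follows. This is a minimal example, assuming NumPy, a symmetric control operator (here the combinatorial Laplacian of a small ring, chosen only for illustration) and hypothetical function names.

```python
import numpy as np

def combinatorial_laplacian(W):
    """L = D - W for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def propagator(H, tau):
    """S(tau) = exp(-tau * H), evaluated spectrally for a symmetric control operator H."""
    lam, V = np.linalg.eigh(H)
    return (V * np.exp(-tau * lam)) @ V.T

# Toy network (assumption): a ring of four nodes.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
H = combinatorial_laplacian(W)

phi0 = np.array([1.0, 0.0, 0.0, 0.0])   # one unit of field seeded on node 0
phi = propagator(H, 0.5) @ phi0         # field on each node at tau = 0.5
print(phi, phi.sum())
```

For continuous diffusion on an undirected network the total amount of field is conserved, and at large $\tau$ the field relaxes to the uniform equilibrium state.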
Taking $\phi_0$ to be the initial value of the field on top of node $i$, the information flow from the node can be obtained as $|\phi(\tau)\rangle = \phi_0 \hat{S}(\tau)|i\rangle$, and the information received by the $j$th node from the $i$th node follows as $\phi_0 \langle j|\hat{S}(\tau)|i\rangle$. To provide a statistical description of the information dynamics, we assume a probabilistic initial condition $|\phi(0)\rangle = \sum_{i=1}^{N} p_i \phi_0 |i\rangle$, where $p_i$ is the probability that the $i$th node is the seed of the information field. Accordingly, the expected information flow from the $i$th node becomes $p_i \phi_0 \hat{S}(\tau)|i\rangle$ and the expected information received by the $j$th node from the $i$th node follows as $p_i \phi_0 \langle j|\hat{S}(\tau)|i\rangle$.
In the absence of information about the initiator node, we assign uniform probabilities $p_i = 1/N$, leading to the expected information flow from the $i$th node $\frac{\phi_0}{N}\hat{S}(\tau)|i\rangle$, and to the expected information received by the $j$th node from the $i$th one, given by
$$\frac{\phi_0}{N}\langle j|\hat{S}(\tau)|i\rangle.$$
Two important choices for $\hat{H}$ are the combinatorial Laplacian [85], indicated by $\hat{L}$ and used to model continuous diffusion, and the normalized Laplacian [86], denoted by $\hat{L}^{*}$, which can be used to model random walks, consensus dynamics and the dynamics of oscillators near a metastable state.
We eigen-decompose the propagator as $\hat{S}(\tau) = \sum_{\ell=1}^{N} s_\ell(\tau)\,\hat{\sigma}^{(\ell)}$, where $s_\ell(\tau)$ is the $\ell$th eigenvalue of the propagator and $\hat{\sigma}^{(\ell)}$ is the outer product of its $\ell$th right and left eigenvectors. Consequently, the expected information exchange between pairs of nodes can be written as the sum of $N$ different fluxes,
$$\frac{\phi_0}{N}\langle j|\hat{S}(\tau)|i\rangle = \sum_{\ell=1}^{N} \frac{\phi_0}{N}\, s_\ell(\tau)\,\langle j|\hat{\sigma}^{(\ell)}|i\rangle.$$
The fluxes are directed between the nodes by a set of operators $\{\hat{\sigma}^{(\ell)}\}$, acting as information streams (see figure 2). Each information stream is multiplied by a corresponding coefficient $\frac{\phi_0}{N}s_1(\tau), \frac{\phi_0}{N}s_2(\tau), \dots, \frac{\phi_0}{N}s_N(\tau)$, which can be interpreted as the stream's size. When the eigenvalues of the control operator $\hat{H}$ are non-negative, the size of each stream decays exponentially over time, except for the stream associated with the smallest eigenvalue, which is equal to zero and corresponds to the equilibrium state of the dynamical process. Depending on its size, each stream is considered inactive ($\frac{\phi_0}{N}s_\ell(\tau) = 0$) or active ($\frac{\phi_0}{N}s_\ell(\tau) > 0$). Regardless of the type of dynamics, self-loops existing in information streams are directly responsible for trapping the field. Interestingly, it has been demonstrated that the amount of field each information stream traps is equal to the size of that stream, $\frac{\phi_0}{N}s_\ell(\tau)$, and, consequently, the overall expected trapped field can be obtained from the sum of the stream sizes, $\frac{\phi_0}{N}\sum_{\ell=1}^{N} s_\ell(\tau) = \frac{\phi_0}{N}Z(\tau)$, with $Z(\tau) = \mathrm{Tr}\,\hat{S}(\tau)$. Since the expected trapped field regulates the size of information streams, it can be considered responsible for the activation of the streams. Therefore, the information dynamics is reducible to the dynamics of the trapped field. Consequently, using a proper superposition of information streams, $\hat{\rho}(\tau) = \sum_{\ell=1}^{N} \rho_\ell(\tau)\,\hat{\sigma}^{(\ell)} = \hat{S}(\tau)/Z(\tau)$, where the $\ell$th stream is weighted by its fractional share of the trapped field, $\rho_\ell(\tau) = s_\ell(\tau)/Z(\tau)$, we obtain another description of the expected information flow, reflecting the property
$$\frac{\phi_0}{N}\langle j|\hat{S}(\tau)|i\rangle = \frac{\phi_0 Z(\tau)}{N}\langle j|\hat{\rho}(\tau)|i\rangle.$$
Remarkably, the operator $\hat{\rho}(\tau)$ is a potential density matrix for the system.
Assume the field is discretized into a large number of infinitesimal quanta carrying value $h$ which, depending on the nature of the information field, can be bits of information, small packets of energy, infinitesimal volumes of fluid, etc. A number $n(\tau) = \frac{\phi_0}{Nh}Z(\tau)$ of quanta activates the information streams and, similarly, the number $n^{(\ell)}(\tau)$ of quanta participating in the activation of the $\ell$th information stream is given by $n^{(\ell)}(\tau) = \frac{\phi_0}{Nh}s_\ell(\tau)$. Thus, the probability that one quantum participates in the activation of the $\ell$th stream is given by $n^{(\ell)}(\tau)/n(\tau) = \rho_\ell(\tau)$, and each quantum of the trapped field generates a unit flow by activating one of the information streams according to their activation probabilities. The unit flow is the smallest element used to describe functional interactions in the system. The expected information flow is obtained by summing all the unit flows. The information dynamics can thus be fully captured using a number of unit flows, each activating one of the information streams $\{\hat{\sigma}^{(\ell)}\}$ (see figure 2). Because of their probabilistic nature, the streams shape a statistical ensemble encoding all possible fluxes among components and their probabilities. The operator $\hat{\rho}(\tau)$ is reminiscent of the density matrix used in quantum statistical physics, in terms of the superposition of quantum states, with the important distinction that here we do not necessarily deal with physical quantum objects.
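The construction of the density matrix from the information streams can be sketched as follows. The example assumes NumPy and a symmetric control operator, so that left and right eigenvectors coincide; the function and variable names are hypothetical.

```python
import numpy as np

def density_matrix(H, tau):
    """Return rho(tau) = S(tau) / Z(tau), the stream probabilities and Z(tau).

    Assumes a symmetric control operator H.
    """
    lam, V = np.linalg.eigh(H)
    s = np.exp(-tau * lam)        # eigenvalues of the propagator (stream sizes up to phi_0 / N)
    Z = float(s.sum())            # partition function, Z = Tr S(tau)
    rho_l = s / Z                 # activation probability of each stream
    rho = (V * rho_l) @ V.T       # superposition of streams weighted by rho_l
    return rho, rho_l, Z

# Toy network (assumption): path graph 0 - 1 - 2, combinatorial Laplacian as control operator.
H = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
rho, rho_l, Z = density_matrix(H, tau=1.0)
print(Z, rho_l)
```

The stream probabilities sum to one and the resulting operator has unit trace, as required for a density matrix.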
Consequently, $Z(\tau)$, which determines the amount of field trapped on top of the initiator units, can be interpreted as a partition function. When the propagator is derived for random walk dynamics ($\hat{H} = \hat{L}^{*}$), $Z(\tau) = N R(\tau)$, where $R(\tau)$ is the average return probability of random walkers. The average return probability is a proxy for the tendency of the structure to trap the flow in its initial place. Therefore, the partition function $Z(\tau)$ can be interpreted as dynamical trapping and used to assess the transport properties of networks. Of course, the redundancy and symmetries in diffusion pathways for information propagation can have a negative effect on signal transport: for example, a regular lattice provides slow information propagation, while long-range interactions between distant nodes and topological complexity can lower the dynamical trapping and enhance the transport in small-world or scale-free networks (see figure 3).
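The relation between the partition function and the average return probability can be explored with a short numerical sketch. It assumes NumPy, an undirected network without isolated nodes, and uses the symmetric normalized Laplacian, which shares its spectrum with the random-walk Laplacian; the two toy networks (a ring lattice and a fully connected network as an extreme case of long-range links) and all names are illustrative.

```python
import numpy as np

def dynamical_trapping(W, tau):
    """Return (R, Z) for random-walk dynamics: Z = Tr exp(-tau * L*), R = Z / N."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                       # node strengths, assumed > 0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian; same eigenvalues as I - W D^{-1}.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    lam = np.linalg.eigvalsh(L_sym)
    Z = float(np.exp(-tau * lam).sum())
    return Z / len(W), Z

def ring(n):
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    return W

n, tau = 12, 2.0
R_ring, Z_ring = dynamical_trapping(ring(n), tau)
R_full, Z_full = dynamical_trapping(np.ones((n, n)) - np.eye(n), tau)
print(Z_ring, Z_full)
```

In this example the ring lattice traps more field than the fully connected network at the same $\tau$, in line with the qualitative picture described above.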
Moreover, in a wide range of complex systems the structure is better represented in terms of multilayer networks [38], where the interactions between the units are described by $L > 1$ distinct networks, seen as layers, coupled together. The existence of multiple interaction types in such systems is important for a range of properties. Using the statistical physics of complex information dynamics, one can obtain a fundamental inequality that relates the dynamical trapping of the multilayer system to the geometric average of the partition functions of its layers:
$$Z(\tau) \leq \left[\prod_{\ell=1}^{L} Z^{(\ell)}(\tau)\right]^{1/L},$$
where $Z(\tau)$ is the partition function of the multilayer system and $Z^{(\ell)}(\tau)$ is the partition function of the $\ell$th layer. The inequality is important because it proves the advantage of multilayer systems over single layers in terms of information transport. It is worth mentioning that the concept of dynamical trapping has been generalized to other types of dynamical processes [87].
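A numerical check of this type of relation can be sketched under one explicit assumption, which is not taken from the text: the control operator of the multilayer system is modelled as the average of the combinatorial Laplacians of its layers. Under that assumption, the convexity of $\log \mathrm{Tr}\, e^{A}$ for symmetric matrices implies that the multilayer partition function does not exceed the geometric mean of the layer partition functions. The example assumes NumPy; layer topologies and names are illustrative.

```python
import numpy as np

def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def partition_function(H, tau):
    """Z(tau) = Tr exp(-tau * H) for a symmetric control operator H."""
    return float(np.exp(-tau * np.linalg.eigvalsh(H)).sum())

def ring(n):
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    return W

def star(n):
    W = np.zeros((n, n))
    W[0, 1:] = W[1:, 0] = 1.0
    return W

# Two layers on the same six nodes (toy example).
layers = [laplacian(ring(6)), laplacian(star(6))]
# ASSUMPTION: multilayer control operator = average of the layer Laplacians.
H_multi = sum(layers) / len(layers)

def trapping_comparison(tau):
    """Return (Z of the multilayer system, geometric mean of the layers' Z)."""
    z_multi = partition_function(H_multi, tau)
    z_layers = [partition_function(H, tau) for H in layers]
    z_geo = float(np.prod(z_layers) ** (1.0 / len(layers)))
    return z_multi, z_geo

for tau in (0.1, 1.0, 10.0):
    print(tau, trapping_comparison(tau))
```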

Network information-theoretical tools and distances
Information-theoretic measures have been successfully applied in a wide variety of disciplines, from biology to quantum physics. The aim of this section is to review such measures developed to study complex networks, density matrices obtained from physical processes on top of networks used to quantify their information content, as well as the (dis)similarity metrics exploited to compare structured data. The applications of the mathematical objects presented here to real-world complex networks will be discussed in the following sections.

Spectral entropies, divergences and likelihood
The probability distribution $\rho_\ell(\tau)$ is crucial for the information flow between the nodes. The mixedness of information streams can be quantified by the von Neumann entropy of the density matrix,
$$S(\tau) = -\mathrm{Tr}\left[\hat{\rho}(\tau)\log_2\hat{\rho}(\tau)\right] = -\sum_{\ell=1}^{N}\rho_\ell(\tau)\log_2\rho_\ell(\tau): \qquad (7)$$
as the information dynamics becomes rich and diverse, the number of information streams required to capture it grows and, consequently, the entropy takes higher values, even at large propagation times. When continuous diffusion governs the flow, the von Neumann entropy of the ensemble coincides with the spectral entropy introduced to quantify the information content of networks [85]. A more general definition of network entropy is the Rényi entropy,
$$S_q(\tau) = \frac{1}{1-q}\log_2\mathrm{Tr}\left[\hat{\rho}^{\,q}(\tau)\right],$$
which becomes the Hartley entropy in the case $q = 0$, recovers the von Neumann entropy (equation (7)) as $q \to 1$ and equals the collision entropy when $q = 2$ [88].
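The entropies discussed above depend only on the spectrum of the density matrix, which suggests the following minimal sketch. It assumes NumPy, a symmetric control operator and entropies in bits; the toy network (a complete graph on four nodes) and the function names are illustrative.

```python
import numpy as np

def gibbs_spectrum(H, tau):
    """Eigenvalues of rho(tau) = exp(-tau * H) / Z(tau) for a symmetric H."""
    s = np.exp(-tau * np.linalg.eigvalsh(H))
    return s / s.sum()

def von_neumann_entropy(p):
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def renyi_entropy(p, q):
    """Renyi entropy of order q; falls back to the von Neumann entropy at q = 1."""
    p = p[p > 1e-15]
    if abs(q - 1.0) < 1e-9:
        return von_neumann_entropy(p)
    return float(np.log2(np.sum(p ** q)) / (1.0 - q))

# Combinatorial Laplacian of the complete graph on four nodes (eigenvalues 0, 4, 4, 4).
H = 4.0 * np.eye(4) - np.ones((4, 4))
p = gibbs_spectrum(H, tau=0.5)

hartley = renyi_entropy(p, 0.0)      # log2 of the number of non-zero eigenvalues
shannon = von_neumann_entropy(p)
collision = renyi_entropy(p, 2.0)
print(hartley, shannon, collision)
```

Since the Rényi entropy is non-increasing in $q$, the three values are ordered as Hartley, von Neumann, collision.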
Remarkably, the von Neumann entropy can be used to quantify the (dis)similarity of networks from the perspective of information theory. Assume $\hat{\rho}(\tau)$ is the density matrix of a network $G$ and $\hat{\rho}'(\tau)$ is the density matrix of another network $G'$, at a specific temporal scale $\tau$. The Kullback-Leibler entropy divergence between the two is given by
$$D_{\mathrm{KL}}(\hat{\rho}(\tau)\,\|\,\hat{\rho}'(\tau)) = \mathrm{Tr}\left[\hat{\rho}(\tau)\left(\log_2\hat{\rho}(\tau) - \log_2\hat{\rho}'(\tau)\right)\right],$$
which quantifies the amount of information gained about $G$ by observing $G'$. For instance, this is useful when the density matrix of network $G$ is used as a model to predict $\hat{\rho}'(\tau)$ and the entropy divergence quantifies the prediction error. The Kullback-Leibler entropy divergence is non-negative ($D_{\mathrm{KL}}(\hat{\rho}(\tau)\,\|\,\hat{\rho}'(\tau)) \geq 0$) and not symmetric ($D_{\mathrm{KL}}(\hat{\rho}(\tau)\,\|\,\hat{\rho}'(\tau)) \neq D_{\mathrm{KL}}(\hat{\rho}'(\tau)\,\|\,\hat{\rho}(\tau))$). Interestingly, by minimizing the Kullback-Leibler entropy divergence between real-world networks and parametric network models, one can infer the optimal parameters describing the observed networks through maximum-likelihood estimation (see figure 4), noting that maximizing the likelihood is equivalent to minimizing the Kullback-Leibler divergence [89] (for a proof in the case of network density matrices see reference [85]). The likelihood is given by $\mathcal{L}(\hat{\rho}(\tau)\,\|\,\hat{\rho}'(\Phi, \tau))$, where $\Phi$ indicates the parameter(s) to be optimized, $\hat{\rho}'(\Phi, \tau)$ is the density matrix of the network model given the parameters and $\hat{\rho}(\tau)$ is the density matrix of an empirical network. Often, the suitable model for real data is obtained as a trade-off between maximizing a function, which in this case is the likelihood, and minimizing the number of parameters used to construct the model. The Akaike information criterion, given by $2g - 2\log_2\mathcal{L}$, where $g$ is the number of parameters and $\mathcal{L}$ is the likelihood function, can be used for such an optimization [85]. The Jensen-Shannon divergence is given by
$$D_{\mathrm{JS}}(\hat{\rho}\,\|\,\hat{\rho}') = S\!\left(\frac{\hat{\rho} + \hat{\rho}'}{2}\right) - \frac{1}{2}\left[S(\hat{\rho}) + S(\hat{\rho}')\right], \qquad (10)$$
whose square root provides a symmetric metric for quantifying network dissimilarity, at variance with the Kullback-Leibler divergence.
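The two divergences can be evaluated numerically along the following lines. This is a minimal sketch assuming NumPy, undirected networks on the same node set, Gibbs-state density matrices built from the combinatorial Laplacian, and logarithms in base 2; the toy networks and function names are illustrative.

```python
import numpy as np

def density_matrix(W, tau):
    """rho(tau) = exp(-tau * L) / Z(tau), with L the combinatorial Laplacian of W."""
    W = np.asarray(W, dtype=float)
    lam, V = np.linalg.eigh(np.diag(W.sum(axis=1)) - W)
    s = np.exp(-tau * lam)
    return (V * (s / s.sum())) @ V.T

def matrix_log2(rho):
    """Base-2 matrix logarithm of a symmetric positive-definite matrix."""
    lam, V = np.linalg.eigh(rho)
    return (V * np.log2(lam)) @ V.T

def entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-15]
    return float(-np.sum(lam * np.log2(lam)))

def kl_divergence(rho, sigma):
    """Tr[rho (log2 rho - log2 sigma)]; both arguments must be full rank."""
    return float(np.trace(rho @ (matrix_log2(rho) - matrix_log2(sigma))))

def js_divergence(rho, sigma):
    mix = 0.5 * (rho + sigma)
    return entropy(mix) - 0.5 * (entropy(rho) + entropy(sigma))

# Toy networks on four nodes: a path and a star.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
rho_a, rho_b = density_matrix(path, 1.0), density_matrix(star, 1.0)

kl_ab, kl_ba = kl_divergence(rho_a, rho_b), kl_divergence(rho_b, rho_a)
js_ab = js_divergence(rho_a, rho_b)
js_distance = np.sqrt(js_ab)
print(kl_ab, kl_ba, js_ab, js_distance)
```

Gibbs states are full rank at finite $\tau$, so the matrix logarithms are well defined; for rank-deficient density matrices the Kullback-Leibler divergence would need separate handling.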
In the analysis of multilayer networks, the Jensen-Shannon distance has been used to quantify the pairwise dissimilarity of layers (see figure 4) unravelling their hierarchical organization [85].

Information-theoretic distances and dimensionality reduction
As mentioned previously, in many scenarios, the underlying structure of a complex system can be better represented as a multilayer network, where each layer is a network describing a particular type of interaction between the nodes. In these cases, a quantity of interest is the average Kullback-Leibler entropy divergence between the multilayer system and its layers, which is named intertwining and measures the difference between the whole and its parts (layers) from the perspective of information theory [90]. Remarkably, intertwining is a proxy for the overall diversity of the layers, taking low values when the layers are highly similar (redundant) and high values when they are diverse. Thus, if there is a subset of layers highly similar to each other, their redundancy can be identified and quantified. In a multilayer network with $L$ layers, quantifying the overall diversity of layers using intertwining, and identifying groups of similar layers using the Jensen-Shannon divergence [91], allows one to devise dimensionality reduction algorithms that merge the redundant layers and return a multilayer network with $L' < L$ maximally diverse layers (see figure 8). In the following sections, we discuss an application of dimensionality reduction to empirical systems.
Figure 4. [...] for different average degree $K$ and rewiring probability $P_{\mathrm{rew}}$ is compared against a Watts-Strogatz network with $K = 6$ and $P_{\mathrm{rew}} = 0.2$, assumed to be the data to fit. Reproduced from [85]. CC BY 4.0.

Applications to empirical interconnected and coupled systems
Being grounded in information theory and statistical physics, the presented framework has broad applications to empirical systems across scales, from within cells to complex societies. In the following, we briefly review a few applications, from the clustering analysis of human-viral interactomes and the human microbiome to centrality and robustness analysis, dimensionality reduction and transport phenomena in social and transportation systems.

Multiscale analysis of information dynamics
The dependence of the density matrix $\hat{\rho}(\tau)$ on the propagation time $\tau$ of the dynamics allows one to use $\tau$ as a resolution parameter. At the lowest values ($\tau \approx 0$) the microscale is explored, at the largest values ($\tau \gg N$, where $N$ is the system size) the macroscale is analyzed, whereas at intermediate values the mesoscale is probed. Alternatively, since the temporal scale can differ from network to network, depending on the connectivity, number of nodes, topology, etc, one can use the diffusion time to allow for comparison across networks. Let the eigenvalues of the control operator $\hat{H}$ be $\lambda_\ell$ ($\ell = 1, 2, \dots, N$), ordered as $\lambda_\ell \leq \lambda_{\ell+1}$ and with $\lambda_1 = 0$. Naturally, the eigenvalues of the propagator $\hat{S}(\tau)$ follow as $e^{-\tau\lambda_\ell}$ ($\ell = 1, 2, \dots, N$). The second eigenvalue of the control operator determines the diffusion time $\tau_d = 1/\lambda_2$, i.e., the temporal scale close to equilibrium. One can divide the propagation time scale $\tau$ by the diffusion time to obtain the rescaled temporal parameter $\tau/\tau_d$, allowing for comparisons across network types and sizes [92]. Note that the control operator can take the shape of operators other than the combinatorial Laplacian and, in those cases, one can use the definition of diffusion time as long as the control operator has non-negative eigenvalues and at least one eigenvalue that is zero. To date, two distinct classes of applications have been introduced for the analysis of complex information dynamics across scales:
• Density state (or Gibbs state): the network state is given by $\hat{\rho}(\tau)$ and descriptors are obtained by calculating the corresponding von Neumann and relative entropies for varying $\tau$;
• Emergent functional state: the network state evolves, allowing one to analyze each state at time $\tau$ as the temporal snapshot of a time-varying process. Accordingly, at each time $\tau$ a functional network emerges that encodes the pairwise flow between the nodes, and can be analyzed by means of classical network descriptors (see figure 5).
Analysis of density states.
Often, components of complex systems might appear structurally similar while they take distinct functional roles. For instance, cells active in the visual cortex and auditory cortex of the human brain are neurons, yet they are involved in different computational tasks. The reason behind such a diverse functionality of similar agents might be their position with respect to the structure, as it partly determines how these agents handle information, as senders, receivers and processors. Thus, one way to quantify the functional diversity of a system is to identify the functional modules, i.e., groups of nodes that exchange information mostly among themselves, which will be discussed in the following. Alternatively, one can quantify the diversity of the flow distribution vectors originating from the nodes, in terms of the corresponding average cosine distance. It has been shown that such measures of functional diversity are proportional to the von Neumann entropy of the system, in synthetic and empirical networks [86,92,94].
Figure 5. (A) [...] [92]. (B) Von Neumann entropy as a function of β (here β indicates the temporal scale τ) for a highly-ordered network, colored in orange, and its configuration model (CM), colored in blue. At small temporal scales, the von Neumann entropy reaches its maximum and, in the limit of large propagation time, it is related to the number of connected components C. At the mesoscale, the entropy is able to capture the mesoscopic organization and the height of the plateau is related to the overall modularity of the network. Reproduced from [93]. CC BY 4.0.
Using this framework, one can analyse the virus-human PPI network as an interdependent system with two parts, namely the human PPI network and the viral proteins targeting it. One can then quantify the effect of each viral infection on the information dynamics, the dynamical trapping and the functional diversity of units in the human interactome, and map the perturbations caused by each virus. Comparing the effects of distinct viruses in this way has provided insights into the systemic effects of SARS-CoV-2 [95].
The framework has also been used to study the effect of the topological complexity of connectomes on information dynamics within the human brain. More specifically, real connectomes from healthy subjects have been compared against randomized network models characterized by distinct structural features, such as the Erdős-Rényi model (ERM), the configuration model (CM), the stochastic block model (SBM) and the hyperbolic model (HM). Comparing these generative models with the real data sheds light on the properties of human connectomes. For this comparison, classical and maximal entropy random walks (MERW) have been used to construct the density matrices: interestingly, most generative models have a smaller von Neumann entropy than empirical human brains between the meso- and macroscopic temporal scales, where mid- or long-range communications between brain areas take place (see figure 6).
Figure 6. (A) [...] At extremely large and small values of the rescaled temporal parameter $\tau/\tau_d$, the difference between the red and blue lines, indicated by black dashes, is negligible, yet it grows around the mesoscale, showing the advantage in information capacity of complex empirical systems. One realization of fungal networks is presented on the right-hand side of panel (A). Reproduced with permission from [92]. (B) A similar advantage has been found in brain networks. The result of the analysis of an ensemble of human connectomes is presented: the average von Neumann entropy for the real data (blue line) and for a number of generative models including ERM, CM, HM and the SBM, obtained from the classic random walk and the MERW dynamics, respectively. Reproduced from [94]. CC BY 4.0.
A similar investigation has been carried out on three species of fungi and slime molds, namely Physarum polycephalum (Pp), Phanerochaete velutina (Pv) and Resinicium bicolor (Rb). The networks corresponding to these species are weighted according to the cord conductance for pairwise interactions (see [96] for details about the data set). Similarly to connectomes, the von Neumann entropy of these biological networks is larger than that of their randomized models at the meso-scale (see figure 6).
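The rescaled temporal parameter $\tau/\tau_d$ used in these comparisons can be computed with a few lines of code. The sketch below assumes NumPy, a symmetric control operator with non-negative eigenvalues and a connected network (so that the second-smallest eigenvalue is strictly positive); the function names and toy networks are hypothetical.

```python
import numpy as np

def diffusion_time(H, tol=1e-12):
    """tau_d = 1 / lambda_2, with lambda_2 the second-smallest eigenvalue of H."""
    lam = np.linalg.eigvalsh(H)          # returned in ascending order
    if lam[1] <= tol:
        raise ValueError("lambda_2 is zero: the network is not connected")
    return 1.0 / lam[1]

def rescaled_time(H, tau):
    """Propagation time expressed in units of the diffusion time."""
    return tau / diffusion_time(H)

# Combinatorial Laplacians of two toy networks.
n = 5
L_complete = n * np.eye(n) - np.ones((n, n))        # eigenvalues 0, 5, 5, 5, 5
L_ring4 = np.array([[ 2.0, -1.0,  0.0, -1.0],
                    [-1.0,  2.0, -1.0,  0.0],
                    [ 0.0, -1.0,  2.0, -1.0],
                    [-1.0,  0.0, -1.0,  2.0]])       # eigenvalues 0, 2, 2, 4

tau_d_complete = diffusion_time(L_complete)
tau_d_ring = diffusion_time(L_ring4)
print(tau_d_complete, tau_d_ring, rescaled_time(L_ring4, 1.0))
```

The same propagation time therefore corresponds to different rescaled times in the two networks, which is why $\tau/\tau_d$ is the quantity used when comparing networks of different type or size.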
Analysis of emergent functional states. In contrast with the adjacency operator, which gives a static and localized picture of interactions between the nodes, the propagator encodes the pairwise flow exchange between them at multiple scales, characterized by a temporal parameter $\tau$ describing short-, middle- and long-range functional interactions. Thus, one can define the emergent functional state of the system as $\hat{\omega}(\tau) = \hat{S}(\tau) \circ (\hat{U} - \hat{I})$, where $\hat{I}$ is the identity matrix, $\hat{U}$ is the matrix with all entries equal to 1, and $\circ$ denotes the Hadamard product. Operatively, the operator $\hat{\omega}(\tau)$ is equivalent to the propagator of the information dynamics with diagonal entries set equal to 0, and $\langle j|\hat{\omega}(\tau)|i\rangle$ gives the flow received by node $j$ from node $i$ at time $\tau$. Interestingly, it can be shown that for continuous diffusion ($\hat{H} = \hat{L}$) at extremely small $\tau$, the emergent functional state reduces to the adjacency operator (to leading order, up to a factor $\tau$).
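The definition of the emergent functional state translates directly into code. The following minimal sketch assumes NumPy, an undirected network and continuous diffusion governed by the combinatorial Laplacian; the function name and the toy network are illustrative.

```python
import numpy as np

def functional_state(W, tau):
    """omega(tau) = S(tau) o (U - I): the diffusion propagator with its diagonal removed."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    lam, V = np.linalg.eigh(L)
    S = (V * np.exp(-tau * lam)) @ V.T
    n = len(W)
    # Element-wise (Hadamard) product with the mask U - I.
    return S * (np.ones((n, n)) - np.eye(n))

# Toy network: path graph 0 - 1 - 2.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

omega_small = functional_state(W, 1e-4)   # short-range regime
omega_mid = functional_state(W, 1.0)      # nodes 0 and 2 now exchange flow
print(omega_small / 1e-4)
print(omega_mid)
```

At very small $\tau$ the rescaled functional state reproduces the adjacency matrix, while at larger $\tau$ non-adjacent nodes acquire non-zero functional links.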
Emergent functional states $\hat{\omega}(\tau)$ can be analyzed to identify functional modules, i.e., groups of nodes that exchange a higher amount of flow among themselves than with the rest of the nodes, using the Louvain algorithm. In the case of the fungal networks described above, it has been shown that when short-range communications are considered a large number of functional modules is detected, whereas for long-range interactions a smaller number of modules emerges (see figure 5). The number of modules is proportional to the von Neumann entropy of the system [92].

Network entanglement as a proxy for robustness
As discussed in the previous sections, the von Neumann entropy measures the diversity of information dynamics in the system and the functional diversity of its units. It is possible to quantify the importance of any node $x$ for the functional diversity of the system in terms of the effect of its removal on the von Neumann entropy. The network before the detachment of node $x$ is denoted by $G$, the part of the network remaining after the detachment by $G_x$, and the star network containing the detached node and its links by $\delta G_x$. Here, we denote their von Neumann entropies, respectively, as $S(\tau)$, $S_x(\tau)$ and $S_{\delta x}(\tau)$; the entanglement between node $x$ and the network [87] is defined in terms of these three entropies. It is possible to provide an analytical understanding of the behavior of entanglement by using a mean-field approximation of the von Neumann entropy, in the case of the continuous diffusion process ($\hat{H} = \hat{L}$), expressed in terms of the average degree $\bar{k}$. Using the mean-field entropy, one can show that at extremely small and extremely large temporal scales the entanglement is determined, respectively, by the degree $k_x$ of the removed node and by the number $C_x$ of disconnected components in the perturbed network $G_x$. At the meso-scale, entanglement assesses the importance of a node for the transport properties of the system, by measuring its effect on the dynamical trapping. As an application, entanglement has been used as a centrality measure capturing the role played by nodes in keeping the overall diversity of the information flow. It has been shown that attack strategies based on this centrality measure at the meso-scale are comparable to or better than other methods in driving empirical social, biological and transportation systems to fast disintegration, showing that the nodes central for information dynamics are also responsible for keeping the network integrated [87] (for more details, see figure 7).
In each case, the redundant layers are identified from the heatmap; the reduction continues until the intertwining reaches a maximum, the average return probability (proportional to the dynamical trapping) and the diffusion time reach a minimum, and the navigability is improved. Reproduced from [90]. CC BY 4.0.

Reducing complexity without altering the structure
As mentioned previously, information-theoretic metrics can be used to quantify the distance between pairs of networks. Interestingly, the Jensen-Shannon distance (equation (10)) has been used to assess the dissimilarity of layers of the multilayer network corresponding to the sites of the human microbiome, leading to their hierarchical clustering, in agreement with the state-of-the-art community-based association methods [85].
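A minimal Python sketch of such a layer comparison is given below. It assumes the standard form of the Jensen-Shannon divergence between density matrices, S((ρ+σ)/2) − [S(ρ)+S(σ)]/2, with the distance taken as its square root, and it assumes ρ = e^{−τL}/Tr[e^{−τL}] for each layer. Whether this matches equation (10) term by term should be checked against that equation. The function names and the toy layers are hypothetical.

```python
import numpy as np

def density_matrix(adj, tau=1.0):
    """rho = exp(-tau L) / Tr[exp(-tau L)], with L = D - A (assumed form)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    vals, vecs = np.linalg.eigh(lap)
    w = np.exp(-tau * vals)
    return (vecs * w) @ vecs.T / w.sum()   # V diag(w) V^T, normalized to unit trace

def entropy(rho):
    """Von Neumann entropy in bits."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-15]                       # ignore numerically zero eigenvalues
    return float(-(p * np.log2(p)).sum())

def js_distance(rho, sigma):
    """Square root of the Jensen-Shannon divergence between two density matrices."""
    div = entropy((rho + sigma) / 2) - (entropy(rho) + entropy(sigma)) / 2
    return float(np.sqrt(max(div, 0.0)))   # clip tiny negative round-off

def pairwise_js(layers, tau=1.0):
    """Symmetric matrix of Jensen-Shannon distances between all pairs of layers."""
    rhos = [density_matrix(a, tau) for a in layers]
    n = len(rhos)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = js_distance(rhos[i], rhos[j])
    return dist

# Usage: two identical path layers and one star layer on the same 4 nodes.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(pairwise_js([path, path, star]))
```

The resulting distance matrix can be passed to any hierarchical clustering routine. Identical layers are at zero distance, so they are the first candidates for merging.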
While the multilayer representation often provides a more accurate framework to model the structure of complex systems, it is a challenge to find the minimum number of layers necessary to precisely represent the structure. One way to tackle the problem is to find the layers containing redundant information, merge them together and maximize the distinguishability between the multilayer and the aggregated graph obtained by merging all the layers simultaneously. To identify the redundant layers, one can compute their hierarchical clustering by means of the Jensen-Shannon distance and devise a procedure to merge them until the layers are maximally diverse and distinguishable [91]. However, the proposed reducibility algorithm has shortcomings. For instance, in the case of synthetic multilayer systems where a subset of layers is identical, and hence maximally redundant, the framework does not lead to the aggregation of those layers. Moreover, the formalism cannot be used to understand the effects of dimensionality reduction on the dynamical properties of the system.
To overcome those issues, instead of relying on the layer distinguishability, one can maximize the intertwining [90]: the average entropy divergence between the multilayer and its layers. This function has an elegant interpretation from the perspective of complexity science, as it directly quantifies the importance of being a multilayer system, in terms of the difference between the whole (multilayer) and its parts (layers). It has been shown that by maximizing the intertwining one can effectively reduce the redundancies, reaching maximally diverse layers, enhancing the transport properties of systems, lowering the dynamical trapping and increasing the navigability. Interestingly, to apply this framework to real-world multilayer networks, it is not necessary to alter their structure: it is enough to couple the dynamical processes on top of similar layers to boost the transport in the systems. For instance, instead of adding new routes, one can introduce shared bus and subway tickets to couple the layers and enhance transport properties. For more details on the enhancement of transport properties resulting from the functional dimensionality reduction in empirical systems, including a co-authorship network of scientists, European airport networks and the London public transportation network, see figure 8.

Future perspective
Recently, a promising theoretical framework has been developed to reconcile thermodynamics and information to quantify, for instance, the energetic cost of altering a system's state. This novel field, based on stochastic thermodynamics and fluctuation theorems, is increasingly gaining attention for its applicability to a broad spectrum of problems which cannot be easily tackled with existing frameworks [97]. Even more recently, a time-information uncertainty relation bounding the rates of energy and entropy exchange has been discovered [98], the thermodynamics of modular computations has been linked to structural energy cost [99], the thermodynamic cost of Turing machines (emblematic computing models) has been quantified [100], and experimental evidence linking information with thermodynamics has been provided [101], supporting the view that information is physical, as originally conjectured by Landauer et al [102].
A traditional way to deal with information relies on Shannon's theory of communication [103], building on the concepts of data source, communication channel (through which the data are transmitted, being possibly contaminated by noise) and data receiver. In this setup, the information content is often quantified in terms of entropy, encoding the ability of the receiver to reconstruct from the observed signal the original data sent by the source. For this reason, Shannon's entropy is, nowadays, one of the most widely used measures to characterize the regularity of patterns and, in some cases, a proxy for their complexity. We have mentioned some representative studies based on information entropy and its variants (e.g., relative entropies) to characterize network complexity, although their estimation, limited by the subset of network descriptors they focus on, does not fully exploit the richness of the structural data. It is worth mentioning that the relation between the complexity of a graph and its information is a problem older than network science itself, known as structural information content, which was already addressed a few years after Shannon's pioneering work (e.g., see [104][105][106][107]). Over the last decade, information-theoretic approaches have been developed in network science using the source-channel-receiver paradigm [108], where analyses such as coarse-graining and identification of communities are reliably mapped into the classical problem of decoding a message transmitted along a noisy channel [109][110][111][112].
In data analysis, dealing with information is crucial for several practical reasons. For instance, an adequate quantification of network information content allows one to compare systems of different types (e.g., a biological against a technological one) and of different sizes, a task that nowadays can be performed with promising approaches (from quantum Jensen-Shannon divergence [85] to network portrait divergence [113], to mention a few [114]), each under limitations peculiar to the method. Another practical application concerns quantifying how a node influences and is influenced by the network [115], as well as gaining insights into how information content is related to a system's robustness [116].
The research agenda for the near future is rich. On the one hand, the framework presented in the previous sections has proven to be effective for practical applications like network comparison; on the other hand, a current technical limitation is that only graphs of the same size can be compared. Extending the formalism to network states of different sizes will favor applications such as the detection of relevant changes of information in time-varying systems. The lack of consensus on how to measure network complexity, and on how to discriminate between different networks in terms of their structure, information capacity and processing, points to another promising direction. For instance, existing approaches to measure network efficiency in information exchange [117,118] might be better understood within the statistical field theory of information dynamics. While a generalization of the presented framework to the case of multilayer systems has been recently introduced [90], its extension to the realm of higher-order models [42,43] of complex systems is still missing. It is worth mentioning that other theoretical developments and applications have recently appeared in the literature. For instance, Nicolini et al exploited the analogy with thermodynamics to provide a physical interpretation to inference applied to empirical brain networks [119] and to assess the effects of noise and image processing on functional connectivity [93]. Su et al have proven that spectral entropy is able to identify variations in network topology from a global perspective better than traditional distribution entropy [120], while Glos et al have shown that the phase transition in the resolution parameter of the spectral entropy of empirical systems can be used to distill information on whether the graph represents real-world interactions [121].
Furthermore, it has been shown that there is a strict relationship between information entropy, network robustness and network curvature in terms of the so-called Ollivier-Ricci curvature [122]. Since robustness can be interpreted as the rate function at which a network returns to its original state after a perturbation, it is positively correlated with entropy and, through entropy, it is related to graph curvature [123], providing an exciting opportunity for our framework to be extended to characterize a system's robustness and its graph curvature. The possibility to define a suitable network-based Fisher-Rao metric might allow one to exploit the geometry of statistical manifolds to characterize system complexity and to provide a ground for machine learning applications based on information geometry [124], which has been successfully related to critical and phase transition phenomena in classical statistical mechanics [16,[125][126][127] (see [128] for a review). In the long term, the framework has the potential to provide a unifying ground for an information theory of complex networks, with opportunities for novel machine learning techniques, and for the analysis of information in quantum networks. An even more exciting perspective concerns its cross-pollination with other theoretical frameworks (see figure 9), such as stochastic thermodynamics, with the aim to gain insights about how real-world complex systems handle information and its interplay with energetic cost.
Figure 9. Partial illustration of how network science is linked to statistical physics, quantum physics and technologies and machine learning. The goal of this illustration is to better identify the possible role of statistical physics of complex information dynamics in the near future, in bridging different areas of research. We summarize how information is extracted from networks under different assumptions and methodologies, as described in sections 1-6. The framework described in this work concerns the statistical physics of complex information dynamics, which provides a suitable candidate to reconcile insights from theoretical physics and mathematical tools for analyses, to provide a ground for network-based machine learning, quantum network analytics and to study novel properties such as entanglement (intended as emergent correlation in this context) in classical interconnected systems.

Conclusions and discussion
In this work we have briefly described the importance of network science for modeling and analyzing empirical interconnected systems, as well as its current role in apparently disconnected research fields, such as statistical physics, quantum information and technologies, and machine learning. In fact, network science can be characterized equally well by its theoretical contributions to complexity and systems science and by the algorithmic and computational tools it has provided to gain insights from structured data. This dual nature of network science has been, and still is, beneficial for its development and its cross-pollination with other fields, from biology to social sciences, as well as for applications.
In the journey towards the development, on the one hand, of theories able to reproduce the most salient features of complex systems, such as the emergence of collective behavior and critical phenomena, and, on the other hand, of analytical tools able to fully exploit the richness of structural data, the concept of information is increasingly gaining momentum. In fact, a complex network can be thought of in terms of units which process and exchange information with each other, driving the system from an input state to an output state. From this point of view, complex systems might resemble one another in the way they handle information, as also conjectured by Gell-Mann. Therefore, including information within a grounded and systematic theoretical framework allowing for direct applications to empirical data is of paramount importance and one of the most challenging tasks of the near future. For instance, it has been shown that the search strategy of macroscopic searchers with sparse information can be reliably described by 'infotaxis', a process which locally maximizes the expected rate of information gain [129]. Interacting living systems can dynamically adapt to be more efficient in dealing with heterogeneous and uncertain environmental conditions when they operate at criticality, which in turn can emerge from information-based fitness [130]. Biochemical systems generate a highly diverse set of complex molecules which exploit processes able to decode and encode information which constrains and drives reaction networks [131].
Here, we have reviewed in some detail the theoretical background and the applications of a framework recently developed to overcome some of the limitations described above, which is grounded on a statistical field theory of complex information dynamics. The main theoretical object of this framework is the network density matrix, encoding the ensemble of information operators, known as streams, which are responsible for information flows within the network. At variance with other matrix functions, the density matrix described in this work allows one to (i) describe a network state as the interplay between the underlying topology and the information dynamics on top of it, and (ii) define an information entropy, as a suitable classical counterpart of the quantum von Neumann entropy widely used in quantum thermodynamics and quantum information science. This framework allows for the definition of a variety of information-theoretic tools such as relative entropies or divergences, as well as information-theoretic distances, e.g., the Jensen-Shannon divergence, which can be reliably used to compare networks of the same size, such as the layers of a multiplex system [32]. Since the framework is grounded on density operators and von Neumann entropy, we expect it to have the same potential that these tools have for quantum mechanical information theory, where they already provided a unified description of classical correlation and quantum entanglement [132]. To highlight its potential, we have described some applications of this framework to characterize a system's robustness to random and targeted disruptions, as well as to perform dimensionality reduction of coupled systems, with applications ranging from the human interactome to the human brain at different stages of dementia, from a web of scientific collaborations to the backbone of the London tube.
Finally, this work presents possible future research perspectives to expand the framework of complex information dynamics, from finding more general distance measures for network comparison and defining operators quantifying the topological complexity, to exploring more bridges that might connect the framework with realms of knowledge such as non-equilibrium thermodynamics and machine learning.