Dynamical network size estimation from local observations

Here we present a method to estimate the total number of nodes of a network using locally observed response dynamics. The algorithm has the following advantages: (a) it is data-driven. Therefore it does not require any prior knowledge about the model; (b) it does not need to collect measurements from multiple stimulus; and (c) it is distributed as it uses local information only, without any prior information about the global network. Even if only a single node is measured, the exact network size can be correctly estimated using a single trajectory. The proposed algorithm has been applied to both linear and nonlinear networks in simulation, illustrating the applicability to real-world physical networks.


Introduction
Complex networks consist of a number of connected nodes with interactions, revealing significant phenomena in natural and social systems, including biology [1][2][3][4], climate [5][6][7] and electric power systems [8][9][10], etc. Determining the total number of nodes is the first step in understanding networked dynamical systems. Several examples lie in the system modeling [11,12] and order reducing [13,14]. Increase in the system's size results in the exponentially increased complexity, which hampers the control of the system. Thus, the modeling and order reduction are required to facilitate the simulations and the production of a reliable and stable lower-order system. Getting the order of the system is firstly conducted for both modeling and order reduction. Other examples include network identification (for instance, biochemical reaction networks identification [15]) and reconstruction (dynamical and Boolean network reconstruction [16,17]). As for many methods solving these problems, it is generally assumed that the size of the network is known. Besides, dynamical network analysis (synchronization analysis [18], robust analysis and control design [19,20]) are valid only if the size of the network is estimated or known as a priori. Thus, network size estimation play a fundamental role in complex network theory.
However, many networks are decentralized, which means every node only has local information and possibly it has a large number of hidden nodes (nodes in a network which one cannot measure and have no information about). For example, when studying the brain, we place EEG-sensors, converting neuron activities into electrical signals. But we can only collect signals from a few neurons. Hence, estimating the total number of nodes with partially available information is a challenge. Some recent works considered inferring the network size from available information but required model knowledge. For instance, [21,22] fitted the model parameters using adaptive Kalman filtering process or compressive sensing, and detected the number of hidden nodes via fitting errors or anomalies. Other works detected the network size without prior information about models but needed large amounts of data about the system evolution. [23][24][25] determined the network size via collective behaviors from several stimuli with different initial conditions and [26] used neural networks recognized the number of nodes describing the underlying physical system, which required massive training data. However, acquiring the model parameters or collecting large amounts of dynamical data can be costly and sometimes even impossible. How to infer a network size, from partially observed response dynamics in a single trajectory (i.e., the evolution history of a single node) without any prior information about the network model, is still unknown. Moreover, many related node counting works focused on linear networks [24,25,27], but most real-world physical networks have nonlinear characteristics.
In this paper, we propose a data-driven approach to estimate the size of a network using locally measured dynamics (i.e., time series of observed state variables), without requiring any prior information about the network and model. Our algorithm is based on the Hankel matrix [18], which is constructed from the finite-time history of system outputs. The special time-delay structure of the Hankel matrix makes it important in linear system identification [28,29]. Here, we construct a Hankel matrix from locally observed response dynamics of the network, and the rank of this matrix is shown to be the size of the network (i.e., the total number of nodes) if the dimension of the Hankel matrix is sufficiently large. Although Hankel-based analysis is a linear method, we show that the exact estimation of network size is feasible both in linear and nonlinear networks that can evolve to a fixed point. Furthermore, the proposed method is valid even in the case of only observing one node dynamics, which is applicable in many complex networks with strictly restricted access.

Theory of estimating network size from locally measured dynamics
We consider a network with N nodes, which has the following dynamics where z(k) = [z 1 (k), z 2 (k), . . . , z N (k)] T ∈ R N is the state variable of the system at time k, and f : R N → R N describes the underlying dynamics of the network. We assume that f is continuously differentiable. For both linear and nonlinear systems which have a fixed point z * = 0, where 0 ∈ R N is a vector with all elements being 0 and z * = f (z * ), we make a first order approximation near z * of the network via x(k) = z(k) − z * and obtain a local linearization as follows: where J ∈ R N×N is the Jacobian matrix of f at z * [23].
To illustrate the main ideas of our technique, we first take an undirected network with linear dynamics as an example, and we assume that the network dynamics is stable. Without loss of generality, we measure the first p nodes in x(k), and we assume that these p nodes can actually be observed in the network. We define an observation matrix C = I p 0 ∈ R p×N , where I p is a p × p identity matrix and 0 represents the p × (N − p) zero matrix. Thus, we obtain the output dynamics For example, when we use EEG-sensors to study the brain. The time series of electrical signals is the observation from the network of neurons y(k). Thus, the number of sensors is p. Our work is to infer the scale of the relevant N neurons from the dynamics of sensor signals. Measured nodes are assumed to be observable in our case, i.e., the associated observability matrix for these nodes is which is assumed to have full column rank [30]. According to observability theory [31], the measurement information is observable if and only if the rank of its observability matrix O N ∈ R pN×N is N.  For this observation y(k), we stack its time series into a Hankel matrix by: where H l ∈ R pl×l , and l is the size of the Hankel matrix, so 2l − 1 is the total number of observations [18]. Decomposing H l yields: where O l = C T J T C T . . . (J l−1 ) T C T T ∈ R pl×N is the observability matrix with l rows, and C l = x(0) Jx(0) . . . J l−1 x(0) ∈ R N×l will be the controllability matrix via controllability theory [32] if the system is x(k + 1) = Jx(k) + Bu(k), with B = x(0) and u(k) is an impulse input, i.e., u(0) > 0 and u(k) = 0 for k = 1, 2, . . . . If H l is singular for some l, at least one of the two matrices O l and C l has a rank smaller than l, indicating that at least one of O l+1 and C l+1 has rank smaller than l + 1, hence H l+1 is singular. According to the Cayley-Hamilton theorem, J N is a linear combination of I, J, J 2 , . . . , J N−1 , so that the last line of H N+1 is a linear combination of the previous ones. As a result, H N+1 is singular.
Theoretically, H l has full rank for l N, and H l is singular for all l N + 1. However, affected by the convergence rate of the network systems and the numerical computation issues about the rank, H l may become defective at k N in practice [27]. Nonetheless, as the size of H l continuously increases, its rank which is the estimation of the total number of nodes in the network. The framework of our algorithm is illustrated in figure 1.

Performance
Firstly, we investigate the performance of our proposed data-driven method on linear network systems. We manually add Gaussian noise to equation (2) by: where where α ∈ R is the relative noise intensity, ξ i (k) ∼ N (0, 1), and i ∈ {1, 2, . . . , N}. α = 0 corresponds to a noiseless system. Figure 2(a) shows dynamics of a linear network with N = 30 and α = 0.1. Although the system converges quickly, and z(k) looks similar for large k, the orders of the magnitude difference of the state values at the adjacent moments are very large. Taking advantage of this feature, we can compute the descending ordered singular values of H l and infer the matrix rank with a relative threshold. Figure 2(b) illustrates how the rank of H l varies with l increasing, and H l is constructed from dynamics of only one node shown in figure 2(a). The rank converges to 30, and hence successfully indicating the network size of figure 2(a). Moreover, we detect the network size with different numbers of nodes and different noise intensities from the dynamical measurement of p observable nodes. In each Table 1. Estimation results for system of equation (8)  condition, we randomly generate 100 system matrices J that make the system convergent. For each system matrix J s , s ∈ {1, 2, . . . , 100}, we independently evolve the network dynamics with the initial state conditions drawn from the uniform distribution in [−1, 1]. If the estimation result of J s is Ñ s , we get the absolute estimation error for J s is e s = |Ñ s − N|, so the final mean absolute estimation error for this condition is E = 100 s=1 e s /100. The mean absolute estimation errors in different conditions are shown in figure 3. It can be seen that, for the networks with different sizes, we can approximate the number of nodes with extremely small error. Our method also performs well even if only a single node is measured. For small networks with 10 or 30 nodes, we can detect the size of networks with almost zero error. For larger networks, such as 50, 75 and 100 nodes, although the exact number of nodes cannot be accurately detected, we can estimate the range of scales with small errors. For example, in a network with 100 nodes, when we observe only one node and the relative noise intensity is 0.1, E = 5.32 means that the estimation result indicates that the network has 95-105 nodes.
We also extend our method to nonlinear network systems, including weakly nonlinear networks, coupled oscillators and biological networks, which further investigates the applicability in the real-world physics models. For instance, the following noiseless network is a weakly nonlinear system with four nodes, and its fixed point is z * = 0, so the first order approximation is x z − z * = z. The fourth node z 4 connects other nodes in the network accordingly. Thus, we can estimate the network size from z 4 (see figure 4(a) for system dynamics, where μ = −0.05, λ = −1, the sampling step = 0.3, and random initial condition that is drawn from the uniform distribution [−4, 4] [33]). Interestingly, the estimation result using z 4 (k) in figure 4(b) is Ñ = 3 all the time. The reason is that z 1 and z 3 have the same evolution process, and our algorithm will treat nodes with symmetric dynamics as a single node. When we observe node z 4 , the algorithm tells us that only 3 independent state variables can reconstruct the output information of z 4 . Thus, our algorithm is a process of minimal realization, estimating the minimum number of nodes that can reconstruct the measurements [34,35]. Estimation results from other measurements shown in table 1 can also illustrate the minimal realization.  Next, we take the Kuramoto oscillators as a paradigmatic example [36]. For any node i, i ∈ {1, 2, . . . , N}, the discrete model is where A ij is an element of the corresponding adjacency matrix A ∈ R N×N , with A ij = 1 when there is an edge from node j to node i, and A ij = 0 otherwise. The network consists of N oscillators, with phases θ i , natural frequencies ω i , coupling strength K and sampling step = 1 N . For ω i close to zero, and θ j − θ i π 2 , the oscillators network can be approximated as a linear network. We then consider the WS models [37] to generate adjacency matrix A and use Hankel matrix to estimate the network size. A network with N = 10 and the success estimation observing only one node are shown in figure 5. For different size of networks, the mean absolute error in 100 random adjacency matrices is shown in figure 6.
Our approach can also be generalized to Michaelis-Menten kinetics, which is a prestigious nonlinear model in biochemistry [39]. The formula of node i, i ∈ {1, 2, . . . , N} is given by where the sampling step = 1/N, and W ∈ R N×N denotes the system matrix. When states evolve near 0, the nonlinear system can be approximately linearized according to Taylor expansion. An example of Michaelis-Menten dynamics with N = 50 is illustrated in figure 7(a), and the orders of magnitude of state values at the adjacent moments also vary greatly. We observe node z 1 and construct H l . Its rank variation is illustrated in figure 7(b), showing Ñ = 50. We conduct the same random experiment as for the linear networks, and figure 8 presents the mean absolute error in different size network.

Conclusions and discussion
In summary, we propose a novel approach to estimate the size of a complex network from its locally observed dynamics based on the Hankel matrix and Cayley-Hamilton theorem. Our major contribution of this paper is successfully demonstrating that an extremely small number of observed nodes, even a single trajectory, suffice to discover the range of the network size, and do not require any global or prior information.
Our Hankel-based method is a process of minimal realization in system identification, that is, we find the minimum number of nodes that can reconstruct the observation. If there are some nodes in the network that do not affect the dynamical evolution of the measured nodes, we regard them as redundant and not informative nodes in the network.
Finally, our future research will focus on generalizing our proposed method to real world data and more complicated nonlinear network dynamics, such as systems with chaotic attractor, and investigating how to detect high-dimensional network size using time series far from its fixed point.